Researchers point out that 'OpenAI's policy will make nearly 100 AI papers unreproducible'



In March 2023, the AI research organization OpenAI announced that it would end support for Codex, an AI system that automatically generates code from natural-language input. In response, AI researchers Sayash Kapoor, a doctoral student at Princeton University, and Arvind Narayanan, a professor there, argued that 'discontinuing support for Codex, which is used in approximately 100 papers, will undermine the reproducibility of research.'

OpenAI's policies hinder reproducible research on language models
https://aisnakeoil.substack.com/p/openais-policies-hinder-reproducible



Codex, developed by OpenAI, is the model underlying GitHub Copilot, a source-code completion AI tool built and released in partnership with GitHub in July 2021, and it can interpret natural language and output appropriate code. Unlike some other models OpenAI has released, Codex is not open source, so anyone who wanted to use it had to apply to OpenAI for access to the model.
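For context, a call to Codex looked roughly like the following. This is a minimal sketch using the pre-1.0 openai Python SDK that was current at the time; the prompt and parameters are illustrative, and an API key granted through OpenAI's access program is assumed.

# Minimal sketch: generating code from natural language with Codex
# (legacy openai SDK, pre-1.0; "code-davinci-002" was the largest Codex model)
import openai

openai.api_key = "sk-..."  # key obtained through OpenAI's Codex access program

response = openai.Completion.create(
    model="code-davinci-002",  # Codex model id
    prompt='"""Return the n-th Fibonacci number."""\ndef fibonacci(n):',
    max_tokens=128,
    temperature=0,  # make sampling as deterministic as possible
)
print(response["choices"][0]["text"])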

However, on March 20, 2023 local time, OpenAI sent an email to users informing them that 'Codex support will end on March 23.'



'Codex is used in approximately 100 academic papers, so if OpenAI ends support for Codex and users lose access, the reproducibility of those papers will be lost. The less-than-one-week period between the notice and the end of service is also extremely short compared to standard software-deprecation practice.'

'Independent researchers will no longer be able to assess the validity of papers and build upon their findings,' Kapoor and Narayanan wrote. 'Furthermore, developers building applications using OpenAI models will no longer be able to guarantee that their applications will continue to work as expected.'

In language-model research, small changes to a model can change the results, so ensuring reproducibility requires access to the exact model used in the study. If a result cannot be reproduced when only a newer model is available, there is no way to tell whether the discrepancy stems from the change in the model or from a flaw in the original study.
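In concrete terms, a reproduction attempt has to target the exact model id the paper used, and once that id is retired the attempt fails outright. The sketch below is illustrative only: the model id, prompts, and error handling are assumptions, again using the pre-1.0 openai SDK.

# Illustrative sketch: re-running a paper's experiment against the exact model it used
import openai

PAPER_MODEL = "code-davinci-002"  # hypothetical: the snapshot the paper reported

def rerun_experiment(prompts):
    results = []
    for prompt in prompts:
        try:
            resp = openai.Completion.create(
                model=PAPER_MODEL,
                prompt=prompt,
                max_tokens=64,
                temperature=0,
            )
            results.append(resp["choices"][0]["text"])
        except openai.error.InvalidRequestError as err:
            # Raised once the model id no longer exists; at that point the
            # experiment can no longer be reproduced as published.
            raise RuntimeError(f"model {PAPER_MODEL} is no longer available: {err}")
    return results

Re-running the same prompts against a newer model would still yield some result, but there would be no way to attribute any difference to the model rather than to the study.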

The ability of others to reproduce research results is essential to ensuring the accuracy of scientific work, but declining reproducibility has become a growing problem across the sciences in recent years.

Scientific 'reproducibility' is in danger - GIGAZINE



In response to this feedback, OpenAI launched a program to let researchers keep access to Codex.



However, Kapoor and Narayanan point out that the application process for the Codex access program is opaque, and that it is unclear how long access to Codex will be maintained. Furthermore, OpenAI regularly updates its newest models, such as GPT-3.5 and GPT-4, and maintains access to superseded versions for only about three months, which undermines the reproducibility of research that uses those models. As a result, not only researchers but also developers who build applications on OpenAI's models cannot be sure their applications will keep working with future models.
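To illustrate the snapshot policy, a paper aiming at reproducibility would pin a dated snapshot such as 'gpt-3.5-turbo-0301' rather than the moving 'gpt-3.5-turbo' alias; under the three-month window described above, even the pinned snapshot eventually disappears. A hedged sketch, again assuming the legacy openai SDK:

# Sketch: pinning a dated snapshot instead of the moving alias
import openai

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",  # March 2023 snapshot; the bare "gpt-3.5-turbo"
                                 # alias silently moves to newer versions
    messages=[{"role": "user", "content": "Explain what a dated model snapshot is."}],
    temperature=0,
)
print(resp["choices"][0]["message"]["content"])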

Kapoor and Narayanan argue that language models have become critical infrastructure, and that OpenAI's policy of not providing version-controlled models is a blow to research reproducibility. While many factors must be weighed when open-sourcing large language models, they argue that open-source models are an important step toward ensuring that research remains reproducible.
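As a sketch of the version-controlled alternative the authors advocate, open-weights models can be pinned to an exact revision of their published weights, for example through the Hugging Face transformers library. The model name below is only an example, and a real paper would pin a specific commit hash rather than a branch name.

# Sketch: pinning an open-weights model to a fixed revision for reproducibility
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "bigscience/bloom-560m"  # example open-weights model
REVISION = "main"  # in a real paper, pin an exact commit hash of the weights

tok = AutoTokenizer.from_pretrained(MODEL, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(MODEL, revision=REVISION)

inputs = tok("def fibonacci(n):", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)  # greedy decoding by default
print(tok.decode(output[0]))

Because the weights themselves are archived and addressable, anyone can rerun an experiment against the identical model years later, which is exactly what a closed, retired API model cannot offer.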

On the social news site Hacker News, commenters posted remarks such as 'If they end support for old models, they should open-source them' and 'OpenAI is concerned about the dangers of AI and may be delaying releases in consideration of the risks.'

OpenAI's policies hinder reproducible research on language models | Hacker News
https://news.ycombinator.com/item?id=35269304

In addition, when it released GPT-4, OpenAI did not disclose the datasets or training methods used to build it. OpenAI's chief scientist and co-founder Ilya Sutskever stated, 'If you believe, as we do, that AI and AGI (artificial general intelligence) will be incredibly powerful, then open sourcing them is pointless and a bad idea. I think it will become clear to everyone in a few years that it is not wise to open source AI.' OpenAI is thus taking a stance of keeping its AI closed.

OpenAI co-founder says 'We were wrong', shifting focus to keeping data private due to AI risks - GIGAZINE



Posted by log1h_ik