The rapid advances in Artificial Intelligence over the past few years can be partially attributed to the open-source community (OSC). Open-source software (OSS) such as MLflow, Pandas, and PyTorch has allowed organizations to leverage the power of collaboration and benefit from the multiplier effect of open source. Essentially, no one wants to reinvent the wheel for enabling technologies, so organizations incorporate OSS into their commercial products and services. It is sad to say that much of the OSS work others profit from is done by dedicated hobbyists in their spare time, though it would be unfair not to acknowledge that some organizations contribute code and even host the projects they rely on.
However, with the arrival of large language models (LLMs) like ChatGPT, things have changed. As much as the OSC would like to continue producing cutting-edge models, it faces several constraints: talent, compute resources, and sustained funding. Large corporations have the means to overcome all three. Talent is in short supply because those who participate in open-source projects are heavily recruited by major corporations. As LLMs grow in parameter count and are combined with reinforcement learning from human feedback (RLHF), as in ChatGPT, building such a model requires sustained funding and access to very expensive compute resources. For instance, GPT-3, from which ChatGPT was fine-tuned, required roughly 3.14e23 FLOPs of compute to train.
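To put that compute figure in perspective, a common rule of thumb estimates training cost as roughly 6 × parameters × training tokens FLOPs. A minimal sketch, using GPT-3's published figures (175 billion parameters, ~300 billion training tokens); the helper function name is illustrative:

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute via the common 6*N*D rule of thumb."""
    return 6 * params * tokens

# GPT-3: ~175B parameters trained on ~300B tokens
gpt3_flops = training_flops(175e9, 300e9)
print(f"{gpt3_flops:.2e}")  # ~3.15e23, in line with the cited 3.14e23 figure
```

This back-of-the-envelope estimate makes the funding problem concrete: at that scale, training runs cost millions of dollars in hardware time alone, well beyond what hobbyist contributors can self-fund.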
Given these technical and financial constraints, the open-source community will struggle to compete with OpenAI, Microsoft, Google, and other large organizations, so I propose that these models be open-sourced to aid future innovation. Out of social responsibility, large organizations should provide financial support to the OSS projects they utilize. Furthermore, to compete with large corporations, the OSC needs to partner with non-profits (for instance, NumFOCUS), private businesses, and governments (e.g., NASA's Open Source Science Initiative) to acquire compute resources and sustainable funding. To illustrate how this could work, consider the open-source LLM BLOOM. With its 176 billion parameters, BLOOM can generate text in 46 natural languages and 13 programming languages. It is the culmination of a year of work by over 1,000 researchers from 70+ countries and 250+ institutions, who trained the model on the Jean Zay supercomputer near Paris, France, thanks to a compute grant worth an estimated €3M from the French research agencies CNRS and GENCI.
What can we do as individual contributors? By supporting and contributing to open-source projects, we can play a crucial role in shaping the future of AI and ensuring it is accessible to all. Let's work together to make a positive impact on the world and take our place at the forefront of AI innovation.