OpenAI has announced GPT-4, a large-scale, multimodal model that accepts both text and image inputs and produces text outputs. Like its predecessors, GPT-4 is a Transformer-based model pre-trained to predict the next token in a document; it was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF) before being evaluated on a range of professional and academic benchmarks.
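To make the pre-training objective concrete, here is a minimal sketch of next-token prediction with a toy Transformer in PyTorch. The architecture, sizes, and data are illustrative stand-ins rather than OpenAI's implementation, and the RLHF fine-tuning stage is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, seq_len = 1000, 64, 16

class TinyCausalLM(nn.Module):
    """Toy decoder-style language model: embed -> masked self-attention -> vocab logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal mask: position t may only attend to positions <= t.
        L = tokens.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.block(self.embed(tokens), src_mask=mask)
        return self.head(h)

model = TinyCausalLM()
tokens = torch.randint(0, vocab_size, (2, seq_len))   # fake batch of token ids
logits = model(tokens[:, :-1])                        # predict token t+1 from tokens <= t
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()                                       # one gradient step of the next-token objective
```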
One of the main goals of developing such models is to improve their ability to understand and generate natural language in more complex and nuanced scenarios. To test its capabilities, GPT-4 was evaluated on a variety of exams originally designed for humans, including a simulated bar exam, on which it achieved a score in the top 10% of test takers; GPT-3.5, by contrast, scored around the bottom 10%. GPT-4 also outperforms previous large language models and most state-of-the-art systems on a suite of traditional NLP benchmarks, including MMLU, a set of multiple-choice questions covering 57 subjects, where it performs strongly not only in English but also on translated versions in many other languages.
One of the significant improvements in GPT-4 is its ability to accept image inputs alongside text, which lets users specify any vision or language task (hence its multimodality). Given prompts consisting of interspersed text and images, GPT-4 generates text outputs, and it exhibits capabilities on these mixed inputs similar to those it shows on text alone, across domains including documents with text and photographs, diagrams, or screenshots.
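The exact request format for image inputs was not public at the time of the announcement, so the snippet below is a purely hypothetical illustration: it borrows the shape of the later OpenAI chat-completions API, and the model name and image URL are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical interleaved text-and-image request; model name is illustrative only.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is unusual about this chart?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```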
The development of GPT-4 involved six months of iterative alignment work, drawing on lessons from OpenAI’s adversarial testing program and from ChatGPT. Despite its many capabilities, GPT-4 still has important limitations: it can “hallucinate” facts and make reasoning errors. OpenAI reports significant progress in reducing hallucinations, with GPT-4 scoring 40% higher than GPT-3.5 on internal adversarial factuality evaluations. Still, GPT-4 can exhibit biases in its outputs, and it can be confidently wrong in its predictions.
To mitigate these risks, OpenAI has engaged over 50 experts from various domains to adversarially test the model. These experts provided feedback that fed into OpenAI’s mitigations and improvements for the model. OpenAI has also incorporated an additional safety reward signal during RLHF training to reduce harmful outputs.
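As a rough illustration of what an “additional safety reward signal” might look like, the sketch below folds a safety classifier’s score into the reward used for the policy update. The additive combination, the stand-in scorers, and the weighting are assumptions for illustration, not details OpenAI has published.

```python
def combined_reward(prompt, response, helpfulness_rm, safety_classifier, safety_weight=1.0):
    """Score a (prompt, response) pair for the RLHF policy update.

    helpfulness_rm: learned reward model, higher is better.
    safety_classifier: returns the probability that the response is disallowed.
    The additive combination below is an illustrative assumption.
    """
    r_helpful = helpfulness_rm(prompt, response)
    p_unsafe = safety_classifier(prompt, response)
    return r_helpful - safety_weight * p_unsafe   # penalize likely-harmful completions

# Stand-in scorers, just to show the call shape:
score = combined_reward(
    "How do I stay safe online?", "Use strong, unique passwords.",
    helpfulness_rm=lambda p, r: 0.8,
    safety_classifier=lambda p, r: 0.02,
)
```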
The capabilities and limitations of GPT-4 create significant and novel safety challenges that must be carefully studied and addressed to mitigate potential harms from its deployment, including bias, disinformation, over-reliance, privacy, cybersecurity, proliferation, and more. Careful study of these challenges is an important area of research given the potential societal impact.
A key challenge of the GPT-4 project was developing deep learning infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed the team to reliably predict some aspects of GPT-4’s performance based on models trained using 1/1,000th the compute of GPT-4. This predictable scaling is important for safety, and the team plans to refine these methods and register performance predictions across various capabilities before large model training begins.
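The sketch below shows one common way such an extrapolation can be done: fit a power law with an irreducible-loss term to final losses from small runs and evaluate it at the full compute budget. The functional form and all data points are assumptions for illustration, not OpenAI’s actual fits.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, a, b, c):
    # Loss as a function of compute: power-law decay plus an irreducible term c.
    return a * compute ** b + c

# Final losses observed at a range of small compute budgets (made-up numbers),
# expressed as fractions of the target training compute.
compute = np.array([1e-6, 1e-5, 1e-4, 1e-3])
loss    = np.array([4.2, 3.6, 3.1, 2.7])

params, _ = curve_fit(scaling_law, compute, loss, p0=(1.0, -0.1, 1.0), maxfev=10000)
predicted = scaling_law(1.0, *params)   # extrapolate to the full run
print(f"predicted final loss at full compute: {predicted:.2f}")
```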
GPT-4 is currently available via ChatGPT Plus and, through a waitlist, via the API. OpenAI plans to make GPT-4’s image input capability more widely available over time; for now it is being tested with a single partner.
Overall, GPT-4 represents a significant milestone in OpenAI’s efforts to scale up deep learning and build large-scale, multimodal models that process both text and image inputs and produce text outputs. It shows clear improvements in performance and capability over its predecessors, though limitations and risks remain. As OpenAI continues to focus on reliable scaling and safety, it will be interesting to see what future AI systems look like and how they affect society. Careful research will be needed to maximize the benefits of such systems while mitigating their risks and limitations. Nonetheless, GPT-4 is a remarkable achievement that paves the way for even more impressive developments to come.