OpenAI "On the Cusp" of Level 2 AGI, Generalist Robotics Model, New TTT Architecture, and Better Tiny Local Models
OpenAI, Skild AI, Meta, AGI, Test-Time Training
This week's AI developments make up for a slow week last week, spanning notable commentary, research, and applications. First, OpenAI is reportedly making strides toward Level 2 AI, which it labels "Reasoners" and defines as systems with human-level problem-solving capabilities. Research on Test-Time Training (TTT) layers shows that models can adapt and improve in real time, potentially outperforming traditional architectures on long-context tasks. Skild AI's recent funding round underscores investor buy-in on generalist AI models as the way to drive robotics forward. Finally, Meta's MobileLLM models demonstrate efficient, capable on-device AI that addresses the memory and energy constraints of mobile hardware. Happy reading!
--Sasha Krecinic
According to a recent Bloomberg article, OpenAI executives have said the company is “on the cusp” of Level 2 AGI in its five-tier system for tracking progress toward artificial general intelligence. It is also rumored that OpenAI demonstrated GPT-4 with improved reasoning capabilities at a recent all-hands meeting. Level 2 involves AI systems performing problem-solving tasks at the level of a human with a doctorate-level education, without using any tools. The full scale comprises the following stages:
Level 1: Chatbots - AI with conversational language abilities
Level 2: Reasoners - AI with human-level problem-solving capabilities
Level 3: Agents - Systems that can take actions
Level 4: Innovators - AI that can aid in invention
Level 5: Organizations - AI that can perform the work of an entire organization
[OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving: The company believes its technology is approaching the second level of five on the path to artificial general intelligence]
OpenAI researcher Noam Brown, a specialist in AI reasoning, tweeted, "When I joined @OpenAI a year ago, I feared ChatGPT's success might shift focus from long-term research to incremental product tweaks. But it quickly became clear that wasn't the case. @OpenAI excels at placing big bets on ambitious research directions driven by strong conviction. They remain committed to ambitious research despite the success of ChatGPT." The comment emphasizes that OpenAI continues to prioritize long-term research over incremental product improvements. It also suggests the company is not measuring itself solely against existing benchmarks but is pursuing paradigm-shifting developments through research such as Brown's work on reasoning. This stance contrasts with the messaging of members of the former alignment team who recently departed OpenAI. [via @polynoamial]
According to a recent research paper, Test-Time Training (TTT) layers match or exceed the performance of strong Transformer and Mamba baselines on long-context tasks. Mamba is a modern recurrent neural network (RNN); RNNs process sequential data such as text, genomes, handwriting, or numerical time series by carrying a hidden state from step to step. TTT replaces that fixed-size hidden state with a small machine learning model that keeps updating its own parameters through self-supervised learning even on test sequences, so the layer adapts to new data in real time. The paper reports that TTT-Linear is already faster than a Transformer at 8k context and matches Mamba in wall-clock time, the actual elapsed time to complete a task (as opposed to a count of computational steps), which is what matters for real-world efficiency. TTT-MLP, which uses an MLP as the hidden state, still faces memory I/O challenges but shows even greater potential on long context, making the approach a promising candidate for scalable long-context modeling. [GitHub - test-time-training/ttt-lm-pytorch: Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States]
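To make the mechanism concrete, here is a minimal PyTorch sketch of a TTT-style layer whose hidden state is a linear model, updated by one gradient step per token on a self-supervised reconstruction loss. It is an illustration in the spirit of TTT-Linear, not the repo's API: the projection names, the single sequential inner step, and the fixed inner learning rate are simplifying assumptions, and the official implementation adds mini-batch TTT, normalization, and learned inner learning rates.

```python
import torch
import torch.nn as nn


class TTTLinearSketch(nn.Module):
    """Illustrative Test-Time Training layer: the hidden state is itself a
    linear model W that is trained by gradient descent on every token."""

    def __init__(self, dim: int, inner_lr: float = 0.1):
        super().__init__()
        # Outer-loop projections learned during normal training,
        # analogous to the Q/K/V projections in attention.
        self.theta_q = nn.Linear(dim, dim, bias=False)
        self.theta_k = nn.Linear(dim, dim, bias=False)
        self.theta_v = nn.Linear(dim, dim, bias=False)
        self.inner_lr = inner_lr  # step size of the inner (test-time) update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        B, T, D = x.shape
        # Hidden state: one linear model W per sequence in the batch.
        W = x.new_zeros(B, D, D)
        outputs = []
        for t in range(T):
            xt = x[:, t, :]
            k = self.theta_k(xt)   # "training view" of the current token
            v = self.theta_v(xt)   # reconstruction target
            q = self.theta_q(xt)   # query view used to produce the output
            # Self-supervised inner loss: || W k - v ||^2
            err = torch.bmm(W, k.unsqueeze(-1)).squeeze(-1) - v
            # One gradient step on W; the gradient of the loss is 2 * err k^T.
            W = W - self.inner_lr * 2 * torch.bmm(err.unsqueeze(-1), k.unsqueeze(1))
            # Apply the freshly updated hidden-state model to the query view.
            outputs.append(torch.bmm(W, q.unsqueeze(-1)).squeeze(-1))
        return torch.stack(outputs, dim=1)  # (batch, seq_len, dim)
```

Calling `TTTLinearSketch(dim=64)(torch.randn(2, 16, 64))` returns a `(2, 16, 64)` tensor, with the inner model having trained on each token exactly once. Because the hidden state is a model rather than a fixed-size vector, its ability to compress context scales with the expressiveness of that inner model, which is the intuition behind TTT's long-context gains.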
Pittsburgh-based robotics startup Skild AI has raised $300 million at a $1.5 billion valuation in a Series A funding round backed by Lightspeed Ventures, SoftBank, Coatue, and Jeff Bezos. Skild AI's models enable robots to perform tasks in unfamiliar environments, such as climbing stairs and recovering objects that slip from their grasp. The robots demonstrated emergent capabilities, showcasing skills they weren't explicitly taught. The AI model was trained on a dataset 1,000 times larger than those used by competitors, using diverse data-collection techniques. According to the company's press release: "Skild’s model serves as a shared, general-purpose brain for a diverse embodiment of robots, scenarios, and tasks, including manipulation, locomotion, and navigation. From resilient quadrupeds mastering adverse physical conditions to vision-based humanoids performing dexterous manipulation of objects for complex household and industrial tasks, the company’s model will enable the use of low-cost robots across a broad range of industries and applications." [This $1.5 Billion AI Company Is Building A ‘General Purpose Brain’ For Robots]
Meta has introduced MobileLLM, a family of models designed for efficient on-device large language models (LLMs). With fewer than a billion parameters, they perform competitively with larger models on specific tasks, showing significant improvements on chat benchmarks and API-calling tasks. The models use deep-and-thin architectures with embedding sharing and grouped-query attention, improving accuracy without increasing model size. These small models are practical for mobile devices, where memory capacity and energy consumption constrain what can run locally, making them well suited to common on-device use cases. [MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases]
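As a rough illustration of two of the techniques named above, the sketch below ties the output head to the input embedding table (embedding sharing) and projects fewer key/value heads than query heads (grouped-query attention) inside a deep, narrow stack. The dimensions, depth, and class names are assumptions chosen for illustration, not Meta's released configuration, and the blocks omit normalization and MLP sublayers for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGQABlock(nn.Module):
    """Attention block with grouped-query attention: fewer key/value heads
    than query heads, so the KV projections (and KV cache) shrink without
    changing the hidden size. Sizes here are hypothetical."""

    def __init__(self, dim: int = 576, n_heads: int = 9, n_kv_heads: int = 3):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.kv_proj = nn.Linear(dim, 2 * n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(x).chunk(2, dim=-1)
        k = k.view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each KV head is shared by n_heads // n_kv_heads query heads.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))


class TinyLM(nn.Module):
    """Deep-and-thin stack with embedding sharing: the output logits reuse
    the input embedding matrix, so the vocab-by-dim table is stored once."""

    def __init__(self, vocab: int = 32000, dim: int = 576, depth: int = 30):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList([TinyGQABlock(dim) for _ in range(depth)])

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.embed(tokens)
        for blk in self.blocks:
            h = h + blk(h)  # residual connection; norms and MLPs omitted
        # Embedding sharing: project back onto the embedding weights.
        return h @ self.embed.weight.T
```

Both choices attack the sub-billion-parameter budget directly: embedding sharing avoids a second vocab-sized matrix for the output head, and grouped-query attention cuts the size of the KV projections and cache, freeing parameters for the many thin layers that the paper finds matter most at this scale.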