OpenAI "On the Cusp" of Level 2 AGI, Generalist Robotics Model, New TTT Architecture, and Better Tiny Local Models
OpenAI, Skild AI, Meta, AGI, Test-Time Training
This week's AI developments make up for a slow week last week, spanning notable commentary, research, and applications. First, OpenAI is reportedly making strides toward Level 2 AI, which it labels "Reasoners" and defines as systems with human-level problem-solving capabilities. Research on Test-Time Training (TTT) layers shows that models can adapt and improve in real time, potentially outperforming traditional architectures on long-context tasks. Skild AI's recent funding round underscores investor buy-in on generalist AI models as the way to drive robotics forward. Finally, Meta's MobileLLM models demonstrate efficient, capable on-device AI that addresses the memory and energy constraints of mobile hardware. Happy reading!
--Sasha Krecinic
According to a recent Bloomberg article, OpenAI executives have said the company is “on the cusp” of Level 2 AGI in its five-tier system for tracking progress toward artificial general intelligence. It is also rumored that OpenAI demonstrated GPT-4 with improved reasoning capabilities at a recent all-hands meeting. Level 2 involves AI systems performing problem-solving tasks at the level of a human with a doctorate-level education, without using any tools. The full scale comprises the following stages:
Level 1: Chatbots - AI with conversational language abilities
Level 2: Reasoners - AI with human-level problem-solving capabilities
Level 3: Agents - Systems that can take actions
Level 4: Innovators - AI that can aid in invention
Level 5: Organizations - AI that can perform the work of an entire organization
[OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving: The company believes its technology is approaching the second level of five on the path to artificial general intelligence]
OpenAI researcher Noam Brown, a specialist in AI reasoning, tweeted, "When I joined @OpenAI a year ago, I feared ChatGPT's success might shift focus from long-term research to incremental product tweaks. But it quickly became clear that wasn't the case. @OpenAI excels at placing big bets on ambitious research directions driven by strong conviction. They remain committed to ambitious research despite the success of ChatGPT." The comment emphasizes that OpenAI continues to prioritize long-term research over incremental product improvements. It also suggests the company is not measuring itself solely against existing benchmarks but is pursuing paradigm-shifting developments through research such as Brown's work on reasoning. This stance contrasts with the messaging of members of the former alignment team who recently departed OpenAI. [via @polynoamial]
According to a recent research paper, Test-Time Training (TTT) layers match or exceed the performance of strong Transformer and Mamba baselines on long-context tasks. Mamba is a modern recurrent neural network (RNN); RNNs process sequential data such as text, genomes, handwriting, or numerical time series by carrying a hidden state from step to step. TTT replaces that fixed-size hidden state with a small machine learning model that keeps updating its own parameters through self-supervised learning even on test sequences, so the layer adapts to new data in real time. The paper reports that TTT-Linear is already faster than a Transformer at 8k context and matches Mamba in wall-clock time, the actual elapsed time to complete a task (as opposed to a count of computational steps), which is what matters for real-world efficiency. TTT-MLP, which uses an MLP as the hidden state, still faces memory I/O challenges but shows even greater potential on long context, making the approach a promising candidate for scalable long-context modeling. [GitHub - test-time-training/ttt-lm-pytorch: Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States]
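To make the mechanism concrete, here is a minimal PyTorch sketch of a TTT-style layer whose hidden state is a linear model, updated by one gradient step per token on a self-supervised reconstruction loss. It is an illustration in the spirit of TTT-Linear, not the repo's API: the projection names, the single sequential inner step, and the fixed inner learning rate are simplifying assumptions, and the official implementation adds mini-batch TTT, normalization, and learned inner learning rates.

```python
import torch
import torch.nn as nn


class TTTLinearSketch(nn.Module):
    """Illustrative Test-Time Training layer: the hidden state is itself a
    linear model W that is trained by gradient descent on every token."""

    def __init__(self, dim: int, inner_lr: float = 0.1):
        super().__init__()
        # Outer-loop projections learned during normal training,
        # analogous to the Q/K/V projections in attention.
        self.theta_q = nn.Linear(dim, dim, bias=False)
        self.theta_k = nn.Linear(dim, dim, bias=False)
        self.theta_v = nn.Linear(dim, dim, bias=False)
        self.inner_lr = inner_lr  # step size of the inner (test-time) update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        B, T, D = x.shape
        # Hidden state: one linear model W per sequence in the batch.
        W = x.new_zeros(B, D, D)
        outputs = []
        for t in range(T):
            xt = x[:, t, :]
            k = self.theta_k(xt)   # "training view" of the current token
            v = self.theta_v(xt)   # reconstruction target
            q = self.theta_q(xt)   # query view used to produce the output
            # Self-supervised inner loss: || W k - v ||^2
            err = torch.bmm(W, k.unsqueeze(-1)).squeeze(-1) - v
            # One gradient step on W; the gradient of the loss is 2 * err k^T.
            W = W - self.inner_lr * 2 * torch.bmm(err.unsqueeze(-1), k.unsqueeze(1))
            # Apply the freshly updated hidden-state model to the query view.
            outputs.append(torch.bmm(W, q.unsqueeze(-1)).squeeze(-1))
        return torch.stack(outputs, dim=1)  # (batch, seq_len, dim)
```

Calling `TTTLinearSketch(dim=64)(torch.randn(2, 16, 64))` returns a `(2, 16, 64)` tensor, with the inner model having trained on each token exactly once. Because the hidden state is a model rather than a fixed-size vector, its ability to compress context scales with the expressiveness of that inner model, which is the intuition behind TTT's long-context gains.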
Pittsburgh-based robotics startup Skild AI has raised $300 million at a $1.5 billion valuation in a Series A funding round backed by Lightspeed Ventures, SoftBank, Coatue, and Jeff Bezos. Skild AI's models enable robots to perform tasks in unfamiliar environments, such as climbing stairs and recovering objects that slip from their grasp. The robots demonstrated emergent capabilities, showcasing skills they weren't explicitly taught. The AI model was trained on a dataset 1,000 times larger than those used by competitors, using diverse data-collection techniques. According to the company's press release: "Skild’s model serves as a shared, general-purpose brain for a diverse embodiment of robots, scenarios, and tasks, including manipulation, locomotion, and navigation. From resilient quadrupeds mastering adverse physical conditions to vision-based humanoids performing dexterous manipulation of objects for complex household and industrial tasks, the company’s model will enable the use of low-cost robots across a broad range of industries and applications." [This $1.5 Billion AI Company Is Building A ‘General Purpose Brain’ For Robots]
Meta has introduced MobileLLM, a family of models designed for efficient on-device large language models (LLMs). With fewer than a billion parameters, they perform competitively with larger models on specific tasks, showing significant improvements on chat benchmarks and API-calling tasks. The models use deep-and-thin architectures with embedding sharing and grouped-query attention, improving accuracy without increasing model size. These small models are practical for mobile devices, where memory capacity and energy consumption constrain what can run locally, making them well suited to common on-device use cases. [MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases]
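As a rough illustration of two of the techniques named above, the sketch below ties the output head to the input embedding table (embedding sharing) and projects fewer key/value heads than query heads (grouped-query attention) inside a deep, narrow stack. The dimensions, depth, and class names are assumptions chosen for illustration, not Meta's released configuration, and the blocks omit normalization and MLP sublayers for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGQABlock(nn.Module):
    """Attention block with grouped-query attention: fewer key/value heads
    than query heads, so the KV projections (and KV cache) shrink without
    changing the hidden size. Sizes here are hypothetical."""

    def __init__(self, dim: int = 576, n_heads: int = 9, n_kv_heads: int = 3):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.kv_proj = nn.Linear(dim, 2 * n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(x).chunk(2, dim=-1)
        k = k.view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each KV head is shared by n_heads // n_kv_heads query heads.
        k = k.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))


class TinyLM(nn.Module):
    """Deep-and-thin stack with embedding sharing: the output logits reuse
    the input embedding matrix, so the vocab-by-dim table is stored once."""

    def __init__(self, vocab: int = 32000, dim: int = 576, depth: int = 30):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList([TinyGQABlock(dim) for _ in range(depth)])

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.embed(tokens)
        for blk in self.blocks:
            h = h + blk(h)  # residual connection; norms and MLPs omitted
        # Embedding sharing: project back onto the embedding weights.
        return h @ self.embed.weight.T
```

Both choices attack the sub-billion-parameter budget directly: embedding sharing avoids a second vocab-sized matrix for the output head, and grouped-query attention cuts the size of the KV projections and cache, freeing parameters for the many thin layers that the paper finds matter most at this scale.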