
What If Our Assumptions On Compute Requirements Are Wrong? How A Tiny 8B Model Outperforms GPT-4 (6.21.24)

Neuromorphic Computing, SakanaAI, AIResearch, HippoRAG

This week's edition focuses on AI/nature parallels and one of the hottest fields of research: neuromorphic computing (a method of computer engineering in which elements of a computer are modeled on systems in the human brain and nervous system).

While nature isn't a perfect model, it provides some upper and lower bounds for benchmarking what is theoretically possible. The human brain runs on roughly 25 watts for all of its cognitive functions, while a single GPT-4 query is estimated to draw something like 300 watts. This chasm in consumption hints that nature has found a way to solve for 'compute' on a budget. Algorithmic optimization and neuromorphic computing, using nature as inspiration and a research catalyst, have the potential to address our ever-increasing hunger for compute and for dynamically trained models.

This week, we have some interesting developments on this front, raising an important question: "What if our assumptions about the computing requirements to train and run these models are wrong?" Researchers from the Shanghai AI Lab published a paper that sheds some light on this, showing a tiny 8B-parameter model that outperforms GPT-4 on complex math problems. We also see a RAG methodology called HippoRAG that emulates aspects of the hippocampus in the human brain, and a training methodology from the team at Sakana AI that uses large language models to discover and optimize new training algorithms, akin to natural evolution.

A final thought: Is this AI wave just hype? Have we “used up all the data for training”? Will we run out of compute? There is a lot of commentary along these lines claiming that AI is “losing steam.” However, people often conflate two different things: commercial applications and frontier research. If you measure by today's commercial applications (which often still leverage legacy technology), the notion might appear to be true. From a "frontier" research perspective (which I would posit is the rate-limiting step here), the developments we cover this week, together with commentary from the founders I have spoken to, paint a very different picture.

To paraphrase some sentiments from experienced founders: "The frontier is moving so quickly that you have to choose what to build very carefully so you are not made obsolete by the rapid movements at the frontier." Sadly, the investment and hype at the wrapper/surface layer are overshadowing the large strides that continue to be made in the research space. Tying this back to this week’s title: it is this research that explains how an 8B-parameter model outperformed GPT-4.

A small 8B Llama-3 model combined with Monte Carlo Tree Search (MCTS) reportedly outperforms GPT-4 on complex mathematical reasoning tasks. The approach pairs the LLM with MCTS to systematically explore and refine candidate solutions using heuristics (practical rules of thumb and educated guesses that reach satisfactory answers quickly, even if they are not optimal). The algorithm builds a search tree by selecting, refining, and evaluating candidate answers, and guides those decisions with an enhanced Upper Confidence Bound (UCB) formula. Testing shows that the method significantly improves success rates on challenging math problems, advancing LLMs toward more accurate and reliable performance on complex tasks. [twitter.com]
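
To make the mechanics concrete, below is a minimal Python sketch of an MCTS-style answer-refinement loop with UCB selection. The node structure, exploration constant, and the refine/evaluate placeholders are illustrative assumptions, not the paper's implementation (which uses the LLM itself to rewrite and score candidate answers).

```python
import math
import random

# Illustrative sketch of MCTS-style answer refinement with UCB selection.
# Node fields, the exploration constant, and the refine/evaluate stubs are
# assumptions for illustration, not the published implementation.

class Node:
    def __init__(self, answer, parent=None):
        self.answer = answer          # candidate solution text
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0       # sum of evaluation scores

    def ucb(self, c=1.4):
        # Upper Confidence Bound: balance exploitation (mean reward)
        # against exploration (rarely visited nodes).
        if self.visits == 0:
            return float("inf")
        mean = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return mean + explore

def evaluate(answer):
    # Placeholder: in practice an LLM (or a checker) scores the answer.
    return random.random()

def refine(answer):
    # Placeholder: in practice an LLM critiques and rewrites the answer.
    return answer + " [refined]"

def mcts(root_answer, iterations=50):
    root = Node(root_answer)
    for _ in range(iterations):
        # 1. Selection: descend the tree via the highest-UCB child.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # 2. Expansion: refine the selected answer into a new child.
        child = Node(refine(node.answer), parent=node)
        node.children.append(child)
        # 3. Evaluation: score the refined answer.
        reward = evaluate(child.answer)
        # 4. Backpropagation: update statistics up to the root.
        while child is not None:
            child.visits += 1
            child.total_reward += reward
            child = child.parent
    # Return the best-scoring refinement found.
    best = max(root.children, key=lambda n: n.total_reward / max(n.visits, 1))
    return best.answer

print(mcts("x = 7 because ..."))
```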

By leveraging LLMs to automatically create and test new optimization algorithms, Sakana AI developed Discovered Preference Optimization (DiscoPOP), a novel method that blends logistic and exponential losses. The approach showed state-of-the-art performance and marks a step towards using AI to advance AI, reducing the need for human intervention and compute. The research highlights the potential of LLM-driven discovery to continuously improve AI models, opening new avenues for innovation and efficiency in AI development. [Sakana AI]
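
For intuition, here is a minimal PyTorch sketch of what a blended logistic/exponential preference loss can look like. The sigmoid gate, temperature, and exact mixing below are illustrative assumptions, not the published DiscoPOP objective; see the Sakana AI paper for the discovered form.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of a preference-optimization loss that blends a
# DPO-style logistic term with an exponential term. The sigmoid gate and
# temperature are assumptions for illustration only.

def blended_preference_loss(policy_chosen_logps, policy_rejected_logps,
                            ref_chosen_logps, ref_rejected_logps,
                            beta=0.1, tau=0.05):
    # Log-ratio difference between chosen and rejected completions,
    # measured against a frozen reference model (as in DPO).
    rho = (policy_chosen_logps - ref_chosen_logps) - \
          (policy_rejected_logps - ref_rejected_logps)

    logistic_loss = -F.logsigmoid(beta * rho)      # DPO-style term
    exponential_loss = torch.exp(-beta * rho)      # exponential term

    # A sigmoid gate on the log-ratio decides how to mix the two terms.
    gate = torch.sigmoid(rho / tau)
    return ((1.0 - gate) * logistic_loss + gate * exponential_loss).mean()

# Example usage with dummy per-example log-probabilities.
if __name__ == "__main__":
    n = 4
    loss = blended_preference_loss(
        torch.randn(n), torch.randn(n), torch.randn(n), torch.randn(n)
    )
    print(loss.item())
```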

New research from Ohio State University introduces HippoRAG, a retrieval framework inspired by human memory, which the authors say outperforms existing methods by up to 20% while being 10-30 times cheaper and 6-13 times faster. HippoRAG mimics the human brain's memory processes by integrating large language models (LLMs) with knowledge graphs and the Personalized PageRank algorithm, much as the hippocampus and neocortex work together to store and integrate knowledge efficiently, and the study demonstrates improvements in both multi-hop question answering and single-step retrieval. [HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models]
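
For intuition, here is a minimal sketch of the Personalized PageRank retrieval idea: seed the graph walk at entities mentioned in the query and rank passages by the scores of the entities they contain. The toy graph, query entities, and passage scoring are illustrative assumptions, not HippoRAG's actual pipeline (which builds its graph with an LLM over the corpus).

```python
import networkx as nx

# Illustrative sketch of Personalized PageRank over a small knowledge graph.
# The toy triples, query entities, and passage scoring are assumptions for
# illustration only.

# Toy knowledge graph: nodes are entities, edges come from extracted triples.
G = nx.Graph()
G.add_edges_from([
    ("Stanford", "Alice"),       # e.g. "Alice is a professor at Stanford"
    ("Alice", "Alzheimer's"),    # "Alice researches Alzheimer's"
    ("Bob", "UCSD"),
    ("Bob", "Genetics"),
])

# Entities mentioned in the question seed the walk (the "personalization").
query_entities = {"Stanford": 1.0, "Alzheimer's": 1.0}
scores = nx.pagerank(G, alpha=0.85, personalization=query_entities)

# Rank passages by the PPR scores of the entities they mention.
passages = {
    "p1": ["Alice", "Stanford"],
    "p2": ["Bob", "Genetics"],
}
ranked = sorted(passages,
                key=lambda p: sum(scores.get(e, 0.0) for e in passages[p]),
                reverse=True)
print(ranked)  # p1 should rank above p2 for this query
```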

Ryan Greenblatt says he achieved 50% accuracy on the ARC-AGI public test set using GPT-4o, surpassing the previous state-of-the-art of 34%. He claims his solution reached 72% accuracy on a subset of the train set, compared to human performance of 85%, by using specialized few-shot prompts and better grid representations. [Getting 50% (SoTA) on ARC-AGI with GPT-4o]
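
The write-up describes having GPT-4o generate many candidate Python programs per task and keeping the ones that reproduce the training examples. Below is a toy sketch of that sample-and-filter loop; the grid representation and the hard-coded candidate programs stand in for real model samples and are purely illustrative.

```python
# Illustrative sketch of a "sample programs, then filter by execution" loop
# for ARC-style grid puzzles. The candidates are hand-written placeholders;
# in the actual approach each candidate is generated by GPT-4o from
# specialized few-shot prompts.

def grid_to_text(grid):
    # One simple textual grid representation: rows of digits.
    return "\n".join("".join(str(c) for c in row) for row in grid)

def sample_candidate_programs(train_pairs):
    # Placeholder for LLM sampling: return candidate Python sources.
    return [
        "def transform(grid): return [row[::-1] for row in grid]",  # mirror rows
        "def transform(grid): return grid[::-1]",                   # flip rows
    ]

def passes_train(program_src, train_pairs):
    namespace = {}
    try:
        exec(program_src, namespace)
        return all(namespace["transform"](inp) == out for inp, out in train_pairs)
    except Exception:
        return False

def solve(train_pairs, test_input):
    # Keep only programs that reproduce every training example, then apply
    # the first survivor to the test input (majority voting also works).
    for src in sample_candidate_programs(train_pairs):
        if passes_train(src, train_pairs):
            namespace = {}
            exec(src, namespace)
            return namespace["transform"](test_input)
    return None

# Toy task: the hidden transformation mirrors each row.
train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print(solve(train, [[5, 6], [7, 8]]))  # -> [[6, 5], [8, 7]]
```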

DeepSeek-Coder-V2, an open-source model, has reportedly outperformed GPT-4-Turbo in coding and math, supporting 338 programming languages and extending its context length to 128K tokens. According to a paper posted on GitHub, it achieved 90.2% on HumanEval and 75.7% on MATH, surpassing GPT-4-Turbo-0409. [DeepSeek-Coder-V2/paper.pdf at main · deepseek-ai/DeepSeek-Coder-V2]