
Smaller Models Get Better and China Invests in Chip Independence

China, Mistral, Llama, Andrej Karpathy, Sonic

This week's newsletter has a theme: models are getting smaller while delivering results comparable to much larger ones, which once again points to a practical path toward locally hosted models. We have intentionally left out all the drama and arguing happening on X between Elon Musk and Yann LeCun, as it lacked substance. We also spotlight China's $47.5 billion investment in its semiconductor industry, an attempt to gain independence from US chip manufacturers and regulations.

— Sasha

The team from AmbientGPT has released Llama3-V, a new multimodal AI model built on Llama3. Llama3-V shows a 10-20% improvement on benchmarks over LLaVA, the current open-source state of the art for multimodal understanding, while being 100 times smaller than the current SOTA models. The team has also released a Mac interface that uses on-screen context to improve prompt responses and can run with local or cloud models. [Llama 3-V: Matching GPT4-V with a 100x smaller model and 500 dollars]

According to a recent post, Andrej Karpathy has replicated the GPT-2 (124M) model in approximately 90 minutes of training at a cost of around $20 using his llm.c project. The training used 10 billion tokens from the FineWeb dataset and achieved a HellaSwag accuracy of 29.9, surpassing the original GPT-2 model's score of 29.4. The exercise highlights the replicability of GPT models and the modest cost of training small models. [Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 · karpathy llm.c · Discussion #481]
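As a rough sanity check on those numbers, here is a back-of-envelope sketch of the implied throughput and cost. The ~$14/hour rate for an 8x A100 node is our assumption based on typical cloud pricing, not a figure from the post:

```python
# Back-of-envelope check of the GPT-2 (124M) training run.
# Assumption (not from the summary above): an 8x A100 node
# rents for roughly $14/hour on commodity cloud providers.

tokens = 10e9            # FineWeb tokens used for training
minutes = 90             # reported wall-clock training time
gpu_hourly_rate = 14.0   # assumed $/hour for the node

throughput = tokens / (minutes * 60)     # tokens per second
cost = gpu_hourly_rate * (minutes / 60)  # total dollars

print(f"~{throughput / 1e6:.2f}M tokens/sec, ~${cost:.0f} total")
# -> ~1.85M tokens/sec, ~$21 total, consistent with the ~$20 figure
```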

Mistral AI has introduced Codestral, a 22B open-weight generative AI model designed for code generation, fluent in over 80 programming languages, and equipped with a 32k context window. Codestral, now available under a Non-Production License and accessible via an API, outperforms similarly sized competitors on benchmarks and, according to user testimony, even challenges significantly larger models. [Codestral: Hello, World!]
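For a sense of what API access looks like, here is a minimal sketch of querying Codestral over plain HTTP. The endpoint and the `codestral-latest` model name are our assumptions based on Mistral's published API conventions, so verify both against the announcement before use:

```python
# Minimal sketch: requesting code from Codestral via Mistral's HTTP API.
# Assumptions (verify against Mistral's docs): the chat completions
# endpoint and the "codestral-latest" model name.
import os
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "codestral-latest",
        "messages": [
            {"role": "user",
             "content": "Write a Python function that reverses a linked list."},
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```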

Cartesia has launched Sonic, a new voice model with a latency of 135ms, designed for real-time applications such as customer support and entertainment. According to Cartesia, Sonic has demonstrated superior performance in tests, achieving twice the accuracy in audio generation and delivering initial audio output 1.5 times faster than existing Transformer models, with a fourfold increase in processing speed. [Sonic Demo]

China has launched its largest state-backed investment fund to date, totaling 344 billion yuan ($47.5 billion), to bolster its semiconductor industry, according to official sources. The fund, established on May 24, includes significant contributions from China's finance ministry and major Chinese banks, and the news drove a 3% increase in the CES CN Semiconductor Index. [China sets up third fund with $47.5 bln to boost semiconductor sector]

Chatbot Arena has launched a new "Hard Prompts" category on its leaderboard, responding to community interest in more complex challenges for AI language models. The harder prompts better distinguish models with more robust capabilities and show that performance does not degrade equally across models as prompts become harder. Because overfitting and fine-tuning toward benchmark prompts are common issues, the prompts in this case were carefully curated to give a richer picture of model performance. [Introducing Hard Prompts Category in Chatbot Arena | LMSYS Org]