French AI startup Mistral is shaking up the industry with new AI model customization options, including paid plans that let developers and enterprises fine-tune its generative models for specific use cases.
Introduction of New AI Model Customization Options
Self-Service Fine-Tuning
Mistral has unveiled a software development kit (SDK) called Mistral-Finetune, designed for fine-tuning its models on various setups, including workstations, servers, and small datacenter nodes. According to the readme for the SDK’s GitHub repository, it is optimized for multi-GPU setups but can scale down to a single Nvidia A100 or H100 GPU for fine-tuning smaller models like Mistral 7B. Mistral claims that fine-tuning on a dataset like UltraChat, which contains 1.4 million dialogues with OpenAI’s ChatGPT, can be completed in around half an hour using Mistral-Finetune across eight H100 GPUs.
Managed Fine-Tuning Services
For developers and companies seeking a more managed solution, Mistral offers fine-tuning services accessible through its API. Currently compatible with two of Mistral’s models, Mistral Small and Mistral 7B, these services will soon support more models. This managed approach simplifies the fine-tuning process, allowing organizations to create highly specialized and optimized models for their specific domains.
Custom Training Services
Mistral also introduces custom training services, available to select customers, for fine-tuning any Mistral model on an organization’s own data for its applications.
Mistral’s Ambitious Growth Plans
My colleague Ingrid Lunden recently reported that Mistral is seeking to raise around $600 million at a $6 billion valuation from investors, including DST, General Catalyst, and Lightspeed Venture Partners. This ambitious funding effort highlights Mistral’s intention to grow its revenue as it faces significant competition in the generative AI space.
Since launching its first generative model in September 2023, Mistral has released several more models, including a code-generating model, and rolled out paid APIs. However, the company has not disclosed user numbers or revenue figures.
The Importance of Fine-Tuning
Fine-tuning is essential for improving large language model (LLM) outputs and customizing them to meet specific enterprise needs. When done correctly, fine-tuning can lead to more accurate and useful model responses, allowing organizations to derive greater value and precision from their generative AI applications. However, fine-tuning can be expensive, presenting challenges for some enterprises.
Mistral’s Approach to Fine-Tuning
Mistral, a prominent player in the open-source AI model space, offers new customization capabilities on its AI developer platform, La Plateforme. These tools are designed to lower training costs and decrease barriers to entry.
Tailoring Mistral Models
Mistral’s new tools enable efficient fine-tuning, reducing deployment costs and improving application speed. Customers can tailor Mistral models on La Plateforme, on their own infrastructure using the open-source code Mistral provides on GitHub, or through custom training services.
For those who want to work on their own infrastructure, Mistral has released mistral-finetune, a lightweight codebase based on the LoRA paradigm, which freezes a model’s pretrained weights and trains only a small number of additional low-rank parameters.
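The parameter savings behind the LoRA paradigm can be sketched in a few lines of NumPy. This is an illustration only, with made-up dimensions; it is not mistral-finetune’s actual implementation. Instead of updating a full weight matrix W, LoRA learns a low-rank update B @ A, so only a tiny fraction of the parameters are trainable:

```python
import numpy as np

# Illustrative dimensions: one 4096x4096 weight matrix, LoRA rank 16.
d, k, r = 4096, 4096, 16

W = np.random.randn(d, k)            # pretrained weight, frozen during fine-tuning
A = np.random.randn(r, k) * 0.01     # trainable low-rank factor
B = np.zeros((d, r))                 # trainable; zero-init so B @ A starts at 0

full_params = d * k                  # parameters a full fine-tune would update
lora_params = A.size + B.size        # parameters LoRA actually trains
print(f"full: {full_params:,}  LoRA: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.2%}")

# Forward pass: frozen weight plus the learned low-rank update.
x = np.random.randn(k)
y = W @ x + B @ (A @ x)
```

At rank 16, the trainable parameters here are under 1% of the full matrix, which is the basic mechanism that lets fine-tuning scale down to a single GPU.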
For serverless fine-tuning, Mistral offers new services built on techniques refined through its own R&D. Under the hood, LoRA adapters help prevent the fine-tuned model from forgetting the base model’s knowledge while allowing for efficient serving.
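Why adapters preserve base knowledge yet serve efficiently can also be sketched briefly (again illustrative NumPy with invented dimensions, not Mistral’s serving code): the base weights are never modified during training, and at serving time the low-rank update folds into them, so inference costs a single matrix multiply:

```python
import numpy as np

d, r = 512, 8
W_base = np.random.randn(d, d)      # frozen base weights (knowledge preserved)
A = np.random.randn(r, d) * 0.01    # trained LoRA factors (stand-in values)
B = np.random.randn(d, r) * 0.01

# For serving, merge the adapter into the base weights once:
W_merged = W_base + B @ A

# The merged matrix reproduces the base-plus-adapter computation exactly,
# with no extra per-request cost.
x = np.random.randn(d)
assert np.allclose(W_merged @ x, W_base @ x + B @ (A @ x))
```

Because the adapter is small relative to the base model, many fine-tuned variants can also be stored and swapped cheaply, which is one reason the approach suits a serverless offering.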
New Offerings and Capabilities
Mistral’s fine-tuning services are currently compatible with the company’s 7.3B-parameter model Mistral 7B and with Mistral Small. Existing users can customize these models through Mistral’s API, and more models will be added to the fine-tuning services soon.
Custom training services fine-tune Mistral AI models on a customer’s specific applications using proprietary data, often proposing advanced techniques like continuous pretraining to include proprietary knowledge within model weights.
AI Fine-Tuning Hackathon
To complement the launch, Mistral has kicked off an AI fine-tuning hackathon, running through June 30, allowing developers to experiment with the startup’s new fine-tuning API.
Mistral’s Meteoric Rise
Founded just 14 months ago by former Google DeepMind and Meta employees Arthur Mensch, Guillaume Lample, and Timothée Lacroix, Mistral has experienced unprecedented growth. The company secured a record-setting $118 million seed round and established partnerships with IBM and others. In February, it released Mistral Large through a deal with Microsoft to offer it via Azure cloud.
Recently, SAP and Cisco announced their backing of Mistral, and the company introduced Codestral, its first code-centric LLM.
Mistral Large competes directly with OpenAI’s GPT-4 and Meta’s Llama 3, and Mistral has claimed it is the world’s second most capable commercial language model, behind GPT-4. Mistral 7B, introduced in September 2023, outperforms Llama 2 13B on numerous benchmarks and approaches CodeLlama 7B performance on code.
The Future of Mistral
What will Mistral release next? Given its track record of innovation and rapid growth, more significant developments seem likely soon.