FlashLabs Releases Chroma 1.0, an Open-Source, Real-Time Voice AI Model

FlashLabs

FlashLabs, an AI research lab, has launched Chroma 1.0, which it describes as the first open-source, real-time speech-to-speech AI model with personalized voice cloning. FlashLabs positions Chroma as a shift in how voice-based AI is used: the model is meant to help developers and businesses build conversational systems that respond as quickly as people do.

Chroma directly addresses one of the most persistent challenges in human-AI interaction: latency. Traditional voice assistants rely on a cascaded, multi-step process. They first transcribe speech into text, then process that text with a language model, and finally convert the response back into speech. Each stage adds its own delay, and the accumulated lag interrupts natural conversation. Chroma instead operates natively on speech, removing the intermediate text stages and the delays they introduce, so interactions feel fast and natural.
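The latency argument above can be illustrated with a deliberately simplified model: if each stage of a cascade must finish before the next one starts, the user hears nothing until all of the stage delays have added up. The stage names, the `Stage` class, and every millisecond value below are hypothetical placeholders chosen for illustration, not measurements of Chroma or of any real assistant.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One processing stage and the delay (ms) before it hands output to the next stage."""
    name: str
    delay_ms: float


def time_to_first_audio_ms(stages: list[Stage]) -> float:
    # Simplified model: each stage waits for the previous one to finish,
    # so no audio plays until every stage's delay has elapsed.
    return sum(stage.delay_ms for stage in stages)


# Hypothetical delays for illustration only; these are not measured results.
cascade = [
    Stage("speech-to-text", 300.0),
    Stage("language model", 350.0),
    Stage("text-to-speech", 200.0),
]
native = [Stage("speech-to-speech model", 150.0)]  # hypothetical single stage

print(time_to_first_audio_ms(cascade))  # 850.0
print(time_to_first_audio_ms(native))   # 150.0
```

Real cascaded systems can overlap stages by streaming partial results, which narrows the gap; the sketch only shows why removing stages also removes the delays those stages contribute.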

“Voice is the most universal interface in the world, yet it has remained closed, fragmented, and delayed,” said Yi Shi, Founder and Chief Research & Engineering at FlashLabs. “With Chroma, we’re open-sourcing real-time voice intelligence so builders, researchers, and companies can create AI systems that truly work at human speed.”


Built for Real-Time, Not Post-Processing

Unlike traditional cascaded voice systems, Chroma is built as a native speech-to-speech system designed for live interaction. Key capabilities include:

End-to-end time-to-first-token (TTFT) under 150 milliseconds

Natural conversational turn-taking

Low-latency emotional and prosodic voice control

Stable real-time inference without cascading system delays

With Day-0 support for SGLang, Chroma's end-to-end TTFT drops further, to approximately 135 milliseconds, with real-time factors tuned for production-grade live deployments.
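TTFT and real-time factor (RTF) are the two metrics this section relies on. A common way to measure them on a streaming model is sketched below: TTFT is the wall-clock time from issuing a request until the first output chunk arrives, and RTF is total processing time divided by the duration of audio produced, so an RTF below 1.0 means audio is generated faster than it plays back. The `fake_stream` generator and the `measure` helper are hypothetical stand-ins; they are not Chroma's or SGLang's API.

```python
import time
from typing import Iterator, Optional, Tuple


def fake_stream(num_chunks: int, chunk_seconds: float, delay: float) -> Iterator[float]:
    """Hypothetical streaming model: yields the duration (s) of each audio chunk."""
    for _ in range(num_chunks):
        time.sleep(delay)  # simulated compute time per chunk
        yield chunk_seconds


def measure(stream: Iterator[float]) -> Tuple[float, float]:
    """Return (ttft_ms, rtf) for a stream of audio-chunk durations in seconds."""
    start = time.perf_counter()
    ttft_ms: Optional[float] = None
    audio_seconds = 0.0
    for chunk_seconds in stream:
        if ttft_ms is None:
            # Time until the first chunk of audio is available.
            ttft_ms = (time.perf_counter() - start) * 1000.0
        audio_seconds += chunk_seconds
    elapsed = time.perf_counter() - start
    if ttft_ms is None or audio_seconds == 0.0:
        raise ValueError("stream produced no audio")
    # RTF < 1.0: audio is produced faster than real time.
    return ttft_ms, elapsed / audio_seconds


ttft_ms, rtf = measure(fake_stream(num_chunks=5, chunk_seconds=0.2, delay=0.02))
print(f"TTFT: {ttft_ms:.1f} ms, RTF: {rtf:.3f}")
```

Because generator bodies run lazily, the first simulated compute delay falls inside the timed region, which is what TTFT should capture.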

High-Fidelity Voice Cloning in Seconds

Chroma introduces fast, high-quality voice cloning that needs only a few seconds of reference audio, so users can create realistic, personalized voices without large datasets or lengthy fine-tuning.

In internal evaluations, Chroma showed:

Speaker similarity score (SIM): 0.817

A +10.96% improvement over the human baseline (0.73)

Top performance on both open and proprietary benchmarks
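The announcement does not say how SIM is computed. In speech-synthesis evaluations, speaker similarity is commonly reported as the cosine similarity between speaker embeddings that a speaker-verification model extracts from the reference clip and from the generated clip, averaged over a test set. The sketch below assumes that convention; `cosine_similarity` and `mean_sim` are hypothetical helpers, and the plain lists stand in for a real encoder's output, with no specific encoder implied.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_sim(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average SIM over (reference, generated) embedding pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    scores = [cosine_similarity(ref, gen) for ref, gen in pairs]
    return sum(scores) / len(scores)


# Toy 2-D embeddings; real speaker embeddings have hundreds of dimensions.
print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))  # 0.96
```

A score closer to 1.0 means the generated voice's embedding points in nearly the same direction as the reference speaker's.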

According to FlashLabs, these results mark a significant advance for real-time voice AI, making expressive, personalized speech generation more accessible for research, product development, and large-scale commercial use.

With Chroma 1.0, FlashLabs aims to set a new standard for open, low-latency voice intelligence, enabling faster, more natural, and more human-like AI interactions across a wide range of real-time applications.