Cartesia, whose pioneering state-space models (SSMs) are shaping the next wave of innovation in generative AI, announced $22 million in new funding led by Index Ventures, bringing their total capital raised to $27 million. The new funds will allow Cartesia to expand and accelerate their mission of building real-time, multimodal intelligence available on any device. Cartesia’s SSM technology enables developers to build highly efficient AI applications for a wide range of verticals like customer service, sales & marketing, robotics, healthcare, transportation, education, gaming, defense, security, and more.
Developed by world-class researchers from Stanford’s PhD AI lab, Cartesia’s SSMs offer clear advantages over transformers: they scale linearly with sequence length and enable cheap, high-throughput inference. While transformers have revolutionized AI and support many of the applications we see and use today, these models are limited because their compute cost scales quadratically with context length, leading to slower inference. By contrast, Cartesia’s models are highly efficient, with better long-term memory, lower latency, and the ability to run locally on any device. While transformers attend to every past token, SSMs update the model’s state and discard previous tokens as they stream in, making them the ideal architecture for real-time inference. The widely cited Mamba architecture from Cartesia’s founding team demonstrates that SSMs can already match transformer performance with fewer resources, making them a more efficient and cost-effective alternative for developers building real-time AI applications.
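For technically minded readers, the difference between updating a fixed-size state and attending to every past token can be illustrated with a toy example. The Python sketch below is not Cartesia’s or Mamba’s implementation; the class `ToyDiagonalSSM`, the helper functions, and the parameters `a`, `b`, and `c` are hypothetical names chosen for illustration only, and the model is a plain linear recurrence with no learned or input-dependent components.

```python
from typing import Iterable, Iterator, List


class ToyDiagonalSSM:
    """Toy linear state-space recurrence with a fixed-size state (illustrative only)."""

    def __init__(self, a: List[float], b: List[float], c: List[float]) -> None:
        if not (len(a) == len(b) == len(c)):
            raise ValueError("a, b and c must have the same length")
        self.a, self.b, self.c = a, b, c
        # Memory use is set by the state size, not by how many inputs have been seen.
        self.state = [0.0] * len(a)

    def step(self, x: float) -> float:
        # h_t = a * h_{t-1} + b * x_t (elementwise), then y_t = sum(c * h_t)
        self.state = [ai * hi + bi * x
                      for ai, hi, bi in zip(self.a, self.state, self.b)]
        return sum(ci * hi for ci, hi in zip(self.c, self.state))


def stream(model: ToyDiagonalSSM, xs: Iterable[float]) -> Iterator[float]:
    # Each input is consumed once and then dropped; only the state carries history.
    for x in xs:
        yield model.step(x)


def causal_attention_pairs(n: int) -> int:
    # Number of (query, key) pairs that causal self-attention scores for n tokens.
    return n * (n + 1) // 2


if __name__ == "__main__":
    model = ToyDiagonalSSM(a=[0.5], b=[1.0], c=[1.0])
    print(list(stream(model, [1.0, 1.0, 1.0])))
    print(causal_attention_pairs(1000))
```

In this sketch each new input costs the same amount of work regardless of how long the stream has been running, so total work grows linearly with sequence length, while the number of attention pairs grows roughly with the square of it.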
“It’s well-known that today’s foundation models fall far short of the standard set by human intelligence,” says Karan Goel, Cartesia’s co-founder and CEO. “Not only do these models lack the depth of understanding that humans possess, they’re slow and computationally expensive in a way that restricts their development and use to only the largest companies. At Cartesia, we believe the next generation of AI requires a phase shift in how we think about model architectures and machine learning. That includes SSMs that bring intelligence directly to the device, where it can operate efficiently, in real-time, without reliance on data centers.”
In May 2024, Cartesia released Sonic, their low-latency voice model that generates expressive, lifelike speech, showcasing the power of their SSM architecture for real-time AI use cases. In addition to being the fastest text-to-speech model with < 90 ms latency to first audio, Sonic outperforms the best existing models on the market on voice quality, stability, and accuracy when compared head-to-head in blind human preference tests run by third-party evaluators like Labelbox. Due to the underlying SSM architecture, Sonic has been able to bring never-before-seen features to the market, such as an on-device product that can run locally with no internet connection, and advanced controllability features like emotion, speed, and prompting. Built in just a few months, the Sonic API already supports a variety of real-time use cases (customer service, debt collection, interview screening, voiceovers, and interactive character voices), with hundreds of customers ranging from new startups to public companies.
Sonic is particularly well-suited for a new wave of startups building real-time voice agents. The interactive voice response (IVR) market alone is worth $6 billion and is expected to grow fourfold in the near term due to improvements pioneered by emerging AI models like Sonic. This is just one sliver of Sonic’s current customer base.
Cartesia plans to build on the success of Sonic with a long-term roadmap that includes developing multimodal AI models capable of ingesting and processing different inputs such as text, audio, video, images, and time-series data, with the goal of creating real-time intelligence that can reason over massive contexts across a wide range of applications. By building the next wave of foundation models with long-term memory and low latency, Cartesia aims to transform industries ranging from healthcare to robotics to gaming, paving the way for ubiquitous, interactive, and real-time AI available to anyone, on any device.
“Transformers have provided a step-change in model performance and fueled much of the recent AI mania, but given their limitations there is opportunity for a fundamentally new and different architecture to unlock the next wave of AI innovation,” says Mike Volpi, Partner at Index Ventures. “We believe Cartesia’s SSMs can be that new architecture, allowing developers to build real-time applications that benefit users on any device. We’re excited to support this team of incredible researchers and engineers who are not only redefining AI performance but also making it more accessible and scalable for businesses of all sizes.”
Cartesia is led by a group of Stanford researchers that includes Goel, his former labmates Albert Gu (named one of Time’s 100 most influential people in AI), Arjun Desai, and Brandon Yang, along with their former professor Chris Ré. Recognized globally for their development of SSMs, the team is situated at the epicenter of a rich ecosystem of talented PhDs and academic partners, with Ré’s Stanford lab in particular serving as a hotbed of research and the source of multiple billion-dollar startups in recent years, such as SambaNova, Snorkel AI, and Together AI. They’re joined by a diverse and well-rounded product team that brings experience from companies like DoorDash, Salesforce, Meta, Scale AI, Microsoft, Google Brain, and Zoom, ensuring that Cartesia is equipped to deliver real-world value to businesses across a range of industries.
SOURCE: PRWeb