TensorOpera and Aethir Team Up to Advance Massive-Scale LLM Training on Decentralized Cloud

TensorOpera, the company providing “Your Generative AI Platform at Scale,” has partnered with Aethir, a distributed cloud infrastructure provider, to accelerate its newest foundation model, TensorOpera Fox-1, highlighting the first mass-scale LLM training use case on a decentralized physical infrastructure network.

Introduced last week, TensorOpera Fox-1 is a cutting-edge open-source small language model (SLM) with 1.6 billion parameters that outperforms other models in its class from tech giants like Apple, Google, and Alibaba. This decoder-only transformer was trained from scratch on three trillion tokens using a novel 3-stage curriculum. Its architecture is 78% deeper than that of comparable models such as Google’s Gemma 2B, and it surpasses competitors on standard LLM benchmarks such as GSM8k and MMLU despite having significantly fewer parameters.

The partnership with Aethir equips TensorOpera with the advanced GPU resources necessary for training Fox-1. Aethir’s collaboration with NVIDIA Cloud Partners, Infrastructure Funds, and various enterprise-grade hardware providers has established a global, large-scale GPU cloud. This network delivers cost-effective, scalable GPU resources with the high throughput, substantial memory capacity, and efficient parallel processing capabilities that such workloads require. With the support of Aethir’s decentralized cloud infrastructure, TensorOpera gains the tools it needs for streamlined AI development that demands high network bandwidth and ample GPU power.


Through this collaboration, TensorOpera is also integrating a pool of Aethir GPU resources that can be used seamlessly through TensorOpera’s AI platform for a variety of jobs, such as model deployment and serving, fine-tuning, and full training. Aethir’s distributed GPU cloud network lets AI platforms adjust their GPU usage dynamically as demand changes. Together, Aethir and TensorOpera aim to empower the next generation of large language model (LLM) training and give AI developers the assets they need to create powerful models and applications.

“I am thrilled about our partnership with Aethir,” said Salman Avestimehr, Co-Founder and CEO of TensorOpera. “In the dynamic landscape of generative AI, the ability to efficiently scale up and down during various stages of model development and in-production deployment is essential. Aethir’s decentralized infrastructure offers this flexibility, combining cost-effectiveness with high-quality performance. Having experienced these benefits firsthand during the training of our Fox-1 model, we decided to deepen our collaboration by integrating Aethir’s GPU resources into TensorOpera’s AI platform to empower developers with the resources necessary for pioneering the next generation of AI technologies.”

Aethir’s operational model is based on a globally distributed network of high-end GPUs that can serve enterprise clients in the AI and machine learning industry regardless of their physical location. To provide lag-free, highly scalable GPU power worldwide, Aethir decentralizes its GPU resources across many locations in smaller clusters. Instead of pooling resources in a few massive data centers, as traditional centralized cloud service providers do, Aethir distributes its infrastructure to cover the network’s edge and cut the physical distance between GPU resources and end users.

SOURCE: BusinessWire
