Grass and Inference Launch Video Annotation Model Outperforming Claude 4

Grass and Inference.net announced the launch of ClipTagger-12b, a new video annotation model built to identify actions, objects, and logos in video with high accuracy and detail. Applicable across domains from autonomous vehicles to warehouse robotics, it strengthens the perception capabilities that many AI systems rely on.

In benchmark tests, ClipTagger-12b outperforms Claude 4 and GPT-4.1 on annotation metrics such as ROUGE and BLEU, while costing up to 17x less to run.
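The release does not detail how these benchmarks were configured. As background, ROUGE and BLEU both score word overlap between a model-generated annotation and a human-written reference. A minimal sketch of ROUGE-1 F1 (unigram overlap), using hypothetical example captions:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a candidate annotation and a reference annotation."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared word counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: score a generated caption against a reference.
candidate = "a forklift moves boxes in a warehouse"
reference = "a forklift moving boxes inside a warehouse"
score = rouge1_f1(candidate, reference)  # ~0.714
```

Production evaluations typically use library implementations (e.g. with stemming and multiple ROUGE variants); this sketch only illustrates what the metric measures.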

Developed through a collaboration between Grass and Inference.net, ClipTagger-12b was trained by Inference.net on a subset of a corpus of more than 1 billion videos collected from the public web by Grass, and is hosted on Inference.net's distributed compute network.

“It’s entirely possible to train low-cost, state-of-the-art models with the right data and good engineering,” said Sam Hogan, CEO at Inference.net.

“We believe the future of AI depends on keeping the web open and building the infrastructure needed to turn it into something models can learn from. This was a step in that direction,” said Andrej Radonjic, CEO at Wynd Labs.

The collaboration shows how specialized teams can build and deploy high-performance models once limited to large AI labs, making advanced video annotation accessible to more developers and businesses.

SOURCE: BusinessWire