Untether AI, the leader in energy-centric AI inference acceleration, announced early access (EA) availability of its imAIgine Software Development Kit (SDK), which supports the speedAI inference acceleration solutions.
The imAIgine SDK provides a push-button flow that streamlines the conversion of trained neural network models into optimized, inference-ready models that run on speedAI acceleration solutions. This latest EA release supports the speedAI family of devices and PCIe accelerator cards, which set a new industry benchmark for energy efficiency and deliver 2000 TFLOPS of AI inference performance per device.
“Providing the early access version of the imAIgine SDK enables users to prepare their neural networks for the upcoming shipment of speedAI devices and cards,” said Philip Lewer, Sr. Director of Product at Untether AI. “With an extensive array of model garden and kernel support, automated compilation, and sophisticated analysis tools, this EA release gives users everything they need to easily deploy their models on the revolutionary speedAI family of inference acceleration solutions.”
Push-button flow for simple model deployment
The imAIgine SDK provides an automated path to running neural networks on Untether AI’s inference acceleration solutions, with push-button quantization, optimization, physical allocation, and multi-chip partitioning. For models built in either TensorFlow or PyTorch, a few simple Python commands quantize, lower, physically allocate, and run the models on speedAI hardware in a matter of minutes. With a comprehensive model garden library and kernel support, users can quickly run classification, object detection, semantic segmentation, or natural language processing (NLP) models on speedAI hardware. Sophisticated, automated quantization techniques convert the neural network to the preferred datatype. Post-quantization training (PQT) and knowledge distillation algorithms are available to maintain accuracy after quantization. During compilation, the imAIgine SDK performs layer-fusion optimizations, graph lowering, kernel mapping, and physical allocation to provide an optimal implementation result.
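To illustrate what converting a network to a lower-precision datatype involves, the sketch below shows a generic symmetric 8-bit quantization of a small list of weights. It is not the imAIgine SDK API and is not taken from Untether AI's tooling; the function names `quantize_int8` and `dequantize`, the choice of an 8-bit integer target, and the sample weights are assumptions made purely for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers.

    Hypothetical helper for illustration only; not part of any SDK.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; use a scale of 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [x * scale for x in quantized]


# Illustrative weights, not taken from any real model.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case difference between original and restored weights.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The rounding step is where accuracy can be lost: each restored weight can differ from the original by up to half a quantization step. Techniques such as the post-quantization training and knowledge distillation mentioned above are intended to compensate for that kind of loss.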
Power-user flow for low-level optimizations
With the power-user flow, users can directly develop optimized “bare metal” kernels for the more than 1,400 RISC-V processors and more than 350,000 at-memory compute processing elements in speedAI devices. Analogous to CUDA, but written in familiar C/C++, these kernels are compiled directly using a modified version of LLVM, enhanced to take advantage of the more than 30 custom instructions Untether AI has added to the instruction set for its ultra-efficient at-memory compute architecture. Users can then manually place the kernels in any topology on the memory banks of the speedAI spatial architecture.
Extensive suite of analysis tools including virtual hardware
The imAIgine SDK includes several tools for analyzing how networks run on speedAI devices, providing a virtual hardware view before actual devices are received. The Model Explorer shows the entire floorplan of how the neural network is mapped to the silicon, enabling interactive inspection of connection topology, socket depth, and performance estimates. The Analysis Dashboard complements this view with information on processor activity, packet exchanges, and utilization. Together, these tools provide a virtual hardware environment that guides users toward optimal efficiency and performance.
Untether AI invites prospective customers and partners to explore the transformative potential of speedAI and the imAIgine SDK.
SOURCE: BusinessWire