MangoBoost Sets New Standards in AI Training Storage with Record-Breaking Performance in MLPerf Storage v2.0
MangoBoost, a provider of system solutions for maximizing compute efficiency and scalability, has posted record results in the latest MLPerf Storage v2.0 benchmark with its Mango StorageBoost™ solution. The submission highlights the company’s work on DPU-accelerated NVMe/TCP storage systems and demonstrates high performance, efficiency, and scalability for AI training workloads on distributed AI storage architectures.
Record-Breaking Performance Across Key Metrics
In the Fabric-attached Block Storage category of MLPerf Storage v2.0, MangoBoost’s Mango StorageBoost™ solution, which includes the Mango StorageBoost™ NVMe/TCP Initiator (NTI) and Target (NTT), demonstrated line-rate throughput over a 400G Ethernet fabric. This performance rivals that of local SSDs, providing near-native speeds for demanding AI workloads such as 3D-UNet on both NVIDIA A100 and H100 GPUs.
Key highlights from MangoBoost’s submission include:
- 6.2x GPU Scalability: Compared to alternative solutions, MangoBoost achieved 6.2x better GPU scalability for 3D-UNet workloads on NVIDIA A100 GPUs, and between 1.25x and 7.5x better GPU scalability on H100 GPUs.
- Higher Throughput per Bandwidth: Compared to competing solutions, MangoBoost delivered 1.57x higher throughput per 400G of network bandwidth on A100 GPUs and up to 2.05x higher on H100 GPUs.
- Near-Local SSD Performance: The system closely matched the performance of local Solidigm D7-PS1030 drives, showing that fabric-attached storage can approach local-SSD performance in distributed environments.
- Superior Cost Efficiency: MangoBoost’s solution outperformed NVIDIA’s BlueField-3 DPU in terms of throughput while offering significant reductions in total cost of ownership (TCO), making it a more cost-effective choice for large-scale deployments.
These results underscore MangoBoost’s ability to deliver best-in-class performance without compromising on efficiency or scalability, addressing the growing demands of AI training workloads.
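As a rough illustration of how the normalized comparisons in the highlights above can be computed, the following Python sketch divides aggregate throughput by the number of 400G-equivalent links and then takes ratios between two systems. The `Submission` type, its field names, and all numeric values are hypothetical placeholders introduced for this example; they are not figures from the MLPerf submissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical record of one benchmark result (placeholder fields)."""
    name: str
    throughput_gib_s: float  # aggregate storage throughput, GiB/s
    fabric_gbit_s: float     # total network bandwidth provisioned, Gbit/s
    gpus_supported: int      # number of GPUs the storage system kept fed


def throughput_per_400g(sub: Submission) -> float:
    """Throughput normalized to one 400G-equivalent link."""
    links = sub.fabric_gbit_s / 400.0
    return sub.throughput_gib_s / links


def ratio(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


# Placeholder values chosen only to make the arithmetic easy to follow.
system_a = Submission("system_a", throughput_gib_s=40.0, fabric_gbit_s=400.0, gpus_supported=12)
system_b = Submission("system_b", throughput_gib_s=40.0, fabric_gbit_s=800.0, gpus_supported=4)

per_link_gain = ratio(throughput_per_400g(system_a), throughput_per_400g(system_b))
gpu_scaling_gain = ratio(system_a.gpus_supported, system_b.gpus_supported)

print(f"throughput per 400G: {per_link_gain:.2f}x")  # 2.00x with these placeholders
print(f"GPU scalability:     {gpu_scaling_gain:.2f}x")  # 3.00x with these placeholders
```

Normalizing by link count makes systems with different amounts of provisioned network bandwidth comparable: in the placeholder data both systems move the same number of bytes, but the second needs twice the fabric to do it.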
Unlocking New Possibilities in AI Storage Architecture
The architecture behind MangoBoost’s submission demonstrates its innovative approach to AI storage. The solution deployed the NTI on the host and the NTT on the storage server, connected via a 400G Ethernet switch. This configuration enabled the system to handle demanding AI workloads across multiple GPUs with near-zero CPU overhead and maximum bandwidth utilization.
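To make "maximum bandwidth utilization" concrete, here is a minimal back-of-the-envelope sketch in Python that converts a link's line rate into bytes per second and estimates how many GPUs that link could keep supplied. The per-GPU demand and the `efficiency` factor (the share of line rate available as payload) are assumptions for illustration, not values reported in the submission.

```python
import math


def link_gbytes_per_s(link_gbit_s: float) -> float:
    """Convert a link's line rate from Gbit/s to GB/s (8 bits per byte)."""
    return link_gbit_s / 8.0


def gpus_fed(link_gbit_s: float, per_gpu_gbytes_s: float, efficiency: float = 1.0) -> int:
    """Estimate how many GPUs one link can supply.

    per_gpu_gbytes_s: assumed read bandwidth each GPU needs (hypothetical).
    efficiency: assumed fraction of line rate delivered as payload, in (0, 1].
    """
    if per_gpu_gbytes_s <= 0:
        raise ValueError("per_gpu_gbytes_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    usable = link_gbytes_per_s(link_gbit_s) * efficiency
    return math.floor(usable / per_gpu_gbytes_s)


print(link_gbytes_per_s(400.0))     # 50.0 GB/s for a 400G link
print(gpus_fed(400.0, 2.5))         # 20 GPUs at full line rate (placeholder demand)
print(gpus_fed(400.0, 2.5, 0.5))    # 10 GPUs if only half the line rate is usable
```

The closer a storage path runs to line rate, the closer `efficiency` is to 1.0 and the more GPUs a single link can serve, which is why line-rate throughput and GPU scalability are reported together.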
One of the standout results is that MangoBoost’s solution outpaced NVIDIA’s BlueField-3 DPU systems under equivalent test conditions, whether those systems ran NVMe/TCP or NVMe/RDMA. MangoBoost achieved higher throughput while also delivering substantial reductions in cost of ownership as systems scale. This combination of performance and cost efficiency makes the solution an attractive option for organizations looking to optimize their AI infrastructure.
The Technology Behind the Results
At the core of MangoBoost’s success lies its tightly integrated suite of technologies, each designed to maximize performance and efficiency:
- NVMe/TCP Initiator (NTI): The NTI offloads the entire NVMe/TCP stack to hardware, delivering full-duplex line-rate performance with near-zero host CPU overhead. This removes the bottlenecks associated with software-based protocol processing.
- NVMe/TCP Target (NTT): The NTT fully accelerates TCP/IP and NVMe-oF processing, enabling seamless storage disaggregation over standard Ethernet networks. By eliminating CPU involvement, the NTT ensures that storage servers can operate at peak performance without being constrained by computational overhead.
- GPU Storage Boost (GSB): The GSB enables direct DMA transfers between GPU memory and local or remote storage, bypassing the CPU entirely. This feature significantly improves I/O efficiency, reducing latency and enhancing overall system performance for AI workloads.
Together, these components form a cohesive solution that addresses the unique challenges of AI training storage, including high bandwidth requirements, low latency, and efficient resource utilization.
Why MangoBoost’s Achievement Matters
As AI workloads become increasingly complex and data-intensive, the need for high-performance storage solutions has never been greater. Traditional storage architectures often struggle to keep pace with the demands of modern AI applications, leading to bottlenecks and reduced efficiency. MangoBoost’s achievement in MLPerf Storage v2.0 demonstrates how DPU-accelerated NVMe/TCP storage systems can overcome these challenges, delivering the speed, scalability, and reliability required for next-generation AI workloads.
Moreover, MangoBoost’s focus on cost efficiency ensures that organizations can deploy cutting-edge storage solutions without breaking the bank. By offering superior performance-to-cost ratios compared to alternatives like NVIDIA’s BlueField-3 DPU, MangoBoost makes advanced AI storage accessible to a broader range of enterprises, from startups to large-scale data centers.
A Vision for the Future of AI Storage
MangoBoost’s record-breaking performance in MLPerf Storage v2.0 is more than a technical achievement: it represents a significant step forward in the evolution of AI storage architecture. By combining hardware acceleration with tight integration across the initiator, target, and GPU data path, MangoBoost has created a solution that meets today’s demands and sets the stage for future innovations.
As AI continues to transform industries, efficient, scalable, and cost-effective storage becomes increasingly important. MangoBoost’s Mango StorageBoost™ solution shows how hardware-accelerated storage can help organizations scale their AI research and development.
About MangoBoost
MangoBoost is a provider of cutting-edge, full-stack system solutions for maximizing compute efficiency and scalability. At the heart of these solutions is the MangoBoost Data Processing Unit (DPU), which ensures full compatibility with general-purpose GPUs, accelerators, and storage devices, enabling cost-efficient, standardized AI infrastructure. Founded in 2022 on a decade of research, MangoBoost is rapidly expanding its operations in the U.S., Canada, and Korea.