Tensormesh Launches to Cut AI Inference Costs and Latency by 10x

Tensormesh, a company pioneering caching-accelerated inference optimization for enterprise AI, has officially emerged from stealth mode with $4.5 million in seed funding led by Laude Ventures. The company’s breakthrough technology tackles one of the biggest bottlenecks in AI deployment—redundant computation during inference—reducing both latency and GPU spending by up to 10x, while maintaining enterprise control over data and infrastructure.

Academic Roots and Proven Research

Tensormesh was founded by a team of leading faculty members and PhD researchers from the University of Chicago, UC Berkeley, and Carnegie Mellon University, building upon years of research in distributed systems and AI infrastructure. The company is led by Junchen Jiang, a University of Chicago faculty member and the co-creator of LMCache, the leading open-source KV (key-value) caching project. LMCache has earned over 5,000 GitHub stars and contributions from more than 100 developers worldwide.

LMCache’s influence already spans major AI ecosystems—it’s integrated into popular frameworks such as vLLM and NVIDIA Dynamo, and is actively used by top organizations including Bloomberg, Red Hat, Redis, Tencent, GMI Cloud, and WEKA.

Bringing Caching to Enterprise AI

While caching has long been a staple of web and database performance optimization, Tensormesh is the first company to commercialize caching specifically for large-scale AI inference. Its platform combines LMCache-inspired techniques with enterprise-grade capabilities for usability, scalability, security, and manageability.
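For readers unfamiliar with the technique, the core idea is that many requests to an LLM share the same prompt prefix (a system prompt, a document, a conversation history), so the attention key/value tensors produced during prefill can be stored and reused instead of recomputed on every request. The toy Python sketch below illustrates only that general principle; the in-memory dictionary and function names are illustrative stand-ins, not Tensormesh or LMCache APIs.

```python
import hashlib

# Toy in-process KV cache keyed by the token prefix. In a real LLM server the
# cached value would be the attention key/value tensors built during prefill;
# here a string stands in for them. All names are illustrative only.
kv_cache = {}

def prefix_key(tokens):
    """Stable key for a token prefix, e.g. a shared system prompt."""
    return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

def expensive_prefill(tokens):
    """Stand-in for the GPU prefill pass that produces the KV tensors."""
    print(f"prefill over {len(tokens)} tokens")  # the cost caching avoids repeating
    return f"kv-for-{len(tokens)}-tokens"

def get_or_build_kv(tokens):
    key = prefix_key(tokens)
    if key in kv_cache:                # cache hit: skip the prefill compute entirely
        return kv_cache[key]
    kv = expensive_prefill(tokens)     # cache miss: pay the GPU cost once
    kv_cache[key] = kv
    return kv

system_prompt = list(range(2048))      # a long shared prefix
get_or_build_kv(system_prompt)         # first request: prefill runs
get_or_build_kv(system_prompt)         # repeat request: served from cache
```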

According to Jiang, the company’s mission is to give enterprises freedom and flexibility in AI deployment without compromising performance or privacy.

“Enterprises today must either send their most sensitive data to third parties or hire entire engineering teams to rebuild infrastructure from scratch,” said Junchen Jiang, Founder and CEO of Tensormesh. “Tensormesh offers a third path: run AI wherever you want, with state-of-the-art optimizations, cost savings, and performance built in.”

Solving the AI Cost Crisis

AI inference is now one of the most expensive components of modern AI workloads. With large language models (LLMs) being deployed at scale, GPU infrastructure costs have skyrocketed. Tensormesh’s solution directly addresses this problem.

“Enterprises everywhere are wrestling with the huge costs of AI inference,” said Ion Stoica, advisor to Tensormesh and Co-Founder and Executive Chairman of Databricks. “Tensormesh’s approach delivers a fundamental breakthrough in efficiency and is poised to become essential infrastructure for any company betting on AI.”

The company’s innovation lies in sharing KV-cache data across nodes in a distributed cluster, significantly improving throughput while cutting resource waste. Tensormesh integrates with existing storage backends to enable low-latency, high-throughput deployments, allowing teams to scale efficiently without rearchitecting their systems.
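Tensormesh has not published implementation details, but the rough shape of cross-node sharing can be sketched: each node hashes the token prefix it is about to prefill, checks a shared backend (Redis is one the company names) for KV data another node already produced, and only runs the GPU prefill on a miss. The Python below is a hedged sketch under those assumptions; the key scheme, pickle serialization, and helper names are hypothetical and are not LMCache’s or Tensormesh’s actual interfaces.

```python
import hashlib
import pickle
import redis  # one shared backend named in the article; any networked store would do

r = redis.Redis(host="kv-cache.internal", port=6379)  # hypothetical shared endpoint

def prefix_key(model: str, tokens: list[int]) -> str:
    """Key a KV entry by model plus token prefix so any node in the cluster can find it."""
    digest = hashlib.sha256(str(tokens).encode()).hexdigest()
    return f"kv:{model}:{digest}"

def gpu_prefill(tokens: list[int]):
    """Stand-in for the expensive prefill pass that builds attention KV tensors."""
    return {"num_tokens": len(tokens)}  # a real system would return GPU tensors

def get_shared_kv(model: str, tokens: list[int]):
    key = prefix_key(model, tokens)
    blob = r.get(key)
    if blob is not None:                   # another node already paid the prefill cost
        return pickle.loads(blob)
    kv = gpu_prefill(tokens)               # miss: compute locally...
    r.set(key, pickle.dumps(kv), ex=3600)  # ...then publish it for the rest of the cluster
    return kv
```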

Strategic Collaborations and Industry Validation

Redis, a global leader in real-time data platforms, has been an early collaborator in Tensormesh’s distributed cache sharing solution.

“We’ve closely collaborated with Tensormesh to deliver an impressive solution for distributed LLM KVCache sharing across multiple servers,” said Rowan Trollope, CEO of Redis. “Redis combined with Tensormesh delivers a scalable solution for low-latency, high-throughput LLM deployments. Our joint benchmarks demonstrated remarkable improvements in both performance and efficiency. We believe Tensormesh will set a new bar for LLM hosting performance.”

Other enterprise partners are already seeing results. WEKA, a leading data platform company, integrated LMCache technology into its Augmented Memory Grid solution, helping the AI community improve inference efficiency.

“Our partnership with Tensormesh and integration with LMCache played a critical role in helping WEKA open-source aspects of our breakthrough Augmented Memory Grid,” said Callan Fox, Lead Product Manager at WEKA. “It enables the broader AI community to tackle some of the toughest challenges in inference today.”

Cloud-Agnostic and Enterprise-Ready

Tensormesh is designed with flexibility and control in mind. The platform is cloud-agnostic and available both as a SaaS offering and standalone software, giving enterprises the ability to deploy on any infrastructure—public cloud, private data center, or hybrid environment. This adaptability allows organizations to start small and scale as their AI workloads grow, while maintaining strong security and cost efficiency.

Building the Next Layer of the AI Stack

As global AI adoption accelerates, enterprises face mounting pressure to optimize inference efficiency without sacrificing performance. Tensormesh aims to become the “efficiency layer” of the AI infrastructure stack—bringing caching into the mainstream for AI workloads.

“Caching is one of the most underutilized levers in AI infrastructure, and this team has found a smart, practical way to apply it at scale,” said Pete Sonsini, Co-Founder and General Partner at Laude Ventures. “This is the moment to define a critical layer in the AI stack, and Tensormesh is well positioned to own it.”

With strong academic foundations, open-source credibility, and early traction among top enterprises, Tensormesh is setting a new benchmark for AI infrastructure performance—empowering organizations to deploy AI faster, cheaper, and more securely than ever before.
