Gimlet Labs Accelerates AI: A New Era of Multichip Inference Cloud

March 25, 2026, 3:31 pm

Menlo Ventures

Gimlet Labs


Gimlet Labs has secured $80 million in Series A funding for its multi-silicon inference cloud. The platform is designed to relieve critical AI inference bottlenecks by running workloads across diverse chip types simultaneously. Matching each task to suitable hardware reduces idle chip time, and the company says inference runs 3-10x faster for the same cost and power. Gimlet targets large AI model labs and data centers, and its key partners include Nvidia, AMD, and Intel. The company already reports eight-figure revenue, and its customer base has doubled recently. The funding will fuel rapid expansion and strengthens the case for heterogeneous computing as the future of AI infrastructure.

The artificial intelligence landscape is evolving rapidly. As AI models grow more complex, deploying them becomes harder. Running a trained model to produce outputs, a process called inference, demands immense computational power, and current infrastructure often struggles to keep up. The resulting bottlenecks limit what AI can achieve in practice.

Traditional inference clouds typically rely on uniform hardware, with a single chip type handling every task. This creates inefficiencies: different AI tasks have distinct hardware requirements, and no single processor is optimal for all of them. The result is idle capacity and wasted spending. This is the gap Gimlet Labs aims to fill.

Gimlet Labs takes a different approach. It has built what it calls the world's first multi-silicon inference cloud, a platform that orchestrates AI workloads by distributing them across diverse processing units. Central processing units (CPUs) handle some tasks, graphics processing units (GPUs) handle others, and specialized memory-heavy processors round out the fleet.

Consider an autonomous AI agent. A single task may chain dozens of model calls, interleaved with retrieval steps and tool invocations, and each step stresses different hardware. Prefill, which processes the input prompt in parallel, is compute-bound. Decode, which generates output tokens one at a time, is memory-bound. Tool calls are network-bound. No single chip excels at all three, which is why heterogeneous computing is the answer.
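The compute-bound versus memory-bound distinction can be made concrete with arithmetic intensity: the number of floating-point operations performed per byte read from memory. The sketch below is a simplified model, not Gimlet's method. It assumes a single dense weight matrix stored in 16-bit precision, counts only weight traffic, and ignores the KV cache and activations; the layer size and prompt length are illustrative assumptions. During prefill, every prompt token reuses the loaded weights, while a decode step processes one new token per sequence, so the weights are reloaded for very little arithmetic.

```python
def matmul_intensity(tokens: int, d_in: int, d_out: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weight traffic for a (tokens x d_in) @ (d_in x d_out) matmul.

    Simplifying assumption: only the weight matrix is read from memory;
    activations and the KV cache are ignored.
    """
    flops = 2 * tokens * d_in * d_out              # one multiply + one add per weight per token
    weight_bytes = d_in * d_out * bytes_per_param  # each weight loaded once per pass
    return flops / weight_bytes


# Illustrative sizes (assumed, not taken from the article).
d_model = 4096
prefill = matmul_intensity(tokens=2048, d_in=d_model, d_out=d_model)  # whole prompt at once
decode = matmul_intensity(tokens=1, d_in=d_model, d_out=d_model)      # one token per step

print(f"prefill: {prefill:.0f} FLOPs/byte")
print(f"decode:  {decode:.0f} FLOPs/byte")
```

With 2-byte weights the intensity equals the token count, so prefill reaches 2048 FLOPs per byte while decode sits at 1. Modern accelerators can sustain on the order of hundreds of FLOPs per byte of memory bandwidth, so a decode step at an intensity near 1 leaves most arithmetic units waiting on memory, while prefill keeps them busy.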

Gimlet Labs' software layer makes this possible. It splits AI workloads into tasks and assigns each one to the most suitable chip. Compute-intensive batch inference runs well on GPUs. Latency-sensitive workloads benefit from specialized processors, such as those from Groq, Cerebras, and d-Matrix, which offer exceptional speed for this work. Orchestration and tool use generally run better on CPUs. Allocating tasks this way maximizes hardware utilization.
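A minimal sketch of this kind of placement, assuming hypothetical hardware pools and a fixed preference order per task kind that follows the rules of thumb above. Gimlet has not published its scheduler, and a production system would also weigh live load, queue depth, and data-transfer costs. The names `PREFERENCES`, `Task`, `assign`, and the pool labels are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical preference order of hardware pools for each kind of task.
# This mirrors the article's rules of thumb; it is not Gimlet's actual policy.
PREFERENCES = {
    "batch_inference": ["gpu"],
    "latency_sensitive": ["low_latency_accel", "gpu"],
    "orchestration": ["cpu"],
    "tool_call": ["cpu"],
}


@dataclass
class Task:
    name: str
    kind: str


def assign(tasks, free_slots):
    """Greedily place each task on its most-preferred pool that has a free slot.

    `free_slots` maps pool name -> number of tasks it can accept right now.
    Tasks that fit nowhere are returned in a separate `queued` list.
    """
    slots = dict(free_slots)          # copy so the caller's dict is not mutated
    placement = defaultdict(list)
    queued = []
    for task in tasks:
        for pool in PREFERENCES[task.kind]:
            if slots.get(pool, 0) > 0:
                slots[pool] -= 1
                placement[pool].append(task.name)
                break
        else:                         # no preferred pool had room
            queued.append(task.name)
    return dict(placement), queued


steps = [
    Task("plan next action", "orchestration"),
    Task("draft reply for user", "latency_sensitive"),
    Task("call search API", "tool_call"),
    Task("rerank 10k documents", "batch_inference"),
    Task("second chat reply", "latency_sensitive"),
]
placement, queued = assign(steps, {"gpu": 2, "low_latency_accel": 1, "cpu": 4})
print(placement)
print(queued)
```

In this run the second latency-sensitive reply spills over to the GPU pool because the single low-latency slot is already taken. Letting tasks fall back to a next-best pool, instead of waiting, is one simple way to keep more of a mixed fleet busy.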

The company reports a substantial impact. Its platform dramatically improves efficiency by slashing chip idle time, and inference workloads run three to ten times faster at the same cost and power consumption. The platform can even partition individual AI models so that different parts run on different chips, pushing efficiency further.
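The article does not say how Gimlet partitions models. One common way to split a model across chips is pipeline partitioning, which assigns contiguous blocks of layers to different devices. The sketch below assumes hypothetical relative device speeds and sizes each stage in proportion to them, so that stages finish at roughly the same time. Real partitioners also account for memory capacity, activation-transfer costs, and layers of unequal cost; `partition_layers` and the device names are illustrative.

```python
def partition_layers(num_layers: int, device_speeds: dict[str, float]) -> dict[str, range]:
    """Split layers 0..num_layers-1 into contiguous stages, one per device.

    Stage sizes are roughly proportional to each device's (assumed) relative speed.
    Boundaries come from cumulative speed, so they never decrease and every
    layer is assigned to exactly one device.
    """
    total = sum(device_speeds.values())
    if total <= 0:
        raise ValueError("device speeds must sum to a positive number")
    stages = {}
    start = 0
    cumulative = 0.0
    items = list(device_speeds.items())
    for i, (device, speed) in enumerate(items):
        cumulative += speed
        if i == len(items) - 1:
            end = num_layers  # last device takes whatever remains
        else:
            end = round(num_layers * cumulative / total)
        stages[device] = range(start, end)
        start = end
    return stages


# Hypothetical example: a 32-layer model on one fast and one slower chip.
stages = partition_layers(32, {"fast_chip": 3.0, "slow_chip": 1.0})
for device, layers in stages.items():
    print(device, f"layers {layers.start}-{layers.stop - 1}")
```

Computing boundaries from cumulative speed, rather than rounding each stage size independently, avoids rounding errors that could assign a layer index past the end of the model.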

Existing AI hardware is often underused, with utilization rates hovering between 15% and 30%, a colossal waste of resources. Gimlet Labs aims to make AI workloads ten times more efficient, an ambition that would reshape the economics of AI deployment.
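To see why 15% to 30% utilization is so costly, compare the chip time paid for with the chip time that does useful work. The worked example below assumes that cost scales with chip-hours and that utilization is the fraction of those hours spent on useful computation. The 60% row is a hypothetical comparison point, not a figure from the article.

```python
def paid_hours_per_useful_hour(utilization: float) -> float:
    """Chip-hours paid for each chip-hour of useful work at a given utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1 / utilization


for u in (0.15, 0.30, 0.60):
    print(f"{u:.0%} utilization -> {paid_hours_per_useful_hour(u):.2f} paid chip-hours per useful chip-hour")
```

At 15% utilization, roughly 6.7 chip-hours are paid for every useful one. Because cost scales with 1/utilization, doubling utilization halves the cost of the same work on the same hardware.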

Gimlet Labs targets a specific clientele. Its software is not aimed at general developers but at the largest AI model labs and operators of large data centers. These organizations have immense computational needs and demand both peak performance and cost-effectiveness.

Strategic partnerships underpin this work. Gimlet collaborates with major chipmakers, including Nvidia Corp., Advanced Micro Devices Inc., Intel Corp., Arm Holdings Plc, and Cerebras Systems Inc. These collaborations ensure broad hardware compatibility and deepen the company's integration with the industry.

Gimlet's core product, the Gimlet cloud, provides serverless inference for AI agents. Users can run anything from simple agents to complex multi-agent systems, with custom logic and data sources integrated easily. The platform handles scheduling, orchestration, and optimization automatically, so users can focus on what their agents do. It also supports existing agentic pipelines that chain multiple models together with non-model stages such as search.

Gimlet has also built kforge, a tool that autonomously generates optimized low-level kernels directly from PyTorch code. It uses a multi-agent system with shared memory to explore diverse kernel designs, enforces strict correctness checks, and automatically identifies the fastest kernels, accelerating both training and inference. kforge supports CUDA, ROCm, and Metal backends.
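kforge's internals are not public beyond this description, but the final step it describes, keeping only candidates that pass correctness checks and then picking the fastest, can be sketched generically. The example below is a hypothetical harness in plain Python: the `cand_*` functions stand in for generated kernels, `reference` stands in for the original PyTorch computation, and "correct" means matching the reference within a relative tolerance on random inputs. It does not generate kernels or use multiple agents.

```python
import random
import timeit


def reference(xs):
    """Ground-truth computation (stands in for the original PyTorch op): sum of squares."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


# Hypothetical generated candidates; one is deliberately wrong.
def cand_builtin(xs):
    return sum(x * x for x in xs)


def cand_map(xs):
    return sum(map(lambda x: x * x, xs))


def cand_buggy(xs):
    return sum(x * x for x in xs[1:])  # bug: skips the first element


def is_correct(fn, trials=20, n=256, tol=1e-9):
    """Compare fn against the reference on seeded random inputs."""
    rng = random.Random(0)
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        expected, got = reference(xs), fn(xs)
        if abs(got - expected) > tol * max(1.0, abs(expected)):
            return False
    return True


def pick_fastest(candidates, n=10_000, number=20, repeat=5):
    """Return (name of fastest correct candidate, timings of all correct ones)."""
    rng = random.Random(1)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    timings = {}
    for name, fn in candidates.items():
        if not is_correct(fn):
            continue  # reject incorrect candidates before timing them
        # Best-of-`repeat` time for `number` calls, to reduce timing noise.
        timings[name] = min(timeit.repeat(lambda: fn(xs), number=number, repeat=repeat))
    if not timings:
        raise RuntimeError("no candidate passed the correctness check")
    return min(timings, key=timings.get), timings


candidates = {"builtin": cand_builtin, "map": cand_map, "buggy": cand_buggy}
best, timings = pick_fastest(candidates)
print("fastest correct candidate:", best)
print("candidates timed:", sorted(timings))
```

Running the correctness gate before benchmarking matters: a candidate that silently skips work, like `cand_buggy`, can easily look fastest. Which correct candidate wins depends on the machine and Python version, so the result is not fixed.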

Gimlet Labs has shown impressive traction. Although the platform launched only recently, the company already generates eight-figure revenue, and its customer base doubled in just four months. Its clients include a major model maker and an extremely large cloud computing company.

This success attracted significant investment. Gimlet Labs recently closed an $80 million Series A round led by Menlo Ventures, with participation from Factory, Eclipse, Prosperity7, and Triatomic. The round brings the company's total funding to $92 million.

The new capital will fuel expansion. Gimlet Labs plans to grow its team and scale its inference cloud to meet surging demand, as high-speed, efficient multichip inference becomes essential.

Gimlet Labs' vision extends beyond its current capabilities. The company wants multichip inference to become the industry standard, a shift it believes would redefine AI infrastructure, make advanced AI more accessible, and unlock new possibilities for AI agents.

The future of AI computing is heterogeneous, because no single chip can solve every problem. Gimlet Labs' platform is built around this premise, providing the software layer that unifies diverse hardware and maximizes AI performance.

This approach positions Gimlet Labs at the forefront of efforts to address one of AI's biggest bottlenecks. By using hardware more efficiently, its platform can accelerate development, lower operating costs, and make powerful AI models practical and scalable for global deployment.