Operations
Scale
Society, Ethics & Sustainability

Escaping the Cloud: High-Performance AI in your Browser

Session Abstract

Server-side inference is the bottleneck of modern AI, creating cost and privacy hurdles. But what if the solution is scaling down to the browser? This session investigates client-side AI using WebGPU, ONNX Runtime, and Transformers.js. We’ll explore the reality of hardware access, model size, and the 2026 trade-offs of browser-based execution.

Session Description

Server-side inference is the bottleneck of modern AI. It introduces network latency, creates massive operational costs, and forces complex privacy compliance. But what if we could push the compute entirely to the edge: specifically, the browser tab?

This session explores the architecture of Client-Side AI, where the strategy is to distribute the workload to the user’s own hardware.

We will investigate the modern browser-based ML stack:

  • The Runtime: How ONNX Runtime Web provides a near-native execution environment for models exported from PyTorch or TensorFlow.
  • The Hardware Access: Leveraging WebGPU to unlock direct access to the client’s GPU, bypassing the limitations of legacy WebGL.
  • The Pipeline: A technical look at optimizing transformer models (quantization, caching) for delivery over the wire using libraries like Transformers.js.
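To make the "delivery over the wire" point concrete, here is a back-of-the-envelope sketch of why quantization matters for browser delivery. The parameter count and bit widths below are illustrative, not tied to any specific model:

```javascript
// Rough download size for a model shipped to the browser:
// total bytes = parameter count × bits per weight ÷ 8.
function downloadSizeMiB(numParams, bitsPerWeight) {
  return (numParams * bitsPerWeight) / 8 / (1024 * 1024);
}

// A hypothetical 100M-parameter transformer:
//   fp32 ≈ 381 MiB, int8 ≈ 95 MiB, int4 ≈ 48 MiB
const sizes = [32, 8, 4].map((bits) => downloadSizeMiB(100e6, bits));
```

Going from fp32 to int8 cuts the transfer by 4x before any caching, which is often the difference between an unusable and a practical first-load experience.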

Most of all, we will look at actual demos of LLMs, speech, and computer-vision models all running in the browser. We’ll be honest about the trade-offs: memory limits, model size constraints, and the reality of browser compatibility in 2026.
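Browser compatibility in practice means feature-detecting WebGPU and falling back to a CPU path. A minimal sketch, assuming ONNX Runtime Web's execution-provider ids (`'webgpu'`, `'wasm'`) and taking the `navigator` object as a parameter so the logic is easy to test:

```javascript
// Pick an ONNX Runtime Web execution provider based on what the
// browser exposes. WebGPU support is detected via navigator.gpu;
// browsers without it fall back to the WebAssembly (CPU) backend.
function pickExecutionProvider(nav) {
  if (nav && 'gpu' in nav) return 'webgpu';
  return 'wasm';
}

// In a real page you would pass the global navigator, e.g.:
//   const providers = [pickExecutionProvider(navigator)];
//   const session = await ort.InferenceSession.create(modelUrl,
//     { executionProviders: providers });
```

Note that `'gpu' in navigator` only tells you the API exists; requesting an adapter can still fail on blocklisted drivers, so production code should also handle a null `navigator.gpu.requestAdapter()` result.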

Join us to see if the future of AI scaling is actually… no servers at all.