SGLang

Inference & serving · Breakout: Distribution-by-integration

Signal summary

Category	Inference & serving
Breakout	Distribution-by-integration
Launched via	arXiv paper, Company blog, Show HN, GitHub-only (no announcement)
Owned	Brand X, Company blog, Docs-as-SEO, Slack community
Distribution	GitHub repo, PyPI
Integrations	DeepSeek, PyTorch, OpenAI-API-compatible
Amplifiers	AI-lab official account, swyx / Latent Space, DeepSeek, AMD, NVIDIA

Overview

Open-source high-performance serving / inference framework for large language models and multimodal models, originally built around RadixAttention (automatic KV-cache reuse) and a structured-generation frontend language. Current scale: 29,030 GitHub stars and 6,539 forks as of 2026-06-15 (github.com/sgl-project/sglang, verified via GitHub API). The project self-reports running on "over 400,000 GPUs worldwide" and "generating trillions of tokens in production each day" (their claim, README, Nov 2025). Spun out a commercial entity, RadixArk, in May 2026 ($100M seed at ~$400M valuation, Accel-led).

First public appearance

Earliest public artifact is the arXiv preprint, "SGLang: Efficient Execution of Structured Language Model Programs," submitted 2023-12-12 (arxiv.org/abs/2312.07104), which led with the claim of "up to 6.4x higher throughput compared to state-of-the-art inference systems." The public marketing launch, however, was the LMSYS blog post of 2024-01-17, "Fast and Expressive LLM Inference with RadixAttention and SGLang" (lmsys.org/blog/2024-01-17-sglang/). Format: research-lab blog post (LMSYS, the same group behind Chatbot Arena and Vicuna). The key selling point they led with was a cleaner, rounder hero number than the paper's: "up to 5 times higher throughput compared to existing systems, namely Guidance and vLLM," with Hugging Face TGI as an additional baseline. The post introduced RadixAttention as "a technique for automatic and efficient KV cache reuse across multiple LLM generation calls" (a radix-tree data structure mapping token sequences to cached KV tensors), and closed with a GitHub link and "We invite the community to try SGLang and provide us with feedback." Authors included Lianmin Zheng, Ying Sheng, Ion Stoica, Joseph Gonzalez, and Clark Barrett (UC Berkeley / Stanford lineage).
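To make the prefix-reuse idea concrete, here is a minimal sketch of a cache keyed by token prefixes. It is an illustration only, not SGLang's implementation: it uses a plain one-token-per-node trie rather than a compressed radix tree, the class and method names (PrefixNode, PrefixCache, match_prefix, insert) are hypothetical, and a string placeholder stands in for real KV tensors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PrefixNode:
    """One node per token; `kv` is a placeholder for that token's cached KV entry."""
    children: Dict[int, "PrefixNode"] = field(default_factory=dict)
    kv: Optional[str] = None


class PrefixCache:
    """Toy token-level trie (hypothetical; a real radix tree compresses edges)."""

    def __init__(self) -> None:
        self.root = PrefixNode()

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached entries."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: List[int]) -> int:
        """Cache entries for `tokens`; return how many tokens had to be computed."""
        node, computed = self.root, 0
        for i, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None:
                # Pretend the model ran here and produced a KV entry.
                child = PrefixNode(kv=f"kv[{i}:{tok}]")
                node.children[tok] = child
                computed += 1
            node = child
        return computed


cache = PrefixCache()
system_prompt = [101, 7, 7, 42]  # made-up token ids shared by two requests

first = cache.insert(system_prompt + [5, 6])      # cold cache: every token computed
reused = cache.match_prefix(system_prompt + [9])  # shared prefix found in the cache
second = cache.insert(system_prompt + [9])        # only the new suffix is computed

print(first, reused, second)  # prints: 6 4 1
```

In this toy run the second request reuses the four cached prompt tokens and computes only one new token, which is the kind of saving that shared system prompts or few-shot examples make possible across generation calls.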

Launch sequence

2023-12-12: arXiv preprint posted (arxiv.org/abs/2312.07104). Headline academic claim: "up to 6.4x higher throughput." Low public visibility at this stage.
2024-01-08: GitHub repo sgl-project/sglang created (GitHub API created_at). Code-first: repo public before the marketing blog.
2024-01-17: LMSYS launch blog ("5x higher throughput with RadixAttention") (lmsys.org/blog/2024-01-17-sglang/). Cross-posted to Hacker News the same day (news.ycombinator.com/item?id=39030452). Visible response: muted, 11 points, 0 comments. A second HN submission of the same blog on 2024-01-19 got 2 points (item id 39055004). The launch did NOT go viral on HN.
2024-01: SGLang chosen to power the serving demo for the official LLaVA v1.6 release (per repo README "News"). First marquee model-team adoption; a credibility signal more than a campaign.
2024-02-05: Follow-up blog: "3x faster JSON decoding with compressed finite state machine" (lmsys.org/blog/2024-02-05-compressed-fsm/). Pattern established: ship-a-feature, post-a-benchmark-blog.
2024-07-25: v0.2 blog: "Achieving Faster Open-Source Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM)" (lmsys.org/blog/2024-07-25-sglang-llama3/). Hero claim: "consistently outperforms vLLM, achieving up to 3.1x higher throughput on Llama-70B," and "often matches or sometimes outperforms TensorRT-LLM." Positioned head-to-head against the two leaders (NVIDIA's TensorRT-LLM and vLLM). HN cross-post got 4 points (item id 41073838).
2024-09-04: v0.3 blog: "7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision" (lmsys.org/blog/2024-09-04-sglang-v0-3/). First DeepSeek-specific optimization headline.
2024-10: First SGLang Online Meetup (community-building motion begins).
2024-12-04: v0.4 blog: "Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs" (lmsys.org/blog/2024-12-04-sglang-v0-4/).
2024-12-15 (pre-DeepSeek baseline): Repo at 6.5k stars / 578 forks / 59 watchers / 187 contributors, with v0.4.0 (Dec 4 2024) as the latest release, per the Wayback repo snapshot 20241215184337. This is the immediate pre-DeepSeek-V3 baseline: the README "Adoption" wall at this point listed AMD, Baseten, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, Meituan, NVIDIA, RunPod, Stanford, UC Berkeley, xAI and 01.AI (no Azure/Cursor/Oracle yet; those landed during or after the DeepSeek wave).
2024-12-26: The pivotal moment: LMSYS announces SGLang v0.4.1 is "the officially recommended inference solution" for the just-released DeepSeek V3, built jointly by the SGLang and DeepSeek teams with day-0 FP8 support on NVIDIA and AMD GPUs (x.com/lmsysorg/status/1872251875070021831).
2025-01: DeepSeek R1 wave: day-0 support continues, AMD ships a co-marketed technical article ("AMD Instinct GPUs Power DeepSeek-V3 ... with SGLang," amd.com), and LMSYS publicizes that 10+ companies adopted SGLang for DeepSeek serving (x.com/lmsysorg/status/1887262321636221412, Feb 2025).
2025-03-19: "SGLang Joins PyTorch Ecosystem" (pytorch.org/blog/sglang-joins-pytorch/). Institutional legitimacy: noted as "the top choice for serving DeepSeek models by dozens of companies," with xAI serving Grok 3 on it and Microsoft Azure serving DeepSeek R1 on AMD via it.
2025-05-05: "Deploying DeepSeek with PD Disaggregation and Large-scale Expert Parallelism on 96 H100 GPUs" (lmsys.org/blog/2025-05-05-large-scale-ep/). Heavyweight systems-engineering content that built reputation among serious infra teams.
2025-06: a16z Open Source AI Grant (third batch) awarded to SGLang (a16z.com/advancing-open-source-ai-through-benchmarks-and-bold-experimentation/). Plus GB200 NVL72 throughput blogs (Part I June, Part II Sept).
2025-08: Day-0 support for OpenAI's gpt-oss model; SGLang x AMD SF Meetup (8/22) with talks by AMD, xAI, and SGLang.
2025-09 / 2025-10: Day-0 support for DeepSeek-V3.2; SGLang x NVIDIA SF Meetup (10/2); PyTorch Conference 2025 talk; SGLang-Jax (TPU backend).
2026-05-05: RadixArk commercial spinout announced: $100M seed at ~$400M post-money valuation (businesswire press release). (Note: TechCrunch ran a "sources" leak in Jan 2026 ahead of the official launch, techcrunch.com.)

Channels & accounts

GitHub: github.com/sgl-project/sglang: 29,030 stars, 6,539 forks, 900+ contributors (2026-06-15). The primary growth and distribution surface. Org also runs mini-sglang and sgl-learning-materials (slides/learning repo).
Blog: Posted on the LMSYS blog (lmsys.org/blog/) rather than a standalone site for the first ~2 years; this piggybacked on LMSYS's existing audience (Chatbot Arena, Vicuna). Docs at docs.sglang.ai / docs.sglang.io.
X/Twitter: Primary social megaphone is @lmsysorg (the parent org account), used for launch/day-0/meetup announcements. The account has ~15,497 followers (Apify scrape, June 13 2026). The handle's createdAt is Aug 11 2024, i.e. the @lmsysorg account itself postdates SGLang's Jan 2024 launch (LMSYS's earlier social presence ran under different handles). SGLang has no dedicated standalone X handle; the org account carries the social presence.
Slack: Public community Slack at slack.sglang.io (developer support + coordination).
Dev meeting / roadmap: Public bi-weekly development meeting (meeting.sglang.ai) and public roadmap (roadmap.sglang.io). Unusually open governance for an inference engine.
Meetups / conferences: Recurring in-person meetups co-hosted with AMD and NVIDIA (SF, 2025); PyTorch Conference 2025 talk.
Email: sglang@lmsys.org for enterprise/partnership inquiries.
Not observed as owned channels: no dedicated YouTube, Telegram, Discord, newsletter, or Reddit presence found. Reddit and HN mentions are third-party/organic.

Amplification & KOLs

DeepSeek (the model lab): The single biggest amplifier. By co-launching day-0 support and having SGLang named the "officially recommended inference solution" for V3/R1, DeepSeek effectively routed a global wave of attention to SGLang during the Dec 2024 - Jan 2025 DeepSeek hype cycle. Earned/collaborative, not paid.
AMD: Repeated co-marketed technical articles on its developer/ROCm blogs ("AMD Instinct GPUs Power DeepSeek-V3 ... with SGLang," DeepSeek-R1 on MI300X). Earned/partner.
NVIDIA, Microsoft Azure, xAI, LinkedIn, Cursor, Oracle Cloud, Baseten, Nebius: Named adopters/sponsors in the README "Adoption and Sponsorship" section; their usage functions as logo-credibility amplification. xAI "serving Grok 3 on SGLang" and Azure "serving DeepSeek R1" are the most-cited proof points. Earned.
a16z: Open Source AI Grant (June 2025) is both funding and a reputational endorsement from a top VC. Earned.
PyTorch / Meta: The "joins PyTorch Ecosystem" blog (Mar 2025) is institutional amplification to the broad PyTorch developer base. Earned.
swyx / Latent Space: Covered "Mission Critical Inference with DeepSeek v3, SGLang" (Jan 2025), an influential AI-engineering newsletter/podcast. Earned.
RadixArk angels: The seed round named angels including John Schulman (OpenAI co-founder), Soumith Chintala (PyTorch creator), and Thomas Wolf (Hugging Face co-founder) per coverage (their/press claim), reinforcing insider credibility.

Traction inflection

The breakout was the DeepSeek V3/R1 day-0 support, Dec 2024 - Jan 2025, with the trigger being the 2024-12-26 announcement that SGLang was DeepSeek's "officially recommended inference solution."

Evidence (star curve, from Wayback snapshots of the repo): ~2.6k stars on 2024-06-06; 6.5k stars on 2024-12-15 (11 days before the DeepSeek V3 recommendation); 8.2k stars on 2025-01-30 (i.e. +1.7k in the ~6 weeks spanning the DeepSeek V3/R1 launches); 20.5k on 2025-11-30; ~29.0k on 2026-06-15. The 2024-12-15 checkpoint tightens the inflection: the repo had reached only 6.5k in its first ~11 months, and the steepest relative acceleration came in the DeepSeek window, not at the original Jan-2024 launch. Forks moved in lockstep across the same checkpoints (164 → 578 → 794 → 3.6k → 6.5k).
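As a rough check on the "steepest relative acceleration" reading, the star checkpoints above can be converted into stars per day for each window. This is a back-of-envelope sketch: the star counts are the rounded "k" figures quoted above, so the rates are approximate, and the variable names are illustrative.

```python
from datetime import date

# (snapshot date, approximate star count) as quoted in the evidence above.
# The "k" figures are rounded, so treat the resulting rates as rough.
checkpoints = [
    (date(2024, 6, 6), 2_600),
    (date(2024, 12, 15), 6_500),
    (date(2025, 1, 30), 8_200),
    (date(2025, 11, 30), 20_500),
    (date(2026, 6, 15), 29_000),
]

rates = []
for (d0, s0), (d1, s1) in zip(checkpoints, checkpoints[1:]):
    days = (d1 - d0).days
    rates.append((s1 - s0) / days)
    print(f"{d0} -> {d1}: +{s1 - s0} stars in {days} days = {rates[-1]:.1f}/day")

# Ratio of each window's daily rate to the previous window's daily rate.
speedups = [later / earlier for earlier, later in zip(rates, rates[1:])]
print([round(x, 2) for x in speedups])  # prints: [1.82, 1.09, 1.07]
```

On these rounded figures the daily rate goes from about 20 stars/day before the DeepSeek window to about 37 stars/day during it (roughly 1.8x), while the later windows hold at roughly 40 to 43 stars/day with much smaller step-ups. That is consistent with the DeepSeek window being the point where the growth rate changed most sharply, even though later windows added more stars in absolute terms.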

Corroborating signals: the PyTorch blog (Mar 2025) explicitly calls SGLang "the top choice for serving DeepSeek models by dozens of companies"; LMSYS publicized "10+ companies" adopting it for DeepSeek serving in early 2025; AMD/Azure/xAI proof points all cluster in this period.

Why not the original launch: the Jan-2024 "5x throughput" blog landed softly (HN: 11 points, 0 comments), and the repo grew only to ~2.6k stars over the first ~5 months. The benchmark-beating launch built the foundation and credibility, but it did not itself cause the breakout.

Confidence: HIGH. The star-curve inflection, the timing of the DeepSeek recommendation, and multiple independent corroborations (PyTorch, AMD, LMSYS's own "10+ companies" count) all point to the same cause. The clean mechanism: SGLang positioned itself as the fastest open serving path for the most hyped open model on the planet, at the exact moment that model went viral, and was endorsed by the model's own creators.

Techniques & tactics

Lead with a single bold benchmark number ("up to 5x throughput") against named, recognizable rivals (vLLM, Guidance, TGI), then escalate to beating the market leaders (TensorRT-LLM, vLLM) in later releases.
Ship-a-feature / post-a-benchmark-blog cadence: nearly every release (v0.2, v0.3, v0.4, large-scale EP, GB200) came with a quantified blog. Performance numbers ARE the content.
Piggyback on an existing audience: published on the LMSYS blog and @lmsysorg account (Chatbot Arena / Vicuna credibility) rather than building a new brand from zero.
Code-first, research-backed: arXiv paper + open Apache-2.0 repo before the marketing blog; technical legitimacy underpinned the claims.
Day-0 model support as a growth engine: racing to support each hot new model (DeepSeek V3/R1/V3.2, Llama 3, gpt-oss, Qwen) on launch day, ideally co-announced with the model team. This is the highest-leverage tactic they ran.
Logo / adoption wall: a prominently maintained "Adoption and Sponsorship" list (xAI, NVIDIA, AMD, Azure, LinkedIn, Cursor, Oracle) used as social proof.
Hardware-vendor co-marketing: recurring joint blogs and meetups with AMD and NVIDIA, who have their own incentive to promote a fast open engine on their silicon.
Open community infrastructure: public Slack, public bi-weekly dev meetings, public roadmap, recurring meetups, learning-materials repo. Low-friction contribution funnel.
Institutional badges: PyTorch Ecosystem membership, a16z Open Source AI Grant, NeurIPS 2024 paper, conference talks.
Notably, coverage repeatedly frames SGLang as having "become popular ... without any marketing or sales team" (their/press framing); growth was technically driven and adoption-led rather than ad- or paid-driven.