Facts
"vLLM Serving: High‑Throughput LLM APIs with PagedAttention and KV Cache Tuning"
Built for experienced ML systems engineers, platform architects, and performance-minded practitioners, this book is a deep technical guide to serving large language models with vLLM at production scale. Rather than treating inference as a black box, it explains the real control surfaces behind throughput, latency, and memory efficiency. Readers who already know LLM fundamentals but want to reason rigorously about serving behavior will find an internals-first, systems-oriented treatment.
At the core of the book are the mechanisms that make vLLM distinctive: PagedAttention, continuous batching, KV cache design, and scheduler-driven execution. You will learn how request flow, cache allocation, sequence length, prefix reuse, quantized KV storage, and offloading strategies interact to determine concurrency limits and user-visible performance. The book also covers OpenAI-compatible API serving, streaming semantics, realistic benchmarking, and disciplined troubleshooting, so readers can move from conceptual understanding to evidence-based tuning and operational decisions.
The emphasis throughout is on advanced mental models, trade-offs, and production diagnostics rather than introductory walkthroughs. This is a focused guide for readers comfortable with GPU inference, transformer decoding, and performance measurement who want a precise framework for designing, tuning, and operating high-throughput LLM APIs with confidence.
© 2026 NobleTrex Press (E-book): 6610001219406
Release date
E-book: 10 May 2026