<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Quantization on Learn by Tanhdev</title><link>https://learn.tanhdev.com/tags/quantization/</link><description>Recent content in Quantization on Learn by Tanhdev</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Tue, 26 May 2026 08:00:00 +0700</lastBuildDate><atom:link href="https://learn.tanhdev.com/tags/quantization/index.xml" rel="self" type="application/rss+xml"/><item><title>Optimizing vLLM Serving: Comparing AWQ, GPTQ, and GGUF</title><link>https://learn.tanhdev.com/series/slm-playbook/part-6-vllm-deployment-evals/</link><pubDate>Tue, 26 May 2026 08:00:00 +0700</pubDate><guid>https://learn.tanhdev.com/series/slm-playbook/part-6-vllm-deployment-evals/</guid><description>A handbook for operating SLMs on vLLM. Compares the AWQ, GPTQ, and GGUF quantization formats and covers configuring Dynamic LoRA to save GPU memory.</description></item><item><title>Optimizing Inference &amp; Deploying vLLM in Production</title><link>https://learn.tanhdev.com/series/ai-data-engineering-pipeline/part-8-inference-optimization-vllm/</link><pubDate>Sun, 17 May 2026 12:00:00 +0700</pubDate><guid>https://learn.tanhdev.com/series/ai-data-engineering-pipeline/part-8-inference-optimization-vllm/</guid><description>Overcome VRAM limits and reduce server costs when deploying a 70B LLM with vLLM, PagedAttention, and FP8/AWQ quantization.</description></item></channel></rss>