<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[PagedAttention sounds very low-level, so why does vLLM keep mentioning it]]></title><description><![CDATA[<p dir="auto">vLLM docs and articles often mention PagedAttention. How deeply does an application developer need to understand it?</p>
]]></description><link>https://localaihub.com/topic/159/pagedattention-听起来很底层-为什么-vllm-经常提</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 18:50:28 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/159.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 11 May 2026 07:40:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Tue, 12 May 2026 05:03:00 GMT]]></title><description><![CDATA[<p dir="auto">In the end, it comes down to metrics: time to first token, throughput, failure rate, VRAM usage, and recovery time.</p>
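<p dir="auto">Below is a minimal sketch of turning load-test records into the request-level metrics listed in this reply: time to first token, throughput, and failure rate. It assumes each request has already been logged with a success flag, its TTFT in seconds, and its output-token count. <code>RequestResult</code>, <code>summarize</code> and <code>percentile</code> are hypothetical names and are not part of vLLM or Ollama. VRAM usage and recovery time are not covered here. Those would come from separate monitoring, such as <code>nvidia-smi</code>, and from restart drills.</p>

```python
import math
from dataclasses import dataclass


@dataclass
class RequestResult:
    """One load-test request (hypothetical record format)."""
    ok: bool             # did the request finish without error?
    ttft_s: float        # time to first token, in seconds
    output_tokens: int   # tokens generated for this request


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(results: list[RequestResult], wall_time_s: float) -> dict[str, float]:
    """Aggregate request-level metrics for one load-test run."""
    if not results or wall_time_s <= 0:
        raise ValueError("need at least one result and a positive wall time")
    ok = [r for r in results if r.ok]
    ttfts = [r.ttft_s for r in ok]  # only successful requests have a meaningful TTFT
    return {
        "failure_rate": (len(results) - len(ok)) / len(results),
        "ttft_p50_s": percentile(ttfts, 50) if ttfts else math.nan,
        "ttft_p95_s": percentile(ttfts, 95) if ttfts else math.nan,
        # aggregate output throughput across all concurrent requests
        "output_tokens_per_s": sum(r.output_tokens for r in ok) / wall_time_s,
    }


if __name__ == "__main__":
    run = [
        RequestResult(True, 0.20, 100),
        RequestResult(True, 0.35, 150),
        RequestResult(True, 0.50, 200),
        RequestResult(False, 0.0, 0),
        RequestResult(True, 1.10, 50),
    ]
    print(summarize(run, wall_time_s=10.0))
```

<p dir="auto">The point is to run the same workload against each candidate server and compare these numbers side by side. The absolute values depend entirely on your hardware and request mix.</p>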
]]></description><link>https://localaihub.com/post/1661</link><guid isPermaLink="true">https://localaihub.com/post/1661</guid><dc:creator><![CDATA[小吴]]></dc:creator><pubDate>Tue, 12 May 2026 05:03:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Tue, 12 May 2026 01:58:00 GMT]]></title><description><![CDATA[<p dir="auto">Right. You're choosing an entire inference service, not a single technical term.</p>
]]></description><link>https://localaihub.com/post/1660</link><guid isPermaLink="true">https://localaihub.com/post/1660</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Tue, 12 May 2026 01:58:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 22:55:00 GMT]]></title><description><![CDATA[<p dir="auto">So PagedAttention is one of the reasons, not the selection decision itself.</p>
]]></description><link>https://localaihub.com/post/1659</link><guid isPermaLink="true">https://localaihub.com/post/1659</guid><dc:creator><![CDATA[小谢]]></dc:creator><pubDate>Mon, 11 May 2026 22:55:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 20:30:00 GMT]]></title><description><![CDATA[<p dir="auto">There's also the data boundary. Once you move it to a server, you have to add access control.</p>
]]></description><link>https://localaihub.com/post/1658</link><guid isPermaLink="true">https://localaihub.com/post/1658</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Mon, 11 May 2026 20:30:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 17:35:00 GMT]]></title><description><![CDATA[<p dir="auto">Migration cost counts too: container images, model formats, parameters, monitoring, and error handling all need changes.</p>
]]></description><link>https://localaihub.com/post/1657</link><guid isPermaLink="true">https://localaihub.com/post/1657</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Mon, 11 May 2026 17:35:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 16:33:00 GMT]]></title><description><![CDATA[<p dir="auto">You can evaluate it, but load-test it first. Don't migrate just because you heard a technical term.</p>
]]></description><link>https://localaihub.com/post/1656</link><guid isPermaLink="true">https://localaihub.com/post/1656</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Mon, 11 May 2026 16:33:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 16:10:00 GMT]]></title><description><![CDATA[<p dir="auto">There are about ten of us sharing one 4090. Is it worth considering then?</p>
]]></description><link>https://localaihub.com/post/1655</link><guid isPermaLink="true">https://localaihub.com/post/1655</guid><dc:creator><![CDATA[小蓝]]></dc:creator><pubDate>Mon, 11 May 2026 16:10:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 15:57:00 GMT]]></title><description><![CDATA[<p dir="auto">Most likely not. For one person trying out models locally, Ollama/LM Studio is much less hassle.</p>
]]></description><link>https://localaihub.com/post/1654</link><guid isPermaLink="true">https://localaihub.com/post/1654</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Mon, 11 May 2026 15:57:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 15:22:00 GMT]]></title><description><![CDATA[<p dir="auto">I'm the only one using it. Does that mean I don't need vLLM?</p>
]]></description><link>https://localaihub.com/post/1653</link><guid isPermaLink="true">https://localaihub.com/post/1653</guid><dc:creator><![CDATA[普通网友A]]></dc:creator><pubDate>Mon, 11 May 2026 15:22:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 14:58:00 GMT]]></title><description><![CDATA[<p dir="auto">Look at your request profile: how much concurrency, how long the contexts are, how big the model is, what GPU you have, and whether you need continuous batching.</p>
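<p dir="auto">To make "context length × concurrency × model size" concrete, here is a rough KV cache estimate. This is the memory that PagedAttention manages. The sketch assumes a hypothetical 8B-class model config with grouped-query attention: 32 layers, 8 KV heads, head dimension 128, and an FP16 cache. Read the real values from your own model's config. The function names are hypothetical, and the 8 GiB budget in the usage lines is an assumed placeholder for whatever VRAM remains after weights and activations.</p>

```python
# Hypothetical 8B-class model with grouped-query attention (assumed values;
# read the real ones from your model's config file).
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # FP16 KV cache

GIB = 2**30


def kv_bytes_per_token() -> int:
    # One K vector and one V vector per layer per KV head.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM


def kv_cache_gib(concurrency: int, context_len: int) -> float:
    """KV cache needed if `concurrency` requests each fill `context_len` tokens."""
    return concurrency * context_len * kv_bytes_per_token() / GIB


def max_full_length_requests(kv_budget_gib: float, context_len: int) -> int:
    """How many requests at full `context_len` fit in a given KV cache budget."""
    per_request = context_len * kv_bytes_per_token()
    return int(kv_budget_gib * GIB // per_request)


if __name__ == "__main__":
    print(f"{kv_bytes_per_token() // 1024} KiB per token")
    print(f"10 users x 8192 tokens: {kv_cache_gib(10, 8192):.1f} GiB")
    # Hypothetical budget: whatever VRAM is left after weights and activations.
    print(f"fits in an 8 GiB budget: {max_full_length_requests(8, 8192)} requests")
```

<p dir="auto">Under these assumptions, each token costs 128 KiB. A single 8192-token request therefore pins 1 GiB of KV cache, and ten such requests need 10 GiB. Context length and concurrency quickly dominate what one card can hold, so plug in your own model's numbers before drawing conclusions.</p>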
]]></description><link>https://localaihub.com/post/1652</link><guid isPermaLink="true">https://localaihub.com/post/1652</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Mon, 11 May 2026 14:58:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 14:44:00 GMT]]></title><description><![CDATA[<p dir="auto">Not necessarily. It depends on the use case. vLLM is geared more toward server-side concurrency, while Ollama is better suited to simple local use.</p>
]]></description><link>https://localaihub.com/post/1651</link><guid isPermaLink="true">https://localaihub.com/post/1651</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Mon, 11 May 2026 14:44:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 12:18:00 GMT]]></title><description><![CDATA[<p dir="auto">So does that mean vLLM is always faster than Ollama?</p>
]]></description><link>https://localaihub.com/post/1650</link><guid isPermaLink="true">https://localaihub.com/post/1650</guid><dc:creator><![CDATA[小谢]]></dc:creator><pubDate>Mon, 11 May 2026 12:18:00 GMT</pubDate></item><item><title><![CDATA[Reply to PagedAttention sounds very low-level, so why does vLLM keep mentioning it on Mon, 11 May 2026 09:20:00 GMT]]></title><description><![CDATA[<p dir="auto">It's enough to understand the general idea: it tackles KV cache memory management and throughput. You don't need to know how to implement it.</p>
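<p dir="auto">For anyone curious what "KV cache memory management" means here, below is a toy sketch of the bookkeeping idea. It assumes fixed-size blocks of 16 tokens. The KV cache is split into a pool of blocks, and each request keeps a small block table pointing at whichever blocks it was given. A new block is taken only when the previous one fills up. <code>BlockAllocator</code> and its methods are hypothetical names. This is a simplified illustration of the idea, not vLLM's actual implementation, and it leaves out the attention kernel that reads from these blocks.</p>

```python
class BlockAllocator:
    """Toy paged KV cache bookkeeping (hypothetical, simplified; not vLLM's code).

    Physical memory is a pool of fixed-size blocks. Each sequence keeps a
    block table mapping its logical blocks to physical block ids, so blocks
    need not be contiguous and are only taken when tokens actually arrive.
    """

    def __init__(self, num_blocks: int, block_size: int = 16) -> None:
        self.block_size = block_size            # tokens per block (assumed 16)
        self.free = list(range(num_blocks))     # ids of unused physical blocks
        self.tables: dict[str, list[int]] = {}  # seq id -> physical block ids
        self.lengths: dict[str, int] = {}       # seq id -> tokens stored

    def append_token(self, seq: str) -> bool:
        """Make room for one more token; returns False if the pool is exhausted."""
        n = self.lengths.get(seq, 0)
        if n % self.block_size == 0:            # current block full, or none yet
            if not self.free:
                return False                    # caller must wait or preempt
            self.tables.setdefault(seq, []).append(self.free.pop())
        self.lengths[seq] = n + 1
        return True

    def release(self, seq: str) -> None:
        """Finished request: return all of its blocks to the pool immediately."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)

    def used_blocks(self) -> int:
        return sum(len(t) for t in self.tables.values())


if __name__ == "__main__":
    pool = BlockAllocator(num_blocks=8, block_size=16)
    for _ in range(40):              # request "a": 40 tokens -> 3 blocks
        pool.append_token("a")
    for _ in range(16):              # request "b": 16 tokens -> exactly 1 block
        pool.append_token("b")
    print(pool.used_blocks(), len(pool.free))   # 4 used, 4 free
    pool.release("a")
    print(pool.used_blocks(), len(pool.free))   # 1 used, 7 free
```

<p dir="auto">Memory is taken one block at a time instead of being reserved for the maximum context up front. As a result, waste is limited to the unfilled tail of each request's last block, and freed blocks can be reused immediately. That extra headroom is what lets more requests be batched together, which is where the throughput gain under concurrency comes from.</p>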
]]></description><link>https://localaihub.com/post/1649</link><guid isPermaLink="true">https://localaihub.com/post/1649</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Mon, 11 May 2026 09:20:00 GMT</pubDate></item></channel></rss>