<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Is the KV cache purely a model concern that the application can ignore?]]></title><description><![CDATA[<p dir="auto">Does the application layer need to care about the KV cache? It feels like something internal to the inference framework.</p>
]]></description><link>https://localaihub.com/topic/158/kv-cache-是不是只和模型有关-应用不用管</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 18:50:42 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/158.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 11 May 2026 05:58:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Tue, 12 May 2026 01:55:00 GMT]]></title><description><![CDATA[<p dir="auto">That line belongs in a requirements review.</p>
]]></description><link>https://localaihub.com/post/1647</link><guid isPermaLink="true">https://localaihub.com/post/1647</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Tue, 12 May 2026 01:55:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Tue, 12 May 2026 00:26:00 GMT]]></title><description><![CDATA[<p dir="auto">Right. Unlimited chat history is a win for the product and a pain for the inference service.</p>
]]></description><link>https://localaihub.com/post/1646</link><guid isPermaLink="true">https://localaihub.com/post/1646</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Tue, 12 May 2026 00:26:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 21:43:00 GMT]]></title><description><![CDATA[<p dir="auto">So it turns out it is tied to product design too.</p>
]]></description><link>https://localaihub.com/post/1645</link><guid isPermaLink="true">https://localaihub.com/post/1645</guid><dc:creator><![CDATA[abc_1024]]></dc:creator><pubDate>Mon, 11 May 2026 21:43:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 19:20:00 GMT]]></title><description><![CDATA[<p dir="auto">Privacy matters too. Cache reuse must not leak context across users.</p>
]]></description><link>https://localaihub.com/post/1644</link><guid isPermaLink="true">https://localaihub.com/post/1644</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Mon, 11 May 2026 19:20:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 16:22:00 GMT]]></title><description><![CDATA[<p dir="auto">In production, put limits in place first: maximum context per session, maximum output length, timeouts, and a concurrency queue.</p>
]]></description><link>https://localaihub.com/post/1643</link><guid isPermaLink="true">https://localaihub.com/post/1643</guid><dc:creator><![CDATA[小吴]]></dc:creator><pubDate>Mon, 11 May 2026 16:22:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 15:49:00 GMT]]></title><description><![CDATA[<p dir="auto">Some frameworks offer caching and reuse, but don't expect persisting to disk to solve everything. You have to look at both the latency and the conditions for a cache hit.</p>
]]></description><link>https://localaihub.com/post/1642</link><guid isPermaLink="true">https://localaihub.com/post/1642</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Mon, 11 May 2026 15:49:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 13:47:00 GMT]]></title><description><![CDATA[<p dir="auto">Can the KV cache be persisted to disk?</p>
]]></description><link>https://localaihub.com/post/1641</link><guid isPermaLink="true">https://localaihub.com/post/1641</guid><dc:creator><![CDATA[普通网友A]]></dc:creator><pubDate>Mon, 11 May 2026 13:47:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 11:43:00 GMT]]></title><description><![CDATA[<p dir="auto">That is an application-layer problem, not the model suddenly getting dumber.</p>
]]></description><link>https://localaihub.com/post/1640</link><guid isPermaLink="true">https://localaihub.com/post/1640</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Mon, 11 May 2026 11:43:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 09:49:00 GMT]]></title><description><![CDATA[<p dir="auto">We ran into chats getting slower and slower after a dozen or so turns, and it turned out the frontend was sending the full history on every request.</p>
]]></description><link>https://localaihub.com/post/1639</link><guid isPermaLink="true">https://localaihub.com/post/1639</guid><dc:creator><![CDATA[小蓝]]></dc:creator><pubDate>Mon, 11 May 2026 09:49:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 09:07:00 GMT]]></title><description><![CDATA[<p dir="auto">Truncation is a means, not the goal. Keep the state the current task needs, and drop repeated small talk and stale information.</p>
]]></description><link>https://localaihub.com/post/1638</link><guid isPermaLink="true">https://localaihub.com/post/1638</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Mon, 11 May 2026 09:07:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 08:32:00 GMT]]></title><description><![CDATA[<p dir="auto">So I can just truncate all the history?</p>
]]></description><link>https://localaihub.com/post/1637</link><guid isPermaLink="true">https://localaihub.com/post/1637</guid><dc:creator><![CDATA[abc_1024]]></dc:creator><pubDate>Mon, 11 May 2026 08:32:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 07:47:00 GMT]]></title><description><![CDATA[<p dir="auto">The application layer decides how much message history to include, how many RAG chunks to include, and how concurrent requests are queued, so of course it has an indirect effect.</p>
]]></description><link>https://localaihub.com/post/1636</link><guid isPermaLink="true">https://localaihub.com/post/1636</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Mon, 11 May 2026 07:47:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is the KV cache purely a model concern that the application can ignore? on Mon, 11 May 2026 07:00:00 GMT]]></title><description><![CDATA[<p dir="auto">You don't have to implement it, but you do have to care about the consequences. The longer the context and the higher the concurrency, the greater the pressure on the KV cache.</p>
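<p dir="auto">A rough sketch of why both factors matter, assuming a standard decoder-only transformer whose cache stores one key vector and one value vector per layer, per KV head, per token. The function name and the model dimensions are hypothetical, picked only for illustration, and real frameworks add overhead (paging, fragmentation) that this ignores.</p>

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size for one sequence of `tokens` tokens."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, fp16 values.
per_token = kv_cache_bytes(1, layers=32, kv_heads=8, head_dim=128)
one_session = kv_cache_bytes(8192, layers=32, kv_heads=8, head_dim=128)
# Concurrency multiplies the total: 16 sessions at 8192 tokens each.
sixteen_sessions = 16 * one_session

print(per_token, one_session, sixteen_sessions)
```

<p dir="auto">With these made-up dimensions, each token costs 128 KiB, one 8192-token session costs about 1 GiB, and 16 such sessions cost about 16 GiB. The estimate grows linearly in both context length and concurrency.</p>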
]]></description><link>https://localaihub.com/post/1635</link><guid isPermaLink="true">https://localaihub.com/post/1635</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Mon, 11 May 2026 07:00:00 GMT</pubDate></item></channel></rss>