<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Speculative Decoding Sounds Great, but Should the Application Layer Care?]]></title><description><![CDATA[<p dir="auto">Speculative decoding comes up a lot in discussions of inference optimization. Should people working at the application layer care about it, or just leave it to backends like vLLM/SGLang?</p>
]]></description><link>https://localaihub.com/topic/148/speculative-decoding-听起来很美-应用层要关心吗</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 19:16:28 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/148.rss" rel="self" type="application/rss+xml"/><pubDate>Sun, 10 May 2026 10:08:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Mon, 11 May 2026 07:14:00 GMT]]></title><description><![CDATA[<p dir="auto">If the boss asks, just say: we take the blood pressure first, then decide which medicine to take.</p>
]]></description><link>https://localaihub.com/post/1506</link><guid isPermaLink="true">https://localaihub.com/post/1506</guid><dc:creator><![CDATA[半截薯条]]></dc:creator><pubDate>Mon, 11 May 2026 07:14:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Mon, 11 May 2026 05:01:00 GMT]]></title><description><![CDATA[<p dir="auto">Got it. We can look into the technical approach, but hold off on writing it up as a selling point.</p>
]]></description><link>https://localaihub.com/post/1505</link><guid isPermaLink="true">https://localaihub.com/post/1505</guid><dc:creator><![CDATA[小曹]]></dc:creator><pubDate>Mon, 11 May 2026 05:01:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Mon, 11 May 2026 04:48:00 GMT]]></title><description><![CDATA[<p dir="auto">Inference optimization matters, but product commitments should be stated in terms of metrics users actually perceive.</p>
]]></description><link>https://localaihub.com/post/1504</link><guid isPermaLink="true">https://localaihub.com/post/1504</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Mon, 11 May 2026 04:48:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Mon, 11 May 2026 01:52:00 GMT]]></title><description><![CDATA[<p dir="auto">Then hold off on advanced optimizations. Break the latency down first.</p>
]]></description><link>https://localaihub.com/post/1503</link><guid isPermaLink="true">https://localaihub.com/post/1503</guid><dc:creator><![CDATA[小吴]]></dc:creator><pubDate>Mon, 11 May 2026 01:52:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Mon, 11 May 2026 00:13:00 GMT]]></title><description><![CDATA[<p dir="auto">Right now we only have total latency.</p>
]]></description><link>https://localaihub.com/post/1502</link><guid isPermaLink="true">https://localaihub.com/post/1502</guid><dc:creator><![CDATA[小曹]]></dc:creator><pubDate>Mon, 11 May 2026 00:13:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 21:43:00 GMT]]></title><description><![CDATA[<p dir="auto">The application layer should log per-stage latency: retrieval, reranking, model queueing, first token, generation, and tool calls. Otherwise you won't know what to optimize.</p>
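<p dir="auto">A minimal sketch of what per-stage logging could look like, assuming a Python application. <code>StageTimer</code> and the stage names are hypothetical, not part of any backend's API. Queueing and first-token time usually have to be measured around the streaming call itself (timestamp the request, then timestamp the first chunk), which this sketch leaves out.</p>

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage of one request."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be tested deterministically.
        self._clock = clock
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the stage even if it raised, and sum repeated stages.
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Name of the stage with the largest accumulated time."""
        return max(self.durations, key=self.durations.get)


timer = StageTimer()
with timer.stage("retrieval"):
    pass  # call the retriever here
with timer.stage("rerank"):
    pass  # call the reranker here
with timer.stage("generation"):
    pass  # consume the model's streamed output here
print(timer.durations)
```

<p dir="auto">Logging <code>timer.durations</code> with each request ID is enough to tell whether generation is the slow part before touching any backend settings.</p>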
]]></description><link>https://localaihub.com/post/1501</link><guid isPermaLink="true">https://localaihub.com/post/1501</guid><dc:creator><![CDATA[Grace]]></dc:creator><pubDate>Sun, 10 May 2026 21:43:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 20:40:00 GMT]]></title><description><![CDATA[<p dir="auto">I got burned by first-token latency before. Overall TPS was high, but users still felt it was slow because the wait up front was too long.</p>
]]></description><link>https://localaihub.com/post/1500</link><guid isPermaLink="true">https://localaihub.com/post/1500</guid><dc:creator><![CDATA[小蓝]]></dc:creator><pubDate>Sun, 10 May 2026 20:40:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 18:52:00 GMT]]></title><description><![CDATA[<p dir="auto">Not necessarily. Many optimizations mainly affect the generation of subsequent tokens. First-token latency also depends on queueing, prefill, retrieval, and the network.</p>
]]></description><link>https://localaihub.com/post/1499</link><guid isPermaLink="true">https://localaihub.com/post/1499</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Sun, 10 May 2026 18:52:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 18:40:00 GMT]]></title><description><![CDATA[<p dir="auto">Will the first token get faster?</p>
]]></description><link>https://localaihub.com/post/1498</link><guid isPermaLink="true">https://localaihub.com/post/1498</guid><dc:creator><![CDATA[普通网友A]]></dc:creator><pubDate>Sun, 10 May 2026 18:40:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 16:45:00 GMT]]></title><description><![CDATA[<p dir="auto">Also, in some scenarios the bottleneck isn't generation. If RAG retrieval, reranking, or a tool API is slow, turning on speculative decoding won't save you.</p>
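<p dir="auto">A rough back-of-the-envelope sketch of why: if only one stage gets faster, the end-to-end gain is capped by that stage's share of total time. The breakdown numbers below are made up for illustration, not measurements, and <code>end_to_end_speedup</code> is a hypothetical helper.</p>

```python
def end_to_end_speedup(stage_seconds, stage, factor):
    """Overall speedup when only `stage` becomes `factor` times faster."""
    before = sum(stage_seconds.values())
    after = before - stage_seconds[stage] + stage_seconds[stage] / factor
    return before / after


# Hypothetical per-request breakdown in seconds (illustrative only).
breakdown = {"retrieval": 1.2, "rerank": 0.8, "tool_call": 1.5, "generation": 0.5}

# Doubling generation speed shrinks 4.0 s to 3.75 s, a speedup of about 1.067x.
print(round(end_to_end_speedup(breakdown, "generation", 2.0), 3))
```

<p dir="auto">With a breakdown like this one, even a 2x faster generation stage barely moves total latency, so the slow retrieval or tool call is the better target.</p>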
]]></description><link>https://localaihub.com/post/1497</link><guid isPermaLink="true">https://localaihub.com/post/1497</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Sun, 10 May 2026 16:45:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 16:36:00 GMT]]></title><description><![CDATA[<p dir="auto">Don't. Users care about first-token latency, total latency, and stability. Keep the technical jargon in internal design docs.</p>
]]></description><link>https://localaihub.com/post/1496</link><guid isPermaLink="true">https://localaihub.com/post/1496</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Sun, 10 May 2026 16:36:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 14:03:00 GMT]]></title><description><![CDATA[<p dir="auto">So should our product pitch it as a selling point?</p>
]]></description><link>https://localaihub.com/post/1495</link><guid isPermaLink="true">https://localaihub.com/post/1495</guid><dc:creator><![CDATA[小曹]]></dc:creator><pubDate>Sun, 10 May 2026 14:03:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 12:51:00 GMT]]></title><description><![CDATA[<p dir="auto">Roughly, a small model guesses ahead and a large model verifies, with the goal of speeding up generation. The actual gain depends on the model, the task, and the backend implementation.</p>
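<p dir="auto">A toy sketch of the guess-then-verify loop, assuming greedy decoding and two stand-in "models" that are plain Python functions. This is not how vLLM or SGLang implement it: real backends verify all drafted positions in one batched forward pass of the large model, and the sampling variant uses a rejection step instead of an exact match. All names here are hypothetical.</p>

```python
def speculative_greedy(target_next, draft_next, prompt, max_new_tokens, k=3):
    """Greedy speculative decoding; output matches plain greedy decoding with the target."""
    tokens = list(prompt)
    target_passes = 0  # counts verification rounds (one large-model pass each in a real backend)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The small model drafts k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The large model checks the draft position by position.
        target_passes += 1
        accepted = []
        for guess in proposal:
            expected = target_next(tokens + accepted)
            if guess != expected:
                accepted.append(expected)  # first mismatch: keep the target's token, drop the rest
                break
            accepted.append(guess)
        else:
            # Every guess matched, so the same pass also yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], target_passes


def target_next(ctx):
    """Stand-in large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx):
    """Stand-in small model: agrees with the target except after a 4 or a 9."""
    last = ctx[-1]
    return last if last % 5 == 4 else (last + 1) % 10


output, passes = speculative_greedy(target_next, draft_next, [0], max_new_tokens=12)
print(output, passes)
```

<p dir="auto">The number of verification rounds drops only where the draft agrees with the target, which is one way to see why the gain depends on the model pair and the task.</p>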
]]></description><link>https://localaihub.com/post/1494</link><guid isPermaLink="true">https://localaihub.com/post/1494</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Sun, 10 May 2026 12:51:00 GMT</pubDate></item><item><title><![CDATA[Reply to Speculative Decoding Sounds Great, but Should the Application Layer Care? on Sun, 10 May 2026 12:26:00 GMT]]></title><description><![CDATA[<p dir="auto">The application layer doesn't need to implement it, but you should know what problems it's suited to. Don't assume that turning it on makes every request faster.</p>
]]></description><link>https://localaihub.com/post/1493</link><guid isPermaLink="true">https://localaihub.com/post/1493</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Sun, 10 May 2026 12:26:00 GMT</pubDate></item></channel></rss>