<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[A&#x2F;B testing prompts: samples so small it feels like voodoo]]></title><description><![CDATA[<p dir="auto">We want to A/B test two versions of a prompt, but our real daily traffic is low. Can a few dozen conversations show a difference?</p>
]]></description><link>https://localaihub.com/topic/107/a-b-测提示词-样本量小到像玄学</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 19:16:53 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/107.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 07 May 2026 01:04:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 23:21:00 GMT]]></title><description><![CDATA[<p dir="auto">I'll start with a blind review sheet, and I won't dress up a small sample as a big conclusion.</p>
]]></description><link>https://localaihub.com/post/890</link><guid isPermaLink="true">https://localaihub.com/post/890</guid><dc:creator><![CDATA[小潘同学]]></dc:creator><pubDate>Thu, 07 May 2026 23:21:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 23:13:00 GMT]]></title><description><![CDATA[<p dir="auto">Right. A total score is fine to have, but the launch decision should depend on whether any key dimension regressed.</p>
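<p dir="auto">A minimal sketch of that kind of gate, for illustration only. The scores, the 1-5 scale, the choice of critical dimensions, and the 0.2 tolerance are all made-up assumptions, and <code>regressions</code> is a hypothetical helper, not something from a library.</p>

```python
# Hypothetical mean reviewer scores per dimension on a 1-5 scale.
baseline = {"accuracy": 4.2, "completeness": 4.0, "conciseness": 3.1,
            "compliance": 4.8, "actionability": 3.9}
candidate = {"accuracy": 4.1, "completeness": 3.4, "conciseness": 4.3,
             "compliance": 4.8, "actionability": 4.0}


def regressions(baseline, candidate, critical, tolerance=0.2):
    """List critical dimensions where the candidate dropped by more than tolerance."""
    return [d for d in critical if baseline[d] - candidate[d] > tolerance]


# Assumed choice of which dimensions are allowed to block a launch.
critical = ["accuracy", "completeness", "compliance"]
dropped = regressions(baseline, candidate, critical)

print("total:", round(sum(candidate.values()), 1), "vs", round(sum(baseline.values()), 1))
print("blocked by:", dropped)
```

<p dir="auto">With these made-up numbers the candidate has the higher total yet is still blocked, because completeness fell by more than the tolerance.</p>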
]]></description><link>https://localaihub.com/post/889</link><guid isPermaLink="true">https://localaihub.com/post/889</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Thu, 07 May 2026 23:13:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 22:12:00 GMT]]></title><description><![CDATA[<p dir="auto">So a single total score doesn't settle everything.</p>
]]></description><link>https://localaihub.com/post/888</link><guid isPermaLink="true">https://localaihub.com/post/888</guid><dc:creator><![CDATA[小潘同学]]></dc:creator><pubDate>Thu, 07 May 2026 22:12:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 21:10:00 GMT]]></title><description><![CDATA[<p dir="auto">It depends on the scenario. Customer support probably cares more about answers being actionable and not misleading; a knowledge base cares more about accuracy and citations.</p>
]]></description><link>https://localaihub.com/post/887</link><guid isPermaLink="true">https://localaihub.com/post/887</guid><dc:creator><![CDATA[Grace]]></dc:creator><pubDate>Thu, 07 May 2026 21:10:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 18:25:00 GMT]]></title><description><![CDATA[<p dir="auto">Sometimes A is more accurate but verbose, while B is shorter but leaves out conditions. How do you choose?</p>
]]></description><link>https://localaihub.com/post/886</link><guid isPermaLink="true">https://localaihub.com/post/886</guid><dc:creator><![CDATA[小蓝]]></dc:creator><pubDate>Thu, 07 May 2026 18:25:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 15:38:00 GMT]]></title><description><![CDATA[<p dir="auto">You also need to define the evaluation dimensions: accuracy, completeness, conciseness, compliance, actionability. Don't let reviewers score on personal taste.</p>
]]></description><link>https://localaihub.com/post/885</link><guid isPermaLink="true">https://localaihub.com/post/885</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Thu, 07 May 2026 15:38:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 14:37:00 GMT]]></title><description><![CDATA[<p dir="auto">Blind review. Hide the version labels so reviewers see only the question and the two answers.</p>
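<p dir="auto">One way to set that up, as a rough sketch: shuffle the two answers per item and keep the answer key in a separate place. <code>make_blind_item</code>, the field names, and the sample strings are hypothetical, chosen only for this example.</p>

```python
import random


def make_blind_item(question, answer_a, answer_b, rng):
    """Return (row, key): the row hides which version wrote which answer."""
    pair = [("A", answer_a), ("B", answer_b)]
    rng.shuffle(pair)  # random position per item, so order does not leak the version
    first, second = pair
    row = {"question": question, "answer_1": first[1], "answer_2": second[1]}
    key = {"answer_1": first[0], "answer_2": second[0]}  # stored away from reviewers
    return row, key


rng = random.Random(0)  # fixed seed only so the sheet can be regenerated
row, key = make_blind_item(
    "How do I reset my password?", "reply from A", "reply from B", rng
)
print(row)
```

<p dir="auto">Reviewers get only <code>row</code>; <code>key</code> is joined back in after scoring is finished.</p>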
]]></description><link>https://localaihub.com/post/884</link><guid isPermaLink="true">https://localaihub.com/post/884</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Thu, 07 May 2026 14:37:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 14:20:00 GMT]]></title><description><![CDATA[<p dir="auto">How do you guard against bias in human evaluation?</p>
]]></description><link>https://localaihub.com/post/883</link><guid isPermaLink="true">https://localaihub.com/post/883</guid><dc:creator><![CDATA[普通网友A]]></dc:creator><pubDate>Thu, 07 May 2026 14:20:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 12:03:00 GMT]]></title><description><![CDATA[<p dir="auto">Stratify the samples. Look at simple, high-frequency, edge-case, complaint, and long-context samples separately.</p>
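<p dir="auto">A small sketch of what per-stratum reporting could look like. The records below are invented for illustration, and <code>win_rate_by_stratum</code> is a hypothetical helper.</p>

```python
from collections import defaultdict

# Hypothetical review results: (stratum, winning version).
results = [
    ("simple", "B"), ("simple", "B"), ("simple", "B"),
    ("edge", "A"), ("edge", "A"),
    ("complaint", "A"),
    ("long_context", "B"),
]


def win_rate_by_stratum(records, version="B"):
    """Count wins and totals for one version, separately per stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [wins, total]
    for stratum, winner in records:
        counts[stratum][1] += 1
        if winner == version:
            counts[stratum][0] += 1
    return {s: (wins, total) for s, (wins, total) in counts.items()}


by_stratum = win_rate_by_stratum(results)
for stratum, (wins, total) in by_stratum.items():
    print(f"{stratum}: B won {wins}/{total}")
```

<p dir="auto">In this invented data B wins 4 of 7 overall, but every one of its wins on the non-trivial strata is missing except long context, which a single pooled number would hide.</p>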
]]></description><link>https://localaihub.com/post/882</link><guid isPermaLink="true">https://localaihub.com/post/882</guid><dc:creator><![CDATA[小吴]]></dc:creator><pubDate>Thu, 07 May 2026 12:03:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 09:13:00 GMT]]></title><description><![CDATA[<p dir="auto">We made exactly this mistake before. After launch we found that the samples the new version had won offline were all simple questions.</p>
]]></description><link>https://localaihub.com/post/881</link><guid isPermaLink="true">https://localaihub.com/post/881</guid><dc:creator><![CDATA[半糖]]></dc:creator><pubDate>Thu, 07 May 2026 09:13:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 07:40:00 GMT]]></title><description><![CDATA[<p dir="auto">You can give one, but state clearly that confidence is low. Don't turn a 55% win rate on 20 samples into "the new version is better."</p>
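<p dir="auto">To put a number on "low confidence", here is a sketch using the Wilson score interval for a proportion. It assumes 55% of 20 means 11 wins, that there are no ties, and that samples are independent; <code>wilson_interval</code> is a helper written for this example.</p>

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a win rate of wins/n."""
    if n == 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(11, 20)  # 11 wins out of 20 = 55%
print(f"win rate 55%, 95% interval roughly {low:.2f} to {high:.2f}")
```

<p dir="auto">The interval comes out at roughly 0.34 to 0.74, which comfortably includes 50%, so the data is consistent with the new version being no better, or even worse.</p>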
]]></description><link>https://localaihub.com/post/880</link><guid isPermaLink="true">https://localaihub.com/post/880</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Thu, 07 May 2026 07:40:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 04:46:00 GMT]]></title><description><![CDATA[<p dir="auto">The business side wants a win-rate number.</p>
]]></description><link>https://localaihub.com/post/879</link><guid isPermaLink="true">https://localaihub.com/post/879</guid><dc:creator><![CDATA[小潘同学]]></dc:creator><pubDate>Thu, 07 May 2026 04:46:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 04:29:00 GMT]]></title><description><![CDATA[<p dir="auto">Run it offline on historical samples first. Online A/B is mainly for catching obvious incidents; don't expect a precise comparison from it.</p>
]]></description><link>https://localaihub.com/post/878</link><guid isPermaLink="true">https://localaihub.com/post/878</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Thu, 07 May 2026 04:29:00 GMT</pubDate></item><item><title><![CDATA[Reply to A&#x2F;B testing prompts: samples so small it feels like voodoo on Thu, 07 May 2026 02:30:00 GMT]]></title><description><![CDATA[<p dir="auto">It's hard to draw a stable conclusion from that. Low traffic is better suited to offline evaluation plus a small staged rollout; don't force a statistical-significance test.</p>
]]></description><link>https://localaihub.com/post/877</link><guid isPermaLink="true">https://localaihub.com/post/877</guid><dc:creator><![CDATA[Grace]]></dc:creator><pubDate>Thu, 07 May 2026 02:30:00 GMT</pubDate></item></channel></rss>