<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Don't evaluate models on accuracy alone]]></title><description><![CDATA[<p dir="auto">When we evaluate models internally, everyone keeps asking "what's the accuracy?" But for many generation tasks, accuracy is hard to compute.</p>
]]></description><link>https://localaihub.com/topic/196/模型评测不要只看准确率</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 17:51:03 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/196.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 14 May 2026 08:57:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Fri, 15 May 2026 08:21:00 GMT]]></title><description><![CDATA[<p dir="auto">We started by splitting the evaluation sheet from a single score into multiple dimensions.</p>
]]></description><link>https://localaihub.com/post/2174</link><guid isPermaLink="true">https://localaihub.com/post/2174</guid><dc:creator><![CDATA[小蓝]]></dc:creator><pubDate>Fri, 15 May 2026 08:21:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Fri, 15 May 2026 08:03:00 GMT]]></title><description><![CDATA[<p dir="auto">Use them as a reference, but don't let them replace local evaluation. Public benchmarks test public tasks; what you ship runs into your own pitfalls.</p>
]]></description><link>https://localaihub.com/post/2173</link><guid isPermaLink="true">https://localaihub.com/post/2173</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Fri, 15 May 2026 08:03:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Fri, 15 May 2026 06:41:00 GMT]]></title><description><![CDATA[<p dir="auto">Should we use benchmarks as a reference at all?</p>
]]></description><link>https://localaihub.com/post/2172</link><guid isPermaLink="true">https://localaihub.com/post/2172</guid><dc:creator><![CDATA[小潘同学]]></dc:creator><pubDate>Fri, 15 May 2026 06:41:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Fri, 15 May 2026 05:04:00 GMT]]></title><description><![CDATA[<p dir="auto">An evaluation report should also include failing examples. With only scores and no examples, nobody knows what to fix.</p>
]]></description><link>https://localaihub.com/post/2171</link><guid isPermaLink="true">https://localaihub.com/post/2171</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Fri, 15 May 2026 05:04:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Fri, 15 May 2026 04:50:00 GMT]]></title><description><![CDATA[<p dir="auto">You can use a model as the judge plus human spot checks, but the judging criteria have to be fixed. The judge can't prefer short answers today and detailed ones tomorrow.</p>
]]></description><link>https://localaihub.com/post/2170</link><guid isPermaLink="true">https://localaihub.com/post/2170</guid><dc:creator><![CDATA[Grace]]></dc:creator><pubDate>Fri, 15 May 2026 04:50:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Fri, 15 May 2026 02:32:00 GMT]]></title><description><![CDATA[<p dir="auto">Human scoring costs too much.</p>
]]></description><link>https://localaihub.com/post/2169</link><guid isPermaLink="true">https://localaihub.com/post/2169</guid><dc:creator><![CDATA[小蓝]]></dc:creator><pubDate>Fri, 15 May 2026 02:32:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Fri, 15 May 2026 00:44:00 GMT]]></title><description><![CDATA[<p dir="auto">Coding ability can't be judged by the generated output alone either. You also have to check whether the model can explain test failures, respect project constraints, and avoid touching unrelated files.</p>
]]></description><link>https://localaihub.com/post/2168</link><guid isPermaLink="true">https://localaihub.com/post/2168</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Fri, 15 May 2026 00:44:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Thu, 14 May 2026 22:54:00 GMT]]></title><description><![CDATA[<p dir="auto">Then it isn't necessarily ready to ship. In enterprise settings, "not answering when it shouldn't answer" matters a lot.</p>
]]></description><link>https://localaihub.com/post/2167</link><guid isPermaLink="true">https://localaihub.com/post/2167</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Thu, 14 May 2026 22:54:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Thu, 14 May 2026 22:12:00 GMT]]></title><description><![CDATA[<p dir="auto">We have a model A with a high average score that does badly on the refusal cases.</p>
]]></description><link>https://localaihub.com/post/2166</link><guid isPermaLink="true">https://localaihub.com/post/2166</guid><dc:creator><![CDATA[小吴]]></dc:creator><pubDate>Thu, 14 May 2026 22:12:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Thu, 14 May 2026 19:53:00 GMT]]></title><description><![CDATA[<p dir="auto">So weight the scores. High-risk examples should carry more weight; otherwise the average hides incidents.</p>
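<p dir="auto">A minimal sketch of what that weighting could look like, in Python. The risk tiers, the weight values, and the sample results are all made-up assumptions for illustration, not numbers from a real evaluation:</p>

```python
# Hypothetical risk tiers and weights; tune these to your own incident costs.
RISK_WEIGHTS = {"low": 1.0, "medium": 3.0, "high": 10.0}

def plain_average(results):
    """Unweighted pass rate over (risk_tier, passed) pairs."""
    return sum(1 for _, passed in results if passed) / len(results)

def weighted_score(results, weights=RISK_WEIGHTS):
    """Pass rate where each example counts in proportion to its risk weight."""
    total = sum(weights[tier] for tier, _ in results)
    earned = sum(weights[tier] for tier, passed in results if passed)
    return earned / total

# Illustrative data: 9 low-risk passes and 1 high-risk failure.
results = [("low", True)] * 9 + [("high", False)]

print(f"plain average:  {plain_average(results):.2f}")
print(f"weighted score: {weighted_score(results):.2f}")
```

<p dir="auto">With this toy data the plain average is 0.90, while the weighted score is 9/19, about 0.47, so the single high-risk failure is no longer hidden.</p>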
]]></description><link>https://localaihub.com/post/2165</link><guid isPermaLink="true">https://localaihub.com/post/2165</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Thu, 14 May 2026 19:53:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Thu, 14 May 2026 18:15:00 GMT]]></title><description><![CDATA[<p dir="auto">There's also "harm of a bad answer". The same wrong sentence is not the same severity in casual chat as in an answer about finance policy.</p>
]]></description><link>https://localaihub.com/post/2164</link><guid isPermaLink="true">https://localaihub.com/post/2164</guid><dc:creator><![CDATA[半截薯条]]></dc:creator><pubDate>Thu, 14 May 2026 18:15:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Thu, 14 May 2026 16:18:00 GMT]]></title><description><![CDATA[<p dir="auto">It means whether the model can correct itself after the user corrects it. Many models get the first turn wrong and then keep insisting on the wrong answer.</p>
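<p dir="auto">One rough way to put a number on this, sketched in Python. The two-turn test layout and the field names <code>first_turn_correct</code> and <code>correct_after_fix</code> are hypothetical, and the data is invented:</p>

```python
def recovery_rate(dialogues):
    """Share of first-turn failures that the model fixes after a user correction.

    Each dialogue is a dict with two hypothetical boolean fields:
    "first_turn_correct" and "correct_after_fix".
    """
    failed_first = [d for d in dialogues if not d["first_turn_correct"]]
    if not failed_first:
        return None  # nothing to recover from, so the rate is undefined
    recovered = sum(1 for d in failed_first if d["correct_after_fix"])
    return recovered / len(failed_first)

# Illustrative data: 4 first-turn failures, 1 of them fixed after correction.
dialogues = [
    {"first_turn_correct": True, "correct_after_fix": True},
    {"first_turn_correct": False, "correct_after_fix": True},
    {"first_turn_correct": False, "correct_after_fix": False},
    {"first_turn_correct": False, "correct_after_fix": False},
    {"first_turn_correct": False, "correct_after_fix": False},
]

print(recovery_rate(dialogues))
```

<p dir="auto">Dialogues that were right on the first turn are left out of the denominator, so a model can't look recoverable just by rarely being wrong.</p>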
]]></description><link>https://localaihub.com/post/2163</link><guid isPermaLink="true">https://localaihub.com/post/2163</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Thu, 14 May 2026 16:18:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Thu, 14 May 2026 14:50:00 GMT]]></title><description><![CDATA[<p dir="auto">What is recoverability?</p>
]]></description><link>https://localaihub.com/post/2162</link><guid isPermaLink="true">https://localaihub.com/post/2162</guid><dc:creator><![CDATA[普通网友A]]></dc:creator><pubDate>Thu, 14 May 2026 14:50:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Thu, 14 May 2026 12:17:00 GMT]]></title><description><![CDATA[<p dir="auto">The ones I usually use: factual correctness, citation correctness, omissions, format, tone, refusal, cost, latency, recoverability.</p>
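<p dir="auto">A small Python sketch of keeping these as separate columns instead of one total. The short dimension keys, the record layout, and the sample values are assumptions for illustration; each dimension is averaged in whatever unit it is recorded in:</p>

```python
from collections import defaultdict
from statistics import mean

# Keys mirror the dimensions listed above; the short names are hypothetical.
DIMENSIONS = ["factual", "citation", "omission", "format", "tone",
              "refusal", "cost", "latency", "recoverability"]

def scorecard(records):
    """Average each dimension separately; a record maps dimension to a number."""
    by_dim = defaultdict(list)
    for record in records:
        for dim, value in record.items():
            if dim not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dim}")
            by_dim[dim].append(value)
    # Dimensions nobody measured are reported as None, not silently dropped.
    return {dim: (mean(by_dim[dim]) if by_dim[dim] else None) for dim in DIMENSIONS}

# Illustrative records: perfect on facts, failing every refusal case.
records = [
    {"factual": 1.0, "refusal": 0.0, "latency": 1.2},
    {"factual": 1.0, "refusal": 0.0, "latency": 0.8},
]

card = scorecard(records)
for dim, value in card.items():
    print(dim, value)
```

<p dir="auto">Reading the card row by row keeps a weak dimension such as refusal visible even when the others look fine.</p>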
]]></description><link>https://localaihub.com/post/2161</link><guid isPermaLink="true">https://localaihub.com/post/2161</guid><dc:creator><![CDATA[Grace]]></dc:creator><pubDate>Thu, 14 May 2026 12:17:00 GMT</pubDate></item><item><title><![CDATA[Reply to Don't evaluate models on accuracy alone on Thu, 14 May 2026 09:22:00 GMT]]></title><description><![CDATA[<p dir="auto">Right. Accuracy only fits some tasks. For Q&amp;A, summarization, code, and customer support, you need to break the metrics down.</p>
]]></description><link>https://localaihub.com/post/2160</link><guid isPermaLink="true">https://localaihub.com/post/2160</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Thu, 14 May 2026 09:22:00 GMT</pubDate></item></channel></rss>