<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Does using a strong model as judge bias the evaluation of weaker models?]]></title><description><![CDATA[<p dir="auto">We want to use GPT/Claude as the judge to score answers from Qwen/DeepSeek. Would the judge be inherently biased toward its own style?</p>
]]></description><link>https://localaihub.com/topic/203/用强模型裁判评弱模型-会不会偏</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 19:44:28 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/203.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 14 May 2026 23:04:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 20:30:00 GMT]]></title><description><![CDATA[<p dir="auto">That's the more realistic approach. A model judge is still just a tool, not a court judge.</p>
]]></description><link>https://localaihub.com/post/2282</link><guid isPermaLink="true">https://localaihub.com/post/2282</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Fri, 15 May 2026 20:30:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 17:35:00 GMT]]></title><description><![CDATA[<p dir="auto">We run a first-pass screening with the model, then have humans spot-check the borderline and disputed samples.</p>
]]></description><link>https://localaihub.com/post/2281</link><guid isPermaLink="true">https://localaihub.com/post/2281</guid><dc:creator><![CDATA[小谢]]></dc:creator><pubDate>Fri, 15 May 2026 17:35:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 17:03:00 GMT]]></title><description><![CDATA[<p dir="auto">The evaluation report should label which scores come from the model judge and which come from human reviewers.</p>
]]></description><link>https://localaihub.com/post/2280</link><guid isPermaLink="true">https://localaihub.com/post/2280</guid><dc:creator><![CDATA[Grace]]></dc:creator><pubDate>Fri, 15 May 2026 17:03:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 14:13:00 GMT]]></title><description><![CDATA[<p dir="auto">It can help flag risks, but the final word still comes from tests and review. A model judge can't replace CI.</p>
]]></description><link>https://localaihub.com/post/2279</link><guid isPermaLink="true">https://localaihub.com/post/2279</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Fri, 15 May 2026 14:13:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 13:53:00 GMT]]></title><description><![CDATA[<p dir="auto">For code evaluation, can we have the judge look at the diff?</p>
]]></description><link>https://localaihub.com/post/2278</link><guid isPermaLink="true">https://localaihub.com/post/2278</guid><dc:creator><![CDATA[小吴]]></dc:creator><pubDate>Fri, 15 May 2026 13:53:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 12:56:00 GMT]]></title><description><![CDATA[<p dir="auto">Don't let the model rule on compliance or business definitions by itself. It doesn't know your company's actual rules.</p>
]]></description><link>https://localaihub.com/post/2277</link><guid isPermaLink="true">https://localaihub.com/post/2277</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Fri, 15 May 2026 12:56:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 12:09:00 GMT]]></title><description><![CDATA[<p dir="auto">It helps, but it costs more and still doesn't guarantee correctness. Ideally you also keep a human-labeled calibration set.</p>
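<p dir="auto">A rough sketch of how voting and a human calibration set could fit together, assuming each judge returns a "pass"/"fail" verdict per sample. The function names, the sample ids, and the verdicts below are all made up for illustration, and ties are simply escalated to a human rather than resolved automatically.</p>

```python
from collections import Counter


def majority_vote(verdicts):
    """Return the most common verdict, or None when there is no clear winner."""
    counts = Counter(verdicts).most_common()
    if not counts:
        return None  # no judge answered
    if len(counts) >= 2 and counts[0][1] == counts[1][1]:
        return None  # tie: send this sample to a human
    return counts[0][0]


def agreement_rate(voted_labels, human_labels):
    """Fraction of calibration samples where the vote matches the human label."""
    if not human_labels:
        raise ValueError("calibration set is empty")
    matches = sum(
        1 for sample_id, label in human_labels.items()
        if voted_labels.get(sample_id) == label
    )
    return matches / len(human_labels)


# Hypothetical verdicts from several judges, keyed by sample id.
judge_verdicts = {
    "q1": ["pass", "pass", "fail"],
    "q2": ["fail", "fail", "fail"],
    "q3": ["pass", "fail"],
}
# Hypothetical human labels for the same samples (the calibration set).
human_labels = {"q1": "pass", "q2": "pass", "q3": "fail"}

voted = {sample_id: majority_vote(v) for sample_id, v in judge_verdicts.items()}
rate = agreement_rate(voted, human_labels)
print(voted, rate)
```

<p dir="auto">Note that q2 is a unanimous vote that still disagrees with the human label, which is the "doesn't guarantee correctness" case: agreement between judges says nothing about agreement with people.</p>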
]]></description><link>https://localaihub.com/post/2276</link><guid isPermaLink="true">https://localaihub.com/post/2276</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Fri, 15 May 2026 12:09:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 11:35:00 GMT]]></title><description><![CDATA[<p dir="auto">What about having multiple judges vote?</p>
]]></description><link>https://localaihub.com/post/2275</link><guid isPermaLink="true">https://localaihub.com/post/2275</guid><dc:creator><![CDATA[普通网友A]]></dc:creator><pubDate>Fri, 15 May 2026 11:35:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 09:34:00 GMT]]></title><description><![CDATA[<p dir="auto">We've seen the judge favor verbose answers, while short but correct ones got lower scores.</p>
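<p dir="auto">One way to check for this kind of length preference, sketched under some assumptions: you have answers that humans already marked correct, plus the judge's numeric score for each. The record fields, the character threshold, and the scores below are invented for illustration, and this is only a coarse signal, not a proper statistical test.</p>

```python
from statistics import mean


def length_bias_gap(records, length_threshold):
    """Mean judge score of long correct answers minus that of short correct ones.

    Only answers marked correct by humans are compared, so a positive gap
    hints that the judge rewards length rather than correctness.
    Returns None when either group is empty.
    """
    long_scores, short_scores = [], []
    for record in records:
        if not record["correct"]:
            continue
        if len(record["answer"]) >= length_threshold:
            long_scores.append(record["judge_score"])
        else:
            short_scores.append(record["judge_score"])
    if not long_scores or not short_scores:
        return None
    return mean(long_scores) - mean(short_scores)


# Hypothetical records: "correct" is the human label, "judge_score" is on a 1-10 scale.
records = [
    {"answer": "4", "correct": True, "judge_score": 6},
    {"answer": "The answer is 4, because 2 + 2 = 4 by basic arithmetic.",
     "correct": True, "judge_score": 9},
    {"answer": "5", "correct": False, "judge_score": 2},
    {"answer": "Yes", "correct": True, "judge_score": 7},
]

gap = length_bias_gap(records, length_threshold=20)
print(gap)
```

<p dir="auto">If the gap stays clearly positive across a decent number of samples, it may be worth adding a conciseness dimension to the judge prompt or re-checking short answers by hand.</p>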
]]></description><link>https://localaihub.com/post/2274</link><guid isPermaLink="true">https://localaihub.com/post/2274</guid><dc:creator><![CDATA[半截薯条]]></dc:creator><pubDate>Fri, 15 May 2026 09:34:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 08:20:00 GMT]]></title><description><![CDATA[<p dir="auto">Keep the judge prompt fixed and make the scoring dimensions explicit. Don't let it score on "overall quality".</p>
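<p dir="auto">For example, a fixed prompt with named dimensions could look something like the sketch below. The three dimensions, the 1 to 5 scale, and the JSON reply format are assumptions made for illustration, not a recommended rubric, and the call to the judge model itself is left out.</p>

```python
import json

# Hypothetical rubric: dimension names and wording are illustrative only.
RUBRIC = {
    "correctness": "Is the answer factually right for the question?",
    "completeness": "Does it cover what the question asked for?",
    "conciseness": "Is it free of padding and repetition?",
}

# The template is a constant so every sample is judged with the same wording.
JUDGE_PROMPT = (
    "Score the answer on each dimension from 1 to 5.\n"
    "{dimensions}\n"
    "Reply with a JSON object mapping each dimension name to an integer.\n\n"
    "Question: {question}\nAnswer: {answer}\n"
)


def build_prompt(question, answer):
    """Fill the fixed template with the rubric and one question/answer pair."""
    dimensions = "\n".join(f"- {name}: {desc}" for name, desc in RUBRIC.items())
    return JUDGE_PROMPT.format(dimensions=dimensions, question=question, answer=answer)


def parse_scores(raw_reply):
    """Parse the judge's JSON reply and reject anything outside the rubric."""
    scores = json.loads(raw_reply)
    if not isinstance(scores, dict) or set(scores) != set(RUBRIC):
        raise ValueError("reply must contain exactly the rubric dimensions")
    for name, value in scores.items():
        if type(value) is not int or value not in range(1, 6):
            raise ValueError(f"{name} must be an integer from 1 to 5")
    return scores


prompt = build_prompt("What is 2 + 2?", "4")
scores = parse_scores('{"correctness": 5, "completeness": 4, "conciseness": 5}')
print(prompt)
print(scores)
```

<p dir="auto">Rejecting replies that add or drop a dimension keeps the judge from quietly falling back to a single holistic score.</p>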
]]></description><link>https://localaihub.com/post/2273</link><guid isPermaLink="true">https://localaihub.com/post/2273</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Fri, 15 May 2026 08:20:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 07:02:00 GMT]]></title><description><![CDATA[<p dir="auto">Yes, but with controls. Questions with a clear reference answer can be partly auto-graded; open-ended questions need human spot checks.</p>
]]></description><link>https://localaihub.com/post/2272</link><guid isPermaLink="true">https://localaihub.com/post/2272</guid><dc:creator><![CDATA[melo]]></dc:creator><pubDate>Fri, 15 May 2026 07:02:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 04:56:00 GMT]]></title><description><![CDATA[<p dir="auto">So should we still use it?</p>
]]></description><link>https://localaihub.com/post/2271</link><guid isPermaLink="true">https://localaihub.com/post/2271</guid><dc:creator><![CDATA[小谢]]></dc:creator><pubDate>Fri, 15 May 2026 04:56:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 03:18:00 GMT]]></title><description><![CDATA[<p dir="auto">Especially on style questions. The judge may prefer longer answers that read like an English-language paper, which isn't necessarily what Chinese-speaking users want.</p>
]]></description><link>https://localaihub.com/post/2270</link><guid isPermaLink="true">https://localaihub.com/post/2270</guid><dc:creator><![CDATA[Grace]]></dc:creator><pubDate>Fri, 15 May 2026 03:18:00 GMT</pubDate></item><item><title><![CDATA[Reply to Does using a strong model as judge bias the evaluation of weaker models? on Fri, 15 May 2026 02:11:00 GMT]]></title><description><![CDATA[<p dir="auto">That risk is real. LLM-as-judge saves human effort, but it can't be treated as absolute truth.</p>
]]></description><link>https://localaihub.com/post/2269</link><guid isPermaLink="true">https://localaihub.com/post/2269</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Fri, 15 May 2026 02:11:00 GMT</pubDate></item></channel></rss>