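<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[What exactly does "coding ability" measure in model evaluation?]]></title><description><![CDATA[<p dir="auto">I see a lot of models claiming strong coding ability. When you evaluate coding ability for production use, what should you test besides algorithm problems?</p>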
]]></description><link>https://localaihub.com/topic/96/模型评测里-代码能力-到底测什么</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 17:50:10 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/96.rss" rel="self" type="application/rss+xml"/><pubDate>Wed, 06 May 2026 02:59:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Thu, 07 May 2026 07:41:00 GMT]]></title><description><![CDATA[<p dir="auto">A model picked this way actually looks like it could join a team, not one that can only grind practice problems.</p>
]]></description><link>https://localaihub.com/post/723</link><guid isPermaLink="true">https://localaihub.com/post/723</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Thu, 07 May 2026 07:41:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Thu, 07 May 2026 07:09:00 GMT]]></title><description><![CDATA[<p dir="auto">I'd replay historical issues and score on these metrics: tests passing, diff size, style consistency, and whether anything sensitive leaks.</p>
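<p dir="auto">A rough sketch of how those four metrics might roll up into one score per replayed issue. The field names, the 50-line diff budget, and the penalty weights are all made up for illustration; tune them against your own repo.</p>

```python
from dataclasses import dataclass


@dataclass
class ReplayResult:
    """Outcome of replaying one historical issue (hypothetical schema)."""
    tests_passed: bool
    diff_lines: int        # lines added plus lines removed in the patch
    style_violations: int  # e.g. linter findings on the changed lines
    leaked_secret: bool    # patch or logs exposed something sensitive


def score(result, max_diff_lines=50):
    """Return 0.0 for a failed or leaking patch, otherwise 1.0 minus penalties."""
    # Failing tests or leaking a secret disqualifies the patch outright.
    if not result.tests_passed or result.leaked_secret:
        return 0.0
    # Only the part of the diff beyond the budget is penalized.
    oversize = max(0, result.diff_lines - max_diff_lines)
    penalty = 0.01 * oversize + 0.05 * result.style_violations
    return max(0.0, 1.0 - penalty)
```

<p dir="auto">Treating a leak as a hard zero rather than a penalty is a design choice: no amount of passing tests should offset it.</p>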
]]></description><link>https://localaihub.com/post/722</link><guid isPermaLink="true">https://localaihub.com/post/722</guid><dc:creator><![CDATA[oneMoreTry]]></dc:creator><pubDate>Thu, 07 May 2026 07:09:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Thu, 07 May 2026 04:48:00 GMT]]></title><description><![CDATA[<p dir="auto">Test security too. When it comes across <code>.env</code> files, secrets, or user data, does it print them carelessly?</p>
]]></description><link>https://localaihub.com/post/721</link><guid isPermaLink="true">https://localaihub.com/post/721</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Thu, 07 May 2026 04:48:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Thu, 07 May 2026 02:41:00 GMT]]></title><description><![CDATA[<p dir="auto">Record human rework time. If one model passes 70% of tasks on the first try and another passes 80% but changes a pile of code every time, the second one isn't necessarily better.</p>
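<p dir="auto">To make that concrete, here is a back-of-the-envelope calculation. Only the 70% and 80% pass rates come from the point above; the review and rework minutes are hypothetical, and the cost model (every patch gets reviewed, failed ones also get reworked) is an assumption.</p>

```python
def expected_human_minutes(pass_rate, review_minutes, rework_minutes):
    """Expected human time per task: every patch is reviewed, failures are also reworked."""
    return review_minutes + (1.0 - pass_rate) * rework_minutes


# Hypothetical numbers: model B passes more often, but its large diffs take longer to review.
model_a = expected_human_minutes(pass_rate=0.70, review_minutes=5, rework_minutes=30)
model_b = expected_human_minutes(pass_rate=0.80, review_minutes=20, rework_minutes=30)

print(f"A: {model_a:.1f} min/task, B: {model_b:.1f} min/task")
```

<p dir="auto">With these made-up inputs the 70% model costs about 14 minutes of human time per task and the 80% model about 26, so the higher pass rate loses once review time is counted.</p>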
]]></description><link>https://localaihub.com/post/720</link><guid isPermaLink="true">https://localaihub.com/post/720</guid><dc:creator><![CDATA[leaf_1997]]></dc:creator><pubDate>Thu, 07 May 2026 02:41:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Thu, 07 May 2026 01:16:00 GMT]]></title><description><![CDATA[<p dir="auto">Find real fix commits in your git history, check out the state just before the fix, and have the model redo it. That's more realistic than hand-written problems.</p>
]]></description><link>https://localaihub.com/post/719</link><guid isPermaLink="true">https://localaihub.com/post/719</guid><dc:creator><![CDATA[小陈在改bug]]></dc:creator><pubDate>Thu, 07 May 2026 01:16:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 22:47:00 GMT]]></title><description><![CDATA[<p dir="auto">We don't have many historical bugs. How do we build an eval set?</p>
]]></description><link>https://localaihub.com/post/718</link><guid isPermaLink="true">https://localaihub.com/post/718</guid><dc:creator><![CDATA[阿树]]></dc:creator><pubDate>Wed, 06 May 2026 22:47:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 20:30:00 GMT]]></title><description><![CDATA[<p dir="auto">Depends on the use case. For automated code changes, tests and review matter more; in a teaching setting, a wrong explanation is a serious problem.</p>
]]></description><link>https://localaihub.com/post/717</link><guid isPermaLink="true">https://localaihub.com/post/717</guid><dc:creator><![CDATA[zeroOne]]></dc:creator><pubDate>Wed, 06 May 2026 20:30:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 17:58:00 GMT]]></title><description><![CDATA[<p dir="auto">If the model writes correct code but explains it wrong, does that count as a pass?</p>
]]></description><link>https://localaihub.com/post/716</link><guid isPermaLink="true">https://localaihub.com/post/716</guid><dc:creator><![CDATA[普通网友A]]></dc:creator><pubDate>Wed, 06 May 2026 17:58:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 15:59:00 GMT]]></title><description><![CDATA[<p dir="auto">Tool-call reliability counts as coding ability too. Can it read files correctly, write patches, and handle test output, rather than just generate code?</p>
]]></description><link>https://localaihub.com/post/715</link><guid isPermaLink="true">https://localaihub.com/post/715</guid><dc:creator><![CDATA[rootless]]></dc:creator><pubDate>Wed, 06 May 2026 15:59:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 12:58:00 GMT]]></title><description><![CDATA[<p dir="auto">Chinese-language codebases show differences too. When the comments, README, and business field names are in Chinese, some models pick up the context faster.</p>
]]></description><link>https://localaihub.com/post/714</link><guid isPermaLink="true">https://localaihub.com/post/714</guid><dc:creator><![CDATA[小高]]></dc:creator><pubDate>Wed, 06 May 2026 12:58:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 10:57:00 GMT]]></title><description><![CDATA[<p dir="auto">Reasoning models like DeepSeek are well suited to analyzing hard problems, but the final patch still has to match the codebase's style. The same goes for Qwen/Claude/GPT: you have to run real tasks.</p>
]]></description><link>https://localaihub.com/post/713</link><guid isPermaLink="true">https://localaihub.com/post/713</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Wed, 06 May 2026 10:57:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 09:03:00 GMT]]></title><description><![CDATA[<p dir="auto">Also test that it doesn't over-refactor. Models easily turn a small bug fix into an architecture upgrade.</p>
]]></description><link>https://localaihub.com/post/712</link><guid isPermaLink="true">https://localaihub.com/post/712</guid><dc:creator><![CDATA[mxm]]></dc:creator><pubDate>Wed, 06 May 2026 09:03:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 07:24:00 GMT]]></title><description><![CDATA[<p dir="auto">At least four categories: understanding the existing structure, locating the bug, making a minimal change, and running the tests and fixing based on the failures.</p>
]]></description><link>https://localaihub.com/post/711</link><guid isPermaLink="true">https://localaihub.com/post/711</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Wed, 06 May 2026 07:24:00 GMT</pubDate></item><item><title><![CDATA[Reply to What exactly does "coding ability" measure in model evaluation? on Wed, 06 May 2026 05:54:00 GMT]]></title><description><![CDATA[<p dir="auto">Test whether it can modify old code. Algorithm problems are like interviews; production tasks are like taking over a project a colleague left behind.</p>
]]></description><link>https://localaihub.com/post/710</link><guid isPermaLink="true">https://localaihub.com/post/710</guid><dc:creator><![CDATA[小陈在改bug]]></dc:creator><pubDate>Wed, 06 May 2026 05:54:00 GMT</pubDate></item></channel></rss>