<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[bge-m3 or bge-large-zh for Chinese knowledge base embeddings?]]></title><description><![CDATA[<p dir="auto">Our internal documents are in Chinese, with the occasional English API name. How should I choose between bge-m3 and bge-large-zh-v1.5?</p>
]]></description><link>https://localaihub.com/topic/55/中文知识库-embedding-用-bge-m3-还是-bge-large-zh</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 18:50:31 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/55.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 02 May 2026 14:58:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 16:19:00 GMT]]></title><description><![CDATA[<p dir="auto">Finally, keep your failure cases. There is no silver bullet for choosing an embedding model, and concrete examples are worth more than a leaderboard.</p>
]]></description><link>https://localaihub.com/post/108</link><guid isPermaLink="true">https://localaihub.com/post/108</guid><dc:creator><![CDATA[小林]]></dc:creator><pubDate>Sun, 03 May 2026 16:19:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 15:27:00 GMT]]></title><description><![CDATA[<p dir="auto">When comparing, hold chunking, top_k, and the reranker fixed. Change only one variable at a time.</p>
]]></description><link>https://localaihub.com/post/107</link><guid isPermaLink="true">https://localaihub.com/post/107</guid><dc:creator><![CDATA[sora_dev]]></dc:creator><pubDate>Sun, 03 May 2026 15:27:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 12:46:00 GMT]]></title><description><![CDATA[<p dir="auto">I'll compare bge-large-zh and bge-m3 on our existing documents first, rather than migrating right away.</p>
]]></description><link>https://localaihub.com/post/106</link><guid isPermaLink="true">https://localaihub.com/post/106</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Sun, 03 May 2026 12:46:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 10:52:00 GMT]]></title><description><![CDATA[<p dir="auto">One more pitfall with mixed Chinese and English: don't split an English abbreviation from its Chinese explanation. No embedding model, however good, can rescue broken context.</p>
]]></description><link>https://localaihub.com/post/105</link><guid isPermaLink="true">https://localaihub.com/post/105</guid><dc:creator><![CDATA[小路灯]]></dc:creator><pubDate>Sun, 03 May 2026 10:52:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 08:55:00 GMT]]></title><description><![CDATA[<p dir="auto">Query texts are usually short to embed; the really slow part is bulk ingestion. Split ingest and query into separate services.</p>
]]></description><link>https://localaihub.com/post/104</link><guid isPermaLink="true">https://localaihub.com/post/104</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Sun, 03 May 2026 08:55:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 06:17:00 GMT]]></title><description><![CDATA[<p dir="auto">Then you need to work out the throughput. You could also batch on CPU and run it overnight. Don't let the user query path get blocked on embedding.</p>
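<p dir="auto">A rough sketch of that throughput arithmetic, using the 80,000 chunks mentioned in this thread. The rate of 20 chunks per second is purely a placeholder, not a measurement; replace it with a number timed on your own machine and model. The function name <code>ingest_hours</code> is made up for illustration.</p>

```python
def ingest_hours(num_chunks, chunks_per_second):
    """Estimate wall-clock hours needed to embed num_chunks at a measured rate."""
    if chunks_per_second <= 0:
        raise ValueError("chunks_per_second must be positive")
    return num_chunks / chunks_per_second / 3600


# Assumption: 20 chunks/s is a placeholder rate, not a benchmark of any model.
full_rebuild = ingest_hours(80_000, 20.0)
print(f"full rebuild: about {full_rebuild:.2f} h")
```

<p dir="auto">If the estimate fits comfortably inside an overnight window, a batch job is probably enough; if not, only re-embed the chunks that changed.</p>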
]]></description><link>https://localaihub.com/post/103</link><guid isPermaLink="true">https://localaihub.com/post/103</guid><dc:creator><![CDATA[米饭]]></dc:creator><pubDate>Sun, 03 May 2026 06:17:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 04:09:00 GMT]]></title><description><![CDATA[<p dir="auto">Right now we have 80,000 chunks, updated twice a week. No GPU, only an M2.</p>
]]></description><link>https://localaihub.com/post/102</link><guid isPermaLink="true">https://localaihub.com/post/102</guid><dc:creator><![CDATA[陈一]]></dc:creator><pubDate>Sun, 03 May 2026 04:09:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 03:11:00 GMT]]></title><description><![CDATA[<p dir="auto">If you're deploying locally, measure inference latency too. Embedding isn't a one-off offline job; every update to a local knowledge base runs it again.</p>
]]></description><link>https://localaihub.com/post/101</link><guid isPermaLink="true">https://localaihub.com/post/101</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Sun, 03 May 2026 03:11:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 01:48:00 GMT]]></title><description><![CDATA[<p dir="auto">The most absurd thing I've seen: someone wrote a new model's vectors into an old collection, hit a dimension mismatch error, and then truncated the vectors as a quick fix. Never do this.</p>
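<p dir="auto">One way to make that mistake hard to commit is a guard in front of every write. This is a minimal sketch, not any vector database's real API: <code>collection_meta</code>, <code>check_write</code>, and <code>CollectionMismatch</code> are hypothetical names, and it assumes you record the model name and dimension when the collection is created. Checking the model name as well as the dimension matters because two different models can share a dimension.</p>

```python
class CollectionMismatch(ValueError):
    """Raised when a vector does not belong in the target collection."""


def check_write(collection_meta, model_name, vector):
    """Return the vector unchanged, or raise instead of silently reshaping it."""
    if model_name != collection_meta["model"]:
        raise CollectionMismatch(
            f"collection was built with {collection_meta['model']}, got {model_name}"
        )
    if len(vector) != collection_meta["dim"]:
        raise CollectionMismatch(
            f"expected dim {collection_meta['dim']}, got {len(vector)}"
        )
    return vector


# Hypothetical metadata stored next to the collection at creation time.
meta = {"model": "model-a", "dim": 4}
check_write(meta, "model-a", [0.1, 0.2, 0.3, 0.4])  # passes
```

<p dir="auto">On a mismatch the right response is a new collection and a full re-embed, not a patch to the vector.</p>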
]]></description><link>https://localaihub.com/post/100</link><guid isPermaLink="true">https://localaihub.com/post/100</guid><dc:creator><![CDATA[rootless]]></dc:creator><pubDate>Sun, 03 May 2026 01:48:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sun, 03 May 2026 00:58:00 GMT]]></title><description><![CDATA[<p dir="auto">A different vector dimension affects storage and index cost. Don't forget to rebuild the vector store, and don't mix old and new embeddings.</p>
]]></description><link>https://localaihub.com/post/99</link><guid isPermaLink="true">https://localaihub.com/post/99</guid><dc:creator><![CDATA[小唐]]></dc:creator><pubDate>Sun, 03 May 2026 00:58:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sat, 02 May 2026 22:06:00 GMT]]></title><description><![CDATA[<p dir="auto">bge-m3 also supports sparse and multi-vector retrieval, but many projects only use the dense output. Don't assume that swapping it in gives you hybrid search automatically.</p>
]]></description><link>https://localaihub.com/post/98</link><guid isPermaLink="true">https://localaihub.com/post/98</guid><dc:creator><![CDATA[no_signal]]></dc:creator><pubDate>Sat, 02 May 2026 22:06:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sat, 02 May 2026 20:15:00 GMT]]></title><description><![CDATA[<p dir="auto">We switched from text2vec to bge-m3 and recall got noticeably better, but the real gain came from re-chunking; the embedding model doesn't deserve all the credit.</p>
]]></description><link>https://localaihub.com/post/97</link><guid isPermaLink="true">https://localaihub.com/post/97</guid><dc:creator><![CDATA[小满满]]></dc:creator><pubDate>Sat, 02 May 2026 20:15:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sat, 02 May 2026 19:05:00 GMT]]></title><description><![CDATA[<p dir="auto">Don't just read the model card. Take 50 of your real questions and manually inspect the top_5 for each; the differences show up quickly.</p>
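<p dir="auto">A small sketch of how that side-by-side check could be set up, assuming you have already embedded the same questions and chunks with both models and stored them as plain dicts keyed by id. The function names and the toy 2-d vectors are illustrative only; real vectors come from whichever models you are comparing.</p>

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, doc_vecs, k=5):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


def compare(queries_a, docs_a, queries_b, docs_b, k=5):
    """For each query id, list both models' top-k and how many ids they share."""
    rows = []
    for qid in queries_a:
        a = top_k(queries_a[qid], docs_a, k)
        b = top_k(queries_b[qid], docs_b, k)
        rows.append((qid, a, b, len(set(a) & set(b))))
    return rows


# Toy stand-ins for two models' embeddings of the same three chunks and one question.
docs_a = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
docs_b = {"d1": [0.0, 1.0], "d2": [1.0, 0.0], "d3": [1.0, 1.0]}
queries_a = {"q1": [1.0, 0.1]}
queries_b = {"q1": [1.0, 0.1]}
rows = compare(queries_a, docs_a, queries_b, docs_b, k=2)
```

<p dir="auto">Queries with low overlap are the ones worth reading by hand, since that is where the two models actually disagree.</p>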
]]></description><link>https://localaihub.com/post/96</link><guid isPermaLink="true">https://localaihub.com/post/96</guid><dc:creator><![CDATA[sora_dev]]></dc:creator><pubDate>Sat, 02 May 2026 19:05:00 GMT</pubDate></item><item><title><![CDATA[Reply to bge-m3 or bge-large-zh for Chinese knowledge base embeddings? on Sat, 02 May 2026 16:00:00 GMT]]></title><description><![CDATA[<p dir="auto">If it's mostly Chinese, bge-large-zh-v1.5 is a solid choice. For mixed Chinese and English, multilingual, or somewhat longer texts, I'd try bge-m3 first.</p>
]]></description><link>https://localaihub.com/post/95</link><guid isPermaLink="true">https://localaihub.com/post/95</guid><dc:creator><![CDATA[小林]]></dc:creator><pubDate>Sat, 02 May 2026 16:00:00 GMT</pubDate></item></channel></rss>