<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Isn't finer RAG chunking always safer?]]></title><description><![CDATA[<p dir="auto">We cut our policy documents into 300-character chunks. Retrieval now returns more chunks, but the answers have actually become less grounded. Is the overlap still too small?</p>
]]></description><link>https://localaihub.com/topic/53/rag-切块不是越碎越安全吗</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 17:50:17 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/53.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 02 May 2026 11:03:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 13:54:00 GMT]]></title><description><![CDATA[<p dir="auto">This result is worth writing up. Remember to keep the "chunk hit but answer lacks evidence" samples too; they will come in handy when tuning rerank later.</p>
]]></description><link>https://localaihub.com/post/78</link><guid isPermaLink="true">https://localaihub.com/post/78</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Sun, 03 May 2026 13:54:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 11:49:00 GMT]]></title><description><![CDATA[<p dir="auto">Follow-up a day later: I raised policy chunks to 800-1200 characters and kept FAQ chunks at 300-500 characters, and wrong answers went down. Not the final setup, but the direction is right.</p>
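<p dir="auto">The per-type sizes above could be written down as a small lookup. This is only a sketch: the <code>CHUNK_RANGES</code> name, the type keys, and the <code>chunk_range</code> helper are made up for illustration, and the numbers are just the ones from this test run, not a recommendation.</p>

```python
# Hypothetical per-document-type chunk sizes, in characters: (min, max).
CHUNK_RANGES = {
    "policy": (800, 1200),
    "faq": (300, 500),
}


def chunk_range(doc_type):
    """Look up the size range for a document type; fail loudly for unknown types."""
    try:
        return CHUNK_RANGES[doc_type]
    except KeyError:
        raise ValueError(f"no chunking strategy defined for {doc_type!r}") from None
```

<p dir="auto">Raising an error for an unknown type is a deliberate choice here, so a new document type cannot silently fall back to some global default.</p>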
]]></description><link>https://localaihub.com/post/77</link><guid isPermaLink="true">https://localaihub.com/post/77</guid><dc:creator><![CDATA[小周]]></dc:creator><pubDate>Sun, 03 May 2026 11:49:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 10:39:00 GMT]]></title><description><![CDATA[<p dir="auto">We currently use one strategy for every document, so no wonder. Tomorrow I'll build a version split by document type and test it.</p>
]]></description><link>https://localaihub.com/post/76</link><guid isPermaLink="true">https://localaihub.com/post/76</guid><dc:creator><![CDATA[小周]]></dc:creator><pubDate>Sun, 03 May 2026 10:39:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 08:24:00 GMT]]></title><description><![CDATA[<p dir="auto">So chunking is not a global parameter; it is a per-document-type parameter. Policies, FAQs, table descriptions, and meeting minutes each need their own strategy.</p>
]]></description><link>https://localaihub.com/post/75</link><guid isPermaLink="true">https://localaihub.com/post/75</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Sun, 03 May 2026 08:24:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 05:54:00 GMT]]></title><description><![CDATA[<p dir="auto">Let me add a counterexample. For the short Q&amp;A entries in a customer-service knowledge base, if the chunks are too large the model blends adjacent questions into one answer.</p>
]]></description><link>https://localaihub.com/post/74</link><guid isPermaLink="true">https://localaihub.com/post/74</guid><dc:creator><![CDATA[青菜]]></dc:creator><pubDate>Sun, 03 May 2026 05:54:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 04:05:00 GMT]]></title><description><![CDATA[<p dir="auto">Tried it. It suits documents with a stable structure. The downside is that citations need careful handling; otherwise the citation shows the whole large chunk and users can't find the original sentence.</p>
]]></description><link>https://localaihub.com/post/73</link><guid isPermaLink="true">https://localaihub.com/post/73</guid><dc:creator><![CDATA[小路灯]]></dc:creator><pubDate>Sun, 03 May 2026 04:05:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 01:49:00 GMT]]></title><description><![CDATA[<p dir="auto">Has anyone tried parent-child chunks? Retrieve on the small chunks, feed the large chunk to the model.</p>
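<p dir="auto">To make the idea concrete, here is a minimal sketch of parent-child retrieval. Everything in it is an assumption for illustration: <code>build_index</code>, <code>toy_score</code>, and <code>retrieve_parents</code> are hypothetical names, and <code>toy_score</code> is a character-overlap stand-in for whatever vector similarity a real setup would use.</p>

```python
def build_index(parents, child_size=300):
    """Cut each parent chunk into fixed-size children that remember their parent id."""
    children = []
    for pid, parent in enumerate(parents):
        for start in range(0, len(parent), child_size):
            children.append((pid, parent[start:start + child_size]))
    return children


def toy_score(query, text):
    # Stand-in for vector similarity: count of distinct query characters found in text.
    return len(set(query) & set(text))


def retrieve_parents(query, parents, children, top_k=3):
    """Rank the small children, then return the de-duplicated parents they point to."""
    ranked = sorted(children, key=lambda c: toy_score(query, c[1]), reverse=True)
    seen, result = set(), []
    for pid, _ in ranked[:top_k]:
        if pid not in seen:  # several children may map to the same parent
            seen.add(pid)
            result.append(parents[pid])
    return result
```

<p dir="auto">Matching happens on the children, but only parents reach the prompt, so the model sees whole sections rather than fragments.</p>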
]]></description><link>https://localaihub.com/post/72</link><guid isPermaLink="true">https://localaihub.com/post/72</guid><dc:creator><![CDATA[小吴]]></dc:creator><pubDate>Sun, 03 May 2026 01:49:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 01:41:00 GMT]]></title><description><![CDATA[<p dir="auto">On my side I split by heading first, then use length as a fallback. If a cut would break a heading's section apart, I don't cut; I'd rather have a somewhat larger chunk.</p>
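<p dir="auto">Roughly, that could look like the sketch below. It assumes Markdown-style <code>#</code> headings and paragraphs separated by blank lines, which may not match your documents; <code>split_by_heading</code> and <code>max_chars</code> are hypothetical names, not from any library.</p>

```python
import re

# Assumption: headings look like Markdown ("# Title", "## Subtitle", ...).
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_by_heading(text, max_chars=1200):
    """Split at heading lines first; cut further only when a section is too long."""
    sections, current = [], []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section)  # the whole section fits: never cut it
            continue
        # Length fallback: pack whole paragraphs up to max_chars.
        # A single oversized paragraph stays intact (a larger chunk is preferred).
        buf = ""
        for para in section.split("\n\n"):
            candidate = f"{buf}\n\n{para}" if buf else para
            if buf and len(candidate) > max_chars:
                chunks.append(buf)
                buf = para
            else:
                buf = candidate
        if buf:
            chunks.append(buf)
    return chunks
```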
]]></description><link>https://localaihub.com/post/71</link><guid isPermaLink="true">https://localaihub.com/post/71</guid><dc:creator><![CDATA[不想写周报]]></dc:creator><pubDate>Sun, 03 May 2026 01:41:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sun, 03 May 2026 00:17:00 GMT]]></title><description><![CDATA[<p dir="auto">Start with a small test set. Don't look at the scores the vector store returns; look at whether all the evidence the answer needs comes back together.</p>
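<p dir="auto">One possible way to score such a test set is sketched below. It assumes each test case lists the chunk ids its answer needs and that <code>retrieve</code> is your own function returning ranked chunk ids; both are placeholders, as is the <code>evaluate</code> name.</p>

```python
def evaluate(test_cases, retrieve, top_k=5):
    """Return (hit_rate, misses).

    test_cases: list of (question, required_chunk_ids) pairs.
    retrieve:   hypothetical callable mapping a question to ranked chunk ids.
    A case counts as a hit only if every required chunk is in the top_k.
    """
    if not test_cases:
        return 0.0, []
    misses = []
    for question, required in test_cases:
        got = set(retrieve(question)[:top_k])
        missing = sorted(set(required) - got)
        if missing:
            misses.append((question, missing))
    hit_rate = 1 - len(misses) / len(test_cases)
    return hit_rate, misses
```

<p dir="auto">The <code>misses</code> list records which evidence failed to come back for each question, which is usually more useful than the rate itself.</p>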
]]></description><link>https://localaihub.com/post/70</link><guid isPermaLink="true">https://localaihub.com/post/70</guid><dc:creator><![CDATA[阿航]]></dc:creator><pubDate>Sun, 03 May 2026 00:17:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sat, 02 May 2026 21:34:00 GMT]]></title><description><![CDATA[<p dir="auto">Setting overlap too high has a side effect too: top_k fills up with near-duplicates of the same passage, which crowds out other evidence before rerank even runs.</p>
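<p dir="auto">A quick way to see the duplication is to count how many sliding-window chunks contain the same character. The sketch assumes plain fixed-size windows with a fixed overlap; <code>chunks_containing</code> is a hypothetical helper and the numbers are illustrative only.</p>

```python
def chunks_containing(position, doc_len, size, overlap):
    """Start offsets of all sliding-window chunks that contain the character at position."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    stride = size - overlap
    return [s for s in range(0, doc_len, stride) if s <= position < s + size]
```

<p dir="auto">With 300-character chunks, <code>chunks_containing(450, 3000, 300, 50)</code> gives one chunk, while an overlap of 200 gives three, so the same sentence could take up to three top_k slots.</p>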
]]></description><link>https://localaihub.com/post/69</link><guid isPermaLink="true">https://localaihub.com/post/69</guid><dc:creator><![CDATA[小潘同学]]></dc:creator><pubDate>Sat, 02 May 2026 21:34:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sat, 02 May 2026 19:04:00 GMT]]></title><description><![CDATA[<p dir="auto">300 characters may be too short for policy-type content. Code comments and FAQs can be short, but for policies and contracts, at least keep one complete clause within a single chunk.</p>
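]]></description><link>https://localaihub.com/post/68</link><guid isPermaLink="true">https://localaihub.com/post/68</guid><dc:creator><![CDATA[rootless]]></dc:creator><pubDate>Sat, 02 May 2026 19:04:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sat, 02 May 2026 16:47:00 GMT]]></title><description><![CDATA[<p dir="auto">I also used to think finer meant more accurate. Then I found that for content like "attendance make-up punch", the first half is the conditions and the second half is the exceptions; split them apart and the answer falls apart.</p>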
]]></description><link>https://localaihub.com/post/67</link><guid isPermaLink="true">https://localaihub.com/post/67</guid><dc:creator><![CDATA[米饭]]></dc:creator><pubDate>Sat, 02 May 2026 16:47:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sat, 02 May 2026 14:57:00 GMT]]></title><description><![CDATA[<p dir="auto">Are you carrying the heading hierarchy into the chunks? For example the level-1 heading, the level-2 heading, and the file name. Chunking on body text alone puts you at a real disadvantage.</p>
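<p dir="auto">One simple way to do this is to prefix each chunk with its path before embedding. The sketch is only an illustration: <code>with_heading_path</code>, the bracket format, and the sample file name are all made up.</p>

```python
def with_heading_path(file_name, headings, body):
    """Prefix a chunk body with 'file > level-1 heading > level-2 heading'."""
    path = " > ".join([file_name, *headings])
    return f"[{path}]\n{body}"
```

<p dir="auto">For example, <code>with_heading_path("hr-policy.md", ["Attendance", "Make-up punch"], body)</code> puts <code>[hr-policy.md &gt; Attendance &gt; Make-up punch]</code> on the first line of the chunk.</p>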
]]></description><link>https://localaihub.com/post/66</link><guid isPermaLink="true">https://localaihub.com/post/66</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Sat, 02 May 2026 14:57:00 GMT</pubDate></item><item><title><![CDATA[Reply to Isn't finer RAG chunking always safer? on Sat, 02 May 2026 13:50:00 GMT]]></title><description><![CDATA[<p dir="auto">It's not necessarily the overlap. Once chunks get too fine, the contextual links break; the model receives a pile of fragments and can't tell which sentence belongs to which section.</p>
]]></description><link>https://localaihub.com/post/65</link><guid isPermaLink="true">https://localaihub.com/post/65</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Sat, 02 May 2026 13:50:00 GMT</pubDate></item></channel></rss>