<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[How much metadata should you store, and does too much slow down retrieval?]]></title><description><![CDATA[<p dir="auto">My metadata currently holds path, title, department, update time, permission group, and page number, and it feels bloated. How do you all decide what to keep?</p>
]]></description><link>https://localaihub.com/topic/54/metadata-到底放多少-放多了会不会拖慢检索</link><generator>RSS for Node</generator><lastBuildDate>Wed, 03 Jun 2026 19:29:16 GMT</lastBuildDate><atom:link href="https://localaihub.com/topic/54.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 02 May 2026 13:57:00 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sun, 03 May 2026 13:11:00 GMT]]></title><description><![CDATA[<p dir="auto">That's the right approach. Metadata is the bridge between retrieval and governance, not a trash can.</p>
]]></description><link>https://localaihub.com/post/93</link><guid isPermaLink="true">https://localaihub.com/post/93</guid><dc:creator><![CDATA[阿白]]></dc:creator><pubDate>Sun, 03 May 2026 13:11:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sun, 03 May 2026 11:57:00 GMT]]></title><description><![CDATA[<p dir="auto">Got it. I'm going to build display fields and filter fields separately and stop treating metadata as a catch-all bag.</p>
]]></description><link>https://localaihub.com/post/92</link><guid isPermaLink="true">https://localaihub.com/post/92</guid><dc:creator><![CDATA[橘子汽水]]></dc:creator><pubDate>Sun, 03 May 2026 11:57:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sun, 03 May 2026 09:28:00 GMT]]></title><description><![CDATA[<p dir="auto">Add a hash too. Without one, incremental updates later on will be painful.</p>
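<p dir="auto">A minimal sketch of how a stored hash could drive incremental updates, assuming each chunk's text is hashed with SHA-256 and the digest is kept next to its <code>chunk_id</code>. The names <code>content_hash</code> and <code>plan_updates</code> are hypothetical, not part of any particular vector store.</p>

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 hex digest of the chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_updates(stored: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Decide what to do with each chunk after re-parsing the sources.

    stored:  chunk_id -> hash already recorded in the index (assumed layout)
    current: chunk_id -> chunk text from the latest parse
    """
    plan = {"add": [], "update": [], "delete": [], "skip": []}
    for chunk_id, text in current.items():
        if chunk_id not in stored:
            plan["add"].append(chunk_id)        # new chunk, embed it
        elif stored[chunk_id] != content_hash(text):
            plan["update"].append(chunk_id)     # text changed, re-embed
        else:
            plan["skip"].append(chunk_id)       # unchanged, no work needed
    # Chunks that were indexed before but no longer exist in the source.
    plan["delete"] = [cid for cid in stored if cid not in current]
    return plan
```

<p dir="auto">Only the chunks under <code>add</code> and <code>update</code> need to be re-embedded; without the stored hash, every run would have to re-embed everything or guess from timestamps.</p>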
]]></description><link>https://localaihub.com/post/91</link><guid isPermaLink="true">https://localaihub.com/post/91</guid><dc:creator><![CDATA[没起名字]]></dc:creator><pubDate>Sun, 03 May 2026 09:28:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sun, 03 May 2026 07:31:00 GMT]]></title><description><![CDATA[<p dir="auto">I'd suggest a minimal set: doc_id, chunk_id, title_path, page, section, mtime, acl, source_url. Hold off on everything else for now.</p>
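<p dir="auto">As a rough sketch, the minimal set could be pinned down as a typed record so that extra fields cannot creep in unnoticed. The field names come from the list above; the types, the <code>ChunkMeta</code> class, and the <code>to_payload</code> helper are assumptions for illustration.</p>

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ChunkMeta:
    doc_id: str
    chunk_id: str
    title_path: tuple[str, ...]  # heading chain, outermost first (assumed type)
    page: int
    section: str
    mtime: float                 # last-modified time as a Unix timestamp (assumed)
    acl: tuple[str, ...]         # permission groups allowed to read (assumed type)
    source_url: str


def to_payload(meta: ChunkMeta) -> dict:
    """Convert the record to a plain dict with JSON-friendly lists."""
    payload = asdict(meta)
    payload["title_path"] = list(meta.title_path)
    payload["acl"] = list(meta.acl)
    return payload
```

<p dir="auto">Because the dataclass has a fixed set of fields, passing an unknown keyword such as <code>summary=...</code> raises a <code>TypeError</code>, which forces a deliberate decision before the schema grows.</p>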
]]></description><link>https://localaihub.com/post/90</link><guid isPermaLink="true">https://localaihub.com/post/90</guid><dc:creator><![CDATA[米饭]]></dc:creator><pubDate>Sun, 03 May 2026 07:31:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sun, 03 May 2026 05:28:00 GMT]]></title><description><![CDATA[<p dir="auto">It can help, but don't make it the only source. If it extracts something wrong once, the whole downstream pipeline will trust it.</p>
]]></description><link>https://localaihub.com/post/89</link><guid isPermaLink="true">https://localaihub.com/post/89</guid><dc:creator><![CDATA[林小北]]></dc:creator><pubDate>Sun, 03 May 2026 05:28:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sun, 03 May 2026 02:56:00 GMT]]></title><description><![CDATA[<p dir="auto">Is it reliable to have a metadata extractor generate summaries automatically?</p>
]]></description><link>https://localaihub.com/post/88</link><guid isPermaLink="true">https://localaihub.com/post/88</guid><dc:creator><![CDATA[小周]]></dc:creator><pubDate>Sun, 03 May 2026 02:56:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sun, 03 May 2026 00:11:00 GMT]]></title><description><![CDATA[<p dir="auto">You can pass title as a separate field to the reranker or the prompt. In the embedding text, include only the current heading chain and the body, not the full path.</p>
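<p dir="auto">One way this could look, as a sketch: the embedding text gets only the nearest headings plus the body, while the document title travels as its own field. Treating "current heading chain" as the last <code>depth</code> entries of <code>title_path</code> is an assumption, and <code>build_embedding_text</code> and <code>build_record</code> are hypothetical helper names.</p>

```python
def build_embedding_text(title_path: list[str], body: str, depth: int = 2) -> str:
    """Join the last `depth` headings with the body; drop the rest of the path."""
    nearest = title_path[-depth:] if depth > 0 else []
    headings = [h.strip() for h in nearest if h.strip()]
    return "\n".join(headings + [body.strip()])


def build_record(doc_title: str, title_path: list[str], body: str) -> dict:
    """Keep the text that gets embedded apart from the fields used later."""
    return {
        # Only this string is sent to the embedding model.
        "embedding_text": build_embedding_text(title_path, body),
        # Kept as separate fields for the reranker or the prompt.
        "title": doc_title,
        "title_path": title_path,
    }
```

<p dir="auto">With a three-level path, the top-level document title stays out of the embedded string, so it cannot dominate the similarity of every chunk in that document.</p>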
]]></description><link>https://localaihub.com/post/87</link><guid isPermaLink="true">https://localaihub.com/post/87</guid><dc:creator><![CDATA[小路灯]]></dc:creator><pubDate>Sun, 03 May 2026 00:11:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sat, 02 May 2026 22:48:00 GMT]]></title><description><![CDATA[<p dir="auto">I've had recall drift off-topic because too much title text was concatenated in. The document was titled “2024 New Edition Employee Handbook”, and many of its chunks got pulled off course by “Employee Handbook”.</p>
]]></description><link>https://localaihub.com/post/86</link><guid isPermaLink="true">https://localaihub.com/post/86</guid><dc:creator><![CDATA[半糖]]></dc:creator><pubDate>Sat, 02 May 2026 22:48:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sat, 02 May 2026 21:50:00 GMT]]></title><description><![CDATA[<p dir="auto">You can concatenate the title in, but not the permission group. When a user asks about “finance policy”, the title is a semantic cue; permissions are access control, not semantics.</p>
]]></description><link>https://localaihub.com/post/85</link><guid isPermaLink="true">https://localaihub.com/post/85</guid><dc:creator><![CDATA[nora]]></dc:creator><pubDate>Sat, 02 May 2026 21:50:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sat, 02 May 2026 21:34:00 GMT]]></title><description><![CDATA[<p dir="auto">Should metadata go into the embedding text? For example, embedding the title together with the body.</p>
]]></description><link>https://localaihub.com/post/84</link><guid isPermaLink="true">https://localaihub.com/post/84</guid><dc:creator><![CDATA[赵赵]]></dc:creator><pubDate>Sat, 02 May 2026 21:34:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sat, 02 May 2026 21:17:00 GMT]]></title><description><![CDATA[<p dir="auto">The permission group is a must, but don't keep it only in metadata. Filtering in the vector store is just one link in the application layer; downloading the original file has to be checked too.</p>
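<p dir="auto">A small sketch of checking permissions in two places, assuming permission groups are plain strings and that an authoritative document store (here the hypothetical <code>documents</code> dict) holds the ACL independently of the vector store's metadata. All function names are illustrative.</p>

```python
def can_read(user_groups: set[str], acl: list[str]) -> bool:
    """A user may read if they belong to at least one allowed group."""
    return bool(user_groups & set(acl))


def filter_hits(hits: list[dict], user_groups: set[str]) -> list[dict]:
    """First check: drop retrieved chunks the user may not see."""
    return [h for h in hits if can_read(user_groups, h["acl"])]


def download_original(doc_id: str, user_groups: set[str], documents: dict[str, dict]) -> str:
    """Second check: re-verify against the authoritative store before serving.

    `documents` maps doc_id -> {"acl": [...], "source_url": "..."} (assumed layout).
    """
    doc = documents.get(doc_id)
    if doc is None or not can_read(user_groups, doc["acl"]):
        raise PermissionError(f"access denied for document {doc_id}")
    return doc["source_url"]
```

<p dir="auto">The download path never trusts the ACL copied into chunk metadata, so a stale or missing metadata field cannot expose the original file.</p>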
]]></description><link>https://localaihub.com/post/83</link><guid isPermaLink="true">https://localaihub.com/post/83</guid><dc:creator><![CDATA[leaf_1997]]></dc:creator><pubDate>Sat, 02 May 2026 21:17:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sat, 02 May 2026 18:47:00 GMT]]></title><description><![CDATA[<p dir="auto">I think the update time is needed too. The biggest fear with a local knowledge base is an old file beating a newer one.</p>
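<p dir="auto">A sketch of using the update time after retrieval, assuming that versions of the same document share a <code>doc_id</code> and that each hit carries a numeric <code>mtime</code>. The helper name <code>drop_stale_hits</code> is hypothetical.</p>

```python
def drop_stale_hits(hits: list[dict]) -> list[dict]:
    """Keep only hits from the newest version of each document.

    Each hit is assumed to carry "doc_id" and "mtime" (Unix timestamp).
    The original ranking order of the surviving hits is preserved.
    """
    newest: dict[str, float] = {}
    for hit in hits:
        doc_id = hit["doc_id"]
        newest[doc_id] = max(newest.get(doc_id, hit["mtime"]), hit["mtime"])
    return [h for h in hits if h["mtime"] == newest[h["doc_id"]]]
```

<p dir="auto">This only helps when old and new versions can be linked by a shared key; if they are indexed under unrelated IDs, the timestamp alone cannot tell which file supersedes which.</p>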
]]></description><link>https://localaihub.com/post/82</link><guid isPermaLink="true">https://localaihub.com/post/82</guid><dc:creator><![CDATA[小陈在改bug]]></dc:creator><pubDate>Sat, 02 May 2026 18:47:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sat, 02 May 2026 18:18:00 GMT]]></title><description><![CDATA[<p dir="auto">path and page number are a must; otherwise, when a user asks for the source, it's hard to get back to the original text.</p>
]]></description><link>https://localaihub.com/post/81</link><guid isPermaLink="true">https://localaihub.com/post/81</guid><dc:creator><![CDATA[MingK]]></dc:creator><pubDate>Sat, 02 May 2026 18:18:00 GMT</pubDate></item><item><title><![CDATA[Reply to How much metadata should you store, and does too much slow down retrieval? on Sat, 02 May 2026 15:54:00 GMT]]></title><description><![CDATA[<p dir="auto">First split the fields into two categories: those needed for retrieval filtering and those needed for display and citation. Don't stuff in everything that “might be useful later”.</p>
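<p dir="auto">The split could be sketched like this. Which field lands in which category is an assumption based on the fields mentioned in the opening post, and <code>split_metadata</code> is a hypothetical helper.</p>

```python
# Assumed classification of the fields from the opening post.
FILTER_FIELDS = {"department", "mtime", "acl"}   # used to narrow retrieval
DISPLAY_FIELDS = {"path", "title", "page"}       # used to show and cite sources


def split_metadata(raw: dict) -> tuple[dict, dict, list[str]]:
    """Return (filter fields, display fields, names of unclassified fields)."""
    filters = {k: v for k, v in raw.items() if k in FILTER_FIELDS}
    display = {k: v for k, v in raw.items() if k in DISPLAY_FIELDS}
    known = FILTER_FIELDS | DISPLAY_FIELDS
    unclassified = sorted(k for k in raw if k not in known)
    return filters, display, unclassified
```

<p dir="auto">Anything that ends up in the third return value has no stated purpose yet, which makes it a candidate for removal rather than for storage "just in case".</p>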
]]></description><link>https://localaihub.com/post/80</link><guid isPermaLink="true">https://localaihub.com/post/80</guid><dc:creator><![CDATA[阿白]]></dc:creator><pubDate>Sat, 02 May 2026 15:54:00 GMT</pubDate></item></channel></rss>