From 1a36d8119f4f52dcf88affc81962d15cca07b9ea Mon Sep 17 00:00:00 2001
From: Kaxi <1042864399@qq.com>
Date: Sat, 16 May 2026 18:37:56 +0800
Subject: [PATCH] fix: add size-based reading strategy and ban web-search
 fallback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

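sites/tacit0924/v1/urls.md: add a "读取警告" (read warning) section noting that the document is ~853K characters
  - forbid get_content
  - doc.resolve_document_structure is the only workable method
  - document the timeout handling (retry once, then stop)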

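sources/tencent-doc/v1/usage.md: rewrite around a tiered reading strategy keyed on document size
  - add a strategy table at the top (<10K / 10K-500K / >500K characters / unknown size)
  - flag very large documents separately: get_content is forbidden for them
  - add timeout-handling steps (retry once, never fall back to web search)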
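
As a rough illustration of the size tiers in that table, the selection could be sketched as below. The function and constant names are hypothetical and are not part of this patch; only the thresholds and tool names come from usage.md.

```python
from typing import Optional

# Thresholds taken from the strategy table in usage.md (counts are characters).
SMALL_LIMIT = 10_000    # below this, get_content reads the full text in one call
HUGE_LIMIT = 500_000    # above this, the backend may time out


def pick_read_method(char_count: Optional[int]) -> str:
    """Return the tool to call first for a document of the given size.

    Hypothetical helper: the patch documents the rule, it does not ship code.
    """
    if char_count is None:
        # Unknown size: probe the structure first instead of guessing.
        return "doc.get_outline"
    if char_count < SMALL_LIMIT:
        return "get_content"
    # Both the 10K-500K tier and the >500K tier use the same tool.
    return "doc.resolve_document_structure"


def may_time_out(char_count: Optional[int]) -> bool:
    """True for the >500K tier, where a timeout should be expected."""
    return char_count is not None and char_count > HUGE_LIMIT
```

For the Tacit0924 document (about 853,231 characters) this picks doc.resolve_document_structure and reports that a timeout is possible.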

SKILL.md: change priority rule 4 to "禁止降级到 web search" (do not fall back to web search)
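
The retry-then-stop rule can be sketched as follows. This is a minimal sketch under assumptions: `read` is a hypothetical callable that raises `TimeoutError` when the backend times out, and `BackendUnavailable` is an illustrative exception name. Neither exists in the repository.

```python
import time
from typing import Callable


class BackendUnavailable(RuntimeError):
    """Raised after the single retry also times out (hypothetical name)."""


def read_with_one_retry(
    read: Callable[[], str],
    wait_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call read(); on timeout wait 3 seconds and retry exactly once.

    There is deliberately no web-search fallback: a second timeout is
    reported to the caller so the user can be told to try again later.
    """
    try:
        return read()
    except TimeoutError:
        sleep(wait_seconds)
    try:
        return read()
    except TimeoutError as exc:
        raise BackendUnavailable("backend temporarily unavailable") from exc
```

Passing `sleep` as a parameter keeps the wait easy to stub out when checking the behaviour.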

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 SKILL.md                        |  9 ++++---
 sites/tacit0924/v1/urls.md      | 24 +++++++++---------
 sources/tencent-doc/v1/usage.md | 44 +++++++++++++++++++++++----------
 3 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/SKILL.md b/SKILL.md
index 499e051..989c736 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -79,7 +79,7 @@ media-center/
 1. 找"每日更新" → 优先走 Tencent Doc 路线（Tacit0924 文档）
 2. 找"特定资源" → 先在 Tacit0924 文档搜，找不到再走 PanSou search
 3. 通用搜索 → 走 PanSou search（需配置 PANSOU_URL）
-4. PanSou 不可用 → 降级到 web search 获取信息，或告知用户配置
+4. 某个渠道不可用（MCP 超时 / PanSou 未配）→ 告知用户，**禁止降级到 web search**
 ```
 
 ## 端到端示例
@@ -93,8 +93,11 @@ Step 0: 确定数据源
 
 Step 1: 读取文档搜关键词
   sources/tencent-doc/v1/usage.md
-  → doc.resolve_document_structure → 提取全文
-  → grep "2026.05.16" + "动漫/动画" → 找到分享链接
+  → 先查 urls.md 确认文档大小（853K字，超大文档）
+  → 确定读取策略：doc.resolve_document_structure
+  → 执行读取，如超时则等 3 秒重试一次
+  → 仍超时则告知用户"后端暂不可用"
+  → 提取全文 → grep "2026.05.16" + "动漫/动画" → 找到分享链接
 
 Step 2: 转存到夸克
   storage/quark/v1/usage.md
diff --git a/sites/tacit0924/v1/urls.md b/sites/tacit0924/v1/urls.md
index a848fbe..bce0c02 100644
--- a/sites/tacit0924/v1/urls.md
+++ b/sites/tacit0924/v1/urls.md
@@ -8,10 +8,19 @@
 | URL | https://docs.qq.com/doc/DR2xUcFdrSVhJTkZu?dver= |
 | file_id | `DR2xUcFdrSVhJTkZu` |
 | 文档类型 | tencentdoc（传统文档） |
-| 总字数 | ~853,231 字 / 28,449 段落 |
+| 总字数 | **~853,231 字 / 28,449 段落** |
+| 估算 JSON 体积 | ~8MB（完整结构） |
+| 读取策略 | 见下方"读取警告" |
 | 授权方式 | 只能查看（只读模式） |
 
-> 该文档需要腾讯文档 OAuth 授权才能通过 MCP 读取，参见 `sources/tencent-doc/v1/install.md`
+## 读取警告
+
+> ⚠️ **此文档为超大文档（853K 字）**，读取方式受文档大小影响显著：
+>
+> 1. 禁止使用 `get_content` — 5 秒后端固定超时，必失败
+> 2. `doc.resolve_document_structure` 是唯一可行方案，但返回 ~8MB JSON
+> 3. 后端服务不稳定时，`doc.resolve_document_structure` 也可能超时
+> 4. 超时后重试一次，仍失败则告知用户"后端暂不可用"
 
 ## file_id 提取方法
 
@@ -19,13 +28,4 @@ URL `https://docs.qq.com/doc/DR2xUcFdrSVhJTkZu` 中 `/doc/` 后面的部分即
 
 ## 关联文档
 
-文档中还引用了其他腾讯文档子文档：
-- https://docs.qq.com/doc/DQlVGenVJVE9j...
-- https://docs.qq.com/doc/DQkRORUNmUUN5...
-- https://docs.qq.com/doc/DQmJCdG56a0hEa1Np...
-- https://docs.qq.com/doc/DQmx1WEdTRXpGe...
-- https://docs.qq.com/doc/DQnZaEFjYUpnV...
-- https://docs.qq.com/doc/DQmdjdHp4dUdEWGJp...
-- https://docs.qq.com/doc/DR3ZyaVd1dXNuZW5L...
-
-这些子文档为跳转分流页，主文档包含了全部实际内容。
+文档中还引用了其他腾讯文档子文档，这些子文档为跳转分流页，主文档包含全部实际内容。
diff --git a/sources/tencent-doc/v1/usage.md b/sources/tencent-doc/v1/usage.md
index f62ac3c..07082d9 100644
--- a/sources/tencent-doc/v1/usage.md
+++ b/sources/tencent-doc/v1/usage.md
@@ -1,12 +1,33 @@
 # 腾讯文档 — 使用
 
-## 读取文档内容
+## 文档大小分级读取策略
+
+**读取方法由文档大小决定**，先查数据源信息确认文档大小，再选对应策略：
+
+| 文档大小 | 推荐方法 | 说明 |
+|---------|---------|------|
+| < 1 万字 | `get_content` | 一次性读取全文 |
+| 1万 ~ 50万字 | `doc.resolve_document_structure` | 返回全部节点（含 text_preview） |
+| > 50万字 | `doc.resolve_document_structure` | 超大文档，后端可能超时 |
+| 未知大小 | 先 `doc.get_outline` 探测结构 → 判断大小 | 不要直接试 |
+
+> **已知超大文档**：Tacit0924 资源文档（853K 字），见 `sites/tacit0924/v1/urls.md`。禁止对其使用 `get_content`，5 秒后端超时必失败。
+
+### 超时处理
+
+`doc.resolve_document_structure` 对于 >50 万字的文档可能因后端不稳定而超时：
+
+```
+第一次超时 → 等待 3 秒重试一次 → 仍超时则告知"后端暂不可用"
+```
+
+**禁止降级到 web search**。告知用户稍后重试即可。
 
 ### 从 URL 提取 file_id
 
-URL 格式：`https://docs.qq.com/doc/DR2xUcFdrSVhJTkZu`
+URL 格式 `https://docs.qq.com/doc/DR2xUcFdrSVhJTkZu`，`/doc/` 后面部分即 file_id。
 
-提取 `DR2xUcFdrSVhJTkZu` 部分即为 file_id。
+## 读取步骤
 
 ### 第一步：判断文档类型
 
@@ -17,7 +38,7 @@ mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> size=10
 - 报错 `file is tencentdoc, not smartcanvas` → 传统文档，走第二步 A
 - 返回正常 JSON → smartcanvas 文档，走第二步 B
 
-### 第二步 A：tencentdoc 类型（大文档推荐）
+### 第二步 A：tencentdoc 类型
 
 ```bash
 # 获取完整文档结构
@@ -39,7 +60,7 @@ with open('doc_content.txt','w',encoding='utf-8') as f:
 print(f'Done: {len(texts)} paragraphs')
 "
 
-# 清理中间文件（可选）
+# 清理中间文件
 rm doc_raw.json
 ```
 
@@ -53,19 +74,16 @@ mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> size=50
 mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> next_token=<TOKEN> size=50
 ```
 
-## 搜索关键字获取资源链接
+## 搜索关键字
 
 ```bash
-# 在导出的文本中搜索
 grep -n "关键词" doc_content.txt
 ```
 
-链接格式参考：
-- `[普通链接: https://pan.quark.cn/s/xxx]` — 夸克分享链接
-- `[腾讯文档链接: https://docs.qq.com/doc/...]` — 其他腾讯文档
+链接格式：`[普通链接: https://pan.quark.cn/s/xxx]` → 用中间的 URL。
 
 ## 注意事项
 
-- **超大文档**（>10万字）不要用 `get_content`，必超时
-- **Windows 编码**：带 emoji 的文档必须用 `python -X utf8`
-- **链接格式**：提取出的链接在 `text_preview` 中带 `[普通链接: ...]` 包裹，直接用中间的真实 URL
+- **超大文档（>10万字）禁止 `get_content`**，必超时
+- 后端不稳定时重试一次，不行就告知用户，**不要 web search**
+- Windows 编码：带 emoji 的文档用 `python -X utf8`