sutong 1a36d8119f fix: add size-based reading strategy and ban web-search fallback

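sites/tacit0924/v1/urls.md: add a "read warning" noting that the document is 853K characters
  - forbid get_content
  - doc.resolve_document_structure is the only viable approach
  - document the timeout handling (retry once, stop if it still fails)

sources/tencent-doc/v1/usage.md: rewritten around "document size tiers"
  - add a strategy table at the top (<10K / 10K-500K / >500K / unknown)
  - explicitly forbid get_content for very large documents
  - add timeout handling steps (retry once, no web search)

SKILL.md: priority rule 4 changed to "never fall back to web search"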

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-05-16 18:37:56 +08:00

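Tencent Docs: Usage

Size-tiered read strategy

The read method depends on the document's size. Check the data source information to confirm the size first, then pick the matching strategy: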

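| Document size | Recommended method | Notes |
| --- | --- | --- |
| < 10K characters | `get_content` | Reads the full text in one call |
| 10K to 500K characters | `doc.resolve_document_structure` | Returns every node (including `text_preview`) |
| > 500K characters | `doc.resolve_document_structure` | Very large document; the backend may time out |
| Unknown | Probe the structure with `doc.get_outline` first, then estimate the size | Do not attempt a full read directly |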
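Known very large document: the Tacit0924 resource document (853K characters); see sites/tacit0924/v1/urls.md. Never call get_content on it: the backend times out after 5 seconds, so the call is certain to fail.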
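
The size tiers in the table can be written as a small lookup. This is only a sketch: the function names `choose_read_method` and `may_time_out` and the two constants are hypothetical, and the handling of the exact boundary values (10,000 and 500,000 characters fall into the middle tier here) is an assumption, since the table does not say which side they belong to.

```python
from typing import Optional

SMALL_LIMIT = 10_000   # below this, the table recommends get_content
LARGE_LIMIT = 500_000  # above this, the table warns about backend timeouts


def choose_read_method(char_count: Optional[int]) -> str:
    """Map a document's character count to the method named in the table."""
    if char_count is None:
        # Unknown size: probe the outline first instead of attempting a full read.
        return "doc.get_outline"
    if char_count < SMALL_LIMIT:
        return "get_content"
    # The middle tier and the very large tier use the same call.
    return "doc.resolve_document_structure"


def may_time_out(char_count: Optional[int]) -> bool:
    """Return True for the > 500K tier, where the table warns of timeouts."""
    return char_count is not None and char_count > LARGE_LIMIT
```

For the 853K-character Tacit0924 document, `choose_read_method(853_000)` selects `doc.resolve_document_structure` and `may_time_out(853_000)` is true.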

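Timeout handling

For documents over 500K characters, doc.resolve_document_structure may time out because the backend is unstable:

First timeout → wait 3 seconds and retry once → if it times out again, report "backend temporarily unavailable"

Never fall back to web search. Just tell the user to try again later.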
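
The retry rule above might look like the following sketch. It assumes a hypothetical `read` callable that wraps the actual tool call and raises Python's built-in `TimeoutError` when the backend times out; how a timeout really surfaces depends on how the tool is invoked. The `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time
from typing import Callable, Optional

BACKEND_UNAVAILABLE = "backend temporarily unavailable, please try again later"


def read_with_one_retry(
    read: Callable[[], str],
    wait_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call read(); on a timeout, wait and retry once; return None if both fail."""
    for attempt in range(2):
        try:
            return read()
        except TimeoutError:
            if attempt == 0:
                # Only the first failure is followed by a wait and a retry.
                sleep(wait_seconds)
    # Both attempts timed out. The caller reports BACKEND_UNAVAILABLE and stops;
    # there is deliberately no web-search fallback here.
    return None
```

A caller that gets `None` back would show `BACKEND_UNAVAILABLE` to the user and stop.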

Extracting the file_id from a URL

URLs have the form https://docs.qq.com/doc/DR2xUcFdrSVhJTkZu; the part after /doc/ is the file_id.
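
A minimal sketch of that extraction using the standard library's `urllib.parse`. The helper name `extract_file_id` is hypothetical. It assumes the file_id is the single path segment after `/doc/` and ignores any query string or fragment.

```python
from urllib.parse import urlparse


def extract_file_id(url: str) -> str:
    """Return the path segment that follows /doc/ in a docs.qq.com URL."""
    # urlparse separates the path from any query string or fragment.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "doc":
        raise ValueError(f"not a /doc/ URL: {url}")
    return parts[1]
```

Calling it on the example URL above yields `DR2xUcFdrSVhJTkZu`.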

Read steps

Step 1: Determine the document type

mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> size=10

Error file is tencentdoc, not smartcanvas → legacy document; go to Step 2A
Normal JSON response → smartcanvas document; go to Step 2B
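
The branching rule can be sketched as a pure function over the probe's captured output. This assumes the error message appears verbatim somewhere in the captured text and that a successful response parses as JSON. The name `classify_probe_output` and the `"unknown"` result for anything else are hypothetical additions.

```python
import json

TENCENTDOC_ERROR = "file is tencentdoc, not smartcanvas"


def classify_probe_output(output: str) -> str:
    """Classify the captured output of the smartcanvas.read probe call."""
    if TENCENTDOC_ERROR in output:
        return "tencentdoc"   # legacy document: continue with Step 2A
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return "unknown"      # neither the known error nor valid JSON
    return "smartcanvas"      # valid JSON response: continue with Step 2B
```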

Step 2A: tencentdoc type

# Fetch the full document structure
mcporter call tencent-docs doc.resolve_document_structure file_id=<FILE_ID> > doc_raw.json

# Extract plain text
python -X utf8 -c "
import json
# Load the structure dump written by the previous command
with open('doc_raw.json','r',encoding='utf-8') as f:
    data=json.load(f)
texts=[]
for n in data.get('nodes',[]):
    p=n.get('text_preview','')
    # Treat a missing or null heading_level as body text
    hl=n.get('heading_level') or 0
    if p:
        # Prefix headings with Markdown-style # markers
        texts.append(('#'*hl+' '+p) if hl>0 else p)
with open('doc_content.txt','w',encoding='utf-8') as f:
    f.write('\n'.join(texts))
print(f'Done: {len(texts)} paragraphs')
"

# Remove the intermediate file
rm doc_raw.json

Step 2B: smartcanvas type (supports pagination)

# First read
mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> size=50

# Next page (use the next_token returned by the previous page)
mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> next_token=<TOKEN> size=50
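
The two commands above generalize to a loop that keeps requesting pages until no `next_token` comes back. The sketch below is written against a hypothetical `read_page(token)` callable that wraps the actual call. It assumes each response is a JSON object with a top-level `next_token` that is absent or empty on the last page, which this document does not confirm. The `max_pages` cap is an added safety guard.

```python
from typing import Callable, Iterator, Optional


def iter_pages(
    read_page: Callable[[Optional[str]], dict],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield pages in order until a response carries no next_token."""
    token: Optional[str] = None  # None requests the first page
    for _ in range(max_pages):
        page = read_page(token)
        yield page
        token = page.get("next_token")
        if not token:
            return  # last page reached
```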

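Searching for a keyword

grep -n "keyword" doc_content.txt

Link format: links appear in the extracted text as [普通链接: https://pan.quark.cn/s/xxx] ("普通链接" means "plain link"); use the URL inside the brackets.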
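
Pulling the URLs out of that bracket format can be sketched with a regular expression. The helper name `extract_links` is hypothetical. The pattern assumes the label is exactly `普通链接` and accepts either an ASCII or a full-width colon, which is a guess about how the text may vary.

```python
import re
from typing import List

# Matches "[普通链接: <url>]" and captures the URL up to the closing bracket.
LINK_PATTERN = re.compile(r"\[普通链接[:：]\s*(https?://[^\]\s]+)\]")


def extract_links(text: str) -> List[str]:
    """Return every URL wrapped in the [普通链接: ...] marker, in order."""
    return LINK_PATTERN.findall(text)
```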

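Notes

Very large documents (>100K characters): never use get_content; it is certain to time out
When the backend is unstable, retry once; if that fails, tell the user. Do not fall back to web search
Windows encoding: for documents containing emoji, run python -X utf8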
