media-center/sources/tencent-doc/v1/usage.md

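# Tencent Docs: Usage

## Size-tiered read strategy

**The read method depends on document size.** Look up the data source information first to confirm the size, then pick the matching strategy: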

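| Document size | Recommended method | Notes |
|---------|---------|------|
| < 10,000 characters | `get_content` | Reads the full text in one call |
| 10,000 to 500,000 characters | `doc.resolve_document_structure` | Returns all nodes (including `text_preview`) |
| > 500,000 characters | `doc.resolve_document_structure` | Very large document; the backend may time out |
| Unknown | Probe the structure with `doc.get_outline` first, then estimate the size | Do not try a read method blindly |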

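> **Known very large document:** the Tacit0924 resource document (853K characters), see `sites/tacit0924/v1/urls.md`. Never call `get_content` on it: the backend's 5-second timeout guarantees failure.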
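
The tiers in the table can be sketched as a small lookup. This is only an illustration: the function names and constants are hypothetical, and treating exactly 10,000 characters as part of the middle tier is an assumption read off the table, not a documented rule.

```python
from typing import Optional

SMALL_LIMIT = 10_000    # below this, the table recommends a single full read
LARGE_LIMIT = 500_000   # above this, the table warns about backend timeouts


def choose_read_method(char_count: Optional[int]) -> str:
    """Return the tool name the table recommends for a document of this size."""
    if char_count is None:
        # Unknown size: probe the outline before choosing a read method.
        return "doc.get_outline"
    if char_count < SMALL_LIMIT:
        return "get_content"
    return "doc.resolve_document_structure"


def timeout_is_likely(char_count: Optional[int]) -> bool:
    """True for the top tier, where the backend may time out."""
    return char_count is not None and char_count > LARGE_LIMIT
```

For the 853K-character document mentioned above, `choose_read_method(853_000)` selects `doc.resolve_document_structure` and `timeout_is_likely(853_000)` is true.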

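### Timeout handling

`doc.resolve_document_structure` may time out on documents over 500,000 characters because the backend is unstable:

```
First timeout → wait 3 seconds and retry once → if it still times out, report "backend temporarily unavailable"
```

**Do not fall back to web search.** Just tell the user to try again later.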
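
A minimal sketch of the retry-once rule, assuming the timeout surfaces as Python's built-in `TimeoutError`. How `mcporter` actually reports a timeout is not specified here, so the helper name, the exception type, and the injectable `sleep` parameter are all illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_one_retry(
    call: Callable[[], T],
    wait_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`; on a timeout, wait and retry exactly once."""
    try:
        return call()
    except TimeoutError:
        # First timeout: pause before the single retry.
        sleep(wait_seconds)
    try:
        return call()
    except TimeoutError as exc:
        # Second timeout: stop here and report, with no web-search fallback.
        raise RuntimeError("backend temporarily unavailable") from exc
```

The caller catches the `RuntimeError` and tells the user to try again later.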

### Extracting file_id from a URL

For a URL of the form `https://docs.qq.com/doc/DR2xUcFdrSVhJTkZu`, the part after `/doc/` is the file_id.
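
The extraction can be sketched with the standard library. The helper name is hypothetical, and dropping any query string or fragment is an assumption; only the plain `/doc/<file_id>` form is described above.

```python
from urllib.parse import urlparse


def extract_file_id(url: str) -> str:
    """Return the path segment that follows /doc/ in a docs.qq.com URL."""
    path = urlparse(url).path  # query string and fragment are excluded
    prefix = "/doc/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /doc/ URL: {url}")
    file_id = path[len(prefix):].strip("/")
    if not file_id:
        raise ValueError(f"no file_id found in: {url}")
    return file_id
```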

## Read procedure

### Step 1: Determine the document type

```bash
mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> size=10
```

- Error `file is tencentdoc, not smartcanvas` → legacy document (tencentdoc); go to Step 2A
- Normal JSON response → smartcanvas document; go to Step 2B

### Step 2A: tencentdoc type
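
The branching above can be sketched as a classifier over the captured command output. The function name and the extra `"unknown"` result are illustrative. Checking the error string before parsing is a deliberate assumption, in case the error itself arrives wrapped in JSON.

```python
import json

TENCENTDOC_ERROR = "file is tencentdoc, not smartcanvas"


def classify_doc_type(output: str) -> str:
    """Map the output of the smartcanvas.read probe to a document type."""
    if TENCENTDOC_ERROR in output:
        return "tencentdoc"      # legacy document, continue with Step 2A
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return "unknown"         # neither the known error nor valid JSON
    return "smartcanvas"         # valid JSON, continue with Step 2B
```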

```bash
# Fetch the full document structure (create tmp/ first so the redirect cannot fail)
mkdir -p tmp
mcporter call tencent-docs doc.resolve_document_structure file_id=<FILE_ID> > tmp/doc_raw.json

# Extract plain text
python -X utf8 -c "
import json
with open('tmp/doc_raw.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
texts = []
for n in data.get('nodes', []):
    p = n.get('text_preview', '')
    hl = n.get('heading_level', 0)
    if p:
        # Prefix headings with one '#' per heading level
        texts.append(('#' * hl + ' ' + p) if hl > 0 else p)
with open('tmp/doc_content.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(texts))
print(f'Done: {len(texts)} paragraphs')
"

# Remove the intermediate file
rm tmp/doc_raw.json
```

### Step 2B: smartcanvas type (supports pagination)

```bash
# First read
mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> size=50

# Next page (pass the next_token returned by the previous page)
mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> next_token=<TOKEN> size=50
```
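
The paging loop can be sketched as follows. It assumes each response parses to a JSON object with a top-level `next_token` field that is missing or empty on the last page; the exact response shape is not documented here. `fetch_page` is a hypothetical callable that wraps the `smartcanvas.read` call.

```python
from typing import Callable, Optional


def read_all_pages(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 1000,
) -> list:
    """Collect pages until no next_token is returned (or max_pages is hit)."""
    pages = []
    token: Optional[str] = None   # None means "first read"
    for _ in range(max_pages):
        page = fetch_page(token)
        pages.append(page)
        token = page.get("next_token")
        if not token:
            break                 # last page reached
    return pages
```

The `max_pages` cap is a safety guard against a backend that keeps returning tokens; it is not part of the tool's interface.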

## Searching for keywords

```bash
grep -n "<KEYWORD>" tmp/doc_content.txt
```

Link format: `[普通链接: https://pan.quark.cn/s/xxx]` ("普通链接" means "regular link") → use the URL in the middle.
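
A minimal sketch for pulling those URLs out of the extracted text. The regular expression is an assumption based on the single example above, and other link labels may exist that it does not cover.

```python
import re

# Matches "[普通链接: <url>]" and captures only the URL.
LINK_PATTERN = re.compile(r"\[普通链接:\s*(https?://[^\]\s]+)\]")


def extract_links(text: str) -> list:
    """Return every URL wrapped in the [普通链接: ...] marker, in order."""
    return LINK_PATTERN.findall(text)
```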

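## Notes

- **Never use `get_content` on large documents (> 100,000 characters)**; it is certain to time out
- When the backend is unstable, retry once; if that also fails, tell the user. **Do not fall back to web search**
- Windows encoding: run Python as `python -X utf8` for documents that contain emoji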