media-center/sources/tencent-doc/v1/usage.md

# Tencent Docs: Usage

## Reading document content

### Extracting the file_id from a URL

URL format: `https://docs.qq.com/doc/DR2xUcFdrSVhJTkZu`

The file_id is the last path segment, here `DR2xUcFdrSVhJTkZu`.
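A minimal Python sketch of this extraction, using only the standard library. The function name `extract_file_id` is hypothetical, and the code assumes the file_id is always the last path segment, as in the example URL above.

```python
from urllib.parse import urlparse


def extract_file_id(url: str) -> str:
    """Return the last path segment of the URL (assumed to be the file_id)."""
    # urlparse drops any query string or fragment from the path.
    path = urlparse(url).path.rstrip("/")
    file_id = path.rsplit("/", 1)[-1]
    if not file_id:
        raise ValueError(f"no file_id found in URL: {url!r}")
    return file_id


print(extract_file_id("https://docs.qq.com/doc/DR2xUcFdrSVhJTkZu"))
# prints: DR2xUcFdrSVhJTkZu
```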

### Step 1: Determine the document type

```bash
mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> size=10
```

- Error `file is tencentdoc, not smartcanvas` → legacy (tencentdoc) document; continue with Step 2A
- Normal JSON response → smartcanvas document; continue with Step 2B
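If the probe is scripted, the branching above might be sketched as follows. The function name `classify_probe_output` is hypothetical, and the sketch assumes the caller has already captured the command's combined output as a string. It relies only on the error text quoted above and on whether the output parses as JSON.

```python
import json

# Error text reported for legacy documents, as quoted in the list above.
TENCENTDOC_ERROR = "file is tencentdoc, not smartcanvas"


def classify_probe_output(output: str) -> str:
    """Classify captured output of the smartcanvas.read probe.

    Returns "tencentdoc", "smartcanvas", or "unknown".
    """
    if TENCENTDOC_ERROR in output:
        return "tencentdoc"
    try:
        json.loads(output)
    except json.JSONDecodeError:
        # Neither the known error nor valid JSON: leave the decision to the caller.
        return "unknown"
    return "smartcanvas"
```

Returning `"unknown"` rather than guessing keeps other failures (network errors, a wrong file_id) from being mistaken for either document type.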

### Step 2A: tencentdoc type (recommended for large documents)

```bash
# Fetch the full document structure
mcporter call tencent-docs doc.resolve_document_structure file_id=<FILE_ID> > doc_raw.json

# Extract plain text (headings are prefixed with '#' per heading_level)
python -X utf8 -c "
import json
with open('doc_raw.json','r',encoding='utf-8') as f:
    data=json.load(f)
texts=[]
for n in data.get('nodes',[]):
    p=n.get('text_preview') or ''
    hl=n.get('heading_level') or 0
    if p:
        texts.append(('#'*hl+' '+p) if hl>0 else p)
with open('doc_content.txt','w',encoding='utf-8') as f:
    f.write('\n'.join(texts))
print(f'Done: {len(texts)} paragraphs')
"

# Clean up the intermediate file (optional)
rm doc_raw.json
```

### Step 2B: smartcanvas type (supports pagination)

```bash
# First read
mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> size=50

# Next page (use the next_token returned by the previous page)
mcporter call tencent-docs smartcanvas.read file_id=<FILE_ID> next_token=<TOKEN> size=50
```
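The paging loop can be sketched in Python as below. This is an illustration under assumptions: `fetch_page` is a hypothetical callable that wraps the `smartcanvas.read` call and returns the parsed JSON as a dict, and the response is assumed to carry `next_token` at the top level, missing or empty on the last page.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield pages until a response carries no next_token.

    fetch_page(None) fetches the first page; fetch_page(token) fetches
    the page that follows the one which returned that token.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield page
        # Assumed field name, taken from the next_token parameter above.
        token = page.get("next_token")
        if not token:
            break


# Usage with an in-memory stand-in for the real call:
fake_pages = {
    None: {"text": "page 1", "next_token": "t1"},
    "t1": {"text": "page 2", "next_token": "t2"},
    "t2": {"text": "page 3"},
}
for page in iter_pages(lambda tok: fake_pages[tok]):
    print(page["text"])
```

Passing the fetch function in keeps the loop independent of how the command is invoked, so it can be exercised without network access.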

## Searching for keywords to find resource links

```bash
# Search the exported text (replace "keyword" with your search term)
grep -n "keyword" doc_content.txt
```

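Link format reference:
- `[普通链接: https://pan.quark.cn/s/xxx]`: a Quark share link (the label `普通链接` means "regular link")
- `[腾讯文档链接: https://docs.qq.com/doc/...]`: a link to another Tencent Docs document (the label `腾讯文档链接` means "Tencent Docs link")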
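To pull the URLs out of the exported text, a regular expression over the two wrappers listed above is one option. This is a sketch that assumes the wrappers appear exactly in that form (label, colon, space, URL, closing bracket) and that URLs contain no whitespace or `]`. The name `extract_links` is hypothetical.

```python
import re

# Matches the two wrapper labels listed above, then captures the URL up to "]".
LINK_PATTERN = re.compile(r"\[(普通链接|腾讯文档链接): (\S+?)\]")


def extract_links(text: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs for every wrapped link found in text."""
    return LINK_PATTERN.findall(text)


sample = "Resource A [普通链接: https://pan.quark.cn/s/xxx] see also [腾讯文档链接: https://docs.qq.com/doc/abc]"
for label, url in extract_links(sample):
    print(label, url)
```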

## Notes

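- **Very large documents** (over 100,000 characters): do not use `get_content`; it will always time out
- **Windows encoding**: for documents containing emoji, Python must be run as `python -X utf8`
- **Link format**: extracted links appear in `text_preview` wrapped as `[普通链接: ...]`; use the actual URL inside the brackets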