feat: add yunpan1 search source

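- sites/yunpan1/v1/: add a search source for the yunpan1 cloud-drive resource-sharing community
  - intro.md: forum overview and board list
  - urls.md: site links and cookie maintenance notes
  - usage.md: search script usage and login flow
  - yunpan1_search.py: Python search script (standard library only, zero dependencies)
- .gitignore: append .idea/ __pycache__/ *.pyc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>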
2026-05-16 20:44:28 +08:00
parent a39547f5f6
commit ecc1b9d6dc
5 changed files with 265 additions and 0 deletions
@@ -1,2 +1,5 @@
 # Temporary files / config files: not tracked in version control
 tmp/
+.idea/
+__pycache__/
+*.pyc
@@ -0,0 +1,33 @@
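+# yunpan1 Cloud-Drive Resource Sharing Community: Introduction
+
+## What it is
+
+[yunpan1.cc](https://yunpan1.cc/) is a Discuz! forum where users share links to cloud-drive resources of their own accord.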
+
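+## Resource types
+
+| Board | Contents |
+|------|------|
+| Film & TV (影视) | Movies, TV series |
+| Anime (动漫) | Chinese and Japanese animation |
+| Books (书籍) | E-books, magazines |
+| Study (学习) | Courses, study materials |
+| Software (软件) | All kinds of software |
+| Templates (模板) | PPT templates, etc. |
+| Tutorials (教程) | Skill tutorials |
+| Games (游戏) | Standalone PC games, Android games |
+| OS (系统) | Windows/Linux images |
+| Images (图片) | Wallpapers, design assets |
+| Audio (音频) | Music, audiobooks |
+| Other (其它) | Uncategorized |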
+
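+## Supported cloud drives
+
+Quark Drive, Aliyun Drive, Baidu Netdisk, UC Drive
+
+## Characteristics
+
+- Many users post, so coverage is broad
+- Frequent daily updates
+- Some threads show their links directly; others reveal them only after you reply
+- Login is required to search and to view hidden content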
@@ -0,0 +1,38 @@
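+# yunpan1 Cloud-Drive Resource Sharing Community: Links
+
+## Site information
+
+| Item | Details |
+|------|------|
+| Site | https://yunpan1.cc/ |
+| Legacy site | https://old.yunpan1.wang |
+| Type | Discuz! forum |
+| Login | Account required |
+| Cookie lifetime | About 1 day |
+
+## Account
+
+| Item | Value |
+|------|-----|
+| Email | Ask the user for it |
+| Password | Ask the user for it |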
+
+## Cookie
+
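+After a successful login, the cookies are saved to `tmp/yunpan1_cookies.txt` (one `key=value` per line) and stay valid for about 1 day.
+
+Once they expire, log in again with Playwright. Login flow:
+
+```
+1. Open https://yunpan1.cc/member.php?mod=logging&action=login
+2. Enter the email and password, then click the login button
+3. Extract the auth, saltkey, and other cookies from the browser context
+4. Write them to tmp/yunpan1_cookies.txt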
+```
+
+## Board links
+
+| Board | URL |
+|------|-----|
+| Anime (动漫) | https://yunpan1.cc/forum.php?mod=forumdisplay&fid=3 |
+| Film & TV (影视) | https://yunpan1.cc/forum.php?mod=forumdisplay&fid=2 |
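The cookie file is plain text with one `key=value` pair per line, which `load_cookie()` in `yunpan1_search.py` joins into a single `Cookie` header. The Python sketch below shows one way to write and sanity-check that file. It assumes the cookies arrive shaped like entries of Playwright's `context.cookies()` result (dicts with at least `name` and `value`). The helper names `save_cookies`, `cookie_header`, and `missing_key_cookies` are hypothetical and not part of this commit, and treating `auth` and `saltkey` as required is an assumption drawn from the login steps.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical helpers (not part of this commit).
# Assumption: these two Discuz! cookies (prefix "2dF6_2132_") must be present after a good login.
KEY_COOKIES = ("2dF6_2132_auth", "2dF6_2132_saltkey")


def save_cookies(cookies: Iterable[Mapping[str, object]], path: str) -> None:
    """Write one name=value pair per line, the format yunpan1_search.py reads.

    Assumes each item looks like an entry of Playwright's context.cookies()
    result, i.e. a dict with at least "name" and "value" keys.
    """
    os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for c in cookies:
            f.write(f"{c['name']}={c['value']}\n")


def cookie_header(path: str) -> str:
    """Join the non-empty lines of the cookie file into one Cookie header value."""
    with open(path, encoding="utf-8") as f:
        return "; ".join(line.strip() for line in f if line.strip())


def missing_key_cookies(header: str) -> List[str]:
    """List key cookies absent from a Cookie header; a non-empty result suggests the login failed."""
    names = {part.split("=", 1)[0] for part in header.split("; ") if part}
    return [name for name in KEY_COOKIES if name not in names]
```

With Python Playwright, the list returned by `context.cookies()` can be passed to `save_cookies` as-is; extra keys such as `domain` or `httpOnly` are simply ignored.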
@@ -0,0 +1,53 @@
+# yunpan1: Getting resources
+
+## Search with the Python script (recommended)
+
+Dependencies: Python standard library only (nothing extra to install)
+
+```bash
+py -X utf8 sites/yunpan1/v1/yunpan1_search.py <keyword>
+```
+
+Examples:
+
+```bash
+py -X utf8 sites/yunpan1/v1/yunpan1_search.py 遮天
+py -X utf8 sites/yunpan1/v1/yunpan1_search.py 完美世界
+```
+
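+The search reports:
+- **Complete Quark links** (12-character ID; can be saved straight to a Quark drive)
+- **Truncated links** (some threads show a cut-off link in the search results; open the thread to see it in full)
+- Complete links are saved automatically to `tmp/quark_links.txt` (the file is overwritten on each run that finds any)
+
+The first search for a keyword can take 1-2 minutes while the Discuz! backend builds its index; later searches for the same keyword return almost instantly.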
+
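+## Cookie maintenance
+
+Searching requires a logged-in session. Cookies are stored in `tmp/yunpan1_cookies.txt` and expire after about 1 day.
+
+When they expire, log in again with Playwright to get fresh ones: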
+
+```javascript
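+// Playwright login flow
+await page.goto('https://yunpan1.cc/member.php?mod=logging&action=login');
+await page.locator('form[name="login"] input[name="username"]').fill('<email obtained from the user>');
+await page.locator('form[name="login"] input[name="password"]').fill('<password obtained from the user>');
+await page.locator('form[name="login"] button[name="loginsubmit"]').click();
+await page.waitForURL(url => !url.href.includes('action=login'));  // being redirected away from the login page means the login succeeded
+
+// Extract the cookies (key ones: 2dF6_2132_auth, 2dF6_2132_saltkey, 2dF6_2132_lastvisit)
+const cookies = await page.context().cookies();
+// Then write cookies.map(c => `${c.name}=${c.value}`).join('\n') to tmp/yunpan1_cookies.txt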
+```
+
+## Browse boards directly
+
+```
+Anime: https://yunpan1.cc/forum.php?mod=forumdisplay&fid=3
+Film & TV: https://yunpan1.cc/forum.php?mod=forumdisplay&fid=2
+```
+
+## After you have the links
+
+For saving Quark links to your drive, see `storage/quark/v1/usage.md`.
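The script builds its request by percent-encoding the keyword and substituting it into the Discuz! search URL template. A minimal sketch of just that step, reusing the script's `SEARCH_URL` template; `build_search_url` is a hypothetical helper name, and UTF-8 percent-encoding (the default of `urllib.parse.quote`) is assumed to be what the site expects, as the script itself assumes.

```python
from urllib.parse import quote

# Template copied from yunpan1_search.py (Discuz! forum search endpoint).
SEARCH_URL = 'https://yunpan1.cc/search.php?mod=forum&srchtxt={keyword}&searchsubmit=yes'


def build_search_url(keyword: str) -> str:
    """Percent-encode the keyword (UTF-8 by default) and insert it into the template."""
    return SEARCH_URL.format(keyword=quote(keyword))


if __name__ == '__main__':
    # Chinese keywords become %XX byte sequences; spaces and '&' are escaped too,
    # so a keyword cannot break the query string.
    print(build_search_url('遮天'))
```

Running it prints `https://yunpan1.cc/search.php?mod=forum&srchtxt=%E9%81%AE%E5%A4%A9&searchsubmit=yes`.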
@@ -0,0 +1,138 @@
+#!python
+# -*- coding: utf-8 -*-
+"""
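+yunpan1 search tool: search the cloud-drive resource-sharing community and extract Quark links
+
+Usage:
+  py -X utf8 yunpan1_search.py <keyword>
+  py -X utf8 yunpan1_search.py 遮天
+
+Notes:
+  - Discuz! builds a search index on the first search (may take 60-120 s); repeat searches for the same keyword return almost instantly
+  - Cookie file: tmp/yunpan1_cookies.txt (log in with Playwright first to obtain it)
+  - Dependencies: Python standard library only (nothing extra to install)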
+"""
+
+import re
+import os
+import sys
+import urllib.request
+import urllib.parse
+import time
+
+# ── Configuration ─────────────────────────────────────────────────
+BASE_URL = 'https://yunpan1.cc'
+SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
+COOKIE_FILE = os.path.abspath(os.path.join(SCRIPT_DIR, '..', '..', '..', 'tmp', 'yunpan1_cookies.txt'))
+OUTPUT_FILE = os.path.abspath(os.path.join(SCRIPT_DIR, '..', '..', '..', 'tmp', 'quark_links.txt'))
+SEARCH_URL = BASE_URL + '/search.php?mod=forum&srchtxt={keyword}&searchsubmit=yes'
+
+# Shared opener. OpenerDirector has no timeout attribute, so the 300 s timeout
+# (long enough for slow, chunked first-time search responses) is passed to open().
+_opener = urllib.request.build_opener()
+
+
+# ── Helpers ───────────────────────────────────────────────────────
+
+def load_cookie():
+    path = COOKIE_FILE
+    if not os.path.exists(path):
+        print(f'❌ Cookie file not found: {path}')
+        print('Log in to yunpan1.cc with Playwright, then save the cookies to this file (one key=value per line)')
+        sys.exit(1)
+    with open(path, 'r', encoding='utf-8') as f:
+        return '; '.join(line.strip() for line in f if line.strip())  # skip blank lines
+
+
+def request(url, cookie):
+    req = urllib.request.Request(url, headers={
+        'Cookie': cookie,
+        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
+    })
+    t0 = time.time()
+    with _opener.open(req, timeout=300) as resp:
+        html = resp.read().decode('utf-8', errors='replace')
+        return resp.status, html, resp.url, time.time() - t0
+
+
+# ── Extraction ────────────────────────────────────────────────────
+
+def extract_quark_links(html):
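+    """Extract complete Quark links (ID of 12+ alphanumeric characters), excluding false matches with "https" glued to the end"""
+    raw = re.findall(r'https?://pan\.quark\.cn/s/[a-zA-Z0-9]{12,}', html)
+    # Drop matches immediately followed by https (e.g. .../s/xxxhttps)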
+    return sorted(set(l for l in raw if not l.endswith('https')))
+
+
+def extract_truncated_links(html):
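+    """Extract truncated links (1-11 character ID not followed by another ID character; the full link is in the thread)"""
+    raw = re.findall(r'https?://pan\.quark\.cn/s/[a-zA-Z0-9]{1,11}(?![a-zA-Z0-9])', html)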
+    return sorted(set(raw))
+
+
+def extract_threads(html):
+    """Extract thread titles from the search results (deduplicated; titles of 5 characters or fewer are skipped)"""
+    titles = re.findall(r'<a[^>]+href="forum\.php\?mod=viewthread[^>]+>([^<]+)</a>', html)
+    seen = set()
+    result = []
+    for t in titles:
+        t = t.strip()
+        if t and len(t) > 5 and t not in seen:
+            seen.add(t)
+            result.append(t)
+    return result
+
+
+# ── Main ──────────────────────────────────────────────────────────
+
+def main():
+    if len(sys.argv) < 2:
+        print('Usage: py -X utf8 yunpan1_search.py <keyword>')
+        print('Example: py -X utf8 yunpan1_search.py 遮天')
+        sys.exit(1)
+
+    keyword = sys.argv[1]
+    cookie = load_cookie()
+
+    print(f'🔍 Searching for "{keyword}" ...')
+    print('⏱  The first search for a keyword builds an index and may take 1-2 minutes, please wait...')
+    sys.stdout.flush()
+
+    status, html, final_url, elapsed = request(
+        SEARCH_URL.format(keyword=urllib.parse.quote(keyword)),
+        cookie
+    )
+
+    links = extract_quark_links(html)
+    truncated = extract_truncated_links(html)
+    threads = extract_threads(html)
+
+    print(f'\n✅ Done ({elapsed:.0f}s, HTTP status {status})')
+    print(f'   HTML: {len(html)} chars / {len(threads)} threads')
+
+    if links:
+        print(f'\n{"=" * 50}')
+        print(f'✅ Complete Quark links (ready to save to a drive): {len(links)}')
+        print(f'{"=" * 50}')
+        for l in links:
+            print(f'   {l}')
+
+        os.makedirs(os.path.dirname(OUTPUT_FILE), exist_ok=True)
+        with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
+            f.write('\n'.join(links) + '\n')
+        print(f'\n💾 Links saved to: {os.path.relpath(OUTPUT_FILE)}')
+    else:
+        print('\n⚠️  No complete Quark links found')
+
+    if truncated:
+        print(f'\n⚠️  Truncated links ({len(truncated)}, showing up to 5; open the thread for the full link):')
+        for l in truncated[:5]:
+            print(f'   {l}')
+
+    print('\n📌 Thread preview (first 10):')
+    for t in threads[:10]:
+        print(f'   • {t[:60]}')
+
+
+if __name__ == '__main__':
+    main()
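The split between complete and truncated links rests on two regular expressions. The self-contained demo below (its sample HTML is invented for illustration) shows why the truncated-link pattern needs a negative lookahead: without it, `{1,11}` also matches the first 11 characters of every complete link, so each complete link would be reported a second time as truncated. `classify` is a hypothetical helper, not part of the script.

```python
import re

# Patterns mirroring yunpan1_search.py; the sample HTML below is made up.
FULL = re.compile(r'https?://pan\.quark\.cn/s/[a-zA-Z0-9]{12,}')
TRUNC_NAIVE = re.compile(r'https?://pan\.quark\.cn/s/[a-zA-Z0-9]{1,11}')
# (?![a-zA-Z0-9]) rejects a match when another ID character follows it.
TRUNC = re.compile(r'https?://pan\.quark\.cn/s/[a-zA-Z0-9]{1,11}(?![a-zA-Z0-9])')

SAMPLE = ('<td>https://pan.quark.cn/s/abcdef123456</td>'
          '<td>https://pan.quark.cn/s/abc12...</td>')


def classify(html: str):
    """Return (complete, truncated, naive_truncated) sorted, deduplicated link lists."""
    return (sorted(set(FULL.findall(html))),
            sorted(set(TRUNC.findall(html))),
            sorted(set(TRUNC_NAIVE.findall(html))))


if __name__ == '__main__':
    complete, truncated, naive = classify(SAMPLE)
    print('complete: ', complete)
    print('truncated:', truncated)
    print('naive:    ', naive)  # also holds the 11-char prefix of the complete link
```

Note also that the complete-link filter `not l.endswith('https')` discards a link glued to a following `https://...` outright instead of trimming the suffix.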