TravelContentCreator/docs/HOTSPOT_MODULE.md

# Hotspot Data Module

> Multi-platform collection and management of trending-topic ("hotspot") data

## 1. Module Structure

```
domain/hotspot/
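├── __init__.py           # Module entry point
├── models.py             # Data models (HotTopic, HotTopicSource)
├── manager.py            # Hotspot manager (caching, aggregation)
└── crawlers/             # Crawler modules
    ├── base.py           # Crawler base class
    ├── weibo.py          # Weibo hot search (needs optimization)
    ├── baidu.py          # Baidu hot search ✅ (includes the travel board)
    ├── bing.py           # Bing search suggestions ✅
    ├── calendar.py       # Holiday calendar ✅
    ├── xiaohongshu.py    # Xiaohongshu trending ✅
    └── mediacrawler/     # MediaCrawler integration
        ├── __init__.py
        └── xhs_crawler.py  # Xiaohongshu real-time crawler

libs/MediaCrawler/        # MediaCrawler project (submodule)
api/routers/hotspot.py    # API routes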
```

## 2. API Endpoints

### 2.1 Get all hotspots

```bash
GET /api/v2/hotspot/all?force=false
```

### 2.2 Get by platform

```bash
# Weibo hot search (pending optimization)
GET /api/v2/hotspot/weibo?limit=50

# Baidu hot search
GET /api/v2/hotspot/baidu?limit=50

# Xiaohongshu trending (culture and tourism related)
GET /api/v2/hotspot/xiaohongshu?limit=20

# Holiday calendar
GET /api/v2/hotspot/calendar?days=30
```

### 2.3 Aggregated queries

```bash
# Travel-related hotspots (all sources)
GET /api/v2/hotspot/travel?limit=10

# Trending topics (deduplicated and merged)
GET /api/v2/hotspot/trending?limit=20
```
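
This document does not specify how `/trending` deduplicates and merges topics. The following is a minimal sketch of one plausible approach, assuming topics are grouped by a normalized title and the entry with the highest heat wins. `Topic`, `normalize_title`, and `merge_topics` are hypothetical names for illustration, not the module's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Topic:
    """Hypothetical, trimmed-down stand-in for HotTopic."""
    title: str
    source: str
    heat: Optional[int] = None


def normalize_title(title: str) -> str:
    # Assumption: titles that differ only in case, whitespace, or '#' markers
    # refer to the same topic.
    return "".join(title.replace("#", "").split()).lower()


def merge_topics(topics: List[Topic], limit: int = 20) -> List[Topic]:
    best: Dict[str, Topic] = {}
    for topic in topics:
        key = normalize_title(topic.title)
        current = best.get(key)
        # Keep the hottest entry per normalized title; a missing heat counts as 0.
        if current is None or (topic.heat or 0) > (current.heat or 0):
            best[key] = topic
    # Hottest topics first, truncated to the requested limit.
    ranked = sorted(best.values(), key=lambda t: t.heat or 0, reverse=True)
    return ranked[:limit]
```

Sources report heat on different scales, so a real implementation would likely need to rescale heat per source before comparing; the sketch compares raw values for brevity.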

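### 2.4 Custom hotspots

```bash
# List
GET /api/v2/hotspot/custom

# Add (example title: "Winter hot spring recommendations";
# tags: hot spring, winter, vacation)
POST /api/v2/hotspot/custom
{
  "title": "冬季温泉推荐",
  "tags": ["温泉", "冬季", "度假"],
  "category": "travel"
}

# Delete
DELETE /api/v2/hotspot/custom/{title}
```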
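
Because the `DELETE` endpoint carries the title in the URL path, a non-ASCII title such as the one in the example has to be percent-encoded by the client. The following is a minimal sketch using only the standard library; `custom_topic_path` is a hypothetical helper, and it assumes the server decodes the path segment back into the original title.

```python
from urllib.parse import quote, unquote

API_PREFIX = "/api/v2/hotspot"  # prefix taken from the endpoints listed in this section


def custom_topic_path(title: str) -> str:
    # safe="" also escapes "/", so a title can never split into extra path segments.
    return f"{API_PREFIX}/custom/{quote(title, safe='')}"


path = custom_topic_path("冬季温泉推荐")
# The last path segment decodes back to the original title.
assert unquote(path.rsplit("/", 1)[1]) == "冬季温泉推荐"
```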

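## 3. Data Models

### HotTopic

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# HotTopicSource and HotTopicCategory are defined elsewhere in the module.


@dataclass
class HotTopic:
    title: str                      # Topic title
    source: HotTopicSource          # Source (weibo/baidu/xhs/calendar/custom)
    rank: Optional[int]             # Rank
    heat: Optional[int]             # Heat score
    category: HotTopicCategory      # Category (see HotTopicCategory)
    url: Optional[str]              # Original link
    description: Optional[str]      # Description
    tags: List[str]                 # Tags
    fetched_at: datetime            # Time the topic was fetched
    expires_at: Optional[datetime]  # Expiry time
```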

### HotTopicCategory

- `travel` - Travel-related
- `food` - Food
- `festival` - Festivals and solar terms
- `event` - Trending events
- `trending` - Trending topics
- `season` - Seasonal
- `other` - Other
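
The category values above map naturally onto a string-valued `Enum`. The following is a sketch only: the member names are assumptions, and the actual definition in the module may differ. `is_expired` is a hypothetical helper showing one way to interpret `expires_at`, assuming `None` means the topic never expires.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class HotTopicCategory(str, Enum):
    # Values copied from the list above; member names are assumed.
    TRAVEL = "travel"
    FOOD = "food"
    FESTIVAL = "festival"
    EVENT = "event"
    TRENDING = "trending"
    SEASON = "season"
    OTHER = "other"


def is_expired(expires_at: Optional[datetime], now: Optional[datetime] = None) -> bool:
    # Assumption: a missing expiry time means the topic never expires.
    if expires_at is None:
        return False
    return (now or datetime.now()) >= expires_at


fetched = datetime(2024, 1, 1, 12, 0)
print(is_expired(fetched + timedelta(minutes=5), now=fetched))  # → False
```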

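## 4. Caching Strategy

| Source | Cache TTL | Notes |
|--------|-----------|-------|
| Weibo | 5 minutes | Highly time-sensitive |
| Baidu | 10 minutes | Highly time-sensitive |
| Xiaohongshu | 30 minutes | Preset topics |
| Calendar | 1 hour | Static data |
| Custom | 24 hours | Manually managed |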
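
The per-source lifetimes above can be expressed as a lookup table plus a freshness check. This is a sketch under stated assumptions, not the manager's actual code: the dictionary keys and the `is_fresh` helper are hypothetical, and it assumes that the `force` query parameter of `/all` bypasses the cache.

```python
from datetime import datetime, timedelta
from typing import Optional

# TTLs copied from the table above; the keys are hypothetical source identifiers.
CACHE_TTL = {
    "weibo": timedelta(minutes=5),
    "baidu": timedelta(minutes=10),
    "xiaohongshu": timedelta(minutes=30),
    "calendar": timedelta(hours=1),
    "custom": timedelta(hours=24),
}


def is_fresh(
    source: str,
    fetched_at: datetime,
    now: Optional[datetime] = None,
    force: bool = False,
) -> bool:
    # Assumption: force=True always triggers a re-fetch, regardless of age.
    if force:
        return False
    now = now or datetime.now()
    # An entry is fresh while its age is strictly below the source's TTL.
    return now - fetched_at < CACHE_TTL[source]
```

A caller would re-run the crawler for a source whenever `is_fresh` returns `False` and serve the cached topics otherwise.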

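## 5. Baidu Boards

The Baidu crawler supports multiple boards:

```python
# Default: real-time hotspots + travel board
crawler = BaiduCrawler()  # tabs=['realtime', 'travel']

# Custom boards
crawler = BaiduCrawler(tabs=['realtime', 'travel', 'movie'])
```

**Supported boards**:
- `realtime` - Real-time hotspots (50 entries)
- `travel` - Travel board (30 entries) ⭐
- `movie` - Movie board
- `teleplay` - TV series board
- `novel` - Novel board
- `car` - Car board
- `game` - Game board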
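
The following sketch shows how a `tabs` argument could be validated and turned into one request URL per board. It does not reflect the actual `BaiduCrawler` implementation: `board_urls` is a hypothetical helper, and the base URL is an assumption that should be checked against `crawlers/baidu.py`.

```python
from typing import List, Optional
from urllib.parse import urlencode

SUPPORTED_TABS = ("realtime", "travel", "movie", "teleplay", "novel", "car", "game")
DEFAULT_TABS = ("realtime", "travel")
# Assumption: each board is served from this URL with a `tab` query parameter.
BOARD_URL = "https://top.baidu.com/board"


def board_urls(tabs: Optional[List[str]] = None) -> List[str]:
    # Fall back to the documented defaults when no tabs are given.
    selected = list(tabs) if tabs else list(DEFAULT_TABS)
    unknown = [t for t in selected if t not in SUPPORTED_TABS]
    if unknown:
        # Fail early rather than requesting a board that does not exist.
        raise ValueError(f"unsupported tabs: {unknown}")
    return [BOARD_URL + "?" + urlencode({"tab": t}) for t in selected]
```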

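## 6. Xiaohongshu Module

Real-time data is fetched through the MediaCrawler project, which requires a QR-code login.

### 6.1 Using the crawler

```python
from domain.hotspot.crawlers import XiaohongshuCrawler

# Run the calls below inside an async function (or an asyncio REPL).
crawler = XiaohongshuCrawler()

# QR-code login (required the first time; the Cookie is cached afterwards)
await crawler.login()

# Fetch trending topics
topics = await crawler.fetch()

# Search notes ("旅游攻略" = travel guides)
notes = await crawler.search_notes("旅游攻略", page_size=20)

# Fetch note details
detail = await crawler.get_note_detail("note_id")
```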

### 6.2 Using the bridge directly

```python
from domain.hotspot.crawlers.mediacrawler import get_xhs_bridge

bridge = get_xhs_bridge()

# Log in
await bridge.login()

# Search notes ("旅游攻略" = travel guides)
notes = await bridge.search_notes("旅游攻略", page_size=20)

# Fetch note details
detail = await bridge.get_note_detail("note_id")
```

### 6.3 The MediaCrawler project

Location: `libs/MediaCrawler/`

Source: https://github.com/NanmiCoder/MediaCrawler

Supported platforms: Xiaohongshu, Douyin, Weibo, Bilibili, Kuaishou, Zhihu, Tieba

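### 6.4 Search keywords

By default, the crawler searches the following culture and tourism keywords:

```python
SEARCH_KEYWORDS = [
    # travel guides, weekend outings, family trips, self-drive routes
    "旅游攻略", "周末去哪玩", "亲子游推荐", "自驾游路线",
    # trendy photo spots, lesser-known sights, hotel picks, homestay picks
    "网红打卡地", "小众景点", "酒店推荐", "民宿推荐",
    # winter travel, ski guides, hot-spring getaways
    "冬季旅行", "滑雪攻略", "温泉度假",
]
```

The keywords can be customized:

```python
# "三亚旅游" = Sanya travel, "哈尔滨冰雪" = Harbin ice and snow
crawler = XiaohongshuCrawler(keywords=["三亚旅游", "哈尔滨冰雪"])
```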

## 7. Usage Examples

### Calling from Python

```python
from domain.hotspot import get_hotspot_manager

async def get_travel_hotspots():
    manager = get_hotspot_manager()

    # Fetch travel-related hotspots
    topics = await manager.get_travel_topics(limit=10)

    for topic in topics:
        print(f"[{topic.source.value}] {topic.title}")

    return topics
```

### Integration with content generation

```python
# Use hotspots during content generation
from domain.hotspot import get_hotspot_manager

async def generate_with_hotspots():
    manager = get_hotspot_manager()

    # Fetch hotspots
    topics = await manager.get_travel_topics(limit=5)
    hot_topics = [t.title for t in topics]

    # Parameters to pass to the content generation engine
    params = {
        "subject": {...},
        "hot_topics": {
            "events": hot_topics[:2],
            "festivals": ["元旦", "圣诞"],  # New Year's Day, Christmas
            "trending": hot_topics[2:],
        }
    }
    return params
```

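## 8. Planned Improvements

1. **Weibo crawler**: requires a Cookie or Playwright
2. **Douyin hot search**: to be added
3. **MediaCrawler integration**: fetch real-time Xiaohongshu data
4. **Scheduled updates**: refresh the cache periodically in the background
5. **Persistence**: store hotspot data in a database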
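
For the scheduled-update item, one possible shape is a plain `asyncio` background task. The sketch below is not part of the module: `refresh_periodically` is a hypothetical helper, and the `refresh` callable stands in for whichever manager call repopulates the cache, which this document does not show.

```python
import asyncio
from typing import Awaitable, Callable


async def refresh_periodically(
    refresh: Callable[[], Awaitable[None]],
    interval_seconds: float,
    stop: asyncio.Event,
) -> None:
    while not stop.is_set():
        try:
            await refresh()
        except Exception as exc:
            # Keep the loop alive if a single refresh fails.
            print(f"hotspot refresh failed: {exc!r}")
        try:
            # Sleep until the next round, but wake up early once stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass
```

The task can be started with `asyncio.create_task(...)` when the application boots and stopped cleanly by setting the event on shutdown. Choosing an interval no longer than the shortest cache lifetime keeps the most time-sensitive source from going stale between rounds.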