Compare commits
2 Commits
Put into the code folder
...
main

| Author | SHA1 | Date |
|---|---|---|
| | 7eec301831 | |
| | 8f021a93d3 | |

164 code/README_CnOCR.md Normal file
@@ -0,0 +1,164 @@
# Video Subtitle OCR Extractor - CnOCR Integration

## Overview

The subtitle extractor now supports three OCR engines:

- **PaddleOCR**: Baidu's open-source OCR engine
- **EasyOCR**: a lightweight OCR engine
- **CnOCR**: an OCR engine specialized for Chinese (new)

## CnOCR Installation and Configuration

### 1. Automatic installation (recommended)

```bash
cd code
python install_cnocr.py
```

### 2. Manual installation

```bash
# Install CnOCR
pip install cnocr[ort-cpu] -i https://pypi.tuna.tsinghua.edu.cn/simple

# Create the model directory
mkdir -p /root/autodl-tmp/llm/cnocr

# Set the environment variable
export CNOCR_HOME=/root/autodl-tmp/llm/cnocr
```
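The `export` above only affects the current shell. When launching from Python directly (for example from a notebook or a service), the variable can instead be set in-process, assuming CnOCR honors `CNOCR_HOME` when it loads its models, as the manual installation step suggests. A minimal sketch:

```python
import os

# Point CnOCR's model cache at the data disk. This must run before
# cnocr is imported/initialized, or its default cache location is used.
os.environ["CNOCR_HOME"] = "/root/autodl-tmp/llm/cnocr"
```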
## Usage

### 1. Using CnOCR alone

```bash
python ocr_subtitle_extractor.py your_video.mp4 -e cnocr
```

### 2. Using all OCR engines

```bash
python ocr_subtitle_extractor.py your_video.mp4 -e all
```

### 3. Full parameter example

```bash
python ocr_subtitle_extractor.py your_video.mp4 \
    -e cnocr \
    -l ch \
    -i 30 \
    -c 0.5 \
    -o results \
    -f json \
    --position bottom
```

## Parameters

- `-e, --engine`: OCR engine
  - `paddleocr`: PaddleOCR only
  - `easyocr`: EasyOCR only
  - `cnocr`: CnOCR only (new)
  - `all`: all three engines

- `-l, --language`: language setting
  - `ch`: Chinese
  - `en`: English
  - `ch_en`: mixed Chinese and English

- `-i, --interval`: frame sampling interval (default: 30 frames)
- `-c, --confidence`: confidence threshold (default: 0.5)
- `-o, --output`: output directory
- `-f, --format`: output format (json/txt/srt)
- `--position`: subtitle region (full/center/bottom)
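The parameter list above maps directly onto an `argparse` declaration. This is a hedged sketch of how the script might define its CLI; the defaults for `--engine`, `--output`, and `--position` are assumptions, and the real `ocr_subtitle_extractor.py` may differ:

```python
import argparse

# Sketch of the CLI documented above (defaults partly assumed).
parser = argparse.ArgumentParser(description="Video subtitle OCR extractor")
parser.add_argument("video", help="input video file")
parser.add_argument("-e", "--engine", default="paddleocr",
                    choices=["paddleocr", "easyocr", "cnocr", "all"])
parser.add_argument("-l", "--language", default="ch",
                    choices=["ch", "en", "ch_en"])
parser.add_argument("-i", "--interval", type=int, default=30)
parser.add_argument("-c", "--confidence", type=float, default=0.5)
parser.add_argument("-o", "--output", default="results")
parser.add_argument("-f", "--format", default="json",
                    choices=["json", "txt", "srt"])
parser.add_argument("--position", default="bottom",
                    choices=["full", "center", "bottom"])

# Parse a sample command line instead of sys.argv, for illustration.
args = parser.parse_args(["your_video.mp4", "-e", "cnocr", "-c", "0.5"])
```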
## CnOCR Highlights

1. **Optimized for Chinese**: better recognition quality on Chinese text
2. **Lightweight**: smaller models and faster inference
3. **Easy to deploy**: simple installation with few dependencies
4. **Multiple models**: supports a range of detection and recognition models

## Testing the CnOCR Integration

```bash
python test_cnocr.py
```

This script will:

1. Test the CnOCR installation
2. Test model downloading
3. Test the subtitle extractor integration
4. Report the test results

## Model Storage Location

All CnOCR model files are downloaded to:

```
/root/autodl-tmp/llm/cnocr/
```

The required models are downloaded automatically on first use; please be patient.

## Output Format

When CnOCR is used, the `engine` field of each recognition result is set to `"CnOCR"`, which makes it easy to tell the engines' results apart.
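Because every entry carries this `engine` tag, the results of a multi-engine run (`-e all`) can be separated with a plain filter. A sketch over the subtitle-entry shape shown in the sample output section (the entries here are made-up illustration data):

```python
# Entries follow the subtitle shape documented in this README.
subtitles = [
    {"text": "这是一个测试字幕", "confidence": 0.95, "engine": "CnOCR"},
    {"text": "示例", "confidence": 0.60, "engine": "EasyOCR"},
]

# Keep only the CnOCR results.
cnocr_results = [s for s in subtitles if s["engine"] == "CnOCR"]
```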
## Performance Comparison

| Engine | Chinese | English | Speed | Model size |
|-----------|-----------|-----------|--------|--------|
| PaddleOCR | Excellent | Excellent | Medium | Large |
| EasyOCR | Good | Excellent | Slower | Large |
| CnOCR | Excellent | Good | Faster | Medium |

## Troubleshooting

### 1. Installation fails

```bash
# Upgrade pip
pip install --upgrade pip

# Use a domestic mirror
pip install cnocr[ort-cpu] -i https://pypi.tuna.tsinghua.edu.cn/simple
```

### 2. Model download fails

```bash
# Check the network connection
# Make sure there is enough disk space
# Re-run the installation script
python install_cnocr.py
```

### 3. Environment variable problems

```bash
# Add at the top of your script
export CNOCR_HOME=/root/autodl-tmp/llm/cnocr
```

## Sample Output

```json
{
  "video_path": "test_video.mp4",
  "subtitles": [
    {
      "timestamp": 1.5,
      "text": "这是一个测试字幕",
      "confidence": 0.95,
      "bbox": [[10, 20], [200, 20], [200, 50], [10, 50]],
      "engine": "CnOCR"
    }
  ],
  "stats": {
    "total_detections": 150,
    "filtered_detections": 120,
    "unique_texts": 50,
    "average_confidence": 0.87
  }
}
```
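For the `srt` output format, each entry's `timestamp` (in seconds) has to become an SRT timecode. The extractor's own SRT writer is not shown in this README; `to_srt_time` below is a hypothetical helper sketching the conversion:

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timecode, e.g. 1.5 -> '00:00:01,500'."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)   # hours
    m, ms = divmod(ms, 60_000)      # minutes
    s, ms = divmod(ms, 1000)        # seconds, leftover milliseconds
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
```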
484 code/api_video_with_monitor.py Normal file
@@ -0,0 +1,484 @@
#!/usr/bin/env python3
from openai import OpenAI
import os
import base64
import time
import psutil
import subprocess
from datetime import datetime

class MemoryMonitor:
    def __init__(self):
        self.checkpoints = []
        self.initial_memory = self.get_memory_info()

    def get_memory_info(self):
        """Return the current memory usage."""
        memory = psutil.virtual_memory()
        gpu_info = self.get_gpu_memory()
        process = psutil.Process()
        memory_info = process.memory_info()

        return {
            "timestamp": datetime.now().isoformat(),
            "system_memory_gb": memory.used / 1024**3,
            "system_memory_percent": memory.percent,
            "gpu_memory": gpu_info,
            "process_memory_mb": memory_info.rss / 1024 / 1024
        }

    def get_gpu_memory(self):
        """Return GPU memory usage as reported by nvidia-smi."""
        try:
            result = subprocess.run(['nvidia-smi', '--query-gpu=memory.total,memory.used,memory.free',
                                     '--format=csv,noheader,nounits'],
                                    capture_output=True, text=True, check=True)
            lines = result.stdout.strip().split('\n')
            gpu_info = []
            for i, line in enumerate(lines):
                parts = line.split(', ')
                if len(parts) == 3:
                    total, used, free = map(int, parts)
                    gpu_info.append({
                        "gpu_id": i,
                        "total_mb": total,
                        "used_mb": used,
                        "free_mb": free,
                        "usage_percent": round(used / total * 100, 2)
                    })
            return gpu_info
        except (subprocess.CalledProcessError, FileNotFoundError, ValueError):
            # nvidia-smi is missing or its output could not be parsed
            return []

    def checkpoint(self, name=""):
        """Record a memory checkpoint."""
        current_memory = self.get_memory_info()

        if self.checkpoints:
            last_memory = self.checkpoints[-1]["memory"]
            memory_diff = {
                "system_memory_gb": current_memory["system_memory_gb"] - last_memory["system_memory_gb"],
                "process_memory_mb": current_memory["process_memory_mb"] - last_memory["process_memory_mb"],
            }

            # Per-GPU memory delta
            gpu_diff = []
            if current_memory["gpu_memory"] and last_memory["gpu_memory"]:
                for i in range(min(len(current_memory["gpu_memory"]), len(last_memory["gpu_memory"]))):
                    current_gpu = current_memory["gpu_memory"][i]["used_mb"]
                    last_gpu = last_memory["gpu_memory"][i]["used_mb"]
                    gpu_diff.append({
                        "gpu_id": i,
                        "used_mb_diff": current_gpu - last_gpu
                    })
            memory_diff["gpu_memory"] = gpu_diff
        else:
            memory_diff = None

        checkpoint = {
            "name": name,
            "memory": current_memory,
            "memory_diff": memory_diff
        }

        self.checkpoints.append(checkpoint)
        return checkpoint

    def check_memory_risk(self):
        """Classify the current memory pressure as low/medium/high."""
        current = self.get_memory_info()

        # System memory risk
        sys_risk = "low"
        if current["system_memory_percent"] > 90:
            sys_risk = "high"
        elif current["system_memory_percent"] > 80:
            sys_risk = "medium"

        # GPU memory risk
        gpu_risk = "low"
        if current["gpu_memory"]:
            max_gpu_usage = max(gpu["usage_percent"] for gpu in current["gpu_memory"])
            if max_gpu_usage > 95:
                gpu_risk = "high"
            elif max_gpu_usage > 85:
                gpu_risk = "medium"

        return {
            "system_risk": sys_risk,
            "gpu_risk": gpu_risk,
            "current_memory": current
        }

    def print_memory_status(self, title=""):
        """Print the current memory status."""
        current = self.get_memory_info()
        risk = self.check_memory_risk()

        print(f"\n{'='*50}")
        print(f"🔍 {title if title else 'Memory status check'}")
        print(f"{'='*50}")

        # System memory
        risk_icon = {"low": "✅", "medium": "⚠️", "high": "🚨"}[risk["system_risk"]]
        print(f"💾 System memory: {current['system_memory_gb']:.1f} GB ({current['system_memory_percent']:.1f}%) {risk_icon}")

        # GPU memory
        if current["gpu_memory"]:
            risk_icon = {"low": "✅", "medium": "⚠️", "high": "🚨"}[risk["gpu_risk"]]
            for gpu in current["gpu_memory"]:
                print(f"🎮 GPU {gpu['gpu_id']}: {gpu['used_mb']:.0f}/{gpu['total_mb']:.0f} MB ({gpu['usage_percent']:.1f}%) {risk_icon}")

        # Process memory
        print(f"🔧 Current process: {current['process_memory_mb']:.1f} MB")

        return risk
def analyze_file_sizes(video_path, audio_path=None, txt_content=""):
    """Analyze file sizes and estimate memory usage."""
    print(f"\n{'='*50}")
    print("📊 File size analysis")
    print(f"{'='*50}")

    total_estimated_mb = 0
    warnings = []

    # Video file
    if os.path.exists(video_path):
        video_size = os.path.getsize(video_path)
        video_size_mb = video_size / 1024 / 1024
        base64_size_mb = video_size_mb * 1.33  # Base64 adds roughly 33%
        memory_estimate_mb = base64_size_mb * 2  # encoding needs about twice that

        print(f"🎥 Video file: {os.path.basename(video_path)}")
        print(f"   Original size: {video_size_mb:.2f} MB")
        print(f"   After Base64: {base64_size_mb:.2f} MB")
        print(f"   Memory estimate: {memory_estimate_mb:.2f} MB")

        total_estimated_mb += memory_estimate_mb

        if base64_size_mb > 100:
            warnings.append("Video file is too large (>100 MB after Base64)")
        elif base64_size_mb > 50:
            warnings.append("Video file is large (>50 MB after Base64)")

    # Audio file
    if audio_path and os.path.exists(audio_path):
        audio_size = os.path.getsize(audio_path)
        audio_size_mb = audio_size / 1024 / 1024
        base64_size_mb = audio_size_mb * 1.33
        memory_estimate_mb = base64_size_mb * 2

        print(f"\n🎵 Audio file: {os.path.basename(audio_path)}")
        print(f"   Original size: {audio_size_mb:.2f} MB")
        print(f"   After Base64: {base64_size_mb:.2f} MB")
        print(f"   Memory estimate: {memory_estimate_mb:.2f} MB")

        total_estimated_mb += memory_estimate_mb

        if base64_size_mb > 50:
            warnings.append("Audio file is too large (>50 MB after Base64)")

    # Text content
    if txt_content:
        text_size_mb = len(txt_content.encode('utf-8')) / 1024 / 1024
        print(f"\n📝 Text content: {len(txt_content)} characters ({text_size_mb:.3f} MB)")
        total_estimated_mb += text_size_mb

        if len(txt_content) > 50000:
            warnings.append("Text content is too long (>50k characters)")

    print(f"\n📋 Total estimated memory: {total_estimated_mb:.2f} MB")

    # Risk assessment
    if total_estimated_mb > 500:
        print("🚨 High risk: payload is very large; compressing or splitting it is strongly recommended")
        warnings.append("Total memory usage is too high (>500 MB)")
    elif total_estimated_mb > 200:
        print("⚠️ Medium risk: monitor memory usage")
        warnings.append("Total memory usage is high (>200 MB)")
    else:
        print("✅ Low risk: memory usage is within an acceptable range")

    return total_estimated_mb, warnings
# Base64 encoding helpers
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")


def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")


def read_txt_file(txt_path):
    """Read the content of a txt file."""
    try:
        with open(txt_path, 'r', encoding='utf-8') as file:
            content = file.read()
        print(f"Successfully read txt file: {txt_path}")
        print(f"Content length: {len(content)} characters")
        return content
    except FileNotFoundError:
        print(f"Error: file not found: {txt_path}")
        return ""
    except Exception as e:
        print(f"Error while reading file: {e}")
        return ""


def save_result_to_txt(response_text, video_path, save_dir="results"):
    """Save the analysis result as a TXT file."""
    os.makedirs(save_dir, exist_ok=True)

    video_name = os.path.splitext(os.path.basename(video_path))[0]
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    txt_filename = f"{video_name}_analysis_{timestamp}.txt"
    txt_path = os.path.join(save_dir, txt_filename)

    content = f"""Video analysis result
=====================================
Video file: {video_path}
Analyzed at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
=====================================

{response_text}
"""

    try:
        with open(txt_path, 'w', encoding='utf-8') as f:
            f.write(content)
        print(f"\n✅ Analysis result saved to: {txt_path}")
        return txt_path
    except Exception as e:
        print(f"\n❌ Failed to save TXT file: {e}")
        return None
# Initialize the memory monitor
monitor = MemoryMonitor()

STREAM_MODE = True

# File paths
video_path = "/root/autodl-tmp/video2audio/sample_demo_6.mp4"
audio_path = "/root/autodl-tmp/video2audio/sample_demo_6.wav"
#txt_path = "/root/autodl-tmp/hot_video_analyse/source/example_reference.txt"

# Initial memory check
monitor.checkpoint("program start")
monitor.print_memory_status("Memory status at program start")

# Analyze file sizes and estimate memory usage
txt_content = ""
estimated_memory, warnings = analyze_file_sizes(video_path, txt_content=txt_content)

# If there are warnings, list them before continuing
if warnings:
    print(f"\n⚠️ Potential problems found:")
    for warning in warnings:
        print(f"   - {warning}")
    print(f"\nSuggestions:")
    print(f"   - Use a smaller test file")
    print(f"   - Monitor memory usage")
    print(f"   - If errors occur, try compressing the files")

# Memory check before encoding
monitor.checkpoint("before encoding")
risk_before = monitor.check_memory_risk()

if risk_before["system_risk"] == "high" or risk_before["gpu_risk"] == "high":
    print(f"\n🚨 Warning: memory usage is already very high; continuing may cause an out-of-memory error!")
    print(f"   System memory risk: {risk_before['system_risk']}")
    print(f"   GPU memory risk: {risk_before['gpu_risk']}")

print("\nEncoding files...")
encode_start_time = time.time()

try:
    base64_video = encode_video(video_path)
    print(f"✅ Video encoded")
    base64_audio = encode_audio(audio_path)
    print(f"✅ Audio encoded")
except Exception as e:
    print(f"❌ Encoding failed: {e}")
    monitor.print_memory_status("Memory status at encoding failure")
    exit(1)

# Memory check after encoding
monitor.checkpoint("encoding done")
encode_end_time = time.time()
encode_duration = encode_end_time - encode_start_time

print(f"📁 Files encoded in {encode_duration:.2f} s")

# Report memory change during encoding
last_checkpoint = monitor.checkpoints[-1]
if last_checkpoint["memory_diff"]:
    diff = last_checkpoint["memory_diff"]
    print(f"📊 Memory change during encoding:")
    print(f"   Process memory delta: {diff['process_memory_mb']:+.1f} MB")
    if diff["gpu_memory"]:
        for gpu_diff in diff["gpu_memory"]:
            print(f"   GPU {gpu_diff['gpu_id']} memory delta: {gpu_diff['used_mb_diff']:+.0f} MB")

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)
# Build the content list; each modality must be its own entry,
# since a single dict cannot hold two "type" keys.
content_list = [
    {
        "type": "video_url",
        "video_url": {"url": f"data:video/mp4;base64,{base64_video}"},
    },
    {
        "type": "audio_url",
        "audio_url": {"url": f"data:audio/wav;base64,{base64_audio}"},
    },
]

# If the txt file has content, append it
if txt_content.strip():
    content_list.append({
        "type": "text",
        "text": f"Reference document:\n{txt_content}\n\n"
    })

# Append the main prompt (kept short to reduce memory usage)
content_list.append({
    "type": "text",
    "text": """Please analyze this Douyin short video:

1. **Spoken narration**: transcribe the speech in the video
2. **On-screen text**: identify captions and subtitles in the frame
3. **Hook analysis**: analyze the opening hook strategy

Output the result as JSON:
{
    "narration_analysis": {"has_narration": "", "narration_content": "", "speech_duration": ""},
    "subtitle_analysis": {"has_subtitles": "", "subtitle_content": "", "subtitle_position": ""},
    "hook_analysis": {"hook_type": "", "hook_formula": "", "hook_content": ""}
}"""
})
# Memory check before the API request
monitor.checkpoint("before API request")
monitor.print_memory_status("Memory status before API request")

print(f"\n🚀 Sending API request...")
print(f"📅 Request time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"🔄 Stream mode: {STREAM_MODE}")
print(f"📋 Content items: {len(content_list)}")

# Estimate the request size
total_request_size = sum(len(str(content)) for content in content_list)
print(f"📏 Total request size: {total_request_size/1024/1024:.2f} MB")

api_start_time = time.time()

try:
    completion = client.chat.completions.create(
        model="/root/autodl-tmp/llm/Qwen-omni",
        messages=[
            {
                "role": "system",
                "content": [{"type": "text", "text": "You are a helpful assistant."}]
            },
            {
                "role": "user",
                "content": content_list
            }
        ],
        stream=STREAM_MODE,
        stream_options={"include_usage": True} if STREAM_MODE else None,
        max_tokens=1024,  # cap the output length to save memory
    )

    if STREAM_MODE:
        full_response = ""
        usage_info = None
        first_token_time = None
        token_count = 0

        print("✨ Generating response...")
        for chunk in completion:
            if chunk.choices:
                delta = chunk.choices[0].delta
                if delta.content:
                    if first_token_time is None:
                        first_token_time = time.time()
                        first_token_delay = first_token_time - api_start_time
                        print(f"🚀 First token latency: {first_token_delay:.2f} s")

                    full_response += delta.content
                    token_count += 1
            else:
                usage_info = chunk.usage

        api_end_time = time.time()
        total_duration = api_end_time - api_start_time

        print("\n" + "="*50)
        print("📝 Full response:")
        print("="*50)
        print(full_response)

        # Save the result as a TXT file
        txt_file_path = save_result_to_txt(full_response, video_path)

        # Memory check after the API call completes
        monitor.checkpoint("API done")

        # Timing statistics
        print("\n" + "="*50)
        print("⏱️ Timing statistics:")
        print("="*50)
        print(f"📁 File encoding time: {encode_duration:.2f} s")
        if first_token_time:
            print(f"🚀 First token latency: {first_token_delay:.2f} s")
            generation_time = api_end_time - first_token_time
            print(f"⚡ Generation time: {generation_time:.2f} s")
        print(f"🕐 Total API response time: {total_duration:.2f} s")
        print(f"📊 Tokens generated: {token_count}")
        if first_token_time and token_count > 0:
            tokens_per_second = token_count / generation_time
            print(f"🔥 Generation speed: {tokens_per_second:.2f} tokens/s")
        print(f"⏰ Finished at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

        if usage_info:
            print(f"\n📈 Usage: {usage_info}")

except Exception as e:
    print(f"\n❌ API request failed!")
    print(f"Error type: {type(e)}")
    print(f"Error message: {e}")

    # Memory check on error
    monitor.checkpoint("API error")
    monitor.print_memory_status("Memory status at API error")

    # Analyze likely causes
    if "Internal Server Error" in str(e) or "OutOfMemoryError" in str(e):
        print(f"\n💡 Possible out-of-memory causes:")
        print(f"   - Video file too large ({estimated_memory:.1f} MB)")
        print(f"   - Insufficient GPU memory")
        print(f"   - Insufficient system memory")
        print(f"\nSuggested fixes:")
        print(f"   - Use a smaller video file")
        print(f"   - Restart the vLLM service to free GPU memory")
        print(f"   - Lower the max_tokens limit")

# Final memory report
print(f"\n{'='*60}")
print("📊 Final memory usage report")
print(f"{'='*60}")

for i, checkpoint in enumerate(monitor.checkpoints):
    print(f"{i+1}. {checkpoint['name']}")
    if checkpoint['memory_diff']:
        diff = checkpoint['memory_diff']
        if abs(diff['process_memory_mb']) > 10:  # show only significant changes
            print(f"   Process memory delta: {diff['process_memory_mb']:+.1f} MB")
        if diff['gpu_memory']:
            for gpu_diff in diff['gpu_memory']:
                if abs(gpu_diff['used_mb_diff']) > 50:  # show only significant changes
                    print(f"   GPU {gpu_diff['gpu_id']} delta: {gpu_diff['used_mb_diff']:+.0f} MB")

monitor.print_memory_status("Memory status at program end")
337 code/batch_subtitle_extractor.py Normal file
@@ -0,0 +1,337 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Batch video subtitle extractor

Processes multiple video files in one run and extracts their subtitles.
Supports the PaddleOCR, EasyOCR and CnOCR engines.
"""

import os
import sys
import time
import json
import argparse
from pathlib import Path
from datetime import datetime
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

# Add the current directory to the path so the OCR module can be imported
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from ocr_subtitle_extractor import VideoSubtitleExtractor

# OCR model path environment variables
os.environ['EASYOCR_MODULE_PATH'] = '/root/autodl-tmp/llm/easyocr'
os.environ['CNOCR_HOME'] = '/root/autodl-tmp/llm/cnocr'

# Logging setup
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
class BatchSubtitleExtractor:
    """Batch video subtitle extractor."""

    def __init__(self, ocr_engine="paddleocr", language="ch", max_workers=2):
        """
        Initialize the batch extractor.

        Args:
            ocr_engine: OCR engine ("paddleocr", "easyocr", "cnocr", "all")
            language: language setting ("ch", "en", "ch_en")
            max_workers: maximum number of parallel workers
        """
        self.ocr_engine = ocr_engine
        self.language = language
        self.max_workers = max_workers
        self.extractor = VideoSubtitleExtractor(ocr_engine=ocr_engine, language=language)

    def find_video_files(self, input_dir):
        """Find all video files under a path."""
        video_extensions = ['.mp4', '.avi', '.mov', '.mkv', '.flv', '.wmv', '.webm']
        video_files = []

        input_path = Path(input_dir)

        if input_path.is_file():
            # Single file
            if input_path.suffix.lower() in video_extensions:
                video_files.append(input_path)
        elif input_path.is_dir():
            # All video files in the directory
            for ext in video_extensions:
                video_files.extend(input_path.glob(f"*{ext}"))
                video_files.extend(input_path.glob(f"*{ext.upper()}"))

        return sorted(video_files)

    def extract_single_video(self, video_path, output_dir, **kwargs):
        """
        Process a single video file.

        Args:
            video_path: path of the video file
            output_dir: output directory
            **kwargs: extra options

        Returns:
            dict: processing result
        """
        video_path = Path(video_path)
        video_name = video_path.stem

        logger.info(f"Processing video: {video_path}")
        start_time = time.time()

        try:
            # Extract subtitles
            results = self.extractor.extract_subtitles_from_video(
                str(video_path),
                sample_interval=kwargs.get('interval', 30),
                confidence_threshold=kwargs.get('confidence', 0.5),
                subtitle_position=kwargs.get('position', 'bottom')
            )

            # Save the results
            output_path = Path(output_dir)
            output_path.mkdir(parents=True, exist_ok=True)

            for format_type in kwargs.get('formats', ['json']):
                output_file = output_path / f"{video_name}_subtitles.{format_type}"
                self.extractor.save_results(results, output_file, format_type)

            process_time = time.time() - start_time
            results['process_time'] = process_time
            results['video_path'] = str(video_path)
            results['success'] = True

            # Bounding-box coverage statistics
            subtitles_with_bbox = [s for s in results['subtitles'] if s.get('bbox')]
            bbox_coverage = len(subtitles_with_bbox) / len(results['subtitles']) * 100 if results['subtitles'] else 0

            logger.info(f"Finished video: {video_path} ({process_time:.2f} s)")
            logger.info(f"  Total subtitles: {len(results['subtitles'])}")
            logger.info(f"  With bounding box: {len(subtitles_with_bbox)}")
            logger.info(f"  Bounding-box coverage: {bbox_coverage:.1f}%")

            return {
                'video_path': str(video_path),
                'success': True,
                'process_time': process_time,
                'subtitle_count': results['stats']['filtered_detections'],
                'text_length': results['stats']['text_length'],
                'total_subtitles': len(results['subtitles']),
                'subtitles_with_bbox': len(subtitles_with_bbox),
                'bbox_coverage': bbox_coverage,
                'output_files': [str(output_path / f"{video_name}_subtitles.{fmt}") for fmt in kwargs.get('formats', ['json'])]
            }

        except Exception as e:
            error_msg = f"Error while processing video {video_path}: {str(e)}"
            logger.error(error_msg)

            return {
                'video_path': str(video_path),
                'success': False,
                'error': error_msg,
                'process_time': time.time() - start_time
            }
    def extract_batch(self, input_dir, output_dir, parallel=True, **kwargs):
        """
        Extract subtitles in batch.

        Args:
            input_dir: input directory or file
            output_dir: output directory
            parallel: whether to process in parallel
            **kwargs: extra options

        Returns:
            dict: batch processing result
        """
        logger.info(f"Starting batch subtitle extraction")
        logger.info(f"Input: {input_dir}")
        logger.info(f"Output directory: {output_dir}")
        logger.info(f"OCR engine: {self.ocr_engine}")
        logger.info(f"Subtitle position: {kwargs.get('position', 'bottom')}")
        logger.info(f"Parallel: {parallel}")

        start_time = time.time()

        # Find video files
        video_files = self.find_video_files(input_dir)

        if not video_files:
            logger.warning(f"No video files found in {input_dir}")
            return {
                'success': False,
                'message': 'No video files found',
                'total_files': 0,
                'results': []
            }

        logger.info(f"Found {len(video_files)} video files")

        results = []

        if parallel and len(video_files) > 1:
            # Parallel processing
            logger.info(f"Using {self.max_workers} parallel workers")

            with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
                # Submit the tasks
                future_to_video = {
                    executor.submit(self.extract_single_video, video_file, output_dir, **kwargs): video_file
                    for video_file in video_files
                }

                # Collect results as they complete
                for future in as_completed(future_to_video):
                    video_file = future_to_video[future]
                    try:
                        result = future.result()
                        results.append(result)

                        # Progress
                        progress = len(results) / len(video_files) * 100
                        logger.info(f"Batch progress: {progress:.1f}% ({len(results)}/{len(video_files)})")

                    except Exception as e:
                        logger.error(f"Exception while processing video {video_file}: {str(e)}")
                        results.append({
                            'video_path': str(video_file),
                            'success': False,
                            'error': str(e)
                        })
        else:
            # Sequential processing
            for i, video_file in enumerate(video_files, 1):
                logger.info(f"Processing video {i}/{len(video_files)}")
                result = self.extract_single_video(video_file, output_dir, **kwargs)
                results.append(result)

                # Progress
                progress = i / len(video_files) * 100
                logger.info(f"Batch progress: {progress:.1f}%")

        total_time = time.time() - start_time

        # Aggregate statistics
        success_count = sum(1 for r in results if r['success'])
        failed_count = len(results) - success_count

        total_subtitles = sum(r.get('subtitle_count', 0) for r in results if r['success'])
        total_text_length = sum(r.get('text_length', 0) for r in results if r['success'])

        # Bounding-box statistics
        total_subtitles_raw = sum(r.get('total_subtitles', 0) for r in results if r['success'])
        total_subtitles_with_bbox = sum(r.get('subtitles_with_bbox', 0) for r in results if r['success'])
        overall_bbox_coverage = total_subtitles_with_bbox / total_subtitles_raw * 100 if total_subtitles_raw > 0 else 0

        batch_result = {
            'success': True,
            'total_time': total_time,
            'total_files': len(video_files),
            'success_count': success_count,
            'failed_count': failed_count,
            'total_subtitles': total_subtitles,
            'total_text_length': total_text_length,
            'total_subtitles_raw': total_subtitles_raw,
            'total_subtitles_with_bbox': total_subtitles_with_bbox,
            'overall_bbox_coverage': overall_bbox_coverage,
            'output_directory': output_dir,
            'ocr_engine': self.ocr_engine,
            'timestamp': datetime.now().isoformat(),
            'results': results
        }

        # Save the batch report
        report_file = Path(output_dir) / "batch_report.json"
        with open(report_file, 'w', encoding='utf-8') as f:
            json.dump(batch_result, f, ensure_ascii=False, indent=2)

        logger.info(f"Batch processing finished!")
        logger.info(f"Total files: {len(video_files)}")
        logger.info(f"Succeeded: {success_count}, failed: {failed_count}")
        logger.info(f"Total time: {total_time:.2f} s")
        logger.info(f"Subtitles extracted: {total_subtitles}")
        logger.info(f"Text length: {total_text_length} characters")
        logger.info(f"Bounding-box statistics:")
        logger.info(f"  Total subtitles: {total_subtitles_raw}")
        logger.info(f"  With bounding box: {total_subtitles_with_bbox}")
        logger.info(f"  Coverage: {overall_bbox_coverage:.1f}%")
        logger.info(f"Report: {report_file}")

        return batch_result
def main():
    """Entry point."""
    parser = argparse.ArgumentParser(description="Batch video subtitle extractor")
    parser.add_argument("input", help="input video file or directory")
    parser.add_argument("-e", "--engine", default="cnocr",
                        choices=["paddleocr", "easyocr", "cnocr", "all"],
                        help="OCR engine (default: cnocr)")
    parser.add_argument("-l", "--language", default="ch",
                        choices=["ch", "en", "ch_en"],
                        help="language setting (default: ch)")
    parser.add_argument("-i", "--interval", type=int, default=30,
                        help="frame sampling interval (default: 30)")
    parser.add_argument("-c", "--confidence", type=float, default=0.5,
                        help="confidence threshold (default: 0.5)")
    parser.add_argument("-o", "--output", default="batch_subtitles",
                        help="output directory (default: batch_subtitles)")
    parser.add_argument("-f", "--formats", nargs='+', default=["json"],
                        choices=["json", "txt", "srt"],
                        help="output formats (default: json)")
    parser.add_argument("--position", default="full",
                        choices=["full", "center", "bottom"],
                        help="subtitle region (full=whole frame, center=0.5-0.8, bottom=0.7-1.0)")
    parser.add_argument("--workers", type=int, default=2,
                        help="number of parallel workers (default: 2)")
    parser.add_argument("--no-parallel", action="store_true",
                        help="disable parallel processing")

    args = parser.parse_args()

    # Create the batch extractor
    batch_extractor = BatchSubtitleExtractor(
        ocr_engine=args.engine,
        language=args.language,
        max_workers=args.workers
    )

    try:
        # Run the batch extraction
        result = batch_extractor.extract_batch(
            input_dir=args.input,
            output_dir=args.output,
            parallel=not args.no_parallel,
            interval=args.interval,
            confidence=args.confidence,
            formats=args.formats,
            position=args.position
        )

        if result['success']:
            print(f"\n✅ Batch subtitle extraction finished!")
            print(f"📁 Output directory: {args.output}")
            print(f"📊 Processed successfully: {result['success_count']}/{result['total_files']} videos")
            if result['failed_count'] > 0:
                print(f"❌ Failed: {result['failed_count']}")
            print(f"⏱️ Total time: {result['total_time']:.2f} s")
            print(f"📝 Subtitle segments: {result['total_subtitles']}")
            print(f"📏 Text length: {result['total_text_length']} characters")
            print(f"📍 Bounding-box statistics:")
            print(f"   Total subtitles: {result['total_subtitles_raw']}")
            print(f"   With bounding box: {result['total_subtitles_with_bbox']}")
            print(f"   Coverage: {result['overall_bbox_coverage']:.1f}%")
        else:
            print(f"\n❌ Batch processing failed: {result.get('message', 'unknown error')}")

    except Exception as e:
        logger.error(f"Batch processing error: {str(e)}")
        print(f"\n❌ Batch processing error: {str(e)}")


if __name__ == "__main__":
    main()
545 code/copy_video.py Normal file
@@ -0,0 +1,545 @@
from openai import OpenAI
import os
import base64
import json
import time
from datetime import datetime


# Base64 encoding helpers
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")


def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")


def read_txt_file(txt_path):
    """Read the content of a txt file."""
    try:
        with open(txt_path, 'r', encoding='utf-8') as file:
            content = file.read()
        print(f"Successfully read txt file: {txt_path}")
        print(f"Content length: {len(content)} characters")
        return content
    except FileNotFoundError:
        print(f"Error: file not found: {txt_path}")
        return ""
    except Exception as e:
        print(f"Error while reading file: {e}")
        return ""


def read_json_file(json_path):
    """Read the content of a JSON file."""
    try:
        with open(json_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
        print(f"Successfully read JSON file: {json_path}")
        return data
    except FileNotFoundError:
        print(f"Error: file not found: {json_path}")
        return None
    except json.JSONDecodeError as e:
        print(f"JSON parse error: {e}")
        return None
    except Exception as e:
        print(f"Error while reading JSON file: {e}")
        return None
def format_speech_json(speech_data):
    """Format speech-to-text JSON data (supports the SenseVoice format)."""
    if not speech_data:
        return ""

    formatted_text = "【口播转文字内容】\n"

    if isinstance(speech_data, dict):
        # New SenseVoice format: process raw_result
        if 'raw_result' in speech_data:
            raw_result = speech_data['raw_result']
            if isinstance(raw_result, list) and len(raw_result) > 0:
                # Collect all text fields
                import re
                all_texts = []
                for item in raw_result:
                    if isinstance(item, dict) and 'text' in item:
                        text = item['text']
                        # Strip SenseVoice special tags such as <|zh|>
                        clean_text = re.sub(r'<\|[^|]+\|>', '', text)
                        clean_text = ' '.join(clean_text.split())
                        if clean_text.strip():
                            all_texts.append(clean_text.strip())

                if all_texts:
                    formatted_text += f"完整转录文本: {' '.join(all_texts)}\n"

            # Basic information
            if 'model' in speech_data:
                formatted_text += f"转录模型: {speech_data['model']}\n"
            if 'transcribe_time' in speech_data:
                formatted_text += f"转录耗时: {speech_data['transcribe_time']:.3f}秒\n"
            if 'file_path' in speech_data:
                formatted_text += f"音频文件: {speech_data['file_path']}\n"

        # Old SenseVoice format (kept for compatibility)
        elif 'clean_text' in speech_data:
            formatted_text += f"完整转录文本: {speech_data['clean_text']}\n"
            if 'model' in speech_data:
                formatted_text += f"转录模型: {speech_data['model']}\n"
            if 'transcribe_time' in speech_data:
                formatted_text += f"转录耗时: {speech_data['transcribe_time']:.3f}秒\n"

            # Emotion analysis
            if 'emotions' in speech_data and speech_data['emotions']:
                emotions = [emotion.get('emotion', '') for emotion in speech_data['emotions']]
                formatted_text += f"情绪分析: {', '.join(emotions)}\n"

            # Background audio events
            if 'events' in speech_data and speech_data['events']:
                events = [event.get('event', '') for event in speech_data['events']]
                formatted_text += f"音频事件: {', '.join(events)}\n"

        # Subtitle-extractor format (fallback)
        elif 'continuous_text' in speech_data:
            formatted_text += f"完整文本: {speech_data['continuous_text']}\n"
            if 'stats' in speech_data:
                stats = speech_data['stats']
                formatted_text += f"统计信息: 检测数量{stats.get('filtered_detections', 0)}个,"
                formatted_text += f"平均置信度{stats.get('average_confidence', 0):.3f}\n"

    return formatted_text

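The tag-stripping step above can be checked in isolation; the tag names in this sample string are illustrative, not taken from a real SenseVoice transcript:

```python
import re

raw = "<|zh|><|NEUTRAL|><|Speech|>泼水节 不只 有云南"
clean = re.sub(r'<\|[^|]+\|>', '', raw)   # drop <|...|> markers
clean = ' '.join(clean.split())            # normalize whitespace
assert clean == "泼水节 不只 有云南"
```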
def format_whisper_json(whisper_data):
    """Format Whisper speech-to-text JSON data."""
    if not whisper_data:
        return ""

    formatted_text = "【Whisper口播转文字内容】\n"

    if isinstance(whisper_data, dict):
        # Detailed timeline: list every segment
        if 'segments' in whisper_data and len(whisper_data['segments']) > 0:
            formatted_text += "\n详细时间轴:\n"
            for segment in whisper_data['segments']:
                segment_id = segment.get('id', 0)
                start_time = segment.get('start', 0)
                end_time = segment.get('end', 0)
                text = segment.get('text', '')
                formatted_text += f"  id:{segment_id}, start:{start_time:.2f}, end:{end_time:.2f}, text:{text}\n"

    return formatted_text

def format_ocr_json(ocr_data):
    """Format OCR subtitle JSON data."""
    if not ocr_data:
        return ""

    formatted_text = "【OCR字幕识别内容】\n"

    # Subtitle-extractor output format
    if isinstance(ocr_data, dict):
        # Optionally show the OCR engine
        # if 'ocr_engine' in ocr_data:
        #     formatted_text += f"OCR引擎: {ocr_data['ocr_engine']}\n"

        if 'continuous_text' in ocr_data:
            formatted_text += f"完整字幕文本: {ocr_data['continuous_text']}\n"

        # if 'subtitles' in ocr_data and len(ocr_data['subtitles']) > 0:
        #     formatted_text += "详细字幕时间轴:\n"
        #     for subtitle in ocr_data['subtitles'][:10]:  # only the first 10, to keep it short
        #         timestamp = subtitle.get('timestamp', 0)
        #         text = subtitle.get('text', '')
        #         engine = subtitle.get('engine', '')
        #         confidence = subtitle.get('confidence', 0)
        #         formatted_text += f"  {timestamp:.2f}s [{engine}|{confidence:.3f}]: {text}\n"
        #
        #     if len(ocr_data['subtitles']) > 10:
        #         formatted_text += f"  ... (还有{len(ocr_data['subtitles']) - 10}个字幕片段)\n"

    return formatted_text

def format_clip_json(clip_data):
    """Format scene-transition analysis JSON data."""
    if not clip_data:
        return ""

    formatted_text = "【视频转场分析内容】\n"

    if isinstance(clip_data, dict):
        # Basic video information
        if 'video_name' in clip_data:
            formatted_text += f"视频名称: {clip_data['video_name']}\n"
        if 'analysis_time' in clip_data:
            formatted_text += f"分析时间: {clip_data['analysis_time']}\n"
        if 'total_scenes' in clip_data:
            formatted_text += f"检测到场景数: {clip_data['total_scenes']} 个\n"

        # Detailed scene information
        if 'scenes' in clip_data and len(clip_data['scenes']) > 0:
            formatted_text += "\n详细场景信息:\n"
            for i, scene in enumerate(clip_data['scenes'], 1):
                formatted_text += f"scenes {i}:\n"
                formatted_text += f"  start_time: {scene.get('start_time', 0):.2f}秒\n"
                formatted_text += f"  end_time: {scene.get('end_time', 0):.2f}秒\n"
                formatted_text += f"  duration: {scene.get('duration', 0):.2f}秒\n"
                formatted_text += f"  type: {scene.get('type')}\n"
                formatted_text += "\n"

    return formatted_text

def save_result_to_txt(response_text, video_path, save_dir="/root/autodl-tmp/final_output"):
    """Save the analysis result as a TXT file."""
    # Create the output directory
    os.makedirs(save_dir, exist_ok=True)

    # Build the file name from the video name and a timestamp
    video_name = os.path.splitext(os.path.basename(video_path))[0]
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    txt_filename = f"{video_name}_analysis_{timestamp}.txt"
    txt_path = os.path.join(save_dir, txt_filename)

    # Prepend a header to the saved content
    content = f"""视频分析结果
=====================================
视频文件: {video_path}
分析时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
=====================================

{response_text}
"""

    # Write to disk
    try:
        with open(txt_path, 'w', encoding='utf-8') as f:
            f.write(content)
        print(f"\n✅ 分析结果已保存到: {txt_path}")
        return txt_path
    except Exception as e:
        print(f"\n❌ 保存TXT文件失败: {e}")
        return None

STREAM_MODE = True

# File path configuration
video_path = "/root/autodl-tmp/new/老挝泼水节.mp4"
#audio_path = "/root/autodl-tmp/video2audio/sample_demo_6.wav"
#txt_path = "/root/autodl-tmp/hot_video_analyse/source/example_reference.txt"  # example reference document

# JSON file paths
speech_json_path = "/root/autodl-tmp/new_sensevoice/老挝泼水节_sensevoice.json"  # speech-to-text JSON
ocr_json_path = "/root/autodl-tmp/new_cnocr/老挝泼水节_subtitles.json"  # OCR subtitle JSON
#clip_json_path = "/root/autodl-tmp/02_VideoSplitter/VideoSplitter_output/shou_gonglve_3_scenes.json"
whisper_json_path = "/root/autodl-tmp/new_whisper/老挝泼水节_transcript.json"  # Whisper transcript JSON

# Encode the input files
print("开始编码文件...")
encode_start_time = time.time()

base64_video = encode_video(video_path)
#base64_audio = encode_audio(audio_path)
#txt_content = read_txt_file(txt_path)

# Read the JSON files
print("读取JSON文件...")
speech_data = read_json_file(speech_json_path)
ocr_data = read_json_file(ocr_json_path)
#clip_data = read_json_file(clip_json_path)
whisper_data = read_json_file(whisper_json_path)

# Format the JSON contents
speech_content = format_speech_json(speech_data)
ocr_content = format_ocr_json(ocr_data)
#clip_content = format_clip_json(clip_data)
whisper_content = format_whisper_json(whisper_data)

# Merge the formatted contents
txt_content = ""
# if speech_content:
#     txt_content += speech_content + "\n\n"
if ocr_content:
    txt_content += ocr_content + "\n\n"
# if clip_content:
#     txt_content += clip_content + "\n\n"
if whisper_content:
    txt_content += whisper_content + "\n\n"

print(f"合并后的参考内容长度: {len(txt_content)} 字符")
print(txt_content)
encode_end_time = time.time()
encode_duration = encode_end_time - encode_start_time
print(f"文件编码完成,耗时: {encode_duration:.2f} 秒")

client = OpenAI(
    # If no environment variable is configured, replace the line below with your API key, e.g. api_key="sk-xxx"
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)

# Build the content list
content_list = [
    {
        # When passing a video file directly, set type to video_url
        "type": "video_url",
        "video_url": {"url": f"data:video/mp4;base64,{base64_video}"},
    }
    # ,
    # {
    #     "type": "audio_url",
    #     "audio_url": {"url": f"data:audio/wav;base64,{base64_audio}"},
    # }
]

# If the reference text is non-empty, append it to the content
if txt_content.strip():
    content_list.append({
        "type": "text",
        "text": f"参考资料内容:\n{txt_content}\n\n",
        "need": "第一部分是视频内容,第二部分是视频的字幕时间轴内容,第三部分是口播的字幕时间轴内容"
    })

# Append the main prompt text
content_list.append({
    "type": "text",
    "text": """🎥 **抖音短视频内容分析专家**

## 任务背景
您是一位经验丰富的视频导演和编辑,需要基于以上两个时间轴数据和视频内容,为视频写一个完整、流畅的脚本。
请对这个抖音短视频进行详细的内容分析,重点关注以下三个方面:

## 🎤 一、口播内容提取
请仔细听取视频中的语音内容,完整转录:
- **完整口播转录**:参考口播的字幕时间轴内容和视频内容,逐字逐句转录所有口语表达
- **语音时长**:估算总的讲话时长

## 📝 二、字幕文字识别
请识别视频画面中出现的所有文字内容:
- **屏幕字幕**:参考口播的字幕时间轴内容和视频内容,识别字幕
- **标题文字**:识别停留时间稍长的、在视频开头、中间、结尾出现的大标题和贴图

## 🎬 三、转场效果分析
请仔细观察视频中的转场效果,并结合参考资料中的时间轴内容,对视频做整体分析,比如第几个画面出现第一个转场等。
转场的 "time_start"、"time_end"、"textIdx" 请严格按照参考资料中口播内容的时间戳 start、end、id 和字幕内容的时间戳"开始时间"、"结束时间"填写,不要自己生成。

## 📊 输出格式要求

## 视频内容分析
请按照以下JSON格式输出视频描述:

{
    "total_Oral broadcasting": "请你生成一个完整的口播内容。",
    "summary": "请用一句话总结视频的核心内容,突出视频的主要卖点和价值主张",
    "content": [
        {
            "type": "cut",
            "scenes": 1,
            "time_start": 0.0,
            "time_end": 2.0,
            "talk": "请将对应时间的口播或字幕信息,填入此",
            "description": "详细描述这个镜头的画面内容、人物动作、场景特点等"
        },
        {
            "type": "cut",
            "scenes": 2,
            "time_start": 2.0,
            "time_end": 4.5,
            "talk": "请将对应时间的口播或字幕信息,填入此",
            "description": "描述这个镜头的具体内容,包括画面细节、转场效果等"
        },
        {
            "type": "cut",
            "scenes": 3,
            "time_start": 4.5,
            "time_end": 6.0,
            "talk": "请将对应时间的口播或字幕信息,填入此",
            "description": "描述这个镜头的具体内容,包括画面细节、转场效果等"
        }
    ]
}

## 输出要求
1. summary:用一句话概括视频核心内容,突出主要卖点
2. content:按时间顺序交替描述镜头和转场
   - 镜头(lens)描述:
     * textIdx:镜头序号,从1开始递增
     * time_start:开始时间(秒),精确到小数点后一位
     * time_end:结束时间(秒),精确到小数点后一位
     * talk:该镜头中的对话或文字内容
     * description:详细描述镜头内容,包括:
       - 画面构图和场景
       - 人物动作和表情
       - 重要道具和元素
       - 特殊效果和转场

## 注意事项
1. 保持描述简洁明了,但要有足够的细节
2. 突出视频的亮点和特色
3. 确保时间戳的准确性
4. 对话内容要符合视频画面
5. 整体风格要统一连贯
6. 每个镜头的描述要包含关键信息

## 示例内容描述
1. 镜头1:
   - 开场特写镜头,展示产品外观
   - 画面从模糊到清晰,突出产品细节
   - 背景音乐渐入,营造氛围
   - 文字提示:"全新升级,品质保证"

2. 转场1-2:
   - 类型:平滑滑动
   - 目的:自然过渡到使用场景
   - 效果:画面从产品特写平滑滑向使用场景

3. 镜头2:
   - 中景展示使用场景
   - 人物自然流畅的动作展示
   - 光线明亮,突出产品效果
   - 文字说明:"简单操作,轻松上手"

4. 转场2-3:
   - 类型:快速缩放
   - 目的:突出产品核心功能
   - 效果:画面快速聚焦到产品关键部位

5. 镜头3:
   - 特写展示产品核心功能
   - 慢动作展示关键细节
   - 画面色彩鲜明,对比强烈
   - 文字强调:"专业性能,值得信赖"

请根据以上要求,分析视频并输出JSON格式的描述。

请开始详细分析这个抖音短视频:"""
})

print("\n开始请求API...")
print(f"请求时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Stream模式: {STREAM_MODE}")
print(f"Content项目数量: {len(content_list)}")

# Record the API request start time
api_start_time = time.time()
completion = client.chat.completions.create(
    model="/root/autodl-tmp/llm/Qwen-omni",
    messages=[
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}]
        },
        {
            "role": "user",
            "content": content_list
        }
    ],
    stream=STREAM_MODE,
    stream_options={"include_usage": True} if STREAM_MODE else None,
    temperature=0.3
)

if STREAM_MODE:
    # Streaming output: concatenate the full reply
    full_response = ""
    usage_info = None

    # Track the time of the first token
    first_token_time = None
    token_count = 0

    print("正在生成回复...")
    for chunk in completion:
        if chunk.choices:
            delta = chunk.choices[0].delta
            if delta.content:
                # Record when the first token arrives
                if first_token_time is None:
                    first_token_time = time.time()
                    first_token_delay = first_token_time - api_start_time
                    print(f"首个token延迟: {first_token_delay:.2f} 秒")

                # Append the content
                full_response += delta.content
                token_count += 1

                # Optional live display
                #print(delta.content, end='', flush=True)
        else:
            # Save the usage information
            usage_info = chunk.usage

    # Record the API request end time
    api_end_time = time.time()
    total_duration = api_end_time - api_start_time

    # Print the complete response
    print("\n" + "="*50)
    print("完整回复:")
    print("="*50)
    print(full_response)

    # Save the result as a TXT file
    txt_file_path = save_result_to_txt(full_response + "total_duration:" + str(total_duration), video_path)

    # Timing statistics
    print("\n" + "="*50)
    print("⏱️ 时间统计:")
    print("="*50)
    print(f"📁 文件编码时间: {encode_duration:.2f} 秒")
    if first_token_time:
        print(f"🚀 首个token延迟: {first_token_delay:.2f} 秒")
        generation_time = api_end_time - first_token_time
        print(f"⚡ 内容生成时间: {generation_time:.2f} 秒")
    print(f"🕐 API总响应时间: {total_duration:.2f} 秒")
    print(f"📊 生成token数量: {token_count}")
    if first_token_time and token_count > 0:
        tokens_per_second = token_count / generation_time
        print(f"🔥 生成速度: {tokens_per_second:.2f} tokens/秒")
    print(f"⏰ 完成时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

    # Usage information
    if usage_info:
        print("\n" + "="*50)
        print("📈 使用情况:")
        print("="*50)
        print(usage_info)

else:
    # Non-streaming output: print the complete response directly
    api_end_time = time.time()
    total_duration = api_end_time - api_start_time

    print("非流式输出模式:")
    print("完整回复:")
    print("="*50)
    print(completion.choices[0].message.content)

    # Save the result as a TXT file
    txt_file_path = save_result_to_txt(completion.choices[0].message.content, video_path)

    # Timing statistics
    print("\n" + "="*50)
    print("⏱️ 时间统计:")
    print("="*50)
    print(f"📁 文件编码时间: {encode_duration:.2f} 秒")
    print(f"🕐 API总响应时间: {total_duration:.2f} 秒")
    print(f"⏰ 完成时间: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

    # Usage information
    if hasattr(completion, 'usage') and completion.usage:
        print("\n" + "="*50)
        print("📈 使用情况:")
        print("="*50)
        print(completion.usage)
194
code/director_prompt.py
Normal file
@@ -0,0 +1,194 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Director-editor prompt generator.
Combines the OCR subtitle timeline and the Whisper speech timeline
to produce a professional video-script prompt.
"""

def generate_director_prompt(ocr_timeline, whisper_timeline, video_info=None):
    """
    Generate the director-editor prompt.

    Args:
        ocr_timeline: OCR subtitle timeline data
        whisper_timeline: Whisper speech timeline data
        video_info: basic video information (optional)

    Returns:
        str: the director-editor prompt
    """

    prompt = f"""
# 导演编辑任务:视频脚本整合与优化

## 任务背景
您是一位经验丰富的视频导演和编辑,需要基于以下两个时间轴数据,为视频创作一个完整、流畅的脚本。

## 输入数据

### OCR字幕时间轴(视觉文字内容)
{ocr_timeline}

### Whisper口播时间轴(音频内容)
{whisper_timeline}

{video_info if video_info else ""}

## 导演编辑要求

### 1. 内容整合策略
- **时间同步**:将OCR字幕与Whisper口播按时间轴精确对齐
- **内容互补**:识别字幕与口播的重复、补充和差异部分
- **信息完整性**:确保重要信息不遗漏,避免重复冗余

### 2. 脚本结构设计
- **开场设计**:基于前3秒内容设计引人入胜的开场
- **节奏控制**:根据时间轴密度调整内容节奏
- **高潮设置**:识别关键信息点,设计内容高潮
- **结尾收束**:基于最后内容设计有力结尾

### 3. 语言风格优化
- **口语化处理**:将OCR识别文字转换为自然口语表达
- **情感表达**:根据内容调整语调、语速和情感色彩
- **文化适配**:考虑目标受众,调整表达方式

### 4. 视觉与音频协调
- **字幕时机**:优化字幕出现时机,与音频节奏配合
- **重点突出**:识别关键信息,在脚本中重点标注
- **转场设计**:设计自然的内容转场和过渡

## 输出格式要求

请按以下格式输出脚本:

### 完整脚本
```
[时间戳] [角色/场景] [内容]
```

### 脚本分析
- **内容概览**:简要总结视频核心内容
- **关键信息点**:列出3-5个最重要的信息
- **目标受众**:分析适合的观众群体
- **传播建议**:提供传播和推广建议

### 技术参数
- **总时长**:基于时间轴计算
- **内容密度**:评估信息密度是否合适
- **节奏分析**:分析内容节奏变化

## 创作原则
1. **真实性**:保持原始内容的真实性,不添加虚构信息
2. **流畅性**:确保脚本逻辑清晰,表达流畅
3. **吸引力**:增强内容的吸引力和传播性
4. **专业性**:体现专业导演的编辑水平

请基于以上要求,创作一个完整、专业的视频脚本。
"""

    return prompt

def format_timeline_for_prompt(timeline_data, timeline_type):
    """
    Format timeline data for use in the prompt.

    Args:
        timeline_data: timeline data
        timeline_type: timeline type ("OCR" or "Whisper")

    Returns:
        str: the formatted timeline text
    """
    formatted = ""  # stays empty for unknown timeline types
    if timeline_type == "OCR":
        formatted = "OCR字幕识别结果:\n"
        for entry in timeline_data:
            timestamp = entry.get('timestamp', 0)
            contents = entry.get('contents', [])
            formatted += f"时间点 {timestamp:.2f}s:\n"
            for content in contents:
                text = content.get('text', '')
                bbox = content.get('bbox', [])
                formatted += f"  - 文字: '{text}'\n"
                if bbox:
                    formatted += f"    位置: {bbox}\n"
            formatted += "\n"

    elif timeline_type == "Whisper":
        formatted = "Whisper语音识别结果:\n"
        for i, entry in enumerate(timeline_data):
            start_time = entry.get('start', 0)
            end_time = entry.get('end', 0)
            text = entry.get('text', '')
            formatted += f"  id:{i}, start:{start_time:.2f}, end:{end_time:.2f}, text:{text}\n"

    return formatted
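As a quick sanity check of the Whisper branch above, a minimal re-statement with made-up segment data:

```python
# Minimal re-statement of the Whisper formatting branch, for illustration only.
def format_whisper_timeline(timeline_data):
    formatted = "Whisper语音识别结果:\n"
    for i, entry in enumerate(timeline_data):
        formatted += (
            f"  id:{i}, start:{entry.get('start', 0):.2f}, "
            f"end:{entry.get('end', 0):.2f}, text:{entry.get('text', '')}\n"
        )
    return formatted

sample = [{"start": 0.0, "end": 1.8, "text": "开场白"}]  # hypothetical segment
out = format_whisper_timeline(sample)
assert "id:0, start:0.00, end:1.80" in out
```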

def create_video_script_prompt(ocr_json_path, whisper_data=None):
    """
    Create the full video-script prompt.

    Args:
        ocr_json_path: path to the OCR JSON file
        whisper_data: Whisper recognition data (optional)

    Returns:
        str: the complete director-editor prompt
    """
    # Read the OCR data
    from pre_data_1 import read_json_file, format_ocr_json

    ocr_data = read_json_file(ocr_json_path)
    if not ocr_data:
        return "错误:无法读取OCR数据文件"

    # Format the OCR timeline
    _, subtitle_array = format_ocr_json(ocr_data)
    ocr_timeline = format_timeline_for_prompt(subtitle_array, "OCR")

    # Format the Whisper timeline, if provided
    if whisper_data:
        whisper_timeline = format_timeline_for_prompt(whisper_data, "Whisper")
    else:
        whisper_timeline = "(暂无Whisper数据)"

    # Basic video information
    video_info = f"""
### 视频基本信息
- 文件路径: {ocr_json_path}
- OCR引擎: {ocr_data.get('ocr_engine', 'Unknown')}
- 视频时长: {ocr_data.get('duration', 0):.2f}秒
- 视频分辨率: {ocr_data.get('frame_width', 0)}x{ocr_data.get('frame_height', 0)}
- 视频帧率: {ocr_data.get('fps', 0):.2f}FPS
"""

    # Generate the director prompt
    prompt = generate_director_prompt(ocr_timeline, whisper_timeline, video_info)

    return prompt

# Example usage
if __name__ == "__main__":
    # Example Whisper data (in practice this should be read from a file)
    example_whisper_data = [
        {"start": 0.00, "end": 1.80, "text": "潑水街不只有云南"},
        {"start": 1.80, "end": 3.56, "text": "老窝更远更传统"},
        {"start": 3.56, "end": 5.64, "text": "快来接触这份湿身快乐"},
        # ... more data
    ]

    # Generate the prompt
    prompt = create_video_script_prompt(
        "/root/autodl-tmp/new_cnocr/老挝泼水节_subtitles.json",
        example_whisper_data
    )

    # Save the prompt to a file
    import os
    output_path = "/root/autodl-tmp/new_cnocr/director_prompt.txt"
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(prompt)

    print(f"导演编辑提示词已保存到: {output_path}")
    print("\n提示词预览(前500字符):")
    print(prompt[:500] + "...")
316
code/pre_data_1.py
Normal file
@@ -0,0 +1,316 @@
import os

def read_json_file(json_path):
    """Read and parse a JSON file."""
    import json
    try:
        with open(json_path, 'r', encoding='utf-8') as file:
            data = json.load(file)
        print(f"成功读取JSON文件: {json_path}")
        return data
    except FileNotFoundError:
        print(f"错误: 找不到文件 {json_path}")
        return None
    except json.JSONDecodeError as e:
        print(f"JSON解析错误: {e}")
        return None
    except Exception as e:
        print(f"读取JSON文件时出错: {e}")
        return None


def calculate_text_similarity(text1, text2):
    """
    Compute the similarity of two texts (Jaccard similarity over character sets).

    Args:
        text1: first text
        text2: second text

    Returns:
        float: similarity in [0, 1]
    """
    # Empty-text check
    if not text1 or not text2:
        return 0.0

    # Strip surrounding whitespace
    text1 = text1.strip()
    text2 = text2.strip()

    if not text1 or not text2:
        return 0.0

    # Identical texts
    if text1 == text2:
        return 1.0

    # Convert the texts to character sets
    chars1 = set(text1)
    chars2 = set(text2)

    # Jaccard similarity: |intersection| / |union|
    intersection = len(chars1.intersection(chars2))
    union = len(chars1.union(chars2))

    similarity = intersection / union if union > 0 else 0.0
    return similarity
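A compact restatement of the character-set Jaccard computation above, with a hand-checkable case:

```python
# Character-level Jaccard similarity, as implemented above.
def jaccard_similarity(text1, text2):
    chars1, chars2 = set(text1), set(text2)
    union = chars1 | chars2
    return len(chars1 & chars2) / len(union) if union else 0.0

# "abcd" vs "abce": intersection {a,b,c} has 3 chars, union {a,b,c,d,e} has 5
assert jaccard_similarity("abcd", "abce") == 3 / 5
```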
def calculate_iou(box1, box2):
    """
    Compute the IoU (Intersection over Union) of two bounding boxes.

    Args:
        box1: first box, [x1, y1, x2, y2] or [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
        box2: second box, same formats

    Returns:
        float: IoU in [0, 1]
    """
    # Normalize the input formats
    if len(box1) == 4 and isinstance(box1[0], (int, float)):
        # Format: [x1, y1, x2, y2]
        x1_1, y1_1, x2_1, y2_1 = box1
    elif len(box1) == 4 and isinstance(box1[0], list):
        # Format: four corner points; take the min/max coordinates
        x_coords = [point[0] for point in box1]
        y_coords = [point[1] for point in box1]
        x1_1, x2_1 = min(x_coords), max(x_coords)
        y1_1, y2_1 = min(y_coords), max(y_coords)
    else:
        raise ValueError("box1格式错误,应为[x1,y1,x2,y2]或[[x1,y1],[x2,y2],[x3,y3],[x4,y4]]")

    if len(box2) == 4 and isinstance(box2[0], (int, float)):
        # Format: [x1, y1, x2, y2]
        x1_2, y1_2, x2_2, y2_2 = box2
    elif len(box2) == 4 and isinstance(box2[0], list):
        # Format: four corner points; take the min/max coordinates
        x_coords = [point[0] for point in box2]
        y_coords = [point[1] for point in box2]
        x1_2, x2_2 = min(x_coords), max(x_coords)
        y1_2, y2_2 = min(y_coords), max(y_coords)
    else:
        raise ValueError("box2格式错误,应为[x1,y1,x2,y2]或[[x1,y1],[x2,y2],[x3,y3],[x4,y4]]")

    # Intersection rectangle
    x_left = max(x1_1, x1_2)
    y_top = max(y1_1, y1_2)
    x_right = min(x2_1, x2_2)
    y_bottom = min(y2_1, y2_2)

    # No overlap
    if x_right < x_left or y_bottom < y_top:
        return 0.0

    # Intersection area
    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # Union area
    box1_area = (x2_1 - x1_1) * (y2_1 - y1_1)
    box2_area = (x2_2 - x1_2) * (y2_2 - y1_2)
    union_area = box1_area + box2_area - intersection_area

    # IoU
    iou = intersection_area / union_area if union_area > 0 else 0.0

    return iou
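The same IoU computation, restated for the plain `[x1, y1, x2, y2]` form with a hand-checkable case:

```python
# IoU of two axis-aligned boxes in [x1, y1, x2, y2] form.
def iou_xyxy(a, b):
    x_left, y_top = max(a[0], b[0]), max(a[1], b[1])
    x_right, y_bottom = min(a[2], b[2]), min(a[3], b[3])
    if x_right < x_left or y_bottom < y_top:
        return 0.0
    inter = (x_right - x_left) * (y_bottom - y_top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes offset by 5 in x: intersection 50, union 150, so IoU = 1/3
assert abs(iou_xyxy([0, 0, 10, 10], [5, 0, 15, 10]) - 1/3) < 1e-9
```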
def format_ocr_json(ocr_data):
    """Format OCR subtitle JSON data."""
    if not ocr_data:
        return "", []

    formatted_text = "【OCR字幕识别内容】\n"

    # Subtitle-extractor output format
    if isinstance(ocr_data, dict):
        # Basic information
        if 'ocr_engine' in ocr_data:
            formatted_text += f"OCR引擎: {ocr_data['ocr_engine']}\n"
        if 'video_path' in ocr_data:
            formatted_text += f"视频文件: {ocr_data['video_path']}\n"
        if 'duration' in ocr_data:
            formatted_text += f"视频时长: {ocr_data['duration']:.2f}秒\n"
        if 'fps' in ocr_data:
            formatted_text += f"视频帧率: {ocr_data['fps']:.2f}FPS\n"
        if 'frame_width' in ocr_data and 'frame_height' in ocr_data:
            formatted_text += f"视频分辨率: {ocr_data['frame_width']}x{ocr_data['frame_height']}\n"

        # Subtitle region information
        if 'subtitle_position' in ocr_data:
            formatted_text += f"字幕区域: {ocr_data['subtitle_position']}\n"
        if 'subtitle_region' in ocr_data:
            region = ocr_data['subtitle_region']
            formatted_text += f"字幕区域坐标: {region}\n"

        # Processing parameters
        if 'sample_interval' in ocr_data:
            formatted_text += f"采样间隔: {ocr_data['sample_interval']}帧\n"
        if 'confidence_threshold' in ocr_data:
            formatted_text += f"置信度阈值: {ocr_data['confidence_threshold']}\n"

        # Full subtitle text
        if 'continuous_text' in ocr_data:
            formatted_text += "\n📄 完整字幕文本:\n"
            formatted_text += f"{ocr_data['continuous_text']}\n"

        # Detailed timeline, organized as a three-level nested array
        if 'subtitles' in ocr_data and len(ocr_data['subtitles']) > 0:
            subtitles = ocr_data['subtitles']

            # Group by timestamp
            timestamp_groups = {}
            for subtitle in subtitles:
                timestamp = subtitle.get('timestamp', 0)
                text = subtitle.get('text', '')
                confidence = subtitle.get('confidence', 0)
                engine = subtitle.get('engine', 'Unknown')
                bbox = subtitle.get('bbox', [])

                if timestamp not in timestamp_groups:
                    timestamp_groups[timestamp] = []

                # Third level: content, position and confidence
                subtitle_content = {
                    'text': text,
                    'bbox': bbox,
                    'timestamp': timestamp,
                    'confidence': confidence
                }

                timestamp_groups[timestamp].append(subtitle_content)

            # Convert to the three-level nested array structure
            subtitle_array = []
            sorted_timestamps = sorted(timestamp_groups.keys())

            for timestamp in sorted_timestamps:
                # First level: timestamp
                timestamp_entry = {
                    'timestamp': timestamp,
                    'contents': timestamp_groups[timestamp]  # second level: contents at this timestamp
                }
                subtitle_array.append(timestamp_entry)

            # Render the nested structure
            formatted_text += "\n⏰ 详细字幕时间轴 (三层嵌套数组结构):\n"

            # Show only the first 10 timestamps to keep the output short
            display_count = min(10, len(subtitle_array))
            for i, timestamp_entry in enumerate(subtitle_array[:display_count], 1):
                timestamp = timestamp_entry['timestamp']
                contents = timestamp_entry['contents']

                formatted_text += f"  {i}. {timestamp:.2f}s:\n"

                # All subtitles at this timestamp (second level)
                for j, content in enumerate(contents, 1):
                    text = content['text']
                    bbox = content['bbox']

                    # Use this item's own confidence, not the stale loop variable
                    formatted_text += f"    {j}. [{timestamp:.2f}s|{content['confidence']:.3f}]: {text}\n"

                    # Position information (third level)
                    if bbox:
                        formatted_text += f"       位置: {bbox}\n"

                formatted_text += "\n"

            if len(subtitle_array) > display_count:
                formatted_text += f"  ... (还有{len(subtitle_array) - display_count}个时间戳)\n"

            # Return the nested array structure alongside the text
            return formatted_text, subtitle_array

    return formatted_text, []

def merge_and_filter_subtitles(subtitle_array, iou_threshold=0.7, text_similarity_threshold=0.7):
    """
    Merge and filter subtitle contents: drop duplicates and empty items,
    returning a formatted string plus the processed array.
    """
    # Deep copy to avoid mutating the caller's data
    import copy
    subtitle_array = copy.deepcopy(subtitle_array)
    formatted_text = []

    for i in range(len(subtitle_array)):
        for j in range(len(subtitle_array[i]["contents"])):
            # Start k at 1 so an entry is never compared with itself,
            # and make sure i + k stays inside the array
            for k in range(1, len(subtitle_array) - i):
                if i + k >= len(subtitle_array):  # safety check
                    break
                for l in range(len(subtitle_array[i+k]["contents"])):
                    text = subtitle_array[i]["contents"][j]["text"]
                    bbox = subtitle_array[i]["contents"][j]["bbox"]
                    text_1 = subtitle_array[i+k]["contents"][l]["text"]
                    bbox_1 = subtitle_array[i+k]["contents"][l]["bbox"]

                    iou = calculate_iou(bbox, bbox_1)
                    text_similarity = calculate_text_similarity(text, text_1)

                    if iou > iou_threshold and text_similarity > text_similarity_threshold:
                        # Blank out the duplicate and extend the survivor's end time
                        subtitle_array[i+k]["contents"][l]["text"] = ''
                        subtitle_array[i]["contents"][j]["timestamp"] += 1

    # Drop contents whose text is now empty
    for i in range(len(subtitle_array)):
        subtitle_array[i]["contents"] = [content for content in subtitle_array[i]["contents"] if content["text"] != '']

    # Drop timestamp entries with no remaining contents
    subtitle_array = [entry for entry in subtitle_array if len(entry["contents"]) > 0]

    for i, timestamp_entry in enumerate(subtitle_array, 1):
        formatted_text.append(f"\n开始时间 {timestamp_entry['timestamp']:.2f}s:")
        for j, content in enumerate(timestamp_entry['contents'], 1):
            formatted_text.append(f"  {j}. 文本: '{content['text']}'")
            if content['bbox']:
                formatted_text.append(f"     位置: {content['bbox']}")
            if 'timestamp' in content and content['timestamp']:
                formatted_text.append(f"     结束时间: {content['timestamp']:.2f}s")

    return '\n'.join(formatted_text), subtitle_array
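The merge criterion above (box IoU and character Jaccard both over their thresholds) can be illustrated on the text side alone; the caption strings here are made up:

```python
# Two detections of the same on-screen caption in consecutive samples:
# near-identical text in heavily overlapping boxes.
def char_jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

text_a, text_b = "点赞关注不迷路", "点赞关注不迷路!"
similarity = char_jaccard(text_a, text_b)

# With thresholds of 0.7 for both IoU and text similarity, these two
# detections would be merged into one subtitle with an extended end time.
assert similarity > 0.7
```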

ocr_json_path = "/root/autodl-tmp/new_cnocr/哈尔滨_subtitles.json"

ocr_data = read_json_file(ocr_json_path)
pre_data, subtitle_array = format_ocr_json(ocr_data)

iou_threshold = 0.8
text_similarity_threshold = 0.8
a, b = merge_and_filter_subtitles(subtitle_array, iou_threshold, text_similarity_threshold)
print(a)
print(b)

# Save the output to a txt file
output_dir = os.path.dirname(ocr_json_path)
output_filename = os.path.splitext(os.path.basename(ocr_json_path))[0] + "_processed.txt"
output_path = os.path.join(output_dir, output_filename)

try:
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(a)
    print(f"\n处理结果已保存到: {output_path}")
except Exception as e:
    print(f"保存文件时出错: {e}")

# Verified against "/root/autodl-tmp/douyin_ocr/兰州_subtitles.json": the two
# "duplicate" items there really do have non-overlapping bboxes.
# a = [[303, 243], [442, 243], [442, 303], [303, 303]]
# b = [[339, 231], [495, 241], [490, 304], [335, 294]]
# c = [[482, 273], [660, 276], [660, 303], [481, 300]]
# d = [[536, 268], [732, 273], [731, 300], [535, 295]]

# iou = calculate_iou(a, b)  # 0.47
# d = calculate_iou(c, d)    # 0.40
127
code/token_counter.py
Normal file
@@ -0,0 +1,127 @@
import tiktoken
import os
import cv2

def count_tokens(text, model="gpt-4"):
    """Count the number of tokens in a text."""
    try:
        encoding = tiktoken.encoding_for_model(model)
        tokens = encoding.encode(text)
        return len(tokens)
    except Exception as e:
        print(f"Token统计出错: {e}")
        # Rough fallback: a Chinese character is ~1.5 tokens, an English word ~1.3 tokens
        chinese_chars = sum(1 for char in text if '\u4e00' <= char <= '\u9fff')
        english_words = len([word for word in text.split() if word.isascii()])
        estimated_tokens = int(chinese_chars * 1.5 + english_words * 1.3)
        return estimated_tokens
|
||||
def get_video_token_estimate(video_path):
|
||||
"""估算视频的token数量(基于文件大小和时长)"""
|
||||
try:
|
||||
cap = cv2.VideoCapture(video_path)
|
||||
if not cap.isOpened():
|
||||
return {'estimated_tokens': 0, 'duration': 0, 'frame_count': 0, 'fps': 0, 'file_size_mb': 0, 'frames_used': 0}
|
||||
|
||||
# 获取视频信息
|
||||
fps = cap.get(cv2.CAP_PROP_FPS)
|
||||
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||
duration = frame_count / fps if fps > 0 else 0
|
||||
|
||||
# 获取文件大小
|
||||
file_size = os.path.getsize(video_path)
|
||||
|
||||
cap.release()
|
||||
|
||||
# 基于GPT-4V的token估算规则
|
||||
# 视频token = 基础token + 帧数 * 每帧token
|
||||
base_tokens = 85 # 基础token
|
||||
frames_per_second = min(fps, 1) # 每秒最多1帧
|
||||
total_frames = min(frame_count, int(duration * frames_per_second))
|
||||
tokens_per_frame = 170 # 每帧约170个token
|
||||
|
||||
estimated_tokens = base_tokens + total_frames * tokens_per_frame
|
||||
|
||||
return {
|
||||
'estimated_tokens': int(estimated_tokens),
|
||||
'duration': duration,
|
||||
'frame_count': frame_count,
|
||||
'fps': fps,
|
||||
'file_size_mb': file_size / (1024 * 1024),
|
||||
'frames_used': total_frames
|
||||
}
|
||||
except Exception as e:
|
||||
print(f"视频token估算出错: {e}")
|
||||
return {'estimated_tokens': 0, 'duration': 0, 'frame_count': 0, 'fps': 0, 'file_size_mb': 0, 'frames_used': 0}
|
||||
|
||||
def analyze_input_tokens(video_path, text_content="", prompt_text=""):
|
||||
"""分析输入token统计"""
|
||||
print("\n" + "="*50)
|
||||
print("📊 Token统计信息:")
|
||||
print("="*50)
|
||||
|
||||
# 统计视频token
|
||||
video_token_info = get_video_token_estimate(video_path)
|
||||
print(f"🎬 视频Token统计:")
|
||||
print(f" 估算Token数量: {video_token_info['estimated_tokens']:,}")
|
||||
print(f" 视频时长: {video_token_info['duration']:.2f}秒")
|
||||
print(f" 总帧数: {video_token_info['frame_count']:,}")
|
||||
print(f" 帧率: {video_token_info['fps']:.2f} fps")
|
||||
print(f" 文件大小: {video_token_info['file_size_mb']:.2f} MB")
|
||||
print(f" 使用帧数: {video_token_info['frames_used']:,}")
|
||||
|
||||
# 统计文本token
|
||||
text_tokens = 0
|
||||
if text_content.strip():
|
||||
text_tokens = count_tokens(text_content)
|
||||
print(f"\n📝 文本Token统计:")
|
||||
print(f" 文本内容Token: {text_tokens:,}")
|
||||
print(f" 文本字符数: {len(text_content):,}")
|
||||
|
||||
# 统计提示词token
|
||||
prompt_tokens = 0
|
||||
if prompt_text.strip():
|
||||
prompt_tokens = count_tokens(prompt_text)
|
||||
print(f" 提示词Token: {prompt_tokens:,}")
|
||||
|
||||
video_cost = 0.0015
|
||||
text_cost = 0.0004
|
||||
total_cost = (video_token_info['estimated_tokens']*video_cost + text_tokens*text_cost + prompt_tokens*text_cost)/1000
|
||||
# 计算总输入token
|
||||
total_input_tokens = (video_token_info['estimated_tokens'] + text_tokens + prompt_tokens)
|
||||
print(f"\n📈 总输入Token统计:")
|
||||
print(f" 视频Token: {video_token_info['estimated_tokens']:,}")
|
||||
print(f" 文本Token: {text_tokens:,}")
|
||||
print(f" 提示词Token: {prompt_tokens:,}")
|
||||
print(f" 🔥 总输入Token: {total_input_tokens:,}")
|
||||
print(f" 💰 总费用: {total_cost:.4f}元")
|
||||
print("="*50)
|
||||
|
||||
return {
|
||||
'video_tokens': video_token_info['estimated_tokens'],
|
||||
'text_tokens': text_tokens,
|
||||
'prompt_tokens': prompt_tokens,
|
||||
'total_input_tokens': total_input_tokens,
|
||||
'video_info': video_token_info,
|
||||
'total_cost': total_cost
|
||||
}
|
||||
|
||||
if __name__ == "__main__":
|
||||
# 测试token统计功能
|
||||
test_video = "/root/autodl-tmp/new/哈尔滨.mp4"
|
||||
test_text = "这是一个测试文本,包含中英文内容。This is a test text with Chinese and English content."
|
||||
test_prompt = "请分析这个视频的内容。"
|
||||
|
||||
result = analyze_input_tokens(test_video, test_text, test_prompt)
|
||||
print(f"\n测试结果: {result}")
|
||||
|
||||
# video_token = result['video_tokens']
|
||||
# video_cost = 0.0015
|
||||
|
||||
# prompt_token = result['prompt_tokens']
|
||||
# text_token = result['text_tokens']
|
||||
# text_cost = 0.0004
|
||||
|
||||
# total_cost = video_token*video_cost + prompt_token*text_cost + text_token*text_cost
|
||||
|
||||
# print(total_cost)
|
||||
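The cost arithmetic in `analyze_input_tokens` can be sanity-checked by hand. The sketch below applies the same constants from the file above (85 base tokens, 170 tokens per sampled frame, a 1-frame-per-second sampling cap, and per-1000-token rates of 0.0015 and 0.0004 yuan); the 60-second clip and the text/prompt sizes are hypothetical values chosen for illustration:

```python
BASE_TOKENS = 85        # fixed per-video overhead
TOKENS_PER_FRAME = 170  # cost of each sampled frame
VIDEO_RATE = 0.0015     # yuan per 1000 video tokens
TEXT_RATE = 0.0004      # yuan per 1000 text tokens

duration, fps = 60.0, 30.0                         # hypothetical clip
frames_used = min(int(duration * fps),             # total frames in the clip
                  int(duration * min(fps, 1)))     # 1 fps sampling cap -> 60
video_tokens = BASE_TOKENS + frames_used * TOKENS_PER_FRAME  # 85 + 60*170
text_tokens, prompt_tokens = 500, 50               # hypothetical sizes
cost = (video_tokens * VIDEO_RATE + (text_tokens + prompt_tokens) * TEXT_RATE) / 1000
print(video_tokens, round(cost, 4))  # 10285 0.0156
```

Note that the 1 fps cap, not the file size, dominates the estimate: a 30 fps and a 60 fps clip of the same duration get the same token count.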
`requirements.txt` (new file, 292 lines)
```
absl-py==2.1.0
aiohappyeyeballs==2.6.1
aiohttp==3.12.11
aiosignal==1.3.2
airportsdata==20250523
anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work
annotated-types==0.7.0
anyio==4.6.2.post1
archspec @ file:///croot/archspec_1709217642129/work
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
astor==0.8.1
asttokens==2.4.1
async-lru==2.0.4
attrs==24.2.0
babel==2.16.0
beautifulsoup4==4.12.3
blake3==1.0.5
bleach==6.2.0
boltons @ file:///work/perseverance-python-buildout/croot/boltons_1698851177130/work
Brotli @ file:///croot/brotli-split_1714483155106/work
cachetools==6.0.0
certifi==2025.6.15
cffi @ file:///croot/cffi_1714483155441/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click==8.2.1
cloudpickle==3.1.1
cnocr==2.3.1
cnstd==1.2.5.2
coloredlogs==15.0.1
comm==0.2.2
compressed-tensors==0.9.4
conda @ file:///croot/conda_1714403036266/work
conda-content-trust @ file:///croot/conda-content-trust_1714483159009/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1706733287605/work/src
conda-package-handling @ file:///croot/conda-package-handling_1714483155348/work
conda_package_streaming @ file:///work/perseverance-python-buildout/croot/conda-package-streaming_1698847176583/work
contourpy==1.3.1
cryptography @ file:///croot/cryptography_1714660666131/work
cupy-cuda12x==13.4.1
cycler==0.12.1
debugpy==1.8.9
decorator==5.1.1
decord==0.6.0
defusedxml==0.7.1
depyf==0.18.0
dill==0.4.0
diskcache==5.6.3
distro @ file:///croot/distro_1714488253808/work
dnspython==2.7.0
einops==0.8.1
email_validator==2.2.0
executing==2.1.0
fake-useragent==2.2.0
fastapi==0.115.12
fastapi-cli==0.0.7
fastjsonschema==2.20.0
fastrlock==0.8.3
filelock==3.16.1
flatbuffers==25.2.10
fonttools==4.55.0
fqdn==1.5.1
frozenlist==1.6.2
fsspec==2024.10.0
gguf==0.17.0
gitdb==4.0.12
GitPython==3.1.44
googleapis-common-protos==1.70.0
grpcio==1.68.0
h11==0.14.0
hf-xet==1.1.3
httpcore==1.0.7
httptools==0.6.4
httpx==0.27.2
huggingface-hub==0.32.4
humanfriendly==10.0
idna @ file:///croot/idna_1714398848350/work
imageio==2.37.0
importlib_metadata==8.7.0
interegular==0.3.3
ipykernel==6.29.5
ipython==8.29.0
ipywidgets==8.1.5
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.6
jiter==0.10.0
joblib==1.5.1
json5==0.9.28
jsonpatch @ file:///croot/jsonpatch_1714483231291/work
jsonpointer==2.1
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.3.1
jupyterlab-language-pack-zh-CN==4.2.post3
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.13
kiwisolver==1.4.7
lark==1.2.2
lazy_loader==0.4
libmambapy @ file:///croot/mamba-split_1714483352891/work/libmambapy
lightning-utilities==0.14.3
llguidance==0.7.29
llvmlite==0.44.0
lm-format-enforcer==0.10.11
lxml==5.4.0
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.2
matplotlib-inline==0.1.7
mdurl==0.1.2
menuinst @ file:///croot/menuinst_1714510563922/work
mistral_common==1.5.6
mistune==3.0.2
modelscope==1.26.0
more-itertools==10.7.0
motmetrics==1.4.0
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.4.4
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.4.2
ninja==1.11.1.4
notebook_shim==0.2.4
numba==0.61.2
numpy==2.1.3
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
onnx==1.18.0
onnxruntime==1.22.0
openai==1.84.0
openai-whisper==20240930
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
opentelemetry-api==1.34.0
opentelemetry-exporter-otlp==1.34.0
opentelemetry-exporter-otlp-proto-common==1.34.0
opentelemetry-exporter-otlp-proto-grpc==1.34.0
opentelemetry-exporter-otlp-proto-http==1.34.0
opentelemetry-proto==1.34.0
opentelemetry-sdk==1.34.0
opentelemetry-semantic-conventions==0.55b0
opentelemetry-semantic-conventions-ai==0.4.9
outcome==1.3.0.post0
outlines==0.1.11
outlines_core==0.1.26
overrides==7.7.0
packaging @ file:///croot/packaging_1710807400464/work
pandas==2.3.0
pandocfilters==1.5.1
parso==0.8.4
partial-json-parser==0.2.1.1.post5
pexpect==4.9.0
pillow==11.0.0
platformdirs @ file:///work/perseverance-python-buildout/croot/platformdirs_1701732573265/work
pluggy @ file:///work/perseverance-python-buildout/croot/pluggy_1698805497733/work
prometheus-fastapi-instrumentator==7.1.0
prometheus_client==0.21.0
prompt_toolkit==3.0.48
propcache==0.3.1
protobuf==5.28.3
psutil==6.1.0
ptyprocess==0.7.0
pure_eval==0.2.3
py-cpuinfo==9.0.0
pyclipper==1.3.0.post6
pycosat @ file:///croot/pycosat_1714510623388/work
pycountry==24.6.1
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==2.11.5
pydantic_core==2.33.2
Pygments==2.18.0
pyparsing==3.2.0
PySocks @ file:///work/perseverance-python-buildout/croot/pysocks_1698845478203/work
python-dateutil==2.9.0.post0
python-dotenv==1.1.0
python-json-logger==2.0.7
python-multipart==0.0.20
pytorch-lightning==2.5.1.post0
pytz==2025.2
PyYAML==6.0.2
pyzmq==26.2.0
rapidocr-onnxruntime==1.3.25
ray==2.46.0
referencing==0.35.1
regex==2024.11.6
requests @ file:///croot/requests_1707355572290/work
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==14.0.0
rich-toolkit==0.14.7
rpds-py==0.21.0
ruamel.yaml @ file:///work/perseverance-python-buildout/croot/ruamel.yaml_1698863605521/work
safetensors==0.5.3
scikit-image==0.25.2
scikit-learn==1.7.0
scipy==1.15.3
seaborn==0.13.2
selenium==4.33.0
Send2Trash==1.8.3
sentencepiece==0.2.0
sentry-sdk==2.29.1
setproctitle==1.3.6
setuptools==79.0.1
shapely==2.1.1
shellingham==1.5.4
six==1.16.0
smmap==5.0.2
sniffio==1.3.1
sortedcontainers==2.4.0
soundfile==0.13.1
soupsieve==2.6
stack-data==0.6.3
starlette==0.46.2
supervisor==4.2.5
sympy==1.14.0
tensorboard==2.18.0
tensorboard-data-server==0.7.2
terminado==0.18.1
threadpoolctl==3.6.0
tifffile==2025.6.11
tiktoken==0.9.0
tinycss2==1.4.0
tokenizers==0.21.1
torch==2.7.0
torchaudio==2.7.0
torchmetrics==1.7.2
torchvision==0.22.0
tornado==6.4.2
tqdm @ file:///croot/tqdm_1714567712644/work
traitlets==5.14.3
transformers==4.52.4
trio==0.30.0
trio-websocket==0.12.2
triton==3.3.0
truststore @ file:///work/perseverance-python-buildout/croot/truststore_1701735771625/work
typer==0.16.0
types-python-dateutil==2.9.0.20241003
typing-inspection==0.4.1
typing_extensions==4.13.2
tzdata==2025.2
ultralytics==8.3.153
ultralytics-thop==2.0.14
Unidecode==1.4.0
uri-template==1.3.0
urllib3==2.4.0
uvicorn==0.34.3
uvloop==0.21.0
vllm==0.9.0.1
wandb==0.20.1
watchfiles==1.0.5
wcwidth==0.2.13
webcolors==24.11.1
webdriver-manager==4.0.2
webencodings==0.5.1
websocket-client==1.8.0
websockets==15.0.1
Werkzeug==3.1.3
wheel==0.43.0
widgetsnbextension==4.0.13
wsproto==1.2.0
xformers==0.0.30
xgrammar==0.1.19
xmltodict==0.14.2
yarl==1.20.0
zipp==3.23.0
zstandard @ file:///croot/zstandard_1714677652653/work
```
`requirements_ocr_voice.txt` (new file, 197 lines)
```
aiohappyeyeballs==2.6.1
aiohttp==3.12.11
aiosignal==1.3.2
aliyun-python-sdk-core==2.16.0
aliyun-python-sdk-kms==2.16.5
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.9.0
astor==0.8.1
attrs==25.3.0
audioread==3.0.1
cachetools==6.0.0
certifi==2025.4.26
cffi==1.17.1
chardet==5.2.0
charset-normalizer==3.4.2
click==8.2.1
cnocr==2.3.1
cnstd==1.2.5.2
coloredlogs==15.0.1
colorlog==6.9.0
contourpy==1.3.2
crcmod==1.7
cryptography==45.0.4
cssselect==1.3.0
cssutils==2.11.1
cycler==0.12.1
dataclasses-json==0.6.7
decorator==5.2.1
distro==1.9.0
easyocr==1.7.2
editdistance==0.8.1
einops==0.8.1
et_xmlfile==2.0.0
filelock==3.18.0
flatbuffers==25.2.10
fonttools==4.58.2
frozenlist==1.7.0
fsspec==2025.5.1
ftfy==6.3.1
funasr==1.2.6
gitdb==4.0.12
GitPython==3.1.44
GPUtil==1.4.0
greenlet==3.2.3
h11==0.16.0
hf-xet==1.1.3
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.32.4
humanfriendly==10.0
hydra-core==1.3.2
idna==3.10
imageio==2.37.0
imagesize==1.4.1
jaconv==0.4.0
jamo==0.4.1
jieba==0.42.1
Jinja2==3.1.6
jiter==0.10.0
jmespath==0.10.0
joblib==1.5.1
jsonpatch==1.33
jsonpointer==3.0.0
kaldiio==2.18.1
kiwisolver==1.4.8
langchain==0.2.17
langchain-community==0.2.17
langchain-core==0.2.43
langchain-openai==0.1.25
langchain-text-splitters==0.2.4
langsmith==0.1.147
lazy_loader==0.4
Levenshtein==0.27.1
librosa==0.11.0
lightning-utilities==0.14.3
llvmlite==0.44.0
lxml==5.4.0
MarkupSafe==3.0.2
marshmallow==3.26.1
matplotlib==3.10.3
modelscope==1.26.0
more-itertools==10.7.0
mpmath==1.3.0
msgpack==1.1.0
multidict==6.4.4
mypy_extensions==1.1.0
networkx==3.5
ninja==1.11.1.4
numba==0.61.2
numpy==1.26.4
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
omegaconf==2.3.0
onnx==1.18.0
onnxruntime==1.22.0
openai==1.63.2
openai-whisper==20240930
opencv-contrib-python==4.10.0.84
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
openpyxl==3.1.5
opt-einsum==3.3.0
orjson==3.10.18
oss2==2.19.1
packaging==24.2
paddleocr==3.0.1
paddlepaddle==3.0.0
paddlepaddle-gpu==2.6.2
paddlex==3.0.1
pandas==1.5.3
pillow==11.2.1
platformdirs==4.3.8
pooch==1.8.2
premailer==3.10.0
prettytable==3.16.0
propcache==0.3.2
protobuf==6.31.1
psutil==7.0.0
py-cpuinfo==9.0.0
pyclipper==1.3.0.post6
pycparser==2.22
pycryptodome==3.23.0
pydantic==2.11.5
pydantic_core==2.33.2
pynndescent==0.5.13
pyparsing==3.2.3
pypdfium2==4.30.1
python-bidi==0.6.6
python-dateutil==2.9.0.post0
pytorch-lightning==2.5.1.post0
pytorch-wpe==0.0.1
pytz==2025.2
PyYAML==6.0.2
RapidFuzz==3.13.0
rapidocr-onnxruntime==1.3.25
regex==2024.11.6
requests==2.32.4
requests-toolbelt==1.0.0
resampy==0.4.3
ruamel.yaml==0.18.14
ruamel.yaml.clib==0.2.12
scikit-image==0.25.2
scikit-learn==1.7.0
scipy==1.15.3
seaborn==0.13.2
sentencepiece==0.2.0
sentry-sdk==2.29.1
setproctitle==1.3.6
setuptools==78.1.1
shapely==2.1.1
six==1.17.0
smmap==5.0.2
sniffio==1.3.1
soundfile==0.13.1
soxr==0.5.0.post1
SQLAlchemy==2.0.41
sympy==1.14.0
tenacity==8.5.0
tensorboardX==2.6.2.2
threadpoolctl==3.6.0
tifffile==2025.6.1
tiktoken==0.9.0
tokenizers==0.19.1
torch==2.7.1
torch-complex==0.4.4
torchaudio==2.7.1
torchmetrics==1.7.2
torchvision==0.22.1
tqdm==4.67.1
triton==3.3.1
typing-inspect==0.9.0
typing-inspection==0.4.1
typing_extensions==4.14.0
ujson==5.10.0
ultralytics==8.3.153
ultralytics-thop==2.0.14
umap-learn==0.5.7
Unidecode==1.4.0
urllib3==2.4.0
wandb==0.20.1
wcwidth==0.2.13
wheel==0.45.1
whisper==1.1.10
yarl==1.20.0
```