SAM2Long Inference Walkthrough
Last updated: 2024-11-28 11:40
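SAM2Long (Segment Anything Model for Long Videos) is a solution for video object segmentation (VOS) in long videos. Its design builds on the capabilities of the Segment Anything Model (SAM), extended and optimized for the temporal characteristics of long videos and the need for efficient inference.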
1. Runtime Environment
Category | Details |
---|---|
CPU | 12 vCPU Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz |
CPU cores | 12 |
GPU | RTX 3080 x2 (20GB) |
GPU memory | 20 GB |
CUDA version | 12.1 |
Operating system | Ubuntu 22.04 |
Python version | 3.12 |
PyTorch version | 2.3.0 |
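To compare your own machine against the table above, a small check like the following can help. It is a minimal sketch, not part of the SAM2Long repository: the function name `describe_environment` is made up for illustration, and it only reads PyTorch details if PyTorch is already installed.

```python
import importlib.util
import platform


def describe_environment():
    """Collect interpreter, OS, and (if installed) PyTorch/CUDA details."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
        "cuda_available": None,
        "cuda_version": None,
        "gpus": [],
    }
    # Import torch only if it is installed, so the check also works
    # before the dependencies in step 5 have been set up.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
        info["cuda_version"] = torch.version.cuda
        if info["cuda_available"]:
            info["gpus"] = [
                torch.cuda.get_device_name(i)
                for i in range(torch.cuda.device_count())
            ]
    return info
```

Printing the returned dictionary (for example with `print(describe_environment())`) shows at a glance whether the Python, PyTorch, and CUDA versions are close to the ones listed.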
2. Install Conda
See this article: https://docs.aheadai.cn/60.html
3. Create a Conda Environment
Because SAM2Long is a relatively new project, it is best to create a fresh Conda environment:
conda create -n sam2Long python==3.12
conda activate sam2Long
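4. Download the Source Code and Weights
Source code: use one of two options, either clone the repository with git (first command) or download and unzip the zip archive (the two commands after it). If you clone with git, the directory is named SAM2Long rather than SAM2Long-main: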
git clone https://github.com/Mark12Ding/SAM2Long.git
wget https://mirrors.aheadai.cn/scripts/SAM2Long-main.zip
unzip SAM2Long-main.zip
cd SAM2Long-main
Model weights:
wget https://mirrors.aheadai.cn/scripts/sam2.1_hiera_base_plus.pt
mkdir -p checkpoints
mv sam2.1_hiera_base_plus.pt checkpoints/
5. Install Dependencies
pip install -e .
Some of the libraries SAM2Long requires are:
"torch>=2.3.1",
"torchvision>=0.18.1",
"numpy>=1.24.4",
"tqdm>=4.66.1",
"hydra-core>=1.3.2",
"iopath>=0.1.10",
"pillow>=9.4.0",
"pynvml"
If the installation fails with a timeout error like the following:
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
Just rerun the command. The download takes a long time, so please be patient.
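Rerunning by hand can be automated. The sketch below wraps `pip install` in a simple retry loop and raises pip's network timeout with its `--default-timeout` option. The function name, the retry count, and the wait time are illustrative choices, not something the SAM2Long project prescribes.

```python
import subprocess
import sys
import time


def pip_install_with_retry(args, attempts=3, wait_seconds=5.0,
                           timeout=120, runner=subprocess.run):
    """Run `pip install <args>` up to `attempts` times; return True on success.

    `runner` defaults to subprocess.run and exists only so the loop can be
    exercised without touching the network.
    """
    cmd = [sys.executable, "-m", "pip", "install",
           "--default-timeout", str(timeout), *args]
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Give the mirror a moment before trying again.
            time.sleep(wait_seconds)
    return False
```

From the project root, `pip_install_with_retry(["-e", "."])` would repeat the editable install from this step until it succeeds or the attempts run out.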
6. Download the Dataset
wget https://mirrors.aheadai.cn/data/DAVIS-2017-test-dev-480p.zip
unzip DAVIS-2017-test-dev-480p.zip
After unzipping, the directory structure is:
├── DAVIS
    ├── Annotations    first-frame labels
        ├── 480p
    ├── ImageSets    image set definitions
        ├── 2016
        ├── 2017
    ├── JPEGImages    video frame images
        ├── 480p
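A quick way to confirm that the archive unpacked into the layout shown above is to test for each expected directory. This is a minimal sketch. The helper name `missing_davis_dirs` is hypothetical, and the directory list simply mirrors the tree in this section.

```python
from pathlib import Path

# Sub-directories of DAVIS/ taken from the tree in this section.
EXPECTED_DIRS = [
    "Annotations/480p",
    "ImageSets/2016",
    "ImageSets/2017",
    "JPEGImages/480p",
]


def missing_davis_dirs(root="DAVIS"):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty list means the layout matches; any entries returned point to a directory that is missing, which usually indicates an incomplete download or an unzip in the wrong location.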
7. Video Segmentation Inference
Download the inference script:
wget https://mirrors.aheadai.cn/scripts/sam2Long_vos_inference.py
mv sam2Long_vos_inference.py tools/
Install the required package:
pip install pynvml
From the project root, run:
python ./tools/sam2Long_vos_inference.py \
--sam2_cfg configs/sam2.1/sam2.1_hiera_b+.yaml \
--sam2_checkpoint ./checkpoints/sam2.1_hiera_base_plus.pt \
--base_video_dir DAVIS/JPEGImages/480p \
--input_mask_dir DAVIS/Annotations/480p \
--video_list_file DAVIS/ImageSets/2017/test-dev.txt \
--output_mask_dir ./outputs/davis_2017_pred_pngs \
--num_pathway 3 \
--iou_thre 0.1 \
--uncertainty 2
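Before launching the command above, it can save time to check that every video named in the list file has frames and a first-frame annotation. The sketch below does that for the three input paths passed to the script. It assumes the usual DAVIS naming, with frames stored as `.jpg` and annotations as `.png` under a folder per video; the function name `preflight` is made up for illustration and is not part of the inference script.

```python
from pathlib import Path


def preflight(video_list_file, base_video_dir, input_mask_dir):
    """Return {video_name: problem} for videos that are not ready for inference."""
    problems = {}
    text = Path(video_list_file).read_text()
    names = [line.strip() for line in text.splitlines() if line.strip()]
    for name in names:
        frame_dir = Path(base_video_dir, name)
        mask_dir = Path(input_mask_dir, name)
        # Assumption: frames are *.jpg and annotations are *.png.
        frames = sorted(frame_dir.glob("*.jpg")) if frame_dir.is_dir() else []
        masks = sorted(mask_dir.glob("*.png")) if mask_dir.is_dir() else []
        if not frames:
            problems[name] = "no JPEG frames found"
        elif not masks:
            problems[name] = "no PNG annotation found"
        elif masks[0].stem != frames[0].stem:
            problems[name] = "first annotation does not match first frame"
    return problems
```

Called with `DAVIS/ImageSets/2017/test-dev.txt`, `DAVIS/JPEGImages/480p`, and `DAVIS/Annotations/480p`, an empty result suggests the inputs are in place.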
8. Results
On a successful run, the output looks like this:
Performance Metrics for tractor:
Latency for First Token: 6875.28 ms
Throughput for Tokens: 6.35 tokens/s
Latency Distribution: TP50=0.000 s, TP99=0.000 s
Avg GPU Power: 284.82 W
Max GPU Memory Usage: 5412.44 MB
Cost for 10k steps: $0.56, Total Time: 0.44 hours
completed VOS prediction on 30 videos -- output masks saved to ./outputs/davis_2017_pred_pngs
The run also produces the log file sam2Long_inference.log; a copy is stored at https://mirrors.aheadai.cn/log/sam2Long_inference.log. An excerpt:
2024-11-21 16:02:53,723 - Avg GPU Power: 283.32 W
2024-11-21 16:02:53,723 - Max GPU Memory Usage: 5412.44 MB
2024-11-21 16:02:53,723 - Cost for 10k steps: $0.57, Total Time: 0.44 hours
2024-11-21 16:03:14,200 - Performance Metrics for tractor:
2024-11-21 16:03:14,200 - Latency for First Token: 6875.28 ms
2024-11-21 16:03:14,200 - Throughput for Tokens: 6.35 tokens/s
2024-11-21 16:03:14,200 - Latency Distribution: TP50=0.000 s, TP99=0.000 s
2024-11-21 16:03:14,200 - Avg GPU Power: 284.82 W
2024-11-21 16:03:14,200 - Max GPU Memory Usage: 5412.44 MB
2024-11-21 16:03:14,200 - Cost for 10k steps: $0.56, Total Time: 0.44 hours
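The log format shown in the excerpt is regular enough to parse. The following sketch pulls the average GPU power and peak GPU memory for each video out of such lines. It assumes every line follows the `timestamp - message` pattern seen above and that metric lines come after their `Performance Metrics for <video>:` header; the function name and dictionary keys are illustrative.

```python
import re

LINE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - (?P<msg>.*)$")
HEADER_RE = re.compile(r"^Performance Metrics for (?P<video>.+):$")
POWER_RE = re.compile(r"^Avg GPU Power: (?P<value>[\d.]+) W$")
MEMORY_RE = re.compile(r"^Max GPU Memory Usage: (?P<value>[\d.]+) MB$")


def parse_log(lines):
    """Return {video: {"avg_gpu_power_w": ..., "max_gpu_memory_mb": ...}}."""
    records = {}
    current = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        msg = match.group("msg")
        header = HEADER_RE.match(msg)
        if header:
            current = records.setdefault(header.group("video"), {})
            continue
        if current is None:
            # Metric lines before the first header belong to a video
            # whose header is outside the excerpt; skip them.
            continue
        power = POWER_RE.match(msg)
        memory = MEMORY_RE.match(msg)
        if power:
            current["avg_gpu_power_w"] = float(power.group("value"))
        elif memory:
            current["max_gpu_memory_mb"] = float(memory.group("value"))
    return records
```

To use it on the real file, pass an open file handle, for example `parse_log(open("sam2Long_inference.log", encoding="utf-8"))`.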
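This article was originally published by @admin on 文档中心 | AheadAI (Documentation Center | AheadAI). Reproduction without permission is prohibited.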