Last updated: 2024-11-28 11:40

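SAM2Long (Segment Anything Model for Long Videos) is an approach to segmenting objects in long videos. It builds on the capabilities of the Segment Anything Model (SAM) and extends and optimizes them for the temporal structure of long videos and for efficient inference.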

1. Runtime Environment

Category          Details
CPU               12 vCPU Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz
CPU cores         12
GPU               RTX 3080 x2 (20 GB)
GPU memory        20 GB
CUDA version      12.1
Operating system  Ubuntu 22.04
Python version    3.12
PyTorch version   2.3.0
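
To compare another machine against these values, a short script can report the same fields. This is a minimal sketch and not part of SAM2Long: the helper name `describe_environment` is made up for illustration, and the PyTorch, CUDA, and GPU fields are filled in only if `torch` is already installed.

```python
import importlib
import platform


def describe_environment():
    """Collect basic version information about the current machine."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "gpus": [],
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info  # PyTorch is not installed yet; leave those fields empty
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built against
    if torch.cuda.is_available():
        info["gpus"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it before and after the dependency installation in section 5 shows whether the PyTorch version changed.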

2. Install Conda

See this article: https://docs.aheadai.cn/60.html

3. Create a Conda Environment

Because SAM2Long is a fairly new project, it is best to create a dedicated Conda environment for it:

conda create -n sam2Long python=3.12
conda activate sam2Long

4. Download the Source Code and Weights

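Source code: either clone the repository with git or download the zip archive. Only one of the two options is needed.

# Option 1: clone with git (creates the directory SAM2Long)
git clone https://github.com/Mark12Ding/SAM2Long.git
cd SAM2Long
# Option 2: download and unpack the zip archive (creates SAM2Long-main)
wget https://mirrors.aheadai.cn/scripts/SAM2Long-main.zip
unzip SAM2Long-main.zip
cd SAM2Long-main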

Model weights:

wget https://mirrors.aheadai.cn/scripts/sam2.1_hiera_base_plus.pt
mkdir -p checkpoints
mv sam2.1_hiera_base_plus.pt checkpoints/

5. Install Dependencies

pip install -e .

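Some of the libraries SAM2Long requires are listed below:

"torch>=2.3.1",
"torchvision>=0.18.1",
"numpy>=1.24.4",
"tqdm>=4.66.1",
"hydra-core>=1.3.2",
"iopath>=0.1.10",
"pillow>=9.4.0",
"pynvml"

Note that torch>=2.3.1 is newer than the PyTorch 2.3.0 listed in section 1; if 2.3.0 is already installed, pip will upgrade PyTorch to satisfy the requirement.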

If the installation fails with a timeout error like this:

      pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

Simply rerun the command. The download takes a long time, so please be patient.
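
The rerun can also be automated. The following is a minimal sketch, assuming a hypothetical helper `run_with_retries` that belongs to neither pip nor SAM2Long; it reruns a command until it exits successfully or the attempts run out. The example call in the comment uses pip's own `--timeout` and `--retries` options so that each attempt is more tolerant of a slow connection; the values 100 and 10 are arbitrary.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay=5.0):
    """Run cmd until it exits with status 0; return True on success."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{attempts} failed", file=sys.stderr)
        if attempt < attempts:
            time.sleep(delay)  # brief pause before trying again
    return False


# Example, to be run from the project root like `pip install -e .`:
# run_with_retries([sys.executable, "-m", "pip", "install",
#                   "--timeout", "100", "--retries", "10", "-e", "."])
```

Because pip caches downloaded packages, a later attempt normally does not have to fetch again what an earlier attempt already completed.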

6. Download the Dataset

wget https://mirrors.aheadai.cn/data/DAVIS-2017-test-dev-480p.zip
unzip DAVIS-2017-test-dev-480p.zip

After extraction, the directory layout is as follows:

DAVIS
├── Annotations            first-frame annotation masks
│   └── 480p
├── ImageSets              video list files
│   ├── 2016
│   └── 2017
└── JPEGImages             video frame images
    └── 480p
7. Video Segmentation Inference

Download the inference script:

wget https://mirrors.aheadai.cn/scripts/sam2Long_vos_inference.py
mv sam2Long_vos_inference.py tools/

Install the package the script depends on:

pip install pynvml

Run the following command from the project root:

python ./tools/sam2Long_vos_inference.py \
  --sam2_cfg configs/sam2.1/sam2.1_hiera_b+.yaml \
  --sam2_checkpoint ./checkpoints/sam2.1_hiera_base_plus.pt \
  --base_video_dir DAVIS/JPEGImages/480p \
  --input_mask_dir DAVIS/Annotations/480p \
  --video_list_file DAVIS/ImageSets/2017/test-dev.txt \
  --output_mask_dir ./outputs/davis_2017_pred_pngs \
  --num_pathway 3 \
  --iou_thre 0.1 \
  --uncertainty 2
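
When the same run is repeated with different settings, it can help to assemble the command in Python instead of editing it by hand. This is a minimal sketch built around a hypothetical helper, `build_inference_command`; the flag names and default values are copied from the command above, and nothing else about the script's interface is assumed.

```python
import sys

# Flag names and values copied from the command shown in this section.
DEFAULT_ARGS = {
    "sam2_cfg": "configs/sam2.1/sam2.1_hiera_b+.yaml",
    "sam2_checkpoint": "./checkpoints/sam2.1_hiera_base_plus.pt",
    "base_video_dir": "DAVIS/JPEGImages/480p",
    "input_mask_dir": "DAVIS/Annotations/480p",
    "video_list_file": "DAVIS/ImageSets/2017/test-dev.txt",
    "output_mask_dir": "./outputs/davis_2017_pred_pngs",
    "num_pathway": 3,
    "iou_thre": 0.1,
    "uncertainty": 2,
}


def build_inference_command(overrides=None,
                            script="./tools/sam2Long_vos_inference.py"):
    """Return the argv list for the inference script.

    Only flags that appear in DEFAULT_ARGS may be overridden, so a
    misspelled flag name fails here and not inside the script.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_ARGS)
    if unknown:
        raise KeyError(f"unknown flags: {sorted(unknown)}")
    args = {**DEFAULT_ARGS, **overrides}
    cmd = [sys.executable, script]
    for flag, value in args.items():
        cmd += [f"--{flag}", str(value)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the project root, for example with `build_inference_command({"output_mask_dir": "./outputs/run2"})` to write a second run to a separate directory.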

8. Results

After a successful run, the output looks like this:

Performance Metrics for tractor:
  Latency for First Token: 6875.28 ms
  Throughput for Tokens: 6.35 tokens/s
  Latency Distribution: TP50=0.000 s, TP99=0.000 s
  Avg GPU Power: 284.82 W
  Max GPU Memory Usage: 5412.44 MB
Cost for 10k steps: $0.56, Total Time: 0.44 hours
completed VOS prediction on 30 videos -- output masks saved to ./outputs/davis_2017_pred_pngs

The run also produces the log file sam2Long_inference.log; a copy is available at https://mirrors.aheadai.cn/log/sam2Long_inference.log. An excerpt:

2024-11-21 16:02:53,723 -   Avg GPU Power: 283.32 W
2024-11-21 16:02:53,723 -   Max GPU Memory Usage: 5412.44 MB
2024-11-21 16:02:53,723 - Cost for 10k steps: $0.57, Total Time: 0.44 hours
2024-11-21 16:03:14,200 - Performance Metrics for tractor:
2024-11-21 16:03:14,200 -   Latency for First Token: 6875.28 ms
2024-11-21 16:03:14,200 -   Throughput for Tokens: 6.35 tokens/s
2024-11-21 16:03:14,200 -   Latency Distribution: TP50=0.000 s, TP99=0.000 s
2024-11-21 16:03:14,200 -   Avg GPU Power: 284.82 W
2024-11-21 16:03:14,200 -   Max GPU Memory Usage: 5412.44 MB
2024-11-21 16:03:14,200 - Cost for 10k steps: $0.56, Total Time: 0.44 hours
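
To summarize a run, the GPU figures can be pulled out of the log with a regular expression. The sketch below is illustrative: `parse_gpu_metrics` is a hypothetical helper, and the pattern is written against the line format in the excerpt above, so it would need adjusting if the script logs in a different format.

```python
import re

# Matches lines such as
# "2024-11-21 16:03:14,200 -   Avg GPU Power: 284.82 W"
METRIC_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) -\s+"
    r"(?P<name>Avg GPU Power|Max GPU Memory Usage): "
    r"(?P<value>[\d.]+) (?P<unit>W|MB)\s*$"
)


def parse_gpu_metrics(lines):
    """Return (timestamp, metric name, value, unit) for each GPU metric line."""
    records = []
    for line in lines:
        match = METRIC_RE.match(line)
        if match:
            records.append(
                (match["time"], match["name"],
                 float(match["value"]), match["unit"])
            )
    return records
```

Opening sam2Long_inference.log and passing the file object to `parse_gpu_metrics` yields one record per metric line; taking the maximum over the "Max GPU Memory Usage" values then gives the peak memory across all videos.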
This article is an original work by @admin, published on 文档中心 | AheadAI. Reproduction without permission is prohibited.