Quick Run
This chapter introduces the basic usage and commands of SSSegmentation to help you get started quickly.
Training and Testing Segmentors Integrated in SSSegmentation
SSSegmentation supports training and testing segmentation frameworks on a single machine or multiple machines (cluster) by utilizing nn.parallel.DistributedDataParallel.
In this section, you will learn how to train and test these supported segmentors using the scripts provided by SSSegmentation.
Training and Testing on A Single Machine
1. Training a segmentor
We provide scripts/dist_train.sh to launch training jobs on a single machine. The basic usage is as follows,
bash scripts/dist_train.sh ${NGPUS} ${CFGFILEPATH} [optional arguments]
The script takes the following arguments; ${NGPUS} and ${CFGFILEPATH} are required, the rest are optional:
${NGPUS}: The total number of processes for training, which is usually the total number of GPUs you are using for distributed training.
${CFGFILEPATH}: The path of the config file used to customize the segmentor.
--ckptspath: The checkpoint from which to resume training. To automatically resume from the latest checkpoint in SEGMENTOR_CFG['work_dir'], set it to f"{SEGMENTOR_CFG['work_dir']}/checkpoints-epoch-latest.pth".
--slurm: Add --slurm if you are using Slurm to spawn training jobs.
Here are some examples of training a segmentor on a single machine,
# train ANNNet
bash scripts/dist_train.sh 4 ssseg/configs/annnet/annnet_resnet50os16_ade20k.py
# load checkpoints-epoch-44.pth and then resume training ANNNet
bash scripts/dist_train.sh 4 ssseg/configs/annnet/annnet_resnet50os16_ade20k.py --ckptspath annnet_resnet50os16_ade20k/checkpoints-epoch-44.pth
# auto resume
bash scripts/dist_train.sh 4 ssseg/configs/annnet/annnet_resnet50os16_ade20k.py --ckptspath annnet_resnet50os16_ade20k/checkpoints-epoch-latest.pth
The last command is especially useful when you train segmentors in an unstable environment, e.g., one where your program is interrupted and restarted frequently.
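As an illustration of the auto-resume pattern above, the following Python sketch assembles the dist_train.sh command and adds --ckptspath only when checkpoints-epoch-latest.pth already exists. The helper name build_train_command and its work_dir parameter are hypothetical and not part of SSSegmentation. Whether the training script itself tolerates a missing checkpoint file is not stated here, so the sketch checks first.

```python
import os
import shlex


def build_train_command(ngpus, cfgfilepath, work_dir=None):
    """Hypothetical helper: assemble the dist_train.sh argument list.

    --ckptspath is added only if the latest checkpoint already exists
    in work_dir (assumed to match SEGMENTOR_CFG['work_dir']).
    """
    cmd = ['bash', 'scripts/dist_train.sh', str(ngpus), cfgfilepath]
    if work_dir is not None:
        latest = f"{work_dir}/checkpoints-epoch-latest.pth"
        if os.path.isfile(latest):
            cmd += ['--ckptspath', latest]
    return cmd


if __name__ == '__main__':
    cmd = build_train_command(
        4,
        'ssseg/configs/annnet/annnet_resnet50os16_ade20k.py',
        work_dir='annnet_resnet50os16_ade20k',
    )
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run from the repository root if you want to launch the job from Python.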
2. Testing a segmentor
We provide scripts/dist_test.sh to launch testing jobs on a single machine. The basic usage is as follows,
bash scripts/dist_test.sh ${NGPUS} ${CFGFILEPATH} ${CKPTSPATH} [optional arguments]
The script takes the following arguments; ${NGPUS}, ${CFGFILEPATH} and ${CKPTSPATH} are required, the rest are optional:
${NGPUS}: The total number of processes for testing, which is usually the total number of GPUs you are using for distributed testing.
${CFGFILEPATH}: The path of the config file used to customize the segmentor.
${CKPTSPATH}: The checkpoint to use for performance testing. To automatically test the latest checkpoint in SEGMENTOR_CFG['work_dir'], set it to f"{SEGMENTOR_CFG['work_dir']}/checkpoints-epoch-latest.pth".
--eval_env: The environment for evaluating segmentor performance. Two environments are supported: server (only saves the test results, which can be submitted to the corresponding dataset’s official website to obtain the segmentation performance) and local (the default; tests segmentors with the local images and annotations provided by the corresponding dataset).
--slurm: Add --slurm if you are using Slurm to spawn testing jobs.
--ema: Add --ema if you want to load the EMA weights of segmentors for performance testing.
Here are some examples of testing a segmentor on a single machine,
# test ANNNet on ADE20k
bash scripts/dist_test.sh 4 ssseg/configs/annnet/annnet_resnet50os16_ade20k.py annnet_resnet50os16_ade20k/checkpoints-epoch-130.pth
# test ANNNet on Cityscapes
bash scripts/dist_test.sh 4 ssseg/configs/annnet/annnet_resnet50os16_cityscapes.py annnet_resnet50os16_cityscapes/checkpoints-epoch-220.pth
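The optional flags of dist_test.sh can be combined with the positional arguments as sketched below. build_test_command is a hypothetical helper, not an SSSegmentation API. It only assembles the argument list from the flags listed above, and omits --eval_env when the default local environment is requested.

```python
import shlex

# The two evaluation environments named in the argument list above.
VALID_EVAL_ENVS = ('local', 'server')


def build_test_command(ngpus, cfgfilepath, ckptspath, eval_env='local', ema=False):
    """Hypothetical helper: assemble the dist_test.sh argument list."""
    if eval_env not in VALID_EVAL_ENVS:
        raise ValueError(f"eval_env must be one of {VALID_EVAL_ENVS}, got {eval_env!r}")
    cmd = ['bash', 'scripts/dist_test.sh', str(ngpus), cfgfilepath, ckptspath]
    if eval_env != 'local':
        # 'local' is the documented default, so it does not need to be passed.
        cmd += ['--eval_env', eval_env]
    if ema:
        cmd.append('--ema')
    return cmd


if __name__ == '__main__':
    cmd = build_test_command(
        4,
        'ssseg/configs/annnet/annnet_resnet50os16_cityscapes.py',
        'annnet_resnet50os16_cityscapes/checkpoints-epoch-220.pth',
        eval_env='server',
        ema=True,
    )
    print(shlex.join(cmd))
```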
Training and Testing on Multiple Machines with Slurm
SSSegmentation also supports training and testing on multiple machines with Slurm, a job scheduling system for computing clusters.
1. Training a segmentor
On a cluster managed by Slurm, you can use scripts/slurm_train.sh to spawn training jobs. It supports both single-node and multi-node training. The basic usage is as follows,
bash scripts/slurm_train.sh ${PARTITION} ${JOBNAME} ${NGPUS} ${CFGFILEPATH} [optional arguments]
The script takes the following arguments; ${PARTITION}, ${JOBNAME}, ${NGPUS} and ${CFGFILEPATH} are required, the rest are optional:
${PARTITION}: The name of the partition (or queue) to which the job is submitted; it defines a set of resources and job scheduling policies.
${JOBNAME}: The name of the job, which is used for job identification, monitoring, and logging in the job queue.
${NGPUS}: The total number of processes for training, which is usually the total number of GPUs you are using for distributed training.
${CFGFILEPATH}: The path of the config file used to customize the segmentor.
--ckptspath: The checkpoint from which to resume training. To automatically resume from the latest checkpoint in SEGMENTOR_CFG['work_dir'], set it to f"{SEGMENTOR_CFG['work_dir']}/checkpoints-epoch-latest.pth".
--slurm: Add --slurm if you are using Slurm to spawn training jobs.
Here is an example of using 16 GPUs to train PSPNet on a Slurm partition named dev,
bash scripts/slurm_train.sh dev pspnet 16 ssseg/configs/pspnet/pspnet_resnet101os8_ade20k.py --slurm
Please note that --slurm is required for environment initialization if you are using Slurm to spawn training jobs.
2. Testing a segmentor
On a cluster managed by Slurm, SSSegmentation provides scripts/slurm_test.sh to spawn testing jobs. The basic usage is as follows,
bash scripts/slurm_test.sh ${PARTITION} ${JOBNAME} ${NGPUS} ${CFGFILEPATH} ${CKPTSPATH} [optional arguments]
The script takes the following arguments; ${PARTITION}, ${JOBNAME}, ${NGPUS}, ${CFGFILEPATH} and ${CKPTSPATH} are required, the rest are optional:
${PARTITION}: The name of the partition (or queue) to which the job is submitted; it defines a set of resources and job scheduling policies.
${JOBNAME}: The name of the job, which is used for job identification, monitoring, and logging in the job queue.
${NGPUS}: The total number of processes for testing, which is usually the total number of GPUs you are using for distributed testing.
${CFGFILEPATH}: The path of the config file used to customize the segmentor.
${CKPTSPATH}: The checkpoint to use for performance testing. To automatically test the latest checkpoint in SEGMENTOR_CFG['work_dir'], set it to f"{SEGMENTOR_CFG['work_dir']}/checkpoints-epoch-latest.pth".
--eval_env: The environment for evaluating segmentor performance. Two environments are supported: server (only saves the test results, which can be submitted to the corresponding dataset’s official website to obtain the segmentation performance) and local (the default; tests segmentors with the local images and annotations provided by the corresponding dataset).
--slurm: Add --slurm if you are using Slurm to spawn testing jobs.
--ema: Add --ema if you want to load the EMA weights of segmentors for performance testing.
Here is an example of using 16 GPUs to test PSPNet on a Slurm partition named dev,
bash scripts/slurm_test.sh dev pspnet 16 ssseg/configs/pspnet/pspnet_resnet101os8_ade20k.py pspnet_resnet101os8_ade20k/checkpoints-epoch-130.pth --slurm
Please note that --slurm is required for environment initialization if you are using Slurm to spawn testing jobs.
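The two Slurm scripts differ in how the checkpoint is passed: slurm_train.sh takes it through the optional --ckptspath flag, while slurm_test.sh takes it as a required positional argument. The sketch below captures that difference and always appends --slurm. build_slurm_command is a hypothetical helper introduced for illustration only.

```python
import shlex


def build_slurm_command(mode, partition, jobname, ngpus, cfgfilepath, ckptspath=None):
    """Hypothetical helper: assemble a slurm_train.sh or slurm_test.sh argument list."""
    if mode not in ('train', 'test'):
        raise ValueError(f"mode must be 'train' or 'test', got {mode!r}")
    cmd = ['bash', f'scripts/slurm_{mode}.sh', partition, jobname, str(ngpus), cfgfilepath]
    if mode == 'test':
        # Testing needs a checkpoint, passed positionally after the config path.
        if ckptspath is None:
            raise ValueError('ckptspath is required for testing')
        cmd.append(ckptspath)
    elif ckptspath is not None:
        # Training resumes from a checkpoint only when one is given.
        cmd += ['--ckptspath', ckptspath]
    # --slurm is required for environment initialization under Slurm.
    cmd.append('--slurm')
    return cmd


if __name__ == '__main__':
    cfg = 'ssseg/configs/pspnet/pspnet_resnet101os8_ade20k.py'
    ckpt = 'pspnet_resnet101os8_ade20k/checkpoints-epoch-130.pth'
    print(shlex.join(build_slurm_command('train', 'dev', 'pspnet', 16, cfg)))
    print(shlex.join(build_slurm_command('test', 'dev', 'pspnet', 16, cfg, ckpt)))
```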
Training and Testing on Multiple Machines with AML
SSSegmentation also supports training and testing on multiple machines with AML. Azure Machine Learning (AML) is a cloud-based platform that enables data scientists and developers to build, train, and deploy machine learning models efficiently at scale. It offers end-to-end tools for automating workflows, managing experiments, and utilizing Azure’s compute resources for ML model training and deployment.
1. Training a segmentor
On a cluster managed by AML, you can use scripts/aml_train.sh to spawn training jobs in the pre-defined job.yaml file. It supports both single-node and multi-node training. The basic usage is as follows,
bash scripts/aml_train.sh ${NGPUS_PER_NODE} ${CFGFILEPATH} [optional arguments]
The script takes the following arguments; ${NGPUS_PER_NODE} and ${CFGFILEPATH} are required, the rest are optional:
${NGPUS_PER_NODE}: The number of processes per node, which is usually the number of GPUs per node you are using for distributed training.
${CFGFILEPATH}: The path of the config file used to customize the segmentor.
--ckptspath: The checkpoint from which to resume training. To automatically resume from the latest checkpoint in SEGMENTOR_CFG['work_dir'], set it to f"{SEGMENTOR_CFG['work_dir']}/checkpoints-epoch-latest.pth".
--slurm: Add --slurm if you are using Slurm to spawn training jobs.
Here is an example of spawning training jobs in the pre-defined job.yaml file,
jobs:
- name: fcn_resnet50os8_ade20k
  sku: 2xG8
  command:
  - bash scripts/aml_train.sh 8 ssseg/configs/fcn/fcn_resnet50os8_ade20k.py
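Note that aml_train.sh takes the number of processes per node, whereas dist_train.sh and slurm_train.sh take the total. The sketch below relates the two for the example above. It assumes, without confirmation from this documentation, that an sku such as 2xG8 denotes 2 nodes with 8 GPUs each; parse_sku and total_processes are hypothetical helpers.

```python
import re


def parse_sku(sku):
    """Hypothetical parser for sku strings.

    ASSUMPTION: '<N>xG<M>' means N nodes with M GPUs each, and a bare
    'G<M>' means a single node. This reading is not confirmed here.
    """
    match = re.fullmatch(r'(?:(\d+)x)?G(\d+)', sku)
    if match is None:
        raise ValueError(f'unrecognized sku: {sku!r}')
    num_nodes = int(match.group(1) or 1)
    gpus_per_node = int(match.group(2))
    return num_nodes, gpus_per_node


def total_processes(sku):
    """Total process count across all nodes under the assumption above."""
    num_nodes, gpus_per_node = parse_sku(sku)
    return num_nodes * gpus_per_node


if __name__ == '__main__':
    num_nodes, gpus_per_node = parse_sku('2xG8')
    # gpus_per_node is the value that would be passed as ${NGPUS_PER_NODE}.
    print(num_nodes, gpus_per_node, total_processes('2xG8'))
```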
2. Testing a segmentor
On a cluster managed by AML, SSSegmentation provides scripts/aml_test.sh to spawn testing jobs in the pre-defined job.yaml file. The basic usage is as follows,
bash scripts/aml_test.sh ${NGPUS_PER_NODE} ${CFGFILEPATH} ${CKPTSPATH} [optional arguments]
The script takes the following arguments; ${NGPUS_PER_NODE}, ${CFGFILEPATH} and ${CKPTSPATH} are required, the rest are optional:
${NGPUS_PER_NODE}: The number of processes per node, which is usually the number of GPUs per node you are using for distributed testing.
${CFGFILEPATH}: The path of the config file used to customize the segmentor.
${CKPTSPATH}: The checkpoint to use for performance testing. To automatically test the latest checkpoint in SEGMENTOR_CFG['work_dir'], set it to f"{SEGMENTOR_CFG['work_dir']}/checkpoints-epoch-latest.pth".
--eval_env: The environment for evaluating segmentor performance. Two environments are supported: server (only saves the test results, which can be submitted to the corresponding dataset’s official website to obtain the segmentation performance) and local (the default; tests segmentors with the local images and annotations provided by the corresponding dataset).
--slurm: Add --slurm if you are using Slurm to spawn testing jobs.
--ema: Add --ema if you want to load the EMA weights of segmentors for performance testing.
Here is an example of spawning testing jobs in the pre-defined job.yaml file,
jobs:
- name: fcn_resnet50os8_ade20k
  sku: 2xG8
  command:
  - bash scripts/aml_test.sh 8 ssseg/configs/fcn/fcn_resnet50os8_ade20k.py fcn_resnet50os8_ade20k/checkpoints-epoch-130.pth
Inference with Segmentors Integrated in SSSegmentation
SSSegmentation provides pre-trained models for semantic segmentation in Model Zoo, and supports multiple standard datasets, including Pascal VOC, Cityscapes, ADE20K, etc. This section shows how to use existing pre-trained models to run inference on given images.
Specifically, SSSegmentation provides scripts/inference.sh to apply the trained segmentors to segment images. The basic usage is as follows,
bash scripts/inference.sh ${CFGFILEPATH} ${CKPTSPATH} [optional arguments]
The script takes the following arguments; ${CFGFILEPATH} and ${CKPTSPATH} are required, the rest are optional:
${CFGFILEPATH}: The path of the config file used to customize the segmentor.
${CKPTSPATH}: The checkpoint to use for inference.
--outputdir: Destination directory for saving the output image(s).
--imagepath: Path to the image for inference by the segmentor.
--imagedir: Directory containing images for inference by the segmentor.
--ema: Add --ema if you want to load the EMA weights of segmentors for inference.
Here are some example commands,
# inference on a given image
bash scripts/inference.sh ssseg/configs/pspnet/pspnet_resnet101os8_ade20k.py pspnet_resnet101os8_ade20k/checkpoints-epoch-130.pth --imagepath dog.jpg
# inference on given images
bash scripts/inference.sh ssseg/configs/pspnet/pspnet_resnet101os8_ade20k.py pspnet_resnet101os8_ade20k/checkpoints-epoch-130.pth --imagedir dogs
Please note that if you specify --imagedir and --imagepath at the same time, only the value following --imagedir will be used.
Input images must be in png, jpg, or jpeg format.
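The precedence and format rules above can be pictured with the following Python sketch. It is not the code used by scripts/inference.sh; resolve_inputs is a hypothetical helper, and matching file extensions case-insensitively is an assumption made here.

```python
import os

# The image formats named above.
IMAGE_EXTS = ('.png', '.jpg', '.jpeg')


def resolve_inputs(imagepath=None, imagedir=None):
    """Hypothetical helper: return the list of image files to segment.

    When both arguments are given, imagedir takes precedence and
    imagepath is ignored, mirroring the rule stated above.
    """
    if imagedir is not None:
        names = sorted(os.listdir(imagedir))
        # ASSUMPTION: extensions are compared case-insensitively.
        return [os.path.join(imagedir, n) for n in names if n.lower().endswith(IMAGE_EXTS)]
    if imagepath is not None:
        if not imagepath.lower().endswith(IMAGE_EXTS):
            raise ValueError(f'unsupported image format: {imagepath!r}')
        return [imagepath]
    raise ValueError('either imagepath or imagedir must be given')


if __name__ == '__main__':
    print(resolve_inputs(imagepath='dog.jpg'))
```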