Run in a container
We have produced both docker and singularity containers that hold all the necessary software for
PEPPRO. You can run
PEPPRO as an individual pipeline on a single sample using these containers by directly calling
docker run or
singularity exec. Or, you can rely on
looper, which is already set up to run any pipeline in existing containers using the
divvy templating system. Instructions for both follow:
First, make sure your environment is set up to run either docker or singularity containers. Then, pull the container image:
Docker: You can pull the docker databio/peppro image from dockerhub like this:
docker pull databio/peppro
Or build the image using the included Dockerfile (you can use a recipe in the included Makefile):
cd peppro/
make docker
Singularity: You can download the singularity image or build it from the docker image using the Makefile:
cd peppro/
make singularity
Now you'll need to tell the pipeline where you saved the singularity image. You can either create an environment variable called
$SIMAGES that points to the folder where your image is stored, or you can tweak the
pipeline_interface.yaml file so that the
compute.singularity_image attribute is pointing to the right location on disk.
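As a minimal sketch of the environment-variable option, the snippet below assumes the image was saved as a file named peppro inside a hypothetical $HOME/simages folder. Both the folder and the PEPPRO_IMAGE variable are illustrative assumptions, so substitute your own location.

```shell
# Hypothetical folder holding the singularity image; adjust to your system.
export SIMAGES="${HOME}/simages"

# The singularity example in this document refers to the image as $SIMAGES/peppro.
# PEPPRO_IMAGE is only a convenience variable for this sketch.
PEPPRO_IMAGE="${SIMAGES}/peppro"

# Report whether the image is where we expect it to be.
if [ -e "${PEPPRO_IMAGE}" ]; then
    echo "Found image: ${PEPPRO_IMAGE}"
else
    echo "No image at ${PEPPRO_IMAGE}; check SIMAGES" >&2
fi
```

Adding the export line to your shell startup file would make the setting persist across sessions.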
If your containers are set up correctly, then you won't need to install any additional software.
Running individual samples in a container
Individual jobs can be run in a container by simply running the
peppro.py command through
docker run or
singularity exec. You can run containers either on your local computer, or in an HPC environment, as long as you have
singularity installed. For example, run it locally in singularity like this:
singularity exec --bind $REFGENIE $SIMAGES/peppro pipelines/peppro.py --help
With docker, you can use:
docker run --rm -it databio/peppro pipelines/peppro.py --help
Be sure to mount the volumes you need with
--volume. If you're utilizing any environment variables (e.g.
$REFGENIE), don't forget to include those in your docker command with the
-e option.
The pipeline has been successfully run in both Linux and macOS environments. With
docker you need to bind mount your volume that contains the pipeline and your
$REFGENIE location, as well as provide the container the same environment variables your host environment is using.
In the first example, we're mounting our home user directory (
/home/jps3ag/) which contains the parent directories to our
$REFGENIE folder and to the pipeline itself. We'll also provide the pipeline two environment variables,
$REFGENIE and $HOME.
Here's that example command in a Linux environment to run the test example through the pipeline:
docker run --rm -it --volume /home/jps3ag/:/home/jps3ag/ \
  -e REFGENIE='/home/jps3ag/genomes/genome_config.yaml' \
  -e HOME='/home/jps3ag/' \
  databio/peppro \
  /home/jps3ag/src/peppro/pipelines/peppro.py --single-or-paired single \
  --prealignments rCRSd human_repeats \
  --genome hg38 \
  --sample-name test1 \
  --input /home/jps3ag/src/peppro/examples/data/test_r1.fq.gz \
  -O $HOME/peppro_test
In this second example, we'll run the same command in a Mac environment using Docker for Mac.
Remember to allocate sufficient memory (6-8GB should generally be more than adequate) in Docker for Mac.
Running that same example requires a few minor changes:
docker run --rm -it --volume /Users/jps3ag/:/Users/jps3ag/ \
  -e REFGENIE="/Users/jps3ag/genomes/genome_config.yaml" \
  -e HOME="/Users/jps3ag/" \
  databio/peppro \
  /Users/jps3ag/src/peppro/pipelines/peppro.py --single-or-paired single \
  --prealignments rCRSd human_repeats \
  --genome hg38 \
  --sample-name test1 \
  --input /Users/jps3ag/src/peppro/examples/data/test_r1.fq.gz \
  -O $HOME/peppro_test
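The Linux and Mac commands above differ only in the home directory path, so one possible way to avoid maintaining both is to derive every path from a single argument. The sketch below only prints the resulting command rather than running it. The function name peppro_docker_cmd is hypothetical, and the layout (genomes/ and src/peppro/ directly under the home directory) is assumed to mirror the examples above.

```shell
# Hypothetical helper: print the docker command for the test example.
# Assumes genomes/ and src/peppro/ live directly under the given home directory.
peppro_docker_cmd() {
    home_dir="${1%/}"   # drop one trailing slash, if present
    echo "docker run --rm -it" \
        "--volume ${home_dir}/:${home_dir}/" \
        "-e REFGENIE=${home_dir}/genomes/genome_config.yaml" \
        "-e HOME=${home_dir}/" \
        "databio/peppro" \
        "${home_dir}/src/peppro/pipelines/peppro.py" \
        "--single-or-paired single" \
        "--prealignments rCRSd human_repeats" \
        "--genome hg38" \
        "--sample-name test1" \
        "--input ${home_dir}/src/peppro/examples/data/test_r1.fq.gz" \
        "-O ${home_dir}/peppro_test"
}

# Print the command for the current user so it can be reviewed before running.
peppro_docker_cmd "$HOME"
```

Printing the command first makes it easy to check the mounted volume and environment variables before anything is executed.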
To run the same example with singularity, first build a singularity container from the docker image and create a running instance (be sure to mount the directories containing your
$REFGENIE configuration file, the corresponding
refgenie genomes, and the pipeline):
singularity build peppro docker://databio/peppro:latest
singularity instance.start -B /home/jps3ag/:/home/jps3ag/ peppro peppro_instance
Second, run your command.
singularity exec instance://peppro_instance \
  /home/jps3ag/src/peppro/pipelines/peppro.py --single-or-paired single \
  --prealignments rCRSd human_repeats \
  --genome hg38 \
  --sample-name test1 \
  --input /home/jps3ag/src/peppro/examples/data/test_r1.fq.gz \
  -O $HOME/peppro_test
Third, close your instance when finished.
singularity instance.stop peppro_instance
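The start, run, and stop steps above could be wrapped in one function so that the instance is stopped even when the pipeline command fails. This is only a sketch under stated assumptions: the function name run_in_instance and the SINGULARITY override variable are hypothetical, and the subcommand spelling follows the commands above (newer Singularity releases write them as instance start and instance stop).

```shell
# Hypothetical wrapper: start an instance, run a command in it, then stop it.
# Usage: run_in_instance IMAGE INSTANCE_NAME BIND_SPEC COMMAND [ARGS...]
# Set SINGULARITY=echo (an assumption of this sketch) to preview the calls.
run_in_instance() {
    image="$1"
    instance="$2"
    bind="$3"
    shift 3

    # Step 1: create the running instance with the requested bind mount.
    ${SINGULARITY:-singularity} instance.start -B "$bind" "$image" "$instance" || return 1

    # Step 2: run the command inside the instance and remember its exit status.
    ${SINGULARITY:-singularity} exec "instance://$instance" "$@"
    status=$?

    # Step 3: stop the instance whether or not the command succeeded.
    ${SINGULARITY:-singularity} instance.stop "$instance"
    return "$status"
}

# Example call (commented out so that nothing runs until you are ready):
# run_in_instance peppro peppro_instance /home/jps3ag/:/home/jps3ag/ \
#     /home/jps3ag/src/peppro/pipelines/peppro.py --help
```

Returning the exit status of the exec step lets a calling script detect a failed pipeline run after the instance has been cleaned up.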
Running multiple samples in a container with looper
To run multiple samples in a container, you simply need to configure
looper to use a container-compatible template. The looper documentation has detailed instructions for how to run pipelines in containers.