Documentation

Knot

The primary object in cloudknot is the Knot. It represents a collection of AWS Batch resources and has methods for interacting with AWS Batch components. For more detail, you can familiarize yourself with the components of AWS Batch. Knot instantiation creates the required AWS resources. You can build a Knot on top of a customized Pars (see below) or just use the default Pars (the default behavior). You can submit and view jobs using the map and view_jobs methods. In particular, map returns a list of futures, one for each submitted job’s result. You can also inspect the cloudknot.aws.BatchJob instance for each job by accessing the knot’s jobs parameter. To see Knot in action, see Examples.
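As a rough sketch of that workflow (assuming cloudknot is installed and AWS credentials are configured; the function and knot names below are illustrative):

```python
def avg_of_randoms(seed):
    """Pure-Python function to be dockerized and run as an AWS Batch job."""
    import random
    random.seed(seed)
    return sum(random.random() for _ in range(1000)) / 1000

def submit_and_gather():
    # Not run here: requires AWS credentials and a Docker daemon.
    import cloudknot as ck
    knot = ck.Knot(name="avg-of-randoms", func=avg_of_randoms)
    futures = knot.map([1, 2, 3])            # one future per submitted job
    knot.view_jobs()                         # print the status of each job
    results = [f.result() for f in futures]  # block until the jobs finish
    print(knot.jobs)                         # list of cloudknot.aws.BatchJob
    return results
```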

cloudknot.Knot

class cloudknot.Knot(name=None, pars=None, pars_policies=(), docker_image=None, base_image=None, func=None, image_script_path=None, image_work_dir=None, image_github_installs=(), username=None, repo_name=None, image_tags=None, no_image_cache=False, job_definition_name=None, job_def_vcpus=None, memory=None, n_gpus=None, retries=None, compute_environment_name=None, instance_types=None, min_vcpus=None, max_vcpus=None, desired_vcpus=None, volume_size=None, image_id=None, ec2_key_pair=None, bid_percentage=None, job_queue_name=None, priority=None, aws_resource_tags=None)[source]

A collection of resources and methods to submit jobs to AWS Batch.

This object collects AWS resources that should be created once for each type of batch run. The resource set consists of a PARS; a docker image made from an input function or python script; a remote docker repo to house said image; and an AWS batch job definition, compute environment, and job queue. It also contains methods to submit batch jobs for a range of arguments.

Initialize a Knot instance.

Parameters
  • name (str, optional) – The name for this knot. Must be less than 46 characters. Must satisfy regular expression pattern: [a-zA-Z][-a-zA-Z0-9]* Default: ‘${AWS-username}-default’

  • pars (Pars, optional) – The PARS on which to base this knot’s AWS resources Default: instance returned by Pars()

  • pars_policies (tuple of strings) – tuple of names of AWS policies to attach to each role Default: ()

  • docker_image (DockerImage, optional) – The pre-existing DockerImage instance to adopt. I.e., you may construct your own Docker image using `d = cloudknot.DockerImage(*args)` and then supply that docker image as a keyword argument using `knot = cloudknot.Knot(..., docker_image=d)`

  • base_image (string) – Docker base image on which to base this Dockerfile. You may not specify both docker_image and base_image. Default: None, which will use the python base image for the current version of python

  • func (function) – Python function to be dockerized

  • image_script_path (str) – Path to file with python script to be dockerized

  • image_work_dir (string) – Directory to store Dockerfile, requirements.txt, and python script with CLI Default: parent directory of script if script_path is provided else DockerImage creates a new directory, accessible by the docker_image.build_path property.

  • image_github_installs (string or sequence of strings) – Github addresses for packages to install from github rather than PyPI (e.g. git://github.com/nrdg/cloudknot.git or git://github.com/nrdg/cloudknot.git@newfeaturebranch) Default: ()

  • username (string) – default username created in Dockerfile and in batch job definition Default: ‘cloudknot-user’

  • repo_name (str, optional) – Name of the AWS ECR repository to store the created Docker image Default: return value of cloudknot.get_ecr_repo()

  • image_tags (str or sequence of str) – Tags to be applied to this Docker image

  • no_image_cache (bool) – If True, do not use image cache for Docker build. This forces a rebuild of the image even if it already exists. Default: False

  • job_definition_name (str, optional) – Name for this knot’s AWS Batch job definition Default: name + ‘-ck-jd’

  • job_def_vcpus (int, optional) – number of virtual cpus to be used for this knot’s job definition Default: 1

  • memory (int, optional) – memory (MiB) to be used for this knot’s job definition Default: 8000

  • n_gpus (int, optional) – number of GPUs to be used for this knot’s job definition Default: 0

  • retries (int, optional) – number of times a job can be moved to ‘RUNNABLE’ status. May be between 1 and 10 Default: 1

  • compute_environment_name (str) – Name for this knot’s AWS Batch compute environment Default: name + ‘-ck-ce’

  • instance_types (string or sequence of strings, optional) – Compute environment instance types Default: (‘optimal’,)

  • min_vcpus (int, optional) – minimum number of virtual cpus for instances launched in this compute environment Default: 0

  • max_vcpus (int, optional) – maximum number of virtual cpus for instances launched in this compute environment Default: 256

  • desired_vcpus (int, optional) – desired number of virtual cpus for instances launched in this compute environment Default: 8

  • volume_size (int, optional) – the size (in GiB) of the Amazon EBS volumes used for instances launched by AWS Batch. If not provided, cloudknot will use the default Amazon ECS-optimized AMI version based on Amazon Linux 1, which has an 8-GiB root volume and an additional 22-GiB volume used for the Docker image. If provided, cloudknot will use the ECS-optimized AMI based on Amazon Linux 2 and increase the attached volume size to the value of volume_size. If this parameter is provided, you may not specify the image_id.

  • image_id (string or None, optional) – optional AMI id used for instances launched in this compute environment Default: None

  • ec2_key_pair (string or None, optional) – optional EC2 key pair used for instances launched in this compute environment Default: None

  • bid_percentage (int, optional) – Compute environment bid percentage if using spot instances Default: None, which means that on-demand instances are provisioned.

  • job_queue_name (str, optional) – Name for this knot’s AWS Batch job queue Default: name + ‘-ck-jq’

  • priority (int, optional) – Default priority for jobs in this knot’s job queue Default: 1

  • aws_resource_tags (dict or list of dicts) – Additional AWS resource tags to apply to the resources created by this knot
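For instance, the job-definition and compute-environment parameters above can be combined to request spot instances. This is a sketch only; all names and values below are illustrative, and creating the knot itself requires AWS credentials:

```python
# Illustrative keyword arguments for a spot-instance Knot.
spot_kwargs = dict(
    name="spot-example",
    job_def_vcpus=2,
    memory=16000,              # MiB per job
    retries=3,                 # re-queue a failed job up to 3 times
    instance_types=["m5.large", "m5.xlarge"],
    min_vcpus=0,               # let the compute environment scale to zero
    max_vcpus=64,
    bid_percentage=50,         # spot bid: at most 50% of the on-demand price
)

def create_spot_knot(func):
    # Not run here: requires cloudknot and AWS credentials.
    import cloudknot as ck
    return ck.Knot(func=func, **spot_kwargs)
```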

Pars

While Knot creates job-specific resources, Pars creates persistent resources that can be used for different types of AWS Batch workflows. PARS stands for Persistent AWS Resource Set. You can use one Pars for all of your cloudknot jobs. Or you may need to create Pars with different permission sets for different types of jobs. See Examples for more details.

cloudknot.Pars

class cloudknot.Pars(name=None, batch_service_role_name=None, ecs_instance_role_name=None, spot_fleet_role_name=None, policies=(), use_default_vpc=True, ipv4_cidr=None, instance_tenancy=None, aws_resource_tags=None)[source]

PARS stands for Persistent AWS Resource Set.

This object collects AWS resources that could, in theory, be created only once for each cloudknot user and used for all of their subsequent AWS batch jobs. This set consists of IAM roles, a VPC with subnets for each availability zone, and a security group.

Initialize a PARS instance.

Parameters
  • name (str) – The name of this PARS. If a PARS with this name exists in the config file, Pars will retrieve those PARS resource parameters. Otherwise, Pars will create a new PARS with this name. Must be less than 46 characters. Must satisfy regular expression pattern: [a-zA-Z][-a-zA-Z0-9]* Default: ‘${AWS-username}-default’

  • batch_service_role_name (str) – Name of this PARS’ batch service IAM role. If the role already exists, Pars will adopt it. Otherwise, it will create it. Default: name + ‘-cloudknot-batch-service-role’

  • ecs_instance_role_name (str) – Name of this PARS’ ECS instance IAM role. If the role already exists, Pars will adopt it. Otherwise, it will create it. Default: name + ‘-cloudknot-ecs-instance-role’

  • spot_fleet_role_name (str) – Name of this PARS’ spot fleet IAM role. If the role already exists, Pars will adopt it. Otherwise, it will create it. Default: name + ‘-cloudknot-spot-fleet-role’

  • policies (tuple of strings) – tuple of names of AWS policy ARNs to attach to each role Default: ()

  • use_default_vpc (bool) – If True, create or retrieve the default VPC; if False, use the other input args to create a non-default VPC

  • ipv4_cidr (string) – IPv4 CIDR block to be used for creation of a new VPC

  • instance_tenancy (string) – Instance tenancy for this VPC, one of [‘default’, ‘dedicated’] Default: ‘default’

  • aws_resource_tags (dict or list of dicts) – Additional AWS resource tags to apply to the resources created by this PARS
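A sketch of sharing one PARS across several knots (assuming AWS credentials are configured; the names and the attached policy below are illustrative):

```python
def build_shared_pars_and_knots(func):
    # Not run here: requires cloudknot and AWS credentials.
    import cloudknot as ck

    # One persistent resource set, with S3 read access attached to each role.
    pars = ck.Pars(
        name="my-s3-pars",
        policies=("AmazonS3ReadOnlyAccess",),
    )

    # Multiple knots for different job types can share the same PARS.
    knot_a = ck.Knot(name="job-type-a", func=func, pars=pars)
    knot_b = ck.Knot(name="job-type-b", func=func, pars=pars)
    return pars, knot_a, knot_b
```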

DockerImage

DockerImage is essentially a Knot without any of the AWS resources or job-submission capabilities. It will take your existing code, create a command line interface, and Dockerize it for later upload to AWS. If your function has simple dependencies, then Knot will do all of this for you. If you need more customization, then you may need to use DockerImage first. See Examples for more details.
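A sketch of building an image ahead of time and handing it to a Knot (assuming cloudknot is installed and a local Docker daemon is running; the function and tag names below are illustrative):

```python
def count_words(text):
    """Function to be dockerized; pure Python, so it is easy to test locally."""
    return len(text.split())

def build_and_adopt_image():
    # Not run here: requires cloudknot and a local Docker daemon.
    import cloudknot as ck
    image = ck.DockerImage(func=count_words)
    print(image.build_path)   # directory holding Dockerfile, requirements.txt, CLI script
    # Customize the generated Dockerfile here if needed, then hand the
    # image to a Knot via the docker_image keyword argument:
    knot = ck.Knot(name="count-words", docker_image=image)
    return knot
```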

cloudknot.DockerImage

class cloudknot.DockerImage(name=None, func=None, script_path=None, dir_name=None, base_image=None, github_installs=(), ignore_installed=False, pin_pip_versions=False, username=None, overwrite=False)[source]

Class for dockerizing a python script or function.

If given a python function, DockerImage will create a CLI version for that function, write a requirements.txt file for all import statements in the function, and write a Dockerfile to containerize that python script. If given a path to a python script, DockerImage will assume it has a CLI and will skip the first step, building a requirements.txt file and a Dockerfile as before.

If the input script or function contains imports that cannot be identified by pipreqs (i.e., packages that cannot be installed with pip install package), those packages will not be included in requirements.txt, DockerImage will throw a warning, and the user must install those packages by hand in the Dockerfile.

Initialize a DockerImage instance.

Parameters
  • name (str) – Name of DockerImage, only used to save/retrieve DockerImage from config file info. Do not use to create a new DockerImage. Must satisfy regular expression pattern: [a-zA-Z][-a-zA-Z0-9]*

  • func (function) – Python function to be dockerized

  • script_path (string) – Path to file with python script to be dockerized

  • dir_name (string) – Directory to store Dockerfile, requirements.txt, and python script with CLI Default: parent directory of script if script_path is provided else DockerImage creates a new directory, accessible by the build_path property.

  • base_image (string) – Docker base image on which to base this Dockerfile Default: None, which will use the python base image for the current version of python

  • github_installs (string or sequence of strings) – Github addresses for packages to install from github rather than PyPI (e.g. git://github.com/nrdg/cloudknot.git or git://github.com/nrdg/cloudknot.git@newfeaturebranch) Default: ()

  • ignore_installed (bool, default=False) – If True, add the --ignore-installed flag when installing all GitHub packages.

  • pin_pip_versions (bool, default=False) – If True, pin packages in pip requirements file to most recent version.

  • username (string) – Default user created in the Dockerfile Default: ‘cloudknot-user’

  • overwrite (bool, default=False) – If True, allow overwriting any existing Dockerfiles, requirements files, or python scripts previously created by cloudknot

Clobbering and AWS resource persistence

Each cloudknot object has a clobber method that will delete all associated AWS resources and remove references to those resources from the cloudknot config file. If you do not clobber an instance, you can retrieve it in a later cloudknot session by using its name in the initialization arguments. This will reduce the overhead of AWS resource creation (especially for Knots and DockerImages, where it avoids building and pushing another Docker image). However, if you know you are done with a resource on AWS, it is good practice to clobber the associated cloudknot object to reduce clutter and avoid name collisions.
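A sketch of retrieving a knot by name and cleaning it up afterwards (assuming the knot was created in an earlier session; the name and clobber arguments below are illustrative):

```python
def reuse_then_clobber():
    # Not run here: requires cloudknot and AWS credentials.
    import cloudknot as ck

    # Retrieving by name reuses the resources recorded in the cloudknot
    # config file, avoiding a fresh Docker build and push.
    knot = ck.Knot(name="avg-of-randoms")

    futures = knot.map([4, 5, 6])
    results = [f.result() for f in futures]

    # Done with these AWS resources: delete them and clear config entries.
    knot.clobber(clobber_pars=False, clobber_repo=True, clobber_image=True)
    return results
```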

API

For details on the rest of the cloudknot API, please see the following module pages.