Using CCQ

ccqsub Command and Arguments

CCQ is a wrapper for common schedulers such as Torque and Slurm that gives them autoscaling capability within AWS. When a user submits a job with ccqsub, the job script is parsed, the number of instances needed is determined, and the instances are launched for the job at run time. Options specified in the job script take precedence over the options specified on the command line; they are denoted inside the job script itself using the #CC directive.

The commands that are available through the #CC directive are -it (instance type), -nt (network type), -ni (number of instances requested), -op (optimization type), -p (criteria priority), -cpu (number of CPUs requested), -mem (amount of memory in MB requested), -s (scheduler to use), -st (scheduler type), -jn (job name), -vt (volume type), -us (use spot), -sp (spot price).
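For example, a Slurm-style job script might mix #CC directives with ordinary scheduler directives. This is a minimal sketch; the instance type, job name, and payload below are illustrative values, not prescribed ones:

```shell
#!/bin/bash
#CC -it c4.xlarge      # illustrative EC2 instance type
#CC -ni 2              # request two instances for this job
#CC -op cost           # optimize instance selection for cost
#CC -jn example_job    # illustrative job name
#SBATCH -J example_job

# The job payload runs on the instances CCQ launches at run time.
srun hostname
```

ccqsub reads the #CC lines when it parses the script, and any option given here overrides the same option supplied on the command line.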

If you submit from a CloudyCluster instance, the output files from the job will by default appear in the CloudyCluster user’s home directory on the instance the job was submitted from. However, there may be a delay of a minute or two between when the job finishes and when the files appear on that machine, due to the extra processing required.

If you submit from a host outside of CloudyCluster, the output files will be stored in the CloudyCluster user’s home directory on the Login Instance associated with the Cluster that the job was submitted to. This behavior can be changed by specifying the -o and -e PBS directives in your job script or by using the -o and -e ccqsub command line arguments, which tell the scheduler to write the files to the specified locations instead of the defaults. If an output file is missing, check the /opt/CloudyCluster/ccqsub/undeliveredJobOutput/{CloudyCluster_user_name} directory on the Scheduler the job was submitted to, as this is the default directory for undeliverable job output files.
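To control the output locations explicitly, the paths can be given either on the ccqsub command line or as PBS directives inside the script. The script name and paths here are hypothetical:

```shell
# On the command line (hypothetical script and paths):
ccqsub -js myjob.sh -o /mnt/efs/user/myjob.out -e /mnt/efs/user/myjob.err

# Or equivalently as directives inside the job script:
#PBS -o /mnt/efs/user/myjob.out
#PBS -e /mnt/efs/user/myjob.err
```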

This is the list of options that can be used with ccqsub. Options that double as #CC directives may also be placed directly in a job script.

- Option Description
-h, --help Show this help message and exit
-V Show program’s version number and exit
-i {app_key_file_location} The path to the file containing the app key used when validating the user on the requested resources.
-ru {remote_username} The remote username to run the job as. This parameter applies only when app keys are used. If the app key belongs to a Proxy user, the remote username is the username the job should run as; if it does not belong to a Proxy user, the job runs as the user the app key belongs to. This argument cannot be used without also specifying the -i argument.
-js {job_script_location} The path to the job script file that you want to submit to the Scheduler/Target.
-jn {job_name} The name of the job that will be saved so you can resubmit the job later without having to resubmit the job script itself.

The options below serve both as command line options and as job script directives; in a job script they begin with #CC.

- Option / #CC Directive Description
-nt low | moderate | high | 10GB Specifies the amount of network capacity needed for the job. If not specified, this defaults to “default”, meaning network capacity will not factor into the calculation of the instance type needed for the job.
-ni {number_instances} The number of instances that you want the job to run on. The default setting is one instance.
-cpu {cpu_count} The number of CPUs that you want per instance that your job is running on. The default setting is one CPU per instance.
-mem {mem_size_in_MB} The amount of memory in MB per instance. The default setting is 1000 MB (1 GB) per instance.
-s {name_of_scheduler_to_use} Specifies the name of the Scheduler/Target that you want to use. The default value is to use the default Scheduler/Target for the Scheduler/Target type you have requested. This default variable can be set using the ccq.config file with the variable defaultScheduler={schedulerName}.
-st Torque | Slurm | Condor | SGE | default Specifies the type of Scheduler/Target that you want to use. If the Scheduler/Target type is not specified with a job script, ccqsub will attempt to determine from the job script what type of Scheduler/Target the job is to be run on. If no job script is submitted, the value defaults to the default Scheduler/Target for the Cluster.
-us yes | no Specifies whether the job should be run on spot instances. If yes, a target spot price must also be supplied via the -sp argument.
-sp {target_spot_price} The targeted spot instance price that you will be willing to pay to run the instances. A valid spot price is a number formatted as 1.23 that contains no letters, and must be greater than 0. This argument must be specified if using spot instances. If using a Spot Fleet and multiple instance types, multiple Spot Prices can be specified using a comma separated list. The number of entries in the comma separated list must equal the number of instance types in the -it argument.
-sw {spot_instance_weights} The weighted value of each instance type. The number of instances launched is determined by the Spot Fleet Total size divided by the weights. Please see the AWS User Guide for a more detailed explanation of how the weights affect a Spot Fleet Request. If no weights are specified then the weight for each instance type will be set to 1. If using multiple instance types, multiple Spot Instance Weights can be specified using a comma separated list. The number of entries in the comma separated list must equal the number of instance types in the -it argument.
-sf Specifies that the instances for the job should be launched as a Spot Fleet. When this option is used, the -ft and -fs arguments are required, and multiple instance types may be supplied via the -it argument.
-ft {spot_fleet_type} Sets the type of the Spot Fleet. If “lowestPrice” is chosen, the fleet size specified is based on the number of instances launched. If set to “cores”, the fleet size is determined by the number of cores each instance has, and instances are launched until the total number of cores across all instances exceeds the fleet size. NOTE: if “cores” is selected, DO NOT request a specific number of instances in the job script (e.g. #SBATCH -N or #PBS -l nodes=2); the number of instances launched is variable, and specifying too many or too few instances in your script will cause the job to fail. This argument is required when using a Spot Fleet.
-fs {spot_fleet_total_size} The total size of the Spot Fleet. By using the -ft argument the user can choose if this size is determined by the number of instances launched or a number of total cores for all instances combined. This parameter is required when using the -sf option. A valid total size is a number greater than 0 and within your AWS Account limits.
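Putting the Spot Fleet options together, a submission using two candidate instance types, with one spot price and one weight per type, might look like the following sketch (the script name, instance types, prices, weights, and fleet size are all illustrative):

```shell
# Spot Fleet sized by total cores: instances are launched until the
# combined core count reaches 96. The -sp and -sw comma-separated lists
# must each have the same number of entries as the -it list.
ccqsub -js fleet_job.sh -us yes -sf \
       -it c4.xlarge,c4.2xlarge \
       -sp 0.10,0.20 \
       -sw 1,2 \
       -ft cores \
       -fs 96
```

Because -ft is set to “cores”, the job script itself should not request a specific number of instances.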
-it {instance_type} Specifies the AWS EC2 instance type that the job is to be run on. If no instance type is specified, then the amount of RAM and CPUs will be used to determine an appropriate AWS EC2 Instance. A default instance type can be set using the “defaultInstanceType” directive in the CCQ Config file.
-op cost | performance Specifies whether to use the instance type that is most cost effective or one that will give better performance regardless of cost. The default is “cost”.
-p mcn | mnc | cmn | cnm | ncm | nmc Specifies the priority order used when calculating the appropriate instance type for the job, where m = memory, n = network, and c = CPU. For example, “-p ncm” means that network requirements are considered first, then the number of CPUs, then the amount of memory when choosing an instance type. The default is “mcn”: memory, CPUs, then network.
-vt magnetic | ssd Specifies the type of Volume to launch with the EC2 instances for the job. The default is “ssd”. This value can also be set using the volumeType={volumeType} variable in the ccq.config file.
-cl {days_for_login_cert_to_be_valid_for} Specifies the number of days that the generated CCQ login certificate is valid. This certificate is used so that you do not have to enter your username/password combination each time you submit a job. The default is 1 day, and the value must be an integer greater than or equal to 0. Setting the certificate valid length to 0 will disable the generation of login certificates. If the certLength variable is set in the ccq.config file then the value in the ccq.config file will override the value entered via the commandline.
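Several of the defaults mentioned above (defaultScheduler, defaultInstanceType, volumeType, certLength) can be set once in the ccq.config file instead of on every submission. A sketch with illustrative values:

```shell
# ccq.config — illustrative default values
defaultScheduler=mySlurmScheduler
defaultInstanceType=c4.xlarge
volumeType=ssd
certLength=1
```

Note that a certLength value set here overrides any -cl value entered on the command line.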
-pr Specifies that CCQ should print the estimated price for a specific job script but not run the job. No resources will be launched. The estimate includes only the per-hour instance costs.
-o {stdout_file_location} The path to the file where you want the Standard Output from your job to be written. The default location is the directory where ccqsub was invoked. The name of the file will be the job name combined with the job id on the machine the job was submitted from.
-e {stderr_file_location} The path to the file where you want the Standard Error from your job to be written. The default location is the directory where ccqsub was invoked. The name of the file will be the job name combined with the job id on the machine the job was submitted from.
-ti Specifies that CCQ should terminate the instances created by the CCQ job as soon as the job has completed, rather than waiting to see if they can be used for other jobs. This argument only applies if the job creates a new compute group; if the job re-uses existing instances, they will not be terminated upon job completion.
-ps Specifies that CCQ should skip the Provisioning stage where it checks to make sure the job’s user is on the Compute Nodes before continuing. This may be desired if the users are already baked into the Image. If this option is given and the users are not on the Image the job could fail.
-si true | false Specifies whether the Compute Instances should enter the Ready state without waiting for the other instances in their group to enter the Ready state. This is used for HTC (High Throughput Computing) mode, where many smaller jobs are submitted by the CCQ job and utilize the other compute instances as they come up. The default value is False.
-tl {days}:{hours}:{minutes} Specifies the amount of time, measured from the initial processing of the job, that the job is allowed to run before CCQ automatically terminates all of the instances. If the job completes successfully within the time limit, the instances are deleted via the CCQ auto-delete process. By default there is no time limit and the job runs for as long as it needs. You may also specify “unlimited” if you do not want the instances to terminate until you delete them.
-cp Specifies that this CCQ job should only create placeholder/parent instances and not actually submit a job to the HPC Scheduler. This allows for the compute instances to be created dynamically and remain running as long as the specified time limit. The use of this argument requires the -tl argument as well. The default value is False.
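As a sketch, the -cp and -tl arguments can be combined to stand up a pool of placeholder instances without submitting an HPC job; the instance count, instance type, and time limit below are illustrative:

```shell
# Create 4 placeholder/parent instances that remain running for at most
# 2 days, 0 hours, 0 minutes, after which CCQ terminates them.
# -tl is required whenever -cp is used.
ccqsub -cp -tl 2:0:0 -ni 4 -it c4.xlarge
```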
-ai {ami_id} Specifies the AMI ID of the AMI that CCQ should use to launch the Compute Instances for the job. This MUST be an AMI that contains the CloudyCluster software, or it WILL NOT WORK. If no AMI is specified, the CloudyCluster AMI used by the Scheduler instance is used. The AMI ID is of the form ami-xxxxxxxx.
-mi {maximum_idle_time} Specifies the maximum amount of time that the instances created by the job should remain running if no jobs are running on the instances. The maximum idle time is specified in terms of minutes and the default is 5.