The goal of the CIPRES REST API (CRA) is to allow users to access phylogenetic software supported by CIPRES outside the confines of a point-and-click browser interface. Unlike the CIPRES Science Gateway (CSG) website, which stores jobs and data indefinitely, the CRA is intended to be a convenient way to run phylogenetic programs on large HPC resources, but does not provide long-term data storage. The CRA currently stores jobs for only 4 weeks. This time period is long enough to troubleshoot problems and to ensure that job results aren't lost, but organization and preservation of jobs and results from the CRA is the responsibility of the user.
To use the CRA, you must register as a user, and register any application(s) you wish to develop, as well. Instructions for registration are found below.
The base URL for the API is https://cipresrest.sdsc.edu/cipresrest/v1.
The examples in this guide use the Unix curl command and assume you have registered with the CRA and have set the following environment variables: URL (the base URL of the API), PASSWORD (your CRA password) and KEY (your application ID).
For example, using the bash shell:
$ # Remember to replace "MyPassWord" and "insects-..." with your information
$ export URL=https://cipresrest.sdsc.edu/cipresrest/v1
$ export PASSWORD=MyPassWord
$ export KEY=insects-095D20923FAE439982B6D5EBD2E339C9
curl is of course just one of many ways to interact with a REST API. There are numerous Java, PHP, Perl, Python, etc., libraries that make it easy to use REST services.
To get started, sign in or register for a CIPRES REST account. Once you've signed in, you can visit "My Profile" to change your account information and password. To register an application, use the Application Management console, found under the "Developer" drop down menu.
When you register an application, you must choose between DIRECT and UMBRELLA authentication models.
DIRECT is the more common choice, and the choice you want if you wish to use the API from your application immediately. DIRECT authentication means that the username and password of the person running the application will be sent in HTTP basic authentication headers, and jobs will be submitted on behalf of the authenticated user only. If people other than you will be running your application, they will need to register for their own CRA accounts and provide their credentials to your application, so that your code can submit jobs for them.
UMBRELLA is a special case used by web applications that submit jobs on behalf of multiple registered users. Web applications that use UMBRELLA authentication also authenticate with a username and password, that of the person who registered the application. The UMBRELLA application provides the identity of the user that submitted a given job using custom request headers. As a result, users registered with an UMBRELLA application need not register with the CRA. Because UMBRELLA authentication involves a trust relationship (i.e. we are trusting you to accurately identify the individual who submits each request), we will need to talk to you before activating your UMBRELLA application to ensure all of our requirements are met. If you are interested in registering an UMBRELLA application, please contact us.
The examples shown in this guide are for DIRECT applications, but with minor changes, they will also work for UMBRELLA applications, as shown in the UMBRELLA Authentication section.
The API requires you to send a username and password in HTTP Basic Authentication headers with each request. The use of SSL ensures that the information is transmitted securely.
In addition to sending a username and password, you must send your application ID in a custom request header named cipres-appkey.
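As an illustration of the two pieces of information described above, here is a minimal Python sketch that builds the Basic Authentication header and the cipres-appkey header by hand. The function name build_headers is hypothetical and the credentials are the placeholder values used elsewhere in this guide; most HTTP libraries can generate the Authorization header for you.

```python
import base64


def build_headers(username, password, app_key):
    """Return the request headers for a DIRECT application (illustrative helper)."""
    # HTTP Basic Authentication: base64 of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {
        "Authorization": "Basic " + token,
        # Application ID issued when the application was registered.
        "cipres-appkey": app_key,
    }


# Placeholder values, matching the environment variables used in this guide.
headers = build_headers("tom", "MyPassWord", "insects-095D20923FAE439982B6D5EBD2E339C9")
```

These headers would accompany every request, which is what curl's -u and -H options do in the examples.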
Let's get started using the API.
Suppose your username is tom, you've registered a DIRECT application named insects, and set URL, PASSWORD and KEY environment variables as shown in the Introduction. Here's how you would get a list of the jobs you've submitted:
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom

Submitted Jobs
$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 jobstatus NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90
$URL/v1/job/tom/NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4 jobstatus NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4
To get more information about a specific job in the list, use its jobstatus.selfUri.url. For example, to retrieve the full jobstatus of the first job in the list above:
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90
. . .
Alternatively, when you ask for the list of jobs, use the expand=true query parameter to request full jobstatus objects.
If you have a CIPRES REST account and have registered a DIRECT application, try getting your list of submitted jobs now. Since you haven't submitted any jobs yet, the list will be empty and will look like this:
Submitted Jobs
TIP: Throughout the API, XML elements named selfUri link to the full version of the containing object. All Uri elements, including selfUri, contain a url which gives the actual url, a rel which describes the type of data that the url returns, and a title. It's good practice to navigate through the API by using the Uris the API returns instead of constructing urls to specific objects yourself.
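The tip above can be sketched in Python with the standard library XML parser. The element names url, rel and title and the "Uri" suffix come from this guide; the surrounding document layout in SAMPLE (the joblist and jobs wrappers and the example.org address) is an assumption made only so the sketch is runnable, so check it against a real response.

```python
import xml.etree.ElementTree as ET

# Hypothetical response layout; only the Uri/url/rel/title names come from the guide.
SAMPLE = """
<joblist>
  <title>Submitted Jobs</title>
  <jobs>
    <jobstatus>
      <selfUri>
        <url>https://example.org/v1/job/tom/JOB-1</url>
        <rel>jobstatus</rel>
        <title>JOB-1</title>
      </selfUri>
    </jobstatus>
  </jobs>
</joblist>
"""


def collect_uris(xml_text):
    """Return every element whose tag ends in 'Uri' as a small dict."""
    root = ET.fromstring(xml_text.strip())
    uris = []
    for elem in root.iter():
        if elem.tag.endswith("Uri"):
            uris.append({
                "name": elem.tag,
                "url": elem.findtext("url"),
                "rel": elem.findtext("rel"),
                "title": elem.findtext("title"),
            })
    return uris


uris = collect_uris(SAMPLE)
```

Following the url values found this way, rather than assembling URLs by hand, keeps a client working if the URL layout changes.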
Now that we know how to list jobs, let's consider job submission.
You can submit a job by issuing a POST request to $URL/job/username with multipart/form-data. Remember to replace username with your username, or the username of the person running your application.
Most tools can be run minimally using only two fields: a tool identifier and a file to be processed.
Below is an example of a simple job submission:
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom \
    -F tool=CLUSTALW \
    -F input.infile_=@./sample1_in.fasta
In this example, the fields used are:
tool=CLUSTALW
The tool field identifies the tool to be used, in this case, CLUSTALW. Job submissions must always include a tool. You can find a list of phylogenetic programs and their respective tool IDs by using the Tool API.
input.infile_=@./sample1_in.fasta
The input.infile_ field is also mandatory; it identifies the main data file to be operated on. input.infile_ is usually a set of sequences to align or a character matrix. In this example, we're sending the contents of the file named sample1_in.fasta.
The '@' tells curl to send sample1_in.fasta as an attached file. A submission like this, with just a tool ID and input file, will succeed for most tools, and will cause the application to run a job with whatever defaults CIPRES has selected. You can try a CLUSTALW job this way if you like, using a sample input file. Of course, many job submissions will require configuration of command line options to non-default values, and (often) submission of auxiliary files that specify starting trees, constraints, etc.
There are four types of fields for configuring a job submission:
The tool field. For example, tool=CLUSTALW.
Fields that begin with input. and end with an underscore. For example, input.infile_="abcdef..."
Fields that begin with vparam. and end with an underscore. For example, vparam.runtime_=1.5.
Fields that begin with metadata. and don't have a trailing underscore. For example, metadata.statusEmail=true.
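The naming conventions for the four field types can be captured in a small Python helper. This is only a sketch: job_fields and curl_form_args are hypothetical names, and the helper checks nothing about whether a given parameter actually exists for the chosen tool.

```python
def job_fields(tool, inputs=None, vparams=None, metadata=None):
    """Build the flat form-field mapping for a job submission (illustrative helper)."""
    fields = {"tool": tool}
    for name, value in (inputs or {}).items():
        fields[f"input.{name}_"] = value          # input. prefix, trailing underscore
    for name, value in (vparams or {}).items():
        fields[f"vparam.{name}_"] = str(value)    # vparam. prefix, trailing underscore
    for name, value in (metadata or {}).items():
        fields[f"metadata.{name}"] = str(value)   # metadata. prefix, no underscore
    return fields


def curl_form_args(fields):
    """Render the fields as a list of curl -F arguments."""
    args = []
    for key, value in fields.items():
        args += ["-F", f"{key}={value}"]
    return args


fields = job_fields(
    "CLUSTALW",
    inputs={"infile": "@./sample1_in.fasta"},
    vparams={"runtime": 1},
    metadata={"clientJobId": 101, "statusEmail": "true"},
)
```

Keeping the prefix and underscore rules in one place makes it harder to send a field such as vparam.runtime without its trailing underscore.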
The set of available input. and vparam. fields varies with the tool selected and is defined in an XML document (the PISE document) for each tool. These documents can be retrieved from the list at Tools: How to Configure Specific Tools or through the Tool API.
However, depending on how you intend to use the REST API, you may never need to look at a PISE document. The easiest way to learn how to configure CIPRES jobs is to use the interactive Tool Configuration Helper.
If you're interested in knowing more about the PISE tool descriptions, consult Tool Specific Parameters in the appendix.
The metadata. fields are the same, regardless of the tool selected, and are described in the next section.
Note: Before you submit a job, you may want to check whether the request is composed correctly. How to Make a Test Run explains how to submit a job for validation before submitting it to run.
A job submission may include the following optional metadata fields:
metadata.clientJobId
metadata.clientJobName
metadata.clientToolName
metadata.statusEmail
metadata.emailAddress
Use with statusEmail to override the default email destination. By default, job completion emails are sent to the user's registered email address (or, in the case of UMBRELLA applications, to the address in the cipres-eu-email header of the job submission request). Use this property to direct the email somewhere else.
metadata.statusUrlPut
metadata.statusUrlGet
All metadata fields are limited to 100 characters, and all are optional. Metadata will be returned with the rest of the information about the job when you later ask for the job's status.
In the following example, Tom uses some of the metadata fields described above to supply a job ID, generated by his application, and to request email notification of job completion.
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom \
    -F tool=CLUSTALW \
    -F input.infile_=@./sample1_in.fasta \
    -F metadata.clientJobId=101 \
    -F metadata.statusEmail=true
As noted above, many runs will be more complicated than this because of the need to configure the precise command line. We suggest that you continue through this guide to learn how to check job status, download results, and handle errors, and then use the Tool Configuration Helper and/or read Tool Specific Parameters in the Appendix to create more customized runs.
Successful job submission returns a jobstatus object that looks like this:
$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 jobstatus NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 QUEUE false false clientJobId 101 2014-09-10T15:54:58-07:00 $URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output results Job Results $URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/workingdir workingdir Job Working Directory 2014-09-10T15:54:59-07:00 QUEUE Added to cipres run queue. 60
Elements of particular interest are:
jobHandle
jobStage
jobstatus.messages
to monitor the progress of a job.
messages
terminalStage
failed
minPollIntervalSeconds
The jobstatus also includes several urls:
selfUri
workingDirUri
resultsUri
The job is finished when jobstatus.terminalStage=true. Use jobstatus.selfUri.url to check the status of the job, like this:
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90
Alternatively, you can check the status of multiple jobs in a single GET of endpoint $URL/job by using multiple instances of the jh=jobhandle query parameter. In this case the URL does not include the username (so that UMBRELLA applications can check on jobs for all their end users with a single query).
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/?jh=NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90\&jh=NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4
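When the request is built in code rather than in a shell, the repeated jh parameters can be generated with the standard library, and no shell escaping of & is needed. A minimal sketch, in which the helper name multi_status_url is hypothetical:

```python
from urllib.parse import urlencode


def multi_status_url(base_url, job_handles):
    """Build the multi-job status URL with one jh parameter per job handle."""
    query = urlencode([("jh", handle) for handle in job_handles])
    return f"{base_url}/job/?{query}"


status_url = multi_status_url(
    "https://cipresrest.sdsc.edu/cipresrest/v1",
    [
        "NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90",
        "NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4",
    ],
)
```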
We ask users to keep polling frequency as low as possible to avoid overloading CIPRES:
As a rule, jobstatus.minPollIntervalSeconds specifies the shortest polling interval that you may use. However, we encourage you to poll much less frequently when possible. For example, if you aren't returning intermediate results to your users and you submit a job with a maximum run time that's more than an hour, please consider increasing the polling interval to 15 minutes.
As an alternative to frequent polling, consider using metadata.statusEmail=true in your job submission so that CIPRES will email you when the job is finished. Showing courtesy here will allow us to avoid having to enforce hard limits.
If you poll for the status of multiple jobs in a single call, please use the jobstatus.minPollIntervalSeconds of the most recently submitted job as your minimum polling interval.
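The polling guidance can be expressed as a small Python function. This is a sketch of one reasonable policy, not a rule enforced by CIPRES: the function name and parameters are hypothetical, and the 15 minute figure is the suggestion given above for jobs that may run longer than an hour.

```python
FIFTEEN_MINUTES = 15 * 60


def polling_interval(min_poll_seconds, max_runtime_hours, returns_intermediate_results=False):
    """Pick a polling interval in seconds that never goes below the server minimum."""
    interval = min_poll_seconds
    # Long jobs with no intermediate results to show can be polled far less often.
    if max_runtime_hours > 1 and not returns_intermediate_results:
        interval = max(interval, FIFTEEN_MINUTES)
    return interval
```

For example, with a server minimum of 60 seconds, a job limited to half an hour would be polled every 60 seconds, while a two hour job would be polled every 900 seconds.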
Once jobstatus.terminalStage=true, you can list and then retrieve the final results. Issue a GET request to the URL specified by jobstatus.resultsUri.url, like this:
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output

$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1544 fileDownload STDOUT NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 STDOUT 1243 PROCESS_OUTPUT 1544
$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1545 fileDownload STDERR NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 STDERR 0 PROCESS_OUTPUT 1545
$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1550 fileDownload infile.aln NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 infile.aln 1449 aligfile 1550
$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1551 fileDownload term.txt NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 term.txt 338 all_outputfiles 1551
$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1552 fileDownload batch_command.cmdline NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90 batch_command.cmdline 48 all_outputfiles 1552
...
Use the jobfile.downloadUri.url links to download individual result files, like this:
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    -O -J \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1544
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1243    0  1243    0     0    178      0 --:--:--  0:00:06 --:--:--   313
curl: Saved to filename 'STDOUT'
If you are interested in monitoring the progress of a job while it is running, you can use jobstatus.workingDirUri.url to retrieve the list of files in the job's working directory. The job only has a working directory after it has been staged to the execution host and is waiting to run, is running, or is waiting to be cleaned up. If you use this URL at other times, it will return an empty list. Furthermore, if you happen to use this URL while CIPRES is in the process of removing the working directory, you may receive a transient error. Because of this possibility, be prepared to retry the operation.
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/workingdir

$URL/job/tom/NGBW-JOB-CLUSTALW-0171A3F1BFA0477CAF35B79CE075DF9C/workingdir/scheduler.conf fileDownload scheduler.conf scheduler.conf 11 2014-09-20T16:18:05-07:00 0
. . .
$URL/job/tom/NGBW-JOB-CLUSTALW-0171A3F1BFA0477CAF35B79CE075DF9C/workingdir/infile.dnd fileDownload infile.dnd infile.dnd 137 2014-09-20T16:18:13-07:00 0
To retrieve a file from the working directory list, use its jobfile.downloadUri.url. Be prepared to handle transient errors, as well as a permanent 404 NOT FOUND error, once the working directory has been removed.
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    -O -J \
    $URL/job/tom/NGBW-JOB-CLUSTALW-0171A3F1BFA0477CAF35B79CE075DF9C/workingdir/infile.dnd
curl: Saved to filename 'infile.dnd'
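One way to follow the advice about transient errors is a small retry wrapper that retries a limited number of times but gives up immediately on a 404. The sketch below is an assumption-laden illustration: the exception classes, the function name and the idea that the caller's fetch function raises NotFound for HTTP 404 and TransientError for anything it considers retryable are all hypothetical.

```python
import time


class NotFound(Exception):
    """Raised by the caller's fetch function for HTTP 404 (permanent)."""


class TransientError(Exception):
    """Raised by the caller's fetch function for errors worth retrying."""


def fetch_with_retry(fetch, attempts=3, delay_seconds=5.0):
    """Call fetch() until it succeeds, retrying only transient errors."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except NotFound:
            # The working directory is gone for good; retrying cannot help.
            raise
        except TransientError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise last_error
```

Here fetch would be a function that downloads the working directory list or one of its files and translates the HTTP status into one of the two exceptions.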
Once a job has finished and you've downloaded the results, it's a good idea to delete the job. You may also want to delete a job that hasn't finished yet if you, or the user of your application, realize you made a mistake and don't want to waste the compute time.
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    -X DELETE \
    $URL/job/tom/NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4
There is no data returned from a successful DELETE.
If the job is scheduled to run or is running at the time you delete it, it will be cancelled. Either way, all information associated with the job will be removed. You can verify that the job has been deleted by doing a GET of its jobstatus url. HTTP status 404 (NOT FOUND) will be returned along with an error object. We demonstrate this below by using curl's -i option, which tells curl to include the HTTP response headers in its output.
$ curl -i -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4
HTTP/1.1 404 Not Found
Server: Apache-Coyote/1.1
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Thu, 11 Sep 2014 21:43:54 GMT

Job not found. Job Not Found Error: org.ngbw.sdk.jobs.JobNotFoundException: NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4 4
To cancel a running job without deleting it, you can send a PUT request with the action parameter set to 'cancel.'
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    -X PUT \
    -F action=cancel \
    $URL/job/tom/modify/NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4
HTTP status codes are used to indicate whether an API request succeeded or failed. When the HTTP status indicates failure (with a status other than 200), an error object is returned. A basic error object looks like this:
Job Not Found Job Not Found Error: org.ngbw.sdk.jobs.JobNotFoundException: NGBW-JOB-CLUSTALW-261679BE83E245AD8EEECB4592A52B81 4
The displayMessage is a user friendly description of the error. The contents of the message are not meant for end users, but may be helpful in debugging. The code indicates the type of error, for example code = 4 is "not found", as shown in the source code for ErrorData.java.
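A client will usually want to turn an error object into something it can branch on. The Python sketch below uses the displayMessage, message and code names from the description above; the exact XML layout of SAMPLE_ERROR is an assumption reconstructed for illustration, and parse_error is a hypothetical helper. Elements the function doesn't know about are simply ignored.

```python
import xml.etree.ElementTree as ET

# Assumed layout; the field names come from the guide, the nesting is a guess.
SAMPLE_ERROR = """<error>
  <displayMessage>Job Not Found</displayMessage>
  <message>Job Not Found Error: org.ngbw.sdk.jobs.JobNotFoundException: NGBW-JOB-CLUSTALW-261679BE83E245AD8EEECB4592A52B81</message>
  <code>4</code>
</error>"""


def parse_error(xml_text):
    """Extract the user-facing message, debug message and numeric code."""
    root = ET.fromstring(xml_text)
    code_text = root.findtext("code")
    return {
        "displayMessage": root.findtext("displayMessage"),
        "message": root.findtext("message"),
        "code": int(code_text) if code_text is not None else None,
    }


error = parse_error(SAMPLE_ERROR)
```

An application would typically show displayMessage to the end user and write message to its own log.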
A job validation error may contain a list of field errors. For example:
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom \
    -F tool=CLUSTALW \
    -F metadata.clientJobId=110 \
    -F input.infile_=@./sample1_in.fasta \
    -F vparam.runtime_="one hour" \
    -F vparam.foo_=bar

Form validation error. Validation Error: 5
runtime_ Must be a Double.
foo_ Does not exist.
jobstatus, results, jobfile, error, etc., are not fully documented yet; however, the basic schema is available. You can also view the Java source code for these data structures. CIPRES maps the Java classes to XML using JAXB. If you happen to be implementing in Java, you may want to use the Java source code, linked to above, with JAXB, to unmarshal the XML documents that the CRA returns.
We may find it necessary to add elements to the schema as time goes by, but your application should continue to work provided it ignores any elements it doesn't recognize.
The Tool API provides information about the phylogenetic tools that can be run on CIPRES. It's public: no credentials and no special headers are required, so it's easy to use a browser or curl to explore it.
You can use the Tool API to learn the IDs of the tools you're interested in running and to download their PISE XML descriptions.
Definition: Strictly speaking, a CIPRES tool is an interface for configuring command line job submissions. It is defined by a PISE XML document found in the Tool API. Each tool deploys jobs for a single phylogenetic program (e.g. CLUSTALW, MrBayes, RAxML, etc.). However, more than one tool may invoke the same program. For example, the RAxML program is run by two tools, one that provides a simple "blackbox" interface (RAXMLHPC2BB), and one that exposes nearly all RAxML options (RAXMLHPC2_TGB).
Go to $URL/tool in the browser, or use curl, as shown below, to see a list of the available tools:
$ curl $URL/tool

MRBAYES_321RESTARTBETA Tree Inference Using Bayesian Analysis - run on XSEDE
$URL/tool/MRBAYES_321RESTARTBETA tool MRBAYES_321RESTARTBETA
$URL/tool/MRBAYES_321RESTARTBETA/doc/pise Pise XML MRBAYES_321RESTARTBETA pise
$URL/tool/MRBAYES_321RESTARTBETA/doc/portal2 Html Web Page MRBAYES_321RESTARTBETA type
$URL/tool/MRBAYES_321RESTARTBETA/doc/example Html Web Page MRBAYES_321RESTARTBETA type
$URL/tool/MRBAYES_321RESTARTBETA/doc/param Html Web Page MRBAYES_321RESTARTBETA type
PROBCONS Probabilistic Consistency-based Multiple Alignment of Amino/Nucleic Acid Sequences
. . .
$URL/tool/PROBCONS tool PROBCONS
Each tool description includes the toolId, toolName, and a number of "Uri" elements, which are links to various documents for the specific tool.
As we mentioned earlier, it's good practice to navigate through the API using these returned links rather than hardcoding the urls.
For example, all the urls in the table below can be extracted from the data returned by the top level resource at $URL/tool.
GET | $URL/tool | Use this to get a list of the available tools. |
GET | $URL/tool/toolId | Use this to get the URLs that link to the tool's documents (i.e. the documents listed below). |
GET | $URL/tool/toolId/doc/pise | Use this URL to download the tool's PISE XML file. |
GET | $URL/tool/toolId/doc/portal2 | Use this URL, in a browser, to read a detailed description of the tool. This URL returns http status 303 and a Location header that redirects to the html page on the CIPRES Science Gateway that gives a detailed, human readable, description of the tool. |
GET | $URL/tool/toolId/doc/example | Not implemented yet. Will give examples showing how to submit jobs to use this tool. |
GET | $URL/tool/toolId/doc/param | Not implemented yet. Will give a human readable description of each of the tool's parameters. |
This directory contains a perl script that makes use of libwww-perl to access the CRA. It repeatedly prompts the user to retrieve a list of supported tools, submit a job, show the user's jobs, show a job's results or download a job's results.
To download the perl sample code, run:
$ svn export https://svn.sdsc.edu/repo/scigap/tags/rest-R10.226/rest_client_examples/examples/perl_demo
Java Client Library and Example (View)
This directory contains source code for a client library for communicating with the CRA and a sample program that uses the client library. When you build with maven, as explained in the cipres_java_client directory's Readme.txt, javadoc documentation and a jar named cipres_java_client.jar are created. The best way to learn to use the library is to study the example included with the source code, org.ngbw.directclient.example.Example.java .
You must get the source for restdatatypes and build the restdatatypes.jar before building cipres_java_client:
$ svn export https://svn.sdsc.edu/repo/scigap/tags/rest-R10.226/rest/datatypes
$ svn export https://svn.sdsc.edu/repo/scigap/tags/rest-R10.226/rest/cipres_java_client
$ cd datatypes; mvn clean install
$ cd ../cipres_java_client; mvn clean install
Python Client Library and Example (View)
This example shows how to communicate with the CRA from python, using the Requests package. The code was originally created as an example but has been developed into a full fledged client library. You can install the library by using svn to export the code from the link above or you can install it by running "pip install python_cipres".
The package includes two command line scripts, tooltest.py and cipresjob.py, that serve as good examples of how to import and use python_cipres. The main API is in client.py. To use the package, import client.py with a statement like import python_cipres.client as CipresClient.
CIPRES has the following per user limits:
When a request is rejected due to a usage limit, the HTTP status will be 429 (Too Many Requests). The error.code will be 103, which is the CIPRES generic "UsageLimit" error code. The error will contain a nested limitStatus element which has type and ceiling fields.
Too many active jobs. Limit is 1 org.ngbw.sdk.UsageLimitException: Too many active jobs. Limit is 1 103
active_limit 1
Currently, the limits are: concurrent_limit=10, active_limit=50, other_su_limit=30,000 and xsede_su_limit=30,000. These limits can be modified for specific applications and users. If you have a problem with the default limits, please contact us to discuss your needs.
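A client can recognise a usage limit rejection by checking for HTTP status 429 together with error code 103, and then read the nested limitStatus. In the Python sketch below, the element names code, limitStatus, type and ceiling come from the text above; the XML layout of SAMPLE_LIMIT_ERROR and the helper name usage_limit_details are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed layout, based on the example error shown above.
SAMPLE_LIMIT_ERROR = """<error>
  <displayMessage>Too many active jobs. Limit is 1</displayMessage>
  <message>org.ngbw.sdk.UsageLimitException: Too many active jobs. Limit is 1</message>
  <code>103</code>
  <limitStatus>
    <type>active_limit</type>
    <ceiling>1</ceiling>
  </limitStatus>
</error>"""


def usage_limit_details(http_status, xml_text):
    """Return the limit type and ceiling for a usage-limit error, else None."""
    if http_status != 429:
        return None
    root = ET.fromstring(xml_text)
    if (root.findtext("code") or "").strip() != "103":
        return None
    limit = root.find("limitStatus")
    if limit is None:
        return {"type": None, "ceiling": None}
    return {"type": limit.findtext("type"), "ceiling": limit.findtext("ceiling")}


details = usage_limit_details(429, SAMPLE_LIMIT_ERROR)
```

An application might use the type value to decide whether to wait for running jobs to finish (for example active_limit) or to report the problem to the user.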
A future release of the REST API will
Applications that are registered to use Umbrella authentication can use the commands in this guide, with additional request headers that identify the end user. Behind the scenes, CIPRES creates an account for the end user with a username of the form application_name.cipres_eu_header and it is this qualified username that goes in the URLs.
Basic authentication credentials | ALWAYS | DIRECT applications send the user's CIPRES REST username and password. UMBRELLA applications send the username and password of the person who registered the application. See Authentication. |
cipres-appkey | ALWAYS | Application ID generated by CIPRES when you registered the application. It can be changed later if necessary. |
cipres-eu | UMBRELLA | Uniquely identifies the user within your application. Up to 200 characters. Single quotes are not allowed within the name. |
cipres-eu-email | UMBRELLA | End user's email address. Up to 200 characters. You can't have 2 users with the same email address. |
cipres-eu-institution | UMBRELLA | End user's home institution. This isn't currently required but may be in the future. |
cipres-eu-country | UMBRELLA | Two letter, upper case, ISO 3166 country code for the end user's institution. Due to the way CIPRES is funded, US researchers can be given higher SU limits. The header isn't currently required, but may be in the future. If not sent, the user is assumed to be from outside the US for usage limit cutoffs. |
For example, suppose your username is mary and you're integrating an existing web application with the CIPRES REST API. You've registered the application with the name phylobank and set the authentication method to UMBRELLA.
Now suppose a user named harry logs into your application and your application needs to get a list of jobs that harry has submitted to CIPRES. First, you go to your database or user management component and retrieve harry's email address, institutional affiliation, and optional ISO 3166 2 letter country code. Now you're ready to issue this curl command (or the equivalent statement in the language you're using):
$ curl -i -u mary:password \
    -H cipres-appkey:$KEY \
    -H cipres-eu:harry \
    -H cipres-eu-email:harry@ucsddd.edu \
    -H cipres-eu-institution:UCSD \
    -H cipres-eu-country:US \
    $URL/job/phylobank.harry
Notice that although the value of the cipres-eu header is harry, in the URL, you must use phylobank.harry.
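The relationship between the end user headers and the qualified username in the URL can be sketched in Python as follows. The helper name umbrella_request is hypothetical; the header names, the 200 character limit, the ban on single quotes and the application_name.end_user form of the URL username are taken from this section. The Basic Authentication and cipres-appkey headers still have to be added separately.

```python
def umbrella_request(base_url, app_name, end_user, email, institution=None, country=None):
    """Return (headers, job_list_url) for a request made on behalf of end_user."""
    if "'" in end_user or len(end_user) > 200:
        raise ValueError("cipres-eu must be at most 200 characters with no single quotes")
    headers = {"cipres-eu": end_user, "cipres-eu-email": email}
    if institution:
        headers["cipres-eu-institution"] = institution
    if country:
        # Two letter, upper case ISO 3166 country code.
        headers["cipres-eu-country"] = country.upper()
    # The URL uses the qualified username, not the bare cipres-eu value.
    url = f"{base_url}/job/{app_name}.{end_user}"
    return headers, url


eu_headers, job_list_url = umbrella_request(
    "https://cipresrest.sdsc.edu/cipresrest/v1",
    "phylobank", "harry", "harry@ucsddd.edu", institution="UCSD", country="us",
)
```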
Here you submit a basic CLUSTALW job for harry and get back a jobstatus object.
$ curl -u mary:password \
    -H cipres-appkey:$KEY \
    -H cipres-eu:harry \
    -H cipres-eu-email:harry@ucsddd.edu \
    -H cipres-eu-institution:UCSD \
    -H cipres-eu-country:US \
    $URL/job/phylobank.harry \
    -F tool=CLUSTALW \
    -F input.infile_=@./sample1_in.fasta

$URL/cipresrest/v1/job/phylobank.harry/NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A jobstatus NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A QUEUE false false clientJobId 010007AQ 2014-09-12T12:36:31-07:00 $URL/cipresrest/v1/job/phylobank.harry/NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A/output results Job Results $URL/cipresrest/v1/job/phylobank.harry/NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A/workingdir workingdir Job Working Directory 2014-09-12T12:36:31-07:00 QUEUE Added to cipres run queue. 60
You can check the status of a single job, using the jobstatus.selfUri.url that was returned when the job was submitted, like this:
$ curl -u mary:password \
    -H cipres-appkey:$KEY \
    -H cipres-eu:harry \
    -H cipres-eu-email:harry@ucsddd.edu \
    -H cipres-eu-institution:UCSD \
    -H cipres-eu-country:US \
    $URL/job/phylobank.harry/NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A
Alternatively, you can get the status of multiple jobs, submitted on behalf of multiple users, with a single GET of $URL/job. Indicate which jobs you're interested in with query parameters named jh (for "job handle"). Use a separate jh parameter for each job. With this request, the cipres-appkey header is required, but end user headers are not. For example:
$ curl -u mary:password \
    -H cipres-appkey:$KEY \
    $URL/job/?jh=NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A\&jh=NGBW-JOB-CLUSTALW-553D534D355C4631BBDCF217BB792A01
If you're using curl in a typical Unix shell, you must place a backslash before the & that separates the query parameters to escape it from interpretation by the shell.
The other things you may need to do are 1) retrieve files from a job's working directory while it's running, 2) retrieve final results once a job has finished, 3) cancel and/or delete a job. The DIRECT application examples in this guide are applicable to UMBRELLA applications too. Just remember to add the appropriate CIPRES end user headers and prefix the username in the URL with the application name and a period.
The code for creating command lines and configuring jobs in CIPRES evolved from the Pasteur Institute Software Environment (PISE). PISE is an XML-based standard for generation of web forms that create Unix command lines. For complex phylogenetic programs, there is often significant interdependence of parameters. That is, some options are relevant only if others are selected, some combinations may give nonsensical results, and so forth. The PISE XML documents are rich, and embody all of the information required to create successful, meaningful command lines. Where possible, they also prevent creation of incorrect commands that would cause an immediate error. Thus, the PISE XML documents are the definitive reference for configuring CRA job submissions. The PISE XML documents for all the CRA tools are available through the Tools: How to Configure Specific Tools list or through the Tool API. (Please see the Tool API section for the precise definition of a CIPRES "tool").
The relationship between the "input" and "vparam" fields that you send to the REST API and the PISE XML document is as follows:
input.parameter_name_. Each such field corresponds to a <parameter> in the tool's PISE XML file, where the name of the parameter is parameter_name and the parameter's type is InFile. Every PISE file defines one special InFile parameter that is, by convention, named infile. This parameter has the attribute isinput=1, which means that it is the primary input, and must always be included in any run of this tool. Other InFile parameters allow you to submit optional files containing constraints, guide trees, etc. (as appropriate for the tool, and the particular analysis).
vparam.parameter_name_. Each such field corresponds to a <parameter> in the tool's PISE XML file where the name of the parameter is parameter_name and the parameter's type is Switch, String, Integer, etc. These parameters are used to configure the command line and certain other aspects of a run, such as how long the job is allowed to run. They are called visible parameters, because in the CSG website, they correspond to textareas, radio buttons and other visible form controls.
Continuing with the job submission example used earlier in this guide, here's how Tom could submit a CLUSTALW job that uses a guidetree, produces phylip output and has a limited maximum run time:
$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom \
    -F tool=CLUSTALW \
    -F metadata.clientJobId=102 \
    -F metadata.statusEmail=true \
    -F input.infile_=@./sample1_in.fasta \
    -F input.usetree_=@./guidetree.dnd \
    -F vparam.runtime_=1 \
    -F vparam.phylip_alig_=1
The fields that Tom added are:
-F input.usetree_=@./guidetree.dnd
input.usetree causes CIPRES to add a -usetree option to the CLUSTALW command line. This tells CLUSTALW to use the specified file as a guide tree for the alignment. usetree is the name of a parameter of type InFile in CLUSTALW's PISE XML document, clustalw.xml.
-F vparam.runtime_=1
This sets the maximum run time to 1 hour using the runtime parameter, found in clustalw.xml. By convention, runtime is found in every tool's PISE XML file. If not specified in the job submission, maximum run time would have been set to the default value specified in the PISE XML file, typically 0.5 h. In clustalw.xml you won't find a parameter element named runtime, but will see an entity that includes a shared definition of the parameter, like this:
<!ENTITY runtime SYSTEM "http://www.phylo.org/dev/rami/XMLDIR/triton_run_time.xml">
The exact contents of triton_run_time.xml are shown in Example 4, Shared Definition of Runtime Parameter.
-F vparam.phylip_alig_=1
The phylip_alig parameter, set to 1, means that CLUSTALW should be run with the -output=PHYLIP command line option. This is defined in clustalw.xml, as explained in PISE Example 1 below.
Note: In general, only parameters that differ from the defaults specified in the tool's PISE file need to be specified in the job submission.
Note: With mutually exclusive parameters, send only the one that is enabled. In other words, don't send the disabled one with a value of zero; just send the enabled one.
Each tool's PISE XML document contains a collection of parameter elements. Most parameter elements correspond to fields you can use to configure a job submission. Those marked with ishidden="1" and those of type "Results" or "OutFile" are the exceptions.
It is easiest to explain the XML format through a set of examples:
A parameter usually defines a single command line flag or input file. Here is an example from clustalw.xml.
phylip_alig Phylip alignment output format (-output) perl ($value)?" -output=PHYLIP":""
0 2
Each parameter has a name (in this case, "phylip_alig") and a type. In this case, the type is "Switch", which means the allowed values are "0" and "1".
To use a parameter from the PISE XML in a CRA job submission, prefix the parameter name with "vparam" (or with "input", if the type=InFile) and add a trailing underscore. So to use this parameter you'd send vparam.phylip_alig_="1" or vparam.phylip_alig_="0". If you send this parameter, regardless of whether you set it to 0 or 1, the perl expression in the format element will be run. Thus if you set it to 1, "-output=PHYLIP" will be added to the command line, and if you set it to 0, nothing is added to the command line here because ($value) will evaluate to false. The effect of including vparam.phylip_alig_=0 and not including any setting for phylip_alig is the same.
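The naming rule above can be sketched as a small shell helper. This is only an illustration: the function name cra_field is hypothetical and not part of the CRA, and it assumes you have already looked up the parameter's type in the tool's PISE XML file.

```shell
# cra_field NAME TYPE
# Hypothetical helper (not part of the CRA): prints the form field name
# for a PISE parameter. Parameters of type InFile get the "input."
# prefix, all other types get "vparam."; both get a trailing underscore.
cra_field() {
    local name=$1 type=$2
    if [ "$type" = "InFile" ]; then
        printf 'input.%s_\n' "$name"
    else
        printf 'vparam.%s_\n' "$name"
    fi
}

cra_field phylip_alig Switch   # prints vparam.phylip_alig_
cra_field usetree InFile       # prints input.usetree_
```

The printed name can then be used on the left-hand side of a curl -F option, for example -F "$(cra_field phylip_alig Switch)=1".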
CIPRES PISE XML documents supply default values using the vdef element, so it is typically only necessary to send fields where the default value is not correct for the run, even in cases where the parameter has an ismandatory attribute.
Note that the PISE XML documents also contain all the information needed to generate a web form. Many elements, such as prompt, label, comment, issimple, etc., provide information necessary for web form generation, but are not relevant to the CRA.
PISE <parameter> elements will have one of the following types:
InFile - an input file. Every tool has one input file that is mandatory. This is indicated with the attribute isinput=1, and by convention is named "infile". Other input files are optional, or are required only when certain other parameters are set, as specified in precond or ctrl elements.
Excl - a single choice list. The selected value must be in the set of value elements given in the vlist or flist element.
List - a multiple choice list. Allowed values are given in value elements. To send multiple values, use multiple form fields with the same name, e.g., -F vparam.hgapresidues_=G -F vparam.hgapresidues_=A.
Switch - must be either "0" or "1".
Integer, Float, String
Results - specifies which files will be returned when the job completes. Users have no direct control over the naming of output files.
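For List parameters, repeating the same field name once per selected value can be scripted. The following is a minimal sketch; list_fields is a hypothetical helper name, and it assumes the values contain no whitespace.

```shell
# list_fields FIELD VALUE...
# Hypothetical helper (not part of the CRA): prints one "-F FIELD=VALUE"
# option per selected value, all on a single line, so that the same form
# field name is repeated for every value of a List parameter.
list_fields() {
    local field=$1 out="" value
    shift
    for value in "$@"; do
        out="$out -F $field=$value"
    done
    # Drop the leading space left by the first iteration.
    printf '%s\n' "${out# }"
}

list_fields vparam.hgapresidues_ G A
# prints: -F vparam.hgapresidues_=G -F vparam.hgapresidues_=A
```

The output can be passed to curl unquoted, e.g. curl ... $(list_fields vparam.hgapresidues_ G A), which relies on the shell's word splitting and is the reason the values must not contain whitespace.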
name - the name of the parameter. Prefix it with "vparam." or "input." and suffix it with an underscore to use in the CRA.
vdef - gives the parameter's default value, if any.
ctrl - elements set constraints on values. ctrls, like preconds, have perl expressions. If any of a parameter's ctrl elements evaluates to true, the job submission will not pass validation.
precond - elements determine whether the parameter is enabled or disabled. When a parameter is disabled, you may not include a value for it in the job submission. A parameter is enabled when all of its precond elements evaluate to true.
format - a perl expression that creates part of the command line.
paramfile - when a paramfile is present, CIPRES creates the file in the job's working directory and sends the output from the corresponding format to the named paramfile instead of the command line. The user has no control over the creation or naming of these files; they are created and submitted on the back end.
Note: If you don't include a particular parameter in your job submission, and that parameter has a default value (i.e., a vdef element), and the default value doesn't conflict with the preconds of any parameters you sent, then the CRA automatically adds that parameter, set to its default value, to your submission. On the other hand, if the preconds do conflict, the parameter is not added. When a parameter isn't present in a job submission, it will be skipped when the PISE XML file is processed. This means that its ctrl and format code snippets won't be evaluated.
This is an example of a parameter that specifies an additional input file. To include it in your job submission you would use input.usetree_, as we did in the example job submission shown earlier. When CIPRES receives the input file contents, it disregards your original filename for the data (i.e., guidetree.dnd, in the job submission example), and stores the data in a file named "usetree.dnd" in the job's working directory. The filename CIPRES uses is specified by the parameter's filenames element. When the format code is executed, it adds "-usetree=usetree.dnd" to clustalw's command line.
usetree File for old guide tree (-usetree) perl " -usetree=usetree.dnd"
2 You can give a previously computed tree (.dnd file) - on the same data perl ($actions =~ /align/ )
usetree.dnd
The following example is for GARLI v.2.0, taken from garli2_tgb.xml. It adds a setting to a configuration file, garli.conf, that garli will read. The paramfile element is what tells CIPRES to direct the output of the format element to a file named garli.conf instead of to the command line. Each time CIPRES processes a parameter with a paramfile element, it either creates the specified file in the job's working directory (if it doesn't already exist) or adds text to it.
This parameter defines a choice list named d_statefrequencies. The allowable values are given by the vlist.value elements and are equal, empirical, estimate, and fixed. The default value is estimate. The precond for this parameter dictates that it is only allowed when a second parameter, datatype_value, has the value nucleotide. The output of the perl code in the format element will be directed to garli.conf, thereby adding a "statefrequencies" setting to the file.
d_statefrequencies garli.conf Base Frequencies (statefrequencies) perl $datatype_value eq "nucleotide"
perl "statefrequencies = $value\\n"
equal empirical estimate fixed estimate garli.conf 2
Many tools include a file named triton_run_time.xml that contains a definition of the runtime parameter. It looks like this:
runtime 1 scheduler.conf Maximum Hours to Run (click here for help setting this correctly) 1.0 Estimate the maximum time your job will need to run (up to 72 hrs). Your job will be killed if it doesn't finish within the time you specify, however jobs with shorter maximum run times are often scheduled sooner than longer jobs. Maximum Hours to Run must be between 0.1 - 72.0. perl $runtime < 0.1 || $runtime > 72.0
perl "runhours=$value\\n"
This defines a field named vparam.runtime_. The ctrl element says that you are allowed to set values between 0.1 hrs and 72.0 hrs. The default value is 1 hr. This definition is used by many tools that run on the TSCC supercomputer. Tools that run on XSEDE resources define runtime differently, usually allowing up to 168 hours, with a default of 0.5 hrs.
This parameter works by writing a line that looks like "runhours=1" (for example) to a file named scheduler.conf. CIPRES uses the information in scheduler.conf to limit the runtime as specified.
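If you want to catch an out-of-range value before submitting, the ctrl expression shown above can be mirrored on the client side. The sketch below is an assumption-laden illustration: runtime_ok is a hypothetical helper name, the 0.1 to 72.0 limits apply only to the shared triton_run_time.xml definition (tools on other resources may use different limits), and the input is assumed to be a plain number.

```shell
# runtime_ok HOURS
# Hypothetical client-side check (not part of the CRA) mirroring the
# ctrl expression "$runtime < 0.1 || $runtime > 72.0" from
# triton_run_time.xml: succeeds (exit status 0) only when
# 0.1 <= HOURS <= 72.0. awk is used because the shell's own arithmetic
# does not handle fractional values.
runtime_ok() {
    awk -v h="$1" 'BEGIN { exit !(h + 0 >= 0.1 && h + 0 <= 72.0) }'
}

for hours in 0.05 1 72 100; do
    if runtime_ok "$hours"; then
        echo "$hours: within range"
    else
        echo "$hours: out of range"
    fi
done
```

CIPRES still performs its own validation when the job is submitted; a check like this only saves a round trip.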
Some tools, notably BEAST and MrBayes, allow or require users to configure most options in the main input file. This can greatly simplify the development of REST submissions since there will be little to configure via the API. Others, such as GARLI and RAxML, have PISE XML files that contain a significant number of parameters, and require familiarity with the defaults and potential interactions between parameters. There are several strategies that can be useful in learning to configure a job.
Once you have a job submission ready, you can validate it by POST'ing it to $URL/job/username/validate instead of POST'ing it to $URL/job/username. CIPRES will validate the parameters but won't actually submit the job. If your submission is fine, CIPRES will return a jobstatus object with a commandline element that shows the Linux command line that would be run if the job were submitted. On the other hand, if there are errors in the submission, CIPRES will return an error object that explains the problems.
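As a minimal sketch, a validation request can be wrapped in a shell function so that the same form fields can be sent first to the validate URL and later to the submit URL. The function name validate_job is hypothetical; the sketch assumes the URL, PASSWORD and KEY environment variables are set as described at the start of this guide.

```shell
# validate_job USER [curl form options...]
# Hypothetical wrapper (not part of the CRA): POSTs the given form fields
# to $URL/job/USER/validate, so CIPRES checks the submission without
# running it. Assumes URL, PASSWORD and KEY are set in the environment.
validate_job() {
    local user=$1
    shift
    curl -u "$user:$PASSWORD" \
        -H "cipres-appkey:$KEY" \
        "$URL/job/$user/validate" \
        "$@"
}

# Example call (commented out here because it contacts the server):
# validate_job tom \
#     -F tool=CLUSTALW \
#     -F input.infile_=@./sample1_in.fasta \
#     -F vparam.runtime_=1 \
#     -F vparam.phylip_alig_=1
```

Once the response contains the expected commandline element, the same -F options can be sent to $URL/job/tom to submit the job for real.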