Introduction

Register

Authenticate

List Jobs

Submit Jobs

Optional Metadata

Job Status

Is the Job Finished?

List Results

Download Results

List Working Directory

Download Working Dir Files

Delete and Cancel a Job

Handle Errors

Data Types

Tool API

Summary

Sample Code

Usage Limits

Umbrella Authentication

Request Headers

List Jobs

Submit a Job

Check Job Status

Other Operations


Appendix

Tool Specific Parameters

PISE XML

Ex1, A Basic PISE Parameter

PISE Parameter Types

PISE Parameter Elements

Ex2, Additional Input File

Ex3, A Parameter That Builds a Configuration File

Ex4, Shared Definition of Runtime Parameter

Strategies for Using PISE XML Files

How to Make a Test Run

CIPRES REST v1.1 User Guide

Introduction

The goal of the CIPRES REST API (CRA) is to allow users to access phylogenetic software supported by CIPRES outside the confines of a point-and-click browser interface. Unlike the CIPRES Science Gateway (CSG) website, which stores jobs and data indefinitely, the CRA is intended to be a convenient way to run phylogenetic programs on large HPC resources; it does not provide long-term data storage. The CRA currently stores jobs for only 4 weeks. This period is long enough to troubleshoot problems and to ensure that job results aren't lost, but organizing and preserving jobs and results from the CRA is the responsibility of the user.

To use the CRA, you must register as a user, and register any application(s) you wish to develop, as well. Instructions for registration are found below.

The base URL for the API is https://cipresrest.sdsc.edu/cipresrest/v1.

The examples in this guide use the unix curl command and assume you have registered with the CRA and have set three environment variables: URL (the API base URL), PASSWORD (your account password), and KEY (your application key).

For example, using the bash shell:

$ # Remember to replace "MyPassWord" and "insects-..." with your information
$ export URL=https://cipresrest.sdsc.edu/cipresrest/v1
$ export PASSWORD=MyPassWord
$ export KEY=insects-095D20923FAE439982B6D5EBD2E339C9

curl is of course just one of many ways to interact with a REST API. There are numerous java, php, perl, python, etc., libraries that make it easy to use REST services.
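As one illustration, here is a minimal Python sketch (standard library only) of the same authenticated request pattern the curl examples use. The credential values are the placeholder examples from above, and build_request is just an illustrative helper, not part of any CIPRES library:

```python
# Build (but do not send) an authenticated CRA request with the Python
# standard library. PASSWORD and KEY below are the placeholder values
# from the bash example above; substitute your own.
import base64
import urllib.request

URL = "https://cipresrest.sdsc.edu/cipresrest/v1"
PASSWORD = "MyPassWord"
KEY = "insects-095D20923FAE439982B6D5EBD2E339C9"

def build_request(path, username="tom"):
    """Return a GET request carrying HTTP basic auth and the app key."""
    token = base64.b64encode(f"{username}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        URL + path,
        headers={
            "Authorization": "Basic " + token,   # equivalent of curl -u
            "cipres-appkey": KEY,                # equivalent of curl -H
        },
    )

req = build_request("/job/tom")
# urllib.request.urlopen(req) would perform the actual call.
```

A library such as Requests would shorten this further, but the two essentials are the same everywhere: basic-auth credentials and the cipres-appkey header on every request.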

Register

To get started, sign in or register for a CIPRES REST account. Once you've signed in, you can visit "My Profile" to change your account information and password. To register an application, use the Application Management console, found under the "Developer" drop down menu.

TIP: CIPRES REST API accounts are separate from those used by the CSG website, so you'll still need to register to use the CRA even if you are already a CSG user.

When you register an application, you must choose between DIRECT and UMBRELLA authentication models.

DIRECT is the more common choice, and the choice you want if you wish to use the API from your application immediately. DIRECT authentication means that the username and password of the person running the application will be sent in HTTP basic authentication headers, and jobs will be submitted on behalf of the authenticated user only. If people other than you will be running your application, they will need to register for their own CRA accounts and provide their credentials to your application, so that your code can submit jobs for them.

UMBRELLA is a special case used by web applications that submit jobs on behalf of multiple registered users. Web applications that use UMBRELLA authentication also authenticate with a username and password, that of the person who registered the application. The UMBRELLA application provides the identity of the user that submitted a given job using custom request headers. As a result, users registered with an UMBRELLA application need not register with the CRA. Because UMBRELLA authentication involves a trust relationship (i.e. we are trusting you to accurately identify the individual who submits each request), we will need to talk to you before activating your UMBRELLA application to ensure all of our requirements are met. If you are interested in registering an UMBRELLA application, please contact us.

The examples shown in this guide are for DIRECT applications, but with minor changes, they will also work for UMBRELLA applications, as shown in the UMBRELLA Authentication section.

Authenticate

The API requires you to send a username and password in HTTP Basic Authentication headers with each request. The use of SSL ensures that the information is transmitted securely.

In addition to sending a username and password, you must send your application ID in a custom request header named cipres-appkey.

List Jobs

Let's get started using the API. Suppose your username is tom, you've registered a DIRECT application named insects, and set URL, PASSWORD and KEY environment variables as shown in the Introduction. Here's how you would get a list of the jobs you've submitted:


$ curl -u tom:$PASSWORD \
-H cipres-appkey:$KEY \
$URL/job/tom

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<joblist>
    <title>Submitted Jobs</title>
    <jobs>
        <jobstatus>
            <selfUri>
                <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</url>
                <rel>jobstatus</rel>
                <title>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</title>
            </selfUri>
        </jobstatus>
        <jobstatus>
            <selfUri>
                <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4</url>
                <rel>jobstatus</rel>
                <title>NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4</title>
            </selfUri>
        </jobstatus>
    </jobs>
</joblist>

To get more information about a specific job in the list, use its jobstatus.selfUri.url. For example, to retrieve the full jobstatus of the first job in the list above:

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<jobstatus>
	. . .
</jobstatus>

Alternatively, when you ask for the list of jobs, use the expand=true query parameter to request full jobstatus objects.

If you have a CIPRES REST account and have registered a DIRECT application, try getting your list of submitted jobs now. Since you haven't submitted any jobs yet, the list will be empty and will look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<joblist>
    <title>Submitted Jobs</title>
    <jobs/>
</joblist>

TIP: Throughout the API, XML elements named selfUri link to the full version of the containing object. All Uri elements, including selfUri, contain a url giving the actual URL, a rel describing the type of data the url returns, and a title. It's good practice to navigate through the API by following the Uris the API returns instead of constructing urls to specific objects yourself.
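Following that tip, a client's first step is usually to pull the selfUri.url out of each jobstatus. A Python sketch with the standard library, using an abbreviated joblist document; job_urls is an illustrative helper:

```python
# Extract every job's selfUri.url from a joblist document. The sample
# XML below is abbreviated from the listing shown earlier in this guide.
import xml.etree.ElementTree as ET

joblist_xml = """<joblist>
    <title>Submitted Jobs</title>
    <jobs>
        <jobstatus>
            <selfUri>
                <url>https://cipresrest.sdsc.edu/cipresrest/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</url>
                <rel>jobstatus</rel>
                <title>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</title>
            </selfUri>
        </jobstatus>
    </jobs>
</joblist>"""

def job_urls(xml_text):
    """Return the selfUri.url of every jobstatus in a joblist document."""
    root = ET.fromstring(xml_text)
    return [js.findtext("selfUri/url") for js in root.iter("jobstatus")]

urls = job_urls(joblist_xml)
# Each url can then be fetched directly to get that job's full jobstatus.
```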

Submit Jobs

Now that we know how to list jobs, let's consider job submission. You can submit a job by issuing a POST request to $URL/job/username with multipart/form-data. Remember to replace username with your username, or the username of the person running your application. Most tools can be run minimally using only two fields: a tool identifier and a file to be processed.

Below is an example of a simple job submission:

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom \
    -F tool=CLUSTALW \
    -F input.infile_=@./sample1_in.fasta

In this example, the fields used are:

tool=CLUSTALW
The tool field identifies the tool to be used, in this case, CLUSTALW. Job submissions must always include a tool. You can find a list of phylogenetic programs and their respective tool IDs by using the Tool API.
 
input.infile_=@./sample1_in.fasta
The input.infile_ field is also mandatory; it identifies the main data file to be operated on. input.infile_ is usually a set of sequences to align or a character matrix. In this example, we're sending the contents of the file named sample1_in.fasta. The '@' tells curl to send sample1_in.fasta as an attached file.

A submission like this, with just a tool ID and input file, will succeed for most tools, and will run the job with whatever defaults CIPRES has selected. You can try a CLUSTALW job this way if you like, using a sample input file. Of course, many job submissions will require setting command line options to non-default values, and (often) submitting auxiliary files that specify starting trees, constraints, etc.

There are four types of fields for configuring a job submission: the tool field, input. fields, vparam. fields, and metadata. fields.

The set of available input. and vparam. fields varies with the tool selected; they are defined in an XML document (the PISE document) for each tool. These documents can be retrieved from the list at Tools: How to Configure Specific Tools or through the Tool API. However, depending on how you intend to use the REST API, you may never need to look at a PISE document. The easiest way to learn how to configure CIPRES jobs is to use the interactive Tool Configuration Helper. If you're interested in knowing more about the PISE tool descriptions, consult Tool Specific Parameters in the appendix.

The metadata. fields are the same, regardless of the tool selected, and are described in the next section.

Note: Before you submit a job, you may want to check whether the request is composed correctly. How to Make a Test Run explains how to submit a job for validation before submitting it to run.

Optional Metadata

A job submission may include the following optional metadata fields:

metadata.clientJobId
Your application's unique ID for the job. We highly recommend that you use this field. You may encounter situations where it isn't clear whether a submission reached the CRA; in that case, the best course is to request a list of your jobs and check whether it includes one with the clientJobId you just tried to submit.
 
metadata.clientJobName
A name that the user of your application will recognize the job by.
 
metadata.clientToolName
A name that the user will recognize the tool by.
 
metadata.statusEmail
If "true", email will be sent on job completion. (Delivery, of course, depends upon a valid email address and functioning delivery infrastructure.)
 
metadata.emailAddress
Use this along with statusEmail to override the default email destination. By default, job completion emails are sent to the user's registered email address. (Or in the case of UMBRELLA applications, to the address in the cipres-eu-email header of the job submission request). Use this property to direct the email somewhere else.
 
metadata.statusUrlPut
Use this field to specify a URL in your web application where CIPRES will PUT a notification when the job is finished. CIPRES can't guarantee that your application will receive the PUT request so you may still need to poll occasionally. Not implemented yet.
 
metadata.statusUrlGet
Use this field to specify a URL in your web application that CIPRES will GET, with a jh=jobhandle query parameter, when the job is finished. CIPRES can't guarantee that your application will receive the request so you may still need to poll occasionally. Not implemented yet.

All metadata fields are limited to 100 characters, and all are optional. Metadata will be returned with the rest of the information about the job when you later ask for the job's status.

In the following example, Tom uses some of the metadata fields described above to supply a job ID generated by his application and to request email notification of job completion.

$ curl -u tom:$PASSWORD \
	-H cipres-appkey:$KEY \
	$URL/job/tom \
	-F tool=CLUSTALW \
	-F input.infile_=@./sample1_in.fasta \
	-F metadata.clientJobId=101 \
    -F metadata.statusEmail=true

As noted above, many runs will be more complicated than this because of the need to configure the precise command line. We suggest that you continue through this guide to learn how to check job status, download results, and handle errors, and then use the Tool Configuration Helper and/or read Tool Specific Parameters in the Appendix to create more customized runs.
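To show how the submission fields fit together programmatically, here is a small Python sketch that assembles the same fields the curl -F flags carry. The field names (tool, input.infile_, metadata.*) and the 100-character metadata limit come from this guide; build_submission itself is a hypothetical helper:

```python
# Hypothetical helper that assembles job-submission form fields,
# mirroring the curl -F flags in this guide.
def build_submission(tool, infile_name, metadata=None):
    """Return (fields, files) for a multipart POST to $URL/job/username."""
    fields = {"tool": tool}                  # tool is always required
    files = {"input.infile_": infile_name}   # sent as an attached file
    for key, value in (metadata or {}).items():
        value = str(value)
        if len(value) > 100:  # metadata fields are limited to 100 chars
            raise ValueError(f"metadata.{key} exceeds 100 characters")
        fields[f"metadata.{key}"] = value
    return fields, files

fields, files = build_submission(
    "CLUSTALW",
    "sample1_in.fasta",
    metadata={"clientJobId": "101", "statusEmail": "true"},
)
# fields and files would then be POSTed as multipart/form-data, e.g.
# with the Requests package's data= and files= arguments.
```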

Job Status

Successful job submission returns a jobstatus object that looks like this:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<jobstatus>
    <selfUri>
        <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</url>
        <rel>jobstatus</rel>
        <title>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</title>
    </selfUri>
    <jobHandle>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</jobHandle>
    <jobStage>QUEUE</jobStage>
    <terminalStage>false</terminalStage>
    <failed>false</failed>
    <metadata>
        <entry>
            <key>clientJobId</key>
            <value>101</value>
        </entry>
    </metadata>
    <dateSubmitted>2014-09-10T15:54:58-07:00</dateSubmitted>
    <resultsUri>
        <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output</url>
        <rel>results</rel>
        <title>Job Results</title>
    </resultsUri>
    <workingDirUri>
        <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/workingdir</url>
        <rel>workingdir</rel>
        <title>Job Working Directory</title>
    </workingDirUri>
    <messages>
        <message>
            <timestamp>2014-09-10T15:54:59-07:00</timestamp>
            <stage>QUEUE</stage>
            <text>Added to cipres run queue.</text>
        </message>
    </messages>
    <minPollIntervalSeconds>60</minPollIntervalSeconds>
</jobstatus>

Elements of particular interest are:

jobHandle
A unique, CIPRES-assigned job identifier. It has the format: NGBW-JOB-toolID-unique_identifier
 
jobStage
Unfortunately, the current version of CIPRES sets jobstatus.jobStage in a way that's somewhat inconsistent and difficult to explain. You're better off using jobstatus.messages to monitor the progress of a job.
 
messages
CIPRES adds a message at each major processing point, as well as when problems are encountered. Each message has a timestamp, processing stage, and textual description. A job progresses through the following stages:
  • QUEUE - The job has been validated and placed in CIPRES's queue.
  • COMMANDRENDERING - The job has reached the head of the queue and CIPRES has created the command line that will be run.
  • INPUTSTAGING - CIPRES has created a temporary working directory for the job on the execution host and copied the input files over.
  • SUBMITTED - The job has been submitted to the scheduler on the execution host.
  • LOAD_RESULTS - The job has finished running on the execution host and CIPRES has begun to transfer the results.
  • COMPLETED - Results successfully transferred and available.
terminalStage
If true, CIPRES has finished processing the job. If false, there is more to do.
 
failed
This will be set to true only when the job is finished (i.e. terminalStage=true) and the job has failed. CIPRES has a narrow definition of failure that does not take the tool's output or exit code into consideration. A job will only have failed=true if a network or system error prevents the tool from being run or prevents CIPRES from being able to obtain the job's results.
 
minPollIntervalSeconds
If you poll for the status of this job, this is the minimum polling interval (in seconds) that you may use.
 

The jobstatus also includes several urls:

selfUri
Use this to poll for updated job status.
 
workingDirUri
Use this to monitor the files in the job's working directory, while the job is running.
 
resultsUri
Use this to get the list of result files, once the job has finished.
 

Is the Job Finished?

The job is finished when jobstatus.terminalStage=true. Use jobstatus.selfUri.url to check the status of the job, like this:

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90

Alternatively, you can check the status of multiple jobs with a single GET of the $URL/job endpoint by using multiple instances of the jh=jobhandle query parameter. In this case the URL does not include the username (so that UMBRELLA applications can check on jobs for all their end users with a single query).

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/?jh=NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90\&jh=NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4

We ask users to keep polling frequency as low as possible to avoid overloading CIPRES. As a rule, jobstatus.minPollIntervalSeconds specifies the shortest polling interval that you may use, but we encourage you to poll much less frequently when possible. For example, if you aren't returning intermediate results to your users and you submit a job with a maximum run time of more than an hour, please consider increasing the polling interval to 15 minutes. As an alternative to frequent polling, consider using metadata.statusEmail=true in your job submission so that CIPRES will email you when the job is finished. Showing courtesy here will allow us to avoid having to enforce hard limits.

If you poll for the status of multiple jobs in a single call, please use the jobstatus.minPollIntervalSeconds of the most recently submitted job as your minimum polling interval.
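A polling loop starts by pulling a few fields out of the jobstatus document: the handle, terminalStage, failed, and the minimum polling interval. A Python sketch with an abbreviated sample document; parse_status is an illustrative helper:

```python
# Parse the polling-relevant fields from a jobstatus document. The
# sample below is abbreviated from the jobstatus listing in this guide.
import xml.etree.ElementTree as ET

def parse_status(xml_text):
    """Return the handle, completion flags, and minimum poll interval."""
    root = ET.fromstring(xml_text)
    return {
        "handle": root.findtext("jobHandle"),
        "terminal": root.findtext("terminalStage") == "true",
        "failed": root.findtext("failed") == "true",
        "min_poll": int(root.findtext("minPollIntervalSeconds") or "60"),
    }

sample = """<jobstatus>
    <jobHandle>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</jobHandle>
    <terminalStage>false</terminalStage>
    <failed>false</failed>
    <minPollIntervalSeconds>60</minPollIntervalSeconds>
</jobstatus>"""

status = parse_status(sample)
# A real loop would sleep at least status["min_poll"] seconds between
# authenticated GETs of jobstatus.selfUri.url until status["terminal"].
```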

List Results

Once jobstatus.terminalStage=true, you can list and then retrieve the final results. Issue a GET request to the URL specified by jobstatus.resultsUri.url, like this:

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<results>
    <jobfiles>
        <jobfile>
            <downloadUri>
                <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1544</url>
                <rel>fileDownload</rel>
                <title>STDOUT</title>
            </downloadUri>
            <jobHandle>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</jobHandle>
            <filename>STDOUT</filename>
            <length>1243</length>
            <parameterName>PROCESS_OUTPUT</parameterName>
            <outputDocumentId>1544</outputDocumentId>
        </jobfile>
        <jobfile>
            <downloadUri>
                <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1545</url>
                <rel>fileDownload</rel>
                <title>STDERR</title>
            </downloadUri>
            <jobHandle>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</jobHandle>
            <filename>STDERR</filename>
            <length>0</length>
            <parameterName>PROCESS_OUTPUT</parameterName>
            <outputDocumentId>1545</outputDocumentId>
        </jobfile>
        <jobfile>
            <downloadUri>
                <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1550</url>
                <rel>fileDownload</rel>
                <title>infile.aln</title>
            </downloadUri>
            <jobHandle>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</jobHandle>
            <filename>infile.aln</filename>
            <length>1449</length>
            <parameterName>aligfile</parameterName>
            <outputDocumentId>1550</outputDocumentId>
        </jobfile>
        <jobfile>
            <downloadUri>
                <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1551</url>
                <rel>fileDownload</rel>
                <title>term.txt</title>
            </downloadUri>
            <jobHandle>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</jobHandle>
            <filename>term.txt</filename>
            <length>338</length>
            <parameterName>all_outputfiles</parameterName>
            <outputDocumentId>1551</outputDocumentId>
        </jobfile>
        <jobfile>
            <downloadUri>
                <url>$URL/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1552</url>
                <rel>fileDownload</rel>
                <title>batch_command.cmdline</title>
            </downloadUri>
            <jobHandle>NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90</jobHandle>
            <filename>batch_command.cmdline</filename>
            <length>48</length>
            <parameterName>all_outputfiles</parameterName>
            <outputDocumentId>1552</outputDocumentId>
        </jobfile>
        <jobfile>

...
</jobfiles>
</results>

Download Results

Use the jobfile.downloadUri.url links to download individual result files, like this:

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    -O -J \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1544

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1243    0  1243    0     0    178      0 --:--:--  0:00:06 --:--:--   313
curl: Saved to filename 'STDOUT'
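Programmatically, the download step usually begins by walking the results list for (filename, url) pairs. A Python sketch, with the sample results document abbreviated and result_files as an illustrative helper:

```python
# Walk a results document and collect each jobfile's filename and
# download url. The sample XML is a shortened version of the listing
# shown above.
import xml.etree.ElementTree as ET

results_xml = """<results>
    <jobfiles>
        <jobfile>
            <downloadUri>
                <url>https://cipresrest.sdsc.edu/cipresrest/v1/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/output/1544</url>
                <rel>fileDownload</rel>
                <title>STDOUT</title>
            </downloadUri>
            <filename>STDOUT</filename>
            <length>1243</length>
        </jobfile>
    </jobfiles>
</results>"""

def result_files(xml_text):
    """Return (filename, download url) for every jobfile in a results doc."""
    root = ET.fromstring(xml_text)
    return [
        (jf.findtext("filename"), jf.findtext("downloadUri/url"))
        for jf in root.iter("jobfile")
    ]

# Each url would then be fetched with the same basic-auth credentials
# and cipres-appkey header the curl -O -J example uses.
```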

List Working Directory

If you are interested in monitoring the progress of a job while it is running, you can use jobstatus.workingDirUri.url to retrieve the list of files in the job's working directory. A job has a working directory only after it has been staged to the execution host and is waiting to run, is running, or is waiting to be cleaned up; at other times, this URL returns an empty list. Furthermore, if you happen to use this URL while CIPRES is removing the working directory, you may receive a transient error, so be prepared to retry the operation.

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-3957CC6EBF5E448095A5666B41EDDF90/workingdir

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workingdir>
    <jobfiles/>
</workingdir>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workingdir>
    <jobfiles>
        <jobfile>
            <downloadUri>
                <url>$URL/job/tom/NGBW-JOB-CLUSTALW-0171A3F1BFA0477CAF35B79CE075DF9C/workingdir/scheduler.conf</url>
                <rel>fileDownload</rel>
                <title>scheduler.conf</title>
            </downloadUri>
            <filename>scheduler.conf</filename>
            <length>11</length>
            <dateModified>2014-09-20T16:18:05-07:00</dateModified>
            <parameterName></parameterName>
            <outputDocumentId>0</outputDocumentId>
        </jobfile>
        <jobfile>
            <downloadUri>
                <url>$URL/job/tom/NGBW-JOB-CLUSTALW-0171A3F1BFA0477CAF35B79CE075DF9C/workingdir/infile.dnd</url>
                <rel>fileDownload</rel>
                <title>infile.dnd</title>
            </downloadUri>
            <filename>infile.dnd</filename>
            <length>137</length>
            <dateModified>2014-09-20T16:18:13-07:00</dateModified>
            <parameterName></parameterName>
            <outputDocumentId>0</outputDocumentId>
        </jobfile>
		. . .
    </jobfiles>
</workingdir>

Download Working Dir Files

To retrieve a file from the working directory list, use its jobfile.downloadUri.url. Be prepared to handle transient errors, as well as a permanent 404 NOT FOUND error, once the working directory has been removed.

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    -O -J \
    $URL/job/tom/NGBW-JOB-CLUSTALW-0171A3F1BFA0477CAF35B79CE075DF9C/workingdir/infile.dnd

curl: Saved to filename 'infile.dnd'

Delete and Cancel a Job

Once a job has finished and you've downloaded the results, it's a good idea to delete the job. You may also want to delete a job that hasn't finished yet if you, or the user of your application, realize you made a mistake and don't want to waste the compute time.

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    -X DELETE \
    $URL/job/tom/NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4

There is no data returned from a successful DELETE.

If the job is scheduled to run or is running when you delete it, it will be cancelled. Either way, all information associated with the job will be removed. You can verify that the job has been deleted by doing a GET of its jobstatus url: HTTP status 404 (NOT FOUND) will be returned along with an error object. We demonstrate this below using curl's -i option, which tells curl to include the http header in its output.

$ curl -i -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom/NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4

HTTP/1.1 404 Not Found
Server: Apache-Coyote/1.1
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Thu, 11 Sep 2014 21:43:54 GMT

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<error>
    <displayMessage>Job not found.</displayMessage>
    <message>Job Not Found Error: org.ngbw.sdk.jobs.JobNotFoundException: NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4</message>
    <code>4</code>
</error>

To cancel a running job without deleting it, you can send a PUT request with the action parameter set to 'cancel.'

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    -X PUT \
    -F action=cancel \
    $URL/job/tom/modify/NGBW-JOB-CLUSTALW-CC460782E5FF464CB96791B1E6053AA4

Handle Errors

HTTP status codes are used to indicate whether an API request succeeded or failed. When the status indicates failure (any status other than 200), an error object is returned. A basic error object looks like this:

<error>
    <displayMessage>Job Not Found</displayMessage>
    <message>Job Not Found Error: org.ngbw.sdk.jobs.JobNotFoundException: NGBW-JOB-CLUSTALW-261679BE83E245AD8EEECB4592A52B81
    </message>
    <code>4</code>
</error>

The displayMessage is a user-friendly description of the error. The contents of the message are not meant for end users, but may be helpful in debugging. The code indicates the type of error; for example, code 4 is "not found", as shown in the source code for ErrorData.java.

A job validation error may contain a list of field errors. For example:

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom \
    -F tool=CLUSTALW \
    -F metadata.clientJobId=110 \
    -F input.infile_=@./sample1_in.fasta \
    -F vparam.runtime_="one hour" \
    -F vparam.foo_=bar

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<error>
    <displayMessage>Form validation error.</displayMessage>
    <message>Validation Error: </message>
    <code>5</code>
    <paramError>
        <param>runtime_</param>
        <error>Must be a Double.</error>
    </paramError>
    <paramError>
        <param>foo_</param>
        <error>Does not exist.</error>
    </paramError>
</error>
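A client will typically turn these error documents into a structured form before deciding what to do. A Python sketch, where parse_error is an illustrative helper and the sample document repeats the validation error above:

```python
# Parse a CRA error document, including any per-field paramError
# entries from a validation failure. The sample is taken from the
# validation error shown above.
import xml.etree.ElementTree as ET

error_xml = """<error>
    <displayMessage>Form validation error.</displayMessage>
    <message>Validation Error: </message>
    <code>5</code>
    <paramError>
        <param>runtime_</param>
        <error>Must be a Double.</error>
    </paramError>
    <paramError>
        <param>foo_</param>
        <error>Does not exist.</error>
    </paramError>
</error>"""

def parse_error(xml_text):
    """Return the displayMessage, numeric code, and per-field errors."""
    root = ET.fromstring(xml_text)
    return {
        "display": root.findtext("displayMessage"),
        "code": int(root.findtext("code")),
        "params": {pe.findtext("param"): pe.findtext("error")
                   for pe in root.findall("paramError")},
    }

err = parse_error(error_xml)
# err["params"] maps each rejected field to its validation message, so
# an application can report exactly which -F fields need fixing.
```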

Data Types

The XML documents or data structures returned by the API, such as jobstatus, results, jobfile, error, etc., are not fully documented yet; however, the basic schema is available. You can also view the java source code for these data structures. CIPRES maps the java classes to XML using JAXB. If you happen to be implementing in java, you may want to use the java source code, linked to above, with JAXB to unmarshal the XML documents that the CRA returns.

We may find it necessary to add elements to the schema over time, but your application should continue to work provided it ignores any elements it doesn't recognize.

Tool API

The tool API provides information about the phylogenetic tools that can be run on CIPRES. It's public: no credentials and no special headers are required, so it's easy to use a browser or curl to explore it. You can use the Tool API to learn the IDs of the tools you're interested in running and to download their PISE XML descriptions.

Definition: Strictly speaking, a CIPRES tool is an interface for configuring command line job submissions. It is defined by a PISE XML document found in the Tool API. Each tool deploys jobs for a single phylogenetic program (e.g. CLUSTALW, MrBayes, RAxML, etc.). However, more than one tool may invoke the same program. For example, the RAxML program is run by two tools: one that provides a simple "blackbox" interface (RAXMLHPC2BB), and one that exposes nearly all RAxML options (RAXMLHPC2_TGB).

Go to $URL/tool in the browser, or use curl, as shown below, to see a list of the available tools:

$ curl $URL/tool

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tools>
<tool>
	<toolId>MRBAYES_321RESTARTBETA</toolId>
	<toolName>Tree Inference Using Bayesian Analysis - run on XSEDE</toolName>
	<selfUri>
		<url>$URL/tool/MRBAYES_321RESTARTBETA</url>
		<rel>tool</rel>
		<title>MRBAYES_321RESTARTBETA</title>
	</selfUri>
	<piseUri>
		<url>$URL/tool/MRBAYES_321RESTARTBETA/doc/pise</url>
		<rel>Pise XML</rel>
		<title>MRBAYES_321RESTARTBETA pise</title>
	</piseUri>
	<portal2Uri>
		<url>$URL/tool/MRBAYES_321RESTARTBETA/doc/portal2</url>
		<rel>Html Web Page</rel>
		<title>MRBAYES_321RESTARTBETA type</title>
	</portal2Uri>
	<exampleUri>
		<url>$URL/tool/MRBAYES_321RESTARTBETA/doc/example</url>
		<rel>Html Web Page</rel>
		<title>MRBAYES_321RESTARTBETA type</title>
	</exampleUri>
	<parameterUri>
		<url>$URL/tool/MRBAYES_321RESTARTBETA/doc/param</url>
		<rel>Html Web Page</rel>
		<title>MRBAYES_321RESTARTBETA type</title>
	</parameterUri>
</tool>
<tool>
	<toolId>PROBCONS</toolId>
	<toolName>Probabilistic Consistency-based Multiple Alignment of Amino/Nucleic Acid Sequences</toolName>
	<selfUri>
		<url>$URL/tool/PROBCONS</url>
		<rel>tool</rel>
		<title>PROBCONS</title>
	</selfUri>
	. . .
</tools>

Each tool description includes the toolId, toolName, and a number of "Uri" elements, which are links to various documents for the specific tool.

As we mentioned earlier, it's good practice to navigate through the API using these returned links rather than hardcoding the urls. For example, all the urls listed in the summary below can be extracted from the data returned by the top level resource at $URL/tool.

Summary

GET $URL/tool Use this to get a list of the available tools.
GET $URL/tool/toolId Use this to get the URLs that link to the tool's documents (i.e. the documents listed below).
GET $URL/tool/toolId/doc/pise Use this URL to download the tool's PISE XML file.
GET $URL/tool/toolId/doc/portal2 Use this URL, in a browser, to read a detailed description of the tool. This URL returns http status 303 and a Location header that redirects to the html page on the CIPRES Science Gateway that gives a detailed, human readable, description of the tool.
GET $URL/tool/toolId/doc/example Not implemented yet. Will give examples showing how to submit jobs to use this tool.
GET $URL/tool/toolId/doc/param Not implemented yet. Will give a human readable description of each of the tool's parameters.
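As an illustration, the tool list is easy to mine for toolIds and their PISE links. A Python sketch with an abbreviated sample document that follows the shape of the listing above; pise_urls is a hypothetical helper and the sample toolName is made up:

```python
# Map each toolId in a tools document to the url of its PISE XML
# description. The sample document is abbreviated; its toolName text
# is illustrative, not the live value.
import xml.etree.ElementTree as ET

tools_xml = """<tools>
    <tool>
        <toolId>CLUSTALW</toolId>
        <toolName>Multiple Sequence Alignment</toolName>
        <piseUri>
            <url>https://cipresrest.sdsc.edu/cipresrest/v1/tool/CLUSTALW/doc/pise</url>
            <rel>Pise XML</rel>
            <title>CLUSTALW pise</title>
        </piseUri>
    </tool>
</tools>"""

def pise_urls(xml_text):
    """Return a {toolId: piseUri.url} mapping from a tools document."""
    root = ET.fromstring(xml_text)
    return {t.findtext("toolId"): t.findtext("piseUri/url")
            for t in root.iter("tool")}

# Since the Tool API is public, the real document could be fetched with
# a plain unauthenticated GET of $URL/tool and fed to pise_urls.
```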

Sample Code

Perl Example (View)

This directory contains a perl script that uses libwww-perl to access the CRA. It repeatedly prompts the user to retrieve a list of supported tools, submit a job, show the user's jobs, show a job's results, or download a job's results.

To download the perl sample code run

$ svn export https://svn.sdsc.edu/repo/scigap/tags/rest-R10.224/rest_client_examples/examples/perl_demo
Java Client Library and Example (View)

This directory contains source code for a client library for communicating with the CRA, plus a sample program that uses the library. When you build with maven, as explained in the cipres_java_client directory's Readme.txt, javadoc documentation and a jar named cipres_java_client.jar are created. The best way to learn to use the library is to study the example included with the source code, org.ngbw.directclient.example.Example.java.

You must get the source for restdatatypes and build the restdatatypes.jar before building cipres_java_client:

$ svn export https://svn.sdsc.edu/repo/scigap/tags/rest-R10.224/rest/datatypes
$ svn export https://svn.sdsc.edu/repo/scigap/tags/rest-R10.224/rest/cipres_java_client 
$ cd datatypes; mvn clean install
$ cd ../cipres_java_client; mvn clean install 
Python Client Library and Example (View)

This example shows how to communicate with the CRA from Python, using the Requests package. The code was originally created as an example but has been developed into a full-fledged client library. You can install the library either by using svn to export the code from the link above or by running "pip install python_cipres".

The package includes two command line scripts, tooltest.py and cipresjob.py, that serve as good examples of how to import and use python_cipres. The main API is in client.py. To use the package, import client.py with a statement like "import python_cipres.client as CipresClient".

Usage Limits

CIPRES has the following per user limits:

CONCURRENT_LIMIT
The number of concurrent REST API requests.
XSEDE_SU_LIMIT
The number of XSEDE SUs per year (1 SU = 1 hour of computing time on one XSEDE CPU), reset on July 1st.
OTHER_SU_LIMIT
The number of SUs per year on non-XSEDE resources, such as UCSD's TSCC cluster, reset on July 1st.
SUBMITTED_TODAY_LIMIT
The number of jobs a user may submit in a single day.
ACTIVE_LIMIT
The number of active jobs allowed, where an active job is any job that isn't fully completed (i.e. jobstatus.terminalStage is still false). This includes jobs that are queued, running, or awaiting cleanup.

When a request is rejected due to a usage limit, the http status will be 429 (Too Many Requests). The error.code will be 103, which is the CIPRES generic "UsageLimit" error code. The error will contain a nested limitStatus element which has type and ceiling fields.

<error>
    <displayMessage>Too many active jobs.  Limit is 1</displayMessage>
    <message>org.ngbw.sdk.UsageLimitException: Too many active jobs.  Limit is 1</message>
    <code>103</code>
    <limitStatus>
        <type>active_limit</type>
        <ceiling>1</ceiling>
    </limitStatus>
</error>
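
When submitting or polling in a loop, it is useful to detect this condition programmatically. Below is a minimal Python sketch, using only the standard library, that pulls the relevant fields out of the error body shown above (the function name parse_usage_limit is invented for illustration):

```python
import xml.etree.ElementTree as ET

# The usage-limit error body shown above, reproduced for illustration.
error_xml = """<error>
    <displayMessage>Too many active jobs.  Limit is 1</displayMessage>
    <message>org.ngbw.sdk.UsageLimitException: Too many active jobs.  Limit is 1</message>
    <code>103</code>
    <limitStatus>
        <type>active_limit</type>
        <ceiling>1</ceiling>
    </limitStatus>
</error>"""

def parse_usage_limit(body):
    """Return (code, limit type, ceiling) from a CIPRES error document."""
    root = ET.fromstring(body)
    code = int(root.findtext("code"))
    limit = root.find("limitStatus")
    if code == 103 and limit is not None:  # 103 = generic "UsageLimit" error
        return code, limit.findtext("type"), int(limit.findtext("ceiling"))
    return code, None, None

print(parse_usage_limit(error_xml))  # (103, 'active_limit', 1)
```

A client that sees code 103 with type active_limit can back off and retry once some of its jobs reach terminalStage=true.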

Currently, the limits are: concurrent_limit=10, active_limit=50, other_su_limit=30,000 and xsede_su_limit=30,000. These limits can be modified for specific applications and users. If you have a problem with the default limits, please contact us to discuss your needs.

A future release of the REST API will

Umbrella Authentication

Applications that are registered to use Umbrella authentication can use the commands in this guide, with additional request headers that identify the end user. Behind the scenes, CIPRES creates an account for the end user with a qualified username of the form application_name.cipres-eu (the application name, a period, and the value of the cipres-eu header), and it is this qualified username that goes in the URLs.

Request Headers

All requests to the job API use request headers. Which headers are required depends on the type of authentication the application uses, as noted below. "ALWAYS" means the header is always required, regardless of the type of authentication being used. "UMBRELLA" means that the header is required by applications that are registered to use umbrella authentication.
Basic authentication credentials (ALWAYS)
DIRECT applications send the user's CIPRES REST username and password. UMBRELLA applications send the username and password of the person who registered the application. See Authentication.
cipres-appkey (ALWAYS)
Application ID generated by CIPRES when you registered the application. It can be changed later if necessary.
cipres-eu (UMBRELLA)
Uniquely identifies the user within your application. Up to 200 characters. Single quotes are not allowed within the name.
cipres-eu-email (UMBRELLA)
End user's email address. Up to 200 characters. You can't have 2 users with the same email address.
cipres-eu-institution (UMBRELLA)
End user's home institution. This isn't currently required but may be in the future.
cipres-eu-country (UMBRELLA)
Two-letter, upper-case ISO 3166 country code for the end user's institution. Due to the way CIPRES is funded, US researchers can be given higher SU limits. The header isn't currently required, but may be in the future. If not sent, the user is assumed to be from outside the US for usage limit cutoffs.
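
An UMBRELLA application might assemble these headers with a small helper. The sketch below is illustrative only (the helper name and placeholder values are not part of the API); the basic-auth credentials are passed separately, e.g. via the auth argument in Requests:

```python
def umbrella_headers(app_key, eu, email, institution=None, country=None):
    """Assemble the per-request headers for an UMBRELLA application.

    Basic-auth credentials (the registrant's username and password) are
    sent separately, e.g. via the `auth=` argument in Requests.
    """
    headers = {
        "cipres-appkey": app_key,
        "cipres-eu": eu,
        "cipres-eu-email": email,
    }
    if institution:
        headers["cipres-eu-institution"] = institution
    if country:
        # Two-letter, upper-case ISO 3166 code
        headers["cipres-eu-country"] = country
    return headers

# Placeholder values; a real application key comes from registration.
h = umbrella_headers("EXAMPLE-KEY", "harry", "harry@ucsddd.edu", "UCSD", "US")

# The username in job URLs is the application name, a period, and the
# cipres-eu value.
url_user = "phylobank." + h["cipres-eu"]
```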

List Jobs

For example, suppose your username is mary and you're integrating an existing web application with the CIPRES REST API. You've registered the application with the name phylobank and set the authentication method to UMBRELLA.

Now suppose a user named harry logs into your application, and your application needs to get a list of the jobs harry has submitted to CIPRES. First, you go to your database or user management component and retrieve harry's email address, institutional affiliation, and optional two-letter ISO 3166 country code. Now you're ready to issue this curl command (or the equivalent statement in the language you're using):

$ curl -i -u mary:password \
    -H cipres-appkey:$KEY \
    -H cipres-eu:harry \
    -H cipres-eu-email:harry@ucsddd.edu \
    -H cipres-eu-institution:UCSD \
    -H cipres-eu-country:US \
    $URL/job/phylobank.harry

Notice that although the value of the cipres-eu header is harry, in the URL, you must use phylobank.harry.

Submit a Job

Here you submit a basic clustalw job for harry and get back a jobstatus object.

$ curl -u mary:password \
    -H cipres-appkey:$KEY \
    -H cipres-eu:harry \
    -H cipres-eu-email:harry@ucsddd.edu \
    -H cipres-eu-institution:UCSD \
    -H cipres-eu-country:US \
    $URL/job/phylobank.harry \
    -F tool=CLUSTALW \
    -F input.infile_=@./sample1_in.fasta

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<jobstatus>
    <selfUri>
        <url>$URL/cipresrest/v1/job/phylobank.harry/NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A</url>
        <rel>jobstatus</rel>
        <title>NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A</title>
    </selfUri>
    <jobHandle>NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A</jobHandle>
    <jobStage>QUEUE</jobStage>
    <terminalStage>false</terminalStage>
    <failed>false</failed>
    <metadata>
        <entry>
            <key>clientJobId</key>
            <value>010007AQ</value>
        </entry>
    </metadata>
    <dateSubmitted>2014-09-12T12:36:31-07:00</dateSubmitted>
    <resultsUri>
        <url>$URL/cipresrest/v1/job/phylobank.harry/NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A/output</url>
        <rel>results</rel>
        <title>Job Results</title>
    </resultsUri>
    <workingDirUri>
        <url>$URL/cipresrest/v1/job/phylobank.harry/NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A/workingdir</url>
        <rel>workingdir</rel>
        <title>Job Working Directory</title>
    </workingDirUri>
    <messages>
        <message>
            <timestamp>2014-09-12T12:36:31-07:00</timestamp>
            <stage>QUEUE</stage>
            <text>Added to cipres run queue.</text>
        </message>
    </messages>
    <minPollIntervalSeconds>60</minPollIntervalSeconds>
</jobstatus>
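
A client typically needs only a handful of fields from the jobstatus document. Here is a Python sketch that extracts them with the standard library's ElementTree, using a trimmed copy of the response above (the helper name job_summary is made up for illustration):

```python
import xml.etree.ElementTree as ET

# A trimmed jobstatus document, modeled on the response shown above.
sample = """<jobstatus>
    <selfUri>
        <url>https://cipresrest.sdsc.edu/cipresrest/v1/job/phylobank.harry/NGBW-JOB-X</url>
        <rel>jobstatus</rel>
        <title>NGBW-JOB-X</title>
    </selfUri>
    <jobHandle>NGBW-JOB-X</jobHandle>
    <jobStage>QUEUE</jobStage>
    <terminalStage>false</terminalStage>
    <failed>false</failed>
    <minPollIntervalSeconds>60</minPollIntervalSeconds>
</jobstatus>"""

def job_summary(jobstatus_xml):
    """Extract the fields a polling client usually cares about."""
    root = ET.fromstring(jobstatus_xml)
    return {
        "handle": root.findtext("jobHandle"),
        "stage": root.findtext("jobStage"),
        "done": root.findtext("terminalStage") == "true",
        "failed": root.findtext("failed") == "true",
        "status_url": root.findtext("selfUri/url"),
        "poll_seconds": int(root.findtext("minPollIntervalSeconds")),
    }

summary = job_summary(sample)
```

A polling loop would sleep at least summary["poll_seconds"] between GETs of summary["status_url"] until summary["done"] becomes true.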

Check Job Status

You can check the status of a single job, using the jobstatus.selfUri.url that was returned when the job was submitted, like this:

$ curl -u mary:password \
    -H cipres-appkey:$KEY \
    -H cipres-eu:harry \
    -H cipres-eu-email:harry@ucsddd.edu \
    -H cipres-eu-institution:UCSD \
    -H cipres-eu-country:US \
    $URL/job/phylobank.harry/NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A
 

Or you can get the status of multiple jobs, submitted on behalf of multiple users, with a single GET of $URL/job. Indicate which jobs you're interested in with query parameters named jh (for "job handle"); use a separate jh parameter for each job. With this request, the cipres-appkey header is required, but the end user headers are not. For example:

$ curl -u mary:password \
    -H cipres-appkey:$KEY \
    $URL/job/?jh=NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A\&jh=NGBW-JOB-CLUSTALW-553D534D355C4631BBDCF217BB792A01

If you're using curl in a typical unix shell, you must place a backslash before the & that separates the query parameters to escape it from interpretation by the shell.
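
Rather than hand-escaping the &, you can let a library build the query string. A small Python sketch, using the job handles from the example above:

```python
from urllib.parse import urlencode

handles = [
    "NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A",
    "NGBW-JOB-CLUSTALW-553D534D355C4631BBDCF217BB792A01",
]

# Repeat the jh parameter once per job handle. Requests does the same
# thing when given params={"jh": handles}.
query = urlencode([("jh", h) for h in handles])
print(query)
# jh=NGBW-JOB-CLUSTALW-CB8D053F9033487E9B4F9BAF8A3AA47A&jh=NGBW-JOB-CLUSTALW-553D534D355C4631BBDCF217BB792A01
```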

Other Operations

The other things you may need to do are: 1) retrieve files from a job's working directory while it's running, 2) retrieve final results once a job has finished, and 3) cancel and/or delete a job. The DIRECT application examples in this guide are applicable to UMBRELLA applications too. Just remember to add the appropriate CIPRES end user headers and prefix the username in the URL with the application name and a period.

Appendix


Tool Specific Parameters

The code for creating command lines and configuring jobs in CIPRES evolved from the Pasteur Institute Software Environment (PISE). PISE is an XML-based standard for generating web forms that create Unix command lines. For complex phylogenetic programs, there is often significant interdependence among parameters: some options are relevant only if others are selected, some combinations give nonsensical results, and so forth. The PISE XML documents are rich, and embody all of the information required to create successful, meaningful command lines. Where possible, they also prevent creation of incorrect commands that would cause an immediate error. Thus, the PISE XML documents are the definitive reference for configuring CRA job submissions. The PISE XML documents for all the CRA tools are available through the Tools: How to Configure Specific Tools list or through the Tool API. (Please see the Tool API section for the precise definition of a CIPRES "tool".)

The relationship between the "input" and "vparam" fields that you send to the REST API and the PISE XML document is as follows:

Continuing with the job submission example used earlier in this guide, here's how Tom could submit a CLUSTALW job that uses a guidetree, produces phylip output and has a limited maximum run time:

$ curl -u tom:$PASSWORD \
    -H cipres-appkey:$KEY \
    $URL/job/tom \
    -F tool=CLUSTALW \
    -F metadata.clientJobId=102   \
    -F metadata.statusEmail=true \
    -F input.infile_=@./sample1_in.fasta \
    -F input.usetree_=@./guidetree.dnd \
    -F vparam.runtime_=1 \
    -F vparam.phylip_alig_=1
The fields that Tom added are:

-F input.usetree_=@./guidetree.dnd
input.usetree_ causes CIPRES to add a -usetree option to the CLUSTALW command line. This tells CLUSTALW to use the specified file as a guide tree for the alignment. usetree is the name of a parameter of type InFile in CLUSTALW's PISE XML document, clustalw.xml.
 
-F vparam.runtime_=1
Configures a maximum run time of 1 hour, using the runtime parameter found in clustalw.xml. By convention, runtime is found in every tool's PISE XML file. If not specified in the job submission, the maximum run time is set to the default value specified in the PISE XML file, typically 0.5 hours.
 
When you look at the PISE XML for some tools, you may not see a <parameter> named runtime, but will see an entity that includes a shared definition of the parameter, like this:
<ENTITY runtime SYSTEM "http://www.phylo.org/dev/rami/XMLDIR/triton_run_time.xml">
The exact contents of triton_run_time.xml are shown in Example 4, Shared Definition of Runtime Parameter.
 
-F vparam.phylip_alig_=1
Setting the phylip_alig parameter to 1 means that CLUSTALW should be run with the -output=PHYLIP command line option. This is defined in clustalw.xml, as explained in PISE Example 1 below.
 

Note: In general, only parameters that differ from the defaults specified in the tool's PISE file need to be specified in the job submission.

Note: With mutually exclusive parameters, you must only send the one that is enabled. In other words, don't send the disabled one with a value of zero, just send the enabled one.
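
In code, the notes above amount to building the form fields from only the parameters you actually want to change. Below is a hypothetical Python helper (the function and its argument names are invented for illustration; the field names match the curl example above):

```python
def build_submission(tool, infiles, vparams, metadata=None):
    """Assemble the form fields for a job submission.

    `infiles` maps PISE InFile parameter names (with trailing underscore)
    to local paths; `vparams` holds only the values that differ from the
    tool's PISE defaults. Disabled mutually exclusive parameters are
    simply omitted rather than sent with a value of zero.
    """
    data = {"tool": tool}
    for k, v in (metadata or {}).items():
        data["metadata." + k] = str(v)
    for k, v in vparams.items():
        data["vparam." + k] = str(v)
    files = {"input." + k: path for k, path in infiles.items()}
    return data, files

# Tom's CLUSTALW submission from the example above, as data structures.
data, files = build_submission(
    "CLUSTALW",
    infiles={"infile_": "./sample1_in.fasta", "usetree_": "./guidetree.dnd"},
    vparams={"runtime_": 1, "phylip_alig_": 1},
    metadata={"clientJobId": 102, "statusEmail": "true"},
)
```

With Requests, data would go in the data= argument and files (opened as file objects) in the files= argument of the POST.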

PISE XML

Each tool's PISE XML document contains a collection of parameter elements. Most parameter elements correspond to fields you can use to configure a job submission. Those marked with ishidden="1" and those of type "Results" or "OutFile" are the exceptions. It is easiest to explain the XML format through a set of examples:

Ex1, A Basic PISE Parameter

A parameter usually defines a single command line flag or input file. Here is an example from clustalw.xml.

<parameter issimple="1" type="Switch">
    <name>phylip_alig</name>
    <attributes>
        <prompt>Phylip alignment output format (-output)</prompt>
        <format>
            <language>perl</language>
            <code> ($value)?" -output=PHYLIP":""</code>
        </format>
        <vdef>
            <value>0</value>
        </vdef>
        <group>2</group> 
    </attributes>
</parameter>

Each parameter has a name (in this case, "phylip_alig") and a type. Here the type is "Switch", which means the allowed values are "0" and "1". To use a parameter from the PISE XML in a CRA job submission, prefix the parameter name with "vparam" (or with "input", if the type is InFile) and add a trailing underscore. So to use this parameter you'd send vparam.phylip_alig_="1" or vparam.phylip_alig_="0". If you send this parameter, regardless of whether you set it to 0 or 1, the Perl expression in the format element is evaluated. If you set it to 1, "-output=PHYLIP" is added to the command line; if you set it to 0, nothing is added because ($value) evaluates to false. Sending vparam.phylip_alig_=0 therefore has the same effect as omitting phylip_alig entirely.
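
To make the format semantics concrete, here is the same Perl snippet rendered in Python. This is a hypothetical translation for illustration, not code that CIPRES itself runs:

```python
def switch_format(value):
    """Python rendering of the Perl format code for the phylip_alig Switch:

        ($value)?" -output=PHYLIP":""

    A Switch's only legal values are the strings "0" and "1"; in Perl,
    "0" is falsy, so only "1" emits the flag.
    """
    return " -output=PHYLIP" if value == "1" else ""

# Either way, the command line ends up the same when the flag is off.
print(repr(switch_format("1")))  # ' -output=PHYLIP'
print(repr(switch_format("0")))  # ''
```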

CIPRES PISE XML documents supply default values using the vdef element, so it is typically only necessary to send fields where the default value is not correct for the run, even in cases where the parameter has an ismandatory attribute.

Note that the PISE XML documents also contain all the information needed to generate a web form. Many elements, such as prompt, label, comment, issimple etc. provide information necessary for web form generation, but are not relevant to the CRA.

PISE Parameter Types

PISE <parameter> elements will have one of the following types:

PISE Parameter Elements

Note: If you don't include a particular parameter in your job submission, and that parameter has a default value (i.e., a vdef element), and the default value doesn't conflict with the preconds of any parameters you sent, then the CRA automatically adds that parameter, set to its default value, to your submission. On the other hand, if the preconds do conflict, the parameter is not added. When a parameter isn't present in a job submission, it is skipped when the PISE XML file is processed, which means its ctrl and format code snippets aren't evaluated.

Ex2, Additional Input File

This is an example of a parameter that specifies an additional input file. To include it in your job submission you would use input.usetree_, as we did in the example job submission shown earlier. When CIPRES receives the input file contents, it disregards your original filename for the data (i.e., guidetree.dnd, in the job submission example) and stores the data in a file named "usetree.dnd" in the job's working directory. The filename CIPRES uses is specified by the parameter's filenames element. When the format code is executed, it adds "-usetree=usetree.dnd" to clustalw's command line.

<parameter type="InFile">
	<name>usetree</name>
	<attributes>
		<prompt>File for old guide tree (-usetree)</prompt>
		<format>
			<language>perl</language>
			<code> " -usetree=usetree.dnd"</code>
		</format>
		<group>2</group>
		<comment><value>You can give a previously computed tree (.dnd file) - on the same data</value></comment>
		<precond>
			<language>perl</language>
			<code>($actions =~ /align/ )</code>
		</precond>
		<filenames>usetree.dnd</filenames>
	</attributes>
</parameter>

Ex3, A Parameter That Builds a Configuration File

The following example is for GARLI v.2.0, taken from garli2_tgb.xml. It adds a setting to a configuration file, garli.conf, that garli will read. The paramfile element is what tells CIPRES to direct the output of the format element to a file named garli.conf instead of to the command line. Each time CIPRES processes a parameter with a paramfile element, it either creates the specified file in the job's working directory (if it doesn't already exist) or adds text to it.

This parameter defines a choice list named d_statefrequencies. The allowable values are given by the vlist.value elements and are equal, empirical, estimate, and fixed. The default value is estimate. The precond for this parameter dictates that it is only allowed when a second parameter, datatype_value, has the value nucleotide. The output of the perl code in the format element will be directed to garli.conf, thereby adding a "statefrequencies" setting to the file.

<parameter type="Excl"  ismandatory="1">
    <name>d_statefrequencies</name>
    <attributes>
        <paramfile>garli.conf</paramfile>
        <prompt>Base Frequencies (statefrequencies)</prompt>
        <precond>
            <language>perl</language>
            <code>$datatype_value eq "nucleotide"</code>
        </precond>
        <format>
            <language>perl</language>
            <code>"statefrequencies = $value\\n" </code>
        </format>
        <vlist>
            <value>equal</value>
            <label>equal</label>
            <value>empirical</value>
            <label>empirical</label>
            <value>estimate</value>
            <label>estimate</label>
            <value>fixed</value>
            <label>fixed</label>
        </vlist>
        <vdef>
            <value>estimate</value>
        </vdef>
        <paramfile>garli.conf</paramfile>
        <group>2</group>
    </attributes>
</parameter>
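
The paramfile mechanism described above can be sketched in Python. This is a hypothetical model of the behavior, not CIPRES's actual implementation: each parameter contributes a line to garli.conf only if it was submitted and its precond holds:

```python
def render_paramfile(params, values):
    """Model of how CIPRES directs format output to a paramfile.

    Each entry in `params` mimics one PISE <parameter>: a precond callable
    over the submitted values and a format callable producing one line of
    the config file. Parameters absent from the submission, or whose
    precond fails, are skipped.
    """
    lines = []
    for name, (precond, fmt) in params.items():
        if name not in values:
            continue  # parameter not submitted: format code never runs
        if not precond(values):
            continue  # precond failed: parameter skipped
        lines.append(fmt(values[name]))
    return "".join(lines)

# The d_statefrequencies parameter from the GARLI example above.
params = {
    "d_statefrequencies": (
        lambda v: v.get("datatype_value") == "nucleotide",  # precond
        lambda value: "statefrequencies = " + value + "\n",  # format
    ),
}

conf = render_paramfile(
    params,
    {"datatype_value": "nucleotide", "d_statefrequencies": "estimate"},
)
print(conf)  # statefrequencies = estimate
```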

    

Ex4, Shared Definition of Runtime Parameter

Many tools include a file named triton_run_time.xml that contains a definition of the runtime parameter. It looks like this:


<parameter type="Float" issimple="1" ismandatory="1">
	<name>runtime</name>
	<attributes>
		<group>1</group>
		<paramfile>scheduler.conf</paramfile>
		<prompt>Maximum Hours to Run (click here for help setting this correctly)</prompt>
		<vdef><value>1.0</value></vdef>
		<comment>
			<value>
				Estimate the maximum time your job will need to run (up to 72 hrs).  
				Your job will be killed if it doesn't finish within the time you specify, however jobs with shorter
				maximum run times are often scheduled sooner than longer jobs. 
			</value>
		</comment>
		<ctrls>
			<ctrl>
				<message>Maximum Hours to Run must be between 0.1 - 72.0.</message>
				<language>perl</language>
				<code>$runtime &lt; 0.1 || $runtime &gt; 72.0</code>
			</ctrl>
		</ctrls>
		<format>
			<language>perl</language>
			<code>"runhours=$value\\n"</code>
		</format>
	</attributes>
</parameter>

This defines a field named vparam.runtime_. The ctrl element says that you are allowed to set values between 0.1 and 72.0 hours. The default value is 1 hour. This definition is used by many tools that run on the TSCC supercomputer. Tools that run on XSEDE resources define runtime differently, usually allowing up to 168 hours, with a default of 0.5 hours.

This parameter works by writing a line that looks like "runhours=1" (for example) to a file named scheduler.conf. CIPRES uses the information in scheduler.conf to limit the runtime as specified.
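
The ctrl check is easy to mirror client-side so you can reject a bad runtime before submitting. Here is a hypothetical Python rendering of the Perl condition above (the function name is invented for illustration):

```python
def check_runtime(runtime):
    """Mirror the ctrl in triton_run_time.xml, which flags an error when

        $runtime < 0.1 || $runtime > 72.0

    Returns the ctrl's error message, or None if the value is acceptable.
    """
    if runtime < 0.1 or runtime > 72.0:
        return "Maximum Hours to Run must be between 0.1 - 72.0."
    return None
```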

Strategies for Using PISE XML Files

Some tools, notably BEAST and MrBayes, allow or require users to configure most options in the main input file. This can greatly simplify the development of REST submissions, since there is little to configure via the API. Others, such as GARLI and RAxML, have PISE XML files that contain a significant number of parameters, and require familiarity with the defaults and the potential interaction between parameters. There are several strategies that can be useful in learning to configure a job.

  1. Use the interactive Tool Configuration Helper, a web page that displays a list of tool IDs. When you select a tool from the list, a form generated from the tool's PISE XML appears. In the form, configure the tool as desired (explanatory text is available for most options). You'll be able to see the default values, the interdependency of the fields, and the logical organization of the form. When you press "View", you'll see the list of "input." and "vparam." fields that correspond to the way you've configured the job in the GUI.
  2. Read the CIPRES tool documentation: find the tool in Tools: How to Configure Specific Tools. The "REST Tool Info" links on that page explain how to use the tool specific parameters to configure different types of analysis. We are still writing these pages so you may find that there is documentation for some tools but not for others. Please let us know which tools you need documented.
  3. Download and examine the PISE XML file of interest, and identify the elements that control the command line flags you are interested in using. The utility of this strategy will depend on the complexity of the interface (in terms of preconds and ctrls) and how many non-default values you need to use. This strategy should work fine for simpler PISE XML documents. You can download PISE XML Files from the list in Tools: How to Configure Specific Tools or through the Tool API .
Please contact us if you have questions about how to configure specific tools.

How to Make a Test Run

Once you have a job submission ready, you can validate it by POSTing it to $URL/job/username/validate instead of $URL/job/username. CIPRES will validate the parameters but won't actually submit the job. If your submission is valid, CIPRES returns a jobstatus object with a commandline element that shows the Linux command line that would be run if the job were submitted. If there are errors in the submission, CIPRES returns an error object that explains the problems.