TeraGrid Symptoms/Warnings/Error Messages

This page explains the warning and failure messages you might see in your job output, and what to do about each one.

java.io.IOException: java.io.IOException: Cannot allocate memory

 

If you submit a job and it immediately shows "View Error" instead of "View Status", you might see this message when you click the "View Error" button:

Fri Feb 19 14:16:40 PST 2010 > INITIALIZE : SUCCESS : NGBW-JOB-MRBAYESHYBRID_ABE-2932F25FE6EC4021B62497DA3BCD0AEE : Task successfully initialized.
Fri Feb 19 14:16:40 PST 2010 > INPUTCHECK : SUCCESS : NGBW-JOB-MRBAYESHYBRID_ABE-2932F25FE6EC4021B62497DA3BCD0AEE : Input data successfully checked, data valid.
Fri Feb 19 14:16:40 PST 2010 > COMMANDRENDERING : ERROR : NGBW-JOB-MRBAYESHYBRID_ABE-2932F25FE6EC4021B62497DA3BCD0AEE : java.io.IOException: java.io.IOException: Cannot allocate memory

This message means that so many jobs are running that the application cannot accept any more. Please be patient, and resubmit the job in a few hours using the "Clone" button on the task pane.
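If you save the text shown by "View Error", a short script can split it into individual entries and point at the step that failed. The following is only a sketch: it assumes each entry has the form "timestamp > STAGE : STATUS : job ID : message", as in the sample above, and the names `TaskLogEntry`, `parse_task_log`, and `first_error` are made up for this illustration. They are not part of the gateway.

```python
import re
from typing import List, NamedTuple, Optional


class TaskLogEntry(NamedTuple):
    timestamp: str
    stage: str
    status: str
    job_id: str
    message: str


# Zero-width match just before a timestamp such as "Fri Feb 19 14:16:40 PST 2010 > ".
# Assumption: every entry starts with a timestamp in this form.
_ENTRY_START = re.compile(
    r"(?=[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} [A-Z]{3,4} \d{4} > )"
)


def parse_task_log(text: str) -> List[TaskLogEntry]:
    """Split pasted log text into entries, even if they run together on one line."""
    entries = []
    for chunk in _ENTRY_START.split(text):  # splitting on empty matches needs Python 3.7+
        chunk = chunk.strip()
        if not chunk:
            continue
        timestamp, rest = chunk.split(" > ", 1)
        # Split at most three times so that colons inside the message are kept.
        stage, status, job_id, message = [p.strip() for p in rest.split(" : ", 3)]
        entries.append(TaskLogEntry(timestamp, stage, status, job_id, message))
    return entries


def first_error(entries: List[TaskLogEntry]) -> Optional[TaskLogEntry]:
    """Return the first entry whose status is ERROR, or None if there is none."""
    for entry in entries:
        if entry.status == "ERROR":
            return entry
    return None
```

Calling `first_error(parse_task_log(text))` on the sample above would return the COMMANDRENDERING entry, whose message is the "Cannot allocate memory" exception.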

 
Queue does not have enough per-user job slots. Job not submitted.  

You might get this message from the Lonestar "testing" queue (for jobs less than or equal to 0.5 hr):

Welcome to the 64-Bit Lonestar Linux Cluster
  --> Submitting 4 procs/node...
  --> Submitting exclusive job...
  --> Verifying HOME file-system availability...
  --> Requesting low paging rate...
  --> Requesting running sshd...
  --> Requesting valid file system(s)...
  --> Requesting valid interconnect...
  --> Validating project accounting... 
  --> Adding default memory limit (1925000 KB)

Queue does not have enough per-user job slots. Job not submitted.

This means the queue is extremely busy. If you submit your job with a run time longer than 0.5 hr, it will go to the normal queue for processing. When the "testing" queue is this busy, you can expect the queue time to be longer than usual. Be patient, or shift to Abe if that's an option.


 
   
Mon Mar 15 08:24:28 PDT 2010 > PROCESSING : ERROR : NGBW-JOB-MRBAYESHYBRID_ABE-4DC0679382E34E12A63988D425B1B460 : Error processing job:
530-globus_xio: Authentication Error
530-OpenSSL Error: s3_srvr.c:2010: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
530-globus_gsi_callback_module: Could not verify credential
530-globus_gsi_callback_module: The certificate has expired: Credential with subject: /C=US/O=National Center for Supercomputing Applications/CN=Cipres Community User has expired.
530 End.

The connection between the TeraGrid and the CIPRES Science Gateway expires after 7 days, and there is currently no way to prevent this. If you get this message, the connection has expired. However, your job will still complete, and we can recover your data. Just notify me via the bug tracker or at mmiller at sdsc dot edu, and I will get your data to you.

 
   

Unable to write into your HOME file system (/home/01205/cipres)

You might get this message from a job submitted to Lonestar:

Welcome to the 64-Bit Lonestar Linux Cluster
  --> Submitting 4 procs/node...
  --> Submitting exclusive job...
  --> Verifying HOME file-system availability...

Unable to write into your HOME file system (/home/01205/cipres)
Please verify that you are not over your disk quota before
submitting subsequent jobs.

Please contact TACC Consulting if you have believe you have
received this message in error.

Request aborted by esub. Job not submitted.

This message means that the Lonestar work area is out of disk space. Just send a report to us, and we will quickly resolve the issue. Jobs on Abe are unaffected. This condition can also cause a job to fail without any additional warning messages: you might be told that the job has completed, yet find no obvious results files. This problem should be rare, and it is limited to Lonestar.

 
   
Symptom: "Job Completed" message received, but no output appears from a job submitted to Lonestar
On Lonestar only, there is a rare condition where the Lonestar cipres disk runs out of space. Your job reports that it has completed, but the results folder is empty when you open it. Just send a report to us, and we will quickly resolve the issue.
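The messages on this page can also be collected into a small lookup that matches a piece of saved output against the known text and returns the advice given here. This is only a sketch: the table `KNOWN_MESSAGES` and the function `advice_for` are hypothetical names for this illustration, and the advice strings paraphrase this page.

```python
from typing import Optional

# (text to look for in the output, advice paraphrased from this page)
KNOWN_MESSAGES = [
    ("Cannot allocate memory",
     "Too many jobs are running. Wait a few hours, then resubmit with the Clone button."),
    ("Queue does not have enough per-user job slots",
     "The Lonestar testing queue is very busy. Be patient, or shift to Abe if that is an option."),
    ("The certificate has expired",
     "The 7-day connection has expired. The job will still complete; "
     "report it via the bug tracker so your data can be recovered."),
    ("Unable to write into your HOME file system",
     "The Lonestar work area is out of disk space. Send a report so it can be resolved."),
]


def advice_for(output: str) -> Optional[str]:
    """Return the advice for the first known message found in the output, else None."""
    for needle, advice in KNOWN_MESSAGES:
        if needle in output:
            return advice
    return None
```

A result of `None` only means that the output matches nothing listed on this page. A Lonestar job that reports completion but leaves an empty results folder produces no message at all, so it cannot be detected this way.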

If there is a tool or a feature you need, you can add it yourself or let us know.