Question: Flagstat Crashes In Non-Linear Workflows With Torque
Andrew Warren • 10 wrote:
I was seeing the same behavior as you with asynchronous workflows on a
multicore machine that acts as both head node and compute node for the
Torque system. Even after recompiling with a higher NCONNECTS, I got
the same error. I suspect this is because Galaxy opens multiple
connections to check the status of currently running jobs. An
asynchronous workflow can generate many status checks, so whether the
PBS server is busy depends on exactly when a job submission comes in.
To deal with this, I modified lib/galaxy/jobs/runners/pbs.py to make
multiple attempts at submitting, as follows:
@@ -286,6 +286,12 @@ class PBSJobRunner( BaseJobRunner ):
     log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
     log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
     job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+    ##Modified to give ten tries for qsubbing a job
+    num_try=0
+    while(not job_id and num_try<10):
+        job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+        num_try+=1
+
     pbs.pbs_disconnect(c)
     # check to see if it submitted
I haven't had any problems since.
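The same retry idea can be written as a small standalone helper. This is only a sketch, not what is in my pbs.py: the names submit_with_retries, max_tries, delay and sleep are made up for illustration, and the pause between attempts is an addition that the patch above does not have. It assumes, as the patch does, that a failed submission returns a false value instead of a job id.

```python
import time


def submit_with_retries(submit, max_tries=10, delay=1.0, sleep=time.sleep):
    """Call submit() until it returns a truthy job id or max_tries is used up.

    submit: zero-argument callable wrapping the real submission call
            (hypothetical wrapper, not part of Galaxy or pbs_python).
    Returns the job id, or the last falsy result if every attempt failed.
    """
    job_id = None
    for attempt in range(max_tries):
        job_id = submit()
        if job_id:
            break
        # Pause before the next attempt, but not after the final one.
        if attempt < max_tries - 1:
            sleep(delay)
    return job_id
```

Inside pbs.py this would presumably be called as submit_with_retries(lambda: pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)). Note that it makes at most ten attempts in total, whereas the patch above makes one initial attempt plus up to ten retries.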
Cheers,
Andrew