Cloudman rebooting/resizing not working

Hello,

I am currently trying to reboot a Galaxy Cloudman instance in order to retrieve some data. It will not restart, however. The main Galaxy volume is completely full, and I think that might be the issue. I tried resizing it, but the "Grow" option only creates a snapshot of the volume and then stays stuck on the "Volume Manipulation" window. I tried rebooting again and got the same result: the instance won't restart and the volume won't grow.

The log shows this:

17:57:42 - Existing worker instance 'i-2f46dbbf' found alive (will configure it later).
17:57:54 - Rebooting instance 'i-2f46dbbf; 54.242.105.11; w1'.
17:57:54 - Completed the initial cluster startup process. Configuring a previously existing cluster of type Galaxy.
17:58:04 - Adding volume vol-6c0687c8 (galaxyIndices FS)...
17:58:26 - Supervisor service prerequisites OK; starting the service.
17:58:26 - Adding volume vol-0aa2b8f7 (galaxy FS)...
17:58:28 - Migration service prerequisites OK; starting the service.
17:58:33 - Postgres service prerequisites OK; starting the service.
17:58:34 - ---> PROBLEM, running command '/bin/su - postgres -c "/usr/lib/postgresql/9.3/bin/pg_ctl -w -D /mnt/galaxy/db -l /tmp/pgSQL.log -o\"-p 5930\" start"' returned code '1', the following stderr: 'pg_ctl: could not start server Examine the log output. ' and stdout: 'waiting for server to start.... stopped waiting '
17:58:34 - Slurmctld service prerequisites OK; starting the service.
17:58:34 - NodeJSProxy service prerequisites OK; starting the service.
17:58:45 - Slurmd service prerequisites OK; starting the service.
18:00:51 - Instance 'i-2f46dbbf; 54.242.105.11; w1' reported alive
18:01:11 - Could not get a handle on job manager service to add node 'i-2f46dbbf; 54.242.105.11; w1'
18:01:11 - Waiting on worker instance 'i-2f46dbbf; 54.242.105.11; w1' to configure itself.
18:01:22 - Instance 'i-2f46dbbf; 54.242.105.11; w1' ready

The PROBLEM line comes from the "Postgres" service. If I go to the admin page, its status says "starting", but its log shows:

FATAL: could not access status of transaction 0
DETAIL: Could not write to file "pg_notify/0000" at offset 0: No space left on device.

Again, this suggests the volume is too full. I've tried everything I can think of and am really not sure what to do next. I'd be happy just to access the data, even without a functional Galaxy instance, since I can always create a new one.

Thank you for your help, Diego

admin cloud galaxy cloudman • 793 views

modified 2.4 years ago by Enis Afgan • 690 • written 2.4 years ago by diegofbuenaventura • 0

Enis Afgan • 690 wrote:

Hi Diego,

Perhaps the easiest thing would be to access the instance via the command line and manually delete a few files to free up enough space for everything to start. Instructions for command-line access are available on the wiki: https://wiki.galaxyproject.org/CloudMan/FAQ#Command_line_access.

Once you're logged in, start by removing the Galaxy log file, for example with: sudo rm /mnt/galaxy/galaxy-app/main.log. Beyond that file, the simplest option is probably to remove a few of the earliest datasets, which you most likely don't need to retrieve because they were uploaded in the first place. Those are located in /mnt/galaxy/files/files/000/dataset_###.dat. Either delete them or move them to the /mnt/transient_nfs/ directory so you can put them back once the system has started.

You can check the state of the Galaxy file system with df -h. Keep removing files until a few hundred megabytes are available, then restart the cluster.
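For anyone following along, the clean-up described above could be sketched as a small shell helper. This is only a sketch under assumptions, not part of CloudMan: the function names free_mb and park_datasets are made up for illustration, and the paths and the 300 MB target in the usage note are simply the ones mentioned in this answer. It moves dataset files aside one at a time and stops as soon as enough space is free.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CloudMan) for freeing space on a full volume.

# Print the free space, in whole megabytes, on the filesystem that holds $1.
# Uses POSIX "df -Pk" (1 KB blocks, one line per filesystem) and converts to MB.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Move dataset_*.dat files from directory $1 into directory $2, one at a time,
# until the filesystem holding $1 has at least $3 MB free.
# Prints the path of each file it moves; files are moved, never deleted.
park_datasets() {
    pd_src="$1"
    pd_dest="$2"
    pd_target_mb="$3"
    for pd_file in "$pd_src"/dataset_*.dat; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$pd_file" ] || continue
        # Stop as soon as enough space is available.
        if [ "$(free_mb "$pd_src")" -ge "$pd_target_mb" ]; then
            break
        fi
        mv -- "$pd_file" "$pd_dest"/ && echo "moved $pd_file"
    done
}
```

After loading these functions in a shell that has write access to both directories (on the instance that likely means a root shell), a call such as park_datasets /mnt/galaxy/files/files/000 /mnt/transient_nfs 300 would move datasets until roughly 300 MB are free. The files are processed in the order the shell expands the glob, which is not necessarily oldest first, so check the printed list before deciding what to move back.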

Post back here if this does not work and we'll explore other options.

written 2.4 years ago by Enis Afgan • 690

This worked! I deleted an old dataset and it was able to boot up normally, as expected.

Thank you very much for your help and for the really quick response! Much appreciated.

Diego

written 2.4 years ago by diegofbuenaventura • 0
