Question: Cloudman rebooting/resizing not working
diegofbuenaventura (United States) wrote, 2.4 years ago:

Hello,

I am currently trying to reboot a Galaxy CloudMan instance in order to retrieve some data. It will not restart, however. The Galaxy main volume is completely full, and I think that might be the issue. I tried resizing it, but the "Grow" option only creates a snapshot of the volume and then stays stuck on the "Volume Manipulation" window. I tried rebooting again, with the same result: the instance won't restart and the volume won't grow.

The log shows this:

17:57:42 - Existing worker instance 'i-2f46dbbf' found alive (will configure it later).
17:57:54 - Rebooting instance 'i-2f46dbbf; 54.242.105.11; w1'.
17:57:54 - Completed the initial cluster startup process. Configuring a previously existing cluster of type Galaxy.
17:58:04 - Adding volume vol-6c0687c8 (galaxyIndices FS)...
17:58:26 - Supervisor service prerequisites OK; starting the service.
17:58:26 - Adding volume vol-0aa2b8f7 (galaxy FS)...
17:58:28 - Migration service prerequisites OK; starting the service.
17:58:33 - Postgres service prerequisites OK; starting the service.
17:58:34 - ---> PROBLEM, running command '/bin/su - postgres -c "/usr/lib/postgresql/9.3/bin/pg_ctl -w -D /mnt/galaxy/db -l /tmp/pgSQL.log -o\"-p 5930\" start"' returned code '1', the following stderr: 'pg_ctl: could not start server Examine the log output. ' and stdout: 'waiting for server to start.... stopped waiting '
17:58:34 - Slurmctld service prerequisites OK; starting the service.
17:58:34 - NodeJSProxy service prerequisites OK; starting the service.
17:58:45 - Slurmd service prerequisites OK; starting the service.
18:00:51 - Instance 'i-2f46dbbf; 54.242.105.11; w1' reported alive
18:01:11 - Could not get a handle on job manager service to add node 'i-2f46dbbf; 54.242.105.11; w1'
18:01:11 - Waiting on worker instance 'i-2f46dbbf; 54.242.105.11; w1' to configure itself.
18:01:22 - Instance 'i-2f46dbbf; 54.242.105.11; w1' ready

The PROBLEM line comes from the "Postgres" service. If I go to the admin page, its status says "starting", but its log shows:

FATAL: could not access status of transaction 0
DETAIL: Could not write to file "pg_notify/0000" at offset 0: No space left on device.

Again, this may point to a volume that is too full. I've tried as many things as I can think of, but I'm really not sure what to do. I'd be happy just to access the data, even without a functional Galaxy instance; I can always create a new one.

Thank you for your help, Diego

Enis Afgan (United States) wrote, 2.4 years ago:

Hi Diego,

Perhaps the easiest thing would be to access the instance via the command line and manually delete a few files, freeing up enough space for everything to start. Instructions on accessing the instance via the command line are available on the wiki: https://wiki.galaxyproject.org/CloudMan/FAQ#Command_line_access.

Once you're logged in, remove the Galaxy log file, for example, with the following command: sudo rm /mnt/galaxy/galaxy-app/main.log. Beyond that file, the easiest thing might be to remove a few of the earliest datasets, which you probably don't need to retrieve because they were most likely uploads. Those will be located in /mnt/galaxy/files/files/000/dataset_###.dat, so either remove them or move them to the /mnt/transient_nfs/ directory so you can put them back once the system has started.

You can check the state of the galaxy file system with the command df -h; keep removing files until a few hundred megabytes are available. Then restart the cluster.
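The steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names (free_mb, largest_datasets, stash_file), the GALAXY_FS and TARGET_MB variables, and the stashed_from.txt record file are hypothetical and not part of CloudMan; only the /mnt/galaxy path comes from this thread. The script only reports free space and lists candidate files; nothing is moved unless stash_file is called explicitly, and on a real instance that would probably need sudo.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in whole MB, on the file
# system that holds the path given as $1.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Hypothetical helper: list the $2 largest dataset_*.dat files under
# directory $1, largest first, one "size_in_KB<TAB>path" per line.
largest_datasets() {
    find "$1" -type f -name 'dataset_*.dat' -exec du -k {} + |
        sort -rn | head -n "$2"
}

# Hypothetical helper: move file $1 into stash directory $2 and record
# where it came from, so it can be put back after the cluster restarts.
stash_file() {
    mkdir -p "$2" &&
        mv "$1" "$2/" &&
        printf '%s\n' "$1" >> "$2/stashed_from.txt"
}

# Assumed defaults; adjust to the instance at hand.
GALAXY_FS=${GALAXY_FS:-/mnt/galaxy}
TARGET_MB=${TARGET_MB:-300}

# Report only: show free space and the five largest candidate datasets.
if [ -d "$GALAXY_FS/files" ]; then
    echo "Free on $GALAXY_FS: $(free_mb "$GALAXY_FS") MB (target: $TARGET_MB MB)"
    largest_datasets "$GALAXY_FS/files" 5
fi
```

After reviewing the list, a chosen file could be moved aside with, for example, stash_file followed by the file path and /mnt/transient_nfs/stash (a hypothetical subdirectory), then the free space checked again with free_mb before restarting the cluster.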

Post back here if this does not work and we'll explore other options.


This worked! I deleted an old dataset and it was able to boot up normally, as expected.

Thank you very much for your help and for the really quick response! Much appreciated.

Diego

Reply written 2.4 years ago by diegofbuenaventura