Mease Lab HPC Setup
Instructions / reference information for using mease-lab-to-nwb on the bwForCluster Helix.
Important: The previous bwForCluster MLS&WISO has been replaced by bwForCluster Helix
Migration to the new cluster
If you were already using MLS&WISO with username hd_ab123:
- Register for “bwForCluster Helix” at bwservices.uni-heidelberg.de and set a service password
- You can then log in with ssh hd_ab123@helix.bwservices.uni-heidelberg.de using this service password (OTP as before)
- SDS is still in the same place, kinit no longer required: /mnt/sds-hd/sd19b001
- Your old home directory is accessible (read-only) at /mnt/mls_home/hd/hd_hd/hd_ab123 (see the example below)
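For example, to copy a folder from the old (read-only) home directory into your new home directory — a minimal sketch, where myproject is a placeholder folder name and hd_ab123 stands for your own username:
# myproject is a placeholder; substitute the folder you actually want to copy
cp -r /mnt/mls_home/hd/hd_hd/hd_ab123/myproject ~/myproject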
Account registration
Register for an account (see BwForCluster_User_Access for full instructions):
- Join the RV
  - You’ll need an acronym and password from your advisor
  - Submit them in your request to join the RV as a coworker at the link above
- Get permission from the University
  - This is a separate step from joining the RV
- Set up 2FA
  - 2-factor authentication using an authenticator app on your phone, e.g. Google Authenticator
  - The app then displays a new 6-digit code every 30 seconds
  - Use this code when asked for an OTP (one-time password) when logging in to the cluster
- Set a service password
  - After logging in with your uni-id, you should see a list of services
  - Register for “bwForCluster Helix” and set a service password
Login
Log in to the cluster with username hd_UNIID, where UNIID is your uni-id:
ssh hd_ab123@helix.bwservices.uni-heidelberg.de
It will ask you for your bwForCluster service password and your OTP (the current 6-digit token displayed in your authenticator app).
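If you log in often, you can optionally add a host alias to the ~/.ssh/config file on your own computer; a minimal sketch, where the alias name helix is arbitrary and hd_ab123 stands for your own username:
# ~/.ssh/config on your local machine; "helix" is an arbitrary alias
Host helix
    HostName helix.bwservices.uni-heidelberg.de
    User hd_ab123
You can then connect with ssh helix; the service password and OTP prompts are unchanged.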
SDS
The files on SDS are located at /mnt/sds-hd/sd19b001
Initial setup
Once you are logged in and can access SDS, do
source /mnt/sds-hd/sd19b001/HPC_INSTALLATION_HELIX/init.sh
You should then be in the measelab conda environment, with these programs installed & on the path:
- matlab
- mease-lab-to-nwb
- setup-jupyter
- spike sorters
  - HDSort
  - Herdingspikes2
  - IronClust
  - Kilosort 1.0, 2.0, 2.5, 2.5 (solve_zero_padding fork), 3.0 (git master)
  - Klusta
  - Tridesclous
  - Waveclus
  - Combinato
To do this automatically every time you log on or run a job (recommended), add the above line to your ~/.bashrc with this command:
echo "source /mnt/sds-hd/sd19b001/HPC_INSTALLATION_HELIX/init.sh" >> ~/.bashrc
Then each time you log in you should see (measelab) at the start of your command line, showing that you are in the measelab conda environment.
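Optionally, you can make the line in your ~/.bashrc a little more defensive, so that logins still work if SDS happens to be unreachable; a minimal sketch (this guard is an optional variation, not part of the standard setup):
# in ~/.bashrc: only source the init script if it is readable (i.e. SDS is mounted)
if [ -r /mnt/sds-hd/sd19b001/HPC_INSTALLATION_HELIX/init.sh ]; then
    source /mnt/sds-hd/sd19b001/HPC_INSTALLATION_HELIX/init.sh
fi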
Interactive Jupyter use
There is a helper script for setting up and using a remote jupyter server on the cluster. To use it, type
setup-jupyter
- it will ask you how long you need the job to run for, how much RAM you need, how many cpus, etc.
- it will then submit a batch job to the queue and wait for it to start running
- if it doesn’t start running after a few minutes, there may be no available nodes with the specs you asked for
- to get something to run now, consider trying again with reduced requirements, e.g. any for the gpu, 60 for the gb of RAM
- once the job is running, it prints out instructions for what to do next
- open an ssh shell within your terminal with the escape key sequence: Enter, ~, C
  - if this works you’ll see a new line starting with ssh>
- copy and paste the line starting with -L and press Enter (an alternative way to set up this port forwarding is sketched after this list)
  - it should then say Forwarding port.
- press Enter again to return to the normal command line
- open the web address starting with localhost: in your web browser
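If the escape key sequence doesn’t work in your terminal, an alternative sketch is to set up the port forwarding yourself from a second terminal on your local machine; the port 8888 and the node name NODE below are placeholders — use the values that setup-jupyter prints:
# run on your local machine; replace 8888 and NODE with the port and node printed by setup-jupyter
ssh -L 8888:NODE:8888 hd_ab123@helix.bwservices.uni-heidelberg.de
# then open the localhost: address printed by setup-jupyter in your web browser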
Type setup-jupyter --help or see setup-jupyter for more information.
Jupyter tips
- If you prefer the jupyter notebook interface to jupyter lab, go to “Help -> Launch Classic Notebook”
- The Jupyter working directory will be the same directory you ran setup-jupyter from
- SDS files are located at /mnt/sds-hd/sd19b001
- When you are finished using jupyter, go to “File -> Shutdown”
  - This stops the job running on HPC; otherwise it will keep running there for the entire allocated time.
- On your ssh connection to the cluster, you can see all of your current jobs (queued and running) with squeue
  - If you have any jobs queued or running that you won’t use any more, you can cancel them with scancel JOBID, where JOBID is the number displayed by squeue (see the example after this list)
- If you have lost your connection to a running jupyter server, try find-jupyter
- If you are stuck in the queue waiting for your job to start, consider reducing your requirements
  - The RAM, CPU & GPU for each type of node are listed here
  - The actual RAM available is a little bit less than the listed values
  - You can see a list of idle nodes (but unfortunately not which specific GPU types are idle) with sinfo_t_idle
- If you run out of disk space in your home directory you can get various errors
  - e.g. File Load Error when trying to open a notebook
- Deleting files from within jupyterlab doesn’t actually delete them but moves them to a trash folder
  - To see how much space this is taking up: du -sh .local/share/Trash
  - To empty out the trash folder: rm -rf .local/share/Trash
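For example, a check-and-cancel sequence might look like this (1234567 is a placeholder JOBID; replace hd_ab123 with your own username):
squeue -u hd_ab123   # list your queued and running jobs
scancel 1234567      # cancel the job with this JOBID (placeholder)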
bwVisu
This is a new service for using graphical user interface programs - in particular Phy - on HPC.
Account registration
- Assumes you have already set up your MLS&WISO HPC account
- Set a service password
  - After logging in with your uni-id, you should see a list of services
  - Register for “bwVisu” and set a service password
Login
- go to bwvisu-web.urz.uni-heidelberg.de/accounts/login
- log in using
  - username: hd_UNIID, where UNIID is your uni-id (same as for MLS&WISO ssh login)
  - password: the bwVisu service password you set above (not the same as your MLS&WISO service password!)
  - OTP: the current 6-digit token displayed in your authenticator app (same as for MLS&WISO ssh login)
Use
- scroll down to Phy, then click on the blue “v2” button
- click the green “Start new job” button, enter runtime and press “start” again
- click on the new “Phy - v2 xyz123” button under “Running jobs”
- click on the Xpra Web Client link to open Phy in a new webpage
- Press “Enter” to close a dialog box that may or may not be initially visible
- You should then see a dialog box asking you to choose your params.py file
- The “Home” directory is your home directory on MLS&WISO
- For SDS: Other locations -> Computer -> mnt -> sds-hd -> sd19b001
- Note: for SDS access you need to have an active kerberos ticket on MLS&WISO
Interactive command-line use
To see how many idle nodes are currently available:
sinfo_t_idle
To run an interactive job on a node with a gpu (i.e. log on to it and run commands there):
srun --partition=single --time=0:30:00 --nodes=1 --ntasks-per-node=1 --mem=16gb --gres=gpu:A40:1 --pty /bin/bash
This asks for 30 minutes with 1 cpu, 1 A40 GPU, and 16GB of RAM.
(Note that not all of the system RAM is available, e.g. if the machine has 64GB the most you can ask for is around 60GB.)
If you don’t mind which type of GPU you get, you can simply use --gres=gpu:1
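For example, the same interactive request as above with the GPU type left unspecified:
srun --partition=single --time=0:30:00 --nodes=1 --ntasks-per-node=1 --mem=16gb --gres=gpu:1 --pty /bin/bash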
Once the job starts you will be logged into the machine - if you didn’t add the source line to your ~/.bashrc file you will have to run it again manually.
Batch jobs
Longer jobs can be submitted as batch jobs to a queue, and will run when resources are available.
See mease-hpc-setup examples or wiki.bwhpc.de/e/Helix/Slurm for more information on the batch system.
If you have a running batch job and you want to log in to the node where it is running you can do
srun --jobid=123456 --pty /bin/bash
Then e.g. htop to see CPU/RAM use, or nvidia-smi to see GPU use.
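Putting these monitoring commands together, a typical check on a running batch job might look like this (123456 is a placeholder job id):
squeue                                # find the JOBID of your running job
srun --jobid=123456 --pty /bin/bash   # log in to the node where it is running
htop                                  # check CPU and RAM use (press q to quit)
nvidia-smi                            # check GPU use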
MUA-Analysis
The MUA-Analysis repo is cloned at /mnt/sds-hd/sd19b001/MUA-analysis.
See /mnt/sds-hd/sd19b001/Liam/MUA_examples/example1 for an example of how to run this on HPC.
Files
The two matlab files are copied from Example_experiment, with these changes:
- the location of MUA-analysis on HPC is added to the matlab path: addpath(genpath('/mnt/sds-hd/sd19b001/MUA-analysis'))
- tell matlab to use 12 cpu cores: parpool('local', 12)
- the dataDir in example_experimental_parameters.m is modified
  - SDS on HPC is located at /mnt/sds-hd/sd19b001
  - HPC is linux so / is used to separate directories in paths (versus \ on Windows)
After modifying example_experimental_parameters.m I ran matlab -batch example_experimental_parameters to re-generate the .mat file.
There is also a file submit.sh: this describes what resources your analysis job needs and what command it should run.
The file slurm-1271232.out and the folder /mnt/sds-hd/sd19b001/Liam/ECE_testing_data/2021-08-20_M6_S1_ECE_Processing_example are generated outputs from running the analysis.
submit.sh
#!/bin/bash
#SBATCH --partition=gpu-single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
time matlab -batch ECE_Workflow_example
- the first line just says this is a bash script
- then the #SBATCH lines ask for
  - 12 cpu cores
  - 1 gpu
  - max job runtime of 1 hour
- the last line is the actual job: it runs the ECE_Workflow_example matlab script
To run the analysis
- to submit the job: sbatch submit.sh (a combined submit-and-monitor example is sketched after this list)
- you can then see the status of your jobs with squeue
- the output will be written to slurm-12345.out, where 12345 will be your job id
- the files generated by the analysis (for this example) are found at /mnt/sds-hd/sd19b001/Liam/ECE_testing_data/2021-08-20_M6_S1_ECE_Processing_example
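Put together, a submit-and-monitor session might look like this (12345 is a placeholder job id; run the commands from the directory containing submit.sh):
sbatch submit.sh          # submit the job; prints the assigned job id
squeue                    # check whether the job is queued or running
tail -f slurm-12345.out   # follow the job output as it is written (Ctrl-C to stop)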
Notes
- MUA-Analysis repo is cloned at /mnt/sds-hd/sd19b001/MUA-analysis