# Satori
## Logging In
Before logging in for the first time, you'll need to activate your account by following these
[instructions](https://mit-satori.github.io/satori-getting-started.html#logging-in-to-satori).
Now you can `ssh` into either of the login nodes like this (replacing `strand` with your username).
```
ssh strand@satori-login-001.mit.edu
ssh strand@satori-login-002.mit.edu
```
According to [this](https://mit-satori.github.io/satori-ssh.html), the first login node should be
used for submitting jobs, and the second for compiling code or transferring large files. But it also
says that if one isn't available, just try the other. Both have 160 cores.
## Modules
Satori is set up to use [Environment Modules](https://modules.readthedocs.io/en/latest/index.html)
to control which executables, libraries, etc. are on your path(s). So you'll want to become familiar
with the `module` command.
- `module avail` lists all available modules
- `module spider <module name>` gives you info about a module, including which other modules have to
  be loaded first (there's an example after this list)
- `module load <module name>` loads a specific module
- `module list` shows all the currently loaded modules
- `module unload <module name>` unloads a specific module
- `module purge` unloads all modules
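For example, here's how I might find and load a specific compiler (the version is just the one I
happen to use later in this README; check `module avail` for what's actually installed, and note
that depending on how the module tree is set up you may need `module load spack` first).
```
module spider gcc      # see which gcc versions exist and what they depend on
module load gcc/7.3.0  # load a specific version
module list            # confirm it's loaded
```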
Satori also uses [Spack](https://spack.io/) to manage versions of many tools, so generally speaking
you should always have the `spack` module loaded (`module load spack`). If you run `module avail`
before and after loading it, you'll see that a lot more modules become visible.
For compiling C/C++ and CUDA code, these are the modules I start with.
```
module load spack git cuda gcc/7.3.0 cmake
```
Note: I'd like to use gcc 8, but I get build errors when I use it.
## Running Jobs
Let's start with these simple CUDA [hello world](https://gitlab.cba.mit.edu/pub/hello-world/cuda)
programs.
With the modules above loaded, you should be able to clone the repo and build it. (The first time
through, you probably want to do a little git
[setup](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup).)
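That setup mostly amounts to telling git who you are (use your own name and email, of course):
```
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
```
With that done, clone and build: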
```
git clone ssh://git@gitlab.cba.mit.edu:846/pub/hello-world/cuda.git
cd cuda
make -j
```
Since all these programs are very lightweight, I admit I tested them all on the login node directly.
Running `get_gpu_info` in particular revealed that the login nodes each have two V100 GPUs. (The
compute nodes have four.)
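If you just want to see what GPUs a node has without compiling anything, `nvidia-smi` should work
too (assuming the NVIDIA driver tools are on your path):
```
nvidia-smi -L   # prints one line per GPU with its model and UUID
```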
But let's do things the right way, using [slurm](https://slurm.schedmd.com/overview.html). We'll
start by making a submission script for `saxpy`. I called mine `saxpy.slurm`, and put it in its own
directory outside the repo.
```
#!/bin/bash
#SBATCH -J saxpy # sets the job name
#SBATCH -o saxpy_%j.out # determines the main output file (%j will be replaced with the job number)
#SBATCH -e saxpy_%j.err # determines the error output file
#SBATCH --mail-user=erik.strand@cba.mit.edu
#SBATCH --mail-type=ALL
#SBATCH --gres=gpu:1 # requests one GPU per node...
#SBATCH --nodes=1 # and one node...
#SBATCH --ntasks-per-node=1 # running only one instance of our command.
#SBATCH --mem=256M # We ask for 256 megabytes of memory (plenty for our purposes)...
#SBATCH --time=00:01:00 # and one minute of time (again, more than we really need).
~/code/cuda/saxpy
echo "Run completed at:"
date
```
All the lines that start with `#SBATCH` are parsed by slurm to determine which resources you need.
You can also pass these on the command line, but I like to put everything in a file so I don't
forget what I asked for.
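For reference, the command line version would look something like this (the same flags, passed
directly to `sbatch`; command line options take precedence over the `#SBATCH` lines in the script).
```
sbatch --gres=gpu:1 --nodes=1 --ntasks-per-node=1 --mem=256M --time=00:01:00 saxpy.slurm
```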
To submit the job, run `sbatch saxpy.slurm`. Slurm will then tell you the job id.
```
[strand@satori-login-002 saxpy]$ sbatch saxpy.slurm
Submitted batch job 61187
```
To query jobs in the queue, use `squeue`. If you run it with no arguments, you'll see all the queued
jobs. To ask about a specific job, use `-j`. To ask about all jobs that you've submitted, use `-u`.
```
[strand@satori-login-002 saxpy]$ squeue -u strand
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             61187 sched_sys    saxpy   strand  R       0:00      1 node0023
```
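Querying a specific job by id is just as quick (this uses the id from my submission above):
```
squeue -j 61187
```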
Since we only asked for one minute of compute time, our job gets scheduled very quickly. So if you
run `squeue` and don't see anything, it might just be because the job already finished.
You'll know the job is finished when its output files appear. They should show up in the directory
where you queued the job with `sbatch`.
```
[strand@satori-login-002 saxpy]$ cat saxpy_61187.out
Performing SAXPY on vectors of dim 1048576
CPU time: 323 microseconds
GPU time: 59 microseconds
Max error: 0
Run completed at:
Mon Mar 1 19:40:43 EST 2021
```
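If a job has already left the queue, `sacct` can still report what happened to it, assuming job
accounting is enabled on the cluster (I haven't checked exactly which fields Satori records):
```
sacct -j 61187 --format=JobID,JobName,State,Elapsed,ExitCode
```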
Now let's try submitting `saxpy_multi_gpu` and giving it multiple GPUs. We can use basically the
same batch script, just with the new executable and GPU count (i.e. `--gres=gpu:4`). It doesn't
matter for this program, but for real work you may also want to add `#SBATCH --exclusive` so you're
not sharing the node (and its GPUs) with other jobs.
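Here's a sketch of what that modified script looks like. I'm assuming the `saxpy_multi_gpu` binary
sits next to `saxpy` in the same build directory, and I've dropped the mail options for brevity.
```
#!/bin/bash
#SBATCH -J saxpy_multi_gpu
#SBATCH -o saxpy_multi_gpu_%j.out       # same output naming scheme as before
#SBATCH -e saxpy_multi_gpu_%j.err
#SBATCH --gres=gpu:4                    # four GPUs this time
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=256M
#SBATCH --time=00:01:00
# #SBATCH --exclusive                   # uncomment for real work, to keep the node to yourself
~/code/cuda/saxpy_multi_gpu
echo "Run completed at:"
date
```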
We submit the job in the same way: `sbatch saxpy_multi_gpu.slurm`. Soon after, I had this output
file.
```
Performing SAXPY on vectors of dim 1048576.
Found 4 GPUs.
CPU time: 579 microseconds
GPU 0 time: 55 microseconds
GPU 1 time: 85 microseconds
GPU 2 time: 60 microseconds
GPU 3 time: 61 microseconds
GPU 0 max error: 0
GPU 1 max error: 0
GPU 2 max error: 0
GPU 3 max error: 0
Run completed at:
Mon Mar 1 20:27:16 EST 2021
```
## TODO
- MPI hello world
- Interactive sessions
## Questions
- How can I load CUDA 11?
- Why is gcc 8 broken?
- Is there a module for cmake 3.19? If not, can I make one?
- Is there a dedicated test queue?