Commit fc871cc4 authored by Erik Strand

Update README

parent a05e3b3e
......@@ -100,3 +100,35 @@ situations where no warp is ready to execute its next instruction. This reduces
the multiprocessor's *occupancy*, which is loosely the fraction of time it spends doing useful work.
Balancing the number of threads against resource usage to increase occupancy is one of the most
important concerns when writing really fast GPU code.
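
To make the thread/resource trade-off concrete, here is a rough sketch of an occupancy estimate in plain C++. It assumes the common definition of occupancy as resident warps divided by the maximum number of warps a multiprocessor can hold, and it only accounts for the register limit. The `SmLimits` struct and `estimate_occupancy` function are hypothetical names invented for this illustration, and real hardware adds further limits (shared memory, block count, allocation granularity) that are ignored here.

```cpp
#include <algorithm>

// Hypothetical per-multiprocessor limits. Real values depend on the GPU.
struct SmLimits {
    int max_warps;  // maximum number of resident warps
    int registers;  // total 32-bit registers available
    int warp_size;  // threads per warp
};

// Estimate occupancy as (resident warps) / (maximum warps), considering only
// the warp-slot limit and the register limit. This is a simplification.
double estimate_occupancy(const SmLimits& sm, int threads_per_block, int regs_per_thread) {
    if (threads_per_block <= 0 || regs_per_thread <= 0 ||
        sm.max_warps <= 0 || sm.warp_size <= 0) {
        return 0.0;
    }
    // A partially filled warp still occupies a whole warp slot, so round up.
    const int warps_per_block = (threads_per_block + sm.warp_size - 1) / sm.warp_size;
    // How many blocks fit if warp slots were the only constraint.
    const int blocks_by_warps = sm.max_warps / warps_per_block;
    // How many blocks fit if registers were the only constraint.
    const int blocks_by_regs = sm.registers / (regs_per_thread * threads_per_block);
    const int resident_blocks = std::min(blocks_by_warps, blocks_by_regs);
    return static_cast<double>(resident_blocks * warps_per_block) / sm.max_warps;
}
```

With illustrative limits of 64 warps, 65536 registers, and 32 threads per warp, a block of 256 threads using 32 registers per thread fills every warp slot, while doubling the register use to 64 per thread halves the estimate. That is the kind of balance described above.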
## Setup
The first step is to figure out what GPU you're going to use. Most desktops have GPUs, though to run
CUDA code you'll have to make sure you have an NVIDIA GPU. Higher-end laptops (especially gaming
ones) also often have dedicated GPUs. Note that no recent Macs have NVIDIA GPUs; the two companies have a
bit of a feud going on. You can also rent time on GPU-equipped machines in the cloud. Amazon's
[P3 instances](https://aws.amazon.com/ec2/instance-types/p3/) have up to eight V100 GPUs,
which can deliver up to a petaflop(!) of mixed-precision performance. Finally, you could purchase a
[Jetson](https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/) development kit.
Next you need to install CUDA. If you're using an Amazon GPU instance or a Jetson, this will be set
up for you. If you're setting up your own computer, you can download and install CUDA from
NVIDIA's [website](https://developer.nvidia.com/cuda-downloads). CUDA comes with a whole suite of
tools, the most important of which is `nvcc`, the CUDA compiler.
CUDA assumes you already have a C++ compiler installed. So if you're setting up your own computer
you may also need to install [`gcc`](https://gcc.gnu.org/). While you're at it, it's probably a good
idea to install [`make`](https://www.gnu.org/software/make/) as well. We'll use it to build the
examples in this repo. There are more detailed instructions for installing these tools
[here](https://gitlab.cba.mit.edu/pub/hello-world/c_cpp_and_make).
## Building and Running
Once everything is installed, you just need to run `make` from this directory. This should build two
example programs.
The first, `get_gpu_info`, just looks for NVIDIA GPUs in your system and prints some stats on each
one. It doesn't actually run anything on any of them.
The second, `saxpy`, runs a basic linear algebra routine (single-precision `a * x + y`) on the CPU
and GPU, and compares the run times.
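
For reference, the CPU half of such a comparison can be sketched in a few lines of C++. This is not the repo's actual code: `saxpy_cpu` and `time_ms` are hypothetical names chosen for this illustration, and the timing helper simply wraps `std::chrono::steady_clock`.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: y[i] = a * x[i] + y[i] for each element.
// Stops at the shorter of the two vectors to avoid out-of-bounds access.
void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `time_ms([&] { saxpy_cpu(2.0f, x, y); })` gives a CPU baseline. A fair GPU measurement would also need to decide whether to count the time spent copying data to and from the device.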