pub / hello-world / cuda

Commit 747c016d, authored Nov 12, 2020 by Erik Strand (parent dc832c05)

Add a basic saxpy example

3 changed files with 61 additions and 1 deletion:
.gitignore  +1 -0
Makefile    +4 -1
saxpy.cu    +56 -0
.gitignore
@@ -4,3 +4,4 @@
# binaries
get_gpu_info
saxpy
Makefile
get_gpu_info: get_gpu_info.cu
	nvcc get_gpu_info.cu -o get_gpu_info

saxpy: saxpy.cu
	nvcc saxpy.cu -o saxpy
saxpy.cu (new file, mode 100644)
// This code performs a single-precision a*X plus Y (SAXPY) operation on the GPU.
// Adapted from https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

// All CUDA kernels return void. To get data back to the CPU you have to copy it explicitly.
__global__
void saxpy(int n, float a, float* x, float* y) {
    // Global index of this thread across the whole grid.
    int const i = blockIdx.x * blockDim.x + threadIdx.x;
    // The last block may have more threads than remaining elements, so guard the access.
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

int main() {
    // We'll put 2^20 numbers in each vector.
    int N = 1048576;

    // Allocate host (CPU) memory.
    float *h_x, *h_y;
    h_x = (float*)malloc(N * sizeof(float));
    h_y = (float*)malloc(N * sizeof(float));

    // Allocate device (GPU) memory.
    float *d_x, *d_y;
    cudaMalloc(&d_x, N * sizeof(float));
    cudaMalloc(&d_y, N * sizeof(float));

    // Initialize data.
    for (int i = 0; i < N; ++i) {
        h_x[i] = 1.0f;
        h_y[i] = 2.0f;
    }

    // Copy data to the GPU.
    cudaMemcpy(d_x, h_x, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, N * sizeof(float), cudaMemcpyHostToDevice);

    // Perform SAXPY on the data: ceil(N / 256) blocks of 256 threads each.
    saxpy<<<(N + 255) / 256, 256>>>(N, 2.0f, d_x, d_y);

    // Copy the result back to the host. This cudaMemcpy waits for the kernel to finish first.
    cudaMemcpy(h_y, d_y, N * sizeof(float), cudaMemcpyDeviceToHost);

    // Every element should now be 2.0f * 1.0f + 2.0f = 4.0f.
    // fmaxf/fabsf keep the comparison in float; plain abs() is the integer version in C.
    float maxError = 0.0f;
    for (int i = 0; i < N; i++) {
        maxError = fmaxf(maxError, fabsf(h_y[i] - 4.0f));
    }
    printf("Max error: %f\n", maxError);

    cudaFree(d_x);
    cudaFree(d_y);
    free(h_x);
    free(h_y);
    return 0;
}
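Each of the four buffers in main (h_x, h_y, d_x, d_y) holds N = 2^20 floats. Assuming 4-byte floats, that is 2^20 × 4 = 4,194,304 bytes (4 MiB) per buffer: 8 MiB on the host, 8 MiB on the device, and 12 MiB moved by the three cudaMemcpy calls (x and y to the device, y back). The host-only sketch below restates that arithmetic; `vector_bytes` and `transferred_bytes` are hypothetical names, and the 4-byte-float assumption is checked with a static_assert.

```cpp
#include <cstddef>

// Assumption of this sketch: float is 32 bits, as the byte counts below rely on it.
static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");

// Bytes for one vector of n floats, i.e. the N * sizeof(float) passed to
// malloc, cudaMalloc, and cudaMemcpy in saxpy.cu.
constexpr std::size_t vector_bytes(std::size_t n) {
    return n * sizeof(float);
}

// saxpy.cu performs three vector-sized copies: x and y host-to-device, y device-to-host.
constexpr std::size_t transferred_bytes(std::size_t n) {
    return 3 * vector_bytes(n);
}

// 2^20 floats occupy exactly 4 MiB.
static_assert(vector_bytes(1048576) == 4u * 1024 * 1024, "2^20 floats should be 4 MiB");
```

For example, `transferred_bytes(1048576)` is 12,582,912 bytes, which is small enough that the copies, rather than the arithmetic, are likely to dominate the run time of such a simple kernel.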
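The launch `saxpy<<<(N + 255) / 256, 256>>>` rounds the block count up, so that blocks × 256 ≥ N, and the `if (i < n)` guard in the kernel keeps the surplus threads in the last block from writing out of bounds. The host-only sketch below replays that indexing on the CPU to show that every element is written exactly once; `grid_size` and `covers_each_index_once` are hypothetical helper names, and the nested loops only emulate the (blockIdx.x, threadIdx.x) pairs, not how a GPU actually schedules them.

```cpp
#include <cassert>
#include <vector>

// Ceiling division: how many blocks of block_size threads are needed to cover n elements.
// With block_size = 256 this is the (N + 255) / 256 expression used in saxpy.cu.
int grid_size(int n, int block_size) {
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Replays the launch on the CPU: each (block, thread) pair computes
// i = blockIdx.x * blockDim.x + threadIdx.x, and only threads with i < n write.
// Returns true if each index in [0, n) is written exactly once.
bool covers_each_index_once(int n, int block_size) {
    std::vector<int> writes(n, 0);
    int const blocks = grid_size(n, block_size);
    for (int block_idx = 0; block_idx < blocks; ++block_idx) {
        for (int thread_idx = 0; thread_idx < block_size; ++thread_idx) {
            int const i = block_idx * block_size + thread_idx;
            if (i < n) {  // the same bounds guard as the kernel
                ++writes[i];
            }
        }
    }
    for (int w : writes) {
        if (w != 1) return false;
    }
    return true;
}
```

For N = 1048576 the block count is exactly 4096 and no thread is idle. For a size that is not a multiple of 256, such as 1000, `grid_size(1000, 256)` is 4, so 1,024 threads are launched and the guard idles the last 24.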
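The final loop in main hard-codes 4.0f as the expected value because every element starts as x = 1.0f, y = 2.0f and a = 2.0f. A more general check, sketched here as an assumption rather than something this commit does, is to run the same SAXPY on the CPU and report the largest absolute difference from the GPU result. `saxpy_reference` and `max_abs_error` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// CPU version of the saxpy kernel: y[i] = a * x[i] + y[i] for every i.
// Each loop iteration does the work of one GPU thread.
void saxpy_reference(int n, float a, const float* x, float* y) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Largest absolute element-wise difference between two arrays of length n.
float max_abs_error(int n, const float* actual, const float* expected) {
    assert(n >= 0);
    float max_error = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_error = std::max(max_error, std::fabs(actual[i] - expected[i]));
    }
    return max_error;
}
```

To use this in saxpy.cu, one would keep a second host copy of the initial y, apply `saxpy_reference` to it, and pass it to `max_abs_error` together with the h_y copied back from the GPU. With these inputs the products and sums are exact in single precision, so a correct run reports 0.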