Table of Contents
The Story behind the Succinctly Series of Books
Chapter 1 Introduction
Chapter 2 Creating a CUDA Project
Chapter 3 Architecture
Chapter 4 First Kernels
Chapter 5 Porting from C++
Chapter 6 Shared Memory
Chapter 7 Blocking with Shared Memory
Chapter 8 NVIDIA Visual Profiler (NVVP)
Chapter 9 Nsight
Chapter 10 CUDA Libraries
Conclusion
More Information
Detailed Table of Contents
Chapter 1 Introduction
CUDA stands for Compute Unified Device Architecture. It is a suite of technologies for programming NVIDIA graphics hardware. CUDA C is an extension to C or C++; there are also bindings for other languages such as Fortran, Python, and C#. CUDA is the official GPGPU architecture developed by NVIDIA. It is a mature architecture that has been actively developed since 2007; it is regularly updated, and there is an abundance of documentation and libraries available.
GPGPU stands for general-purpose computing on graphics processing units. General-purpose programming refers to any computational task other than standard graphics processing (although CUDA is also excellent at graphics processing). Although graphics cards were originally intended to process graphics, there are very good reasons to harness their processing power for solving other problems. The most obvious reason is that they are extremely powerful processing units that can take a large share of the workload off the CPU. The GPU often performs processing simultaneously with the CPU, and for certain types of computation it is far more efficient than the CPU.
Parallel programming has become increasingly important in recent years and will continue to grow in importance in the years to come. The core clock speed of a CPU cannot increase indefinitely, and we have almost reached the limit of the technology as it stands today. Pushing the core clock speed of a CPU beyond the 3.5 GHz to 4.0 GHz range becomes increasingly expensive to power and keep cool. The alternative to increasing the clock speed of the processor is simply to include more than one processor in the same system. This alternative is exactly the idea behind graphics cards. They contain many hundreds (even thousands) of low-powered compute cores. Most graphics cards (at least the ones we will be programming) are called massively parallel devices. They work best when there are hundreds or thousands of active threads, as opposed to a CPU, which is designed to execute perhaps four or five simultaneous threads. CUDA is all about harnessing the power of thousands of concurrent threads, splitting large problems up, and turning them inside out. It is about efficiently using graphics hardware instead of just leaving the GPU idle while the CPU struggles through problems with its handful of threads.
Studying CUDA gives us particular insight into how NVIDIA’s hardware works. This is of great benefit to programmers who use these devices for graphics processing. The view of the hardware from the perspective of CUDA tends to be at a much lower level than that of a programmer who uses GPUs only to produce graphics. CUDA gives us insight into the structure and workings of these devices outside the verbose and often convoluted syntax of modern graphics APIs.
This book is aimed at readers who wish to explore GPGPU with NVIDIA hardware using CUDA. This book is intended for folks with at least some background knowledge of C++ since all of the code examples will be using this language. I will be using the Visual Studio Express 2012 integrated development environment (IDE) but the examples should be easy to follow with the 2010 or 2013 versions of Visual Studio. Chapter 9 focuses on Nsight which is only applicable to Visual Studio Professional editions, but the Express edition of Visual Studio will suffice for all the other chapters.
Chapter 2 Creating a CUDA Project
Downloading the Tools
Visual Studio 2012 Express
Before starting any CUDA project, you need to ensure that you have a suitable IDE installed (this step can be skipped if you already have Visual Studio Express or Professional installed). Download and install Visual Studio 2012 Express. The code examples and screenshots in this book are all based on this IDE unless otherwise specified. The steps for creating a CUDA project in Visual Studio 2010 and 2013 are almost identical, and those IDEs can be used without any changes to my instructions. Visual Studio 2012 Express is available for download from the Microsoft website here, and Visual Studio 2013 Express is available for download here.
CUDA Toolkit
Before downloading and installing the CUDA toolkit, you should make sure your hardware is CUDA-enabled. CUDA is specific to systems running NVIDIA graphics hardware. AMD, Intel, or any other graphics hardware will not execute CUDA programs. In addition, only relatively modern NVIDIA graphics cards are CUDA-enabled. The GeForce 8 series cards from 2006 were the first generation of CUDA-enabled hardware. You can check whether your hardware is on the list of CUDA-enabled devices on the NVIDIA developer website here.
Once you are certain your hardware is CUDA-enabled, download and install the latest CUDA toolkit. You may like to register as an NVIDIA developer to receive news about the latest CUDA releases and CUDA-related events, and to get early access to upcoming CUDA releases. The toolkit is available from the NVIDIA developer website here.
Be sure to download the version of the toolkit that is appropriate for the environment in which you will be developing. The CUDA toolkit is around 1 GB in size, so the download may take some time. The version I will be using is the 64-bit Windows Desktop version. We will not be writing any 64-bit-specific code, so the 32-bit version would also be fine. The toolkit comes with many code samples, developer drivers for the graphics card, and the libraries required to code CUDA. Be sure to install Visual Studio before installing the CUDA toolkit, as the toolkit adds the CUDA features to the IDE.
Run the downloaded installer and be sure to follow any instructions given to install the toolkit correctly.
Once you install the CUDA toolkit, I encourage you to explore the installed folder. We will refer to this install folder and its libraries and headers many times while programming CUDA. If you do not change the install path during installation, the default folder that the toolkit installs to is as follows:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5
Here, C: is the main system drive and v6.5 is the version of the toolkit you installed.
Device Query
Before we begin a CUDA project, it is probably a good idea to get to know the hardware against which you will be programming. CUDA is low level; it is designed with very specific hardware in mind. Throughout the course of this book, many references are made to specific capabilities and metrics of the CUDA devices we will be programming. The CUDA toolkit installs a samples package containing a multitude of examples—one of which is called Device Query. This is a very handy application which prints interesting information about your installed hardware to a console window. Depending on the version of the CUDA toolkit you download, the DeviceQuery.exe program may be located in several different places. It is included as part of the CUDA samples, and you can find it by clicking the Samples icon in the NVIDIA program group and locating DeviceQuery.
The application can also be downloaded and installed separately from the NVIDIA website here.
Figure 2.1: Device Query Output
Tip: You may find that when you run deviceQuery.exe, it opens and closes too quickly to read. Open the folder containing the deviceQuery.exe file in Windows Explorer. Hold down Shift and right-click a blank space in the folder. You should see an option to Open command window here in the context menu. Clicking this will open a command window in which you can type “deviceQuery.exe” to run the program, and the window will not close automatically.
Figure 2.1 is the output from Device Query for the device I used throughout this book. References to various metrics such as the maximum number of threads per streaming multiprocessor (SM) can be found for your hardware by examining the output from Device Query. Where I refer to a particular value from Device Query in the text, you should look up the values for your own hardware. If you open Device Query as described previously, you might want to keep it open and minimized while working through this book for reference.
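The information Device Query prints comes from the CUDA runtime itself, so you can also retrieve the same metrics programmatically. The following is a minimal sketch (not the actual deviceQuery sample source) that prints a few of the properties referenced throughout this book; all of the fields shown are part of the standard cudaDeviceProp structure:

```
#include <iostream>
#include <cuda_runtime.h>

using namespace std;

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        cout << "cudaGetDeviceCount failed: " << cudaGetErrorString(err) << endl;
        return 1;
    }
    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        cout << "Device " << i << ": " << prop.name << endl;
        cout << "  Compute capability:   " << prop.major << "." << prop.minor << endl;
        cout << "  Multiprocessors (SMs): " << prop.multiProcessorCount << endl;
        cout << "  Max threads per SM:    " << prop.maxThreadsPerMultiProcessor << endl;
        cout << "  Global memory (MB):    " << prop.totalGlobalMem / (1024 * 1024) << endl;
    }
    return 0;
}
```

You will be able to build and run this yourself once the project setup described in the next section is complete.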
Creating a CUDA Project
CUDA is initialized the first time a CUDA runtime function is called. The CUDA runtime is a collection of basic functions for memory management and other things. To call a CUDA runtime function, the project needs to include cuda.h and link to the CUDA Runtime Library, CUDART.lib.
To create our first project, open a new solution in Visual Studio 2012 Express, choose C++ as the language, and use an empty project. Once the project is created, right-click on the project's name in the Solution Explorer and click Build Customizations as per Figure 2.2. If you are using Visual Studio 2013, the Build Customizations option is under the submenu labeled Build Dependencies.
Figure 2.2: Build Customizations
This will open the Build Customizations window. Select the check box beside CUDA x.x (.targets, .props) as shown in Figure 2.3. Selecting this build customization will cause Visual Studio to use the CUDA toolset for files with a .cu extension, which are CUDA source files. CUDA kernels (functions designed for the GPU) are written in .cu files, but .cu files can contain regular C++ as well. Once you have selected the CUDA x.x (.targets, .props) build customization, click OK to save your changes.
Figure 2.3: CUDA 5.5 (.targets, .props)
Next, we can add a CUDA source file to our project. Right-click on the project in the Solution Explorer, click Add, and then click New Item in the context menu as per Figure 2.4.
Figure 2.4: Adding a Code File
Click C++ File (.cpp) and name your file. I have called mine Main.cu as shown in Figure 2.5. The important thing is to give the file a .cu extension instead of the default .cpp. Because of the build customization setting, this code file will be sent to the NVIDIA CUDA C Compiler (NVCC). NVCC will compile all of the CUDA code and pass the remaining C++ code to the Microsoft C++ compiler.
Figure 2.5: Adding a .cu File
Next, we need to ensure that Visual Studio is aware of the directory where the CUDA header files are located; this will likely be the same on your machine as it is on mine (depending on whether or not you changed the directory during installation). If you are using Visual Studio Professional, the paths may already be set up, but it is always a good idea to check. Open the properties of the project by clicking Project in the menu bar and selecting Project Name Properties (where Project Name is the name of your project). Click VC++ Directories in the left panel and click the expand arrow beside the Include Directories entry in the right-hand panel. Click Edit in the menu that appears; this will allow you to specify the paths Visual Studio should search for headers included with angle brackets (< and >). See Figure 2.6 for reference.
Figure 2.6: Properties for Including Directories
In the Include Directories dialog box, click the New Folder icon and locate the CUDA toolkit include folder on your computer. It will most likely be in a location similar to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include as shown in Figure 2.7. When you click Select Folder, you will be presented with the box in Figure 2.8. Click OK once you have selected your folder.
Figure 2.7: CUDA Toolkit include Directory
Figure 2.8: Click OK
Once this path is added, the CUDA include directory will be searched along with the other standard include paths when Visual Studio locates headers included with angle brackets (< and >).
Next, we need to specify where the CUDA library files are (the static-link .lib libraries). You may not need to specify where the CUDA library folder is located, depending on whether or not the CUDA installer automatically registered the folder where the CUDA libraries are installed. Unless you specified a different directory when installing the CUDA toolkit, the library files will most likely be located in a place similar to the headers. Select Library Directories, click the drop-down arrow on the right, and click Edit as per Figure 2.9.
Figure 2.9: Properties for Library Directories
Figure 2.10: CUDA Library Directory
There are two library directories; one is for 32-bit projects and the other is for 64-bit projects. Select the one that is appropriate for your present solution platform. Remember, the CUDA version number and exact path may be different on your machine from the following examples.
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\Win32
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\x64
Once the library directory is added, we need to link to the CUDA runtime library: a file called CUDART.lib. Expand the Linker, and then select Input from the left panel of the Project Properties page. Type $(CUDAToolkitLibDir)\CUDART.lib; into the Additional Dependencies box. You can add the CUDART dependency to the beginning or to the end of the list but be careful not to delete all of the standard Windows library dependencies. Remember to click Apply to save these changes, and then click OK to close the box as per Figure 2.10. Later, when we link to other libraries, the steps are the same. For instance, to link to the curand.lib library, you would add $(CUDAToolkitLibDir)\CURAND.lib; to the additional dependencies.
Tip: You can actually just supply the name of the library if the path is already known to Visual Studio. For example, you can type CUDART.lib; instead of the longer $(CUDAToolkitLibDir)\CUDART.lib.
Figure 2.10: CUDA Runtime Library
Now that we have added the include path and the runtime library, we are ready to write some CUDA code. Initially, we can test that the Include directory and the link to CUDART.lib are working by writing the source code in Listing 2.1 into the .cu file we previously added to our project.
#include <iostream>
#include <cuda.h> // Main CUDA header

using namespace std;

int main() {
    cout << "Hello CUDA!" << endl;
    return 0;
}
Listing 2.1: Basic CUDA Program
Debug your project by pressing F5 or clicking Start Debugging in the Debug menu. Listing 2.1 does not do anything with CUDA, but if the project builds correctly, it is a good indication that everything is installed properly. The program will print the text “Hello CUDA!” to the screen.
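Building Listing 2.1 only proves that the header was found; to confirm that the linker is actually pulling in CUDART.lib, you can add a single runtime call. The following sketch uses cudaRuntimeGetVersion, which reports the runtime version as an integer (the exact number depends on the toolkit you installed, so the value shown in the comment is only illustrative):

```
#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h> // Declares the runtime API, including cudaRuntimeGetVersion

using namespace std;

int main() {
    int runtimeVersion = 0;
    // This call resolves to a symbol in CUDART.lib, so a successful
    // build and run confirms the library is linked correctly.
    cudaError_t err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err == cudaSuccess)
        cout << "Hello CUDA! Runtime version: " << runtimeVersion << endl; // e.g. 6050 for CUDA 6.5
    else
        cout << "CUDA runtime error: " << cudaGetErrorString(err) << endl;
    return 0;
}
```

If this builds but fails at run time, the usual culprit is a missing or outdated NVIDIA driver rather than a project-configuration problem.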
Note: Each version of CUDA is designed to work with one or more specific versions of Visual Studio. Your platform toolset (Visual Studio version) may not be supported by CUDA. You can change the platform toolset in the project’s properties, but only if you have other versions of Visual Studio installed. The platform toolset can be specified in the Configuration Properties | General page of the project properties. In the Platform Toolset option, you can see the versions of Visual Studio you have installed. To use other platform toolsets, you need to install different versions of Visual Studio. The code in this text was built with Visual Studio 2012, and the version of the platform toolset I used was v110.
Text Highlighting
It is very handy to have Visual Studio recognize that the .cu files are actually C++ source code files. This way, we get all the benefits of code highlighting and IntelliSense while we program. By default, Visual Studio Express is not aware that the code in these files is basically just C++ with a few CUDA extensions. Open the Visual Studio properties pages by clicking Tools and then selecting Options from the menu.
Figure 2.11: Syntax highlighting for .cu files
To enable C++ syntax highlighting and IntelliSense code suggestions for .cu files, click Text Editor and then File Extension in the panel on the left (see Figure 2.11). In the right panel, type “CU” into the Extension box. Select Microsoft Visual C++ from the Editor drop-down list, and then click Apply. These changes will be saved for future projects. Whenever a .cu file is added to a project, Visual Studio will always use C++ colors and IntelliSense.
You may notice that most of the CUDA keywords are underlined in red as if they were errors. Visual Studio treats the .cu files as regular C++ but does not understand the CUDA keywords. The CUDA keywords are understood by NVCC without any declaration. There are headers you can include that define the keywords and help Visual Studio recognize them in the same manner as regular code. The only necessary header for general CUDA C programming is cuda.h. The headers shown in Listing 2.2 may also be included (all are located in the same folder as cuda.h), along with the #defines shown; this will enable kernel highlighting and prevent Visual Studio from underlining CUDA keywords as errors.
#define __cplusplus
#define __CUDACC__
#include <iostream>
#include <cuda.h>
#include <device_launch_parameters.h>
#include <cuda_runtime.h>
#include <device_functions.h>
Listing 2.2: Suggested Headers and Defines
Note: Programs can be built with or without these additional headers but Visual Studio reports all undefined symbols as errors. When a program has legitimate errors and fails to compile, the CUDA symbol errors can number in the hundreds. This makes it very difficult to know which errors reported by Visual Studio are real errors and which are undefined CUDA symbols. For this reason, it is usually better to include the headers in Listing 2.2 (or whatever headers define the keywords in your code) even though they are optional.
Timeout Detection and Recovery
Before we can compute intensive
Publisher: BookRix GmbH & Co. KG
Date of publication: 01.02.2016
ISBN: 978-3-7396-3509-5
All rights reserved