Text Histogram: Lab asssignment¶

Objective¶

The purpose of this lab is to implement an efficient histogram algorithm for an input array of ASCII characters. There are 128 ASCII characters but we will consider only those with integer codes between 32 and 126. Each single character but the 36 and 42 will map into its own bin for a fixed total of 68 bins (we do not distinguish small and capital letters). The histogram bins will be unsigned 32-bit counters that do not saturate. Use the approach of creating a privitized histogram in shared memory for each thread block, then atomically modifying the global histogram.

Instructions¶

You can either re-use the Nvidia template in the samples-etu/0_Simple folder or write your own source code from scratch. Whatever solution you take, remember that your code needs to perform the following:

allocate device memory

copy data from host memory to device memory

initialize thread block and grid dimensions

launch CUDA kernel(s)

copy result(s) from device to host

deallocate device memory

The excutable generated as a result of compiling the lab with the make command and the tuned Nvidia-provided Makefile, can be run using the following command:

perrot@cluster2:~$ ./histoText -i <inputText.txt> -o <outputHisto.csv>

where <outputHisto.csv> is the exected csv-formatted output file containing on each line: <ASCII char c>:<c count>. The input file is <inputText.txt> and contains the dataset to be processed.

Note

As a reference, you can process this English dictionnary and the resulting output file should be like this.
Obviously, this dictionnary does not contain every ASCII character. So you are strongly invited to generate or find other test files to ensure that your program is working correctly.
My own tests will include various files.
Your work will be graded according to several criteria:
- the quality of the CUDA code
- the answers to the following questions
- the performance of your program

Questions

In addition to the code itself, you are requested to answer the following question, that will help in understanding your code design.

Were there any difficulties you had with completing the optimization correctly ?
Which optimizations gave the most benifit ?
For the histogram kernel, how many global memory reads are being performed by your kernel? explain.
For the histogram kernel, how many atomic operations are being performed by your kernel? explain.
Most text files will consist of only letters, numbers and whitespace characters. What can we say about atomic access contention regarding the number of threads that are simultaneously trying to atomically increment a private histogram ?

Warning

Deadline

As I do not have access to the UTBM’s course on the Moodle platform, you are invited to send me your source code via e-mail as an attachment.
I must receive your codes by April 19th. Any code that would be sent by this date will receive a grade 0/20.