Text Histogram: Lab asssignment¶
Objective¶
The purpose of this lab is to implement an efficient histogram algorithm for an input array of ASCII characters. There are 128 ASCII characters but we will consider only those with integer codes between 32 and 126. Each single character but the 36 and 42 will map into its own bin for a fixed total of 68 bins (we do not distinguish small and capital letters). The histogram bins will be unsigned 32-bit counters that do not saturate. Use the approach of creating a privitized histogram in shared memory for each thread block, then atomically modifying the global histogram.
Instructions¶
You can either re-use the Nvidia template in the samples-etu/0_Simple folder or write your own source code from scratch. Whatever solution you take, remember that your code needs to perform the following:
allocate device memory
copy data from host memory to device memory
initialize thread block and grid dimensions
launch CUDA kernel(s)
copy result(s) from device to host
deallocate device memory
The excutable generated as a result of compiling the lab with the make command and the tuned Nvidia-provided Makefile, can be run using the following command:
perrot@cluster2:~$ ./histoText -i <inputText.txt> -o <outputHisto.csv>
where <outputHisto.csv> is the exected csv-formatted output file containing on each line: <ASCII char c>:<c count>. The input file is <inputText.txt> and contains the dataset to be processed.
Note
As a reference, you can process this
English dictionnary
and the resulting output file should belike this
.Obviously, this dictionnary does not contain every ASCII character. So you are strongly invited to generate or find other test files to ensure that your program is working correctly.
My own tests will include various files.
Your work will be graded according to several criteria:
the quality of the CUDA code
the answers to the following questions
the performance of your program
Questions
In addition to the code itself, you are requested to answer the following question, that will help in understanding your code design.
Were there any difficulties you had with completing the optimization correctly ?
Which optimizations gave the most benifit ?
For the histogram kernel, how many global memory reads are being performed by your kernel? explain.
For the histogram kernel, how many atomic operations are being performed by your kernel? explain.
Most text files will consist of only letters, numbers and whitespace characters. What can we say about atomic access contention regarding the number of threads that are simultaneously trying to atomically increment a private histogram ?
Warning
Deadline
As I do not have access to the UTBM’s course on the Moodle platform, you are invited to send me your source code via e-mail as an attachment.
I must receive your codes by April 19th. Any code that would be sent by this date will receive a grade 0/20.