Lab 11 (5 points)
CS550, Operating Systems
Caching
Name:
_____________________________________________
To submit this assignment, you may copy and paste parts of the
assignment into a text editor such as nano, vi, notepad, MS Word,
OpenOffice Writer, etc. Zip any code and scripts you create,
together with the output of your solutions, and submit the zip file
to the dropbox for lab 11. Be sure to include a text document
containing any written, typed, or graphed results. You may work
with a partner on this lab, but each person must submit his or her
own solution.
This lab is based in part on labs provided at the CUDA and C++11
sessions of the SC13 conference. You will work with a matrix
multiplication program and learn about the effects of data locality
within the CPU cache, and how locality is indirectly affected by the
order in which data is accessed.
You will also use the matrix multiplication program to test how run
time scales with the number of cores.
Download the files at the following link.
Review the batch file provided below.
#!/bin/bash
#SBATCH -A TG-SEE120004
#SBATCH -n 16
#SBATCH -J matMult
#SBATCH -o mm.o%j
#SBATCH -p development
#SBATCH -t 00:15:00
echo 'Starting job'
ibrun mmnf.exe 5000
echo 'Completed job'
1. What are the command line parameters provided in this batch file?
2. What do you think the echo command does?
Review the two C programs provided in the zip file.
3. What is the purpose of the code in each file?
4. What design pattern is used within the code?
5. What data is sent by each process?
6. What data is received by each process?
7. Are all of these data transfers necessary? Explain.
8. Compile and run the code on Stampede using the batch scripts
provided. Record the run time of your results.
9. Modify the number of cores used to 16, 32, 128, 256, and
512. Record and graph these run times vs the number of
processors, including the results from problem 8.
10. Explain the results. Consider caching and data locality in
your answer. Hint: consider the difference in memory location
between matrix data in the same row (A[i][j] and A[i][j+1]) and
matrix data in the same column (A[i][j] and A[i+1][j]). In which
case would both values likely be pulled into cache together?