Overview
A particle system in which every particle attracts every other particle through gravity. The net attraction on each particle is computed in parallel using vectorized operations. The aligned particle data is split in half, and each half is offloaded to and processed by one of two Xeon Phi cards. Compared with serial, unvectorized code running on the lab machines, we achieved a speedup of nearly 42x with 16,000 particles.
We originally chose this project because it seemed well suited to offloading. For every particle position sent to the card, the card must perform N force calculations, where N is the total number of particles. The arithmetic intensity (work per byte transferred) is therefore O(N), so in theory the offloaded program is not memory bound. This is the primary reason we were able to get so much speedup from offloading.
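The argument can be made slightly more concrete. Assume each particle occupies b bytes when copied to the card and each pairwise interaction costs c floating-point operations. Both symbols are introduced here for illustration only. Ignoring the O(N) bytes of results copied back, a single card that receives all N particles and computes all N^2 interactions has:

```latex
I = \frac{\text{flops}}{\text{bytes transferred}} = \frac{c\,N^2}{b\,N} = \frac{c}{b}\,N = O(N)
```

When the work is split across two cards, each card still receives all N positions but computes only N/2 rows of interactions, so the intensity per card is halved but still grows linearly:

```latex
I_{\text{card}} = \frac{c\,(N/2)\,N}{b\,N} = \frac{c}{2b}\,N = O(N)
```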