## Aspects of GPU performance in algorithms with random memory access

The Direct Simulation Monte Carlo (DSMC) method is a numerical technique for simulating rarefied gas flows. It is a principal tool for the numerical investigation of space vehicle aerothermodynamics at altitudes above 85 kilometers. The gas flow is represented by a large set of model particles that move and collide with each other and with the spacecraft surface, imitating real gas molecules. The number of model particles is proportional to the gas density, so simulations at altitudes below 85 km can require about a billion particles or even more. Such large-scale problems demand substantial computational resources.

To meet these requirements, an efficient multi-GPU code, SMILE-GPU, was developed. It computes the aerodynamic characteristics of the Apollo re-entry capsule at 85 km in about 10 hours of wall-clock time using 1.6 billion model particles and 48 Tesla M2090 accelerators.

However, switching to the newer Tesla K40 accelerators showed that computational performance drops dramatically as the percentage of occupied GPU memory increases. Testing revealed that memory access time grows by a factor of tens once a certain critical percentage of memory is occupied. Moreover, this appears to be a problem common to all GPUs, arising from their architecture. A few modifications of the numerical algorithm were suggested to overcome it. One of them, based on splitting memory into "virtual" blocks, resulted in a 2.5-fold speedup.
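
The abstract does not say how the "virtual" blocks are organized, so the following is only a minimal CPU-side sketch of one plausible reading: random accesses are regrouped so that those falling into the same fixed-size block are processed together, which keeps each batch of accesses within a small address range. The helper names `group_by_virtual_block` and `count_block_switches`, and the sizes used, are hypothetical and are not taken from SMILE-GPU.

```python
import random


def group_by_virtual_block(indices, block_size):
    """Reorder indices so that accesses to the same virtual block are adjacent.

    Assumption: a virtual block is a contiguous range of block_size elements.
    """
    buckets = {}
    for i in indices:
        buckets.setdefault(i // block_size, []).append(i)
    ordered = []
    for block_id in sorted(buckets):
        ordered.extend(buckets[block_id])
    return ordered


def count_block_switches(indices, block_size):
    """Count how often two consecutive accesses land in different blocks."""
    switches = 0
    for prev, cur in zip(indices, indices[1:]):
        if prev // block_size != cur // block_size:
            switches += 1
    return switches


# Illustrative sizes only: 65536 elements split into 16 virtual blocks.
rng = random.Random(0)
memory_size = 1 << 16
block_size = 1 << 12
accesses = [rng.randrange(memory_size) for _ in range(10_000)]

grouped = group_by_virtual_block(accesses, block_size)
print("block switches, random order: ", count_block_switches(accesses, block_size))
print("block switches, grouped order:", count_block_switches(grouped, block_size))
```

With 16 blocks, the grouped order changes block at most 15 times, whereas the random order changes block on most consecutive accesses. Whether this locality translates into the reported GPU speedup depends on the hardware behavior described above, which this sketch does not model.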
