LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, vol.2481, pp.111-125, 2005 (SCI-Expanded)
The performance of a NUMA architecture depends on the efficient use of local memory. Software-level techniques that improve memory locality, in addition to parallelism, are therefore essential for extracting the best performance from these architectures. Solutions proposed so far include OS-based automatic data migration and compiler-based static or dynamic data distribution.