Over the past decades, scientific applications were sped up mostly by increasing the clock frequency of single-core processors. Due to technical limitations, this frequency can no longer be raised significantly. One alternative is to increase the number of homogeneous cores on the same chip. Another is to integrate different, or heterogeneous, compute units into the same system. Such systems are called hybrid or heterogeneous systems and integrate, for example, general-purpose graphics processing units (GPUs) or field-programmable gate arrays (FPGAs). The latter make it possible to build special hardware accelerators for a given problem. While GPUs offer a high number of floating-point operations per dollar, FPGAs offer more floating-point operations per watt. Exploiting FPGAs can therefore reduce the high energy consumption of current supercomputers. A frequent drawback is the data transfer between the host processor and the special hardware unit. A new trend in computer architecture is to integrate the special hardware on the same chip as the host processor. This allows direct access to the host memory or even to the cache hierarchy of the host processor.
With such a diversity of heterogeneous compute units, however, the application developer bears the burden of finding the best hardware unit for a given application. Moreover, different parts of an application may be best served by different hardware units. In this project, we investigate which hardware units in a heterogeneous system should be used for different scientific applications. We want to identify common kernels in these applications that other application developers can reuse to shorten their development time.