CABAHLA-CM: BIG DATA-HPC CONVERGENCE: FROM SENSORS TO APPLICATIONS (P2018/TCS4423)
The global information technology ecosystem is in transition to a new generation of applications that require intensive data acquisition, processing, and storage systems, at both the sensor and the computer level. New, more complex scientific applications and the growing availability of data generated by high-resolution scientific instruments in domains as diverse as climate, energy, and biomedicine require synergies between high performance computing (HPC) and large-scale data analysis (Big Data). Today, the HPC world demands techniques from the Big Data world, while data-intensive analysis requires HPC solutions.
However, the tools and cultures of HPC and Big Data have diverged: HPC has traditionally focused on tightly coupled, compute-intensive problems, while Big Data has been geared towards data analysis in highly scalable applications. As a result, the current ecosystem has significant shortcomings when it comes to adapting Big Data applications to emerging HPC systems, such as the inflexibility of the storage hierarchy, the difficulty of integrating dynamic flows from external devices, the difficulty of providing data-locality-aware schedulers, and the energy costs associated with data movement.