CABAHLA-CM: CONVERGENCIA BIG DATA-HPC: DE LOS SENSORES A LAS APLICACIONES (P2018/TCS4423)
The global information technology ecosystem is in transition to a new generation of applications that require data-intensive acquisition, processing, and storage systems, at both the sensor and the computer level. Increasingly complex scientific applications, together with the growing availability of data generated by high-resolution scientific instruments in domains as diverse as climate, energy, and biomedicine, require synergies between high performance computing (HPC) and large-scale data analysis (Big Data). Today, the HPC world demands Big Data techniques, while data-intensive analysis requires HPC solutions.
However, the tools and cultures of HPC and Big Data have diverged: HPC has traditionally focused on tightly coupled, compute-intensive problems, while Big Data has been geared towards data analysis in highly scalable applications. As a result, the current ecosystem has significant shortcomings when it comes to adapting Big Data applications to emerging HPC systems, such as the inflexibility of the storage hierarchy, the difficulty of integrating dynamic flows from external devices, the difficulty of providing data-locality-aware schedulers, and the energy costs associated with data movement.
The overall goal of this proposal is to improve the integration of the HPC and Big Data paradigms, providing a convenient way to create new software and to adapt existing compute- and data-intensive hardware and software to an HPC platform. The achievement of this objective will be demonstrated by the ability of the proposed platform to support applications from both worlds, offering elasticity, improving data capture and management, and optimizing applications for the local nodes and cores of heterogeneous systems. To achieve this global objective, the following specific objectives are proposed: design of an architectural framework for the integration of HPC and Big Data environments; exploitation of parallelism at the node and accelerator level; management and capture of massive data, integrating large-scale heterogeneous systems and computation in the sensors; development of energy efficiency mechanisms at the local and global levels; and application of the results to two real use cases, one capturing and modeling sensor data for the prediction of solar radiation with high spatiotemporal resolution, and one processing massive data in medical images of the brain.