The global information technology ecosystem is currently transitioning to a new generation of applications that require intensive data acquisition, processing, and storage, both at the sensor level and at the computing level. Increasingly complex scientific applications, together with the growing volume of data generated by high-resolution scientific instruments in domains as diverse as climate, energy, and biomedicine, require synergies between high performance computing (HPC) and large-scale data analysis (Big Data). Today, the HPC community needs Big Data techniques, while data-intensive analysis requires HPC solutions. However, the tools and cultures of HPC and Big Data have diverged: HPC has traditionally focused on tightly coupled, compute-intensive problems, while Big Data has been geared towards data analysis in highly scalable applications. As a result, this ecosystem has significant shortcomings when Big Data applications are adapted to emerging HPC systems. These include an inflexible storage hierarchy, difficulty integrating dynamic data flows from external devices, a lack of data-locality-aware schedulers, and the energy cost of data movement.
The overall goal of this proposal is to improve the integration of the HPC and Big Data paradigms by providing a convenient way to create new compute- and data-intensive software, and to adapt existing hardware and software of this kind, on an HPC platform. Success will be demonstrated by the ability of the proposed platform to support applications from both worlds, offering elasticity, improving data management and capture, and optimizing applications at the node and core level on heterogeneous systems. To achieve this overall goal, the following specific objectives are proposed: design of an architectural framework for integrating HPC and Big Data environments; exploitation of parallelism at the node level and on accelerators; management and capture of massive data, integrating large-scale heterogeneous systems and in-sensor computation; development of energy efficiency mechanisms at both the local and global levels; and application of the results to two real use cases, namely capturing and modeling sensor data to predict solar radiation with high spatio-temporal resolution, and processing massive volumes of brain medical images.
The project brings together four research groups with extensive experience in HPC and data-intensive systems and a strong national and international presence. To achieve the objectives, detailed work and activity plans are proposed, including collaboration with international organizations and researchers. Because the solutions provided by CABAHLA are interdisciplinary and applicable to many areas (see Section 1.3), the project is expected to have a substantial scientific and technological impact in both the public and private sectors. The activity plan includes top-tier scientific dissemination, with a commitment of 90 publications, as well as the training of 20 new PhDs and the hiring of 8 people. In addition, the project will strengthen the international presence of the groups, which already collaborate with multiple universities and research centers; this will be reflected in joint project proposals.
Evidence of the relevance of our proposal is the existence in Europe of a working group on the convergence of HPC and Big Data, supported by ETP4HPC and BDVA, led by Prof. María S. Pérez and with the participation of several research groups in this proposal. In addition, Prof. Jesús Carretero contributes to the preparation of the strategic research agenda of the European platform ETP4HPC in the area of data-intensive applications.
The potential socio-economic impact is demonstrated by letters of interest in the project from companies (IBM, Telefónica, Nokia, CA Technologies, HPE …) and non-profit organizations (Hospital General Gregorio Marañón and CINVESTAV).