Open-Source Software Lab
The Project, proposed by IIAP, is based on the fact that modern technologies (programming models and distributed execution infrastructures) allow large datasets to be processed efficiently. One of the most important of these technologies is the map-reduce model and its implementations, such as the open-source framework Hadoop. Such a framework splits a large dataset into blocks distributed across several machines, executes map tasks on these blocks in parallel, and finally executes reduce tasks to aggregate the results. Studies have shown that this technology is highly sensitive to the data representation format, and more precisely to the compression format and compression factor used. Data compression reduces data size and therefore the transfer time (and IO load) between disk and memory, but it also increases processing time (and CPU load), because the data must be decompressed before it can actually be used.
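The split / map / reduce flow described above can be illustrated with a minimal, single-machine Python sketch. It is not Hadoop code. Threads stand in for cluster machines, word counting is an arbitrary example task, and the function names (`split_into_blocks`, `map_task`, `reduce_task`, `map_reduce`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_blocks(lines, block_size):
    """Split the dataset (a list of lines) into fixed-size blocks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_task(block):
    """Map: count the words in one block, independently of all other blocks."""
    counts = Counter()
    for line in block:
        counts.update(line.split())
    return counts


def reduce_task(left, right):
    """Reduce: merge two partial word counts into one."""
    return left + right


def map_reduce(lines, block_size=2, workers=4):
    """Hypothetical driver: split, run map tasks in parallel, then reduce."""
    blocks = split_into_blocks(lines, block_size)
    # Threads play the role of the machines that hold the blocks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, blocks))
    # Aggregate the partial results, starting from an empty count.
    return reduce(reduce_task, partials, Counter())


if __name__ == "__main__":
    data = ["to be or", "not to be", "that is", "the question"]
    print(map_reduce(data)["to"])  # → 2
```

In a real framework each block would be read from a different machine's disk, which is exactly where the block's compression format affects both IO and CPU time.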
In this context, the main issue is to find the best trade-off (in terms of compression format and factor) that balances the load between the IO system and the computation system (CPU). A high compression factor may underload IO but overload the CPU, while a low compression factor may underload the CPU but overload IO. The ideal configuration is one in which both IO and CPU are used at 100%: the CPU never waits for IO and IO never waits for the CPU, so the system reaches its best performance.
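This balance can be made concrete with a back-of-the-envelope model, sketched below under stated assumptions. Reading and computing are assumed to overlap (pipelined), so the time per block is roughly the larger of the IO time and the CPU time. All bandwidths and compression profiles are hypothetical numbers chosen for illustration, not measurements from the Project.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical characteristics of one compression setting."""
    name: str
    ratio: float            # uncompressed size / compressed size
    decompress_mb_s: float  # decompression speed, MB of output per second


def block_time(raw_mb, disk_mb_s, process_mb_s, p):
    """Estimated seconds per block, assuming IO and CPU work overlap."""
    io_time = raw_mb / p.ratio / disk_mb_s  # fewer bytes to read when ratio is high
    cpu_time = raw_mb / p.decompress_mb_s + raw_mb / process_mb_s
    # The slower resource sets the pace; the faster one sits idle waiting.
    return max(io_time, cpu_time), io_time, cpu_time


def best_profile(profiles, raw_mb, disk_mb_s, process_mb_s):
    """Pick the profile with the smallest pipelined time per block."""
    return min(profiles, key=lambda p: block_time(raw_mb, disk_mb_s, process_mb_s, p)[0])


# Hypothetical setting: 128 MB blocks, 100 MB/s disk, 400 MB/s processing.
PROFILES = [
    Profile("none", 1.0, float("inf")),  # IO-bound: 1.28 s IO vs 0.32 s CPU
    Profile("fast", 2.0, 500.0),         # near balance: 0.64 s IO vs 0.576 s CPU
    Profile("strong", 4.0, 80.0),        # CPU-bound: 0.32 s IO vs 1.92 s CPU
]

if __name__ == "__main__":
    print(best_profile(PROFILES, 128, 100, 400).name)  # → fast
```

In this model the best setting is the one where `io_time` and `cpu_time` are closest to equal, which matches the "both at 100%" configuration described above.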
The main expected result of this Project will be a system that demonstrates this principle by selecting a compression tool and tuning the compression factor to reach the best performance.
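One possible shape for such tuning is sketched below. This is not the Project's system. It uses Python's standard `zlib` as a stand-in compression tool and treats the zlib level (0 to 9) as the compression factor. It measures the real compression ratio and decompression speed on a synthetic block, then applies the pipelined time model (time per block is about max(IO, CPU)). The disk and processing bandwidths are hypothetical parameters, and compression cost is ignored on the assumption that data is written once and read many times.

```python
import random
import time
import zlib


def sample_block(n_lines=20000, seed=0):
    """Synthetic, moderately compressible CSV block (stand-in for real data)."""
    rng = random.Random(seed)
    rows = (f"{i},sensor-{rng.randint(0, 50)},{rng.gauss(20.0, 5.0):.2f}\n"
            for i in range(n_lines))
    return "".join(rows).encode("ascii")


def measure(block, level, repeats=3):
    """Return (compression ratio, decompression MB/s) for one zlib level."""
    compressed = zlib.compress(block, level)
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.decompress(compressed)
    # Guard against a zero timer reading on very small blocks.
    elapsed = max(time.perf_counter() - start, 1e-9) / repeats
    return len(block) / len(compressed), len(block) / 1e6 / elapsed


def tune_level(block, disk_mb_s=100.0, process_mb_s=400.0):
    """Return (level, estimated seconds) minimising max(IO time, CPU time)."""
    raw_mb = len(block) / 1e6
    best = None
    for level in range(10):
        ratio, decomp_mb_s = measure(block, level)
        io_time = raw_mb / ratio / disk_mb_s
        cpu_time = raw_mb / decomp_mb_s + raw_mb / process_mb_s
        t = max(io_time, cpu_time)
        if best is None or t < best[1]:
            best = (level, t)
    return best


if __name__ == "__main__":
    level, seconds = tune_level(sample_block())
    print(f"best zlib level: {level} (~{seconds:.4f} s per block)")
```

The chosen level depends on the machine and on the assumed bandwidths. A slower disk pushes the choice toward stronger compression, and a slower CPU pushes it toward weaker compression.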
Potential users of the Project's results include any researcher interested in complex scientific calculations and in increasing computing power without slowing down the system as a whole.
These technologies rely heavily on communication among large groups of researchers. The hallmarks of the laboratory are its high-speed communication channel and its flexible BYOD (bring your own device) policy.