MapReduce-based Image Processing System with Automated Parallelization
Abstract
The article describes a parallel image processing framework based on Apache
Hadoop and the MapReduce programming model. The key advantage of the framework is
that it isolates the details of parallel execution from the application developer by
providing a simple API for working with an image loaded into memory.
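The abstract does not specify the developer-facing API, so the following is only an illustrative sketch of what such an in-memory image interface might look like; all names here (Image, ImageTask, InvertTask) are assumptions, not the framework's actual API:

```java
// Hypothetical sketch of a simple developer-facing API in the spirit of
// the framework described above. The developer writes a pure per-image
// operation; the framework (not shown) would run it in parallel over
// images or tiles via Hadoop MapReduce.

// A grayscale image loaded into memory as a 2D array of pixel values.
final class Image {
    final int[][] pixels;
    Image(int[][] pixels) { this.pixels = pixels; }
}

// The only contract the application developer implements: an operation
// on an in-memory image, with no knowledge of the parallel execution.
interface ImageTask {
    Image process(Image input);
}

// Example task: invert an 8-bit grayscale image.
final class InvertTask implements ImageTask {
    public Image process(Image input) {
        int h = input.pixels.length, w = input.pixels[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = 255 - input.pixels[y][x];
        return new Image(out);
    }
}

public class Demo {
    public static void main(String[] args) {
        Image in = new Image(new int[][] {{0, 128}, {255, 64}});
        Image out = new InvertTask().process(in);
        System.out.println(out.pixels[0][0] + " " + out.pixels[0][1]);
    }
}
```

In this sketch the developer's code contains no Hadoop-specific logic; mapping tasks onto cluster nodes would be the framework's responsibility.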
The main results of the work are the architecture of a Hadoop-based parallel image
processing framework and a prototype implementation of this architecture. The prototype
has been used to process data from a particle image velocimetry (PIV) system, using
data from the PIV Challenge project. Evaluation of the prototype on a four-node
Hadoop cluster demonstrates near-linear scalability.
The results can be applied in science (processing images from experimental physics
facilities, astronomical observations, and satellite imagery of the Earth's surface), in medical
research (processing images from high-tech medical equipment), and in industry (analysis
of data from security cameras, geographic information systems, etc.).
The suggested approach makes it possible to increase image processing performance
by using parallel computing systems, and improves the productivity of
application developers by allowing them to concentrate on image processing algorithms
instead of the details of the parallel implementation.