ABSTRACT

Implementing pattern recognition in a distributed manner may be a solution for the Internet-scale data generation and application problems. Distributed pattern recognition (DPR), the formal term for this type of recognition approach, can be defined as the extension of existing pattern recognition schemes to include the delegation of the recognition process across a distributed system. Most of the past initiatives in DPR have focused on providing a distributed architecture for pattern recognition [18, 19, 20, 21, 22]. However, this type of solution creates a high dependency on the hardware implementation. Because the implementation of these approaches across different architectural platforms and network environments is limited by their inflexibility, the issue of scalability in this context has yet to be solved.

A DPR scheme that is based solely on an algorithmic approach, independent of any hardware implementation, has yet to be fully realized. Though there are some recent studies on the implementation of a distributed approach for existing pattern recognition schemes [2, 23, 24, 25], these studies manipulated the methods of a particular algorithm to perform the recognition function (from sequential to parallel mechanisms). Furthermore, existing distributed approaches have been unable to reduce the computational complexity of their respective algorithms, a necessity for deployment in a distributed environment. In addition, these studies have not considered the communication costs incurred by the highly iterative features of the existing pattern recognition schemes.

The deployment of pattern recognition applications for large-scale data sets is an open issue that needs to be addressed. Several approaches have been proposed, including data reduction, active learning and distributed approaches in large-scale recognition. Nevertheless, a common denominator of these techniques is the algorithmic complexity of the recognition schemes. Because the distributed approach for pattern recognition can provide extensive support for resource availability in response to the increasing size, complexity and amount of data, it offers a significant advantage for large-scale data analysis. The ultimate goal for any DPR approach is to be able to extract useful information from a large-scale analysis of a huge collection of data.

Because pattern recognition is considered to be highly problem specific and has little prospect as a generic commodity application, DPR remains a relatively unexplored area. The complexity of existing pattern recognition algorithms limits their distribution factor. Several initiatives have attempted to parallelize and distribute a pattern recognition algorithm across a distributed system. However, the parallelization process poses a significant hurdle for this type of implementation.

The neural network approach is a promising tool for Internet-scale pattern recognition. This method has the ability to perform parallel computations using interconnected neurons. However, there are several implementation issues, including convergence problems, complex iterative learning procedures, and the fact that the training data required for optimum recognition leads to low scalability.

In this chapter, we will further discuss the important characteristics and aspects of DPR.