ABSTRACT

Many problems involve a very large input space whose elements must be assigned to one of two or more known classes. These are called classification problems, and they are very common in medical applications, where inputs such as medical images and clinical data are used for diagnosis, prognosis, or other purposes. In a typical medical application, for example, the classes may simply represent a positive or negative cancer diagnosis, or the presence or absence of a tumor in a given region of a medical image. Building a classifier usually requires a set of observations whose correct classes are already known (a training set). From this training set, an AI method should be able to train itself so that it can determine the correct class of new observations. Alternatively, the training data may consist of observations whose correct classes are not given, and the algorithm is expected to determine the classes itself by grouping nearby observations together; this is known as clustering. In clustering problems, the definition of each class (or cluster) is initially unknown and must be determined by the clustering algorithm. Clustering algorithms use a distance function to measure how similar data points are to one another, and then use this information to group the points into a fixed number of clusters. The objective of these algorithms is to minimize the distances between observations within each cluster while maximizing the distances between observations in different clusters.
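
As a rough illustration of the clustering idea described above, the following Python sketch groups unlabeled points into a fixed number of clusters using a distance function. It is only a minimal k-means-style example under stated assumptions, not a method taken from this text: the Euclidean distance, the choice of the first k points as initial centers, the function name kmeans, and the sample data are all hypothetical choices made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Group points into k clusters (illustrative k-means-style sketch).

    Assumptions: Euclidean distance is the similarity measure, and the
    first k points serve as the initial cluster centers.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical unlabeled observations forming two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

No class labels are supplied here; the grouping emerges only from the distances between observations. In a classification setting, by contrast, the labels of the training observations would be given in advance and used to fit the model.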