Image Anal Stereol 2016;35:137-148
doi: 10.5566/ias.1446
Original Article

AN ENSEMBLE TEMPLATE MATCHING AND CONTENT-BASED IMAGE RETRIEVAL SCHEME TOWARDS EARLY STAGE DETECTION OF MELANOMA

SPIROS KOSTOPOULOS 1, DIMITRIS GLOTSOS 1, PANTELIS ASVESTAS 1, CHRISTOS KONSTANDINOU 3, GEORGE XENOGIANNOPOULOS 1, KONSTANTINOS SIDIROPOULOS 2, EIRINI-KONSTANTINA NIKOLATOU 1, KONSTANTINOS PERAKIS 4, SPYROS MANTZOURATOS 4, THEOPHILOS SAKKIS 5, GEORGE SAKELLAROPOULOS 3, GEORGE NIKIFORIDIS 3 AND DIONISIS CAVOURAS 1

1 Medical Image and Signal Processing Laboratory, Department of Biomedical Engineering, Technological Educational Institute of Athens, Greece; 2 European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; 3 Department of Medical Physics, University of Patras, 26504, Rio, Patras, Greece; 4 UBITECH Research Department, UBITECH Ltd., Athens, Greece; 5 Dermatology Center, Aegion, Greece
e-mail: skostopoulos@teiath.gr
(Received November 27, 2015; revised March 29, 2016; revised June 15, 2016; accepted June 22, 2016)

ABSTRACT

Malignant melanoma represents the most dangerous type of skin cancer. In this study we present an ensemble classification scheme that combines three methods (mutual information, cross-correlation, and clustering based on the proximity of image features) for early stage assessment of melanomas on plain photography images. The proposed scheme performs two main operations. First, it retrieves the image samples most similar to the unknown case from an available image database of verified benign moles and malignant melanoma cases. Second, it provides an automated estimation regarding the nature of the unknown image sample, based on the majority of the most similar images retrieved from the available database. Clinical material comprised 75 melanoma and 75 benign plain photography images collected from publicly available dermatological atlases.
Results showed that the ensemble scheme outperformed all other methods tested in terms of accuracy, with 94.9 ± 1.5%, following an external cross-validation evaluation methodology. The proposed scheme may benefit patients by providing a second opinion consultation during the self-skin examination process. It may also benefit the physician by providing a second opinion estimation regarding the nature of suspicious moles, which may assist decision making, especially for ambiguous cases, and in this way safeguard against potential diagnostic misinterpretations.

Keywords: content-based image retrieval, decision support system, melanoma diagnosis, self-skin examination, template matching

INTRODUCTION

Malignant melanoma represents the most dangerous type of skin cancer, with an annual incidence of 48,000 new cases worldwide according to the World Health Organization (Lucas, et al., 2006). Increased ultraviolet (UV) radiation has proved to be the most important risk factor of the disease (Rastrelli, et al., 2014). A relatively large number of inherited and non-inherited gene mutations have been implicated in the pathogenesis of melanoma. Besides UV radiation, however, the aetiology of the disease is largely unknown, making it difficult to establish prevention strategies and effective therapies. Melanomas have a good prognosis when they are detected at early stages, since available treatments, such as surgical excision, will mostly keep affected patients disease-free for more than 5 years (Veronesi, et al., 1991, Ringborg, et al., 1996, Cohn-Cedermark, et al., 2000, Balch, et al., 2001). One of the most popular technologies that has proven effective in discriminating melanomas from normal moles and other skin lesions (>90% detection accuracy (Schein, et al., 2009)) is digital dermoscopy, which allows expert physicians to visually observe suspected lesions using polarized or non-polarized light (Tenenhaus, et al., 2010).
On the other hand, routine examination by eye has proven to be significantly less effective, with detection rates of approximately 65% (Schein, et al., 2009). Thus, dermoscopy may be considered the basic instrumentation utilized for melanoma detection in daily practice. However, dermoscopy presents certain limitations. The quality and accuracy of diagnostic conclusions greatly depend on the experience of the observing physician. Considering that early stage melanomas present very subtle visual changes as compared to benign moles, the identification of malignancy evidence (Abbasi, et al., 2004) is not straightforward. Thus, the risk of dismissing suspicious moles is considerable, and may lead to inappropriate patient management with debatable effects on patient prognosis (Lorentzen, et al., 2001, Pfahlberg, et al., 2008, Veierod, et al., 2009).

Although dermoscopy may contribute towards the early detection of melanomas, it has been shown that many patients visit the physician only when the malignancy has progressed and the visual signs are obvious, since they are not able to visually discriminate the disease at its early phases, when the visual signs are more subtle (Carli, et al., 2002). At later stages, the detection of melanomas with dermoscopy becomes more straightforward; however, the risk of a poor prognosis increases, since late phase melanomas tend to metastasize aggressively (Rastrelli, et al., 2014). Thus, it is of paramount importance to alert patients to visit the physician as soon as possible. One promising strategy in this direction is self-skin examination (Carli, et al., 2002). Self-skin examination has been shown to improve long term survival of patients with melanoma, lowering the risk of death 10 years after initial diagnosis by 25% (Leachman, et al., 2016, Paddock, et al., 2016).
The patient visually assesses new and/or existing moles and visits the physician when a suspicious pigmented mole is detected. However, self-assessment of one's skin moles may be difficult, rendering self-skin examination an inadequate strategy for wide-spread melanoma screening. The significant value of self-skin examination has driven research towards the development of new technologies that may offer patients and physicians means for more effective, frequent and remote inspection of suspicious moles. Computer-based automated tools have been previously proposed, which can be used as a second opinion tool for self-skin examination and advise patients regarding the urgency of a physician visit. Moreover, such systems have been used to address another important liability in the early stage detection of melanoma, which is the risk of diagnostic misinterpretations (Stringa, 1988, Field, 1994, Grant-Kels, et al., 1999, Ming, 2000, Zagrouba, et al., 2004, Zhang, et al., 2010, Abikhair, et al., 2014).

Handheld devices, such as smartphones and tablets, are becoming increasingly popular. More than 1.75 billion such users were predicted for 2015, making these devices ideal candidates for assessing moles through smartphone applications that may be used to facilitate self-skin examination and remote monitoring of patients by expert physicians. A number of applications are nowadays commercially available for melanoma detection on the basis of plain photography images generated by smartphone cameras (Robson, et al., 2012, Stoecker, et al., 2013, Wolf, et al., 2013, Vañó-Galván, et al., 2015). A recent comprehensive review lists 39 such applications (Kassianos, et al., 2015). However, most of these applications tend to conceal their algorithmic architecture for reasons such as patenting.
Moreover, scientific analysis is usually either lacking or limited, making experts sceptical regarding the effectiveness of these technologies as self-skin examination facilitators for patients or second opinion consultants for experts.

In this study, we present a decision support system for melanoma detection, which attempts to provide patients with meaningful alerts regarding the urgency of a physician visit and to safeguard physicians' decisions from diagnostic misinterpretations by means of second opinion consultations. In comparison to previous studies, the proposed system differs in the following:

a/ The decision support system technology relies on the combination of three different template matching and content-based image retrieval algorithms, namely the mutual information, the cross-correlation and the clustering based on image features proximity approach, which are merged in a majority vote ensemble scheme. In this way, it is possible to investigate the image content properties from different and complementary perspectives and to combine information involving the image's entropy, the image's cross-correlation and the specific morphological, textural, and color characteristics of each investigated mole. To the best of our knowledge, such an ensemble scheme is investigated here for the first time.

b/ The proposed system has been tested on plain photography images collected from different dermatological atlases. In this way, it was possible to investigate the effectiveness of the ensemble scheme in the identification of melanoma in images that have been generated under different conditions and with different equipment (i.e., different cameras, resolutions, angles, lighting etc.).

c/ The proposed system has been comprehensively evaluated using an external cross-validation process in order to approximate the performance of the system on unknown data.
MATERIAL AND METHODS

CASE MATERIAL

The dataset consisted of 75 melanoma and 75 benign mole plain photography images, each corresponding to a different case, collected from publicly available resources/databases: six (6) from the Loyola University Dermatology Medical Education Website (footnote 1), thirteen (13) from the Danderm Atlas of Clinical Dermatology (footnote 2), three (3) from the Hellenic dermatological atlas (footnote 3), three (3) from the atlasdermatologico.com.br website (footnote 4), and fifty (50) melanomas and seventy-five (75) benign moles from the DERMOFIT database (footnote 5) (Ballerini, et al., 2013).

IMAGE PREPARATION & PREPROCESSING

Each image was preprocessed using the DullRazor algorithm (Lee, et al., 1997) in order to eliminate hair pixels overlapping the mole region. The algorithm operates in three main stages. At the first stage, the location of the pixels belonging to hair regions is identified using morphological filtering. At the second stage, the pixel values of the hair regions are re-calculated by means of interpolation from neighbouring regions. Finally, at the third stage, a smoothing filter is applied to level the intensity around the interpolated regions.

Following the DullRazor algorithm, images were filtered using the mean shift algorithm (Fukunaga, 1975), which is very effective in flattening the image's texture. In this way, it was possible to obtain a preliminary separation of the mole region from the surrounding background and to prepare the image for the subsequent step of image segmentation. The illumination of the image was then corrected using a polynomial fitting algorithm, whose terms were estimated using a least squares approach (Gonzalez, et al., 2002). Finally, the image was thresholded using the minimum cross entropy thresholding method (Li, et al., 1993) in order to separate pixels of the mole region from pixels of the surrounding background regions.
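The thresholding step can be illustrated with a minimal sketch. The text names the minimum cross entropy method of Li et al. (1993) but does not give its formula, so the code below assumes the commonly stated form of the criterion, in which the threshold t minimises -s_lo·log(mu_lo) - s_hi·log(mu_hi), where s is the sum of grey levels and mu the mean grey level on each side of t. The function name, the exhaustive search and the 256 grey levels are illustrative assumptions, not the authors' implementation.

```python
import math


def min_cross_entropy_threshold(pixels, levels=256):
    """Return the grey level t minimising the (assumed) Li-Lee cross-entropy criterion.

    `pixels` is a flat list of integer grey levels in [0, levels).
    Pixels with value < t form one class, pixels with value >= t the other.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    best_t, best_eta = None, math.inf
    for t in range(1, levels):
        n_lo = sum(hist[:t])                                 # pixel count below t
        n_hi = sum(hist[t:])                                 # pixel count at or above t
        s_lo = sum(i * hist[i] for i in range(t))            # grey-level sum below t
        s_hi = sum(i * hist[i] for i in range(t, levels))    # grey-level sum at or above t
        if n_lo == 0 or n_hi == 0 or s_lo == 0 or s_hi == 0:
            continue  # one class is empty (or all zeros): criterion undefined
        mu_lo, mu_hi = s_lo / n_lo, s_hi / n_hi
        # Cross entropy up to a constant that does not depend on t.
        eta = -s_lo * math.log(mu_lo) - s_hi * math.log(mu_hi)
        if eta < best_eta:
            best_t, best_eta = t, eta
    return best_t


# Hypothetical toy image: a dark "mole" (levels 20-22) on bright "skin" (levels 200-202).
toy = [20] * 50 + [22] * 50 + [200] * 50 + [202] * 50
t = min_cross_entropy_threshold(toy)
mole_mask = [value < t for value in toy]
```

The exhaustive search over all grey levels is chosen for readability; iterative formulations of the same criterion exist and would be preferable for large images.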
An example of the image pre-processing and segmentation stage is illustrated in Fig. 1.

Fig. 1. Mole segmentation process: (a) original RGB image, (b) filtered image, (c) gray scale transformation, (d) illumination correction, (e) thresholded image, (f) superimposition of the mole's border on the original image.

1 http://www.meddean.luc.edu/lumen/MedEd/medicine/dermatology/melton/content1.htm
2 http://www.danderm-pdv.is.kkh.dk/atlas/index.html?PHPSESSID=b3fad1be23c2edb54e85b29dc7c6ba2e
3 http://www.hellenicdermatlas.com/en/
4 http://www.atlasdermatologico.com.br/
5 http://homepages.inf.ed.ac.uk/rbf/DERMOFIT/

ENSEMBLE TEMPLATE MATCHING AND CONTENT-BASED IMAGE RETRIEVAL SCHEME

The main task of the proposed ensemble template matching and content-based image retrieval scheme is twofold: a/ to retrieve, from the available verified image database, the n image samples most similar to the unknown case and b/ to provide an automated consultation regarding the nature of the unknown sample (benign case or malignant melanoma). The ensemble scheme was designed based on three well documented methods, the mutual information (MI) (Mazurowski, et al., 2011), the cross-correlation (COR) (Asgarizadeh, et al., 2012), and the content based image retrieval based on clustering of image features (FC) (Yan, et al., 2011), in order to investigate image content from three different and complementary perspectives, involving the image's entropy, the image's cross-correlation and the specific morphological, textural, and color characteristics of each investigated mole.

The determination of the most similar images using the mutual information and the cross-correlation criteria relied on testing each input image against all other images in the available database.
The determination of the most similar images using the content based image retrieval method relied on testing each feature subset extracted from the segmented mole of the input image against all other feature subsets computed from the segmented moles of the images in the available database. An example of the content-based image retrieval process is illustrated in Fig. 2.

Fig. 2. Image retrieval methods utilized in this study.

A. Mutual information (MI)

Mutual information is a method originating from information theory. It has been employed in numerous content-based image retrieval and template matching applications. Mutual information is related to the joint entropy of two images. The mutual information between an unknown image $I_u$ and an image $I_j$ from an available database may be calculated as (Russakoff, et al., 2004):

$$ MI(I_u, I_j) = H(I_u) + H(I_j) - H(I_u, I_j) \qquad (1) $$

with

$$ H(I_u, I_j) = -\sum_{I_u, I_j} p(I_u, I_j) \log p(I_u, I_j) \quad \text{and} \quad H(X) = -\sum_{x} p(x) \log p(x) $$

where j = 1:N, and N is the number of available images in the database.

The input to the mutual information algorithm comprised single-mole images (see Fig. 2). Mutual information was then calculated for all possible pairs that included the unknown image sample and one of the available database's image samples. The most similar images were considered to be those having the largest mutual information with the unknown image sample.

B. Cross-correlation (COR)

The cross-correlation between two images $I_u$ and $I_j$ of m × n pixels is a measure of similarity and may be defined as (Gaidhane, et al., 2012):

$$ \mu = \frac{1}{m\,n} \sum_{x=1}^{m} \sum_{y=1}^{n} \frac{\left( I_u(x,y) - \bar{I}_u \right) \left( I_j(x,y) - \bar{I}_j \right)}{\sigma_{I_u}\, \sigma_{I_j}} \qquad (2) $$

where $\bar{I}_u$, $\bar{I}_j$ are the mean grey-level values of the images and $\sigma_{I_u}$, $\sigma_{I_j}$ are the standard deviations of the grey-level values of the images.
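As an illustration of Eqs. 1 and 2, the following minimal sketch computes both similarity measures for two grey-level images and ranks a database by either of them. It assumes that images are stored as flat lists of integer grey levels of equal length (i.e., already resampled to a common size), that probabilities are estimated from histograms with 16 bins, and that the population standard deviation is used; these choices, and all function names, are illustrative assumptions rather than details given in the text.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a histogram given as an iterable of counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mutual_information(img_u, img_j, bins=16, levels=256):
    """Eq. 1: MI(I_u, I_j) = H(I_u) + H(I_j) - H(I_u, I_j)."""
    if len(img_u) != len(img_j):
        raise ValueError("images must have the same number of pixels")
    width = levels // bins
    q_u = [v // width for v in img_u]          # quantised grey levels
    q_j = [v // width for v in img_j]
    n = len(q_u)
    h_u = entropy(Counter(q_u).values(), n)
    h_j = entropy(Counter(q_j).values(), n)
    h_uj = entropy(Counter(zip(q_u, q_j)).values(), n)   # joint histogram
    return h_u + h_j - h_uj


def cross_correlation(img_u, img_j):
    """Eq. 2: normalised cross-correlation of two equally sized images."""
    if len(img_u) != len(img_j):
        raise ValueError("images must have the same number of pixels")
    n = len(img_u)
    mean_u, mean_j = sum(img_u) / n, sum(img_j) / n
    std_u = math.sqrt(sum((v - mean_u) ** 2 for v in img_u) / n)
    std_j = math.sqrt(sum((v - mean_j) ** 2 for v in img_j) / n)
    if std_u == 0 or std_j == 0:
        return 0.0                              # flat image: correlation undefined, treated as 0
    cov = sum((a - mean_u) * (b - mean_j) for a, b in zip(img_u, img_j)) / n
    return cov / (std_u * std_j)


def rank_by_similarity(unknown, database, measure, n):
    """Indices of the n database images with the largest similarity to `unknown`."""
    scores = [measure(unknown, img) for img in database]
    return sorted(range(len(database)), key=lambda k: scores[k], reverse=True)[:n]
```

For example, `rank_by_similarity(unknown, database, mutual_information, 5)` would return the indices of the five database images sharing the most information with the unknown image, under the assumptions stated above.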
The input to the cross-correlation algorithm comprised single-mole images (see Fig. 2). The μ parameter was then calculated for all possible image pairs that included the unknown image sample and one of the verified database's image samples. The most similar images were considered to be those having the largest μ with the unknown image sample.

C. Content based image retrieval (FC)

Another perspective for investigating image similarity focuses on image content (features). The content-based image retrieval algorithm utilized in this study is the fuzzy c-means clustering algorithm (Jain, et al., 1988), which was designed to partition image features into two clusters/classes: those characterizing benign cases, and those describing melanoma cases. Image features comprised 72 measurements related to the mole's morphology (10), grey-level histogram (4), texture (38), and colour (20) (Loukas, et al., 2013, Ninos, et al., 2013).

The fuzzy c-means algorithm follows an iterative procedure during which each image feature set (representing a unique image sample from the available database) gets a fuzzy allocation to a cluster according to distance metric criteria. The algorithm iteratively minimizes the objective function (Eq. 3)

$$ J(M, C) = \sum_{i=1}^{N} \sum_{j=1}^{C} m_{ij}^{w} \left\| x_i - c_j \right\|^2 \qquad (3) $$

to provide a solution for the membership function matrix M and the cluster centre matrix C, where $m_{ij}$ is the degree of membership of the feature-vector $x_i$ in cluster j, $\| x_i - c_j \|$ is the Euclidean distance between the j-th cluster centre and the i-th feature-vector, and $w \in [1, \infty)$ is the fuzzy exponent, which determines the degree of fuzziness. The algorithm converges when $\max_{ij} \left| m_{ij}^{(k+1)} - m_{ij}^{(k)} \right| < \varepsilon$, where $\varepsilon$, a value between 0 and 1, is the termination criterion and k is the iteration step. Then an unknown feature-vector (from a "new" image sample) is assigned to the cluster with the minimum distance from its centroid.
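The iterative procedure behind Eq. 3 can be sketched as follows. The text does not spell out the update rules, so this sketch assumes the standard alternating updates of fuzzy c-means (centres as membership-weighted means, memberships from inverse distance ratios), a fuzzy exponent w = 2, random initial memberships and a cap on the number of iterations; all names and default values are illustrative.

```python
import math
import random


def fuzzy_c_means(X, n_clusters=2, w=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimise Eq. 3 by alternating centre and membership updates (requires w > 1).

    X is a list of feature vectors (lists of floats).
    Returns (M, C): membership matrix and list of cluster centres.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])

    # Random initial memberships; each row is normalised to sum to 1.
    M = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        M.append([v / total for v in row])

    C = []
    for _ in range(max_iter):
        # Centre update: membership-weighted mean of the feature vectors.
        C = []
        for j in range(n_clusters):
            weights = [M[i][j] ** w for i in range(n)]
            total = sum(weights)
            C.append([sum(weights[i] * X[i][d] for i in range(n)) / total
                      for d in range(dim)])

        # Membership update from the Euclidean distances to every centre.
        M_new = []
        for i in range(n):
            dist = [max(math.dist(X[i], C[j]), 1e-12) for j in range(n_clusters)]
            M_new.append([
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (w - 1.0))
                          for k in range(n_clusters))
                for j in range(n_clusters)
            ])

        # Termination criterion: largest membership change below eps.
        change = max(abs(M_new[i][j] - M[i][j])
                     for i in range(n) for j in range(n_clusters))
        M = M_new
        if change < eps:
            break
    return M, C


def assign_to_cluster(x, C):
    """Assign an unknown feature vector to the cluster with the nearest centre."""
    return min(range(len(C)), key=lambda j: math.dist(x, C[j]))
```

Note that the clusters produced this way are unlabeled; relating each cluster to the benign or melanoma class (for instance through the verified labels of its members) is a separate step that this sketch leaves out.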
The features were normalized to zero mean and unit standard deviation (Theodoridis, et al., 2003). In order to avoid overfitting, a feature selection methodology was followed, in which features were ranked in descending order using a class separability criterion based on the Wilcoxon test and the correlation between features (Theodoridis, et al., 2003). Subsequently, only a part of the ranked features (one-third of the smallest class in the database, twelve features for our samples) was selected for further analysis by the FC algorithm.

ENSEMBLE SCHEME

The different and complementary information that was assessed using the above mentioned three methods was combined through a majority vote rule in order to:

a. provide the n most similar images that the majority of these three algorithms decides and

b. classify the unknown image case as 'benign', if the majority of the n similar images emerges from the 'benign mole category', or as 'melanoma', if the majority of the n similar images emerges from the 'malignant melanoma mole category'.

The majority vote rule is given by (Kittler, et al., 1998):

$$ D_c(X) = \sum_{i=1}^{R} d_{c,i}(X) \qquad (4) $$

where c is the class (benign/melanoma), X is the unknown image sample, i = 1, 2, 3 indexes the (odd number of) R = 3 methods involved in the majority vote scheme, and $d_{c,i}$ is the binary decision value (0 or 1) of method i for class c, with c = 0 corresponding to the melanoma class and c = 1 to the benign class. Thus, if $D_1(X) > D_0(X)$, the unknown image-mole is categorized as benign; otherwise it is categorized as melanoma.

PERFORMANCE EVALUATION

A. Performance of MI and COR methods in identifying single-mole images against themselves following rotation at different angles

Each image from the available database was rotated at eight (8) different angles, from -20° to +20° with a 5° step. Subsequently, each rotated image was input to the MI and COR algorithms, which were asked to return the most similar image from the same database (including the original, un-rotated version of the input image).
If the algorithms returned the un-rotated version of the input image as the most similar image, the retrieval was considered successful; otherwise it was considered unsuccessful. In this way, it was possible to determine the robustness of the MI and COR algorithms when images are slightly rotated. The FC algorithm is rotation invariant, since it depends only on features extracted from segmented moles, and the features included in this study are rotation invariant.

B. Performance of the proposed ensemble scheme using a leave-one-out data splitting approach

In order to evaluate the performance of the proposed ensemble scheme, the following methodology was utilized: each image sample from the available verified database (benign or malignant) was tested against the remaining database (similar to the leave-one-out method (Theodoridis, et al., 2003)). Then, the n most similar images to the unknown sample were retrieved along with their corresponding labels (benign, malignant). If the majority of the n most similar images were benign cases, then the unknown image was classified as benign, whereas if the majority of the most similar images were melanoma cases, then the unknown image was classified as melanoma. Based on the above classification, a truth table was constructed in order to evaluate and compare the performance of each single algorithm tested (mutual information, cross-correlation, fuzzy c-means) against the ensemble majority vote scheme. Moreover, the above evaluation process was repeated by changing the number n of most similar images from 1 to 19.
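The decision rule of Eq. 4 and the leave-one-out procedure described above can be sketched as follows. This is one possible reading of the scheme, in which each method first labels the unknown sample by the majority of its n retrieved images and the three method-level decisions are then combined by a second majority vote. The `retrieve` callback, which stands in for the MI, COR and FC retrieval steps, and all other names are hypothetical.

```python
from collections import Counter

BENIGN, MELANOMA = 1, 0   # class coding of Eq. 4


def vote(labels):
    """Majority label; benign only if D_1 > D_0, otherwise melanoma (Eq. 4)."""
    counts = Counter(labels)
    return BENIGN if counts[BENIGN] > counts[MELANOMA] else MELANOMA


def ensemble_decision(retrieved_labels_per_method):
    """Combine the methods: one vote per method, then a vote across methods.

    `retrieved_labels_per_method` holds, for each method, the labels of the
    n most similar database images that the method retrieved.
    """
    method_decisions = [vote(labels) for labels in retrieved_labels_per_method]
    return vote(method_decisions)


def leave_one_out(labels, retrieve, n):
    """Classify every sample against the rest of the database.

    `retrieve(i, n)` is a hypothetical callback returning, for each method,
    the indices of the n database images most similar to sample i
    (sample i itself must be excluded).
    """
    predictions = []
    for i in range(len(labels)):
        per_method = [[labels[k] for k in indices] for indices in retrieve(i, n)]
        predictions.append(ensemble_decision(per_method))
    return predictions


# Example: two of three methods retrieve mostly benign images.
decision = ensemble_decision([[1, 1, 0], [0, 0, 1], [1, 1, 1]])   # BENIGN
```

With an odd n and three methods no ties can occur; for an even n the sketch resolves ties towards melanoma, which follows the "otherwise" branch of Eq. 4.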
The evaluation of the performance was based on five different metrics (scores 1-5) that are derived from the truth table, namely the accuracy (score 1), the sensitivity (score 2), the specificity (score 3), the diagnostic accuracy (score 4) and the Cohen k (score 5), where:

$$ \text{Score}_1 = \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5) $$

$$ \text{Score}_2 = \text{Sensitivity} = \frac{TP}{TP + FN} \qquad (6) $$

$$ \text{Score}_3 = \text{Specificity} = \frac{TN}{TN + FP} \qquad (7) $$

$$ \text{Score}_4 = \text{Diagnostic accuracy} = \frac{TP}{TP + FP + FN} \qquad (8) $$

where TP is the number of true positive cases, TN is the number of true negative cases, FP is the number of false positive cases and FN is the number of false negative cases.

$$ \text{Score}_5 = \text{Cohen } k = \frac{n \sum_{i=1}^{k} M_{ii} - \sum_{i=1}^{k} M_{i.}\, M_{.i}}{n^2 - \sum_{i=1}^{k} M_{i.}\, M_{.i}} \qquad (9) $$

where M is the confusion matrix, $M_{.i}$ is the sum of the elements of the i-th column of M and $M_{i.}$ is the sum of the elements of the i-th row of M. In Eq. 9, n denotes the total number of classified cases and the sums run over the k classes.

C. Performance of the proposed ensemble scheme using an external cross-validation data splitting approach

Moreover, an external cross-validation (ECV) splitting of the data was also performed, in order to get less biased estimates than with the leave-one-out splitting. Data were randomly split into two subsets, each comprising 50% of all available images. Image samples from the first subset (testing data) were considered as unknown. Then, the algorithm was asked to retrieve the n images most similar to the testing cases by searching only the second subset (template data), which was considered as having known labels. This process was repeated ten (10) times and the final estimate of the evaluation performance was computed as the average of all classification performances obtained over the different repetitions. The above analysis was performed separately for each different number n of similar images. In this way, we considered that a less biased estimate might be obtained than by using the leave-one-out method (Ambroise, et al., 2002).
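The five scores of Eqs. 5-9 can be computed directly from the four entries of the truth table, as in the minimal sketch below. The function name and the example counts are hypothetical and are not results from this study; melanoma is assumed to be the positive class.

```python
def evaluation_scores(tp, tn, fp, fn):
    """Scores 1-5 (Eqs. 5-9) from the entries of a two-class truth table.

    Assumes every denominator is non-zero (both classes are present and at
    least one case is predicted positive).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total                    # Eq. 5
    sensitivity = tp / (tp + fn)                    # Eq. 6
    specificity = tn / (tn + fp)                    # Eq. 7
    diagnostic_accuracy = tp / (tp + fp + fn)       # Eq. 8

    # Eq. 9 with confusion matrix M = [[tp, fn], [fp, tn]]
    # (rows: true class, columns: predicted class).
    diagonal = tp + tn
    row_sums = [tp + fn, fp + tn]
    col_sums = [tp + fp, fn + tn]
    chance = sum(r * c for r, c in zip(row_sums, col_sums))
    cohen_k = (total * diagonal - chance) / (total ** 2 - chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "diagnostic_accuracy": diagnostic_accuracy,
        "cohen_k": cohen_k,
    }


# Hypothetical truth table for 75 melanoma and 75 benign cases.
scores = evaluation_scores(tp=70, tn=72, fp=3, fn=5)
```

For the external cross-validation, the same function would be applied to the truth table of each of the ten random splits, and the resulting scores averaged.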
RESULTS

Regarding the performance of the MI and COR algorithms in identifying single-mole images that have been rotated at different angles, results are summarized in Fig. 3, which illustrates a good performance with 84.2%-98.5% detection accuracy.

Regarding the performance of the proposed content-based image retrieval classification scheme for the leave-one-out data splitting, results are summarized in Fig. 4 for each single algorithm (mutual information, cross-correlation and fuzzy c-means) and the ensemble scheme, for different numbers n of similar images (n = 1:19) and for each of the five performance evaluation metrics described in the previous paragraphs (scores 1-5). For a small number of similar images (up to 3), the fuzzy c-means outperformed the mutual information and cross-correlation algorithms for all metrics. The cross-correlation method became the most effective algorithm for more than 3 similar images. The mutual information algorithm presented the best specificity, independently of the number of similar images investigated. The ensemble scheme proved the most accurate, outperforming each single algorithm tested for all metrics.

Regarding the performance of the proposed content-based image retrieval classification scheme for the external cross-validation data splitting, the ensemble scheme resulted in optimal performances for smaller numbers of similar images (see Fig. 5 and Table 1). The increase in the prediction accuracy with the majority vote scheme may be justified by the fact that the proposed methods (MI, FC and COR) combined complementary information. Moreover, for comparison, the SVM algorithm (El-Naqa, et al., 2004) was also tested as an alternative to our method, and led to 78 ± 5% overall accuracy (with various kernels) using the ECV method.

Fig. 3. The dependence of the MI and COR in detecting a single-mole image that has been rotated at different angles (MI: mutual information, COR: cross-correlation).
Fig. 4. Each plot corresponds to the score of each metric (accuracy, sensitivity, specificity, diagnostic accuracy and Cohen-k) for the three methods (MI: mutual information, FC: features clustering, COR: cross-correlation) and the majority vote rule (MV), when the leave-one-out method was implemented.

Fig. 5. Each plot corresponds to the mean score of each metric (accuracy, sensitivity, specificity, diagnostic accuracy and Cohen-k) for the three methods (MI: mutual information, FC: features clustering, COR: cross-correlation) and the majority vote rule (MV), when the external cross-validation method was employed for ten repetitions.

Table 1. Best average performances (mean value ± standard deviation, MnV ± Std) of MI, FC, COR and MV regarding the five metrics for the 10 generated datasets, and the corresponding number of similar images required.

Metric                          MI             FC             COR            MV
Accuracy                        89.2 ± 3.8     80.1 ± 4.0     78.0 ± 3.2     94.9 ± 1.5
  # of similar images required  9              5              3              5
Sensitivity                     95.7 ± 2.6     80.5 ± 8.8     60.8 ± 5.4     93.5 ± 3.4
  # of similar images required  5              19             1              5
Specificity                     93.8 ± 9.5     84.6 ± 8.7     100.0 ± 0.0    99.5 ± 1.7
  # of similar images required  17             1              13             15
Diagnostic Accuracy             0.81 ± 0.05    0.66 ± 0.05    0.58 ± 0.06    0.90 ± 0.03
  # of similar images required  9              5              1              5
Cohen k                         0.78 ± 0.08    0.60 ± 0.08    0.56 ± 0.04    0.90 ± 0.03
  # of similar images required  9              5              3              5

DISCUSSION

In this study, an ensemble template matching and content-based image retrieval scheme was designed for assisting physicians towards early detection of melanomas and alerting patients to the urgency of a physician visit.
The proposed system may assist the expert physician by a/ providing the images most similar to the examined case from a known database of skin mole and melanoma images and b/ providing an automated second opinion consultation regarding the nature of the examined skin lesion. Moreover, the proposed scheme can be of assistance to the patient by providing consultations regarding solely the necessity for evaluation of the examined skin mole by an expert physician.

The ensemble scheme was constructed using three complementary approaches, the MI, the COR, and the FC. These algorithms sought similarities from different points of view, involving the image's entropy, the image's cross-correlation and the specific morphological, textural, and color characteristics of each investigated mole, which were in total 72 features. Entropy was used to investigate the organization of the textural information in the image. Cross-correlation was used to investigate the texture correlation between different image patterns. Melanomas have been found to exhibit elaborate textural patterns, which can be encoded by means of the spatial distribution of the various colors and intensities of the mole pixels; thus, the MI and COR algorithms may be used to capture these diagnostically meaningful differences. Moreover, with the FC algorithm it was possible to investigate the morphology, texture and colour properties of the examined moles and relate these properties to patterns appearing in melanoma cases, providing, in this way, a complementary perspective on the examined image mole signatures. Although these three algorithms, when operating in standalone mode, provided moderate performances in the five different metrics tested (e.g., accuracy MI 91.3%, COR 79.3%, FC 85.3%), when combined under the majority vote scheme the performances were boosted (e.g., accuracy MV 96.0%) when tested using the leave-one-out method.
The increased performance might be explained by the complementary nature of the information that each distinct algorithm offered to the ensemble scheme. Regarding the external cross-validation data splitting, the MV method outperformed all other methods in terms of accuracy with 94.9 ± 1.5% (MI 89.2 ± 3.8%, COR 78.0 ± 3.2% and FC 80.1 ± 4.0%, see Table 1).

Considering the fact that the database utilized in this study comprised images extracted from multiple dermatological atlases that contain publicly available images, the high performance of the proposed scheme may justify its effectiveness in detecting melanoma signatures on plain photography images, regardless of the digitization equipment, the angle of photography, the lighting conditions etc., under the premise that the photographs have sufficient diagnostic quality.

Many research efforts have been previously presented for melanoma detection based on dermoscopy images or normal digital camera images. Two main categories of studies may be identified. The first category consists of efforts focusing on statistical pattern recognition, whereas the second category comprises efforts focusing on template matching and content-based image retrieval. Regarding the first category, representative studies may be found in Cavalcanti et al. (2013), which proposed a k-nearest neighbor (k-NN) classifier using 52 features extracted based on the ABCD rule with 99.3% overall accuracy, in Jaleel et al. (2012), which proposed an artificial neural network (ANN) classifier with 100% prediction accuracy, and in Ruiz et al. (2011), which proposed an ensemble pattern recognition scheme combining three distinct classifiers, the k-NN, the Bayesian and the ANN, with accuracy 87.76%. Regarding the second category, representative studies can be found in Ballerini et al.
(2010; 2013), which proposed a content-based image retrieval system investigating textural and color features, in Maragoudakis and Maglogiannis (2011), which proposed an ontology structure model based on features extracted from skin lesion images using agglomerative clustering and distance criteria, and in Chen et al. (2016), a recent study proposing a content-based image retrieval system that identified melanomas on plain photography images with performances exceeding 90% for all metrics tested.

In terms of classification effectiveness, a direct comparison of the proposed ensemble scheme with previous studies is difficult due to differences in the data sets and in the evaluation algorithms utilized. Many previous studies have presented very high prediction rates, such as 100% in Jaleel et al. (2012); however, such prediction rates were obtained by testing the constructed classification models using internal evaluation approaches, which have been shown to give optimistically biased estimates (Ambroise, et al., 2002). These estimates may be indicative of the model's performance on the training data; however, they are far from being representative of the effectiveness of the model on new, unseen data. In this study, we have attempted to approximate the performance of the proposed model on new, unseen data by using an external cross-validation approach, which enabled us to approximate the generalization prediction rate of the proposed scheme (94.9 ± 1.5%). In order to select a single optimum number of similar images, the system would have to be optimized based on one of the five performance evaluation criteria that we have utilized (accuracy, sensitivity, specificity, diagnostic accuracy or Cohen k).
Using accuracy as the performance evaluation criterion, the external cross-validation method indicated that the optimum number of similar images is 5, yielding 94.9% accuracy with the Majority Vote scheme. Moreover, another significant difference between the proposed study and previous studies is that the proposed ensemble scheme was tested on images originating from different dermatological atlases, showing great generalization potential for all criteria tested. Finally, a further difference from previous studies is that the template matching and content-based image retrieval scheme is used not only to retrieve the images most similar to the examined case, but also to characterize the nature of the unknown case using a combination of three different algorithms, which, to the best of our knowledge, is investigated here for the first time.

In terms of clinical effectiveness, the proposed scheme offers both patients and physicians the possibility to exploit consultations that will guide them towards more accurate decisions. The patient may use the proposed scheme as a second opinion consultation during the self-skin examination process by photographing the mole with a standard consumer smartphone or other type of digital camera and requesting from the proposed scheme an assessment of the urgency of a potential visit. In order to render our database less dependent upon the smartphone camera technology, we used the following approaches: a/ although the size of the mole is of significant importance in diagnosing melanoma, this feature was not used, since the apparent size of the mole depends not only on the magnification of the photograph but also on the distance of the camera from the mole.
Thus, our database does not rely on either the magnification or the distance of the camera from the mole, b/ we use the mean shift algorithm (Fukunaga, 1975), which is very effective in flattening the image’s texture and thus allows us to correct for different levels of illumination, c/ we use mainly texture features in our algorithms, which are less dependent on the technology of the smartphone camera and the viewing angle than features of size and shape, d/ we use the DullRazor algorithm (Lee, et al., 1997) to eliminate hair pixels and smooth the image, reducing overall noise levels and facilitating the subsequent segmentation step. When the proposed scheme identifies that the most similar images are retrieved from the melanoma category, the consultation will recommend an urgent physician visit. In this way, the probability of early stage detection of melanoma will potentially increase, since the patient may visit the expert physician soon enough. On the other hand, the physician may also benefit from the proposed scheme by means of: a. second opinion consultations regarding the nature of the examined moles, b. retrieval of the most similar images from a data source of verified melanoma cases and c. remote monitoring of patients. In this way, potential diagnostic misinterpretations might be reduced and the overall patient management might be improved.

This study is part of the MARK1 project. The MARK1 application may capture an image, assign the image to a specialist dermatologist and provide the dermatologist with a series of image processing and decision support services in order to reach a conclusion regarding the management of the case. More information may be found at: http://mark1-project.eu/.

ACKNOWLEDGMENTS

Research activities of this work have been carried out within the context of the project “MARK1- A decision support system for the early detection of malignant melanoma” with ref.
Number ISR_3233, under the bilateral cooperation between Greece & Israel action 2013-2015, which has been co-funded by the European Union and the General Secretariat for Research & Technology, Ministry of Education, Research and Religious Affairs of the Hellenic Republic.

REFERENCES

Abbasi NR, Shaw HM, Rigel DS, Friedman RJ, McCarthy WH, Osman I, et al. (2004). Early diagnosis of cutaneous melanoma: Revisiting the ABCD criteria. JAMA 292:2771–6.
Abikhair MR, Mahar PD, Cachia AR, Kelly JW (2014). Liability in the context of misdiagnosis of melanoma in Australia. Med J Australia 200:119–21.
Ambroise C, McLachlan GJ (2002). Selection bias in gene extraction on the basis of microarray gene-expression data. PNAS 99:6562–6.
Asgarizadeh M, Pourghassem H, Shahgholian G (2012). Robust object tracking using regional mutual information and normalized cross correlation. Proceedings - 4th International Conference on Computational Intelligence and Communication Networks, CICN 2012, Mathura, Uttar Pradesh, India, 411–5.
Balch CM, et al. (2001). Long-term results of a prospective surgical trial comparing 2 cm vs. 4 cm excision margins for 740 patients with 1-4 mm melanomas. Ann Surg Oncol 8:101–8.
Ballerini L, Fisher RB, Aldridge B, Rees J (2013). A color and texture based hierarchical k-NN approach to the classification of non-melanoma skin lesions. In: Celebi EM, Schaefer G, eds. Color medical image analysis, Dordrecht: Springer Netherlands, 63–86.
Ballerini L, Li X, Fisher RB, Rees J (2010). A query-by-example content-based image retrieval system of non-melanoma skin lesions. In: Caputo B, Müller H, Syeda-Mahmood T, Duncan JS, Wang F, Kalpathy-Cramer J, eds. Medical content-based retrieval for clinical decision support: First MICCAI international workshop, MCBR-CDS 2009, London, UK, September 20, 2009, revised selected papers, Berlin, Heidelberg: Springer Berlin Heidelberg, 31–88.
Carli P, De Giorgi V, Nardini P, Mannone F, Palli D, Giannotti B (2002).
Melanoma detection rate and concordance between self-skin examination and clinical evaluation in patients attending a pigmented lesion clinic in Italy. Br J Dermatol 146:261–6.
Cavalcanti PG, Scharcanski J, Baranoski GVG (2013). A two-stage approach for discriminating melanocytic skin lesions using standard cameras. Expert Syst Appl 40:4054–64.
Chen RH, Snorrason M, Enger SM, Mostafa E, Ko JM, Aoki V, Bowling J (2016). Validation of a skin-lesion image-matching algorithm based on computer vision technology. Telemed J E Health 22:45–50.
Cohn-Cedermark G, et al. (2000). Long term results of a randomized study by the Swedish melanoma study group on 2-cm versus 5-cm resection margins for patients with cutaneous melanoma with a tumor thickness of 0.8-2.0 mm. Cancer 89:1495–501.
El-Naqa I, Yang Y, Galatsanos NP, Nishikawa RM, Wernick MN (2004). A similarity learning approach to content-based image retrieval: Application to digital mammography. IEEE T Med Imaging 23:1233–44.
Field LM (1994). Clinical misdiagnosis of melanoma as well as squamous cell carcinoma masquerading as seborrheic keratosis. J Dermatol Surg Oncol 20:222.
Fukunaga K, Hostetler LD (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE T Inform Theory 21:32–40.
Gaidhane VH, Hote YV, Singh V (2012). An efficient similarity measure technique for medical image registration. Sadhana - Academy Proceedings in Engineering Sciences 37:709–21.
Gonzalez RC, Woods RE (2002). Digital image processing, NY: Addison-Wesley Pub, 518–28.
Grant-Kels JM, Bason ET, Grin CM (1999). The misdiagnosis of malignant melanoma. J Am Acad Dermatol 40:539–48.
Jain AK, Dubes RC (1988). Algorithms for clustering data, Prentice-Hall Inc.
Jaleel JA, Salim S, Aswin RB (2012). Artificial neural network based detection of skin cancer. IJAREEIE 1:200–05.
Kassianos AP, Emery JD, Murchie P, Walter FM (2015).
Smartphone applications for melanoma detection by community, patient and generalist clinician users: A review. Br J Dermatol 172:1507–18.
Kittler J, Hatef M, Duin RPW, Matas J (1998). On combining classifiers. IEEE T Pattern Anal 20:226–39.
Leachman SA, et al. (2016). Methods of melanoma detection. Cancer Treat Res 167:51–105.
Lee T, Ng V, Gallagher R, Coldman A, McLean D (1997). Dullrazor: A software approach to hair removal from images. Comput Biol Med 27:533–43.
Li CH, Lee CK (1993). Minimum cross entropy thresholding. Pattern Recogn 26:617–25.
Lorentzen HF, Weismann K, Grønhøj Larsen F (2001). Structural asymmetry as a dermatoscopic indicator of malignant melanoma - a latent class analysis of sensitivity and classification errors. Melanoma Res 11:495–501.
Loukas C, Kostopoulos S, Tanoglidi A, Glotsos D, Sfikas C, Cavouras D (2013). Breast cancer characterization based on image classification of tissue sections visualized under low magnification. Comp Math Methods Med 2013:7 pages.
Lucas R, McMichael T, Smith W, Armstrong B (2006). Solar ultraviolet radiation: Global burden of disease from solar ultraviolet radiation. In: Prüss-Üstün A, Zeeb H, Mathers C, Repacholi M, eds. Environmental burden of disease series 13, Geneva: World Health Organization.
Maragoudakis M, Maglogiannis I (2011). A medical ontology for intelligent web-based skin lesions image retrieval. Health Informatics J 17:140–57.
Mazurowski MA, Lo JY, Harrawood BP, Tourassi GD (2011). Mutual information-based template matching scheme for detection of breast masses: From mammography to digital breast tomosynthesis. J Biomed Inform 44:815–23.
Ming ME (2000). The histopathologic misdiagnosis of melanoma: Sources and consequences of "false positives" and "false negatives". J Am Acad Dermatol 43:704–6.
Ninos K, et al. (2013).
Computer-based image analysis system designed to differentiate between low-grade and high-grade laryngeal cancer cases. Anal Quant Cytol 35:261–72.
Paddock LE, Lu SE, Bandera EV, Rhoads GG, Fine J, Paine S, et al. (2016). Skin self-examination and long-term melanoma survival. Melanoma Res: in press.
Pfahlberg AB, Gefeller O (2008). Errors in assessing risk factors for melanoma: Lack of reproducibility is the minor problem. Melanoma Res 18:300–1.
Rastrelli M, Tropea S, Rossi CR, Alaibac M (2014). Melanoma: Epidemiology, risk factors, pathogenesis, diagnosis and classification. In Vivo 28:1005–11.
Ringborg U, et al. (1996). Resection margins of 2 versus 5 cm for cutaneous malignant melanoma with a tumor thickness of 0.8 to 2.0 mm: Randomized study by the Swedish melanoma study group. Cancer 77:1809–14.
Robson Y, Blackford S, Roberts D (2012). Caution in melanoma risk analysis with smartphone application technology. Br J Dermatol 167:703–4.
Ruiz D, Berenguer V, Soriano A, Sanchez B (2011). A decision support system for the diagnosis of melanoma: A comparative approach. Expert Syst Appl 38:15217–23.
Russakoff DB, Tomasi C, Rohlfing T, Maurer Jr CR (2004). Image similarity using mutual information of regions. 3023:596–607.
Schein O, Westreich M, Shalom A (2009). Effect of dermoscopy on diagnostic accuracy of pigmented skin lesions emphasizing malignant melanoma. Harefuah 148:820–3.
Stoecker WV, Rader RK, Halpern A (2013). Diagnostic inaccuracy of smartphone applications for melanoma detection: Representative lesion sets and the role for adjunctive technologies. JAMA Dermatol 149:884.
Stringa M (1988). Misdiagnosis of choroidal melanoma. Panminerva Med 30:89–92.
Tenenhaus A, Nkengne A, Horn JF, Serruys C, Giron A, Fertil B (2010). Detection of melanoma from dermoscopic images of naevi acquired under uncontrolled conditions. Skin Res Technol 16:85–97.
Theodoridis S, Koutroumbas K (2003). Pattern recognition, San Diego: Elsevier.
Vañó-Galván S, Paoli J, Ríos-Buceta L, Jaén P (2015). Skin self-examination using smartphone photography to improve the early diagnosis of melanoma. Actas Dermosifiliogr 106:75–7.
Veierod MB, Parr CL, Lund E, Hjartaker A (2009). Response: Errors in assessing risk factors for melanoma. Melanoma Res 19:61.
Veronesi U, Cascinelli N (1991). Narrow excision (1-cm margin). A safe procedure for thin cutaneous melanoma. Arch Surg 126:438–41.
Wolf JA, Moreau JF, Akilov O, Patton T, English JC, 3rd, Ho J, Ferris LK (2013). Diagnostic inaccuracy of smartphone applications for melanoma detection. JAMA Dermatol 149:422–6.
Yan Y, Huang X, Zheng Y, Xu W (2011). An efficient template matching between rotated mono- or multi-sensor images. MIPPR 2011: Parallel Processing of Images and Optimization and Medical Imaging Processing, Guilin, China, 80050M-80050M-9.
Zagrouba E, Barhoumi W (2004). A prelimary approach for the automated recognition of malignant melanoma. Image Anal Stereol 23:121–35.
Zhang S, Gao F, Wan D (2010). Effect of misdiagnosis on the prognosis of anorectal malignant melanoma. J Cancer Res Clin Oncol 136:1401–5.