Image Anal Stereol 2014;33:13-27
doi:10.5566/ias.v33.p13-27
Original Research Paper

A COMPREHENSIVE FRAMEWORK FOR AUTOMATIC DETECTION OF PULMONARY NODULES IN LUNG CT IMAGES

MEHDI ALILOUC,1, VASSILI KOVALEV2, EDUARD SNEZHKO2 AND VAHID TAIMOURI3

1Department of Computer Science, Khoy Branch, Islamic Azad University, Khoy, Iran; 2Department of Biomedical Image Analysis, United Institute of Informatics Problems, National Academy of Sciences, Minsk, Belarus; 3Department of Radiology, Children’s Hospital Boston, Harvard Medical School, Boston, MA, USA
e-mail: me.alilou@gmail.com, vassili.kovalev@gmail.com, Eduard.snezhko@gmail.com, vahid.taimouri@childrens.harvard.edu
(Received November 4, 2013; revised February 2, 2014; accepted February 25, 2014)

ABSTRACT

Solitary pulmonary nodules may indicate an early stage of lung cancer. Hence, the early detection of nodules is the most efficient way of saving the lives of patients. The aim of this paper is to present a comprehensive Computer Aided Diagnosis (CADx) framework for detecting lung nodules in computed tomography images. The four major components of the developed framework are lung segmentation, identification of candidate nodules, classification and visualization. The process starts with segmentation of the lung regions from the thorax. Then, inside the segmented lung regions, candidate nodules are identified using an approach based on multiple thresholds followed by morphological opening and a 3D region growing algorithm. Finally, a combination of a rule-based procedure and a support vector machine (SVM) classifier is used to classify the candidate nodules. The proposed CADx method was validated on CT images of 60 patients, containing a total of 211 nodules, selected from the publicly available Lung Image Database Consortium (LIDC) image dataset. Compared with other state-of-the-art methods, the proposed framework demonstrated acceptable detection performance (Sensitivity: 0.80; Fp/Scan: 3.9).
Furthermore, we visualize a range of anatomical structures, including the 3D lung structure and the segmented nodules, together with Maximum Intensity Projection (MIP) volume rendering. This enables radiologists to estimate accurately and easily the distance between lung structures and nodules, which is frequently difficult to judge from CT images.

Keywords: computed tomography (CT), computer-aided diagnosis (CADx), lung nodule detection, segmentation.

INTRODUCTION

Lung cancer is a serious public health problem worldwide. Its mortality rate is higher than that of other cancers, and it is the leading cause of cancer death among both men and women (Siegel et al., 2012). Early detection of lung cancer, which typically manifests as pulmonary nodules, is an efficient way of improving the survival rate and has been attempted using X-ray computed tomography (CT) (Sone et al., 2001). However, CT scans generate a large number of images that must be read by radiologists, who may have to interpret up to 50 cases per day (Awai et al., 2004). Given this large and exhausting workload, diagnostic reading errors may be hard to avoid. It is therefore necessary to develop computer-aided diagnosis (CADx) systems that assist radiologists in interpreting CT scans. CADx systems can aid radiologists by providing a second opinion and may be used in the first stage of examination in the near future (Awai et al., 2004). Various CADx methods for lung nodule detection have been proposed, and some have been developed and successfully used in clinical practice (Gurcan et al., 2002; Brown et al., 2005). Although much effort has been devoted to it, the development of CADx systems for lung nodule detection remains a difficult task.
Existing nodule detection approaches can be roughly categorized into three main groups: intensity-based (Messay et al., 2010), model-based (Dehmeshki et al., 2007), and combined geometric- and intensity-based methods (Ye et al., 2009). Intensity-based methods employ techniques such as multiple thresholding, clustering and mathematical morphology to identify nodules in the lung area. Model-based methods use techniques such as template matching, object-based deformation and anatomy-based generic models to separate spherical nodules from elongated structures such as blood vessels. Finally, the third group combines geometric and intensity models to enhance local anatomical structures such as spherical objects or vessels. A few examples of these methods are briefly outlined below. As an early work in this field, Kanazawa et al. (1998) proposed a nodule detection CADx system that segments the lung regions with a fuzzy clustering algorithm and then analyzes the features of the segmented regions using image-processing techniques and rule-based classification. A template-matching technique based on a genetic algorithm was proposed in Lee et al. (2001) for detecting lung nodules in chest CT scans. This method was validated on 20 clinical cases from a private dataset, and a rule-based classifier was used to reduce the number of false positives (Fps). However, the number of Fps in that study was rather high (30 Fp/case), with a detection rate of 72%. A similar template-matching method, which the authors called shape-based genetic algorithm template matching, was proposed in Dehmeshki et al. (2007) for the detection of spherical nodules. In that work, a 3D geometric shape feature is calculated at each voxel and then combined into a global nodule intensity distribution. The detection rate was about 90%, with 14.6 Fp/scan, which is rather high compared with more recent methods.
Some studies use special filters to enhance nodule-like structures. For instance, Li et al. (2008) proposed three selective enhancement filters for dots, lines and planes, which simultaneously enhance objects of a specific shape and suppress other objects. In that approach, the CT image was blurred with a Gaussian kernel matched to the size of the nodule to be detected before computing the eigenvalues of the Hessian matrix used for selective enhancement. In similar work, Teramoto and Fujita (2013) proposed a cylindrical shape filter as a fast enhancement method for lung nodules. The Fp rate in that work was reduced using a support vector machine (SVM) together with seven characteristic shape parameters. The increased interest in automatic lung nodule detection has led to the availability of public image databases for evaluating and validating algorithms. These include the Lung Image Database Consortium (LIDC) image database (Armato et al., 2004) and the ELCAP Public Lung Image Database made available by Cornell University. Recent studies have mainly employed the LIDC database rather than ELCAP. Some works that used LIDC images are as follows. A nodule detection scheme using a 3D active contour method was proposed in Way et al. (2006). A multi-threshold surface triangulation approach was proposed in Golosio et al. (2009). A multiple-intensity thresholding method combined with morphological operations was proposed in Messay et al. (2010), where nodule candidates were distinguished by a rule-based classifier. As a final example of methods that employed the LIDC database, Tan et al. (2011) proposed a CADx method that identifies nodules using nodule and vessel enhancement filters and then classifies the identified nodules with a novel feature-selective classifier based on genetic algorithms and artificial neural networks.
Although the schemes mentioned above achieve sensitivities of about 65–80% with fewer than 10 false positives per case, most of them require extensive computation. On the other hand, current CT machines can generate lung volume images within 30 seconds per scan, so there is a large gap between image acquisition time and nodule detection time. Furthermore, beyond detection performance, visualization utilities such as volume rendering and 3D-to-2D projection speed up the detection process for radiologists. However, few of the systems reviewed above are comprehensive CADx systems that automate the whole nodule detection process with acceptable time efficiency and also provide nodule/lung visualization capabilities.

In this paper, we propose a new comprehensive CADx framework for the detection of pulmonary nodules in thoracic CT images. The paper introduces a computationally efficient CADx system and provides a complete description of all processing steps of its architecture. The proposed framework automates the whole process of lung segmentation and detection of candidate nodules. Furthermore, it provides a suite of 2D and 3D visualization tools that facilitate the detection and validation task for radiologists. To evaluate our method, the CADx framework was tested on CT images of 60 patients from the publicly available LIDC database, and its detection and processing performance was compared with 6 existing well-known CADx methods. To the best of our knowledge, based on a review of most of the papers on the nodule detection problem published in the last 15 years, the way we choose the optimized set of threshold levels and an efficient set of nodule candidate features has not been previously reported.
Furthermore, despite promising advances over the last 15 years, existing CADx solutions may produce a considerable number of false positives, and their sensitivity is usually below 90% (Chan et al., 2008). Visualization of the different steps of a CADx method therefore seems important, since it may help radiologists interpret CT scans. Hence, our framework provides 2D, 3D and projected 3D illustrations of the different steps of the detection process to support radiologists in detection and validation.

MATERIALS AND METHODS

Lung CT images from the LIDC-IDRI database (McNitt-Gray et al., 2007) are used in the experiments. The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic CT scans with annotated lesions. It is a web-accessible international resource for the development, training and evaluation of CADx methods for lung cancer detection and diagnosis. Note that the database contains CT images acquired with scanners from different vendors. In this study, we used a sample of 60 CT scans acquired with GE Medical Systems LightSpeed16 scanners. The number of slices per scan ranged from 102 to 272, with 8573 slices in total. Each slice is 512 × 512 pixels with 12-bit gray-scale resolution in Hounsfield Units (HU). The dataset includes 222 annotated nodules with diameters between 3 and 32 mm. The distribution of nodule diameters is shown in Fig. 1.

Fig. 1. Distribution of the diameters of the nodules in the dataset.

As the histogram in Fig. 1 shows, there are 11 nodules with diameter < 4 mm. Since the proposed framework aims to detect nodules with diameter ≥ 4 mm, the number of target nodules is 211 out of 222. The distribution of nodules per case is illustrated in Fig. 2.
Since four cases of the dataset contain no nodules, the histogram in Fig. 2 presents the distribution of nodules in the 56 cases that contain at least one nodule.

Fig. 2. Distribution of the nodules per case.

Fig. 3 illustrates maximum intensity projection renderings of different pulmonary nodules present in the dataset. As can be seen from Fig. 3, the dataset includes isolated, juxtavascular (vessel-connected) and juxtapleural (pleura-connected) nodules of various sizes. The developed framework includes the following major processing steps: segmentation of the lung regions from the surrounding anatomy; multiple gray-level thresholding for extracting nodule candidates and vessels inside the lung region, followed by morphological processing; 3D blob extraction; computation of features of the nodule candidates; rule-based and SVM-based classification of the 3D blobs; and, finally, 2D/3D visualization of the analyzed scans. Fig. 4 shows the top-level block diagram of the proposed framework. The processing steps are described in the following sections.

3D LUNG REGIONS SEGMENTATION

Segmentation of the lung regions is the first stage of the method's processing pipeline. Its goal is to separate the voxels corresponding to lung tissue from the surrounding anatomy. The general scheme of the lung segmentation algorithm is similar to that described in Hu et al. (2001) and Leader et al. (2003). Given the input CT image, we generate and use four types of 3D masks to accomplish the lung segmentation task: the initial lung mask (Mi), the body mask (Mb), the secondary lung mask (Ms) and the final lung mask (Mf). Fig. 5 illustrates examples of the masks generated during the segmentation process.

Fig. 3. Maximum intensity projection renderings of pulmonary nodules of different sizes (ordered by diameter from top left to bottom right).
(a) pleura-connected 23.1 mm, (b) vessel-connected 20.16 mm, (c) isolated 13.4 mm, (d) vessel-connected 9.2 mm, (e) pleura-connected 8.3 mm, (f) vessel-connected 6.9 mm, (g) isolated 6.2 mm, (h) isolated 5.7 mm, (i) isolated 4.2 mm.

The 3D lung segmentation procedure is depicted in Fig. 6. As shown there, the optimal thresholding algorithm is first applied to the input CT images to generate the initial lung mask. Optimal thresholding is an iterative procedure, adapted here to separate body voxels (i.e., high-density voxels of the body and chest, which have higher Hounsfield values) from non-body voxels (i.e., low-density voxels of the lungs and surrounding air, which have lower Hounsfield values). Let Ti be the segmentation threshold at step i; Ti is applied to the input image to separate body from non-body voxels. Let µb and µn be the mean gray levels of the body and non-body voxels segmented with threshold Ti. The new threshold Ti+1 is calculated via:

$$T_{i+1} = \frac{\mu_b + \mu_n}{2}. \qquad (1)$$

The threshold update is repeated until Ti+1 = Ti. The Hounsfield value of air is used as the initial threshold (T0 = -1000 HU). The initial lung mask (Mi) is generated by applying the optimal threshold to the input image. In the next step, the body mask (Mb) is generated; it covers all voxels corresponding to the body, including the lungs and chest. The body mask, shown in Fig. 5b, is obtained as follows. First, the morphological hole filling algorithm is applied to the complemented initial lung mask (¬Mi). Then, a 3D connected-component labeling algorithm is used to find the connected components of the resulting image. Selecting the largest component, which corresponds to the body voxels, yields the body mask.
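The lung segmentation stage can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a NumPy volume in HU with axis 0 as the slice axis, uses SciPy's `ndimage` for hole filling and labeling, treats voxels at or below the converged threshold as the initial lung mask Mi, and replaces the exact test Ti+1 = Ti with a small numerical tolerance. The secondary and final masks follow the AND and hole-filling steps of Fig. 6.

```python
import numpy as np
from scipy import ndimage


def optimal_threshold(volume, t0=-1000.0, tol=1e-6, max_iter=100):
    """Iterative threshold selection following Eq. (1): T_{i+1} = (mu_b + mu_n) / 2."""
    t = float(t0)
    for _ in range(max_iter):
        body = volume > t  # body (high-density) voxels at the current threshold
        if body.all() or not body.any():
            break  # one class is empty, so its mean is undefined
        t_next = (volume[body].mean() + volume[~body].mean()) / 2.0
        if abs(t_next - t) < tol:  # numerical stand-in for T_{i+1} = T_i
            return float(t_next)
        t = float(t_next)
    return t


def fill_holes_slicewise(mask):
    """2D hole filling on each axial slice (axis 0 indexes the slices)."""
    return np.stack([ndimage.binary_fill_holes(s) for s in mask])


def largest_component(mask):
    """Keep the largest 3D connected component (SciPy's default 6-connectivity)."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.zeros_like(mask, dtype=bool)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0  # ignore the background label
    return labels == sizes.argmax()


def segment_lungs(volume):
    """Return (optimal threshold, final lung mask Mf) for a HU volume."""
    t = optimal_threshold(volume)
    m_i = volume <= t                                    # initial lung mask: low-density voxels
    m_b = largest_component(fill_holes_slicewise(~m_i))  # body mask
    m_s = m_i & m_b                                      # secondary lung mask: Ms = Mi AND Mb
    m_f = fill_holes_slicewise(m_s)                      # final lung mask
    return t, m_f
```

On a real scan, `segment_lungs(ct_hu)` would be called on the full volume; the slice-wise hole filling mirrors the 2D simplification the text adopts for speed.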
Note that the morphological hole filling algorithm is carried out slice by slice, in 2D, to reduce the computational time of the method. Having the initial lung and body masks, the secondary lung mask is obtained, as shown in Fig. 6, via:

$$M_s = M_i \wedge M_b, \qquad (2)$$

where Mi is the initial lung mask, Mb is the body mask and ∧ is the logical AND operator. Next, the final lung mask (Mf) is generated by applying the hole filling algorithm to Ms. Finally, the segmented lung image is obtained by superimposing Mf on the input image; it serves as the region of interest (ROI) for detecting the pulmonary nodules.

Fig. 4. Top level block diagram of the proposed CADx method.

Fig. 5. Examples of the masks generated and used during the lung segmentation process. (a) Initial lung mask, (b) body mask, (c) secondary lung mask, (d) final lung mask.

CANDIDATE NODULES SEGMENTATION AND DETECTION

Having the region of interest (i.e., the segmented lung regions), the next step is to identify nodule-like structures inside the ROI. As shown in the top-level block diagram in Fig. 4, the identification of nodule candidates starts with a segmentation step that employs a multiple thresholding technique. Since nodule density is higher than that of lung tissue (Golosio et al., 2009), isolated internal nodules can easily be separated with a properly chosen single threshold. Unfortunately, internal nodules are not always isolated, as they can be connected to vessels. If the threshold is too low, juxtavascular (vessel-connected) nodules appear merged with the vessels. On the other hand, the threshold must not be too high: if it exceeds the density of a nodule, part of the nodule will be lost and its volume will be underestimated.
Another problem regarding the threshold level is the segmentation of juxtapleural nodules (i.e., nodules connected to the lung wall or parietal pleura). The lung segmentation procedure often leaves part of the lung wall (pleura) inside the volume of interest, especially in high-convexity regions. Juxtapleural nodules in these regions remain connected to part of the lung wall, and if the threshold is too low they will be connected to this layer. Using a multithreshold procedure, solid nodules connected to vessels as well as low-density nodules can be detected. Fig. 7 illustrates a vessel-connected nodule segmented at multiple threshold levels.

Fig. 6. Block diagram of the lung segmentation process. The operation starts with processing of the 3D input CT image and results in 3D segmented output.

Fig. 7. Three-dimensional views of the isosurfaces corresponding to a nodule connected to a vessel, segmented with thresholds of -450, -300 and -150 HU, depicted in (a), (b) and (c), respectively.

Inspired by the method introduced in Armato et al. (2001), we employed a specialized version of the multiple-intensity thresholding approach. Armato et al. (1999; 2001) applied 36 gray-level thresholds to the segmented lung volume. For each threshold, they identified contiguous structures with gray levels greater than the threshold and observed that a single structure identified at a lower threshold can disassociate into multiple smaller structures at higher thresholds. In similar work, Golosio et al. (2009) applied a wide range of threshold values to CT images and stored the connections between ROIs at different thresholds in a tree data structure. Unlike these methods, and for reasons of computational time, we used a limited range of threshold values.
The ten threshold levels (in Hounsfield units), selected by examining the annotated nodules in the dataset, are: -600, -550, -500, -450, -400, -350, -300, -250, -200 and -150. Each threshold operation is followed by a 2D morphological opening with a circular structuring element of radius 1 to remove residual structures, such as vessels, that may be attached to nodule candidates. Applying these threshold levels Ti to the ROI (segmented lung regions) yields ten corresponding candidate nodule masks (C1, ..., C10). Each Ci is a 3D binary mask of the voxels remaining after thresholding.

In the next step, a 3D blob extraction algorithm is applied to extract the connected components within each Ci. In each candidate nodule mask Ci, the remaining voxels are linked by a 3D 6-point connectivity scheme: every voxel of interest (x, y, z) ∈ Ci with value ‘1’ is labeled in the same blob as those of its neighboring voxels (x ± 1, y, z), (x, y ± 1, z) and (x, y, z ± 1) that also have value ‘1’. Fig. 8 illustrates the 6-point connectivity scheme.

Fig. 8. Six-point connectivity scheme for 3D blob extraction. The voxel of interest is shown in gray.

The blobs extracted within each Ci are inspected further to decide whether to keep or remove each detected nodule candidate. In this step the inspection is based on a simple size criterion. Since the effective nodule size range of the framework is set to [4, 30] mm, blobs whose 3D bounding box has a maximum dimension > 30 mm or a minimum dimension < 4 mm are removed from each Ci. The effect of this criterion is illustrated in Fig. 9. Fig. 9a shows the maximum intensity projection of the blobs extracted in a sample Ci with threshold level -400 HU; each blob is drawn in a unique identifying color with a white contour. Applying the simple size criterion to Fig. 9a leaves the nodule candidates shown in Fig. 9b. After all ten nodule masks have been inspected with the size criterion, the final mask is obtained by combining them with the logical OR operation:

$$C_f = C_1 \vee C_2 \vee \cdots \vee C_{10}, \qquad (3)$$

where C1 to C10 are the candidate nodule masks obtained at the different threshold levels after inspection with the size criterion, and ∨ is the logical OR operator. The logical OR is used to keep both low-density nodules, which remain in masks obtained at low threshold levels, and vessel-connected nodules, which usually appear in masks obtained at high threshold levels. Furthermore, this procedure greatly reduces the computational load of the subsequent feature calculation and classification because it reduces the total number of candidates.

As can be seen from Fig. 9b, a significant number of false positives remain in the inspected nodule masks Ci and consequently in the final candidate nodule mask Cf. Therefore, to decrease the number of false positives, the remaining blobs in Cf are inspected further by extracting more powerful features (listed in Table 1) and classifying them with an SVM classifier. The key details of the feature extraction and classification steps are described in the next subsection.

FEATURE EXTRACTION AND CLASSIFICATION

The nodule masks obtained with different threshold levels were inspected and integrated into a single nodule mask Cf through Eq. 3. The next step is to reduce the number of false positives with a classification step that assigns the remaining candidate nodules to the “nodule” or “non-nodule” class. To achieve this reduction, a set of 17 2D and 3D features is computed for each segmented and labeled candidate nodule in Cf. They can be grouped into four types: 3D geometrical, 3D intensity-based, 2D geometrical and 2D intensity-based features.
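The candidate generation that produces Cf (ten thresholds, slice-wise opening, 6-connected blob extraction, the [4, 30] mm size criterion and the OR of Eq. 3) can be sketched as follows. This is an illustrative reconstruction, not the authors' code. It assumes NumPy/SciPy, uses a 3 × 3 cross as the digital disk of radius 1, measures bounding-box extents in mm through a hypothetical `spacing_mm` argument, and removes a blob whenever either size bound is violated.

```python
import numpy as np
from scipy import ndimage

# Ten threshold levels (HU) from the text: -600, -550, ..., -150.
THRESHOLDS_HU = list(range(-600, -149, 50))
# 3x3 cross: used here as the digital disk of radius 1 for the 2D opening.
DISK_R1 = ndimage.generate_binary_structure(2, 1)
# 3D 6-connectivity: neighbours differ by +-1 along exactly one axis.
CONN6 = ndimage.generate_binary_structure(3, 1)


def open_slicewise(mask):
    """2D morphological opening of every axial slice (axis 0 = slice)."""
    return np.stack([ndimage.binary_opening(s, structure=DISK_R1) for s in mask])


def size_filter(mask, spacing_mm, min_mm=4.0, max_mm=30.0):
    """Keep 6-connected blobs whose bounding-box extents all lie in [min_mm, max_mm]."""
    labels, _ = ndimage.label(mask, structure=CONN6)
    keep = np.zeros(mask.shape, dtype=bool)
    for idx, box in enumerate(ndimage.find_objects(labels), start=1):
        if box is None:
            continue
        extents = [(sl.stop - sl.start) * sp for sl, sp in zip(box, spacing_mm)]
        if min(extents) >= min_mm and max(extents) <= max_mm:
            keep[box] |= labels[box] == idx
    return keep


def candidate_mask(volume, lung_mask, spacing_mm, thresholds=THRESHOLDS_HU):
    """Cf = C1 OR ... OR Cn (Eq. 3); each Ci is thresholded, opened and size-filtered."""
    c_f = np.zeros(lung_mask.shape, dtype=bool)
    for t in thresholds:
        c_i = open_slicewise((volume > t) & lung_mask)
        c_f |= size_filter(c_i, spacing_mm)
    return c_f
```

Filtering each Ci before the OR, as the text describes, lets a nodule that is only separable from a vessel at a high threshold survive even though the merged structure at lower thresholds fails the size test.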
All 3D and 2D features are taken from Hardie et al. (2008) and Messay et al. (2010). The 2D features are computed from the intensity and geometrical information of the slice with the largest area inside the bounding box. The features employed in this study are listed in Table 1.

Table 1. The optimal subset of features selected for the classification of the candidate nodules presented in Cf.

| No. | Type | Feature | Definition / comments |
|-----|------|---------|-----------------------|
| 1 | 3D | Volume | number of voxels of the candidate |
| 2 | 3D | MinDim | minimum dimension size of the bounding box |
| 3 | 3D | MaxDim | maximum dimension size of the bounding box |
| 4 | 3D | Eccentricity1 | MaxDim / MinDim |
| 5 | 3D | Compactness1 | Volume / (Dim1 × Dim2 × Dim3), where Dim1 = width, Dim2 = height, Dim3 = depth |
| 6 | 3D | Compactness2 | Volume / MaxDim³ |
| 7 | 3D | DistanceToCenter | distance to the center of the projected lung |
| 8 | 3D | MinIntensity | minimum intensity of the candidate |
| 9 | 3D | MaxIntensity | maximum intensity of the candidate |
| 10 | 3D | StdIntensity | standard deviation of the candidate's intensity |
| 11 | 2D | Area | pixel count of the surface with maximum size |
| 12 | 2D | Circularity | 4π · Area / P², where P is the perimeter of the maximum surface |
| 13 | 2D | Eccentricity2 | MaxDim / MinDim, using the length and width of the 2D bounding box |
| 14 | 2D | DistToCenter2 | distance to the center of the current slice |
| 15 | 2D | MinIntensity2 | minimum intensity of the area |
| 16 | 2D | MaxIntensity2 | maximum intensity of the area |
| 17 | 2D | STDIntensity2 | standard deviation of the area's intensity |

In practice, exhaustively searching the original feature set for an optimal subset is a tedious task (with 85 candidate features there are 2^85 - 1 non-empty subsets). In this work we selected features with a greedy forward method (Guyon and Elisseeff, 2003). This method is known not to guarantee an optimal set of features; however, it produced a subset with reasonably high efficiency. Following the cited feature selection method, we took a fraction of the original sample of patients, containing 15 annotated nodules and 720 non-nodule objects, to perform the feature selection procedure.
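To make Table 1 and the selection step concrete, the sketch below computes a subset of the Table 1 features (the 3D geometrical and intensity features plus the 2D Area) for one labeled candidate, and wraps greedy forward selection around an arbitrary scoring function. It is a sketch under stated assumptions: bounding-box dimensions are measured in voxels rather than mm, Area is taken as the largest axial cross-section, and the `score` callable (for example, a cross-validated accuracy of a classifier trained on the chosen feature columns) is a hypothetical placeholder, since the paper does not specify these details.

```python
import numpy as np


def candidate_features(volume, blob):
    """Subset of the Table 1 features for one candidate.

    volume: 3D HU array (axis 0 = slice); blob: 3D bool mask of the candidate.
    Sizes are in voxels (assumption); multiply by the voxel spacing for mm.
    """
    zs, ys, xs = np.nonzero(blob)
    # Bounding-box extents: Dim1 = width (x), Dim2 = height (y), Dim3 = depth (z).
    dims = np.array([np.ptp(xs) + 1, np.ptp(ys) + 1, np.ptp(zs) + 1], dtype=float)
    vals = volume[blob]
    vol = float(blob.sum())
    return {
        "Volume": vol,
        "MinDim": dims.min(),
        "MaxDim": dims.max(),
        "Eccentricity1": dims.max() / dims.min(),
        "Compactness1": vol / dims.prod(),
        "Compactness2": vol / dims.max() ** 3,
        "MinIntensity": float(vals.min()),
        "MaxIntensity": float(vals.max()),
        "StdIntensity": float(vals.std()),
        # 2D feature: pixel count of the largest axial section of the candidate.
        "Area": float(blob.sum(axis=(1, 2)).max()),
    }


def greedy_forward_selection(n_features, score, max_features):
    """Greedily add the feature index that most increases score(subset)."""
    selected, best = [], float("-inf")
    while len(selected) < max_features:
        remaining = [f for f in range(n_features) if f not in selected]
        if not remaining:
            break
        new_best, f_best = max((score(selected + [f]), f) for f in remaining)
        if new_best <= best:
            break  # no remaining feature improves the score: stop
        selected.append(f_best)
        best = new_best
    return selected, best
```

Each greedy step evaluates at most 85 subsets instead of all 2^85 - 1, which is why the method is tractable but, as noted above, not guaranteed to find the optimum.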
The objects of both classes were segmented according to the candidate nodule identification procedure described above. The nodule and non-nodule objects were characterized by different subsets of an original feature set, and the optimal subset was selected based on the classification results for nodule and non-nodule objects. The original feature set consisted of 85 features selected from previous studies reported in Hardie et al. (2008), Messay et al. (2010) and Tan et al. (2011).

Fig. 9. (a) The maximum intensity projection map of a sample candidate nodule mask (Ci). (b) The maximum intensity projection map of the same nodule mask after inspection with the simple size criterion.

To illustrate the effect of the feature selection procedure, Fig. 10 presents two scatter plots showing the separation of objects in feature space obtained with multidimensional scaling. The left plot shows the separation obtained with the optimal subset of 17 features listed in Table 1, while the right one shows the separation obtained with a non-optimal subset of 9 features. The figure represents the relative distances between the multidimensional feature vectors in a reduced two-dimensional view. As Fig. 10a shows, nodule and non-nodule samples described with the optimal subset are more separable than the samples described with 9 features (Fig. 10b).

Fig. 10. The multidimensional scaling representation of the candidates belonging to the nodule and non-nodule classes. (a) Samples described with the 17 features listed in Table 1; (b) the same samples described with a subset of 9 of the features listed in Table 1.

Having the feature set listed in Table 1, the next step is the classification of candidate nodules. We adopted an SVM classifier, which separates the data into two categories by constructing an N-dimensional hyperplane in feature space. The radial basis function (RBF) kernel (Burges, 1998) was selected empirically as the kernel function for training the SVM. The RBF kernel, defined in Eq. 4, is applied to the training samples in instance-label form (xi, yi), where xi ∈ R^n and yi ∈ {-1, 1}:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), \quad \gamma > 0. \qquad (4)$$

To avoid possible bias introduced by the selection of specific samples for the training and test sets, training was carried out with a 5-fold cross-validation procedure. First, the original dataset (60 cases and 211 nodules in total) was divided into 5 groups of 12 cases each. The class probabilities of the nodule candidates in each group were estimated using the classifier trained on the remaining 4 groups. This procedure was repeated 5 times, once for each group. Cross-validation provides an estimate of the expected classification results on data that are independent of the data used to train the model. After validation, however, all 60 scans were used to train the framework so that it can handle totally independent new test cases. In other words, in real practice, new CT scans (test sets) that are totally independent of the training scans will be given to the framework, which has already been trained on the 60 training scans.

Once all of the nodule candidates were classified, we applied a simple threshold (Tprob ∈ [0, 1]) to the estimated class probabilities of the nodule candidates and scored the true positive (Tp), false positive (Fp) and false negative (Fn) events using the positional information of the nodules provided in the ground truth. The sensitivity of the classifier at each level of Tprob is computed as:

$$\text{Sensitivity} = \frac{Tp}{Tp + Fn}. \qquad (5)$$

The experimental results, including the classification results of the nodule candidates, are reported in the next section.

RESULTS

The experimental results are organized in two parts.
In the first part, the overall detection performance of the framework is reported, together with how the parameters were determined. In the second part, the performance of the framework is compared with other well-known CADx systems. Finally, we present a few screenshots of processed CT images that highlight the visualization capabilities of the framework.

DETECTION PERFORMANCE AND PARAMETER DETERMINATION

The nodule detection performance of the proposed framework is measured and presented using FROC curves. An FROC curve plots, as Tprob varies, the fraction of nodules detected as Tp (i.e., by candidates whose estimated class probability passes the threshold Tprob) against the average number of Fps per scan. Our experiments showed that the performance of the framework is affected by several parameters. Although finding the optimal combination of the whole set of control parameters is usually tedious and in some cases practically impossible, several experiments were carried out to find the best combination of the most influential parameters empirically. In addition to the nodule feature set, the other major parameters that influence both time and detection performance are the number of intensity threshold levels Pt applied to the segmented lung to generate nodule candidate masks, and the size criterion Ps that determines the range of target nodules inside the dataset. To determine the best value of Pt, we used four different threshold sets (Si) containing 1, 5, 10 and 20 threshold levels, respectively. The first set includes one threshold level (S1 = {-450}), the second set includes five threshold levels (S2 = {-200, -300, -400, -500, -600}), the third set (S3) includes 10 threshold levels from -600 to -150 in increments of 50 HU and, finally, S4 includes 20 threshold levels from -1000 to -50 in increments of 50 HU.
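Written out explicitly, the four threshold sets compared in Fig. 11 look as follows; the `range()` calls make it easy to check that S3 and S4 contain 10 and 20 levels, and `len(S)` is the number of candidate masks that must be generated for each set. The variable names are illustrative only.

```python
# Threshold sets (HU) compared in Fig. 11; len(S) = number of candidate masks built.
S1 = [-450]
S2 = [-200, -300, -400, -500, -600]
S3 = list(range(-600, -149, 50))   # -600, -550, ..., -150 (10 levels)
S4 = list(range(-1000, -49, 50))   # -1000, -950, ..., -50 (20 levels)

for name, levels in [("S1", S1), ("S2", S2), ("S3", S3), ("S4", S4)]:
    print(name, len(levels), min(levels), max(levels))
```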
Note that the number of threshold levels in each set determines the number of candidate nodule masks that must be generated and processed to identify the nodules. Thus, nodule detection performance is a function of Si. Detection of the 211 nodules of the dataset (nodule size ≥ 4 mm) was repeated with each threshold set Si, and the sensitivity and average Fp per case of the detection component were computed for each set. Fig. 11 shows FROC curves representing the detection performance of the framework for each threshold set. As can be seen in Fig. 11, at a fixed Fp per scan, higher sensitivity is achieved as the number of threshold values increases. For instance, at a cut-off of 3.9 Fp/scan, the sensitivities for S1, S2, S3 and S4 are 0.60, 0.74, 0.80 and 0.81, respectively. Since the sensitivities of S3 and S4 differ only marginally (0.80 vs. 0.81), and considering the time efficiency of the detection process, S3 is selected as the optimal threshold set of the framework. Therefore, the optimal value of the parameter Pt is 10, the number of levels in S3, which leads to the detection of 169 of the 211 nodules (sensitivity 0.80 with an average of 3.9 Fp/case). However, it is important to note that, from a diagnostic point of view, the cost of lower sensitivity (too many false negatives) could be much higher than the cost of lower specificity (too many false positives). If higher sensitivity is desired, the class probability threshold Tprob can be tuned, raising the sensitivity up to 88% with an average of 10 false positives per case.
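The FROC operating points discussed above can be computed by sweeping Tprob. The sketch below assumes that a candidate is retained when its probability is at least Tprob, that a nodule counts as a Tp if at least one retained candidate matches it, and that every retained unmatched candidate is an Fp; the matching against ground-truth positions is assumed to be done beforehand and encoded in a hypothetical `nodule_id` array. Sensitivity follows Eq. 5; for example, detecting 169 of 211 nodules gives 169/211 ≈ 0.80.

```python
import numpy as np


def froc_points(probs, nodule_id, n_nodules, n_scans, thresholds):
    """(Fp per scan, sensitivity) for each probability threshold Tprob.

    nodule_id[k] is the ground-truth nodule matched by candidate k, or -1 if none.
    """
    probs = np.asarray(probs, dtype=float)
    nodule_id = np.asarray(nodule_id)
    points = []
    for t in thresholds:
        kept = probs >= t
        tp = len(np.unique(nodule_id[kept & (nodule_id >= 0)]))  # nodules hit at least once
        fp = int(np.sum(kept & (nodule_id < 0)))                 # unmatched retained candidates
        fn = n_nodules - tp
        points.append((fp / n_scans, tp / (tp + fn)))            # sensitivity, Eq. (5)
    return points


def sensitivity_at(points, max_fp_per_scan):
    """Highest sensitivity among operating points within an Fp/scan budget."""
    feasible = [sens for fp, sens in points if fp <= max_fp_per_scan]
    return max(feasible) if feasible else 0.0
```

Reading a curve at a fixed budget, as done above with 3.9 Fp/scan, then amounts to `sensitivity_at(froc_points(...), 3.9)`.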
Regarding the computational time of the nodule detection process, note that it depends on several factors, such as the image size, the complexity of the algorithms used and the processing capabilities of the hardware. In our case, the average computation time required to detect the nodules in one case is about 80–100 seconds on a computer with an Intel Core 2 Duo E6600 processor with a 2.4 GHz clock speed. To determine the optimal value of the other major parameter, i.e., the nodule size criterion Ps, the nodule detection procedure was executed repeatedly with three different size criteria: nodule size ≥ 3 mm, nodule size ≥ 4 mm and nodule size ≥ 5 mm. The numbers of target nodules for these criteria are 222, 211 and 173, respectively. Fig. 12 shows the FROC curves representing the detection performance of the framework for each size criterion. As Fig. 12 shows, the curves for the criteria ≥ 5 mm and ≥ 4 mm yield better detection performance. Although the detection performance for the ≥ 5 mm criterion is slightly better than that for the ≥ 4 mm criterion, CADx systems are expected to perform acceptably on smaller nodules as well; we have therefore fixed the size criterion of our framework to nodule size ≥ 4 mm.

PERFORMANCE EVALUATION AND VISUALIZATION RESULTS

In this section, the proposed framework is evaluated by a comprehensive comparison of its performance with existing lung nodule detection methods. The comparison is based on the following factors: dataset size (number of patients and number of nodules), applied size criterion, sensitivity and average number of Fps per case. Table 2 summarizes the information on which our method is compared with 6 existing methods.
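Both the parameter study above and the comparison factors just listed depend on the size criterion Ps, which decides which annotated nodules count as targets. The Python sketch below illustrates this under assumptions of our own: nodule size is taken as the diameter of a sphere with the same volume as the segmented nodule (the text does not say how size is measured), and the diameters in the example are toy values rather than LIDC data. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

def equivalent_diameter_mm(n_voxels: int, spacing_mm: Tuple[float, float, float]) -> float:
    """Diameter (mm) of a sphere whose volume equals that of a segmented nodule."""
    volume = n_voxels * spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)

def target_nodules(diameters_mm: Sequence[float], min_size_mm: float) -> List[int]:
    """Indices of annotated nodules that satisfy the size criterion (diameter >= min_size_mm)."""
    return [i for i, d in enumerate(diameters_mm) if d >= min_size_mm]

def sensitivity_for_criterion(diameters_mm: Sequence[float], detected: Set[int],
                              min_size_mm: float) -> float:
    """Fraction of the target nodules under this criterion that were detected."""
    targets = target_nodules(diameters_mm, min_size_mm)
    if not targets:
        return 0.0
    return sum(1 for i in targets if i in detected) / len(targets)

if __name__ == "__main__":
    diameters = [3.2, 4.0, 4.5, 6.0, 2.5]   # toy values, not LIDC data
    detected = {1, 3}                       # indices of nodules found by the detector
    for ps in (3.0, 4.0, 5.0):
        n_targets = len(target_nodules(diameters, ps))
        print(ps, n_targets, sensitivity_for_criterion(diameters, detected, ps))
```

Raising Ps removes small nodules from the denominator, which is one reason sensitivities obtained under different size criteria are not directly comparable.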
An exact comparison of different methods is difficult, since some of these methods used private datasets acquired with various modalities rather than the public LIDC dataset. Nevertheless, taken together, these factors provide a common basis for a relative ranking of the methods. As Table 2 shows, with the assumed parameter settings, the proposed framework achieves better sensitivity than most existing methods. However, some methods perform slightly better than ours; these differences are discussed in the next section. We conclude this section by illustrating graphical outputs of the framework produced at different stages of the processing pipeline. Fig. 13a shows the MIP representation of a sample CT image after segmentation by the lung region segmentation component. Fig. 13b shows the identified nodules in the same image, as processed by the nodule segmentation and identification component. Finally, Fig. 14 shows the corresponding 3D rendered output of the image shown in Fig. 13b. Note that no smoothing or other post-processing was applied to Fig. 14; it was constructed by extracting and plotting isosurface data from the volume data prepared within the framework's processing pipeline. Using the visualization tool, radiologists can navigate through the 3D lung structure and nodules and find the corresponding points in the MIP representation, which helps them make a more accurate diagnosis.

DISCUSSION

Over the last 15 years, considerable effort has been devoted to the problem of automated lung nodule detection. Despite these efforts, a robust and computationally efficient CADx system remains an open research goal. In this paper, we proposed a new comprehensive CADx framework for segmentation and detection of pulmonary nodules in CT images.
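The MIP views in Fig. 13, described in the previous section, keep only the brightest voxel along each projection ray. A minimal pure-Python sketch of that operation follows, including an optional step that replaces voxels outside a lung mask with a background value before projecting, so that only the segmented lung contributes. The background value of -1000 (roughly the Hounsfield value of air) and the function names are assumptions for illustration, not details taken from the framework.

```python
from typing import List

Volume = List[List[List[float]]]   # indexed as volume[z][y][x]
Mask = List[List[List[bool]]]

def masked(volume: Volume, mask: Mask, background: float = -1000.0) -> Volume:
    """Replace voxels outside the (lung) mask with a background value before projecting."""
    return [[[v if m else background for v, m in zip(row, mrow)]
             for row, mrow in zip(plane, mplane)]
            for plane, mplane in zip(volume, mask)]

def mip(volume: Volume, axis: int = 0) -> List[List[float]]:
    """Maximum intensity projection: keep the brightest voxel along each ray parallel to axis."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:   # project along z; result indexed [y][x]
        return [[max(volume[z][y][x] for z in range(nz)) for x in range(nx)] for y in range(ny)]
    if axis == 1:   # project along y; result indexed [z][x]
        return [[max(volume[z][y][x] for y in range(ny)) for x in range(nx)] for z in range(nz)]
    if axis == 2:   # project along x; result indexed [z][y]
        return [[max(volume[z][y][x] for x in range(nx)) for y in range(ny)] for z in range(nz)]
    raise ValueError("axis must be 0, 1 or 2")

if __name__ == "__main__":
    vol = [[[1, 5], [3, 2]],
           [[4, 0], [6, 1]]]
    print(mip(vol, axis=0))  # [[4, 5], [6, 2]]
```

With NumPy arrays, the same projection is `volume.max(axis=0)`; the explicit loops here only make the operation visible.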
Using an optimal set of thresholds, the proposed nodule detection method achieved an acceptable detection sensitivity (0.80). Meanwhile, using an optimal set of simple features and an SVM classifier, the average false positive rate was reduced to 3.9 Fps per case. The method was validated using 5-fold cross-validation. We explored how the performance of the nodule detection component changes as a function of several major parameters: the candidate nodule feature set, the number of applied threshold levels and the applied target nodule size criterion. Although finding the optimal feature set for classification is difficult, we used a greedy forward feature selection algorithm, which resulted in a set of 17 computationally simple features. Furthermore, the optimal values of the threshold and size parameters (i.e., Pt and Ps) were determined empirically.

In addition to detecting nodules automatically, the proposed CADx solution provides a 2D/3D visualization tool that supports radiologists in making a more accurate diagnosis. The visualization capabilities accelerate the analysis of CT images carried out by radiologists. To this end, we provided a set of 2D, 3D and projected 3D (MIP) graphical outputs at different steps of the processing pipeline. Fig. 13 and Fig. 14 show sample illustrations of these graphical outputs.

Table 2. Results of comparison of the proposed framework with previously published studies of lung nodule detection.

| CADx system | Number of patients | Number of all nodules | Number of detected nodules | Applied size criterion | Sensitivity (percent) | Fp/patient |
|---|---|---|---|---|---|---|
| (Tan et al., 2011) | 125 | 259 | 172 | ≥ 3 mm | 66.4 | 3 |
| (Golosio et al., 2009) | 84 | 148 | 117 | ≥ 4 mm | 79 | 4 |
| (Messay et al., 2010) | 84 | 143 | 118 | ≥ 3 mm | 82.5 | 4 |
| (Gurcan et al., 2002) | 34 | 63 | 53 | ≥ 3 mm | 84 | 5.5 |
| (Riccardi et al., 2011) | 154 | 385 | 189 | ≥ 5 mm | 49 | 4 |
| (Rubin et al., 2005) | 20 | 195 | 148 | ≥ 3 mm | 76 | 3 |
| Proposed framework | 60 | 211 | 169 | ≥ 4 mm | 80 | 3.9 |

An exact comparison with existing CADx systems is difficult because of variability in the image datasets, differences in labeling and scoring methods, and variations in validation and ground truth standards. Furthermore, some methods used internal or private datasets that are not publicly available. Nevertheless, a relative comparison is important. To this end, the performance of the proposed framework was evaluated by comparing it with 6 existing CADx methods/systems on the factors shown in Table 2. According to Table 2, with the assumed parameter settings, the proposed framework provides better detection results than most existing methods, since it detects 80% of the nodules (sensitivity 0.80) with an average of 3.9 Fps per case. However, two methods, those of Messay et al. (2010) and Gurcan et al. (2002), achieve sensitivities of 82.5% and 84%, respectively, which are slightly better than that of the proposed method. On the other hand, they produce on average 4 and 5.5 Fps per case, respectively. Although the detection results of the proposed framework are promising, the method still has some drawbacks. For instance, over- or underestimation of the nodules' volume can be considered a drawback: threshold-based segmentation may over- or underestimate the volume of the structures of interest, depending on the threshold level. To address this drawback, a future research direction is refinement of the segmentation of the detected nodules, which could be carried out using the gradient-based reshapable agents proposed in Alilou and Kovalev (2013).
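The discussion above mentions greedy forward feature selection and 5-fold cross-validation. The Python sketch below shows the generic form of both; it is not the authors' code. The `score` callback is where, in a setup like the one described, an SVM would be trained on the chosen feature subset and evaluated across folds (for example via `cross_validated_score`); that part is left abstract here, and the toy weights in the example are made up. Folds are formed by simple interleaving of sample indices, which is an assumption: the paper does not say how folds were built, and splitting by patient would avoid candidates from the same scan appearing in both training and test folds.

```python
from typing import Callable, List, Tuple

def k_fold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Split sample indices 0..n_samples-1 into k interleaved folds (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]

def cross_validated_score(n_samples: int,
                          evaluate: Callable[[List[int], List[int]], float],
                          k: int = 5) -> float:
    """Average evaluate(train_idx, test_idx) over k folds."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)

def greedy_forward_selection(n_features: int,
                             score: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Repeatedly add the feature that most improves score(subset); stop when none improves it."""
    selected: List[int] = []
    best = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best

if __name__ == "__main__":
    # Toy score: each feature contributes a fixed, made-up amount.
    weights = {0: 0.3, 1: 0.5, 2: -0.1, 3: 0.2}
    print(greedy_forward_selection(4, lambda subset: sum(weights[f] for f in subset)))
```

Greedy forward selection evaluates on the order of n² subsets rather than all 2ⁿ, which is what makes it practical for a candidate feature pool, at the cost of possibly missing the globally best subset.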
Furthermore, since the framework detects solid nodules without determining their type (i.e., whether a detected nodule is malignant or benign), a future research option is to extend the framework to handle ground glass opacity and part-solid nodules, and to train the method to distinguish between benign and malignant nodules.

CONCLUSION

We have proposed a new framework for segmentation and detection of solitary pulmonary nodules in CT images. Compared with 6 existing CADx methods, the proposed framework demonstrated an acceptable level of detection performance (sensitivity = 0.80, Fp/case = 3.9). In addition to this detection performance and its time efficiency, the method offers additional visualization capabilities. Hence, the developed framework could be considered a potential CADx tool for physicians in clinical practice.

REFERENCES

Alilou M, Kovalev V (2013). Automatic object detection and segmentation of the histocytology images using reshapable agents. Image Anal Stereol 32(2):89–99. Armato SG, Giger ML, Moran CJ, Blackburn JT, Doi K, MacMahon H (1999). Computerized detection of pulmonary nodules on CT scans. Radiographics 19(5):1303–11. Armato SG, Giger ML, MacMahon H (2001). Automated detection of lung nodules in CT scans: Preliminary results. Med Phys 28(8):1552–61. Armato SG, McLennan G, McNitt-Gray MF, Meyer CR, Yankelevitz D, Aberle DR, Henschke CI, Hoffman EA, Kazerooni EA, MacMahon H, et al (2004). Lung image database consortium: developing a resource for the medical imaging research community. Radiology 232(3):739–48. Awai K, Murao K, Ozawa A, Komi M, Hayakawa H, Hori S, Nishimura Y (2004). Pulmonary nodules at chest CT: Effect of computer-aided diagnosis on radiologists' detection performance. Radiology 230(2):347–52. Brown MS, Goldin JG, Rogers S, Kim HJ, Suh RD, McNitt-Gray MF et al (2005).
Computer-aided lung nodule detection in CT results of large-scale observer test. Acad Radiol 12:681–6. Burges CJ (1998). A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2(2):121–67. Chan HP, Hadjiiski L, Zhou C, Sahiner B (2008). Computer-aided diagnosis of lung cancer and pulmonary embolism in computed tomography–a review. Acad Radiol 15(5):535–55. Dehmeshki J, Ye X, Lin X, Valdivieso M, Amin H (2007). Automated detection of lung nodules in CT images using shape-based genetic algorithm. Comput Med Imag Grap 31(6):408–17. Doi K (2007). Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput Med Imag Grap 31(4–5):198–211. Golosio B, Masala GL, Piccioli A, Oliva P, Carpinelli P, Cataldo R et al (2009). A novel multithreshold method for nodule detection in lung CT. Med Phys 36(8):3607–18. Gurcan MN, Sahiner B, Petrick N, Chan HP, Kazerooni EA, Cascade PN, Hadjiiski L (2002). Lung nodule detection on thoracic computed tomography images: preliminary evaluation of a computer-aided diagnosis system. Med Phys 29(11):2552–8. Guyon I, Elisseeff A (2003). An introduction to variable and feature selection. J Mach Learn Res 3:1157–82. Hardie RC, Rogers SK, Wilson T, Rogers A (2008). Performance analysis of a new computer aided detection system for identifying lung nodules on chest radiographs. Med Image Anal 12(3):240–58. Hu S, Hoffman E, Reinhardt J (2001). Automatic lung segmentation for accurate quantitation of volumetric X-ray CT images. IEEE T Med Imaging 20(6):490–8. Kanazawa K, Kawata Y, Niki N, Satoh H, Ohmatsu H, Kakinuma R, Kaneko M, Moriyama N, Eguchi K (1998). Computer-aided diagnosis for pulmonary nodules based on helical CT images. Comput Med Imag Grap 22(2):157–67. Leader JK, Zheng B, Rogers RM, Sciurba FC, Perez A, Chapman BE, Patel S, Fuhrman CR, Gur D (2003).
Automated lung segmentation in X-ray computed tomography: development and evaluation of a heuristic threshold-based scheme. Acad Radiol 10(11):1224–36. Lee Y, Hara T, Fujita H, Itoh S, Ishigaki T (2001). Automated detection of pulmonary nodules in helical CT images based on an improved template matching technique. IEEE T Med Imaging 20(7):595–604. Li Q (2007). Recent progress in computer-aided diagnosis of lung nodules on thin section CT. Comput Med Imag Grap 31(4–5):248–57. Li Q, Li F, Doi K (2008). Computerized detection of lung nodules in thin section CT images by use of selective enhancement filters and an automated rule-based classifier. Acad Radiol 15(2):165–75. McNitt-Gray MF, Armato III SG, Meyer CR, Reeves AP, McLennan G, Pais RC, Freymann J, Brown MS, Engelmann RM, Bland PH, et al (2007). The lung image database consortium (LIDC) data collection process for nodule detection and annotation. Acad Radiol 14(12):1464–74. Messay T, Hardie RC, Rogers SK (2010). A new computationally efficient CAD system for pulmonary nodule detection in CT imagery. Med Image Anal 14(3):390–406. Riccardi A, Petkov TS, Ferri G, Masotti M, Campanini R (2011). Computer-aided detection of lung nodules via 3D fast radial transform, scale space representation, and Zernike MIP classification. Med Phys 38(4):1962–71. Rubin GD, Lyo JK, Paik DS, Sherbondy AJ, Chow LC, Leung AN, Mindelzun R, Schraedley-Desmond PK, Zinck SE, Naidich DP, et al (2005). Pulmonary nodules on multidetector row CT scans: Performance comparison of radiologists and computer-aided detection. Radiology 234(1):274–83. Saba L, Caddeo G, Mallarini G (2007). Computer-aided detection of pulmonary nodules in computed tomography: analysis and review of the literature. J Comput Assist Tomo 31(4):611–9. Siegel R, Naishadham D, Jemal A (2012). Cancer statistics, 2012. CA-Cancer J Clin 62(1):10–29. Sluimer I, Schilham A, Prokop M, Van Ginneken B (2006). Computer analysis of computed tomography scans of the lung: a survey.
IEEE T Med Imaging 25(4):385–405. Sone S, Li F, Yang Z, Honda T, Maruyama Y, Takashima S, Hasegawa M, Kawakami S, Kubo K, Haniuda M, et al (2001). Results of three-year mass screening programme for lung cancer using mobile low-dose spiral computed tomography scanner. Brit J Cancer 84(1):25–32. Tan M, Deklerck R, Jansen B, Bister M, Cornelis J (2011). A novel computer-aided lung nodule detection system for CT images. Med Phys 38(10):5630–45. Teramoto A, Fujita H (2013). Fast lung nodule detection in chest CT images using cylindrical nodule-enhancement filter. Int J Comput Assist Radiol Surg 8:193–205. Way TW, Hadjiiski LM, Sahiner B, Chan HP, Cascade PN, Kazerooni EA, Bogot N, Zhou C (2006). Computer-aided diagnosis of pulmonary nodules on CT scans: segmentation and classification using 3D active contours. Med Phys 33(7):2323–37. Ye X, Lin X, Dehmeshki J, Slabaugh G, Beddoe G (2009). Shape-based computer-aided detection of lung nodules in thoracic CT images. IEEE T Bio-Med Eng 56(7):1810–20.