Image Processing and Artificial Intelligence Applications in the Detection of Lung Cancer


 

 

 

 

Eng: Bodoor alabssi(1)   Eng:  Khaled otour(2)

 

 

Eng: Jamil hanouneh(3)

 

Syria-Lattakia, Tishreen University,

bodooralabssi@gmail.com (1)    

 

Germany - Nuremberg, FAU University,

khaled.a.m.otour.1@gmail.com(2)

 

Germany - Nuremberg, FAU University,

jamil.hanouneh1997@gmail.com (3)

 

 

 

 

Supervised by: Dr. Ghada Saad

 

 

 



Contents

 

Introduction

 

Results and notes

 

 

Methodology of work in our research

 

 

Cancer mass extraction algorithm

 

 

Neural network algorithm

 

 

Future prospects

 

 

References

 

 

Abstract

Cancer is one of the most serious health problems in the world. Lung cancer has the highest death rate of all cancer types and one of the lowest survival rates, so diagnosing it at an early stage is of paramount importance for reducing the high death rate.

The aim of this research is to reduce diagnosis time by helping the doctor detect lung cancer early. It uses image processing techniques (the FCM and CCL algorithms) and artificial intelligence (a convolutional neural network, CNN), so that the computer can distinguish cancerous nodules with an ability close to that of a human in detecting suspicious areas.

As the neural network was trained, its accuracy improved until the training accuracy reached 98.73% and the test accuracy (val_accuracy) reached 96%, which indicates that the network has learned the task.

 

 

 

 

Keywords

Convolutional neural network (CNN), CCL algorithm, morphological operations, FCM algorithm.

 

 

 

 

 

Introduction

Lung cancer is a disease characterized by the uncontrolled division of lung cells and by the ability of these cells to invade and spread to other tissues, either by growing directly into adjacent tissue or by migrating to distant tissues. Exposure to tobacco smoke is the main cause, accounting for about 90% of cases.

Diagnosing lung cancer at an early stage is of utmost importance if the goal is to reduce the high mortality rate.

This research is concerned with early-stage diagnosis through segmentation techniques and algorithms, which are a basic concept of image processing. The resulting tomographic images are then analyzed with artificial intelligence in the Python language.

The aim of this research is to reduce diagnosis time by helping the doctor detect lung cancer early using image processing techniques and artificial intelligence, in support of prevention and treatment.

Many previous projects and studies have addressed this subject. They either improve image extraction or apply image processing techniques to highlight prominent regions in the images.

The first study: detection of lung cancer stages on CT images using different image processing techniques

The CT images were obtained from the following image database:

NIH/NCI Lung Image Database Consortium (LIDC) dataset, in DICOM (Digital Imaging and Communications in Medicine) format.

The algorithm used in this research:

• Image capture

• Pre-processing: image smoothing and enhancement are applied at this stage

• Feature extraction: the basic features used are area, perimeter, and eccentricity

• Identification of the region of lung cancer cells
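The NumPy sketch below illustrates the feature-extraction step. It computes the three features the first study names (area, perimeter, eccentricity) for a single region given as a binary mask. It is not the study's own code, and several details are assumptions:

- The function name `region_features` is hypothetical.
- The perimeter is approximated by counting boundary pixels.
- The eccentricity is taken from the ellipse that has the same second moments as the region.

```python
import numpy as np

def region_features(mask):
    """Return (area, perimeter, eccentricity) of one binary region.

    Illustrative sketch (hypothetical helper): perimeter = number of boundary
    pixels, eccentricity = that of the ellipse with the same second moments.
    """
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())                       # number of foreground pixels
    if area == 0:
        return 0, 0, 0.0

    # A pixel is interior if all four 4-connected neighbours are foreground.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())    # foreground but not interior

    # Eigenvalues of the pixel-coordinate covariance give the ellipse axes.
    rows, cols = np.nonzero(mask)
    cov = np.cov(np.vstack([rows, cols]), bias=True)
    lam_min, lam_max = np.linalg.eigvalsh(cov)   # ascending order
    if lam_max <= 0:                             # single pixel: no elongation
        return area, perimeter, 0.0
    eccentricity = float(np.sqrt(max(0.0, 1.0 - lam_min / lam_max)))
    return area, perimeter, eccentricity
```

A compact region such as a square gives an eccentricity near 0, while an elongated strip gives a value near 1.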

 

 

 

 

 

Results and notes

 

Result: the algorithm uses image processing techniques effectively to locate the suspicious area or nodules in the lung, even though the shape of the suspicious area differs from one image to another depending on the degree of tumor growth in the patient.

Notes:

• The algorithm may fail to detect the suspicious area in some images unless some parameters of the image processing techniques are changed.

• The researcher did not take into account the difference in volume between the right and left lungs, so the results are not always equally accurate.

The second study: detection of lung cancer using image processing

This research aims to detect the suspicious area in lung radiographs. It uses an algorithm intended to give more accurate results than the previous study by taking into account the difference in volume between the right and left lungs.

Algorithm:

1. Image acquisition: the researcher used 300 lung X-ray images, some containing a tumor and some normal.

2. Image preprocessing:

• Conversion to grayscale

• Normalization and resizing

• Noise reduction

3. Binarization: the noise-free grayscale image is converted into a binary image.

4. Segmentation: the process of dividing a digital image into multiple segments. The goal of segmentation is to simplify and/or change the representation of an image into something more meaningful and easier to analyze.
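The preprocessing and binarization steps of the second study can be sketched as follows. The study does not give its parameters, so the following are all assumptions made for illustration:

- the target size (256 x 256),
- the 3 x 3 median filter,
- the fixed threshold of 0.5,
- the function name `preprocess_xray`.

Segmentation would then operate on the returned binary image.

```python
import numpy as np
from scipy import ndimage

def preprocess_xray(rgb, size=(256, 256), threshold=0.5):
    """Grayscale -> resize -> normalize -> denoise -> binarize (illustrative sketch).

    Assumptions: RGB input, bilinear resizing, 3 x 3 median filter, fixed threshold.
    """
    rgb = np.asarray(rgb, dtype=float)
    # 1. Grayscale conversion with ITU-R BT.601 luminance weights.
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # 2. Resize to a common size (bilinear) and normalize intensities to [0, 1].
    factors = (size[0] / gray.shape[0], size[1] / gray.shape[1])
    gray = ndimage.zoom(gray, factors, order=1)
    gray = (gray - gray.min()) / (gray.max() - gray.min() + 1e-12)
    # 3. Noise reduction with a 3 x 3 median filter.
    gray = ndimage.median_filter(gray, size=3)
    # 4. Binarization with a fixed (hypothetical) threshold.
    return gray > threshold
```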

Results and notes

The algorithm was accurate, especially in distinguishing the volumes of the right and left lungs, and it gave good results across various images.

Notes:

The algorithm cannot detect the suspicious area in all images, because the required parameter values differ between images; this is a weak point of the algorithm.

 

 

 

 

 

 

Methodology of work in our research

 

The importance of this research lies in computer-assisted diagnosis, which typically identifies suspicious areas. CT imaging of the chest may produce several hundred two-dimensional slices per patient, and interpreting this data demands a great deal of time and concentration from the radiologist. Diagnostic imaging techniques produce a large amount of information, which the radiologist must analyze and evaluate comprehensively in a short time.

In this research, we used the Python programming language together with libraries such as OpenCV, Pandas, Keras, scikit-fuzzy (skfuzzy), and PyQt.

Software used:

• Anaconda: a distribution used for big data analysis and deep learning, which provides easy access to packages and libraries such as TensorFlow and Keras.

• Jupyter Notebook: one of the most important tools used, included in Anaconda. Jupyter is an open-source web application for creating and sharing documents that contain live code.

• PyQt5 Designer: used to build the GUI (graphical user interface). The Python code written in Jupyter Notebook cells can easily be linked to the interface designed in Qt Designer.

In this research, we applied our Anaconda-based program to computed tomography (CT) images of the chest. These are parallel slices quantized to 8 bits (256 gray levels), with dimensions of 512 x 512 pixels, saved in DICOM (Digital Imaging and Communications in Medicine) format.

The CT images were obtained from the following dataset:

NIH/NCI Lung Image Database Consortium (LIDC)

A total of 212 healthy images and 204 lung cancer images were used to train a CNN-type neural network.

 

 

 

 

 

 

Cancer mass extraction algorithm

 

The first stage: extracting the lungs from the CT image

1. Image thresholding: each input image is thresholded automatically, with a threshold computed for that image. Every pixel above the threshold is set to zero and every pixel below it is set to one. The idea is to separate the image into two parts, the background and the foreground, as shown in Figure (1).

Figure (1) Image after thresholding
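The document does not name the automatic thresholding method. The sketch below assumes Otsu's method, a common automatic choice. It reproduces the convention described above: pixels above the threshold become 0, and pixels at or below it become 1. The function names are hypothetical.

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's threshold for an 8-bit grayscale image (maximizes between-class variance)."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of class "<= t" for each t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_t = mu[-1]                            # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)         # t values with an empty class score 0
    return int(np.argmax(sigma_b))

def threshold_inverted(img):
    """Pixels above the automatic threshold -> 0, pixels at or below it -> 1."""
    img = np.asarray(img, dtype=np.uint8)
    t = otsu_threshold(img)
    return (img <= t).astype(np.uint8)
```

With OpenCV, a close equivalent is `cv2.threshold(img, 0, 1, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)`. It returns both the computed threshold and the inverted binary image.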

2. Lung mask: by observation, the lungs are always located in the center of the image. We therefore delete all elements far from the center by applying a circular mask with dimensions of 221 x 221, as in Figure (2). All pixels inside the circle have the value one, and pixels outside it have the value zero.

 

Figure (2) Lung mask with dimensions (221 x 221) applied to the image of the lungs

After merging the mask with the thresholded image, we obtain the result shown in Figure (3): all the components in the background are removed and the lungs are kept.

Figure (3) The result of applying the mask to the thresholded image
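A minimal NumPy sketch of the circular lung mask follows. It assumes that 221 is the diameter of the circle in pixels and that the circle is centred in the image. The names `circular_mask` and `apply_lung_mask` are hypothetical. Merging the mask with the thresholded image is a pixel-wise multiplication, so everything outside the circle becomes zero.

```python
import numpy as np

def circular_mask(shape, diameter=221):
    """Binary disk (1 inside, 0 outside) centred in an image of the given shape.

    Assumption: 221 is the circle's diameter in pixels.
    """
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    cy, cx = (shape[0] - 1) / 2.0, (shape[1] - 1) / 2.0
    radius = diameter / 2.0
    return ((rows - cy) ** 2 + (cols - cx) ** 2 <= radius ** 2).astype(np.uint8)

def apply_lung_mask(binary_img, diameter=221):
    """Keep only the thresholded pixels that fall inside the central disk."""
    binary_img = np.asarray(binary_img, dtype=np.uint8)
    return binary_img * circular_mask(binary_img.shape, diameter)
```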

 

 

 

 

 

 

Applying the CCL (connected component labeling) algorithm

 

After merging the mask with the thresholded image, we label the connected components in preparation for deleting some of them according to specific conditions. Since the lungs are the largest organs in the image, we delete all elements except the two largest. From experience and observation, the lung area ranges from about 4,000 to about 35,000 pixels.
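The component-selection step can be sketched with `scipy.ndimage.label`. The sketch assumes the two rules above are combined in this order:

1. Discard components outside the 4,000 to 35,000 pixel range.
2. Keep the two largest of the remaining components.

The function name `keep_lungs` is hypothetical.

```python
import numpy as np
from scipy import ndimage

def keep_lungs(binary_img, min_area=4000, max_area=35000):
    """Keep at most the two largest connected components whose area (pixels)
    lies in [min_area, max_area]; all other pixels are set to 0."""
    binary_img = np.asarray(binary_img)
    labels, n = ndimage.label(binary_img)                     # 4-connectivity by default
    areas = np.bincount(labels.ravel(), minlength=n + 1)[1:]  # areas of labels 1..n
    order = np.argsort(areas)[::-1]                           # largest component first
    keep = [int(i) + 1 for i in order if min_area <= areas[i] <= max_area][:2]
    return np.isin(labels, keep).astype(np.uint8)
```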

3. Morphological operations: in some cases the mass lies on the outer wall of the lung and may be missed. We therefore perform two dilations so that the mask covers the mass: one with a 2 x 2 square structuring element and another with a 7 x 7 square structuring element. We then fill the holes using a hole-filling algorithm to obtain the lung region, as in Figure (4).

Figure (4) Image after lung extraction and void filling
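The dilation and hole-filling step can be sketched with SciPy's binary morphology functions, using the 2 x 2 and 7 x 7 square structuring elements given above. The function name is hypothetical. The input is assumed to be the binary lung mask produced by the CCL step.

```python
import numpy as np
from scipy import ndimage

def close_lung_border(lung_mask):
    """Dilate with 2 x 2 and then 7 x 7 square structuring elements, then fill holes."""
    mask = np.asarray(lung_mask, dtype=bool)
    mask = ndimage.binary_dilation(mask, structure=np.ones((2, 2), dtype=bool))
    mask = ndimage.binary_dilation(mask, structure=np.ones((7, 7), dtype=bool))
    return ndimage.binary_fill_holes(mask).astype(np.uint8)
```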

4. Retrieving the gray levels: after determining the lung regions, removing the background and small components, and filling the holes in the lungs, we restore the original gray levels within the lung region to obtain the lungs, as in Figure (5).

 

Figure 5: Image of the lungs after retrieving the grayscale values
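Retrieving the gray levels amounts to using the binary lung mask to select pixels from the original slice. A minimal sketch with a hypothetical function name:

```python
import numpy as np

def retrieve_gray_levels(ct_slice, lung_mask):
    """Restore the original gray values inside the lung mask; all other pixels become 0."""
    ct_slice = np.asarray(ct_slice)
    inside = np.asarray(lung_mask, dtype=bool)
    return np.where(inside, ct_slice, 0).astype(ct_slice.dtype)
```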

 

The second stage: locating the suspicious area

1. The FCM algorithm:

Fuzzy c-means (FCM) is one of the popular clustering algorithms for medical image segmentation. However, FCM is highly sensitive to noise because it does not take spatial information into account when segmenting the image.

We apply FCM to the extracted lung image with the default options and set the number of clusters to three. One cluster contains the pixels in the background of the image, the second contains the suspicious areas, and the third contains the pixels of normal areas.


Figure (6) The images formed by the pixels of each of the three clusters
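The authors used scikit-fuzzy. To keep the example self-contained, the sketch below implements standard fuzzy c-means directly in NumPy on the pixel intensities, with three clusters and fuzzifier m = 2. Two points are assumptions:

- The centres are initialised evenly between the minimum and maximum intensity. scikit-fuzzy's `cmeans` starts from a random partition by default.
- The suspicious cluster is taken to be the one with the brightest centre, since dense tissue such as a nodule usually appears brighter than the air-filled lung on CT.

The function names are hypothetical.

```python
import numpy as np

def fuzzy_cmeans_1d(values, c=3, m=2.0, max_iter=300, tol=1e-6):
    """Standard FCM on 1-D intensities. Returns (centers, memberships of shape (c, N))."""
    x = np.asarray(values, dtype=float).ravel()
    centers = np.linspace(x.min(), x.max(), c)      # assumption: evenly spaced start
    for _ in range(max_iter):
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-12          # (c, N)
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)
        um = u ** m
        new_centers = um @ x / um.sum(axis=1)        # membership-weighted means
        converged = np.max(np.abs(new_centers - centers)) < tol
        centers = new_centers
        if converged:
            break
    return centers, u

def suspicious_cluster_mask(lung_img, c=3):
    """Hard-assign each pixel to its highest-membership cluster and return the
    brightest cluster as a binary mask (assumption: brightest = suspicious)."""
    centers, u = fuzzy_cmeans_1d(lung_img, c=c)
    labels = np.argmax(u, axis=0).reshape(np.shape(lung_img))
    return (labels == np.argmax(centers)).astype(np.uint8)
```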

2. Identifying the suspicious area and deleting small regions

We first apply a closing with a 3 x 3 square structuring element and then an opening with a 5 x 5 square structuring element. This removes unwanted regions in the image other than the suspicious areas.

To isolate the mass, we apply two erosions with structuring elements of dimensions 3 x 4 and 4 x 3.

We use connected component labeling to remove the small elements. The mass is then extracted based on its shape, using the ratio of the minor axis length to the major axis length. The suspicious mass has a shape close to a circle (a ratio close to one) and a large area relative to the other elements in the image, so these two factors were used to extract the mass.
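The clean-up and selection steps above can be sketched with SciPy as follows. The structuring-element sizes follow the text. The document does not state its thresholds, so these are hypothetical choices:

- the minimum area (30 pixels),
- the minimum minor/major axis ratio (0.5),
- the function names.

The axis ratio is computed from the ellipse with the same second moments as each component, so a circle gives a ratio of 1 and a thin strip a ratio near 0.

```python
import numpy as np
from scipy import ndimage

def axis_ratio(region):
    """Minor/major axis ratio of the ellipse with the region's second moments."""
    rows, cols = np.nonzero(region)
    if rows.size < 2:
        return 0.0
    lam_min, lam_max = np.linalg.eigvalsh(np.cov(np.vstack([rows, cols]), bias=True))
    return float(np.sqrt(max(lam_min, 0.0) / lam_max)) if lam_max > 0 else 0.0

def extract_mass(cluster_mask, min_area=30, min_ratio=0.5):
    """Closing 3x3, opening 5x5, erosions 3x4 and 4x3, then keep the largest
    roughly circular component (min_area and min_ratio are assumptions)."""
    m = np.asarray(cluster_mask, dtype=bool)
    m = ndimage.binary_closing(m, structure=np.ones((3, 3), dtype=bool))
    m = ndimage.binary_opening(m, structure=np.ones((5, 5), dtype=bool))
    m = ndimage.binary_erosion(m, structure=np.ones((3, 4), dtype=bool))
    m = ndimage.binary_erosion(m, structure=np.ones((4, 3), dtype=bool))
    labels, n = ndimage.label(m)
    best, best_area = 0, 0
    for lab in range(1, n + 1):
        region = labels == lab
        area = int(region.sum())
        if area >= min_area and axis_ratio(region) >= min_ratio and area > best_area:
            best, best_area = lab, area
    if best == 0:
        return np.zeros(m.shape, dtype=np.uint8)
    return (labels == best).astype(np.uint8)
```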

3. Coloring the suspicious area in red and overlaying it on the original chest image

After performing the morphological operations, deleting small elements, and obtaining the mass, we color the mass red. The red component is set to its maximum value and the green and blue components to zero. The result is then merged with the original image to show the location of the mass in red.

Figure (7) The resulting images, merging the mass colored in red with the original image
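The coloring step can be sketched in NumPy as below, assuming an 8-bit grayscale slice and a binary mass mask of the same size (hypothetical function name). OpenCV stores color images in BGR order, so with OpenCV the red value would go in the last channel instead.

```python
import numpy as np

def overlay_red(gray, mass_mask):
    """Color the mass red (R = 255, G = B = 0) on top of an 8-bit grayscale image."""
    gray = np.asarray(gray, dtype=np.uint8)
    rgb = np.stack([gray, gray, gray], axis=-1)   # grayscale -> RGB copy
    inside = np.asarray(mass_mask, dtype=bool)
    rgb[inside] = (255, 0, 0)                     # maximum red, zero green and blue
    return rgb
```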

 

 

 

 

 

Neural network algorithm

 

 

1- Obtaining training data: these algorithms depend mainly on data, which are used to adjust the weights of the neural network during training. Through training, the network gains the knowledge and ability to perform its assigned task, such as classification or prediction. Some images of cases of various types of lung cancer were obtained from the Internet.

2- Image preprocessing: the size and color mode of the images fed to the neural network are unified. The images are converted from color to grayscale, and resizing or cropping to the region of interest is performed as needed. We used grayscale images of size 80 x 80.
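The sketch below shows this preprocessing, under the following assumptions:

- The input images are 8-bit.
- Resizing is bilinear, using `scipy.ndimage.zoom`; the authors may have used OpenCV instead.
- The function name is hypothetical.
- A trailing channel axis is added, since Keras convolutional layers expect an explicit channel axis for grayscale input.

```python
import numpy as np
from scipy import ndimage

def prepare_for_cnn(image, size=80):
    """Grayscale -> size x size -> scale to [0, 1] -> add a channel axis.

    Assumptions: 8-bit input, bilinear resizing.
    """
    img = np.asarray(image, dtype=float)
    if img.ndim == 3:                                    # color -> grayscale (BT.601 weights)
        img = img[..., :3] @ np.array([0.299, 0.587, 0.114])
    factors = (size / img.shape[0], size / img.shape[1])
    img = ndimage.zoom(img, factors, order=1)           # bilinear resize
    img = img / 255.0                                    # 8-bit range -> [0, 1]
    return img[..., np.newaxis]                          # shape (size, size, 1)
```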

 

3- Building the CNN:

The basic components of a convolutional neural network are the convolutional layer, the pooling layer, and the fully connected layer.

At this stage, the number of neurons in the input layer (the input shape) is determined, and the layers are then built sequentially, one after another. Figure (8) shows the structure of the neural network used.

 

Figure (8) The structure of the neural network used

 

Convolutional layers are built by defining the number and size of the filters and the activation function used. The task of these layers is to extract the information and features present in the images. A pooling layer is then added, using an appropriate pooling filter, to retain the most important information and make the network more robust to noise.

The final output is then flattened into a single row of neurons and passed as input to the fully connected layers responsible for classification. The fully connected part uses dropout, which randomly drops some nodes during training. This prevents the network from becoming biased toward (memorizing) very specific patterns and gives it the ability to handle new, similar, more general patterns.

The construction of the neural network ends by setting the number of output neurons (the output of the last fully connected layer) equal to the number of classes to be learned in classification.
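Figure (8) is not reproduced in the text, so the sketch below does not reproduce the authors' architecture. It is a plain NumPy forward pass showing what each layer type described above does:

- a convolutional layer with ReLU,
- 2 x 2 max pooling,
- flattening into a row of neurons,
- dropout,
- a softmax output layer with one neuron per class.

The filter count (8), the 3 x 3 kernel size, the random weights and the two output classes (healthy / cancer) are illustrative assumptions. The authors used Keras, which provides these layers ready-made.

```python
import numpy as np

def conv2d_relu(img, kernels, bias):
    """'Valid' 2-D convolution (cross-correlation, as CNN libraries compute it)
    followed by ReLU. img: (H, W); kernels: (F, k, k); bias: (F,)."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(img, (k, k))  # (H-k+1, W-k+1, k, k)
    out = np.einsum("ijab,fab->fij", windows, kernels) + bias[:, None, None]
    return np.maximum(out, 0.0)

def max_pool2x2(fmap):
    """2 x 2 max pooling with stride 2 on (F, H, W) maps; an odd last row/column is dropped."""
    f, h, w = fmap.shape
    fmap = fmap[:, : h - h % 2, : w - w % 2]
    return fmap.reshape(f, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def dropout(x, rate, rng):
    """Inverted dropout (training only): zero a fraction `rate` of units, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def dense_softmax(x, weights, bias):
    """Fully connected output layer with one neuron per class, followed by softmax."""
    z = x @ weights + bias
    e = np.exp(z - z.max())                  # subtract the max for numerical stability
    return e / e.sum()

# Forward pass for one 80 x 80 grayscale image with random (untrained) weights.
rng = np.random.default_rng(0)
image = rng.random((80, 80))
fmap = max_pool2x2(conv2d_relu(image, rng.standard_normal((8, 3, 3)), np.zeros(8)))
flat = dropout(fmap.ravel(), 0.5, rng)       # flatten into a "row of neurons"
probs = dense_softmax(flat, 0.01 * rng.standard_normal((flat.size, 2)), np.zeros(2))
```

Here `fmap` has shape (8, 39, 39): the 3 x 3 convolution gives 80 - 3 + 1 = 78, and pooling halves this to 39. `probs` holds the two class probabilities.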

 

4- Training the neural network: here the weights of the connections between the neurons of the structure described above are determined. Input data flowing through these neurons should reach the output neuron that represents the correct class. The weights are adjusted until the network's output matches the known output.
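The example below makes "adjusting the weights until the output matches the known output" concrete. It is deliberately tiny: a single sigmoid neuron trained by gradient descent on the cross-entropy loss, written in NumPy. It is not the authors' CNN training code, since the paper does not state the optimizer or loss. In Keras, the same idea is carried out by `model.compile(...)` and `model.fit(...)`, with backpropagation through all layers. The function name, learning rate and epoch count are assumptions.

```python
import numpy as np

def train_logistic_neuron(x, y, lr=0.5, epochs=200):
    """Adjust the weights of one sigmoid neuron by gradient descent on the
    binary cross-entropy loss; returns (weights, bias, loss per epoch)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))           # neuron output
        eps = 1e-12                                       # avoid log(0)
        losses.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        grad = p - y                                      # dLoss/dz for sigmoid + cross-entropy
        w -= lr * x.T @ grad / len(y)                     # move weights against the gradient
        b -= lr * grad.mean()
    return w, b, losses
```

On a small separable dataset, the loss falls from ln 2 (all outputs 0.5 at the start) as the weights are adjusted, and the thresholded outputs come to match the known labels.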

5- Testing the trained neural network: after training is complete, the network is tested on new images within the scope of the problem (lung cancer). This verifies the classification and confirms that the network has learned correctly and is ready for use.

Advantages and disadvantages of using a CNN

Advantages:

1- Neural networks extract features automatically in their layers, which makes it easier to obtain specific features and patterns useful for classification.

2- The training and test results showed high accuracy.

Disadvantages:

1- Despite the high accuracy, the neural network cannot yet be generalized on a larger scale (diagnosing new images), because of the limited data on which it was trained and tested. It is quite likely that the patterns the network has learned for diagnosis are insufficient.

Results and discussion

We have 395 training samples and 21 test samples. In the first epoch:

• the training loss was 0.7383;

• the training accuracy was 0.5089;

• the test loss (val_loss) was 0.6965;

• the test accuracy (val_accuracy) was 0.0952.

As the neural network was trained, the accuracy improved until the training accuracy reached 0.9873 and the test accuracy (val_accuracy) reached 0.96. This is considered excellent accuracy, so we stopped training the neural network at that point.

Figure (9) shows the classification error (loss) on the training and test data.

Figure (9) Classification error (loss) on the training and test data

The previous figure plots the loss value against the training epochs. The loss decreases from 0.7383 at the beginning to 0.0455 after training. By the end of training, the training curve is close to the test curve, which indicates that the neural network has learned.

Figure (10) shows the classification accuracy on the training and test data.

Figure (10) Classification accuracy on the training and test data

 

The previous figure shows the training curve (blue) and the test curve (orange). The test curve rises at the beginning but does not approach the training curve. As training continues, the two curves approach each other by the end of the figure, which indicates that the neural network has learned.

Figure (11) shows the interface of the final application.

 

Figure (11) The interface of the final application

In the application, after the morphological operations, the deletion of small elements, and the extraction of the mass, the mass is colored red. The red component is set to its maximum value and the green and blue components to zero, and the result is merged with the original image to show the mass's location.

Figure (12) shows the result obtained from the program.

 

 

 

 

Figure (12) The result obtained from the program

 

 

 

 

 

 

 

 

 

 

 

 

Future prospects

 

• Increasing the training data: large medical datasets carry rich information that specific algorithms can exploit, for example in analyzing diagnostic errors. Much research has been conducted in this field to detect different types of tumors of various sizes and shapes.

• Using 3D images: processing quality can be improved by using 3D images, which capture more detail and look more realistic. They also give accurate results for the location of the mass and its anatomical extent along the three axes. However, they require medical servers and specialized processors capable of carrying out the processing.

 

 

 

 

 

References


 

 

[1] Ammar, M. (2013). "Medical image display and processing systems". Damascus University.

[2] Chaudhary, A. and Singh, S. (2012). "Lung Cancer Detection Using Digital Image Processing". International Journal of Research in Engineering and Applied Sciences (IJREAS), vol. 2, no. 2, Feb. 2012.

[3] Umbaugh, S. E. (2017, Nov. 30). Digital Image Processing and Analysis with MATLAB and CVIPtools, Third Edition.

[4] Stoitsis, J., Valavanis, I., Mougiakakou, S., Golemati, S., Nikita, A. and Nikita, K. (2006). "Computer aided diagnosis based on medical image processing and artificial intelligence methods". ScienceDirect, 569, pp. 591-595.

[5] Hudson, D. and Cohen, M. "Neural Networks and Artificial Intelligence for Biomedical Engineering". The Institute of Electrical and Electronics Engineers (IEEE), New York, pp. 45-56.

[6] Hayat, M. A. (2007). "Cancer Imaging: Lung and Breast Carcinomas", Volume 1. Academic Press.