PARTICULATE MATTER LEVELS
CLASSIFICATION USING MODIFIED AND
COMBINED RESNET MODELS WITH LOW
FEATURES EXTRACTION
Rayan Awni Matloob
University of Duhok, College of Engineering, Electrical and Computer
Department, Zakho Street 38, Kurdistan Region (Iraq).
https://orcid.org/0000-0001-7406-8196
Mohammed Ahmed Shakir*
University of Duhok, College of Engineering, Electrical and Computer
Department, Zakho Street 38, Kurdistan Region (Iraq)
Mohammed.shakir@uod.ac
Reception: 29/01/2023 Acceptance: 09/04/2023 Publication: 28/04/2023
Suggested citation:
A.M., Rayan and A.S., Mohammed. (2023). Particulate Matter Levels
Classification Using Modified and Combined ResNet Models with Low
Features Extraction. 3C TIC. Cuadernos de desarrollo aplicados a las TIC,
12(1), 378-398. https://doi.org/10.17993/3ctic.2023.121.378-398
https://doi.org/10.17993/3ctic.2023.121.378-398
3C TIC. Cuadernos de desarrollo aplicados a las TIC. ISSN: 2254-6529
Ed.42 | Iss.12 | N.1 January - March 2023
ABSTRACT
Smog is a serious environmental problem. It is an atmospheric pollutant that, if
inhaled frequently, can lead to lung diseases such as asthma and bronchitis. One of
the most dangerous air pollutants is particulate matter with a diameter of less than
2.5 micrometers (PM2.5), which can be breathed into the body and cause major
health issues by carrying dangerous compounds deep into the lungs and
bloodstream. In this research, a new convolutional neural network is proposed by
upgrading and stacking in parallel the two pre-trained models ResNet18 and ResNet50
to form a new modified, combined convolutional model (C-DCNN). In addition, we
stack another two columns of layers to extract the low-level features of ResNet18 and
ResNet50 separately, giving four stacked columns of layers in total. The new model
classifies images into different classes based on their PM2.5 concentration levels. To
assess the suggested approach, image augmentation is applied and the images are then
divided randomly (80% for training, with 20% of that training data held out for
validation, and 20% for testing). The experimental results demonstrate that the
proposed method increases the accuracy of level estimation by 6.25% (at LR = 0.0007)
compared to ResNet50.
KEYWORDS
Deep Learning, Combined Convolutional Neural Network, ResNet, Image
Classification, Air Quality, Particulate Matter.
PAPER INDEX
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. RELATED WORKS
2.1. COMPARISON WITH OTHER MACHINE LEARNING WORKS
3. CONTRIBUTIONS AND MATERIAL
3.1. CONTRIBUTIONS
3.2. DATASET AND AUGMENTATION
4. METHODOLOGY
4.1. TRANSFER LEARNING FOR TRAINING DEEP LEARNING MODELS
4.2. RESNET18 ARCHITECTURE
4.3. RESNET50 ARCHITECTURE
4.4. COMBINING MODELS AND ADDED LAYERS
4.4.1. DROPOUT LAYER
4.4.2. 2D MAX POOLING AND AVERAGE LAYERS
4.4.3. FLATTEN LAYER
4.4.4. LSTM LAYERS (LONG SHORT-TERM MEMORY)
4.4.5. DENSELY CONNECTED LAYERS
5. EXPERIMENTATION SETTINGS
5.1. IMAGES CLASSIFICATION RESULTS
6. CONCLUSION AND FUTURE WORK
REFERENCES
1. INTRODUCTION
Air pollution is the contamination of the air with toxins that are harmful to humans
and other living beings, or that degrade the environment or materials [1].
It can cause sickness, allergies, or even death in humans.
It can also harm other living things, such as food crops and animals, and damage the
natural and built environment through effects such as acid rain, ozone depletion,
climate change, and habitat degradation [2]. With anthropogenic ozone and PM2.5
causing 2.1 million fatalities per year, outdoor air pollution from the combustion of
fossil fuels alone is one of the leading causes of human mortality [3, 4]. Particulate
matter (PM), also called atmospheric particulate matter (APM) or fine particles,
consists of minute solid or liquid particles suspended in a gas [4]. The 2.5 refers to
the particle size: a particle counted as PM2.5 has a diameter of 2.5 micrometers or
less (about 1/30th of the width of a human hair) [3, 4]. Some regions, such as China
and Taiwan, have installed PM2.5 monitoring stations in their biggest cities, but the
stations are expensive and resource-intensive, and because each station covers a
limited area, a large city needs more than one. Using deep learning to estimate air
quality through image classification or regression is one of the better methods in this
field. Image-based automated information extraction has been the subject of a lot of
work in machine learning and computer vision. Images are an efficient and easy input
because smartphones are widespread and anyone can capture an image whenever and
wherever they want. This study introduces an image-based PM2.5 analysis technique
that uses a deep learning network to categorize the PM2.5 concentration levels of
outdoor images. In contrast to PM2.5 analysis methods based on extracted image
features, the method suggested in this study uses a state-of-the-art CNN for image
analysis, because of the CNN's explicit end-to-end design and its ability to
automatically extract both low-level and high-level image characteristics. The
Shanghai dataset (1052 photos, one scene) from [5] was used to test our methodology.
The rest of this study is organized as follows. Section 2 describes earlier work on
machine learning for image classification. Section 3 discusses the contributions of
our study and the material used in the experiments. Section 4 presents our suggested
approach based on concatenated convolutional neural networks. Section 5 reports the
experimental findings and shows how well our strategy performed. Section 6
concludes the paper, followed by the references.
2. RELATED WORKS
The authors of [6] merged meteorological data and images to predict the PM2.5 indices
of outdoor photos, using support vector regression (SVR) and deep learning techniques.
Their approach uses two datasets gathered in Beijing and Shanghai, China, and a
constructed SVR model that integrates the PM2.5 value predicted by
the CNN with two meteorological parameters, wind speed and humidity, to produce
the final PM2.5 index estimate. For the Shanghai
dataset, the proposed model reduced the RMSE by 26.08% and increased the R-squared
by 24.57%. For the Beijing dataset, the proposed model reduced the
RMSE by 5.27% to 56.03 and increased the R-squared by 8.4% to 0.6046.
The authors of [7] estimated PM2.5 concentrations from photos taken outside, using
an ensemble of deep neural network-based regressors.
Using a feedforward neural network and a dataset of 1460 pictures for performance
analysis, they merged the PM2.5 predictions from three convolutional neural networks,
ResNet50, Inception-v3, and VGG-16, to generate the final PM2.5 forecast for an
image. Their experimental findings show that the meta
trainer can effectively aggregate the PM2.5 predictions from the base learners and
provide a better prediction than any single base learner, so the suggested
technique is suitable for monitoring PM2.5 pollution. Based on a substantial
number of outdoor pictures available for Beijing, Shanghai (China), and Phoenix, the
researchers estimated PM air pollution. Six image features were extracted from the
photographs and combined with additional pertinent information, such as the sun's
position, the date, the time, the location, and the weather, to forecast the PM2.5 index.
The researchers in [8] proposed an image-based deep learning model (CNN-RC, built
on VGG and ResNet schemes with some additional layers) that combines a
convolutional neural network with a regression classifier. By extracting features from
the shots and categorizing them into air quality categories, this model can
estimate the air quality at specified places. To boost model dependability and
estimation accuracy, the models were trained and then tested on
datasets comprising different combinations of the current image, HSV (hue,
saturation, value) characteristics, and the baseline image. The Linyuan air quality
monitoring station in Kaohsiung City, Taiwan, collected a total of 3549 hourly air
quality records, including images, PM2.5, and the air quality index (AQI), which were
used to quickly produce an accurate image-based estimation of multiple pollutants at
once using just one deep learning model.
According to their test findings, the estimation accuracy (R2) for PM2.5 using day
(night) photos is 76%.
The authors of [9] integrated two deep convolutional neural networks
(DCNNs) using transfer learning to extract distinctive image properties, running the
pre-trained Inception and Xception models in parallel. Before being fed to the
final fully connected layers for classification, the feature maps are merged and reduced
by dropout. The system uses maximum likelihood and majority voting criteria to classify
sub-images first, then the entire image. Breast cancer tissue is classified into four
classes: invasive carcinoma, in situ carcinoma, benign, and normal.
The experiments were conducted on the BACH (Breast Cancer Histology) dataset,
using 4800 photos to obtain an accuracy of roughly 95%.
2.1. COMPARISON WITH OTHER MACHINE LEARNING WORKS
Predicting air quality using image-based machine learning plays a major role, as it
reduces the dependence on large amounts of expensive equipment. It is also
readily at hand, since it can be used easily on smartphones. The accuracy of
any image-based CNN model depends on the size of the dataset used (is it large enough
to train the model or not?) and on how well suited the model is to the training task.
The number of classes in the problem at hand also matters for the
classification model. For the same dataset, increasing the number of classes
means decreasing the number of images in each class, which may reduce the
accuracy or cause overfitting. The comparison of results demonstrates that
the proposed new model achieves a reasonable accuracy increase
compared to ResNet50, despite the small dataset and the number of classes
into which it is divided. It is noteworthy that the models used in [6] and [9] have
higher accuracy because their datasets contain a large number of images with clear
differences between them. Any increase in the number of images per class has a
significant effect on increasing the accuracy as clarified in.
3. CONTRIBUTIONS AND MATERIAL
3.1. CONTRIBUTIONS
The following is a summary of our work's significant contributions:
- This paper uses two pre-trained CNN models to estimate air quality (PM2.5 levels)
  from single-scene images. Rather than creating a new model from scratch, and to
  cope with the small amount of available data, this study tries to improve learning
  performance.
- The main goal of this study is to provide a comprehensive classification that
  considers the following five pollution categories: Level 1, Level 2, Level 3, Level
  4, and Level 5.
- This work develops a model based on parallel convolutional neural networks to
  consolidate the training process. The architecture combines two conventional
  networks: ResNet18 and ResNet50.
- As far as we are aware, no previous study has used ResNet18 and ResNet50
  together with their low-level features in this kind of design.
3.2. DATASET AND AUGMENTATION
The AQI is an index for reporting daily air quality. It tells you how clean or polluted
the air is, and it alerts you to any potential health risks. The AQI focuses on
health effects that may occur within hours or days after inhaling polluted air. The
AQI is calculated for the major air pollutants regulated by the Clean Air Act, including
carbon monoxide, sulfur dioxide, ground-level ozone, and particle pollution. The
higher the AQI value, the greater the level of air pollution and the health concern.
This paper considers air quality using photos taken at fixed locations in Shanghai
[10] with the corresponding PM2.5 readings [11]. The images have been divided into
five classes based on the mass of particles in micrograms per cubic meter of air
(µg/m3). In this work, the PM2.5 image levels start with the Level 1 class and a
concentration of up to 18.4 µg/m3 and end with the Level 5 class, with concentrations
greater than 59.9 µg/m3, as shown in Table 1.
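The class boundaries of Table 1 can be applied with a small lookup. The following is a minimal sketch only: the function name `pm25_level` and the constant `LEVEL_UPPER_BOUNDS` are hypothetical, and it assumes that readings are given to one decimal place, so a value that falls between two listed ranges (for example 18.45) is simply assigned to the higher level.

```python
# Upper bounds (inclusive, in µg/m3) of Levels 1-4, taken from Table 1.
# Level 5 covers every concentration above the last bound.
LEVEL_UPPER_BOUNDS = [18.4, 30.4, 40.4, 59.9]


def pm25_level(concentration: float) -> int:
    """Return the class level (1-5) for a PM2.5 concentration in µg/m3."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    for level, upper in enumerate(LEVEL_UPPER_BOUNDS, start=1):
        if concentration <= upper:
            return level
    return 5


if __name__ == "__main__":
    for value in (12.0, 18.5, 35.0, 59.9, 75.0):
        print(value, "-> Level", pm25_level(value))
```

Such a function would be used once per image to turn the PM2.5 reading attached to a photo into its training label.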
Table 1. PM2.5 classes and concentration.
Class level    PM2.5 concentration (µg/m3)
Level 1        <= 18.4
Level 2        18.5 - 30.4
Level 3        30.5 - 40.4
Level 4        40.5 - 59.9
Level 5        >= 60
The Shanghai PM2.5 dataset is available online through Figshare and contains
about 1954 images with different air quality levels during daylight. The images have a
resolution of 389 by 584 pixels. Because the first layer of ResNet has size
(224, 224, 3), we resized the images to fit the input layer. In addition, for image
augmentation, we flipped all the images horizontally, giving a total of 2104
images. The number of images in each class is shown in Table 2 (some images were
left out to keep the classes balanced). According to [12],
when a pre-trained model is used, 150-500 images per class are sufficient to achieve
reasonable (though not the best) classification accuracy.
Table 2. The number of images in each class.
Class          No. of images
Level 1        320
Level 2        411
Level 3        492
Level 4        472
Level 5        409
All classes    2104
4. METHODOLOGY
In this part, we present a novel classification scheme for air-quality
photographs. The design is built on transfer learning, using ResNet18 (depth = 18,
layers = 71, size = 44 MB, parameters = 11.7 million, input size = 224, 224, 3) and
ResNet50 (depth = 50, layers = 177, size = 96 MB, parameters = 25.6 million, input
size = 224, 224, 3). The new proposed model consists of 260 layers and 35.4 million
parameters, and its size is 249 MB. ResNet18 and ResNet50, two pre-trained
convolutional neural networks, are upgraded and used without their final layers (fully
connected, Softmax, and classification layers). Their role is to perform high-level
feature extraction (dense representations of the input images). Up to this point, we
therefore have two columns of stacked layers, each starting with the input layer and
ending with a ReLU layer, with the layers in between determined by each model's
architecture. ResNet18 and ResNet50 are then upgraded by adding two important
layers (Flatten and LSTM layers). In addition to the two models, we stacked another
two columns of layers, called Low_R18 and Low_R50, to extract the low-level image
features from ResNet18 and ResNet50 respectively.
Low_R18 starts from ResNet18 layer number 18, while Low_R50 starts from
ResNet50 layer number 36, with no intermediate layers in either case. The Low_R18
and Low_R50 columns are built by feeding the output of their starting layers
directly into a dropout layer with a probability of 0.5 to reduce overfitting (a
dropout layer randomly sets input elements to zero with a given probability).
The result goes to a 2-D max pooling layer with pool size = (4, 4) and stride = (4, 4).
The output of each 2-D max pooling layer is fed to a 2-D average pooling layer with
pool size = (2, 2) and stride = (2, 2). After that, the output of each column (Low_R18
and Low_R50) is passed to a flatten layer, followed by a long short-term memory
(LSTM) layer with 10 hidden units. The outputs of the two columns ResNet18 and
Low_R18 are added using an addition layer. Likewise, the outputs of the columns
ResNet50 and Low_R50 are added using another addition layer.
The outputs of the two addition layers are concatenated using a
concatenation layer and passed to another dropout layer with a probability of
0.5, and finally to the last dense layer, followed by a layer normalization layer.
The strategy's conceptual underpinnings include:
- Initial pre-processing for image resizing.
- Image augmentation with a horizontal flip to the right.
- Upgrading of ResNet18 and ResNet50.
- Low-level and high-level feature extraction based on transfer learning, using the
  newly added columns of layers (Low_R18 and Low_R50).
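The first two steps of this list (resizing and horizontal flipping) can be sketched without any imaging library. This is an illustrative sketch under stated assumptions: images are stored as lists of pixel rows, the interpolation is nearest-neighbour (the interpolation method is not stated in the text), and the function names are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an image stored as rows of pixels."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def flip_horizontal(image):
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in image]


def augment(images):
    """Return the original images followed by their mirrored copies."""
    return images + [flip_horizontal(img) for img in images]


if __name__ == "__main__":
    tiny = [[1, 2, 3], [4, 5, 6]]           # a 2 x 3 single-channel "image"
    print(flip_horizontal(tiny))            # mirrored copy
    print(len(augment([tiny, tiny])))       # flipping doubles the image count
    print(len(resize_nearest(tiny, 224, 224)), "rows after resizing")
```

Because every image contributes exactly one mirrored copy, this kind of augmentation doubles the dataset, which matches a step from 1052 photos to 2104 images.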
Figures 1 and 2, which depict the upper and lower portions of the newly
proposed model, respectively, show the block diagram of the suggested technique.
The models and the added layers are covered in detail in the sections
below. The proposed model adds the high-level features of the modified ResNet18 to
the low-level features of ResNet18 obtained after stacking some useful layers
(Low_R18). In the same way, it adds the high-level features of the modified ResNet50
to the low-level features of ResNet50 obtained after stacking some useful layers
(Low_R50).
Figure 1. The upper layers of the new model.
Figure 2. The lower layers of the new model.
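The way the four columns are merged can be illustrated on plain feature vectors. This is a sketch under assumptions, not the authors' implementation: it assumes that each column ends in an LSTM layer with 10 hidden units, so that the high-level and low-level vectors of one backbone have equal length, and the function names are hypothetical.

```python
def add_features(high, low):
    """Element-wise addition layer; both vectors must have equal length."""
    if len(high) != len(low):
        raise ValueError("addition needs vectors of equal length")
    return [h + l for h, l in zip(high, low)]


def fuse(r18_high, r18_low, r50_high, r50_low):
    """Add high and low features per backbone, then concatenate both sums."""
    return add_features(r18_high, r18_low) + add_features(r50_high, r50_low)


if __name__ == "__main__":
    hidden_units = 10  # assumed output length of every column
    fused = fuse([1.0] * hidden_units, [2.0] * hidden_units,
                 [3.0] * hidden_units, [4.0] * hidden_units)
    print(len(fused))  # length of the vector passed on to dropout and dense
```

Under this assumption the concatenated vector has 20 elements, which the final dense layer maps to the five class scores.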
4.1. TRANSFER LEARNING FOR TRAINING DEEP LEARNING MODELS
In the transfer learning method of deep learning, a model that has been trained for
one task is used as the starting point for a model that performs a related task [13].
Updating and retraining a network with transfer learning is often quicker and easier
than training from scratch. Transfer learning is popular because it allows you
to reuse well-known models that have already been trained on huge datasets, so that
you can train models with less labeled data. In traditional machine learning, training
and testing data often share the same input feature space and data distribution. The
performance of a predictive learner may suffer when the data distribution differs
between the training and test sets [14]. In some circumstances, it can be challenging
and expensive to find training data that matches the feature space and expected data
distribution of the test data. A high-performance learner for a target domain is thus
needed that is trained from a related source domain. This is the motivation behind
transfer learning. Combined deep convolutional neural
networks (C-DCNN), for their part, eliminate the need to extract complex features
before training the classifier and allow high-level and advanced features to be learned
from the training dataset [9]. Convolutional layers, followed by pooling layers,
perform the feature extraction, so CNNs reduce the domain
knowledge required to create a classification system. As a
result, the approach’s performance is less influenced by the dataset that was utilized,
and comparable network topologies can produce good results for a variety of
problems. When used to estimate air quality, the C-DCNN has improved pollutant
concentration predictions [15]. The performance of the learned model depends
heavily on the availability of the training dataset. Because their datasets are small,
certain classification problems cannot support the necessary deep learning.
Additionally, data gathering is costly and may need a complex capture procedure and
professional annotation. Transfer learning seeks to solve the dataset availability issue
by using a model pre-trained on a sizable, labeled dataset from a generic context.
Only a small amount of additional training is then needed to adapt to the new setting
[9]. ResNet50 and ResNet18 are trained on the massive ImageNet dataset, which
includes more than a million images belonging to a thousand
different object categories. Both models achieve good accuracy on the ImageNet
dataset. The structures of the two CNNs are described below.
4.2. RESNET18 ARCHITECTURE
ResNet18 has 18 layers, with a 7x7 convolution kernel in its first layer. It has four
stages of convolutional layers with the same structure. Each stage is made up of two
residual blocks. Each block is made up of two weight layers with a skip connection
that is added to the output of the second weight layer, followed by a ReLU. If the
output has the same dimensions as the input of the block, the identity connection is
used. If the dimensions differ, a convolution is applied on the skip connection to
match them. ResNet18 also uses two pooling layers, one at the beginning of the
network and the other at its end. Its input size is (224, 224, 3), where 224 is the width
and height, and 3 represents the RGB channels. The output is a fully connected layer
that gives input to the sequential layer [16, 17].
4.3. RESNET50 ARCHITECTURE
ResNet-50 is a pre-trained model from the ResNet family, which won the 2015
ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). It was trained on a
subset of the ImageNet database containing more than a million images and can
classify images into 1000 object categories. It contains 177 layers in total, which
corresponds to a 50-layer residual network with an input size of (224, 224, 3) [18].
The ResNet architecture (Figure 3) is among the most popular convolutional
neural network architectures. Residual Networks (abbreviated ResNet) were
first described by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their
2015 computer vision research paper titled "Deep Residual Learning for Image
Recognition" [18]. ResNet was introduced by Microsoft Research in 2015 and set
numerous records.
Figure 3. ResNet architecture.
A significant drawback of deep convolutional neural networks is the "vanishing
gradient problem": the gradient values shrink considerably during backpropagation,
so the weights scarcely change. ResNet is designed to get around this. It
uses a "skip connection", which, as illustrated in Figure 4 [18], adds the
original input to the convolutional block's output. Xiao Tian et al. [19] compare
several common convolutional neural networks, including VGG16, VGG19, Inception,
Xception, and ResNet50, for identifying the modulation pattern of a constellation.
Their experiments show that, among these models, the
ResNet50 network works best and has the highest accuracy. Numerous other
studies, including [20] and [21], demonstrate ResNet50's superior accuracy
performance.
Figure 4. ResNet skip connection.
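The skip connection can be written as y = ReLU(F(x) + x), where F is the stack of weight layers inside the block. The sketch below illustrates this on plain vectors; the function names and the optional `projection` argument (standing in for the convolution used when input and output sizes differ) are assumptions made for illustration.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def residual_block(x, transform, projection=None):
    """Compute relu(F(x) + shortcut(x)) for a feature vector x."""
    fx = transform(x)
    # Identity shortcut when sizes match, otherwise a projection of x.
    shortcut = x if projection is None else projection(x)
    if len(fx) != len(shortcut):
        raise ValueError("shortcut and block output must have the same size")
    return relu([a + b for a, b in zip(fx, shortcut)])


if __name__ == "__main__":
    x = [1.0, -2.0]
    print(residual_block(x, lambda v: [0.5 * t for t in v]))
    # Even if the weight layers output zeros, the input still passes through.
    print(residual_block(x, lambda v: [0.0 for _ in v]))
```

The second call shows the property that motivates the design: the identity path carries the signal (and, during backpropagation, the gradient) even when the weight layers contribute nothing.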
4.4. COMBINING MODELS AND ADDED LAYERS
In Sections 4.4.1 to 4.4.5, we describe the layers added to ResNet18 and
ResNet50, and the layers added to extract the low-level features of ResNet18 and
ResNet50 (Low_R18 and Low_R50 respectively), which together build the whole
proposed model.
4.4.1. DROPOUT LAYER
Long training times and overfitting are frequent problems in deep learning,
particularly in C-DCNN. Additionally, combining the outputs of several trained
models is computationally expensive [22]. To address these issues, dropout is
applied to the feature maps before they are fed into the max pooling layers.
Dropout [23] is a regularization technique proposed to combat overfitting: during
training, it stochastically sets hidden unit activations to zero for each training
example. Other stochastic model averaging techniques, such as
stochastic pooling [24], drop-connect [25], and maxout networks [26], were influenced
by dropout. Figure 5 illustrates how the dropout layer affects the Low_R50
and Low_R18 input features.
A dropout layer is added before the first row of max pooling layers. The authors of
[27] demonstrate that using max-pooling dropout at training time is equivalent to
sampling activations from a multinomial distribution with an adjustable parameter p
(the retaining probability). This method, which is only carried out during training,
has a major impact on reducing training time and preventing overfitting. For this
reason, the layer that follows dropout in the Low_R50 and Low_R18 columns is a max
pooling layer.
Figure 5. Dropout layer with probability = 0.5
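A dropout layer with probability 0.5 can be sketched as follows. This is a minimal illustration, not the toolbox implementation: the function name is hypothetical, and the rescaling of the kept values by 1/(1 - p), often called inverted dropout, is a common convention that is assumed here rather than taken from the text.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each value with probability p while training.

    Kept values are divided by (1 - p) so that the expected sum of the
    output matches the input (assumed convention). At prediction time the
    input is returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


if __name__ == "__main__":
    features = [0.2, 1.5, 0.7, 3.1, 0.9, 2.4]
    print(dropout(features, p=0.5, rng=random.Random(0)))  # some entries zeroed
    print(dropout(features, training=False))               # unchanged
```

On average half of the entries are zeroed for p = 0.5, so each training pass sees a different thinned version of the low-level features.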
4.4.2. 2D MAX POOLING AND AVERAGE LAYERS
The output feature maps of each dropout layer are fed into a row of 2-D max pooling
layers (maxPooling2dLayer) with pool size = (4, 4) and stride = (4, 4). This
layer performs downsampling separately on the Low_R50 and
Low_R18 columns by dividing the layer's input into square or rectangular regions
according to the pool size and then computing the maximum value of each region.
The pool size is the width and height of the rectangular regions, and the stride is
the step by which the rectangular region moves, as shown in
Figure 6.
Figure 6. Max pooling layer.
The pooling regions overlap if the stride dimensions are smaller than the
corresponding pooling dimensions [28]. To avoid this overlap, the stride must be
equal to or greater than the pooling size (in our model the stride
equals the pooling size), which also helps to prevent overfitting by reducing the
number of parameters. Each (x × y) region belongs to a feature map. For example, the
green region on the left of the figure is reduced to a single number (the green cell
with the number ‘4’ on the right of the figure), computed as the maximum of the values
in the region. The resulting feature is an array with d values, where d is the number
of filters. As introduced in [28], maxPooling2dLayer provides a more accurate
classification. Additionally, the max pooling layer preserves the key features (i.e.,
edges) and is more adept at extracting extreme characteristics. This implies that
feature mapping truly uses all values [29].
The same applies to the 2-D average pooling layer shown in Figure 7, which
downsamples its input by splitting the features into square regions and
calculating the average of each region.
Figure 7. 2D average pooling layer.
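Both pooling operations can be sketched with one helper that slides a window over a 2-D map and reduces each window with either the maximum or the mean. The function names are hypothetical, padding is assumed to be zero, and the 56 x 56 input size in the usage example is an assumption chosen because it yields the 7 x 7 activations reported for the low-feature columns.

```python
def pooled_size(size, pool, stride):
    """Output length along one axis for pooling without padding."""
    return (size - pool) // stride + 1


def pool2d(feature_map, pool, stride, reduce):
    """Slide a pool x pool window over a 2-D map and reduce each window."""
    rows = pooled_size(len(feature_map), pool, stride)
    cols = pooled_size(len(feature_map[0]), pool, stride)
    return [
        [
            reduce([
                feature_map[r * stride + i][c * stride + j]
                for i in range(pool)
                for j in range(pool)
            ])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    fm = [[1, 3, 2, 0],
          [4, 2, 1, 1],
          [0, 1, 5, 6],
          [2, 2, 7, 8]]
    print(pool2d(fm, 2, 2, max))    # max pooling
    print(pool2d(fm, 2, 2, mean))   # average pooling
    # Max pooling (4, stride 4) followed by average pooling (2, stride 2):
    print(pooled_size(pooled_size(56, 4, 4), 2, 2))
```

With the stride equal to the pool size, the windows tile the input without overlap, so each input value contributes to exactly one output value.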
4.4.3. FLATTEN LAYER
A flatten layer converts the square or rectangular input features into a single
dimension, since rectangular or cubic arrays cannot be used as direct inputs to the
following layers (it converts feature maps to feature vectors). Figure 8 shows a
flatten layer applied to a pooled feature map.
Figure 8. Applying a flatten layer.
The reason for flattening the intermediate outputs (feature maps) is as follows. After
the pooling layers, each column yields a feature map with a different height, width,
and depth, as shown in the activations of Table 3; these maps are obtained from the
images after they pass through many layers. Flatten layers turn these maps into
vectors, which allows us to combine features of different sizes.
Table 3. Activation properties of each column.
Column      Activations              After flattening
ResNet50    1(S) x 1(S) x 2048(C)    2048(C)
ResNet18    1(S) x 1(S) x 512(C)     512(C)
Low_R18     7(S) x 7(S) x 256(C)     12544(C)
Low_R50     7(S) x 7(S) x 64(C)      3136(C)
For our problem we have only five classes, so we flatten the output of
each pooling layer and pass it through a neural network whose number of outputs
corresponds to the number of classes (5). In the imaging context, these are
referred to as linear layers. In other words, the convolution layers act as a feature
extractor that helps a fully connected layer to do the task of classifying the images.
These fully connected layers use a 1-D array to carry out the classification, so we
flatten the data to facilitate smooth 1-D array operations.
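The length of a flattened vector is the product of the height, width, and number of channels of the activation. The short sketch below checks this for the four columns; the dictionary and function names are hypothetical, and the shapes are the activation sizes listed for each column.

```python
from math import prod


def flatten(feature_map):
    """Flatten an H x W x C nested list into one feature vector."""
    return [value for row in feature_map for pixel in row for value in pixel]


# Activation shapes (height, width, channels) of the four columns.
ACTIVATIONS = {
    "ResNet50": (1, 1, 2048),
    "ResNet18": (1, 1, 512),
    "Low_R18": (7, 7, 256),
    "Low_R50": (7, 7, 64),
}

# Flattened length = height * width * channels.
FLATTENED = {name: prod(shape) for name, shape in ACTIVATIONS.items()}

if __name__ == "__main__":
    for name, length in FLATTENED.items():
        print(name, length)
    print(flatten([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))
```

For example, the 7 x 7 x 256 activation of Low_R18 flattens to 7 * 7 * 256 = 12544 values, and a 1 x 1 x 2048 activation flattens to 2048 values.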
4.4.4. LSTM LAYERS (LONG SHORT-TERM MEMORY)
An LSTM layer learns the long-term dependencies between time steps in time series
and sequence data. The additive interactions carried out by this layer
may help to improve gradient flow over long sequences during training. LSTMs are a
kind of recurrent neural network that can learn order dependence in sequence
prediction tasks, a behavior that complex problem domains demand. An LSTM contains
feedback connections, which means that, in addition to single data points such as
images, it can process entire sequences of data. The principle of the LSTM is shown
in Figure 9.
Figure 9. LSTM layer architecture
A memory cell known as the "cell state", which maintains its state over time, plays an
important role in an LSTM model. The horizontal line that runs across the
top of Figure 9 represents the cell state. It can be seen as a conveyor
belt along which data passes almost unaltered. The removal of information from, or
its addition to, the cell state is controlled by the LSTM layer gates. These gates can
let data into and out of the cell. Each gate consists of a sigmoid neural network layer
and a pointwise multiplication operation, as shown in Figure 10. The sigmoid
layer generates numbers ranging from 0 to 1, where 0 means that nothing should be
let through and 1 means that everything should be let through. The LSTM uses simple
addition and multiplication operations on the cell state to make
small changes to the data. In this way, the LSTM selectively forgets and recalls
information. The amount of information remembered between time steps (the hidden
state) is set by the number of hidden units (10). Regardless of the length of the
sequence, the hidden state can contain information from every previous time step.
Figure 10. Multiplication operation and a sigmoid neural network layer.
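The gate mechanism can be made concrete with a single LSTM time step. The sketch below uses the standard LSTM equations with scalar input, hidden state, and cell state to keep it short; in a layer with 10 hidden units the same computation is done with vectors and weight matrices. The parameter layout and names are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic function, output in the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for scalar input, hidden state and cell state.

    params maps each gate name ("f", "i", "g", "o") to a tuple
    (input weight, recurrent weight, bias); the layout is illustrative.
    """
    def pre(gate):
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    f = sigmoid(pre("f"))      # forget gate: how much of c_prev to keep
    i = sigmoid(pre("i"))      # input gate: how much new content to write
    g = math.tanh(pre("g"))    # candidate cell content
    o = sigmoid(pre("o"))      # output gate: how much of the cell to expose
    c = f * c_prev + i * g     # cell state update (the "conveyor belt")
    h = o * math.tanh(c)       # new hidden state
    return h, c


if __name__ == "__main__":
    zeros = {gate: (0.0, 0.0, 0.0) for gate in "figo"}
    print(lstm_step(1.0, 0.0, 2.0, zeros))
```

With all weights at zero every gate outputs 0.5, so half of the previous cell state is kept; a forget gate close to 1 and an input gate close to 0 would instead pass the cell state through almost unchanged.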
4.4.5. DENSELY CONNECTED LAYERS
Finally, the output features of the concatenation layer are fed to another dropout
layer, then to a densely connected (fully connected) layer followed by a layer
normalization layer, and then to a Softmax logistic regression layer. The layer
normalization layer is used to speed up the training of recurrent and multilayer
perceptron neural networks and to reduce the sensitivity to network initialization.
After normalization, the input to the layer is scaled by a learnable scale factor γ
and then shifted by a learnable offset β (both are left at their default values).
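The normalization and Softmax steps above can be sketched in a few lines of plain Python. This is an illustration, not the layers used in this work: γ and β are treated as scalars and set to 1 and 0 (assumed initial defaults), the small constant `eps` is an assumption for numerical stability, and the five input scores are made-up values standing in for the output of the densely connected layer.

```python
import math


def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize x to zero mean and unit variance, then scale by gamma and shift by beta."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]


def softmax(z):
    """Turn scores into probabilities that sum to 1 (max subtracted for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the five PM2.5 levels (illustrative values only).
scores = [2.0, -1.0, 0.5, 0.0, 3.5]
normalized = layer_norm(scores)
probs = softmax(normalized)
predicted_level = probs.index(max(probs)) + 1   # levels are numbered from 1
```

The predicted class is simply the level with the highest Softmax probability.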
5. EXPERIMENTATION SETTINGS
The five classes, each represented by several images, are presented in Table (2). The
images are in (.jpg) format and have a size of 389 by 584 pixels. Figure 11 highlights
the variability of the five classes. Before the learning and classification steps, the
input images are resized to (224, 224, 3) and used directly, with no other filtering or
pre-processing. Combining the two models with the low features improves the air
quality prediction by about 6.31% (with a learning rate LR=0.0007). The dataset was
expanded through image data augmentation, implemented in the Python programming
language by flipping the images horizontally in one direction only (right). No special
parameters were specified in Matlab for image data augmentation. For the columns
(Low_R18 and Low_R50), a dropout of 0.5 is used to prevent overfitting during
training, followed by two rows of 2D max pooling and 2D average pooling layers. From
this point, all the columns are passed to flatten layers, which convert the feature
maps into vector form, followed by LSTM layers. The experiments were run with the
following training options: the solver for training the network was sgdm (stochastic
gradient descent with momentum). Different learning rates were tried (0.007, 0.0007,
and 0.00007), with a batch size of 10 and 60 epochs. The number of iterations per
epoch was 149, for a total of 8940 iterations. The loss function
computes the cross-entropy loss between the labels and predictions for the training
set and the validation set.
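The training options above can be made concrete with a minimal sketch of the cross-entropy loss and one stochastic-gradient-descent-with-momentum update. The momentum value of 0.9 is an assumption (it is not stated in the text), and the weights, gradients, and probabilities are made-up numbers used only for illustration.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy loss for one sample: negative log-probability of the true class."""
    return -math.log(probs[label])


def sgdm_update(weights, grads, velocity, lr=0.0007, momentum=0.9):
    """One SGD-with-momentum step; momentum=0.9 is an assumed value."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Iteration count reported in the text: 149 iterations per epoch over 60 epochs.
EPOCHS = 60
ITERATIONS_PER_EPOCH = 149
total_iterations = EPOCHS * ITERATIONS_PER_EPOCH

# Illustrative values: Softmax output over five levels, true class at index 2.
loss = cross_entropy([0.1, 0.1, 0.6, 0.1, 0.1], label=2)

# Illustrative two-parameter update starting from zero velocity.
w, v = sgdm_update([0.5, -0.2], [1.0, -2.0], [0.0, 0.0])
```

With zero initial velocity, the first step reduces to plain gradient descent. In later steps the velocity term carries over part of the previous update.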
Figure 11. Variability of the five classes, in order: Level 1, Level 2, Level 3, Level 4,
and Level 5.
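The resizing and horizontal-flip augmentation described in the experimentation settings can be sketched as follows. This is not the authors' script. It uses a synthetic image of the stated size (389 by 584), nearest-neighbour resizing (the interpolation method is an assumption), and applies the flip after resizing (the order is also an assumption). The function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror an image left to right; image is a list of rows of (r, g, b) pixels."""
    return [row[::-1] for row in image]


def resize_nearest(image, out_h, out_w):
    """Resize by nearest-neighbour sampling (assumed interpolation method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Synthetic 389 x 584 RGB image standing in for one dataset photo.
image = [[(r % 256, c % 256, 0) for c in range(584)] for r in range(389)]

resized = resize_nearest(image, 224, 224)        # network input size (224, 224, 3)
augmented = [resized, flip_horizontal(resized)]  # original plus its mirrored copy
```

Each source image therefore contributes two samples, which doubles the size of the dataset before training.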
The dataset is first split into 80% for training and 20% for testing, and 20% of the
training data is then held out for validation. This led to approximately 65% of the
data for training, 15% for validation, and 20% for testing. The statistics of the
training, validation, and test sets for the sub-image datasets are detailed in Table (4).
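The split percentages can be checked with a short calculation. The sketch below uses only the image counts reported in Table 4 and compares them with the nominal shares implied by an 80/20 split followed by a 20% validation hold-out. The variable names are illustrative.

```python
# Image counts as reported in Table 4.
counts = {"Training": 1368, "Validation": 316, "Test": 420}

total = sum(counts.values())
# Share of each subset in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Nominal shares implied by the described procedure:
# 20% test, then 20% of the remaining 80% held out for validation.
nominal_test = 0.20
nominal_val = 0.80 * 0.20
nominal_train = 0.80 - nominal_val

for name, n in counts.items():
    print(f"{name:<10} {n:>5} images  {shares[name]:>5.1f} %")
```

The nominal procedure gives 64% / 16% / 20%, which is close to the approximately 65% / 15% / 20% obtained from the reported counts.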
Table 4. Dataset statistics for training, validation, and testing.
Set          Percentage   No. of images
Training     65 %         1368
Validation   15 %         316
Test         20 %         420
The environment used for the experiment, along with MATLAB R2022a, is as
follows:
System Manufacturer: LENOVO LEGION5
OS Name: Microsoft Windows 11 Home
System Type: x64-based PC
Processor: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz, 2592 MHz, 6 Core(s), 12
Logical Processor(s)
Installed Physical Memory (RAM): 16.0 GB
5.1. IMAGES CLASSIFICATION RESULTS
As the number of images is limited, two pre-trained models are used. Dataset
augmentation and transfer learning help to prevent or reduce overfitting by enlarging
the dataset and to enhance the training process. The classes are created by splitting
the utilized image dataset (each image of size 389 by 584) into five classes of
sub-images, each of size (224 by 224). The level of the five sub-image classes is
estimated using the presented deep model. In this paper, a zero-weight ResNet50,
that is, a model that has not been trained before, is tested first to show whether a
small image dataset can achieve reasonable accuracy. It is found that the zero-weight
ResNet50 could not exceed 28% accuracy, whatever the learning rate.
Table (5).
Table 5. ResNet50 accuracies for LR.
ResNet50
Learning rate   Avg. validation acc. (%)   Avg. test acc. (%)   Avg. total time
0.007           68.49                      67.64                51.34
0.0007          70.92                      70.88                54.59
0.00007         66.91                      67.52                58.54
(6).
Table 6. The new model accuracies for LR.
The New Proposed Model
Learning rate   Avg. validation acc. (%)   Avg. test acc. (%)   Avg. total time
0.007           72.92                      71.88                82.84
0.0007          77.17                      75.31                81.21
0.00007         73.54                      73.27                88.41
the features four times.
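The gap between the two models at each learning rate can be read directly off Tables 5 and 6. The sketch below uses only the averaged values printed in those tables. The resulting differences need not coincide exactly with the percentages quoted in the text, whose derivation is not given, and the variable names are illustrative.

```python
LEARNING_RATES = (0.007, 0.0007, 0.00007)

# (avg. validation acc., avg. test acc., avg. total time) from Table 5.
resnet50 = {
    0.007: (68.49, 67.64, 51.34),
    0.0007: (70.92, 70.88, 54.59),
    0.00007: (66.91, 67.52, 58.54),
}

# Same columns for the new proposed model, from Table 6.
proposed = {
    0.007: (72.92, 71.88, 82.84),
    0.0007: (77.17, 75.31, 81.21),
    0.00007: (73.54, 73.27, 88.41),
}


def gains(lr):
    """Proposed model minus ResNet50, column by column, rounded to 2 decimals."""
    return tuple(round(n - b, 2) for n, b in zip(proposed[lr], resnet50[lr]))


gap = {lr: gains(lr) for lr in LEARNING_RATES}

# Learning rate with the highest average validation accuracy for the new model.
best_lr = max(LEARNING_RATES, key=lambda lr: proposed[lr][0])

for lr in LEARNING_RATES:
    d_val, d_test, d_time = gap[lr]
    print(f"LR={lr}: validation +{d_val}, test +{d_test}, extra time +{d_time}")
```

In these tables the new model has higher average accuracy than ResNet50 at every learning rate, and a longer average total time.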
6. CONCLUSION AND FUTURE WORK
In this work, a deep architecture is proposed for classifying air quality from images
according to their PM2.5 concentration levels. The proposed model allows for the
discrimination of the low and high features. Compared to existing research, our
approach outperforms state-of-the-art methods. The results show that, because the
new model has a higher number of layers and extracts the features four times, it
takes a longer time per epoch for learning. The proposed model achieved an accuracy
increment of about 6.31% at LR=0.0007, and of 4.31% and 6.1% at LR=0.007 and
0.00007, respectively, as shown in Tables (5 and 6). In the future, our approach can be
extended by adding other models to the one proposed in this paper, or by utilizing
other pre-trained models. A multi-scene image dataset can also be used.
REFERENCES
(1) Silva, L. F. O., Oliveira, M. L. S., Neckel, A., Maculan, L. S., Milanes, C. B.,
Bodah, B. W., & Dotto, G. L. (2022). Effects of atmospheric pollutants on
human health and deterioration of medieval historical architecture (North
Africa, Tunisia). Elsevier BV. https://doi.org/10.1016/j.uclim.2021.101046
(2) Brook, R. D., Brook, J. R., & Rajagopalan, S. (2003). Air pollution: The “heart”
of the problem. Springer Science and Business Media LLC.
https://doi.org/10.1007/s11906-003-0008-y
(3) Landrigan, P. J. (2017). Air pollution and health. Elsevier BV.
https://doi.org/10.1016/s2468-2667(16)30023-8
(4) Zheng, S., Wu, X., Lichtfouse, E., & Wang, J. (2022). High-resolution mapping
of premature mortality induced by atmospheric particulate matter in China.
Springer Science and Business Media LLC.
https://doi.org/10.1007/s10311-022-01445-6
(5) Liu, C., Tsow, F., Zou, Y., & Tao, N. (2016). Particle Pollution Estimation
Based on Image Analysis (H. Liu, Ed.). Public Library of Science (PLoS).
https://doi.org/10.1371/journal.pone.0145955
(6) Won, T., Eo, Y. D., Sung, H., Chong, K. S., Youn, J., & Lee, G. W. (2021).
Particulate Matter Estimation from Public Weather Data and Closed-Circuit
Television Images. Springer Science and Business Media LLC.
https://doi.org/10.1007/s12205-021-0865-4
(7) Rijal, N., Gutta, R. T., Cao, T., Lin, J., Bo, Q., & Zhang, J. (2018). Ensemble of
Deep Neural Networks for Estimating Particulate Matter from Images.
Presented at the 2018 IEEE 3rd International Conference on Image, Vision and
Computing (ICIVC). https://doi.org/10.1109/icivc.2018.8492790
(8) Kow, P.-Y., Hsia, I.-W., Chang, L.-C., & Chang, F.-J. (2022). Real-time
image-based air quality estimation by deep learning neural networks. Elsevier BV.
https://doi.org/10.1016/j.jenvman.2022.114560
(9) Elmannai, H., Hamdi, M., & AlGarni, A. (2021). Deep Learning Models
Combining for Breast Cancer Histopathology Image Classification. Springer
Science and Business Media LLC. https://doi.org/10.2991/ijcis.d.210301.002
(10) Liu, Chenbin; Tsow, Francis; Zou, Yi; Tao, Nongjian (2016): Particle pollution
estimation based on image analysis. figshare. Figure.
https://doi.org/10.6084/m9.figshare.1603556.v2
(11) Airnow.gov. (n.d.). U.S. Embassies and Consulates - China - Shanghai.
Retrieved April 20, 2023, from
https://www.airnow.gov/international/us-embassies-and-consulates/#China$Shanghai
(12) Shahinfar, S., Meek, P., & Falzon, G. (2020). “How many images do I need?”
Understanding how sample size per class affects deep learning model
performance metrics for balanced designs in autonomous wildlife
monitoring. Elsevier BV. https://doi.org/10.1016/j.ecoinf.2020.101085
(13) Weiss, K., Khoshgoftaar, T. M., & Wang, D. (2016). A survey of transfer
learning. Springer Science and Business Media LLC.
https://doi.org/10.1186/s40537-016-0043-6
(14) Shimodaira, H. (2000). Improving predictive inference under covariate shift
by weighting the log-likelihood function. Elsevier BV.
https://doi.org/10.1016/s0378-3758(00)00115-4
(15) Gilik, A., Ogrenci, A. S., & Ozmen, A. (2021). Air quality prediction using
CNN+LSTM-based hybrid deep learning architecture. Springer Science and
Business Media LLC. https://doi.org/10.1007/s11356-021-16227-w
(16) Ramirez, O. J. V., Cruz de la Cruz, J. E., & Machaca, W. A. M. (2021).
Agroindustrial Plant for the Classification of Hass Avocados in Real-Time
with ResNet-18 Architecture. Presented at the 2021 5th International
Conference on Robotics and Automation Sciences (ICRAS).
https://doi.org/10.1109/icras52289.2021.9476659
(17) Li, Y., Huang, J., & Luo, J. (2015). Using user generated online photos to
estimate and monitor air pollution in major cities. Presented at the ICIMCS
’15: International Conference on Internet Multimedia Computing and Service.
https://doi.org/10.1145/2808492.2808564
(18) He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for
Image Recognition. Presented at the 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2016.90
(19) Tian, X., & Chen, C. (2019). Modulation Pattern Recognition Based on
Resnet50 Neural Network. Presented at the 2019 IEEE 2nd International
Conference on Information Communication and Signal Processing (ICICSP).
https://doi.org/10.1109/icicsp48821.2019.8958555
(20) Yang, X., Yang, D., & Huang, C. (2021). An interactive prediction system of
breast cancer based on ResNet50, chatbot and PyQt. Presented at the 2021
2nd International Seminar on Artificial Intelligence, Networking and Information
Technology (AINIT). https://doi.org/10.1109/ainit54228.2021.00068
(21) Kumar, D., Sharma, P., Anupama, A., & Sharma, P. (2022). A Performance
Study on Deep Learning Covid-19 Prediction through Chest X-Ray Image
with ResNet50 Model. Presented at the 2022 Second International Conference
on Advances in Electrical, Computing, Communication and Sustainable
Technologies (ICAECT). https://doi.org/10.1109/icaect54875.2022.9807920
(22) Garbin, C., Zhu, X., & Marques, O. (2020). Dropout vs. batch normalization:
an empirical study of their impact to deep learning. Springer Science and
Business Media LLC. https://doi.org/10.1007/s11042-019-08453-9
(23) Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R.
R. (2012). Improving neural networks by preventing co-adaptation of
feature detectors (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1207.0580
(24) Zeiler, M. D., & Fergus, R. (2013). Stochastic Pooling for Regularization of
Deep Convolutional Neural Networks (Version 1). arXiv.
https://doi.org/10.48550/ARXIV.1301.3557
(25) Santos, C. F. G. dos, Roder, M., Passos, L. A., & Papa, J. P. (2022).
MaxDropoutV2: An Improved Method to Drop Out Neurons in
Convolutional Neural Networks. Springer International Publishing.
https://doi.org/10.1007/978-3-031-04881-4_22
(26) do Santos, C. F. G., Colombo, D., Roder, M., & Papa, J. P. (2021). MaxDropout:
Deep Neural Network Regularization Based on Maximum Output Values.
Presented at the 2020 25th International Conference on Pattern Recognition
(ICPR). https://doi.org/10.1109/icpr48806.2021.9412733
(27) Wu, H., & Gu, X. (2015). Max-Pooling Dropout for Regularization of
Convolutional Neural Networks. Springer International Publishing.
https://doi.org/10.1007/978-3-319-26532-2_6
(28) Fouad, M. M., Mostafa, E. M., & Elshafey, M. A. (2020). Detection and
localization enhancement for satellite images with small forgeries using
modified GAN-based CNN structure. Universitas Ahmad Dahlan, Kampus 3.
https://doi.org/10.26555/ijain.v6i3.548
(29) Ma, Z., Chang, D., Xie, J., Ding, Y., Wen, S., Li, X., … Guo, J. (2019).
Fine-Grained Vehicle Classification With Channel Max Pooling Modified CNNs.
Institute of Electrical and Electronics Engineers (IEEE).
https://doi.org/10.1109/tvt.2019.2899972