Research on the Optimization of Defect Detection Based on Convolutional Neural Network Architecture

Abstract: Machine learning is a method commonly used for defect detection in smart manufacturing. It uses data and algorithms to imitate the way the human brain works, and its accuracy can be improved through repeated training. Neural networks such as the convolutional neural network (CNN) are an effective machine learning method for defect detection in smart manufacturing. Through a systematic literature review, this paper presents the architecture of the CNN and a model of it comprising three layer types: the convolutional layer, which determines how the neurons of the network are connected; the pooling layer, which reduces the number of parameters within the activations by downsampling along the spatial dimensions of the input; and the fully-connected layer, which produces class scores from the activations. The use of the Rectified Linear Unit (ReLU) to improve performance is also suggested. The paper then points out two challenges of applying CNNs to defect detection in practice: the class imbalance problem and the insufficient data problem. Three solutions to these problems, namely an object-level attention mechanism, PaDiM and SDD-CNN, are discussed. Finally, topics for future study are identified.


INTRODUCTION
Since technologies such as big data, the Internet of Things, cloud computing and edge computing have advanced, interest in smart manufacturing has grown in recent years. Figure 1 shows the size of the smart manufacturing systems market in China from 2016 to 2019, together with a forecast up to 2025, according to Statista [1]. Hence, there is a growing need to deploy machine learning strategies in defect detection. The convolutional neural network (CNN) was originally intended for pattern recognition in image analysis [2], and it now performs prominently in machine vision and image processing. It has been applied in multiple areas, such as image classification and segmentation, object detection, video processing and speech recognition [3], and it is therefore now used for defect detection in smart manufacturing. However, implementing CNNs in real-world scenarios is difficult. This paper mainly studies two typical problems of adopting CNNs, the insufficient data problem and the imbalanced class problem, examines how they influence CNN performance, and presents three solutions to them: an efficient CNN model based on an object-level attention mechanism for casting defect detection on radiography images [4]; Small Data-Driven Convolutional Neural Networks (SDD-CNN) [5]; and PaDiM [6]. The research uses the systematic approach proposed by Tranfield, Denyer and Smart [7] to investigate the challenges of CNNs and to survey solutions; the process is described in detail in the methodology section. The significance of this research is that it not only reveals the critical issues of applying CNNs in real smart manufacturing scenarios, but also lists solutions to these issues, allowing CNNs to be adopted in defect detection more effectively and accurately.

METHODOLOGY
To identify the challenges of applying CNNs in real smart manufacturing scenarios and to find solutions, this research follows the systematic approach proposed by Tranfield, Denyer, and Smart [7]. Three key processes are conducted to gather the literature and investigate the problems: finding a clear definition of the CNN, identifying the challenges of CNNs in defect detection, and investigating the solutions. Fig. 1 shows the complete procedure used to gather the corpus. Google Scholar is used as the search engine for the literature. In terms of function, the convolutional layer decides which output neurons connect to local regions of the input [8]; the pooling layer is a local operation that collects similar information near the receptive field and outputs the dominant response within this local region [9]; and the fully-connected layer produces class scores from the preceding layers to perform the classification at the end of the network. It is also suggested that the Rectified Linear Unit (ReLU) be used between these layers to improve performance.
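The layer functions described above can be illustrated with a minimal, dependency-free Python sketch. It assumes a single-channel input, stride-1 convolution without padding, 2 × 2 max pooling and a single fully-connected layer; the function names (`conv2d`, `relu`, `max_pool`, `fully_connected`) and all numbers are hypothetical and chosen only for illustration. Real CNN libraries implement these layers far more efficiently.

```python
# Illustrative sketch only: single channel, stride 1, no padding.

def conv2d(image, kernel, bias=0.0):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries).

    Each output neuron is connected only to a local kh x kw region of the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = bias
            for u in range(kh):
                for v in range(kw):
                    s += image[i + u][j + v] * kernel[u][v]
            row.append(s)
        out.append(row)
    return out

def relu(fmap):
    """Element-wise Rectified Linear Unit: max(0, x)."""
    return [[max(0.0, x) for x in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the dominant response in each window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + u][j + v] for u in range(size) for v in range(size)))
        out.append(row)
    return out

def fully_connected(fmap, weights, biases):
    """Flatten the feature map and produce one score per class."""
    flat = [x for row in fmap for x in row]
    return [sum(w * x for w, x in zip(w_row, flat)) + b
            for w_row, b in zip(weights, biases)]

# Toy 5x5 image -> 4x4 feature map -> 2x2 pooled map -> 2 class scores.
image = [[1, 2, 0, 1, 3],
         [0, 1, 3, 1, 0],
         [2, 0, 1, 2, 1],
         [1, 3, 0, 0, 2],
         [0, 1, 2, 1, 0]]
kernel = [[1, 0], [0, -1]]  # hypothetical diagonal-difference filter
conv = conv2d(image, kernel)
pooled = max_pool(relu(conv))
scores = fully_connected(pooled, [[1, 0, 0, 1], [0, 1, 1, 0]], [0.0, 0.5])
print(scores)
```

Each stage shrinks the spatial size (5 × 5 to 4 × 4 to 2 × 2) until the fully-connected layer maps the remaining activations to one score per class.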

Insufficient data problem
A CNN can only learn characteristics from its training dataset, so its accuracy can be low if the characteristics and features in the training data differ too much from the images encountered in real scenarios. This problem diminishes only if the training set is truly comprehensive [12], so large datasets are needed as input when training CNNs. Even with techniques such as data augmentation, which enlarges a dataset with label-preserving transformations, several hundred images or more are still required in practice [13], and the number of images required depends on the complexity of the problem being analyzed.
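As an illustration of label-preserving transformations, the following Python sketch enlarges a toy dataset with flips and a 90-degree rotation. The choice of transforms and the helper names (`hflip`, `vflip`, `rot90`, `augment`) are assumptions for illustration, not taken from [13]; the point is that each transformed copy keeps the label of the original, so the dataset grows without new annotation work.

```python
# Hypothetical augmentation helpers for a grayscale image stored as a list of rows.

def hflip(img):
    """Mirror left-right."""
    return [list(reversed(row)) for row in img]

def vflip(img):
    """Mirror top-bottom."""
    return [list(row) for row in reversed(img)]

def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(samples):
    """Return each (image, label) pair plus three transformed copies with the same label."""
    out = []
    for img, label in samples:
        out.append((img, label))
        for transform in (hflip, vflip, rot90):
            out.append((transform(img), label))  # label is preserved
    return out

dataset = [([[1, 2], [3, 4]], "defect"), ([[0, 0], [0, 1]], "normal")]
augmented = augment(dataset)
print(len(augmented))
```

Note that such copies only re-arrange existing pixels: they cannot add defect types that were never photographed, which is why augmentation alone does not remove the need for a large and diverse image set.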
Barbedo (2018) [12] carried out a study that applied CNNs to plant pathology to investigate how the size and diversity of datasets affect deep learning techniques. In that study, the plant species, diseases and image capture conditions in the database all varied. With inadequate samples, a CNN cannot fully capture the features and variety of each class. The accuracy of the CNN is also affected when new samples are added to the database: if the characteristics of the newly added images are close to those of the training set, accuracy improves; conversely, if they are more diverse, accuracy decreases. Consequently, to achieve solid results in defect detection, CNN training requires a substantial number of images and a database that covers all kinds of defects. However, the casting process is much more complex and diverse than expected, so work in process may show various types and degrees of disorder. Moreover, because all the images are captured under realistic conditions, the background can severely affect the results. As a result, it is impossible to capture images of all disorders against all backgrounds and label them correctly to build a database that covers every kind of defect.
Buda et al. [11] studied the effect of class imbalance on CNNs using three standard datasets of increasing complexity: MNIST [15], CIFAR-10 [16] and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [17]. According to the study, class imbalance has a considerable negative impact on performance. Moreover, the performance of the resulting classifiers deteriorates as the ratio between the numbers of examples in the majority and minority classes grows, and as the number of minority classes increases. Comparing the results on MNIST and CIFAR-10 shows that the effect of class imbalance is stronger when the task is more complicated [11].
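The two quantities in the last finding, the majority-to-minority ratio and the number of minority classes, can be measured directly from the labels of a training set. The sketch below computes an imbalance ratio ρ (largest class size divided by smallest class size) and the fraction μ of minority classes, which is how we understand Buda et al. [11] parameterize imbalance. The helper name `imbalance_stats`, the rule that any class smaller than the largest counts as a minority class (suited to step imbalance), and the defect labels are assumptions for illustration.

```python
from collections import Counter

def imbalance_stats(labels):
    """Return (rho, mu) for a list of class labels.

    rho: size of the largest class divided by the size of the smallest class.
    mu:  fraction of classes treated as minority classes; here (an assumption
         suited to step imbalance) any class smaller than the largest one.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    rho = largest / smallest
    mu = sum(1 for c in counts.values() if c < largest) / len(counts)
    return rho, mu

# Hypothetical defect dataset: many normal parts, few defective ones.
labels = ["ok"] * 900 + ["crack"] * 50 + ["pore"] * 50
rho, mu = imbalance_stats(labels)
print(rho, mu)
```

Here ρ is 18 and two of the three classes are minority classes, the kind of setting in which [11] reports degraded classifier performance.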

Imbalance class problem
In industrial applications, the detection targets are often tiny, local and faint, which leads to insufficient data and class imbalance; this makes CNNs inefficient, because training them requires a large quantity of precisely annotated data [4].

A Novel CNN Model Based on an Object-Level Attention Mechanism
This is a novel CNN model, combined with a training strategy, proposed by Hu and Wang [4] to improve the model's ability to detect local-contrast casting defects and to conduct industrial inspection efficiently in complex scenarios.
In this study, the spatial information of the defects is not given, and the datasets are annotated only with image-level defect or non-defect labels; hence, in a complex scenario, an attention mechanism is required. To implement defect detection and inspection, Hu and Wang [4] propose a new training strategy that forms an object-level attention mechanism without requiring additional network structures. The overall plan is first to let the model attend to the particular type of casting in the image, and then to teach the model to identify defects accurately based on this recognition. Fig. 3 [18] illustrates the two stages of this strategy. Two datasets, one with type labels and one with defect labels, are used to implement this technique, and the CNN model is divided into two subnetworks: the Type Classification Module (TCM) and the Defect Classification Module (DCM).
The TCM is responsible for suppressing irrelevant information and representing object-related features in complicated scenarios via the object-level attention mechanism. This supports the DCM, which mines deeper characteristics from the object-related features provided by the TCM in order to classify microscopic defects.
In the first stage of this strategy, the TCM is trained on the type dataset with an additional classifier that distinguishes product types, so that it learns to extract object-related features of the casting; after the TCM converges, the softmax layer is discarded. In the second stage, all parameters of the TCM are fixed and training uses only the defect dataset. The trained TCM provides object-related features to the subsequent DCM through the object-level attention mechanism, which means that the DCM can learn defect features from the relevant areas with only image-level supervision.
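The two-stage schedule can be sketched as a toy update loop in plain Python. The `Module` class, the `frozen` flag, the placeholder gradients and the learning rate are all hypothetical; the forward pass (TCM features feeding the DCM) is omitted, so the sketch only shows which parameters change in each stage, not Hu and Wang's actual network.

```python
# Toy illustration of the training schedule; not the model from [4].

class Module:
    def __init__(self, name, weights):
        self.name = name
        self.weights = list(weights)
        self.frozen = False

def sgd_step(modules, grads, lr=0.1):
    """Apply w <- w - lr * g to every module that is not frozen."""
    for m in modules:
        if m.frozen:
            continue  # frozen modules keep their parameters
        m.weights = [w - lr * g for w, g in zip(m.weights, grads[m.name])]

tcm = Module("TCM", [1.0, 1.0])
type_head = Module("type_head", [0.5])  # extra type classifier used only in stage 1
dcm = Module("DCM", [0.0, 0.0])

# Stage 1: train the TCM together with its type classifier on type labels.
sgd_step([tcm, type_head], {"TCM": [1.0, -1.0], "type_head": [1.0]})
type_head = None  # stands in for discarding the softmax classifier after convergence

# Stage 2: fix all TCM parameters; only the DCM learns from defect labels.
tcm.frozen = True
tcm_after_stage1 = list(tcm.weights)
sgd_step([tcm, dcm], {"TCM": [5.0, 5.0], "DCM": [1.0, 2.0]})
print(tcm.weights, dcm.weights)
```

In a deep learning framework the same effect is usually obtained by excluding the first module's parameters from the optimizer or disabling their gradients.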
The last step of this strategy uses a new approach called bilinear CAM (Bi-CAM) to compute the activation feature map for bilinear architectures [4]. The bilinear architecture in this model builds on the class activation mapping (CAM) method proposed by Zhou et al. (2016) [19]. Hu and Wang [4] express their bilinear pooling with three formulas: (1) an instance of bilinear pooling, (2) the score of class c in the softmax layer, and (3) the final Bi-CAM formula; the meaning of the terms in these formulas is given in Table 1.
In the tests carried out by Hu and Wang (2019) [4], the model improved, to varying degrees, in accuracy, precision, recall, F-measure and frames per second compared with other models.
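The idea behind a bilinear CAM can be shown with a generic sketch, which is not Hu and Wang's exact formulation (in particular, the eigenvalue-based step mentioned in Table 1 is not reproduced). With a feature vector f(x, y) of D channels at each location, bilinear pooling gives the Gram matrix Z = Σ f(x, y) f(x, y)^T, and a class score with bilinear weights W^c is S_c = Σ_ij W^c_ij Z_ij. Because this sum can be regrouped per location, M_c(x, y) = f(x, y)^T W^c f(x, y) is a class activation map whose values sum to S_c. All function names and numbers below are hypothetical.

```python
def bilinear_pool(fmap):
    """Z[i][j] = sum over locations (x, y) of f_i(x, y) * f_j(x, y).

    fmap[x][y] is the D-dimensional feature vector at spatial location (x, y).
    """
    d = len(fmap[0][0])
    z = [[0.0] * d for _ in range(d)]
    for row in fmap:
        for f in row:
            for i in range(d):
                for j in range(d):
                    z[i][j] += f[i] * f[j]
    return z

def class_score(z, w_c):
    """S_c = sum_ij W^c_ij * Z_ij for bilinear class weights W^c."""
    d = len(z)
    return sum(w_c[i][j] * z[i][j] for i in range(d) for j in range(d))

def bilinear_cam(fmap, w_c):
    """M_c(x, y) = f(x, y)^T W^c f(x, y); the map sums to S_c."""
    d = len(w_c)
    return [[sum(f[i] * w_c[i][j] * f[j] for i in range(d) for j in range(d))
             for f in row] for row in fmap]

# 2x2 spatial grid with D = 2 channels, and hypothetical weights for class c.
fmap = [[[1.0, 0.0], [0.0, 2.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
w_c = [[1.0, 0.5], [0.5, -1.0]]
z = bilinear_pool(fmap)
score = class_score(z, w_c)
cam = bilinear_cam(fmap, w_c)
print(score, cam)
```

The per-location map shows where the evidence for (or against) class c comes from, which is what makes CAM-style maps useful for localizing defects with only image-level labels.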

PaDiM
This solution, put forward by Defard et al. (2021) [6], is a patch distribution modeling framework that simultaneously detects and localizes anomalies in images in a one-class learning setting. Using a pretrained CNN for patch embedding and multivariate Gaussian distributions, PaDiM obtains a probabilistic representation of the normal class. To localize anomalies better, PaDiM also exploits the correlations between different semantic levels of the CNN. The framework consists of three parts: embedding extraction, learning of normality and inference.
In the embedding extraction process, a pretrained CNN is used to create patch embedding vectors, since it outputs features relevant to the defects while avoiding cumbersome neural network optimization. In the patch embedding process, PaDiM first estimates the Gaussian parameters (μ_ij, Σ_ij) for each image patch at position (i, j) from the training embedding vectors X_ij, which are obtained from N different training images and three different layers of the pretrained CNN [6]. Fig. 4 [20] demonstrates this process. The five steps of the training process are as follows:
Step 1: Associate each patch of the normal images with its spatially corresponding activation vectors in the pretrained CNN activation maps.
Step 2: Concatenate the activation vectors from different layers to combine information from different semantic levels and resolutions, so that both fine-grained and global contexts are encoded.
Step 3: Divide the input image into a grid of (i, j) ∈ [1, W] × [1, H] positions, where W × H is the resolution of the largest activation map used to generate the embeddings.
Step 4: Associate each patch position in the grid to its embedding vector.
Step 5: Output the anomaly map by using patch embedding vectors from test images [6].
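Steps 2 to 4 can be sketched in plain Python: activation maps from several layers are brought to the resolution of the largest map, and the vectors at each grid position are concatenated into one patch embedding. Nearest-neighbour upsampling, the function names and the toy activations are assumptions for illustration; grid positions are 0-based here rather than the 1-based (i, j) used above.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of fmap[x][y], where each entry is a feature vector."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

def patch_embeddings(layer_maps):
    """Concatenate per-position activation vectors from several layers.

    layer_maps: activation maps ordered largest first; map[x][y] is a list of
    channel values. Smaller maps are upsampled to the largest resolution so
    that every grid position (i, j) receives exactly one embedding vector.
    """
    out_h, out_w = len(layer_maps[0]), len(layer_maps[0][0])
    resized = [upsample_nearest(m, out_h, out_w) for m in layer_maps]
    return [[[v for m in resized for v in m[i][j]] for j in range(out_w)]
            for i in range(out_h)]

# Hypothetical activations: a 2x2 map with 1 channel and a 1x1 map with 2 channels.
layer1 = [[[1], [2]], [[3], [4]]]
layer2 = [[[10, 20]]]
emb = patch_embeddings([layer1, layer2])
print(emb)
```

Each position ends up with a fine-grained value from the shallow layer and a more global context from the deeper layer, which is the purpose of Step 2.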
The next process, according to Defard et al. (2021) [6], is learning of normality, in which the model learns the characteristics of normal images at each position (i, j). Defard et al. [6] first collect the patch embedding vectors at (i, j) from the N normal training images, then assume that these vectors are generated by a multivariate Gaussian distribution, so that each patch position is associated with its own multivariate Gaussian distribution that captures the information carried by the patch embedding vectors.
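Written out, the normality model at position (i, j) is the sample mean and covariance of the N training embeddings x_ij^k. To our understanding of [6], a small regularization term εI is added to the sample covariance so that Σ_ij is always invertible, which the Mahalanobis distance used at inference requires.

```latex
\mu_{ij} = \frac{1}{N}\sum_{k=1}^{N} x_{ij}^{k},
\qquad
\Sigma_{ij} = \frac{1}{N-1}\sum_{k=1}^{N}\left(x_{ij}^{k}-\mu_{ij}\right)\left(x_{ij}^{k}-\mu_{ij}\right)^{\top} + \epsilon I
```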
The last process is inference, which computes the anomaly map. The Mahalanobis distance [21] is used to assign an anomaly score to the patch at each position (i, j) of a test image; a high score on the map indicates an area with defects. The maximum of these Mahalanobis distances [21] is then used as the anomaly score of the whole image.
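The inference step can be sketched as follows: for each position, the Mahalanobis distance M(x) = sqrt((x - μ)^T Σ^(-1) (x - μ)) is computed against that position's Gaussian, and the maximum over the map gives the image-level score. To keep the sketch self-contained, it solves Σ v = (x - μ) with a small Gauss-Jordan routine instead of a numerical library; the function names and toy parameters are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]

def mahalanobis(x, mu, cov):
    """M(x) = sqrt((x - mu)^T cov^{-1} (x - mu))."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return math.sqrt(sum(d * s for d, s in zip(diff, solve(cov, diff))))

def anomaly_map(test_emb, mus, covs):
    """Score every patch position; the image score is the maximum of the map."""
    amap = [[mahalanobis(test_emb[i][j], mus[i][j], covs[i][j])
             for j in range(len(test_emb[0]))]
            for i in range(len(test_emb))]
    return amap, max(max(row) for row in amap)

# Hypothetical 1x2 grid of positions with 2-D embeddings.
mus = [[[0.0, 0.0], [0.0, 0.0]]]
covs = [[[[1.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [0.0, 1.0]]]]
test_emb = [[[3.0, 4.0], [2.0, 0.0]]]
amap, image_score = anomaly_map(test_emb, mus, covs)
print(amap, image_score)
```

With an identity covariance the distance reduces to the Euclidean one (5.0 at the first position), while a large variance along a dimension makes deviations in that dimension less anomalous (1.0 at the second position).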
In their tests, Defard et al. [6] extend the evaluation protocol to non-aligned data, and the results show that PaDiM is more reliable on more realistic data. Meanwhile, its low memory and time requirements make it easy to adopt in various applications, such as defect detection.

Small Data-Driven Convolution Neural Networks
This method was proposed by Xu et al. [5] for inspecting subtle roller defects, which occur with low probability.
An SDD-CNN consists of six components: raw image acquisition, roller ring region expansion, surface sample acquisition, small dataset preprocessing, CNN model training, and classification [5]. The method offers a way of addressing class imbalance; its architecture is as follows. In raw image acquisition, a monocular camera captures the images and a customized light source system adjusts the lighting conditions. Next, roller ring region expansion is performed to facilitate the inspection algorithm: a Polar-to-Cartesian (P2C) coordinate transformation converts the ring-shaped image into a rectangular image. Thirdly, the rectangular image is cut into smaller images suited to CNN training and prediction. Next comes small dataset preprocessing, during which the roller dataset is expanded twice: the first expansion uses the Label Dilation (LD) method [5] to address the imbalanced class problem, and the second uses Semi-Supervised Data Augmentation (SSDA) [5]. After the two expansions, the dataset is divided into a training set, a validation set and a test set in the ratio 3:1:1, and the CNN is trained on these samples. Finally, the trained CNN performs classification over a sliding window to produce the final inspection result. Fig. 5 [22] shows the architecture of an SDD-CNN.
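Two of these steps, the P2C unwrapping of the ring region and the 3:1:1 split, can be sketched in plain Python. The ring centre, inner and outer radii, angular resolution, nearest-neighbour sampling and the helper names are all assumptions; [5] is not reproduced here, and the synthetic test image simply stores each pixel's rounded distance from the centre so that every unwrapped row should be constant.

```python
import math
import random

def polar_to_cartesian_unwrap(img, cx, cy, r_in, r_out, n_theta):
    """Unwrap the annulus centred at (cx, cy) into a rectangle.

    Output row k corresponds to radius r_in + k and column t to the angle
    2*pi*t/n_theta; each value is sampled from the nearest pixel.
    """
    out = []
    for r in range(r_in, r_out):
        row = []
        for t in range(n_theta):
            theta = 2 * math.pi * t / n_theta
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            row.append(img[y][x])
        out.append(row)
    return out

def split_3_1_1(samples, seed=0):
    """Shuffle, then split into training, validation and test sets in a 3:1:1 ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 3 // 5
    n_val = len(items) // 5
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# Synthetic 7x7 "ring" image: each pixel holds its rounded distance from (3, 3).
img = [[round(math.hypot(x - 3, y - 3)) for x in range(7)] for y in range(7)]
unwrapped = polar_to_cartesian_unwrap(img, cx=3, cy=3, r_in=1, r_out=3, n_theta=4)
train, val, test_set = split_3_1_1(range(10))
print(unwrapped, len(train), len(val), len(test_set))
```

Unwrapping turns a circular defect search into a rectangular one, so the resulting strip can be cut into fixed-size patches and scanned with an ordinary sliding window.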

CONCLUSION
After a systematic investigation, the insufficient data problem and the class imbalance problem are identified as the two main problems of applying CNNs to defect detection in smart manufacturing, and this paper analyzes them. Following this analysis, three solutions are discussed: a CNN model based on an object-level attention mechanism for casting defect detection on radiography images [4], SDD-CNN [5] and PaDiM [6]. However, all these solutions are based on unsupervised or semi-supervised learning. Compared with supervised learning, these architectures mainly perform clustering and categorization based on similarity, so their results are not comprehensive; characteristics such as the defect type cannot be predicted precisely. Future studies should therefore investigate architectures and solutions that provide comprehensive results on defects.

Figure 1 Procedure of the methodology.
According to the results of stage 1, the CNN architecture can be summarized as follows: it has three elements, namely convolutional layers, pooling layers and fully-connected layers.

Figure 2 Architecture of CNN.
In deep-learning-based classifiers, class imbalance means that some classes have far more examples than other classes in the training set. Japkowicz and Stephen (2002) [14] showed that class imbalance has considerable effects on training traditional classifiers: it not only impairs convergence during the training phase but also affects the generalization of a model on the test set. This problem in CNNs is studied by Buda et al. (2018) [11].

Figure 3 Two stages of the training strategy (type label: type of casting; defect label: type of defect)

Figure 4 Prior procedure of the embedding extraction process ((i, j): position in the image)

Figure 5 Architecture of an SDD-CNN.
In Xu et al.'s experiment [5], this architecture is applied to four CNN models, namely SqueezeNet v1.1, Inception v3, VGG-16 and ResNet-18. According to the results, SDD-CNN outperforms the original CNNs in convergence speed, training time and classification accuracy.

Table 1. Illustration of terms in formulas (1), (2) and (3)
Z: Gram matrix associated with the feature maps (k = 1, ..., D)
Feature map of channel k in the last convolutional layer of the classification network at spatial location (x, y) (abbreviated form)
Weight corresponding to class c in the softmax layer for channel k
Outer product of the channel feature maps, i, j ∈ k
Eigenvalues of the corresponding matrix