Research on Recognizing Required Items Based on OpenCV and Machine Learning

Against the background of the novel coronavirus (COVID-19) outbreak, and with the goal of having a machine automatically identify whether the required items on a form have been filled in, this work combines the support vector machine (SVM), a machine learning algorithm, with traditional computer vision algorithms from OpenCV. Software that automatically identifies whether the required items have been filled in was developed in the Python programming language using the PyCharm IDE. Once the software was complete, it was connected to an embedded device, a high-speed document camera. It has been applied at Fuzhou Customs to help customs staff review the health forms and declaration cards of inbound and outbound passengers. This not only saves time for staff and passengers but also contributes, to a certain extent, to epidemic prevention and control.


Introduction
At the end of 2019, a novel virus was discovered in Wuhan, and the outbreak that followed caused widespread public panic. It was another nationwide outbreak of an infectious disease, the first since SARS. Just as the hard-fought struggle against the epidemic in China began to show results, the epidemic broke out abroad, where it was far more severe than in China. In this context, China Customs staff in every region, who control the entry of travelers from overseas, face an arduous task: guarding the border and screening for infected persons arriving from abroad. At Fuzhou Customs, every person entering China from abroad must fill in a passenger health form on which certain items are mandatory. Previously, customs staff checked these forms by eye, which greatly increased their workload and made the process very inefficient. The aim of this research is therefore to free staff from the tedious task of inspecting health forms so that they can spend their time on more meaningful work.
The main topic of this paper is using AI to automatically identify whether the required items on the health forms of entry-exit travelers have been filled in. As the above analysis shows, the severe epidemic situation has made efficient epidemic prevention and control the top priority. After a thorough survey of the relevant fields, this paper designs and implements an algorithm that lets AI automatically identify whether the mandatory items of the passenger health form have been completed.
The work can be divided into the following stages. First, a passenger health form dataset is constructed by filling in printed health forms, and the completed forms are then recorded and organized. Because traditional OpenCV methods recognize small box objects (such as checkboxes) with very low accuracy, SVM is trained to recognize them, and for this purpose the samples are divided into a training set and a test set. Next, the SVM is trained on the training samples. Finally, the model is tested on the test set according to the actual needs of the application scenario, and the final results are evaluated.
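The paper does not give its split ratio or code, so the following is only a minimal pure-Python sketch of the split-and-evaluate steps described above. The names `stratified_split` and `accuracy`, the 20% test ratio and the fixed seed are illustrative assumptions. Splitting each class separately (stratification) keeps both empty and ticked boxes represented in the test set.

```python
import random

def stratified_split(samples, labels, test_ratio=0.2, seed=0):
    """Split samples into train/test sets, taking test_ratio of EACH class (hypothetical helper)."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = {}
    for i, label in enumerate(labels):
        by_label.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        idx = by_label[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, y_train, X_test, y_test

def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)
```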

OpenCV features
OpenCV was created to address problems in computer vision. From the start it has been a fully open-source, cross-platform library, supporting many operating systems, including but not limited to the two most popular: Linux and Windows. OpenCV originated at Intel, and its code is written in C/C++. It is a general-purpose computer vision library. Its designers wanted it to read and process images quickly for computer vision tasks, and because C/C++ runs faster than many other languages, they chose C/C++ to write the library. OpenCV has made work much easier for computer vision researchers and engineers: it lets them solve complex computer vision problems within a simple, understandable framework, and it is highly portable, especially for image processing.
The source code of OpenCV is free and open, which helps researchers study it and develop better code, and also allows companies to build products on its framework. In addition, users can embed OpenCV's optimized code in their own applications, so the arrangement benefits both the library and its users.
The OpenCV library contains more than 500 functions, all of which are cross-platform. These functions cover almost every area of computer vision, from image recognition to human-computer interaction. In addition, OpenCV can use Intel's Integrated Performance Primitives (IPP), an optimized library of multimedia functions, so that the code is optimized for the CPU in use and program performance improves.

Machine learning
Machine learning has recently become a very popular subject. It draws on many mathematical fields, such as statistics, probability theory and the theory of algorithmic complexity. In a broad sense, machine learning means letting machines learn new things: machine learning algorithms automatically extract patterns from data and use those patterns to make predictions about unseen data. Because machine learning relies heavily on mathematics, especially statistics, a solid mathematical foundation is essential for in-depth research. With the rapid development of science and technology, machine learning is now widely applied in fields such as data mining, computer vision, search engines, medical diagnosis and robotics. To date, research in machine learning has focused mainly on the following three directions:
(1) Task-oriented research: studying and improving learning systems for specific tasks.
(2) Cognitive modeling: studying and simulating the processes of human learning.
(3) Theoretical analysis: theoretically analyzing and exploring learning algorithms and their application scenarios.
Machine learning is widely regarded as a breakthrough on the road toward artificial intelligence and is a core research topic in neural computing and artificial intelligence. Even so, some systems still have no learning ability, and in others the learning ability has little effect, so they cannot keep pace with current needs. Further progress in machine learning research will therefore continue to drive the development of artificial intelligence, and of science and technology as a whole.

Principle of SVM
SVM, short for support vector machine, is a machine learning algorithm for binary classification. Given labeled positive and negative samples, its goal is to find a hyperplane that separates the two classes. SVM is highly regarded because it performs very well on nonlinear problems (via kernel functions) and on small sample sets, and it can be applied to a wide range of machine learning problems; it has contributed substantially to the current popularity of machine learning. The core idea of SVM is to maximize the margin between the two classes, which gives the most confident separation. This also gives SVM good classification and prediction ability on unseen samples, which is called generalization ability in machine learning. How is this margin measured and maximized? SVM measures it as the distance from the separating hyperplane to the data points closest to it (the support vectors) and chooses the hyperplane that makes this distance as large as possible. Years of practical application have confirmed this effectiveness, so SVM is widely used in many fields, such as medicine, fault diagnosis and pattern recognition. The core concepts of SVM are:
(1) Convexity: if the objective function is convex, any local optimum of the optimization problem is also the global optimum.
(2) Maximum margin: among all hyperplanes that separate the two classes, choose the one for which the distance from the nearest points of either class to the hyperplane is the largest.
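(3) Convex hull: the convex hull of a class's training vectors is the smallest convex set containing all training samples of that class. For linearly separable data, the maximum-margin hyperplane perpendicularly bisects the shortest segment between the convex hulls of the two classes.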
(4) Slack variables (relaxation factors): these allow the optimization problem to tolerate some misclassified or margin-violating points. The penalty factor C controls how heavily such violations, that is, sacrificed classification accuracy, are penalized.
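Concepts (1), (2) and (4) combine into the standard soft-margin primal problem, where $C$ is the penalty factor and $\xi_i$ are the slack variables:

$$\min_{w,b,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i \quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0.$$

The margin width is $2/\lVert w\rVert$, so minimizing $\lVert w\rVert^2$ maximizes the margin, and the objective is convex, so by (1) any local optimum is global. Eliminating the slacks gives the equivalent hinge-loss form $\tfrac{1}{2}\lVert w\rVert^2 + C\sum_i \max(0,\ 1 - y_i(w^\top x_i + b))$. The sketch below minimizes that form by full-batch subgradient descent in plain Python. It is an illustrative stand-in, not the paper's implementation: the paper does not state its kernel, C or solver, and in practice OpenCV's own `cv2.ml.SVM_create()` (with `setC`) or another library solver would more likely be used. `train_linear_svm`, the learning rate and the epoch count are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, C=1.0, epochs=500, lr=0.01):
    """Soft-margin linear SVM via full-batch subgradient descent (illustrative sketch).

    Minimizes 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)),
    where each hinge term is the slack xi_i. Labels must be +1 / -1.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 (margin) term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified: slack > 0
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if dot(w, x) + b >= 0 else -1

def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / dot(w, w) ** 0.5
```

A larger C penalizes slack more heavily and approaches the hard-margin solution; a smaller C tolerates more margin violations in exchange for a wider margin.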

Processing with OpenCV
Because this project addresses a computer vision problem, and OpenCV is an open-source library dedicated to computer vision, OpenCV was chosen for the processing and implementation. The raw data are passenger entry-exit health declaration forms, which must be filled in manually to create training samples for machine learning. The program first reads in each file and saves the image to a specified path. Next, because this is an image recognition task, the image is scaled to a fixed proportion to simplify the subsequent processing. Because the task is to judge whether certain required items have been filled in, I wrote an algorithm for this judgment. A computer "sees" an image as individual pixels and aggregates them, so to judge whether a required item has been filled in, I check whether there are more than three black pixels in the region of that item: if there are, the item is judged to be filled in; otherwise it is judged to be empty. Since the passenger entry-exit health declaration card is printed on paper of a fixed size, the position of each item can be determined relative to a reference point on the image, which serves as the origin of the coordinates. To choose this point I used Photoshop, an editor for pixel-based digital images whose editing and drawing tools make it easy to examine pictures in detail. I tested three candidate reference points and finally chose the center of the customs logo. The reason is that the image is binarized later, and various factors can leave stray gray points around the binarized image that affect nearby pixels; the center of the customs logo is not affected by such irregular pixels in its neighborhood.
As shown in Figure 1, the white pixel at the center of the logo is used as the reference point in this paper.
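The fill check described above can be sketched as follows. This is a plain-Python illustration on a grayscale image stored as a list of rows, assuming the form has already been scaled to the template size. The threshold of 128, the names `binarize`, `field_box` and `is_filled`, and the offset boxes are hypothetical, since the paper does not give them. The binarization rule mirrors `cv2.threshold` with `THRESH_BINARY` and a maximum value of 255 (pixels above the threshold become 255, all others 0), and each field's region is stored as an offset from the reference point (the logo center), so the same offsets apply to every scanned card.

```python
BLACK, WHITE = 0, 255
MIN_BLACK_PIXELS = 3  # rule from the text: MORE than three black pixels means "filled"

def binarize(gray, threshold=128):
    """Same rule as cv2.threshold(..., THRESH_BINARY) with maxval 255."""
    return [[WHITE if v > threshold else BLACK for v in row] for row in gray]

def field_box(ref_point, offset, width, height):
    """Convert an offset box (dx0, dy0, dx1, dy1) relative to the reference point
    into absolute, image-clipped bounds (x0, y0, x1, y1); x1 and y1 are exclusive."""
    rx, ry = ref_point
    dx0, dy0, dx1, dy1 = offset
    x0, y0 = max(0, rx + dx0), max(0, ry + dy0)
    x1, y1 = min(width, rx + dx1), min(height, ry + dy1)
    return x0, y0, x1, y1

def is_filled(binary, ref_point, offset):
    """Count black pixels inside the field region and apply the > 3 rule."""
    height, width = len(binary), len(binary[0])
    x0, y0, x1, y1 = field_box(ref_point, offset, width, height)
    black = sum(1 for y in range(y0, y1) for x in range(x0, x1) if binary[y][x] == BLACK)
    return black > MIN_BLACK_PIXELS
```

Clipping the box to the image bounds matters in Python: without it, a negative coordinate would silently wrap around to the opposite edge of the image instead of raising an error.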

Training with SVM
In the passenger entry-exit health declaration card, one required item is gender, which is answered by ticking a small box. If a traditional OpenCV algorithm is used to detect whether the passenger has ticked the box, the accuracy is low: first, the box is too small to identify reliably, and second, some passengers place the tick outside the box. SVM is therefore used for this item. A convolutional neural network (CNN) was originally considered, because CNNs generally outperform SVM in image recognition. However, since only the gender box needs a learned classifier, there would be no significant difference in accuracy, while SVM runs faster than a CNN. During the epidemic, the priority is to clear passengers as quickly as possible, because crowds increase the risk of infection, so SVM was chosen and trained repeatedly. Before training, the samples are divided into a training set and a test set, with empty boxes forming one class and ticked boxes the other. After this, SVM training can begin. As shown in Figure 2 and
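A minimal sketch of how the two classes might be prepared for the SVM is shown below. The crop padding (`pad`), the threshold, the label values and the helper names `checkbox_features` and `build_dataset` are assumptions; the paper does not describe its features. Cropping a few pixels beyond the printed box is one way to capture ticks that fall just outside it, and flattening the binarized patch gives a fixed-length vector because every card is scaled to the same template size.

```python
EMPTY, TICKED = -1, +1  # class labels in the +1/-1 convention used by SVMs

def checkbox_features(gray, box, pad=3, threshold=128):
    """Crop the checkbox plus `pad` pixels on every side, binarize, flatten (1 = ink)."""
    height, width = len(gray), len(gray[0])
    x0, y0, x1, y1 = box  # x1 and y1 are exclusive
    x0, y0 = max(0, x0 - pad), max(0, y0 - pad)
    x1, y1 = min(width, x1 + pad), min(height, y1 + pad)
    return [1 if gray[y][x] <= threshold else 0
            for y in range(y0, y1) for x in range(x0, x1)]

def build_dataset(images, box, ticked_flags, pad=3):
    """One feature vector per scanned card, labeled TICKED or EMPTY."""
    X = [checkbox_features(img, box, pad) for img in images]
    y = [TICKED if flag else EMPTY for flag in ticked_flags]
    return X, y
```

In real scans the printed box border contributes ink to both classes; the classifier has to separate them by the additional tick strokes, which is why a learned model is more robust here than a fixed pixel-count rule.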

Conclusion
This work designed an AI system that recognizes whether mandatory items have been filled in. Traditional OpenCV computer vision algorithms and the machine learning algorithm SVM are used together to implement the project, which is deployed on a high-speed document camera, and the final accuracy reaches 92%. The implemented functions basically meet the requirement of using a machine to identify whether the required items of the passenger health declaration have been filled in. In the context of preventing epidemics imported from abroad, the system greatly reduces the workload of staff and saves passengers' time.
This work was supported by 2020KQNCX138.