Design and Optimization of a Deep Neural Network Architecture for Traffic Light Detection

Autonomous driving has recently become a research trend, but an efficient autonomous driving system is difficult to achieve because of safety concerns. Applying traffic light recognition to an autonomous driving system is one way to prevent accidents caused by traffic light violations. To realize a safe autonomous driving system, we propose the design and optimization of a traffic light detection system based on a deep neural network. We designed a lightweight convolutional neural network with fewer than 10,000 parameters and implemented it in software, achieving 98.3% inference accuracy with a response time of 2.5 fps. We also optimized the input image pixel values with normalization and optimized the convolution layer with pipelining on an FPGA, using 5% of its resources.


Introduction
Recently, neural networks have been applied in many fields, such as farm monitoring [1], autonomous driving [2], natural image processing [3], and character recognition [4]. Traffic light detection is an important factor in preventing traffic accidents before they occur [5]. To realize a safe autonomous driving system, we developed a system for traffic light detection. In Japan, 11,652 traffic accidents caused by ignoring or violating traffic light signals were recorded in 2019 [6], and the true number, including unrecorded accidents, is likely higher. Detecting traffic lights so that signal violations are prevented in advance is therefore one of the major advantages of an autonomous driving system, and improving recognition accuracy contributes directly to safety. However, the maneuverability and power constraints of the autonomous driving car environment must also be considered, so the implementation needs real-time processing, a compact circuit size, and extremely high accuracy. Traffic light detection is a form of object detection, and high accuracy at high speed can be achieved with well-known object detection algorithms such as the VGG-based [7] Single Shot MultiBox Detector (SSD) [8] and You Only Look Once (YOLO) [9]. However, because these algorithms require large architectures and high power, they are not suitable for traffic light detection in a power-constrained autonomous driving car environment [5]. In this work, we propose an architecture for traffic light detection. The contribution of this paper is the design of an optimized neural network for a traffic light detection system, both in software on a CPU and in hardware on an FPGA. In the next section, we introduce the basics of neural networks.

Artificial Neuron
An artificial neuron is a mathematical model of a biological neuron. It abstracts the synaptic connections, internal potential, and firing mechanism of the biological neuron. Figure 1 shows a simple biological neuron model, and equation (1) gives its mathematical model. The strength of each synaptic connection is represented by the weight $w_k$, the input signal by $x_k$, and the membrane potential by $y_m$. The output signal $y_m$ is obtained by multiplying each input $x_k$ by its weight $w_k$, summing the products, and adding the bias $b$; $n$ is the number of inputs.
$$y_m = \sum_{k=1}^{n} w_k x_k + b \tag{1}$$
Figure 2 shows an example of a multilayer perceptron [5]. A multilayer perceptron has an input layer, one or more hidden layers, and an output layer. The input image is given as a one-dimensional array $x_k$, and the weights as a two-dimensional array $w_k$. As described above, the inputs are multiplied by the weights, and the result is passed to the activation function, whose output is sent to the next layer. The activation function can be the identity function, used in regression problems; the sigmoid function, used in binary classification problems [10]; or the softmax function, used in multi-class classification problems [10]. In Figure 2, the final classification result is represented as a probability.
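The weighted sum of equation (1) and the softmax output used for multi-class classification can be illustrated with a minimal sketch. The function names (`neuron_output`, `dense_layer`, `softmax`) and the example values are hypothetical and are not taken from the implementation described in this paper.

```python
import math


def neuron_output(inputs, weights, bias):
    """Return the weighted sum of the inputs plus the bias: sum_k(w_k * x_k) + b."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def dense_layer(inputs, weight_rows, biases):
    """Apply one neuron per weight row, giving one raw score per output neuron."""
    return [neuron_output(inputs, row, b) for row, b in zip(weight_rows, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative values only: two inputs feeding three output neurons.
x = [1.0, 2.0]
w = [[0.5, -0.25], [0.1, 0.2], [-0.3, 0.4]]
b = [0.1, 0.0, 0.0]
probabilities = softmax(dense_layer(x, w, b))
```

The layer output is one score per class, and softmax turns those scores into the class probabilities mentioned for Figure 2.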

Convolutional Neural Network
A convolutional neural network is a type of multilayer neural network. Figure 3 shows an example, LeNet-5 [11]. A convolutional neural network consists mainly of convolution layers, pooling layers, and fully connected layers.

Convolutional Layer
The convolution layer we developed takes three-channel RGB color images as input, both traffic light images and background images. There are four classes: green, yellow, red, and background. In the convolution layer, features are extracted from the input image using convolution filters. The size of the output image after the filter is applied is given by formulas (2) and (3).
$$W_{out} = \frac{W_{in} + 2 \cdot \mathit{padding} - \mathit{filter}}{\mathit{stride}} + 1 \tag{2}$$
$$H_{out} = \frac{H_{in} + 2 \cdot \mathit{padding} - \mathit{filter}}{\mathit{stride}} + 1 \tag{3}$$
The variables $W_{out}$, $H_{out}$, $W_{in}$, $H_{in}$, $\mathit{stride}$, $\mathit{padding}$, and $\mathit{filter}$ denote the width of the output image, the height of the output image, the width of the input image, the height of the input image, the stride size, the padding size, and the filter size, respectively. In this work, we used two convolution layers.
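As a worked illustration, the standard output-size relation for a convolution, (input + 2 x padding - filter) / stride + 1, can be evaluated with a small helper. The function name `conv_output_size` is hypothetical, and the use of floor division when the stride does not divide evenly is an assumption that follows common framework behavior, not something stated in this paper.

```python
def conv_output_size(in_size, filter_size, stride=1, padding=0):
    """Return the output width (or height) of a convolution along one axis.

    Assumption: floor division is used when the stride does not divide evenly.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    span = in_size + 2 * padding - filter_size
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    return span // stride + 1


# LeNet-5 style first layer: 32x32 input, 5x5 filter, stride 1, no padding.
w_out = conv_output_size(32, 5, stride=1, padding=0)
h_out = conv_output_size(32, 5, stride=1, padding=0)
```

The same helper applies to both axes because the filter, stride, and padding are assumed to be square and symmetric.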

Pooling Layer
In the pooling layer, the number of rows and columns of the image is reduced. Pooling is robust to small positional shifts, because a small shift in the image yields the same pooling result. In this work, we used a max pooling layer, which extracts the maximum pixel value in each window of the convolution layer output.
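A minimal sketch of max pooling on a single-channel feature map is shown below. The 2x2 window and stride of 2 are assumptions chosen for illustration, since the pooling size used in this work is not stated here, and `max_pool2d` is a hypothetical name.

```python
def max_pool2d(image, pool=2, stride=2):
    """Return the max-pooled version of a 2-D list of pixel values."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - pool + 1, stride):
        pooled_row = []
        for c in range(0, cols - pool + 1, stride):
            # Collect every value inside the current pooling window.
            window = [image[r + i][c + j] for i in range(pool) for j in range(pool)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(feature_map)  # 4x4 input becomes 2x2 output

# A one-pixel shift of the peak inside a window leaves the result unchanged.
shifted_a = max_pool2d([[9, 0], [0, 0]])
shifted_b = max_pool2d([[0, 9], [0, 0]])
```

The last two calls illustrate the robustness to small shifts: both inputs pool to the same single value.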

Fully Connected Layer
The fully connected layer takes as input the flattened output of the convolution or pooling layer. The final recognition result is output as a probability.

System Overview and Optimizations
Figure 4 [5] shows an overview of the proposed traffic light detection system. The image acquired from the camera is fed as input to the system. In the region proposal phase, the input image is divided into image blocks of an appropriate size using a sliding window for image classification. In the classification phase, each block is classified into one of four classes (green, yellow, red, and background) using a convolutional neural network (CNN). In the detection phase, the traffic light is detected using the trained parameters.
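The three phases can be sketched as a sliding-window loop. This is an illustration under stated assumptions, not the system's actual code: the window size (32) and step (16) are placeholder values, and `classify` is a hypothetical callable standing in for the trained CNN.

```python
CLASSES = ("green", "yellow", "red", "background")


def sliding_windows(image_width, image_height, window=32, step=16):
    """Yield (left, top, right, bottom) boxes that tile the image (region proposal)."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for top in range(0, image_height - window + 1, step):
        for left in range(0, image_width - window + 1, step):
            yield (left, top, left + window, top + window)


def detect(image_width, image_height, classify, window=32, step=16):
    """Classify every block and keep those not labeled as background."""
    detections = []
    for box in sliding_windows(image_width, image_height, window, step):
        label = classify(box)  # classification phase (CNN stand-in)
        if label not in CLASSES:
            raise ValueError("unknown class label: %r" % (label,))
        if label != "background":
            detections.append((box, label))  # detection phase
    return detections


# Stub classifier for illustration: pretends one block contains a red light.
def stub_classifier(box):
    return "red" if box == (16, 0, 48, 32) else "background"


boxes = list(sliding_windows(64, 48))
found = detect(64, 48, stub_classifier)
```

In a real system `classify` would crop the block from the camera image and run the CNN on it; here it only inspects the box coordinates so that the sketch stays self-contained.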

pragma PIPELINE
The #pragma HLS PIPELINE directive shown in Figure 5 is used to increase the parallelism and calculation speed of the loops that multiply the weights and input values. In this experiment, the SDx pipeline is applied to the convolutional layer.

Normalization
Normalization, shown in Figure 6, is used to make the CNN run faster. In this experiment, the training and testing image values are normalized into the range -1 to +1 by dividing them by 255.0 (for 8-bit pixel values, the result lies between 0 and 1).

Table 1 shows the proposed neural network architecture for traffic light detection, which was trained and evaluated in software on a CPU running Ubuntu 16.04. We used two convolution layers and two dropout layers. In a dropout layer, connections to neurons are chosen at random and their weight values are set to zero during training. The number of trained weights and biases is 7884. This neural network architecture is lightweight because it has fewer than 10,000 parameters, which makes it suitable for hardware implementation in terms of hardware complexity. Inference achieved 98.3% accuracy with a detection response time of 2.5 fps. This prediction accuracy is acceptable as a preliminary result for an autonomous driving system.
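The pixel normalization described in this section can be sketched as follows. `normalize_unit` implements the stated division by 255.0. `normalize_symmetric` is an alternative mapping onto the full range -1 to +1 that is included only for comparison; it is an assumption and is not described in this paper. Both function names are hypothetical.

```python
def normalize_unit(pixels, scale=255.0):
    """Divide each 8-bit pixel value by 255.0, giving values in [0, 1]."""
    return [p / scale for p in pixels]


def normalize_symmetric(pixels, half_scale=127.5):
    """Alternative (assumed, not from the paper): map 8-bit values onto [-1, 1]."""
    return [p / half_scale - 1.0 for p in pixels]


raw = [0, 51, 255]
unit = normalize_unit(raw)
symmetric = normalize_symmetric([0, 127.5, 255])
```

Either mapping keeps the inputs in a small, bounded range; which one was applied beyond the division by 255.0 is not specified here.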

Hardware Complexity
In this experiment, a Zynq UltraScale+ MPSoC FPGA is used for the implementation. Figure 7 shows a demonstration of traffic light image classification on the FPGA. Table 2 presents the evaluated hardware complexity of the optimized convolutional layer, obtained with Xilinx SDx. The resource consumption of BRAM18K, DSP48E, FF, and LUT is 50, 41, 4859, and 5866, respectively. We optimized the convolution layer for traffic light detection using 5% of the FPGA resources.

Conclusion and Future Works
In this work, we designed and evaluated a neural network for traffic light detection with different types of activation function, achieving 98.3% inference accuracy. We also implemented and optimized the traffic light classification on an FPGA and evaluated its hardware complexity, using only 5% of the FPGA resources. In future work, we will optimize the per-image execution time for detection with a Spiking Neural Network (SNN).

Discussion
The classification accuracy is 98.3% with the ReLU activation function and 97.9% with the sigmoid activation function. As shown in Table 3, ReLU improves accuracy by 0.4 percentage points compared with the sigmoid function. Although the difference between the two results is small, the ReLU activation function is better suited to hardware implementation because it is simpler to implement.
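The difference in implementation cost can be seen by writing the two functions side by side. This is a software sketch for illustration only, not the hardware design: ReLU needs a single comparison with zero, whereas sigmoid needs an exponential and a division, which are typically more expensive to realize in hardware.

```python
import math


def relu(x):
    """ReLU: pass positive values through and clamp everything else to zero."""
    return x if x > 0 else 0.0


def sigmoid(x):
    """Sigmoid: requires an exponential and a division."""
    return 1.0 / (1.0 + math.exp(-x))


relu_values = [relu(v) for v in (-2.0, 0.0, 3.5)]
sigmoid_at_zero = sigmoid(0.0)
```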