Pavement crack detection plays an important role in road distress evaluation. Traditional crack detection methods depend mainly on manual work and suffer from two limitations: (i) they are time consuming and laborious, and (ii) they rely entirely on human experience and judgment. Automatic crack detection is therefore essential for detecting and identifying road cracks quickly and accurately. It is a key component of intelligent maintenance systems, supporting pavement distress evaluation where continual road-status surveys are required. Over the past decade, the development of high-speed mobile cameras and large-capacity storage devices has made it easier to obtain large-scale road images. In mobile surveying and mapping, integrated acquisition equipment fixed to the rear of the vehicle roof frame monitors both the road surface and the surrounding environment, and pavement surface images are processed and stored during acquisition. Currently, many methods apply computer vision algorithms to the collected pavement crack images to obtain the final maintenance evaluation results.
Automatic crack detection is a very challenging image classification task whose goal is to mark crack areas accurately. Figure 1 shows examples of data acquired by a mobile pavement inspection vehicle. In a few cases, the cracks have good continuity and obvious contrast, as shown in Figure 1(a). In most cases, however, cracks are affected by considerable noise, leading to poor continuity and low contrast, as shown in Figure 1(b). Automatic crack detection therefore faces three main challenges: (i) under poor lighting and complex backgrounds, the texture and linearity of interference (weeds, stains, etc.) resemble those of cracks, resulting in large intraclass differences; (ii) boundaries between small cracks and local noise become blurred; and (iii) blurred, low-quality images are unavoidable when crack data are collected at high speed. These three difficulties make pavement crack detection considerably challenging.
Although most published methods have achieved promising results, automated pavement crack detection in complex backgrounds remains demanding. In this paper, we propose an end-to-end trainable deep convolutional neural network, called CrackSeg, for pixel-wise crack detection in complex scenes. First, a multiscale dilated convolution module is proposed to obtain richer crack texture information. Additionally, to give the network both a larger receptive field and high spatial resolution, multiscale context information is captured with different dilation rates. Second, a pixel-level dense prediction map is generated by an upsampling module that fuses low-level features to recover crack boundary details. Finally, the model is systematically evaluated on three crack datasets using quantitative evaluation metrics, including comparison with manual annotations. The results show that the proposed method can accurately extract cracks across different pavement types and complex backgrounds.
The rest of this paper is organized as follows. Section 2 describes crack detection based on deep learning semantic segmentation. Section 3 demonstrates the effectiveness of the proposed scheme through comparative analyses of experiments. Section 4 discusses the detailed design of the two modules proposed in this paper. Finally, Section 5 concludes the paper.
In this section, we introduce a novel end-to-end trainable crack detection DCNN based on multiscale features. The section is divided into three parts. The first part introduces the overall structure of the crack detection network. The second part introduces a multiscale dilated convolution module that obtains richer context information from the crack image and produces a preliminary fused crack feature map. The third part proposes a new upsampling scheme based on feature maps of different resolutions.
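To make the role of dilation concrete, the following is a minimal NumPy sketch, not the paper's actual module: parallel branches with different dilation rates enlarge the receptive field of a 3×3 kernel without adding parameters, since a k×k kernel with dilation rate d covers an effective region of k + (k − 1)(d − 1) pixels.

```python
import numpy as np

def effective_kernel_size(k, d):
    """Effective receptive field of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def dilated_conv2d(x, kernel, dilation=1):
    """Naive 'valid' 2-D convolution with a dilated kernel (single channel)."""
    k = kernel.shape[0]
    ek = effective_kernel_size(k, dilation)
    h, w = x.shape
    out = np.zeros((h - ek + 1, w - ek + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample input taps spaced `dilation` pixels apart
            patch = x[i:i + ek:dilation, j:j + ek:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

# Parallel branches with rates 1, 2, and 4 see 3x3, 5x5, and 9x9 regions
x = np.arange(100, dtype=float).reshape(10, 10)
kernel = np.ones((3, 3))
branches = [dilated_conv2d(x, kernel, d) for d in (1, 2, 4)]
print([effective_kernel_size(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
```

In a real network the branch outputs would be padded to a common size and fused (e.g., concatenated) before further convolution; the toy above only illustrates how the sampling grid of the kernel spreads with the dilation rate.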
The multiscale dilated convolution module in the encoding stage transforms the input image into rich semantic visual features. However, these features have a coarse spatial resolution. The purpose of upsampling is to restore these features to the input image resolution and then predict the spatial distribution of cracks.
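A minimal sketch of this idea follows, assuming nearest-neighbour upsampling as a simple stand-in for the interpolation actually used, with hypothetical channel counts and feature map sizes: coarse semantic features are upsampled to the resolution of a low-level feature map and the two are concatenated for further convolution.

```python
import numpy as np

def upsample_nn(feat, scale=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map
    (a stand-in for the bilinear interpolation used in practice)."""
    return feat.repeat(scale, axis=1).repeat(scale, axis=2)

def fuse(high_level, low_level):
    """Upsample coarse semantic features to the low-level resolution,
    then concatenate along the channel axis for further convolution."""
    scale = low_level.shape[1] // high_level.shape[1]
    up = upsample_nn(high_level, scale)
    return np.concatenate([up, low_level], axis=0)

high = np.random.rand(256, 16, 16)   # coarse, semantically rich
low = np.random.rand(64, 64, 64)     # fine, boundary-preserving
fused = fuse(high, low)
print(fused.shape)  # (320, 64, 64)
```

The fused tensor keeps the semantic content of the encoder output while the low-level channels contribute the fine boundary detail that the decoder needs to recover crack edges.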
To verify the effectiveness of our scheme, extensive pavement crack detection experiments were conducted on various images. In this section, we describe the experimental setup and analyze the experimental results.
Our CrackDataset consists of pavement detection images from 14 cities in Liaoning Province, China. The data cover most of the pavement distresses in the whole road network and include images of different pavements captured under different illumination conditions with different sensors. The ground truth provides two label types: crack and noncrack. The dataset is divided into three parts: the training set and the validation set contain 4736 and 1036 crack images, respectively, and the test set contains 2416 images. In addition, two other crack datasets, CFD and AigleRN, are used as test sets. The details of the datasets are shown in Table 2.
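For reproducibility, such a split can be sketched as follows; the file-name pattern and random seed are hypothetical, and only the subset sizes come from the text.

```python
import random

# Hypothetical split of the 8188 CrackDataset images into the
# 4736 / 1036 / 2416 train / val / test subsets described above.
paths = [f"crackdataset/img_{i:05d}.png" for i in range(8188)]
random.seed(0)
random.shuffle(paths)
train, val, test = paths[:4736], paths[4736:5772], paths[5772:]
print(len(train), len(val), len(test))  # 4736 1036 2416
```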
In the evaluation of crack detection accuracy, crack and noncrack pixels are treated as two categories. The overall accuracy (OA), precision, recall, F1-score, and mean intersection over union (mIoU) are used as the metrics for quantitative performance evaluation and comparison in the experiments. With TP, TN, FP, and FN denoting the numbers of true positive, true negative, false positive, and false negative pixels, these five indicators are calculated as follows:

OA = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
mIoU = (1/2) × [TP / (TP + FP + FN) + TN / (TN + FP + FN)]
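These standard confusion-matrix definitions can be computed directly; the following is a minimal NumPy sketch (not the authors' evaluation code, and the toy arrays are illustrative only):

```python
import numpy as np

def crack_metrics(pred, gt):
    """Pixel-wise metrics for the two-class (crack / noncrack) setting.
    `pred` and `gt` are binary arrays with 1 = crack, 0 = noncrack."""
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    oa = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # mIoU: mean of the per-class intersection-over-union values
    iou_crack = tp / (tp + fp + fn)
    iou_bg = tn / (tn + fp + fn)
    miou = (iou_crack + iou_bg) / 2
    return oa, precision, recall, f1, miou

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(crack_metrics(pred, gt))
```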
To demonstrate the feasibility of the proposed scheme, we compare our CrackSeg with SegNet, U-Net, PSPNet, DeepCrack, and DeepLabv3+. In addition, to verify the advantages of deep learning semantic segmentation models in crack detection, the non-deep-learning method CrackForest is also included in the comparative experiments.
The quantitative testing results on our CrackDataset are shown in Table 3; crack detection accuracy in complex backgrounds is clearly higher for the deep learning methods. Compared with the other deep learning segmentation methods, CrackSeg achieves the highest OA, recall, precision, mIoU, and F1-score. CrackSeg reaches the highest mIoU of 73.53%, followed by DeepCrack and DeepLabv3+ with mIoUs of 72.04% and 71.77%, respectively. The mIoUs of CrackForest, SegNet, U-Net, and PSPNet are 14.27%, 2.97%, 2.04%, and 3.90% lower than that of CrackSeg, respectively. The performance improvement is mainly due to the multiscale dilated convolution module in the encoding stage, which captures multiscale context for accurate semantic mining. On the basis of this rich semantic information, target boundary information is recovered using low-level high-resolution features, and more accurate segmentation results are obtained through successive convolution operations.
Figure 5 shows visual comparisons of the crack detection results of the different methods. The first row contains the original crack images, some of which are accompanied by noise such as shadows, oil stains, and watermarks, the main factors affecting crack detection. The experimental results show that CrackForest, based on traditional machine learning features, can extract cracks against a simple background but retains considerable noise and cannot adapt to automatic crack detection in complex scenes. SegNet and U-Net produce acceptable results but yield many false detections in complex backgrounds. DeepCrack performs well in extracting thin cracks in complex backgrounds; however, some crack width information is lost in its detection results. DeepLabv3+ performs well in detecting light cracks, but its large dilation rate produces nonexistent cracks, and its single convolution kernel size causes the loss of crack information. Our CrackSeg integrates low-level and high-level features across convolution stages at different scales, further improving crack detection accuracy and robustness to background artifacts: it effectively eliminates the influence of oil stains, shadows, and complex backgrounds and extracts cracks with various complex topologies.
To verify the stability of the proposed method, CrackSeg is tested on two additional datasets, CFD and AigleRN. The visual crack detection results are shown in Figure 6. Notably, no crack images from these two datasets were used in the training phase. The results show that the proposed method can extract most pavement cracks and that the model is strongly robust.
In this section, to determine the optimal crack features, we first discuss the internal design choices of the multiscale dilated convolution module. We then discuss the low-level feature selection and the convolution structure of the upsampling module.
To compare the effect of the multiscale dilated convolution module on crack detection more clearly, the features are upsampled by a factor of 16 by BU in the upsampling stage to obtain the final prediction results. In the experiment, ResNet50 is used as the network backbone to validate the multiscale dilated convolution module. Figure 7 shows the change in mIoU of the different dilated convolution variants over 20 training epochs. After 14 epochs, each method reaches a stable state. The BaseLine has the lowest performance, while the purple polyline (fusion-S-dilated) attains the highest mIoU score among the compared methods. In summary, the multiscale dilated convolution module with fusion features achieves the best results.
As shown in Table 4, the experimental results are compared and analyzed for the selection of