Doctoral Defense Announcement
Multitask, Weakly Supervised, and Gaze-Guided Deep Learning Methods for Automating Structural Inspection
By
Visual inspection remains one of the primary methods for assessing the condition of civil infrastructure in the United States. Regular and post-disaster inspections are essential for identifying deficiencies before they compromise structural integrity and serviceability. However, traditional practices are labor-intensive, time-consuming, subjective, and sometimes hazardous, because they rely on human experts to collect and interpret visual and other nondestructive evaluation data. Advances in deep learning and computer vision, combined with imagery from robotic platforms such as unmanned aerial vehicles, offer a promising path toward automating condition assessment and accelerating damage evaluation. Such systems transform inspection imagery into actionable information by detecting, segmenting, and interpreting structural components and defects. Nonetheless, key challenges remain: improving model efficiency, reducing dependence on large annotated datasets, and making deep neural networks explainable enough to support engineering decision-making.
This dissertation addresses these challenges by developing a comprehensive deep-learning framework for automated structural inspection. It begins with a multitask learning model that jointly segments bridge elements and surface defects, exploiting their interdependence to improve both segmentation accuracy and computational efficiency. A new pixel-level annotated dataset of bridge elements and corrosion was established to support model training and evaluation. Building on this foundation, an attention-enhanced co-interactive fusion network is proposed, trained, and validated on an expanded Steel Bridge Condition Inspection Visual (SBCIV) dataset to strengthen feature sharing and spatial correlation between the two tasks. To mitigate the annotation cost inherent in fully supervised learning, a weakly supervised structural component segmentation model is introduced that relies only on lightweight scribble annotations, reducing labeling effort by approximately 80% while maintaining high segmentation accuracy. Finally, the framework incorporates an expert gaze-guided multitask model that integrates eye-tracking data recorded while inspectors review large-scale structural inspection imagery, improving both damage classification accuracy and model explainability by aligning algorithmic decision-making with human perceptual reasoning.
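To make these ideas concrete for a general audience, the simplified PyTorch sketch below illustrates three of the dissertation's themes: multitask segmentation heads over a shared encoder, a scribble-based partial cross-entropy loss, and a gaze-alignment term. All layer sizes, class counts, names, and loss formulations here are illustrative assumptions, not the actual models developed in this work.

    # Illustrative sketch only: shared-encoder multitask segmentation,
    # scribble-supervised loss, and a gaze-alignment regularizer.
    # Every design choice below is a hypothetical stand-in.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultitaskSegNet(nn.Module):
        """Shared encoder with two task-specific heads (toy example)."""
        def __init__(self, n_elements=4, n_defects=2):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            self.element_head = nn.Conv2d(64, n_elements, 1)
            self.defect_head = nn.Conv2d(64, n_defects, 1)

        def forward(self, x):
            feats = self.encoder(x)  # features shared by both tasks
            return self.element_head(feats), self.defect_head(feats)

    def scribble_loss(logits, scribbles, ignore_index=255):
        # Partial cross-entropy: only scribble-annotated pixels contribute;
        # pixels marked ignore_index (unlabeled) are skipped entirely.
        return F.cross_entropy(logits, scribbles, ignore_index=ignore_index)

    def gaze_alignment_loss(attention, gaze, eps=1e-8):
        # KL divergence pulling the model's spatial attention toward a
        # normalized inspector gaze heatmap (one plausible formulation).
        p = gaze.flatten(1)
        p = p / (p.sum(1, keepdim=True) + eps)
        q = attention.flatten(1)
        q = q / (q.sum(1, keepdim=True) + eps)
        return (p * ((p + eps).log() - (q + eps).log())).sum(1).mean()

    # Toy forward/backward pass on random data.
    model = MultitaskSegNet()
    images = torch.randn(2, 3, 128, 128)
    scribbles = torch.full((2, 128, 128), 255, dtype=torch.long)  # mostly unlabeled
    scribbles[:, 60:64, 20:100] = 1                               # one sparse scribble
    elem_logits, defect_logits = model(images)
    attention = defect_logits.softmax(1)[:, 1]      # crude attention proxy
    gaze = torch.rand(2, 128, 128)                  # stand-in gaze heatmap
    loss = scribble_loss(elem_logits, scribbles) + 0.1 * gaze_alignment_loss(attention, gaze)
    loss.backward()

The shared encoder is what allows element context to inform defect prediction, while the ignore_index mechanism confines supervision to the sparse scribble pixels, capturing in miniature how multitask learning and weak supervision reduce both computation and annotation cost.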
The developed framework advances automation and data efficiency in visual structural inspection while delivering inspector-oriented explanations, thereby strengthening decision support for civil infrastructure asset management.