Dental model segmentation network with fine-grained receptive fields and multiscale fusion
Objective Dental computer-aided therapy relies on dental models to assist dentists in their practice. One of its most fundamental tasks is the automated segmentation of teeth from point cloud data acquired with intra-oral scanners (IOS). Precise segmentation of each individual tooth provides vital information for a variety of subsequent tasks: the segmented dental models support customized treatment planning and modeling and thus provide extensive assistance in carrying out further treatment. However, automated segmentation of individual teeth from dental models faces three significant challenges. First, the indistinct boundary between teeth and gums makes segmentation based solely on geometric features difficult. Second, factors such as occlusion during scanning can produce poor-quality scans, particularly in the posterior dental regions, which further complicates segmentation. Third, patients' teeth often exhibit complex anomalies, including crowding, missing teeth, and misalignment, which make accurate segmentation even harder. To address these challenges, two conventional approaches have been proposed for segmenting teeth in IOS scans. The first is projection-based: the 3D dental scan is projected into a 2D space, segmentation is performed in 2D, and the result is mapped back into 3D space. The second is geometry-based and typically uses geometric attributes such as surface curvature, geodesic information, and harmonic fields to distinguish tooth structures. However, these methods are not fully automated and rely on domain-specific knowledge and experience. Moreover, the predefined low-level attributes they use lack robustness to the complex appearance of patients' teeth. Given the impactful application of convolutional neural networks (CNNs) in computer vision and medical image processing, several CNN-based deep learning methods have been introduced. Some of these methods extract translation-invariant deep geometric features directly from 3D point cloud data, but they lack the receptive field needed for fine-grained tasks such as dental model segmentation. Their network structures are also redundant and neglect crucial details of dental models. To address these issues, this paper proposes TRNet, a fully automatic tooth segmentation network that segments teeth directly on unprocessed intra-oral scanned point cloud models.

Method TRNet is an end-to-end, 3D point cloud-based, multiscale fusion method for dental model segmentation. Each tooth is small compared with the whole dental model, and the boundaries between teeth and gums lack distinct features, so a fine-grained receptive field is essential for feature extraction. TRNet's encoder therefore queries each neighborhood with a small radius, which narrows the receptive field and lets the network focus on detailed features. In addition, downsampling can make the point cloud density uneven, so that a network trained on sparse point clouds struggles to recognize fine-grained local structures; multiscale feature fusion encoding is used to address this. Because the encoder uses a small query radius, the relative coordinates within each neighborhood are small, and the network would need to learn large weights to operate on them, which makes optimization harder. TRNet therefore normalizes the relative
coordinates in the feature extraction layer to facilitate network optimization and improve segmentation performance. The network also employs a highly efficient decoder. Previous segmentation methods often use a U-Net structure, in which skip connections aggregate multi-level features by combining the input of each cascaded decoder stage with the output of the corresponding encoder stage. However, this top-down propagation is inefficient for feature aggregation. TRNet's decoder instead directly combines the features output by all cascaded encoder stages, allowing the network to learn the importance of each stage. Differences in the scales or dimensions of the fused features may also introduce unwanted bias during fusion. To counter this and keep the network focused on crucial information within the fused features, a soft attention mechanism is incorporated into the fusion process: after the features are concatenated, a soft attention operation is applied to the combined features, enabling the network to adaptively balance the discrepancies between the different scales or levels of the propagated features.

Result A dataset of dental models was compiled from numerous patients with irregular teeth, including crowding, misalignment, and underdeveloped teeth. To establish the ground-truth labels, an experienced dentist meticulously segmented and annotated these models. The dataset was randomly divided into two subsets, with 146 models for training and 20 models for testing. Data augmentation, namely random translation and scaling, was used to increase the diversity of the training set: in each iteration, the intra-oral scans were shifted by a random value in the range [-0.1, 0.1] and scaled by a random factor in the range [0.8, 1.25], thereby generating new training
data. Results of 5-fold cross-validation show that TRNet achieves an overall accuracy (OA) of 97.015 ± 0.096% and a mean intersection over union (mIoU) of 92.691 ± 0.454%, significantly outperforming existing methods.

Conclusion This paper introduces TRNet, an end-to-end deep learning network for the automatic segmentation of teeth in 3D dental scans acquired with intra-oral scanners. An encoder with fine-grained receptive fields enhances the local feature extraction essential for dental model segmentation. A decoder based on hierarchical connections lets the network decode efficiently by learning the importance of each level, which significantly improves segmentation precision. A soft attention mechanism integrated into the feature fusion process enables the network to focus on key information within dental model features. Experimental results indicate that TRNet performs excellently on intra-oral scanned point cloud models and improves the accuracy of dental point cloud segmentation.
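The encoder's small-radius neighborhood query, the normalization of relative coordinates, and multiscale grouping can be illustrated with a PointNet++-style ball query. The pure-Python sketch below is an illustration under stated assumptions, not TRNet's implementation: the abstract does not say how the relative coordinates are normalized, so dividing them by the query radius (which maps every offset component into [-1, 1]) is an assumption, and a channel-wise max over the normalized offsets stands in for the learned shared MLP and pooling. The names `ball_query`, `normalized_group`, and `multiscale_feature`, and the example radii, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def ball_query(points: Sequence[Point], center: Point, radius: float, k: int) -> List[int]:
    """Indices of up to k points within `radius` of `center`, padded to length k.

    Padding repeats the first hit, as PointNet++-style ball queries commonly do.
    Returns [] if no point lies inside the ball.
    """
    hits = [i for i, p in enumerate(points) if math.dist(p, center) <= radius][:k]
    if not hits:
        return []
    return hits + [hits[0]] * (k - len(hits))


def normalized_group(points: Sequence[Point], center: Point,
                     radius: float, k: int) -> List[Point]:
    """Relative coordinates of the neighborhood divided by the radius (assumed scheme).

    With a small radius the raw offsets are tiny; dividing by the radius maps every
    component into [-1, 1], so later layers do not need very large weights.
    """
    return [
        tuple((points[i][d] - center[d]) / radius for d in range(3))
        for i in ball_query(points, center, radius, k)
    ]


def multiscale_feature(points: Sequence[Point], center: Point,
                       radii: Sequence[float], k: int) -> List[float]:
    """Concatenate one pooled descriptor per query radius (multiscale grouping).

    The channel-wise max over normalized offsets is a stand-in for a learned MLP + max-pool.
    """
    feature: List[float] = []
    for r in radii:
        group = normalized_group(points, center, r, k)
        for d in range(3):
            feature.append(max(p[d] for p in group) if group else 0.0)
    return feature
```

Concatenating descriptors from several radii gives each point both a very local view (useful at tooth-gum boundaries) and a slightly wider one that is less sensitive to the uneven density left by downsampling.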
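The decoder idea of directly combining the outputs of all encoder stages and then applying soft attention to the concatenated features can be sketched as follows. This is a hypothetical illustration, not the paper's decoder. Two assumptions are made: coarse-stage features reach every input point by nearest-neighbor copying (PointNet++ itself interpolates with inverse-distance weights over three neighbors), and the soft attention is modeled as a per-channel sigmoid gate with hypothetical learned parameters `gate_w` and `gate_b`. The function names are also hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Stage = Tuple[Sequence[Point], Sequence[Sequence[float]]]  # (stage xyz, per-point features)


def upsample_nearest(dense_xyz: Sequence[Point], sparse_xyz: Sequence[Point],
                     sparse_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Give every dense point the features of its nearest point in a coarser stage."""
    out = []
    for p in dense_xyz:
        j = min(range(len(sparse_xyz)), key=lambda i: math.dist(p, sparse_xyz[i]))
        out.append(list(sparse_feats[j]))
    return out


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_all_stages(dense_xyz: Sequence[Point], stages: Sequence[Stage],
                    gate_w: Sequence[float], gate_b: Sequence[float]) -> List[List[float]]:
    """Concatenate every encoder stage's features per point, then apply a soft channel gate.

    Gate values lie in (0, 1), so channels are reweighted rather than hard-selected; in
    training, gate_w and gate_b would be learned, letting the network rebalance stages.
    """
    per_stage = [upsample_nearest(dense_xyz, xyz, feats) for xyz, feats in stages]
    fused = []
    for i in range(len(dense_xyz)):
        concat = [v for stage in per_stage for v in stage[i]]  # hierarchical concatenation
        if len(concat) != len(gate_w) or len(concat) != len(gate_b):
            raise ValueError("gate parameters must match the concatenated feature width")
        fused.append([v * sigmoid(w * v + b) for v, w, b in zip(concat, gate_w, gate_b)])
    return fused
```

Unlike a U-Net-style cascade, where each decoder stage only sees the previous decoder output and one skip connection, every stage here contributes to the fused vector at once, and the gate is where scale differences between stages could be rebalanced.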
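The training-time augmentation described in the Result paragraph can be sketched as below. The abstract gives only the two ranges, so several details are assumptions: the scale factor is drawn uniformly from [0.8, 1.25] and applied isotropically about the origin, scaling happens before translation, and an independent shift is drawn for each axis. `augment_scan` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def augment_scan(points: Sequence[Point], rng: random.Random,
                 shift_range: Tuple[float, float] = (-0.1, 0.1),
                 scale_range: Tuple[float, float] = (0.8, 1.25),
                 ) -> Tuple[List[Point], float, List[float]]:
    """Randomly scale and translate one scan; returns (new points, scale, shift)."""
    scale = rng.uniform(*scale_range)                       # assumed: one isotropic factor
    shift = [rng.uniform(*shift_range) for _ in range(3)]   # assumed: independent per axis
    moved = [tuple(p[d] * scale + shift[d] for d in range(3)) for p in points]
    return moved, scale, shift
```

Calling `augment_scan(scan, rng)` once per training iteration yields a new variant of each scan. Because the point order is preserved, the per-point tooth labels remain aligned with the augmented coordinates without any extra bookkeeping.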
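For reference, OA and mIoU as commonly defined for point-wise segmentation can be computed as follows, together with a mean ± standard deviation summary across cross-validation folds. The abstract does not state which conventions were used, so two choices here are assumptions: classes absent from both the prediction and the ground truth are skipped when averaging IoU, and the spread across folds is the sample standard deviation (`statistics.stdev`). The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def overall_accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Fraction of points whose predicted label equals the ground-truth label."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Mean over classes of intersection / union, skipping classes absent from both."""
    ious: List[float] = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class occurs in pred or gt")
    return sum(ious) / len(ious)


def mean_std(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one metric over cross-validation folds."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)
```

mIoU is the stricter metric for this task: the gingiva usually covers many points, so OA can stay high even when a small tooth is poorly segmented, whereas each tooth class weighs equally in mIoU.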