Tree Leaf Recognition Method Based on the CrossViT Model

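Abstract (translated from Chinese): China has a vast territory, and because natural conditions vary from region to region and plant species are numerous, its forest plants and forest types are extremely rich and diverse; accurate leaf identification is therefore of great significance to tree research. Leaf classification is a challenging task that requires recognizing and classifying multiple leaf features such as morphology, texture, and color. This paper proposes a leaf classification and recognition method based on the Cross Vision Transformer (CrossViT). The leaves of 10 broad-leaved tree species common in landscape greening serve as the experimental subjects: Fatsia japonica, Rhododendron simsii, Magnolia grandiflora, Cinnamomum cassia, Pittosporum tobira, Hibiscus syriacus, Photinia serratifolia, Firmiana simplex, Ginkgo biloba, and Camphora officinarum. First, leaf images were photographed in an experimental environment and in a real environment to form two datasets. Second, within the CrossViT network structure, two independent branches were constructed to obtain embedding vectors of different sizes; the Transformer encoder was optimized, and a cross-attention module was used to fuse the embedding vectors of different sizes so as to balance computational cost against recognition accuracy. Finally, an MLP Head produced the final classification result. Training and testing on the two datasets showed that the CrossViT model achieved an overall accuracy of about 92.5% on the experimental-environment dataset and about 75.2% on the real-environment dataset. Compared with traditional convolutional networks, the proposed method scored 0.6 to 4.0 percentage points higher on the experimental-environment dataset and 1.3 to 3.3 percentage points higher on the real-environment dataset, with a slight increase in FLOPs and model parameters.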
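The fusion step described in the abstract, where a cross-attention module combines embeddings from two branches, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single attention head, identity query/key/value projections, and two branches that already share one embedding dimension. The function name `cross_attention_cls` and the toy vectors are hypothetical. The idea shown is that the class (CLS) token of one branch acts as the only query and attends over the patch tokens of the other branch.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cross_attention_cls(cls_token, other_patches):
    """Fuse one branch's CLS token with the other branch's patch tokens.

    Simplifications (assumptions, not the paper's exact setup):
    single head, identity query/key/value projections, and both
    branches already sharing the same embedding dimension.
    """
    dim = len(cls_token)
    # The CLS token is the only query; keys/values are CLS + other-branch patches.
    tokens = [cls_token] + other_patches
    scores = [dot(cls_token, t) / math.sqrt(dim) for t in tokens]
    weights = softmax(scores)
    # Attention output: weighted sum of the value vectors.
    attended = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    # Residual connection: add the attended summary back onto the CLS token.
    fused = [c + a for c, a in zip(cls_token, attended)]
    return fused, weights

# Toy example: the CLS token of one branch attends to three patch tokens
# from the other branch (all values are made up for illustration).
large_cls = [1.0, 0.0, 0.5, -0.5]
small_patches = [
    [0.9, 0.1, 0.4, -0.4],
    [-1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
]
fused_cls, attn = cross_attention_cls(large_cls, small_patches)
```

Because this sketch uses a single query vector, the number of attention scores grows linearly with the number of tokens rather than quadratically, which is one way such a fusion step can trade accuracy against computational cost.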

English title: Method of tree leaf recognition based on CrossViT

English abstract: China, characterized by its expansive territory and diverse ecological conditions, hosts a rich tapestry of forest flora, showcasing extensive botanical diversity. Accurate leaf recognition is a pivotal component in botanical research, requiring meticulous identification and classification of intricate leaf attributes such as shape, texture, and color. This study introduced an innovative leaf classification and recognition methodology based on the Cross Vision Transformer (CrossViT). The research focused on ten distinct types of leaves: Fatsia japonica, Rhododendron simsii, Magnolia grandiflora, Cinnamomum cassia, Pittosporum tobira, Hibiscus syriacus, Photinia serratifolia, Firmiana simplex, Ginkgo biloba, and Camphora officinarum. Comprehensive datasets were curated by capturing leaf images under controlled experimental conditions and in diverse real-world environments. This meticulous approach ensured the robustness of the dataset used for training and validation of the CrossViT model. Central to the methodology is the enhancement of the CrossViT model's architecture. Dual independent branches were incorporated to generate embedding vectors of varying dimensions, effectively capturing a wide range of leaf image features. The Transformer encoder was further optimized through the integration of a cross-attention mechanism, facilitating the seamless fusion of embedding vectors across different scales. This strategic refinement aimed to strike a balance between computational efficiency and classification accuracy, enhancing the model's performance in high-precision leaf categorization tasks. The classification process utilized a Multilayer Perceptron (MLP) Head, which successfully yielded robust results. Evaluation across distinct environmental settings revealed significant achievements, with an overall accuracy of approximately 92.5% in the controlled experimental dataset and 75.2% in the real-world dataset. The comparative analysis with traditional convolutional neural networks (CNNs) highlighted notable performance advantages of the CrossViT-based approach. In the controlled experimental environment, performance improvements ranged from 0.6 to 4.0 percentage points, while in the real-world scenario, improvements ranged from 1.3 to 3.3 percentage points. Despite a modest increase in floating-point operations (FLOPs) and model parameters, the CrossViT model demonstrated substantial gains in accuracy, underscoring its efficacy in leaf classification and recognition tasks. In conclusion, the proposed CrossViT-based methodology represents an efficient and effective approach to advance tree research and ecological conservation. By leveraging advanced deep learning techniques, this study contributes significantly to the disciplines of botany and environmental science, addressing critical challenges in biodiversity monitoring and sustainable natural resource management. The findings hold promise for enhancing our understanding and preservation of global forest ecosystems, emphasizing the importance of technological innovation in fostering environmental stewardship and conservation efforts worldwide.
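The reported margins are in percentage points, so they can be subtracted directly from the CrossViT accuracies to see roughly where the CNN baselines fall. The short calculation below is a derived illustration only. The abstract does not list individual baseline scores, the accuracies are stated as approximate, and the helper name `implied_baseline_range` is hypothetical.

```python
# Reported overall accuracies (%) of the CrossViT-based method, from the abstract.
crossvit_acc = {"experimental": 92.5, "real-world": 75.2}
# Reported margins over the CNN baselines, in percentage points (min, max).
margin_pp = {"experimental": (0.6, 4.0), "real-world": (1.3, 3.3)}

def implied_baseline_range(acc, margins):
    """Return (lowest, highest) baseline accuracy implied by acc and the margins."""
    smallest, largest = margins
    # The largest margin corresponds to the weakest baseline, and vice versa.
    return round(acc - largest, 1), round(acc - smallest, 1)

baselines = {env: implied_baseline_range(crossvit_acc[env], margin_pp[env])
             for env in crossvit_acc}
```

Under these figures the compared networks would sit at about 88.5% to 91.9% on the experimental dataset and about 71.9% to 73.9% on the real-world dataset.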

English keywords:

tree species identification; cross vision transformer (CrossViT model); self-attention; visualization; plant phenotype analysis

Authors:

XU Bingbo (许兵博), ZHANG Huaiqing (张怀清), XUE Lianfeng (薛联凤), YUN Ting (云挺)

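Author affiliations:

College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037

Chinese Academy of Forestry, Beijing 100091

College of Forestry and Grassland, Nanjing Forestry University, Nanjing 210037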

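Keywords (translated from Chinese):

tree species identification; CrossViT model; self-attention mechanism; visualization; tree phenotype analysis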

Publication year:

2024

DOI:

10.13360/j.issn.2096-1359.202308025

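Journal of Forestry Engineering (林业工程学报)

Nanjing Forestry University (南京林业大学)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)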

Impact factor: 0.742

ISSN: 2096-1359

Year, volume (issue): 2024, 9(6)