Urban functional zone classification method combining object units and Transformer networks
Urban Functional Zones (UFZs) are areas within a city that have distinct functions and land uses. They are designated according to their primary activities and the roles they play in the urban environment. Accurate extraction of UFZs and a comprehensive understanding of their spatial distribution are important for urban planning and management. Traditional Convolutional Neural Networks (CNNs) capture local features through convolutions but often miss broader spatial relationships. The Vision Transformer (ViT), although more advanced, still has limitations: its tokenization method and learnable position encoding do not effectively represent geographical entities and their spatial relationships, which are crucial in geospatial analysis. To address this issue, this study proposes a UFZ classification method that combines object units with ViT. First, the method uses over-segmented objects generated by a multi-scale segmentation approach as analysis units, so that a single object does not contain multiple kinds of UFZs. Over-segmentation produces smaller, more homogeneous units and thereby increases classification precision. Second, because current methods often analyze the inherent features of objects while ignoring their spatial relationships, ViT is employed to model the spatial relationships between objects, with the geographic attributes of the objects serving as position embeddings. In this way, both the inherent features of each analysis unit and the spatial relationships among objects are considered in UFZ classification. Position embeddings based on geographic coordinates allow the model to capture spatial proximity and relationships, which are crucial for accurate classification. We chose Beijing as the study area and downloaded imagery of the area within the Sixth Ring Road from Bing Maps. We also collected labels from OpenStreetMap and reclassified them into 10 typical urban functional zones according to the "Code for classification of urban land use and planning standards of development land (GB 50137-2011)". This dataset provides a comprehensive and diverse set of examples representative of different urban functions. The experimental results show the following. First, compared with existing methods, over-segmented objects improve boundary accuracy: they avoid the jagged boundaries produced by grid units and the presence of multiple UFZs within a single unit that occurs with road-block units. The improved boundary accuracy means that functional zones are delineated more precisely, reflecting the true urban layout and reducing classification errors. Second, the accuracy of UFZ classification increases by 13.9% compared with the method that uses objects as analysis units but ignores their spatial relationships. This improvement highlights the importance of considering spatial relationships in UFZ classification. Third, the traditional position encoding method achieves accuracy similar to that of the method without position encoding, indicating that traditional position encoding does not effectively provide positional information. The kappa coefficient of the proposed method, which uses geographic coordinates for encoding, is on average 0.042 higher than that of the traditional Transformer position encoding method. This demonstrates that introducing geographic coordinates effectively provides spatial relationship information and leads to better classification results. The kappa coefficient measures classification accuracy adjusted for chance agreement, so an improvement in this metric underscores the robustness of the proposed method.
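As an illustration of the chance-adjusted metric mentioned above, the following is a minimal sketch of Cohen's kappa, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the class frequencies. The function name `cohens_kappa` and the sample labels are hypothetical and chosen only for illustration; they are not taken from the study's code or data.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length label sequences (illustrative sketch)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of units whose predicted label matches the reference.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: sum over classes of (reference share * predicted share).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)

    if p_e == 1.0:
        # Both sequences contain a single identical class; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for six analysis units (not data from the study).
y_true = ["residential", "residential", "commercial", "commercial", "park", "park"]
y_pred = ["residential", "commercial", "commercial", "commercial", "park", "residential"]

kappa = cohens_kappa(y_true, y_pred)
print(round(kappa, 3))  # 0.5 (p_o = 4/6, p_e = 12/36)
```

In this toy example the overall accuracy is about 0.667, whereas kappa is 0.5, because one third of the agreement would be expected by chance alone. This is why a gain in kappa is a stricter indication of improvement than a gain in raw accuracy.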