Journal information
Machine Intelligence Research (English)

Tieniu Tan, Guoping Liu, Huosheng Hu

Bimonthly

2731-538X

ijac@ia.ac.cn

010-62655893

100190

P.O. Box 2728, 95 Zhongguancun East Road, Haidian District, Beijing

Machine Intelligence Research (English) / Journal: Machine Intelligence Research. Indexed in CSCD, CSTPCD, PKU Core (北大核心), EI
International Journal of Automation and Computing is a publication of the Institute of Automation, the Chinese Academy of Sciences and the Chinese Automation and Computing Society in the United Kingdom. The journal publishes papers on original theoretical and experimental research and development in automation and computing. The scope of the journal is extensive. Topics include artificial intelligence, automatic control, bioinformatics, computer science, information technology, modeling and simulation, networks and communications, optimization and decision, pattern recognition, robotics, signal processing, and systems engineering.
Formally published

    Autonomy Evaluation of Unmanned Systems Based on Task Models

    Yi Zou, Zehao Ni, Xun Lei, Chi Zhang...
    pp. 815-830
    Abstract: In this study, relevant work on autonomy evaluation (AE) in recent years was comprehensively reviewed and classified from the perspective of task models, and a theoretical framework for AE based on closed-loop task models was developed. The main contributions of this study are as follows. 1) A taxonomy for AE based on task models was introduced to classify current theories, methods and standards. 2) The limitations of the current autonomous evaluation methods were addressed to provide a theoretical framework for quantitative evaluation based on task models, and evaluation metrics for each stage were proposed based on the AE theoretical framework. 3) Qualitative analyses of the superiority of the proposed AE framework based on the closed-loop task models were conducted. This study attempts to provide a reference for researchers and engineers in the autonomous unmanned systems field and inspire future development of AE.

    A Survey of Synthetic Data Augmentation Methods in Machine Vision

    Alhassan Mumuni, Fuseini Mumuni, Nana Kobina Gerrar
    pp. 831-869
    Abstract: The standard approach to tackling computer vision problems is to train deep convolutional neural network (CNN) models using large-scale image datasets that are representative of the target task. However, in many scenarios, it is often challenging to obtain sufficient image data for the target task. Data augmentation is a way to mitigate this challenge. A common practice is to explicitly transform existing images in desired ways to create the required volume and variability of training data necessary to achieve good generalization performance. In situations where data for the target domain are not accessible, a viable workaround is to synthesize training data from scratch, i.e., synthetic data augmentation. This paper presents an extensive review of synthetic data augmentation techniques. It covers data synthesis approaches based on realistic 3D graphics modelling, neural style transfer (NST), differential neural rendering, and generative modelling using generative adversarial networks (GANs) and variational autoencoders (VAEs). For each of these classes of methods, we focus on the important data generation and augmentation techniques, general scope of application and specific use-cases, as well as existing limitations and possible workarounds. Additionally, we provide a summary of common synthetic datasets for training computer vision models, highlighting the main features, application domains and supported tasks. Finally, we discuss the effectiveness of synthetic data augmentation methods. Since this is the first paper to explore synthetic data augmentation methods in great detail, we are hoping to equip readers with the necessary background information and in-depth knowledge of existing methods and their attendant issues.

    Unified Classification and Rejection: A One-versus-all Framework

    Zhen Cheng, Xu-Yao Zhang, Cheng-Lin Liu
    pp. 870-887
    Abstract: Classifying patterns of known classes and rejecting ambiguous and novel (also called out-of-distribution (OOD)) inputs are both involved in open world pattern recognition. Deep neural network models usually excel in closed-set classification but perform poorly in rejecting OOD inputs. To tackle this problem, numerous methods have been designed to perform open set recognition (OSR) or OOD rejection/detection tasks. Previous methods mostly take post-training score transformation or hybrid models to ensure low scores on OOD inputs while separating known classes. In this paper, we attempt to build a unified framework for building open set classifiers for both classification and OOD rejection. We formulate the open set recognition of K known classes as a (K+1)-class classification problem with the model trained on known-class samples only. By decomposing the K-class problem into K one-versus-all (OVA) binary classification tasks and binding some parameters, we show that combining the scores of OVA classifiers can give (K+1)-class posterior probabilities, which enables classification and OOD rejection in a unified framework. To maintain the closed-set classification accuracy of the OVA trained classifier, we propose a hybrid training strategy combining OVA loss and multi-class cross-entropy loss. We implement the OVA framework and hybrid training strategy on the recently proposed convolutional prototype network and prototype classifier on a vision transformer (ViT) backbone. Experiments on popular OSR and OOD detection datasets demonstrate that the proposed framework, using a single multi-class classifier, yields competitive performance in closed-set classification, OOD detection, and misclassification detection. The code is available at https://github.com/zhen-cheng121/CPN_OVA_unified.
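The idea of turning K one-versus-all scores into (K+1)-class posteriors can be illustrated with a small sketch. This is not the authors' implementation: it assumes each OVA classifier outputs a logit passed through a sigmoid, and it assumes one natural combination rule (class k wins when classifier k fires and all others do not; the extra "unknown" class wins when none fires). The function names `ova_posteriors` and `classify_or_reject` are hypothetical.

```python
import math


def ova_posteriors(logits):
    """Combine K one-versus-all logits into K+1 posteriors (last entry = unknown).

    Assumed rule, for illustration only: score(k) = p_k * prod_{j != k} (1 - p_j),
    score(unknown) = prod_j (1 - p_j), followed by normalisation.
    """
    # Per-class OVA probabilities via the logistic sigmoid.
    p = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    scores = []
    for k in range(len(p)):
        s = p[k]
        for j, pj in enumerate(p):
            if j != k:
                s *= 1.0 - pj
        scores.append(s)
    # Score of the extra class: no known-class classifier fires.
    unknown = 1.0
    for pj in p:
        unknown *= 1.0 - pj
    scores.append(unknown)
    total = sum(scores)
    return [s / total for s in scores]


def classify_or_reject(logits):
    """Return the arg-max known class index, or "reject" if the unknown class wins."""
    post = ova_posteriors(logits)
    best = max(range(len(post)), key=post.__getitem__)
    return ("reject" if best == len(logits) else best), post
```

Under this assumed rule the result coincides with a softmax over the K logits plus one extra logit fixed at zero, so an input on which every OVA classifier gives a negative logit is rejected.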

    MOSS: An Open Conversational Large Language Model

    Tianxiang Sun, Xiaotian Zhang, Zhengfu He, Peng Li...
    pp. 888-905
    Abstract: Conversational large language models (LLMs) such as ChatGPT and GPT-4 have recently exhibited remarkable capabilities across various domains, capturing widespread attention from the public. To facilitate this line of research, in this paper, we report the development of MOSS, an open-sourced conversational LLM that contains 16B parameters and can perform a variety of instructions in multi-turn interactions with humans. The base model of MOSS is pre-trained on large-scale unlabeled English, Chinese, and code data. To optimize the model for dialogue, we generate 1.1M synthetic conversations based on user prompts collected through our earlier versions of the model API. We then perform preference-aware training on preference data annotated from AI feedback. Evaluation results on real-world use cases and academic benchmarks demonstrate the effectiveness of the proposed approaches. In addition, we present an effective practice to augment MOSS with several external tools. Through the development of MOSS, we have established a complete technical roadmap for large language models from pre-training, supervised fine-tuning to alignment, verifying the feasibility of ChatGPT under resource-limited conditions and providing a reference for both the academic and industrial communities. Model weights and code are publicly available at https://github.com/OpenMOSS/MOSS.

    Tuning Synaptic Connections Instead of Weights by Genetic Algorithm in Spiking Policy Network

    Duzhen Zhang, Tielin Zhang, Shuncheng Jia, Qingyu Wang...
    pp. 906-918
    Abstract: Learning from interaction is the primary way that biological agents acquire knowledge about their environment and themselves. Modern deep reinforcement learning (DRL) explores a computational approach to learning from interaction and has made significant progress in solving various tasks. However, despite its power, DRL still falls short of biological agents in terms of energy efficiency. Although the underlying mechanisms are not fully understood, we believe that the integration of spiking communication between neurons and biologically-plausible synaptic plasticity plays a prominent role in achieving greater energy efficiency. Following this biological intuition, we optimized a spiking policy network (SPN) using a genetic algorithm as an energy-efficient alternative to DRL. Our SPN mimics the sensorimotor neuron pathway of insects and communicates through event-based spikes. Inspired by biological research showing that the brain forms memories by creating new synaptic connections and rewiring these connections based on new experiences, we tuned the synaptic connections instead of weights in the SPN to solve given tasks. Experimental results on several robotic control tasks demonstrate that our method can achieve the same level of performance as mainstream DRL methods while exhibiting significantly higher energy efficiency.
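Tuning connections rather than weights can be sketched as a genetic algorithm over binary connection masks while the weights stay fixed. The example below is a toy illustration, not the paper's spiking policy network: it uses a single layer of threshold units in place of spiking neurons, a made-up bit-matching fitness instead of a robotic control reward, and hypothetical names (`forward`, `fitness`, `evolve`, `WEIGHTS`, `DATA`).

```python
import random


def forward(mask, weights, x):
    """Threshold units: unit i emits 1 if its masked, weighted input is positive."""
    return [
        1 if sum(m * w * xi for m, w, xi in zip(mask_row, w_row, x)) > 0 else 0
        for mask_row, w_row in zip(mask, weights)
    ]


def fitness(mask, weights, data):
    """Count output bits that match the targets over a toy dataset."""
    return sum(
        out == tgt
        for x, target in data
        for out, tgt in zip(forward(mask, weights, x), target)
    )


def evolve(weights, data, pop_size=20, generations=30, mut_rate=0.1, seed=0):
    """Evolve a binary connection mask; the weights themselves are never changed."""
    rng = random.Random(seed)
    n_out, n_in = len(weights), len(weights[0])

    def random_mask():
        return [[rng.randint(0, 1) for _ in range(n_in)] for _ in range(n_out)]

    def crossover(a, b):
        # Uniform crossover: each connection bit comes from one of the two parents.
        return [[rng.choice(pair) for pair in zip(ra, rb)] for ra, rb in zip(a, b)]

    def mutate(mask):
        # Flip each connection bit with probability mut_rate.
        return [[1 - m if rng.random() < mut_rate else m for m in row] for row in mask]

    def score(mask):
        return fitness(mask, weights, data)

    population = [random_mask() for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        elite = population[: max(2, pop_size // 4)]  # elites survive unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=score), history


# Toy problem: fixed weights, four input patterns with two target bits each.
WEIGHTS = [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.2]]
DATA = [([1, 0, 0], [1, 0]), ([0, 1, 0], [0, 1]), ([0, 0, 1], [0, 0]), ([1, 1, 1], [1, 1])]
best_mask, best_history = evolve(WEIGHTS, DATA)
```

Because the elite masks are carried over unchanged, the best fitness recorded in `best_history` never decreases from one generation to the next.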

    A New Generation of Rules-based Approach: Mivar-based Intelligent Planning of Robot Actions (MIPRA) and Brains for Autonomous Robots

    Oleg Varlamov, Dmitry Aladin
    pp. 919-940
    Abstract: To create autonomous robots, both hardware and software are needed. If enormous progress has already been made in the field of equipment, then robot software depends on the development of artificial intelligence. This article proposes a solution for creating "logical" brains for autonomous robots, namely, an approach for creating an intelligent robot action planner based on Mivar expert systems. The application of this approach provides opportunities to reduce the computational complexity of solving planning problems and the requirements for the computational characteristics of hardware platforms on which intelligent planning systems are deployed. To theoretically and practically justify the expediency of using logically solving systems, in particular Mivar expert systems, to create intelligent planners, the MIPRA (Mivar-based Intelligent Planning of Robot Actions) planner was created to solve problems such as STRIPS for permutation cubes in the Blocks World domain. The planner is based on the platform for creating expert systems of the Razumator. As a result, the Mivar planner can process information about the state of the subject area based on the analysis of cause-effect relationships and an algorithm for automatically constructing logical inference (finding a solution from "Given" to "Find"). Moreover, an important feature of the MIPRA is that the system is built on the principles of a "white box", due to which the system can explain any of its decisions and provide justification for the actions performed in the form of a retrospective of the stages of the decision-making process. When preparing a set of robot actions aimed at changing control objects, expert knowledge is used, which is the basis for the functioning algorithms of the planner. This approach makes it possible to include an expert in the process of organizing the work of the intelligent planner and use existing knowledge about the subject area. Practical experiments of this study have shown that instead of many hours and powerful multiprocessor servers, the MIPRA on a personal computer solves the planning problems with the following number of cubes: 10 cubes can be rearranged in 0.028 seconds, 100 cubes in 0.938 seconds, and 1 000 cubes in 84.188 seconds. The results of this study can be used to reduce the computational complexity of solving tasks of planning the actions of robots, as well as their groups, multilevel heterogeneous robotic systems, and cyber-physical systems of various bases and purposes. Practical demonstration of MIPRA: https://mivar.org/en/about/contacts/

    One-shot Face Reenactment with Dense Correspondence Estimation

    Yunfan Liu, Qi Li, Zhenan Sun
    pp. 941-953
    Abstract: One-shot face reenactment is a challenging task due to the identity mismatch between source and driving faces. Most existing methods fail to completely eliminate the interference of driving subjects' identity information, which may lead to face shape distortion and undermine the realism of reenactment results. To solve this problem, in this paper, we propose using a 3D morphable model (3DMM) for explicit facial semantic decomposition and identity disentanglement. Instead of using 3D coefficients alone for reenactment control, we take advantage of the generative ability of 3DMM to render textured face proxies. These proxies contain abundant yet compact geometric and semantic information of human faces, which enables us to compute the face motion field between source and driving images by estimating the dense correspondence. In this way, we can approximate reenactment results by warping source images according to the motion field, and a generative adversarial network (GAN) is adopted to further improve the visual quality of warping results. Extensive experiments on various datasets demonstrate the advantages of the proposed method over existing state-of-the-art benchmarks in both identity preservation and reenactment fulfillment.

    Task-specific Part Discovery for Fine-grained Few-shot Classification

    Yongxian Wei, Xiu-Shen Wei
    pp. 954-965
    Abstract: Localizing discriminative object parts (e.g., bird head) is crucial for fine-grained classification tasks, especially for the more challenging fine-grained few-shot scenario. Previous work always relies on the learned object parts in a unified manner, where they attend to the same object parts (even with common attention weights) for different few-shot episodic tasks. In this paper, we propose that it should adaptively capture the task-specific object parts that require attention for each few-shot task, since the parts that can distinguish different tasks are naturally different. Specifically, for a few-shot task, after obtaining part-level deep features, we learn a task-specific part-based dictionary for both aligning and reweighting part features in an episode. Then, part-level categorical prototypes are generated based on the part features of support data, which are later employed by calculating distances to classify query data for evaluation. To retain the discriminative ability of the part-level representations (i.e., part features and part prototypes), we design an optimal transport solution that also utilizes query data in a transductive way to optimize the aforementioned distance calculation for the final predictions. Extensive experiments on five fine-grained benchmarks show the superiority of our method, especially for the 1-shot setting, gaining 0.12%, 8.56% and 5.87% improvements over state-of-the-art methods on CUB, Stanford Dogs, and Stanford Cars, respectively.
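The abstract does not spell out its optimal transport formulation. As a generic illustration of how query samples can be softly assigned to class prototypes from a distance matrix, the sketch below uses entropy-regularised optimal transport solved by Sinkhorn iterations. Everything here is an assumption made for illustration: the uniform marginals, the regularisation strength `eps`, the toy cost matrix, and the function name `sinkhorn`.

```python
import math


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularised optimal transport with uniform marginals.

    cost[i][j] is the distance between query i and prototype j.
    Returns a transport plan whose rows sum to 1/n and columns to 1/m.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    row_mass = [1.0 / n] * n   # each query carries equal mass
    col_mass = [1.0 / m] * m   # each prototype receives equal mass
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy distances between 4 query samples and 2 class prototypes.
COST = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.7], [0.9, 0.1]]
plan = sinkhorn(COST)
# Predicted class of each query: the prototype receiving most of its mass.
predictions = [max(range(len(row)), key=row.__getitem__) for row in plan]
```

The balanced column marginal is what makes the assignment transductive in this sketch: the plan is computed jointly over all queries in the episode rather than one query at a time.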

    Self-attention Guidance Based Crowd Localization and Counting

    Zhouzhou Ma, Guanghua Gu, Wenrui Zhao
    pp. 966-982
    Abstract: Most existing studies on crowd analysis are limited to the level of counting, which cannot provide the exact location of individuals. This paper proposes a self-attention guidance based crowd localization and counting network (SA-CLCN), which can simultaneously locate and count crowds. We take the form of object detection, using the original point annotations of crowd datasets as supervision to train the network. Ultimately, the center point coordinate of each head as well as the number of crowds are predicted. Specifically, to cope with the spatial and positional variations of the crowd, the proposed method introduces a transformer to construct a global-local feature extractor (GLFE) together with the convolutional structure. It establishes the near-to-far dependency between elements so that the global context and local detail features of the crowd image can be extracted simultaneously. Then, this paper designs a pyramid feature fusion module (PFFM) to fuse the global and local information from high level to low level to obtain a multiscale feature representation. In downstream tasks, this paper predicts candidate point offsets and confidence scores by a simple regression header and classification header. In addition, the Hungarian algorithm is used to match the predicted point set and the labelled point set to facilitate the calculation of losses. The proposed network avoids the errors or higher costs associated with using traditional density maps or bounding box annotations. Importantly, we have conducted extensive experiments on several crowd datasets, and the proposed method has produced competitive results in both counting and localization.
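The matching step between predicted and labelled points can be sketched as a minimum-cost one-to-one assignment. The example below is a simplified stand-in, not the paper's procedure: it uses plain Euclidean distance as the cost (the paper's cost may also involve confidence scores) and finds the optimum by brute force over permutations instead of running the Hungarian algorithm. For realistic sizes, `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently. The function name `match_points` is hypothetical.

```python
import itertools
import math


def match_points(pred, gt):
    """Optimal one-to-one matching of ground-truth points to predicted points.

    Brute-force stand-in for the Hungarian algorithm: it returns the same
    minimum-cost assignment but is only practical for a handful of points.
    Returns (pairs, total_cost) with pairs given as (pred_index, gt_index).
    """
    if len(pred) < len(gt):
        raise ValueError("need at least as many predictions as labelled points")
    best_cost, best_perm = math.inf, ()
    # Each permutation picks one distinct prediction for every labelled point.
    for perm in itertools.permutations(range(len(pred)), len(gt)):
        cost = sum(math.dist(pred[i], gt[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(i, j) for j, i in enumerate(best_perm)], best_cost
```

Predictions left unmatched (here, any candidate point not paired with a label) would typically be treated as background when computing a classification loss; that handling is outside this sketch.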

    AHLNet: Adaptive Multihead Structure and Lightweight Feature Pyramid Network for Detection of Live Working in Substations

    Mengle Peng, Xiaoyong Jiang, Langyue Huang, Zhongyi Li...
    pp. 983-992
    Abstract: With the increasing demand for power in society, there is much live equipment in substations, and the safety and standardization of live working of workers are facing challenges. Aiming at these problems of scene complexity and object diversity in the real-time detection of the live working safety of substation workers, an adaptive multihead structure and lightweight feature pyramid-based network (AHLNet) is proposed in this study, which is based on YOLOV3. First, we take AH-Darknet53 as the backbone network of YOLOV3, which can introduce an adaptive multihead (AMH) structure, reduce the number of network parameters, and improve the feature extraction ability of the backbone network. Second, to reduce the number of convolution layers of the deeper feature map, a lightweight feature pyramid network (LFPN) is proposed, which can perform feature fusion in advance to alleviate the problem of feature imbalance and gradient disappearance. Finally, the proposed AHLNet is evaluated on the datasets of 16 categories of substation safety operation scenarios, and the average prediction accuracy MAP50 reaches 82.10%. Compared with YOLOV3, MAP50 is increased by 2.43%, and the number of parameters is 90 M, which is only 38% of the number of parameters of YOLOV3. In addition, the detection speed is basically the same as that of YOLOV3, which can meet the real-time and accurate detection requirements for the safe operation of substation staff.