Journal Information
Machine Intelligence Research

Editors-in-Chief: Tieniu Tan, Guoping Liu, Huosheng Hu

Frequency: Bimonthly

ISSN: 2731-538X

Email: ijac@ia.ac.cn

Tel: 010-62655893

Postcode: 100190

Address: P.O. Box 2728, 95 Zhongguancun East Road, Haidian District, Beijing

Machine Intelligence Research. Indexed in: CSCD, CSTPCD, Peking University Core Journal List, EI
International Journal of Automation and Computing is a publication of the Institute of Automation, the Chinese Academy of Sciences, and the Chinese Automation and Computing Society in the United Kingdom. The journal publishes papers on original theoretical and experimental research and development in automation and computing. The scope of the journal is extensive. Topics include: artificial intelligence, automatic control, bioinformatics, computer science, information technology, modeling and simulation, networks and communications, optimization and decision, pattern recognition, robotics, signal processing, and systems engineering.

    Editorial for Special Issue on Artificial Intelligence for Art

    Luntian Mou, Feng Gao, Zijin Li, Jiaying Liu, et al.
    pp. 1-3

    Cogeneration of Innovative Audio-visual Content: A New Challenge for Computing Art

    Mengting Liu, Ying Zhou, Yuwei Wu, Feng Gao, et al.
    pp. 4-28
    Abstract: In recent years, computing art has developed rapidly with the in-depth cross study of artificial intelligence generated content (AIGC) and the main features of artworks. Audio-visual content generation has gradually been applied to various practical tasks, including video or game scoring, assisting artists in creation, art education and other aspects, which demonstrates a broad application prospect. In this paper, we introduce innovative achievements in audio-visual content generation from the perspective of visual art generation and auditory art generation based on artificial intelligence (AI). We outline the development tendency of image and music datasets, visual and auditory content modelling, and related automatic generation systems. The objective and subjective evaluation of generated samples plays an important role in the measurement of algorithm performance. We provide a cogeneration mechanism for audio-visual content in multimodal tasks from image to music and present the construction of specific stylized datasets. There are still many new opportunities and challenges in the field of audio-visual synesthesia generation, and we provide a comprehensive discussion of them.

    Exploring Variational Auto-encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI

    Nick Bryan-Kinns, Bingyuan Zhang, Songyan Zhao, Berker Banar, et al.
    pp. 29-45
    Abstract: Generative AI models for music and the arts in general are increasingly complex and hard to understand. The field of explainable AI (XAI) seeks to make complex and opaque AI models such as neural networks more understandable to people. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on them. This paper contributes a systematic examination of the impact that different combinations of variational auto-encoder models (measureVAE and adversarialVAE), configurations of latent space in the AI model (from 4 to 256 latent dimensions), and training datasets (Irish folk, Turkish folk, classical, and pop) have on music generation performance when 2 or 4 meaningful musical attributes are imposed on the generative model. To date, there have been no systematic comparisons of such models at this level of combinatorial detail. Our findings show that measureVAE has better reconstruction performance than adversarialVAE, which in turn has better musical attribute independence. Results demonstrate that measureVAE was able to generate music across genres with interpretable musical dimensions of control, and performs best with low-complexity music such as pop and rock. We recommend a 32- or 64-dimensional latent space as optimal for 4 regularised dimensions when using measureVAE to generate music across genres. Our results are the first detailed comparisons of configurations of state-of-the-art generative AI models for music and can be used to help select and configure AI models, musical features, and datasets for more understandable generation of music.
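    The idea of imposing a semantically meaningful musical attribute on a latent dimension can be sketched with a pairwise ordering regularizer in the style of attribute-regularised VAEs. The loss form, the `delta` sharpness constant, and all names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def attribute_regularization_loss(z_dim, attr, delta=10.0):
    """Encourage one latent dimension to order a batch the same way a
    musical attribute (e.g. note density) orders it.
    z_dim: (N,) values of the regularised latent dimension.
    attr:  (N,) attribute values for the same batch items."""
    dz = z_dim[:, None] - z_dim[None, :]   # pairwise latent differences
    da = attr[:, None] - attr[None, :]     # pairwise attribute differences
    # Penalise pairs whose latent ordering disagrees with the attribute ordering.
    return np.mean(np.abs(np.tanh(delta * dz) - np.sign(da)))

rng = np.random.default_rng(0)
attr = rng.uniform(0, 1, 32)
aligned = attribute_regularization_loss(attr, attr)              # latent follows attribute
shuffled = attribute_regularization_loss(rng.permutation(attr), attr)
print(aligned < shuffled)   # the aligned ordering yields the lower loss
```

    Added to the reconstruction and KL terms during training, a penalty of this shape pushes the chosen dimension toward monotone control of the attribute.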

    Deep Video Harmonization by Improving Spatial-temporal Consistency

    Xiuwen Chen, Li Fang, Long Ye, Qin Zhang, et al.
    pp. 46-54
    Abstract: Video harmonization is an important step in video editing that achieves visual consistency by adjusting foreground appearances in both the spatial and temporal dimensions. Previous methods either harmonize on a single scale or ignore the inaccuracy of flow estimation, which leads to limited harmonization performance. In this work, we propose a novel architecture for video harmonization that makes full use of spatiotemporal features and yields temporally consistent harmonized results. We introduce multiscale harmonization by using nonlocal similarity on each scale to make the foreground more consistent with the background. We also propose a foreground temporal aggregator that dynamically aggregates neighboring frames at the feature level to alleviate the effect of inaccurately estimated flow and ensure temporal consistency. The experimental results demonstrate the superiority of our method over other state-of-the-art methods in both quantitative and visual comparisons.
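    The dynamic aggregation idea can be sketched as a similarity-weighted fusion of flow-warped neighbour features, so that badly aligned frames (from mis-estimated flow) contribute less. The cosine-similarity weighting and all shapes below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def temporal_aggregate(curr, neighbors):
    """Fuse the current frame's feature map with warped neighbour feature
    maps, weighting each by its similarity to the current frame.
    curr: (C, H, W); neighbors: list of (C, H, W) arrays."""
    feats = [curr] + neighbors
    sims = np.array([np.sum(curr * f) /
                     (np.linalg.norm(curr) * np.linalg.norm(f) + 1e-8)
                     for f in feats])
    w = np.exp(sims) / np.exp(sims).sum()   # softmax over frames
    return sum(wi * f for wi, f in zip(w, feats))

rng = np.random.default_rng(0)
curr = rng.normal(size=(4, 8, 8))
good = curr + 0.1 * rng.normal(size=curr.shape)   # well-aligned neighbour
bad = rng.normal(size=curr.shape)                 # misaligned neighbour
out = temporal_aggregate(curr, [good, bad])
print(out.shape)
```

    In a learned version the weights would come from a small network rather than raw cosine similarity, but the effect is the same: alignment quality gates each neighbour's contribution.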

    Audio Mixing Inversion via Embodied Self-supervised Learning

    Haotian Zhou, Feng Yu, Xihong Wu
    pp. 55-62
    Abstract: Audio mixing is a crucial part of music production. For analyzing or recreating audio mixing, it is of great importance to conduct research on estimating the mixing parameters used to create mixdowns from music recordings, i.e., audio mixing inversion. However, approaches to audio mixing inversion are rarely explored. We present a method for estimating mixing parameters from raw tracks and a stereo mixdown via embodied self-supervised learning. In this work, several commonly used audio effects, including gain, pan, equalization, reverb, and compression, are taken into consideration. The method learns an inference neural network that takes a stereo mixdown and the raw audio sources as input and estimates the mixing parameters used to create the mixdown by iteratively sampling and training. During the sampling step, the inference network predicts a set of mixing parameters, which is sampled and fed to an audio-processing framework to generate audio data for the training step. During the training step, the same network is optimized with the sampled data generated in the sampling step. This method explicitly models the mixing process in an interpretable way instead of using a black-box neural network model. A set of objective measures is used for evaluation. The experimental results show that this method outperforms current state-of-the-art methods.
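    The sampling-and-training loop can be illustrated with a toy gain-only mixer: the "network" here is a linear map refit by running least squares, and every name, feature choice, and constant is an illustrative assumption rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tracks, n_samples = 2, 2000
tracks = rng.normal(size=(n_tracks, n_samples))   # raw audio sources

def mix(gains):
    # Toy mixing console: per-track gain, then sum to one mixdown. The
    # paper's framework additionally models pan, EQ, reverb and compression.
    return tracks.T @ gains

def feat(x):
    # Correlation of a mixdown with each raw track (the network's input).
    return tracks @ x / n_samples

true_gains = np.array([0.9, 0.2])
target = mix(true_gains)                          # the mixdown to invert

# Linear stand-in for the inference network, refit each iteration.
A = np.zeros((n_tracks, n_tracks))
B = np.zeros((n_tracks, n_tracks))
W = np.zeros((n_tracks, n_tracks))
for _ in range(50):
    # Sampling step: perturb the network's current prediction and render
    # audio with those parameters.
    g = np.clip(W @ feat(target) + rng.normal(scale=0.3, size=n_tracks), 0, 2)
    f = feat(mix(g))
    # Training step: fit the network on the (rendered audio, parameters) pair.
    A += np.outer(f, f)
    B += np.outer(g, f)
    W = B @ np.linalg.pinv(A)

estimate = W @ feat(target)
print(np.round(estimate, 2))   # recovers the gains used to create the mixdown
```

    The key property of the embodied loop survives even in this sketch: training labels are never annotated, they are the very parameters the system itself sampled and rendered.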

    AI for Supporting the Freedom of Drawing

    Xiaohua Sun, Juexiao Qin
    pp. 63-88
    Abstract: Artificial intelligence (AI) has recently been developing rapidly in image processing and generation. AI is not only starting to take over repetitive and tedious tasks but is also involved in creative activities. Drawing is one of the areas with the greatest potential for collaboration between humans and AI. It is an important way to express various kinds of information, but due to a lack of appropriate knowledge and skills, ordinary people without long-term training are unable to draw freely. Although there have been various attempts at human-AI collaboration in drawing, it is difficult for researchers to consider the wide variety of specific problems and develop universal methods due to the openness, improvisation, and individuality of drawing. In this paper, we first analyse the contents of drawing and the general creation process in detail. Second, we describe a mechanism for using AI to enable people to regain the freedom of drawing collaboratively. Finally, we develop a framework that describes methods for analysing specific problems and quickly finding solutions by building connections between the influencing factors in drawing, the demands of humans, and possible implementation options. The framework also reveals a broad scope of possibilities for applying AI to support people in drawing.

    Weakly Supervised Object Localization with Background Suppression Erasing for Art Authentication and Copyright Protection

    Chaojie Wu, Mingyang Li, Ying Gao, Xinyan Xie, et al.
    pp. 89-103
    Abstract: The problem of art forgery and infringement is becoming increasingly prominent, since diverse self-media content featuring all kinds of art pieces is released on the Internet every day. For art paintings, object detection and localization provide an efficient and effective means of art authentication and copyright protection. However, training a precise detector requires large amounts of expensive pixel-level annotations. To alleviate this, we propose a novel weakly supervised object localization (WSOL) method with background suppression erasing (BSE), which recognizes objects with inexpensive image-level labels. First, integrated adversarial erasing (IAE) for a vanilla convolutional neural network (CNN) drops out the most discriminative region by leveraging high-level semantic information. Second, a background suppression module (BSM) limits the activation area of the IAE to the object region through a self-guidance mechanism. Finally, in the inference phase, we utilize the refined importance map (RIM) of middle features to obtain class-agnostic localization results. Extensive experiments are conducted on paintings, CUB-200-2011 and ILSVRC to validate the effectiveness of our BSE.
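    The core erasing operation common to adversarial-erasing WSOL methods can be sketched on a single activation map: zero out the most discriminative positions so that a subsequent forward pass must rely on less discriminative object parts. The threshold value and single-channel setup are illustrative assumptions, not the paper's IAE design:

```python
import numpy as np

def erase_most_discriminative(cam, feature_map, threshold=0.8):
    """Zero out feature-map positions whose normalised class activation
    exceeds a fraction of the peak.
    cam, feature_map: (H, W) arrays. Returns (erased features, keep mask)."""
    cam_norm = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    mask = cam_norm < threshold        # keep only the non-peak regions
    return feature_map * mask, mask

rng = np.random.default_rng(0)
cam = rng.uniform(size=(7, 7))         # stand-in class activation map
feats = rng.normal(size=(7, 7))        # stand-in mid-level features
erased, mask = erase_most_discriminative(cam, feats)
print(mask.sum() < mask.size)          # at least the peak position was erased
```

    Without a constraint like the paper's background suppression module, repeated erasing of this kind tends to expand activations into the background, which is exactly the failure mode the BSM is designed to limit.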

    Deep Industrial Image Anomaly Detection: A Survey

    Jiaqi Liu, Guoyang Xie, Jinbao Wang, Shangnian Li, et al.
    pp. 104-135
    Abstract: The recent rapid development of deep learning has laid a milestone in industrial image anomaly detection (IAD). In this paper, we provide a comprehensive review of deep learning-based image anomaly detection techniques from the perspectives of neural network architectures, levels of supervision, loss functions, metrics and datasets. In addition, we extract the promising setting from industrial manufacturing and review the current IAD approaches under our proposed setting. Moreover, we highlight several open challenges for image anomaly detection. The merits and downsides of representative network architectures under varying supervision are discussed. Finally, we summarize the research findings and point out future research directions. More resources are available at https://github.com/M-3LAB/awesome-industrial-anomaly-detection.

    Multimodal Fusion of Brain Imaging Data: Methods and Applications

    Na Luo, Weiyang Shi, Zhengyi Yang, Ming Song, et al.
    pp. 136-152
    Abstract: Neuroimaging data typically include multiple modalities, such as structural or functional magnetic resonance imaging, diffusion tensor imaging, and positron emission tomography, which provide multiple views for observing and analyzing the brain. To leverage the complementary representations of different modalities, multimodal fusion is needed to dig out both inter-modality and intra-modality information. With this rich exploited information, it is becoming popular to combine multiple modalities of data to explore the structural and functional characteristics of the brain in both health and disease states. In this paper, we first review a wide spectrum of advanced machine learning methodologies for fusing multimodal brain imaging data, broadly categorized into unsupervised and supervised learning strategies. We then discuss some representative applications, including how they help to understand brain arealization, how they improve the prediction of behavioral phenotypes and brain aging, and how they accelerate biomarker exploration for brain diseases. Finally, we discuss some exciting emerging trends and important future directions. Collectively, we intend to offer a comprehensive overview of brain imaging fusion methods and their successful applications, along with the challenges imposed by multi-scale and big data, which raise an urgent demand for developing new models and platforms.

    A Knowledge-enhanced Two-stage Generative Framework for Medical Dialogue Information Extraction

    Zefa Hu, Ziyi Ni, Jing Shi, Shuang Xu, et al.
    pp. 153-168
    Abstract: This paper focuses on term-status pair extraction from medical dialogues (MD-TSPE), which is essential in diagnosis dialogue systems and the automatic scribing of electronic medical records (EMRs). In the past few years, work on MD-TSPE has attracted increasing research attention, especially after the remarkable progress made by generative methods. However, these generative methods output a whole sequence consisting of term-status pairs in one stage and ignore integrating prior knowledge, while the task demands a deeper understanding to model the relationship between terms and infer the status of each term. This paper presents a knowledge-enhanced two-stage generative framework (KTGF) to address these challenges. Using task-specific prompts, we employ a single model to complete MD-TSPE in two phases in a unified generative form: we first generate all terms and then generate the status of each generated term. In this way, the relationship between terms can be learned more effectively from the sequence containing only terms in the first phase, and our designed knowledge-enhanced prompt in the second phase can leverage the category and status candidates of each generated term for status generation. Furthermore, our proposed special status "not mentioned" makes more terms available and enriches the training data in the second phase, which is critical in the low-resource setting. Experiments on the Chunyu and CMDD datasets show that the proposed method achieves superior results compared to state-of-the-art models in both the full-training and low-resource settings.
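    The two-phase decoding flow can be sketched with a stub standing in for the generative model; the prompt wording, the knowledge-base shape, and the toy model's canned outputs are all illustrative assumptions, not the paper's prompts or data:

```python
def extract_term_status(dialogue, model, knowledge):
    # Phase 1: generate all medical terms mentioned in the dialogue.
    terms = model(f"terms: {dialogue}")
    pairs = []
    for term in terms:
        # Phase 2: a knowledge-enhanced prompt carries the term's category
        # and its candidate statuses when generating the status.
        info = knowledge.get(term, {})
        cat = info.get("category", "unknown")
        cands = info.get("statuses", ["positive", "negative", "not mentioned"])
        prompt = (f"status of {term} (category: {cat}; "
                  f"candidates: {', '.join(cands)}) in: {dialogue}")
        pairs.append((term, model(prompt)))
    return pairs

def toy_model(prompt):
    # Toy stand-in for the fine-tuned generative model (illustration only).
    if prompt.startswith("terms:"):
        return ["cough", "chest pain"]
    term = prompt.split("status of ")[1].split(" (")[0]
    return "positive" if term == "cough" else "not mentioned"

knowledge = {"cough": {"category": "symptom",
                       "statuses": ["positive", "negative", "not mentioned"]}}
dialogue = "Doctor: any symptoms? Patient: I have a bad cough."
print(extract_term_status(dialogue, toy_model, knowledge))
# → [('cough', 'positive'), ('chest pain', 'not mentioned')]
```

    Note how the "not mentioned" status lets phase 2 emit a pair even for a term the dialogue never affirms, which is how the framework enriches training data in low-resource settings.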