Journal information
Information Fusion
Elsevier Science
ISSN: 1566-2535
Indexed in: EI, ISTP, SCI
Formally published

    Cost-effective ensemble models selection using deep reinforcement learning

    Birman, Yoni; Hindi, Shaked; Katz, Gilad; Shabtai, Asaf...
    16 pages
    Abstract: Ensemble learning - the application of multiple learning models to the same task - is a common technique in many domains. While employing multiple models enables higher classification accuracy, the process can be time-consuming, costly, and difficult to scale. Given that each model may have different capabilities and costs, assigning the most cost-effective set of learners to each sample is challenging. We propose SPIREL, a novel method for cost-effective classification. Our method enables users to directly associate costs with correct/incorrect label assignment, computing resources, and run-time, and then dynamically establishes a classification policy. For each analyzed sample, SPIREL dynamically assigns a different set of learning models, as well as its own classification threshold. Extensive evaluation on two large malware datasets - a domain in which the application of multiple analysis tools is common - demonstrates that SPIREL is highly cost-effective, enabling us to reduce running time by approximately 80% while decreasing accuracy and F1-score by only 0.5%. We also show that our approach is both highly transferable across different datasets and adaptable to changes in individual learning model performance.
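To make the cost trade-off concrete, here is a minimal Python sketch of the kind of cost-aware reward such a policy could optimize, paired with a greedy stand-in for the learned selection policy. The function names, cost values, and the greedy heuristic are illustrative assumptions, not taken from the paper.

```python
def cost_aware_reward(correct, runtime_s,
                      reward_correct=1.0, penalty_wrong=-1.0,
                      cost_per_second=0.05):
    """Reward trading classification outcome against runtime cost.

    A SPIREL-style policy would receive such a reward per sample,
    steering it toward cheap-but-accurate model subsets.
    (The cost values are illustrative, not from the paper.)
    """
    outcome = reward_correct if correct else penalty_wrong
    return outcome - cost_per_second * runtime_s


def pick_models(models, budget_s):
    """Greedy stand-in for the learned policy: keep models with the
    best expected accuracy gain per second until the runtime budget
    is exhausted.  `models` is a list of (name, gain, runtime_s)."""
    chosen, spent = [], 0.0
    for name, gain, runtime_s in sorted(
            models, key=lambda m: m[1] / m[2], reverse=True):
        if spent + runtime_s <= budget_s:
            chosen.append(name)
            spent += runtime_s
    return chosen
```

In the paper the assignment is made per sample by a trained deep RL agent; the greedy budget rule above only illustrates why per-sample cost/accuracy trade-offs change which models are worth running.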

    Multimodal research in vision and language: A review of current and emerging trends

    Uppal, Shagun; Bhagat, Sarthak; Hazarika, Devamanyu; Majumder, Navonil...
    23 pages
    Abstract: Deep Learning and its applications have catalyzed impactful research and development across the diverse range of modalities present in real-world data. More recently, this has heightened research interest at the intersection of the Vision and Language arena, with its numerous applications and fast-paced growth. In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities. We look at their applications in terms of task formulations and how various problems related to semantic perception and content generation are solved. We also address task-specific trends, along with their evaluation strategies and upcoming challenges. Moreover, we shed some light on multi-disciplinary patterns and insights that have emerged in the recent past, directing this field toward more modular and transparent intelligent systems. This survey identifies key trends shaping recent literature in VisLang research and attempts to unearth the directions in which the field is heading.

    SaccadeFork: A lightweight multi-sensor fusion-based target detector

    Ouyang, Zhenchao; Cui, Jiahe; Dong, Xiaoyun; Li, Yanqi...
    12 pages
    Abstract: Commercialization of self-driving applications requires precision and reliability from the perception system, owing to the highly dynamic and complex road environment. Early perception systems relied on either the camera or LiDAR for moving-obstacle detection. With the development of vehicular sensors and deep learning technologies, multi-view and sensor-fusion-based convolutional neural network (CNN) models for detection tasks have become a popular research area. In this paper, we present a novel multi-sensor fusion-based CNN model, SaccadeFork, that takes the image and upsampled LiDAR point clouds as input. SaccadeFork includes two modules: (1) a lightweight backbone consisting of an hourglass convolutional feature-extraction module and a parallel dilated-convolution module that adapts the system to different target sizes; (2) an anchor-based detection head. The model is also designed with deployment on resource-limited edge devices in the vehicle in mind. Two refinement strategies, i.e., Mixup and the Swish activation function, are adopted to improve the model. Comparison with a series of recent models on the public KITTI dataset shows that SaccadeFork achieves optimal detection accuracy on vehicles and pedestrians under different scenarios. The final model is also deployed and tested on a local dataset collected with edge devices and a low-cost sensor solution, and the results show that the model achieves real-time efficiency and high detection accuracy.

    CaSE: Explaining Text Classifications by Fusion of Local Surrogate Explanation Models with Contextual and Semantic Knowledge

    Kiefer, Sebastian
    12 pages
    Abstract: Generating explanations within a local, model-agnostic explanation scenario for text classification is often accompanied by a local approximation task. To create a local neighborhood for a document whose classification is to be explained, sampling techniques are used that usually treat the corresponding features as (at least semantically) independent of each other. Hence, contextual as well as semantic information is lost and therefore cannot be used to update a human's mental model within the explanation task. In the case of dependent features, such explanation techniques are prone to extrapolating into feature areas of low data density, thereby causing misleading interpretations. Additionally, the "the whole is greater than the sum of its parts" phenomenon is disregarded by explanations that treat words independently of each other. In this paper, an architecture named CaSE is proposed that uses either Semantic Feature Arrangements or Semantic Interrogations to overcome these drawbacks. Combined with a modified version of Local Interpretable Model-agnostic Explanations (LIME), a state-of-the-art local explanation framework, it is capable of generating meaningful and coherent explanations. The approach utilizes contextual and semantic knowledge from unsupervised topic models to enable realistic semantic sampling and, based on that, to generate understandable explanations for any text classifier. The key concepts of CaSE deemed essential for providing humans with high-quality explanations are derived from findings in psychology. In a nutshell, CaSE shall enable Semantic Alignment between humans and machines and thus further improve the basis for Interactive Machine Learning. An extensive experimental validation of CaSE is conducted, showing its effectiveness in generating reliable and meaningful explanations whose elements consist of contextually coherent words and are therefore suitable for updating human mental models appropriately. In the course of a quantitative analysis, the proposed architecture is evaluated with respect to a consistency property and the Local Fidelity of the resulting explanation models. Accordingly, CaSE generates more realistic explanation models, leading to higher Local Fidelity than LIME.
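As an illustration of the sampling idea, the sketch below perturbs whole semantic word groups (as a topic model might supply them) rather than individual words when building a LIME-style neighborhood. The function, its signature, and the 0.5 keep-probability are hypothetical stand-ins, not CaSE's actual implementation.

```python
import random


def sample_neighborhood(doc_words, topic_groups, n_samples=5, seed=0):
    """LIME-style neighborhood sampling that toggles whole semantic
    groups of words on or off, so words that belong together appear
    or vanish together in each perturbed document.

    `topic_groups` maps a topic id to the set of words it covers;
    in a CaSE-like setting this grouping would come from an
    unsupervised topic model.  Illustrative sketch only.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Keep each semantic group with probability 0.5 (an assumption).
        keep = {t for t in topic_groups if rng.random() < 0.5}
        kept_words = [w for w in doc_words
                      if any(w in topic_groups[t] for t in keep)]
        samples.append(kept_words)
    return samples
```

Because groups are dropped atomically, a perturbed sample never contains a fragment of a semantic context, which is the coherence property the independence-based sampling of vanilla LIME lacks.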

    Dynamic sensor activation and decision-level fusion in wireless acoustic sensor networks for classification of domestic activities

    Dekkers, Gert; Rosas, Fernando; van Waterschoot, Toon; Vanrumste, Bart...
    15 pages
    Abstract: For the past decades there has been rising interest in wireless sensor networks for obtaining information about an environment. One interesting modality is audio, as it is highly informative for numerous applications, including speech recognition, urban scene classification, city monitoring, machine listening, and classifying domestic activities. However, because they operate at prohibitively high energy consumption, commercialisation of battery-powered wireless acoustic sensor networks has been limited. To increase the network's lifetime, this paper explores the joint use of decision-level fusion and dynamic sensor activation. We adopt a topology in which processing - including feature extraction and classification - is performed on a dynamic set of sensor nodes that communicate classification outputs, which are fused centrally. The main contribution of this paper is the comparison of decision-level fusion with different dynamic sensor activation strategies on the use case of automatically classifying domestic activities. Results indicate that using vector quantisation to encode the classification output computed at each sensor node can reduce the communication per classification output to 8 bits without significant loss of performance. As the cost of communication is reduced, local processing tends to dominate the overall energy budget. It is shown that dynamic sensor activation, using a centralised approach, can reduce the average time a sensor node is active by up to 20% by leveraging redundant information in the network. In terms of energy consumption, this results in an energy reduction of up to 80%, as the cost of computation dominates the overall energy budget.
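The 8-bit encoding idea can be illustrated with a toy vector-quantisation scheme: each node transmits only the index of the codeword nearest to its local class-probability output, and the fusion centre averages the decoded vectors. The hand-made codebook and both function names here are illustrative assumptions; in practice the codebook would be learned offline (e.g. by k-means), and a 256-entry codebook makes the index fit in 8 bits.

```python
def encode(prob_vec, codebook):
    """Transmit-side: return the index of the codeword closest to the
    local classifier output (a 256-entry codebook needs only 8 bits)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(prob_vec, codebook[i]))


def fuse(decoded_vecs):
    """Centre-side decision-level fusion: average the decoded
    class-probability vectors and pick the arg-max class."""
    n = len(decoded_vecs)
    avg = [sum(v[c] for v in decoded_vecs) / n
           for c in range(len(decoded_vecs[0]))]
    return max(range(len(avg)), key=avg.__getitem__)
```

Since only a small index crosses the network, communication cost per classification output becomes constant regardless of the number of classes, which is why local processing then dominates the energy budget.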

    Scheduler-based state estimation over multiple channels networks

    Alsaadi, Fuad E.; Wang, Zidong; Alharbi, Khalid H.
    9 pages
    Abstract: We investigate the remote state estimation problem for networked systems over parallel noise-free communication channels. Owing to the limited network capabilities of practical network environments, communication schedulers are implemented at the transmit side of each subchannel to promote resource efficiency. Specifically, the processed signals are transmitted only when necessary to provide real-time measurements to the remote estimator. A recursive approximate minimum mean-square error estimator is established to recover the state vector of the target plant from the scheduled transmission signals. All the information coming from the individual subchannels, even when no measurement is sent, contributes to improving the estimation performance in an analytical form. Finally, a numerical example is given to illustrate the effectiveness of the main results.
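A common way to realize such a transmit-side scheduler is a send-on-delta rule: transmit only when the measurement has drifted far enough from the last transmitted value, so that even silence tells the estimator the state lies within a known band. The sketch below is a generic illustrative scheme, not the estimator derived in the paper.

```python
class SendOnDeltaScheduler:
    """Transmit-side scheduler: send the measurement only when it has
    moved more than `delta` from the last value sent.  When nothing is
    sent, the remote estimator still learns that the state is within
    +/- delta of the last transmission - this is how the absence of a
    measurement carries information.  (Illustrative scheme only.)"""

    def __init__(self, delta):
        self.delta = delta
        self.last_sent = None

    def step(self, y):
        if self.last_sent is None or abs(y - self.last_sent) > self.delta:
            self.last_sent = y
            return y          # transmitted over the subchannel
        return None           # scheduler stays silent


def estimate(received, last_estimate):
    """Remote side: use the sample if it arrived; otherwise keep the
    previous estimate, now known to be within +/- delta of the truth."""
    return received if received is not None else last_estimate
```

A full treatment would fold the "silence implies a bounded innovation" fact into the error-covariance recursion, which is what the paper's approximate MMSE estimator does analytically.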

    Democratic consensus reaching process for multi-person multi-criteria large scale decision making considering participants' individual attributes and concerns

    Liu, Xia; Xu, Yejun; Gong, Zaiwu; Herrera, Francisco...
    13 pages
    Abstract: Consensus reaching is a key issue in group decision making, because conflicts of interest within groups are common. Democratic consensus refers to achieving a soft consensus within the collective while ensuring the effective participation and satisfaction of individuals. Multi-person multi-criteria large scale decision making (MpMcLSDM) usually involves a huge number of decision makers (DMs/participants), and different DMs usually have different interests. Thus, how to effectively manage individuals so as to promote democratic consensus is a current research challenge. To address this, this research develops a democratic consensus reaching process (DCRP) for MpMcLSDM problems. In the proposed approach, a clustering method that considers both the opinion similarity and the individual concern similarity of DMs is first introduced to reduce the complexity of MpMcLSDM problems. Subsequently, we propose assigning an equal initial weight to each cluster to protect the interests of minorities. Meanwhile, a consensus-contribution-based dynamic interactive weight-updating method is implemented in the DCRP to promote a high level of democratic consensus. In addition, a compromise-degree-based consensus feedback strategy is developed to improve the efficiency of the DCRP. The proposed feedback mechanism effectively considers the individual concerns and adjustment willingness of DMs in the DCRP. Finally, a case study and comparisons are given to show the effectiveness and innovation of this research.
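A minimal sketch of a contribution-based weight update, under the assumption that a cluster's consensus contribution is measured by its closeness to the collective opinion; the specific formulas, names, and the 1/(1+d) consensus degree are illustrative, not the paper's.

```python
def collective_opinion(opinions, weights):
    """Weighted average of the clusters' opinion vectors."""
    total = sum(weights)
    dim = len(opinions[0])
    return [sum(w * o[d] for w, o in zip(weights, opinions)) / total
            for d in range(dim)]


def update_weights(opinions, weights):
    """Raise the relative weight of clusters that sit close to the
    collective opinion (i.e. contribute to consensus) and renormalise.
    Starting from equal initial weights, as the abstract suggests,
    protects minority clusters in the first round."""
    centre = collective_opinion(opinions, weights)
    dists = [sum(abs(o[d] - centre[d]) for d in range(len(centre)))
             for o in opinions]
    degrees = [1.0 / (1.0 + d) for d in dists]   # consensus degree
    raw = [w * g for w, g in zip(weights, degrees)]
    total = sum(raw)
    return [w / total for w in raw]
```

Iterating this update alongside a feedback strategy that nudges outlying clusters toward compromise is the general shape of a DCRP round.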

    Explain and improve: LRP-inference fine-tuning for image captioning models

    Sun, Jiamei; Lapuschkin, Sebastian; Binder, Alexander; Samek, Wojciech...
    14 pages
    Abstract: This paper analyzes the predictions of image captioning models with attention mechanisms beyond visualizing the attention itself. We develop variants of Layer-wise Relevance Propagation (LRP) and gradient-based explanation methods tailored to image captioning models with attention mechanisms. We systematically compare the interpretability of attention heatmaps against the explanations provided by methods such as LRP, Grad-CAM, and Guided Grad-CAM. We show that explanation methods simultaneously provide pixel-wise image explanations (supporting and opposing pixels of the input image) and linguistic explanations (supporting and opposing words of the preceding sequence) for each word in the predicted captions. We demonstrate with extensive experiments that explanation methods (1) can reveal additional evidence used by the model to make decisions compared to attention; (2) correlate with object locations with high precision; (3) are helpful for ``debugging'' the model, e.g. by analyzing the reasons for hallucinated object words. Based on the observed properties of explanations, we further design an LRP-inference fine-tuning strategy that reduces object hallucination in image captioning models while maintaining sentence fluency. We conduct experiments with two widely used attention mechanisms: the adaptive attention mechanism calculated with additive attention, and the multi-head attention mechanism calculated with the scaled dot product.
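For readers unfamiliar with LRP, the epsilon rule for a single linear layer - the generic building block that such variants extend - can be written as follows. This is the textbook form of the rule, not the paper's full captioning-specific variant.

```python
def lrp_linear(x, w, relevance_out, eps=1e-6):
    """Epsilon-rule LRP backward pass through one linear layer
    z_j = sum_i x_i * w[i][j]: each output neuron's relevance is
    redistributed to the inputs in proportion to their contribution
    x_i * w[i][j] / z_j, with `eps` stabilising small denominators.
    Conservation: sum of input relevances ~ sum of output relevances."""
    n_in, n_out = len(w), len(w[0])
    z = [sum(x[i] * w[i][j] for i in range(n_in)) for j in range(n_out)]
    rel_in = [0.0] * n_in
    for j in range(n_out):
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            rel_in[i] += x[i] * w[i][j] / denom * relevance_out[j]
    return rel_in
```

Chaining this rule backwards through a network, with adapted rules for attention and gating units, yields the pixel-wise and word-wise relevances the abstract describes.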

    Proposal-Copula-Based Fusion of Spaceborne and Airborne SAR Images for Ship Target Detection

    Wang, Xueqian; Zhu, Dong; Li, Gang; Zhang, Xiao-Ping...
    14 pages
    Abstract: In this paper, we consider the problem of fusing synthetic aperture radar (SAR) images from spaceborne and airborne sensors and investigate its application to inshore ship target detection. Existing SAR image fusion methods mainly focus on image denoising or texture enhancement, but show limited improvement in the target-to-clutter ratios (TCRs) of composite images and lead to deteriorated target detection performance. To address this issue, we propose a new method for the fusion of spaceborne and airborne SAR images based on target proposals and the copula theory (TPCT). In TPCT, the correspondence of targets and clutter between different images is exploited to improve the TCRs of composite images. TPCT consists of three steps. First, target proposals are extracted from the spaceborne and airborne SAR images and then fused to enhance the common ship target areas therein. Second, a new method to construct the joint probability density function (PDF) of clutter in the spaceborne and airborne SAR images is presented, modeling the statistical dependence of the clutter based on the copula theory. This copula-based joint PDF is used to suppress the clutter areas remaining in the intersection of the target proposals. Third, cues from the intersection of the target proposals and the copula-based joint PDF of clutter are fused by the Hadamard product to generate a composite image with enhanced ship targets and suppressed clutter. Experimental results based on measured spaceborne and airborne SAR data show that the proposed TPCT fusion method leads to higher TCRs of composite images and better performance in the ship detection task than other commonly used image fusion methods.
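The third step's Hadamard-product fusion can be sketched as follows. For brevity the joint clutter density is approximated here by the product of the two marginal densities (the independence copula); the paper instead fits a copula to capture the real statistical dependence between spaceborne and airborne clutter.

```python
def fuse_maps(proposal, clutter_pdf_a, clutter_pdf_b):
    """Hadamard-product fusion of per-pixel maps: keep pixels that lie
    inside the fused target proposals AND are unlikely under the joint
    clutter model.  `proposal` is a binary/soft proposal map; the two
    clutter arguments are per-pixel clutter likelihoods from the
    spaceborne and airborne images.  Independence is an assumption
    made here for brevity only."""
    fused = []
    for p_row, a_row, b_row in zip(proposal, clutter_pdf_a, clutter_pdf_b):
        fused.append([p * (1.0 - a * b)
                      for p, a, b in zip(p_row, a_row, b_row)])
    return fused
```

Pixels outside the proposal intersection are zeroed outright, while pixels with high joint clutter likelihood are attenuated, which is the mechanism that raises the TCR of the composite image.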

    Finding and removing Clever Hans: Using explanation methods to debug and improve deep models

    Anders, Christopher J.; Weber, Leander; Neumann, David; Samek, Wojciech...
    35 pages
    Abstract: Contemporary learning models for computer vision are typically trained on very large (benchmark) datasets with millions of samples. These may, however, contain biases, artifacts, or errors that have gone unnoticed and are exploitable by the model. In the worst case, the trained model does not learn a valid and generalizable strategy for the problem it was trained on, and instead becomes a ``Clever Hans'' predictor that bases its decisions on spurious correlations in the training data, potentially yielding an unrepresentative or unfair, and possibly even hazardous, predictor. In this paper, we contribute a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods over large data corpora. Building on a recent technique - Spectral Relevance Analysis - we propose the following technical contributions and resulting findings: (a) a scalable quantification of artifactual and poisoned classes on which the machine learning models under study exhibit Clever Hans behavior; (b) several approaches, collectively denoted Class Artifact Compensation, which effectively and significantly reduce a model's Clever Hans behavior, i.e., we are able to un-Hans models trained on (poisoned) datasets, such as the popular ImageNet corpus. We demonstrate that Class Artifact Compensation, defined in a simple theoretical framework, may be implemented as part of a neural network's training or fine-tuning process, or in a post-hoc manner by injecting additional layers into the network architecture that prevent any further propagation of undesired Clever Hans features. Using the proposed methods, we provide qualitative and quantitative analyses of the biases and artifacts in, e.g., the ImageNet dataset, the Adience benchmark dataset of unfiltered faces, and the ISIC 2019 skin lesion analysis dataset. We demonstrate that these insights can give rise to improved, more representative, and fairer models operating on implicitly cleaned data corpora.