Journal information
Information Sciences
Elsevier
ISSN 0020-0255

Information Sciences (indexed in: SCI, AHCI, ISTP, EI)
Formally published

    Efficient edit rule implication for nominal and ordinal data

    Boeckling, Toon; De Tre, Guy; Bronselaer, Antoon
    19 pages
    Abstract: Edit rule implication is an essential subtask when repairing data inconsistencies against a set of edit rules. In this paper, novel techniques to enhance the performance of this subtask are studied. Our work includes several contributions. First, we draw attention to the case of nominal edit rules in particular. We point out that in many cases, starting with a set of edit rules that is as small as possible is important to improve performance. This can be achieved by folding edit rules together. In addition, an enhanced nominal edit rule implication algorithm is proposed that exploits the properties of nominal edit rules. Second, we introduce ordinal edit rules as a generalization of nominal edit rules, used to capture data inconsistencies for data measured on an ordinal scale, and we propose an ordinal edit rule implication algorithm. Evaluation of our methods shows promising results for both implication algorithms, with the ordinal algorithm performing best in general. On average, our techniques improve on the state-of-the-art algorithm for edit rule implication by more than 50%. (C) 2021 Elsevier Inc. All rights reserved.
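
To make the notion of edit rule implication concrete, here is a minimal sketch in the Fellegi-Holt style. It assumes a nominal edit rule assigns each attribute a set of values and that a record is inconsistent when all of its values fall in those sets. The names `violates`, `implied_rule`, and the toy attributes are hypothetical, and this is not the enhanced algorithm the paper proposes.

```python
def violates(record, rule):
    """A record fails a rule when every attribute value lies in the rule's value set."""
    return all(record[attr] in values for attr, values in rule.items())


def implied_rule(rules, domains, generator):
    """Derive a new rule by eliminating the `generator` attribute (assumed formalism)."""
    # The contributing rules must jointly cover the generator's whole domain.
    covered = set().union(*(r[generator] for r in rules))
    if covered != domains[generator]:
        return None
    new_rule = {generator: set(domains[generator])}
    for attr in domains:
        if attr == generator:
            continue
        # On every other attribute, keep only values shared by all rules.
        common = set.intersection(*(set(r[attr]) for r in rules))
        if not common:
            return None  # an empty set means no record can ever fail the rule
        new_rule[attr] = common
    return new_rule


# Hypothetical toy domains and rules, for illustration only.
domains = {
    "age": {"child", "adult"},
    "marital": {"single", "married"},
    "license": {"yes", "no"},
}
r1 = {"age": {"child"}, "marital": {"married"}, "license": {"yes", "no"}}
r2 = {"age": {"child"}, "marital": {"single"}, "license": {"yes"}}

implied = implied_rule([r1, r2], domains, generator="marital")
print(implied)
```

Here the implied rule forbids any child holding a license regardless of marital status, because every such record already fails either `r1` or `r2`.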

    Dynamic embeddings for efficient parameter learning of Bayesian network with multiple latent variables

    Qi, Zhiwei; Yue, Kun; Duan, Liang; Hu, Kuang...
    19 pages
    Abstract: Latent variables (LVs), which represent unobservable abstract concepts such as patient disease and customer credit, play an important role in simplifying network structure and improving the interpretability of a Bayesian network (BN). However, LVs incorporated into a BN lead to missing probability parameters because the corresponding observations are missing. The classic method for parameter estimation with LVs, the expectation-maximization (EM) algorithm, suffers from high complexity and slow convergence. To this end, we propose dynamic embeddings for parameter learning of BNs with LVs. Firstly, we reconstruct the E-step of EM and propose to use the dynamic embeddings to calculate the weights of fractional samples, which reduces the computational complexity of parameter learning. Secondly, we propose to construct the point mutual information (PMI) matrix to represent directed weighted graphs (DWGs) transformed from the updated parameters. Thirdly, incremental singular value decomposition (SVD) is adopted to generate dynamic embeddings while capturing the updated parameters and preserving the BN's graphical structure. Experimental results show that our proposed methods are efficient and effective. On real-world BNs, the efficiency, convergence and accuracy of our method outperform those of the state-of-the-art methods for parameter learning of BNs with multiple LVs. (C) 2022 Published by Elsevier Inc.
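
As a rough illustration of the second step, the sketch below builds a PMI matrix from a small directed weighted graph. It assumes the commonly used form log(w_uv * W / (out_u * in_v)), where W is the total edge weight; the abstract does not give the exact formula, so both the formula and the function name `pmi_matrix` are assumptions, and the SVD step is omitted.

```python
import math
from collections import defaultdict


def pmi_matrix(weights):
    """PMI for each weighted edge (u, v): log(w_uv * W / (out_u * in_v))."""
    total = sum(weights.values())
    out_w = defaultdict(float)  # total weight leaving each node
    in_w = defaultdict(float)   # total weight entering each node
    for (u, v), w in weights.items():
        out_w[u] += w
        in_w[v] += w
    return {
        (u, v): math.log(w * total / (out_w[u] * in_w[v]))
        for (u, v), w in weights.items()
        if w > 0  # PMI is undefined for zero-weight edges
    }


# Hypothetical edge weights standing in for updated BN parameters.
edges = {("A", "B"): 2.0, ("A", "C"): 2.0, ("B", "C"): 4.0}
pmi = pmi_matrix(edges)
for edge, value in sorted(pmi.items()):
    print(edge, round(value, 4))
```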

    Less is more: improving neural-based collaborative filtering by using landmark modeling

    Mello, Carlos E.; Mourthe, Adriano
    17 pages
    Abstract: Collaborative Filtering (CF) has been extensively studied over the last decade. Among the most successful methods, neural-based methods have changed the research landscape by bringing cutting-edge artificial neural networks (ANN) and big data to offer the best personalization users have ever experienced. However, a significant issue arising from these neural-based methods is that the increase in model complexity directly impacts the computational cost. In general, this drawback is a consequence of the high dimensionality and sparsity of the input feature space. We believe that reducing the sparsity and dimensionality of the input features is essential to enhance accuracy while keeping computational costs low. This paper investigates an alternative modeling for the CF setting that represents users or items by using landmarks. This modeling drastically decreases the input feature space and eliminates sparsity while maintaining the underlying information needed to achieve high accuracy. Based on that, we propose a novel neural-based CF method via landmark modeling. Experiments on six real-world benchmark CF datasets were conducted, comparing the proposed method to well-known and widely used state-of-the-art CF methods. The results show that the Neural Landmark method outperforms the other methods in both accuracy and computational performance. (C) 2022 Elsevier Inc. All rights reserved.
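
The landmark idea can be sketched as follows: each user's sparse rating vector is replaced by a short, dense vector of similarities to a few landmark users. This is a minimal sketch under stated assumptions. Random landmark selection, cosine similarity, and the names `cosine` and `landmark_features` are illustrative choices, not necessarily those of the paper, and the neural model that would consume these features is left out.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two sparse rating dicts {item: rating}."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def landmark_features(ratings, n_landmarks, seed=0):
    """Re-encode every user as a dense vector of similarities to landmark users."""
    rng = random.Random(seed)
    landmarks = rng.sample(sorted(ratings), n_landmarks)  # assumed: random choice
    features = {
        user: [cosine(vec, ratings[lm]) for lm in landmarks]
        for user, vec in ratings.items()
    }
    return landmarks, features


# Hypothetical toy ratings: four users, four items, mostly unrated.
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i3": 2},
    "u3": {"i2": 1, "i3": 5, "i4": 4},
    "u4": {"i4": 2},
}
landmarks, features = landmark_features(ratings, n_landmarks=2)
print(landmarks)
print(features)
```

Whatever the number of items, every user ends up with exactly `n_landmarks` features, which is the dimensionality reduction the abstract refers to.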

    Knowledge derivation from Likert scale using Z-numbers

    Anjaria, Kushal
    19 pages
    Abstract: The Likert scale is an extensively used psychometric scale in questionnaire-based research. It is the most widely used approach for measuring responses in survey research. The data collected using the Likert scale is somewhat ambiguous in terms of data quality and latent variable assessment. Besides, there are numerous debates about the Likert scale regarding the accuracy of the data collected, the ranking order and distance between the scale options, the use of multiple scales for different questionnaires, the ability to combine scale options, the possibility of performing mathematical operations on the data collected using the Likert scale, and quantification of respondents' negation choices. How to extract useful information from the Likert scale and resolve the aforementioned debates has been an interesting and open topic of discussion in recent years. To address this open issue, we deploy the Z-number in conjunction with the Likert scale since the former is a useful tool for generating information from objective and subjective data. A case study with numerical examples is provided to demonstrate the proposed technique. Additionally, the case study illustrates how Z-numbers might be used to resolve protracted disagreements on the usage of the Likert scale. One critical component of the proposed study is that it does not offer a new variety of scale but rather attempts to resolve long-standing controversies and issues using the conventional, regular Likert scale with Z-numbers. (C) 2022 Elsevier Inc. All rights reserved.

    A Self-paced Learning based Transfer Model for Hypergraph Matching

    Zhu, Hu; Wang, Xueqin; Xu, Guoxia; Deng, Lizhen...
    14 pages
    Abstract: Determining correspondences between the vertices of two graphs is one of the essential tasks in computer vision. Although the graph matching problem is NP-hard, hypergraph matching is widely used in matching methods that exploit higher-order geometric information. However, it is still a challenge to learn graph models from observed samples of graph matching. In this paper, we present an effective scheme to parameterize a graph model through a self-paced learning algorithm. Each iteration heuristically selects smaller-loss samples in a data-driven manner, and the learning and matching problems are aligned to learn a new transfer hypergraph model that constructs high-order structural attributes for visual object matching. For the final matching task between two graphs, we develop a transfer matching method based on assignment matrix decomposition. Several experiments on the Willow-Object datasets and some other datasets indicate the good performance of our method. (C) 2022 Published by Elsevier Inc.
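
The self-paced selection of smaller-loss samples can be illustrated on a deliberately simple model. The sketch below estimates a scalar mean instead of a hypergraph model: in each round only samples whose squared loss is under a threshold are used, and the threshold then grows so harder samples can enter later. The function name `self_paced_mean`, the growth schedule, and the toy data are assumptions for illustration.

```python
def self_paced_mean(samples, threshold, growth=1.5, rounds=5):
    """Estimate a mean, admitting samples with small squared loss first."""
    estimate = sorted(samples)[len(samples) // 2]  # start from the median
    selected = []
    for _ in range(rounds):
        # Easy samples: squared loss under the current threshold.
        selected = [x for x in samples if (x - estimate) ** 2 < threshold]
        if selected:
            estimate = sum(selected) / len(selected)
        threshold *= growth  # admit harder samples in the next round
    return estimate, selected


# Hypothetical data: four consistent samples and one outlier.
samples = [1.0, 1.2, 0.9, 1.1, 8.0]
estimate, selected = self_paced_mean(samples, threshold=0.5)
print(estimate, selected)
```

With these settings the outlier never falls under the threshold, so it does not distort the estimate.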

    Online feature selection for multi-source streaming features

    You, Dianlong; Sun, Miaomiao; Liang, Shunpan; Li, Ruiqi...
    29 pages
    Abstract: Multi-source streaming feature selection in an online manner has attracted considerable attention from researchers because it can reduce the dimensionality of heterogeneous big data. However, traditional online algorithms such as Alpha-investing, Online Streaming Feature Selection (OSFS), Online Group Feature Selection (OGFS), and Scalable and Accurate OnLine Approach (SAOLA) consider only a single data source with fixed instances and are not directly applicable to multi-source data. Multi-source Causal Feature Selection (MCFS) can search for an invariant set in multiple interventional datasets. However, it is restricted to fixed feature spaces and requires exactly the same features across the multi-source data. To overcome these limitations, we propose a novel method known as Multi-source Streaming Feature Selection (MSFS) to tackle the feature selection problem for multi-source streaming features. The MSFS algorithm processes a new feature from a random source in three phases: relevance, intra-source redundancy, and inter-source redundancy analyses. That is, MSFS attempts to mine the potential relationships among different data sources rather than considering each data source independently. In particular, each new feature is analyzed online using the overlapping instances from all data sources, and the Markov blanket (MB) of the target variable is dynamically adjusted. To evaluate the performance of the MSFS algorithm, we compare it with that of the above-mentioned algorithms on 14 datasets and two real-world scenarios. The results demonstrate that MSFS outperforms the existing algorithms in classification accuracy and number of selected features. (C) 2022 Elsevier Inc. All rights reserved.

    On some similarity of finite sets (and what we can say today about certain old problem)

    Pliszka, Zbigniew
    26 pages
    Abstract: The paper presents a new way of classifying numerical sets (division into abstraction classes, called cuts hereafter in this paper) after imposing a constraint on the sum of elements of their subsets. A simple and inexpensive operation of sorting a set (at a cost of at most n log(n), using merge sort) made it possible to determine pairs of subsets (described in the paper as pairs in a solid compound) representing a given set with the accepted constraint, which in turn made it possible to prove various properties and limit the number of necessary comparisons. Then, the properties of the ratio of the sum of elements to the constraint within its cut were demonstrated, and an interval on the real number axis was assigned to this cut (the abstraction class). To illustrate the obtained results, a mutation tree is used as an image of a unit hypercube. As an example, the introduced concepts are applied to the classical number partitioning problem, which is NP-complete (as proved by Richard Karp in 1972 in his work [1], where it was classified as a subproblem of the knapsack problem). (C) 2022 Elsevier Inc. All rights reserved.
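
For readers unfamiliar with the number partitioning problem mentioned above, the following sketch finds the smallest achievable difference between the sums of two subsets. It uses the textbook pseudo-polynomial dynamic program over reachable subset sums, not the cut-based approach of the paper, and it assumes non-negative integers. The function name `best_partition_difference` is hypothetical.

```python
def best_partition_difference(numbers):
    """Smallest |sum(A) - sum(B)| over all splits of non-negative integers into A and B."""
    total = sum(numbers)
    half = total // 2
    reachable = {0}  # subset sums not exceeding half of the total
    for x in sorted(numbers, reverse=True):
        reachable |= {s + x for s in reachable if s + x <= half}
    # The best split puts a subset with the largest reachable sum on one side.
    return total - 2 * max(reachable)


print(best_partition_difference([4, 5, 6, 7, 8]))
print(best_partition_difference([10, 3, 2]))
```

The running time grows with the magnitude of the numbers as well as their count, which is consistent with the problem being NP-complete in general.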

    An axiomatic distance methodology for aggregating multimodal evaluations

    Escobedo, Adolfo R.; Yasmin, Romena; Moreno-Centeno, Erick
    24 pages
    Abstract: This work introduces a multimodal data aggregation methodology featuring optimization models and algorithms for jointly aggregating heterogeneous ordinal and cardinal evaluation inputs into a consensus evaluation. Specifically, this work derives mathematical modeling components to enforce three types of logical couplings between the collective ordinal and cardinal evaluations: rating and ranking preferences, numerical and ordinal estimates, and rating and approval preferences. The proposed methodology is based on axiomatic distances rooted in social choice theory. Moreover, it adequately deals with highly incomplete evaluations, tied values, and other complicating aspects of group decision-making contexts. We illustrate the practicality of the proposed methodology in a case study involving an academic student paper competition. The methodology's advantages and computational aspects are further explored via synthetic instances sampled from distributions parametrized by ground truths and varying noise levels. These results show that multimodal aggregation effectively extracts a collective truth from noisy information sources and successfully captures the distinctive evaluation qualities of rating and ranking preference data. (C) 2022 Elsevier Inc. All rights reserved.
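
Distance-based aggregation can be illustrated with the Kendall tau distance, a well-known ranking distance from social choice theory. The sketch below assumes complete rankings without ties and finds a consensus by brute force. The paper's own distances, which handle ties, incomplete evaluations, and cardinal ratings, are not reproduced here, and the names `kendall_tau` and `consensus_ranking` are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau(rank_a, rank_b):
    """Number of item pairs that two rankings (best first) order differently."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def consensus_ranking(rankings):
    """Brute-force ranking with the smallest total distance to all inputs."""
    items = sorted(rankings[0])
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )


# Hypothetical judges ranking three papers.
judges = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(consensus_ranking(judges))
```

Enumerating all permutations is only workable for a handful of items, which is one reason optimization models are used for larger instances.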

    Scenario-based analysis for discovering relations among interestingness measures

    Somyanonthanakul, Rachasak; Theeramunkong, Thanaruk
    40 pages
    Abstract: Many interestingness measures have been proposed for mining meaningful association rules between two events in the form $A \rightarrow B$, but their characteristics and semantic similarity relations have not been comprehensively investigated. This paper presents a scenario-based approach for characterizing sixty-one commonly used measures and revealing their relationships in three steps. The first step generates a set of 969 three-probability scenarios, $S = \{\, s \mid s = (p(A), p(B), p(A,B)) \wedge p(A), p(B), p(A \mid B) \in [0,1] \wedge p(A,B) \le \min(p(A), p(B)) \,\}$, covering all possible situations in the range of 0.0 to 1.0 with a step of 0.05, excluding infinity and not-a-number cases. In the second step, 937,992 pairs of scenarios are enumerated, and for each scenario pair $s_1$ and $s_2$, the values of a measure $M$ on $s_1$ and $s_2$, i.e., $M(s_1)$ and $M(s_2)$, are compared, with the result being greater-than ($M(s_1) > M(s_2)$), smaller-than ($M(s_1) < M(s_2)$), or equal-to ($M(s_1) = M(s_2)$), to characterize the measure. The final step is based on three types of relations: (1) behavior-based, (2) correlation-based, and (3) association-based similarity relations. The behavior of measures is depicted using nine common algebraic/statistical properties and four special condition properties, i.e., zero, min-max, infinity, and not-a-number of the measures. Similarities among the measures can be examined by grouping measures based on their properties. With three correlation functions, i.e., correlation coefficient, joint entropy, and mutual information, a correlation analysis was performed to discover relations among interestingness measures in the form of dendrograms and clusters with thresholding. Finally, the details of the relations among these interestingness measures are explored with association rule mining. Besides support, confidence, and lift, we propose five types of rules, i.e., same-direction rule (S-rule), opposite-direction rule (O-rule), equal-both rule (E-rule), equal-left rule (EL-rule), and equal-right rule (ER-rule), for a five-gradient comparison of any two measures to outline their similarities and dissimilarities in five directions. (C) 2022 Elsevier Inc. All rights reserved.
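
The pairwise comparison in the second step can be sketched for two familiar measures, confidence and lift. A scenario is taken to be a triple (p(A), p(B), p(A,B)), and exact fractions are used so that equal values compare as equal. The helper names and the two example scenarios are assumptions for illustration, and scenarios with p(A) = 0 or p(B) = 0 are assumed to be excluded beforehand, as the abstract indicates for infinity and not-a-number cases.

```python
from fractions import Fraction


def confidence(p_a, p_b, p_ab):
    """Confidence of A -> B: p(A,B) / p(A)."""
    return p_ab / p_a


def lift(p_a, p_b, p_ab):
    """Lift of A -> B: p(A,B) / (p(A) * p(B))."""
    return p_ab / (p_a * p_b)


def compare(measure, s1, s2):
    """Return '>', '<' or '=' for measure(s1) versus measure(s2)."""
    m1, m2 = measure(*s1), measure(*s2)
    if m1 > m2:
        return ">"
    if m1 < m2:
        return "<"
    return "="


# Two hypothetical scenarios (p(A), p(B), p(A,B)) on a 0.05 grid.
s1 = (Fraction("0.4"), Fraction("0.5"), Fraction("0.2"))
s2 = (Fraction("0.2"), Fraction("0.25"), Fraction("0.1"))

print("confidence:", compare(confidence, s1, s2))
print("lift:", compare(lift, s1, s2))
```

On this pair, confidence rates the two scenarios as equal while lift ranks the second higher, which is the kind of behavioral difference the comparison is meant to expose.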