Journal information
Information processing & management
Pergamon Press
Bimonthly

0306-4573

Indexed in: SCI, AHCI, SSCI, ISSHP
Officially published
Years indexed

    Mitigating collusive manipulation of reviews in e-commerce platforms: Evolutionary game and strategy simulation

    Xiaoxia Xu, Ruguo Fan, Dongxue Wang, Xiao Xie...
    pp. 104080.1-104080.27
    Abstract: Growing review manipulation has seriously hampered credit regulation on e-commerce platforms, yet few studies have explored its complex dynamics. Unlike current research centering on merchants creating various management strategies, this study examines the collusion between merchants and consumers. By integrating evolutionary game theory and a system dynamics approach, this study offers meaningful conclusions for platform credit management. First, our findings indicate that merchants can maintain honesty regardless of the regulatory strategy implemented. For positive regulation, platforms can impose higher penalties; for negative regulation, maintaining lower exposure is feasible. Second, our analysis illustrates the necessity of breaking the collusion between merchants and consumers. Under positive regulation, platforms can amplify penalties or enhance the regulatory impact on platform revenues. Conversely, negative regulation allows for reducing the short-term financial impact of reviews or adjusting cashback. Third, we uncover that dynamic punishment strategies are not always optimal. In some cases, static punishment strategies outperform linear dynamic punishment strategies, highlighting the importance of carefully evaluating the effectiveness of different regulatory approaches in various contexts.
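
The effect of penalties on merchant honesty described above can be illustrated with a single-population replicator equation. This is a minimal sketch and not the paper's model: the payoff structure and the names `gain`, `penalty`, and `detect_prob` are assumptions introduced here for illustration only.

```python
def simulate_honest_share(x0, gain, penalty, detect_prob, steps=2000, dt=0.01):
    """Euler-integrate dx/dt = x * (1 - x) * (payoff_honest - payoff_manipulate).

    x is the share of honest merchants. All parameters are hypothetical:
    a manipulating merchant earns an extra `gain` but pays `penalty`
    with probability `detect_prob`.
    """
    x = x0
    for _ in range(steps):
        # Expected payoff advantage of honesty over manipulation.
        advantage = detect_prob * penalty - gain
        x += dt * x * (1.0 - x) * advantage
        x = min(1.0, max(0.0, x))  # keep the share inside [0, 1]
    return x


# Weak penalty: manipulation pays off, so honesty dies out.
low_penalty_share = simulate_honest_share(0.5, gain=1.0, penalty=1.0, detect_prob=0.5)
# Strong penalty: honesty has the higher expected payoff and takes over.
high_penalty_share = simulate_honest_share(0.5, gain=1.0, penalty=4.0, detect_prob=0.5)
```

In this toy setting the honest share converges toward 0 or 1 depending only on the sign of `detect_prob * penalty - gain`; the study's two-party game with dynamic punishment is richer than this.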

    An LLM-assisted ETL pipeline to build a high-quality knowledge graph of the Italian legislation

    Andrea Colombo, Anna Bernasconi, Stefano Ceri
    pp. 104082.1-104082.28
    Abstract: The increasing complexity of legislative systems, characterized by an ever-growing number of laws and their interdependencies, has highlighted the utility of Knowledge Graphs (KGs) as an effective data model for organizing such information. Traditional methods, often based on relational models, struggle to efficiently represent interlinked data, such as references within laws, which hinders efficient knowledge discovery. A paradigm shift in modeling legislative data is already ongoing with the adoption of common international standards, predominantly XML-based, such as Akoma Ntoso (AKN) and the Legal Knowledge Interchange Format, which aim to capture fundamental aspects of laws shared across different legislations and simplify the task of creating Knowledge Graphs through the use of XML tags and identifiers. However, to enable advanced analysis and data discovery within these KGs, it is necessary to carefully check, complement, and enrich KG nodes and edges with properties, either metadata or additional derived knowledge, that enhance the quality and utility of the model, for instance, by leveraging the capabilities of state-of-the-art Large Language Models. In this paper, we present an ETL pipeline for modeling and querying the Italian legislation in a Knowledge Graph, by adopting the property graph model and the AKN standard implemented in the Italian system. The property graph model offers a good compromise between knowledge representation and the possibility of performing graph analytics, which we consider essential for enabling advanced pattern detection. Then, we enhance the KG with valuable properties by employing carefully fine-tuned open-source LLMs, i.e., BERT and Mistral-7B models, which enrich and augment the quality of the KG, allowing in-depth analysis of legislative data.

    Improving event representation learning via generating and utilizing synthetic data

    Yubo Feng, Lishuang Li, Xueyang Qin, Beibei Zhang...
    pp. 104083.1-104083.17
    Abstract: Representations of events are important in various event-related tasks. Recent advances in event representation learning have focused on Contrastive Learning (CL), resulting in remarkable progress. However, solely using dropout as the data augmentation technique in CL methods may cause the model to become sensitive to length differences between event pairs. Moreover, CL methods ignore the evidence that the similarities between positive pairs are different, and the encoder-aware similarities also change dynamically as training progresses. This may cause the event encoder to learn the alignment of positive pairs at a coarse-grained level. In this paper, we propose LLM-CL: a Large Language Models-driven self-adaptive Contrastive Learning framework for event representation learning. Specifically, we present an event knowledge graph-augmented synthetic data generation method designed to alleviate the sensitivity of CL-based models to length differences between event pairs. This method generates large-scale, high-quality event pairs with equivalent semantics, little lexical overlap, and varying text lengths. Additionally, we propose a novel CL method called self-adaptive contrastive learning to help the event encoder effectively and efficiently learn the alignment of synthetic data at fine-grained levels. This method dynamically estimates encoder-aware similarities and scales the CL losses accordingly. Experimental results show that LLM-CL outperforms strong baselines in both intrinsic and extrinsic evaluations.
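
The idea of scaling a contrastive loss by the encoder's current similarity estimate can be sketched as follows. This is an illustration under stated assumptions, not the LLM-CL formulation: the weighting rule `1 - similarity` and the function names are hypothetical, and the base loss is a standard InfoNCE with in-batch negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def weighted_info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where each pair's loss is scaled by a similarity-based weight.

    positives[i] is the positive for anchors[i]; the other positives in the
    batch act as negatives. The weight (1 - similarity) is a hypothetical
    choice: pairs the encoder already aligns well contribute less.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        sims = [cosine(anchor, p) for p in positives]
        logits = [s / temperature for s in sims]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        info_nce = log_denominator - logits[i]
        weight = 1.0 - sims[i]
        losses.append(weight * info_nce)
    return sum(losses) / len(losses)


aligned_loss = weighted_info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped_loss = weighted_info_nce([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

Perfectly aligned pairs receive zero weight here, while badly aligned pairs keep their full loss, which is one simple way to push training effort toward fine-grained alignment.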

    MHGC: Multi-scale hard sample mining for contrastive deep graph clustering

    Tao Ren, Haodong Zhang, Yifan Wang, Wei Ju...
    pp. 104084.1-104084.16
    Abstract: Contrastive graph clustering holds significant importance for numerous real-world applications and yields encouraging performance. However, current efforts often overlook hierarchical high-order semantic information and treat all contrastive pairs equally during optimization. Consequently, the abundance of well sample pairs overwhelms the critical structural context learning process, limiting the accumulation of information and deteriorating the network's learning capability. To address this concern, a novel contrastive deep graph clustering method termed MHGC is proposed by conducting hard sample mining in contrastive learning with multi-granularity. Specifically, random walk with restart is utilized to sample subgraphs centered around anchor nodes. Then, an attribute encoder to learn node representations is designed to obtain subgraph embeddings. Subsequently, hard and easy sample pairs within high-confidence clusters are identified by applying a two-component beta mixture model to the clustering loss. Building upon this, a weight regulator is then elaborated to adaptively tune the weights of sample pairs and a multi-scale contrastive loss framework is proposed to leverage structural context information in a hierarchical contrastive manner. Comprehensive experiments conducted on six widely used datasets confirm the comparable performance of our MHGC relative to the state-of-the-art baselines, demonstrating an average increase of 1.54% in accuracy. Additionally, the ablation study further proves that our proposed multi-scale learning scheme and BMM-based hard mining strategy are effective approaches for the graph clustering task. The source code is available at https://github.com/sodarin/MHGC
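
The subgraph sampling step mentioned above, random walk with restart, can be sketched in a few lines. This is a generic illustration and not the code from the MHGC repository: the function name, the adjacency-dictionary input, and the default `restart_prob` are assumptions.

```python
import random


def rwr_subgraph(adjacency, anchor, size, restart_prob=0.5, max_steps=1000, seed=0):
    """Collect up to `size` distinct nodes by a random walk with restart from `anchor`.

    `adjacency` maps each node to a list of neighbours (hypothetical input format).
    At every step the walker jumps back to the anchor with probability
    `restart_prob`, otherwise it moves to a uniformly chosen neighbour.
    """
    rng = random.Random(seed)
    visited = [anchor]
    current = anchor
    for _ in range(max_steps):
        if len(visited) >= size:
            break
        # Restart at the anchor, also when the walker is stuck on an isolated node.
        if rng.random() < restart_prob or not adjacency[current]:
            current = anchor
            continue
        current = rng.choice(adjacency[current])
        if current not in visited:
            visited.append(current)
    return visited


toy_graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1], 4: [5], 5: [4]}
sampled_nodes = rwr_subgraph(toy_graph, anchor=0, size=3)
```

Because of the restarts, the sampled nodes stay close to the anchor, so nodes 4 and 5 in the disconnected component can never be reached from anchor 0.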

    Improving cross-document event coreference resolution by discourse coherence and structure

    Xinyu Chen, Peifeng Li, Qiaoming Zhu
    pp. 104085.1-104085.19
    Abstract: Cross-Document Event Coreference Resolution (CD-ECR) aims to identify and cluster together event mentions that occur across multiple documents. Existing methods exhibit two limitations: (1) In contrast to within-document event mentions, which are linked by rich, coherent contexts, cross-document event mentions lack such contexts, posing a challenge for the model to understand the relation between two event mentions in different documents. (2) The lack of coherent textual information between cross-document event mentions leads to the inability to capture their global information, which is important to mine long-distance interactions between them. To tackle these issues, we propose a novel discourse coherence enhancement mechanism and introduce discourse structure to improve cross-document event coreference resolution. Specifically, we first introduce a new task: Event-oriented cross-document coherence enhancement (ECD-CoE), which selects coherent sentences that form a coherent text for two cross-document event mentions. Second, we represent the coherent text as a tree structure with rhetorical relation information between textual units. We then obtain the global interaction information of event mentions from the tree structures and finally resolve coreferent events. Experimental results on both the ECB+ and GVC datasets indicate that our proposed method outperforms several state-of-the-art baselines.

    Effective near-duplicate image detection using perceptual hashing and deep learning

    Yash Jakhar, Malaya Dutta Borah
    pp. 104086.1-104086.12
    Abstract: Computer vision has always been concerned with near-duplicate image detection. Previous approaches for detecting near duplicates highlighted the necessity to adequately explore the aspect of image transformations for effectively handling complex images. We propose a method of finding near-duplicate images using the integration of three different techniques: perceptual hashing, Siamese network, and Vision Transformer. Perceptual hashing gives us a quick way to filter out similar-looking pictures, while the Siamese network architecture paired with the Vision Transformer helps us identify more complex near-duplicate instances. The integrated approach learns a metric space from data, which reflects both visual similarity and perceptual closeness among items in the dataset. The results demonstrate the effectiveness and robustness of our proposed method, achieving an AUROC of 0.99 and a precision of 0.987 on the California-ND dataset, and an AUROC of 0.92 with a precision of 0.884 on the INRIA Holidays dataset, significantly outperforming traditional methods by over 10% in both metrics. This represents a significant step forward in near-duplicate image detection research.
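
The quick filtering role of perceptual hashing described above can be illustrated with an average hash compared by Hamming distance. This is a minimal sketch, not the authors' pipeline: it assumes the image is already a small grayscale grid (real implementations first downscale, typically to 8x8), and the names and the `max_distance` threshold are hypothetical.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_candidate_duplicate(pixels_a, pixels_b, max_distance=2):
    """Cheap pre-filter: small Hamming distance marks a pair for closer inspection."""
    return hamming_distance(average_hash(pixels_a), average_hash(pixels_b)) <= max_distance


original = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]]
brightened = [[value + 10 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

same_scene = is_candidate_duplicate(original, brightened)
different_scene = is_candidate_duplicate(original, inverted)
```

A uniform brightness shift leaves the hash unchanged, which is why such hashes work as a fast first pass; harder transformations are where a learned model such as the Siamese network would take over.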

    Competition or coexistence: Diffusion network differences between entertainment events and public events on social media

    Sini Su, Yusong Dai, Xiaoke Xu, Zhijin Zhong...
    pp. 104087.1-104087.20
    Abstract: There is a prevalent concern that public information will be marginalized due to the prevailing preference for entertainment content on social media, consequently impacting public engagement. Despite extensive discussions, the relationship between entertainment and public affairs remains ambiguous. Unlike the majority of related studies that examine broad phenomena or topics, we focus on event-specific diffusion networks, thereby avoiding ambiguous information categorization. Specifically, we separately selected 10 of the most influential events that happened from 9:00 to 15:00 on June 23, 2021, in both entertainment and public fields on Weibo. The collected dataset comprises 4,361,793 original posts, 16,511,446 reposts, and 10,557,370 users. Comparing the diffusion networks of entertainment events with those of public events, we observed that entertainment events do not divert attention from public events closely associated with people's lives. This remains the case despite entertainment events steadily exhibiting higher diffusion characteristics than those of most public events. Notably, public events can sustain public attention and discussions for longer. In addition, there are differences between the early followers of public events and entertainment events. The former predominantly comprises unverified users, while the latter mainly consists of verified users. Overall, the flow of sentiment tends to be consistent, stable, and transferable in both types of events. This study also utilized event data from June to December 2020, which underwent a complete diffusion process, to reaffirm these findings, thereby validating their explanatory power on a larger scale.

    Injecting new insights: How do review sentiment and rating inconsistency shape the helpfulness of airline reviews?

    Yang Liu, Lihua Ma, Yue Dou, Zhen Zhu...
    pp. 104088.1-104088.21
    Abstract: Evaluating review helpfulness is pivotal in assessing the caliber of airline reviews, instigating lively debates in both academic and practical spheres. This study endeavors to construct a comprehensive conceptual framework grounded in signaling theory, recognizing two factors as indicators shaping the perceived helpfulness of reviews. Empirical analysis was conducted using 82,539 reviews from nine airlines on TripAdvisor. Initially, the study scrutinizes the combined impact of review sentiment and consumer rating, followed by exploring the influence of review inconsistency on review helpfulness. Our experimental results show that most variables were significant at the 0.001 level. Additionally, we shed light on the moderating effects of several heuristic clues in the model, including text length, seat class, and region. These findings underscore that those heuristic clues collectively influence the helpfulness of reviews. The outcomes of this research can aid airlines in identifying the most helpful reviews, thereby mitigating consumer search costs and empowering reviewers to contribute more valuable insights.

    DCCMA-Net: Disentanglement-based cross-modal clues mining and aggregation network for explainable multimodal fake news detection

    Siqi Wei, Zheng Wang, Meiling Li, Xuanning Liu...
    pp. 104089.1-104089.24
    Abstract: Multimodal fake news detection is significant in safeguarding social security. Compared with single-text news, multimodal news data contains rich cross-modal clues that can improve the detection effectiveness: modality-common semantic enhancement, modality-specific semantic complementation, and modality-specific semantic inconsistency. However, most existing studies ignore the disentanglement of modality-specific and modality-common semantics and instead treat them as an entangled whole. Consequently, these studies can only implicitly explore the interactions between modalities, resulting in a lack of explainability. To address that, we propose a Disentanglement-based Cross-modal Clues Mining and Aggregation Network for explainable fake news detection, called DCCMA-Net. Specifically, DCCMA-Net decomposes each modality into two distinct representations: a modality-common representation that captures shared semantics across modalities, and a modality-specific representation that captures unique semantics within each modality. Then, leveraging these disentangled representations, DCCMA-Net explicitly and comprehensively mines three cross-modal clues: modality-common semantic enhancement, modality-specific semantic complementation, and modality-specific semantic inconsistency. Since not all clues play an equal role in the decision-making process, DCCMA-Net proposes an adaptive attention aggregation module to assign contribution weights to different clues. Finally, DCCMA-Net aggregates these clues based on their contribution weights to obtain highly discriminative news representations for detection, and highlights the most contributive clues as explanations for the detection results. Extensive experiments demonstrate that DCCMA-Net outperforms existing methods, achieving detection accuracy improvements of 2.53%, 4.01%, and 3.99% on Weibo, PHEME, and Gossipcop datasets, respectively. Moreover, the explainability accuracy of DCCMA-Net exceeds that of current state-of-the-art methods on the Weibo dataset.

    Expert-level policy style measurement via knowledge distillation with large language model collaboration

    Yujie Zhang, Biao Huang, Weikang Yuan, Zhuoren Jiang...
    pp. 104090.1-104090.36
    Abstract: Policy style is a crucial concept in policy science that reflects persistent patterns in the policy process across different governance settings. Despite its importance, policy style measurement faces issues of complexity, subjectivity, data sparseness, and computational cost. To overcome these obstacles, we propose KOALA, a novel KnOwledge distillation framework based on large lAnguage modeL collAboration. It transforms the weak scoring abilities of LLMs into a pairwise ranking problem, employs a small set of expert-annotated samples for non-parametric learning, and utilizes knowledge distillation to transfer insights from LLMs to a smaller, more efficient model. The framework incorporates multiple LLM-based agents (Prompter, Ranker, and Analyst) collaborating to comprehend complex measurement standards and self-explain policy style definitions. We validate KOALA on 4,572 Chinese government work reports (1954-2019) from central, provincial, and municipal levels, with a focus on the imposition dimension of policy style. Extensive experiments demonstrate KOALA's effectiveness in measuring the intensity of policy style, highlighting its superiority over state-of-the-art methods. While GPT-4 achieves only 66% accuracy in pairwise ranking of policy styles, KOALA, despite being based on GPT-3.5, achieves a remarkable 85% accuracy, highlighting significant performance improvement. This framework offers a transferable approach for quantifying complex social science concepts in textual data, bridging computational techniques with social science research.
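
Recasting absolute scoring as pairwise ranking, as the abstract describes, can be illustrated by ordering items by the number of pairwise comparisons they win. This is a generic sketch and not KOALA itself: `prefers_first` is a hypothetical stand-in for an LLM-based ranker, and the toy intensity values are invented for the example.

```python
from itertools import combinations


def rank_by_pairwise_wins(items, prefers_first):
    """Order items from strongest to weakest by counting pairwise wins.

    `prefers_first(a, b)` is a hypothetical comparator that returns True when
    `a` should rank above `b`. Items must be unique and hashable.
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        winner = a if prefers_first(a, b) else b
        wins[winner] += 1
    return sorted(items, key=lambda item: wins[item], reverse=True)


# Toy stand-in for a ranker: compare invented "imposition intensity" values.
toy_intensity = {"report_a": 0.2, "report_b": 0.9, "report_c": 0.5}
ranking = rank_by_pairwise_wins(
    list(toy_intensity),
    prefers_first=lambda a, b: toy_intensity[a] > toy_intensity[b],
)
```

The comparator only needs to decide which of two documents is stronger, which is an easier judgment than assigning a calibrated score to each document in isolation.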