Abstract: With the rapid development of mobile devices, aggregation security and efficiency have become more important than ever in crowd sensing. When large-scale vehicle-provided data are collected, the data transmitted over autonomous networks are publicly accessible to attackers, which increases the risk of vehicle exposure; data aggregation must therefore be secured. In addition, low aggregation efficiency leads to insufficient sensing data, leaving the data unable to support data mining services. To address aggregation security and efficiency in large-scale data collection, this article proposes a data collection mechanism (VDCM) for crowd sensing in vehicular ad hoc networks (VANETs). The mechanism comprises two sub-mechanisms and selects the appropriate one to reduce overhead: sub-mechanism 1 is chosen when very few vehicles are present or a coalition cannot be formed; otherwise sub-mechanism 2 is chosen. Sub-mechanism 1 collects data with single aggregation. Sub-mechanism 2 selects cooperative vehicles using a coalition formation strategy and an auction cooperation agreement, and collects data with multi-aggregation. Both sub-mechanisms use Paillier homomorphic encryption to secure data aggregation. The mechanism also adds data update and scoring steps to increase the amount of usable data. Performance analysis shows that the proposed mechanism aggregates data securely and reduces overhead, and simulation results indicate that it reduces time consumption and increases the amount of usable data compared with existing mechanisms.
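As a minimal illustration of the additive homomorphism that both sub-mechanisms rely on, the sketch below sums encrypted sensing readings without exposing any individual value. It assumes the open-source `phe` (python-paillier) package; the variable names and toy readings are illustrative and are not taken from the paper.

```python
# Minimal sketch: Paillier additive aggregation of vehicle readings (assumes the `phe` package).
from phe import paillier

# The aggregator publishes the public key; vehicles use only that key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Hypothetical sensing readings from three vehicles.
readings = [42, 17, 98]

# Each vehicle encrypts its own reading locally.
ciphertexts = [public_key.encrypt(r) for r in readings]

# Any intermediate node can add ciphertexts without learning the plaintexts.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum += c

# Only the key holder recovers the aggregate.
print(private_key.decrypt(encrypted_sum))  # -> 157
```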
Abstract: To fully exploit the enormous data generated by intelligent devices in edge computing, edge federated learning (EFL) is envisioned as a promising solution. Compared with traditional centralized model training, the distributed collaborative training in EFL mitigates delay and privacy issues. However, straggling devices, which respond slowly to servers, degrade model performance. We consider data heterogeneity from two aspects: high-dimensional data generated at edge devices, where the number of features exceeds the number of observations, and the heterogeneity caused by partial device participation. With a large number of features, computation overhead on the devices increases, turning edge devices into stragglers. Moreover, incorporating partial training results causes gradients to diverge, an effect that grows as more local training is performed to reach local optima. In this paper, we introduce elastic optimization methods for stragglers caused by data heterogeneity in edge federated learning. Specifically, we define the straggler problem in EFL and formulate an optimization problem to be solved at the edge devices. We customize the benchmark algorithm FedAvg into a new elastic optimization algorithm (FedEN), which is applied in the local training of edge devices. FedEN mitigates stragglers by balancing lasso and ridge penalization, thereby generating sparse model updates while keeping parameters close to local optima. We evaluated the proposed model on the MNIST and CIFAR-10 datasets. Simulated experiments demonstrate that our approach improves training run time by reaching the average accuracy in fewer communication rounds, confirming its improvement over the benchmark algorithms.
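A minimal sketch of what an elastic-net-regularized local objective might look like inside a FedAvg-style client update, assuming PyTorch; the penalty weights `lam` and `alpha` and the generic model/loader are illustrative assumptions, not the paper's exact FedEN formulation.

```python
# Sketch: local client update with an elastic-net (lasso + ridge) penalty, FedAvg-style.
import torch
import torch.nn as nn

def local_update(model, loader, lam=1e-3, alpha=0.5, lr=0.01, epochs=1):
    """One client's local training round with elastic-net regularization."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            # Elastic-net penalty: alpha weights lasso (L1) vs. ridge (L2).
            l1 = sum(p.abs().sum() for p in model.parameters())
            l2 = sum((p ** 2).sum() for p in model.parameters())
            (loss + lam * (alpha * l1 + (1 - alpha) * l2)).backward()
            opt.step()
    return model.state_dict()  # sent back to the server for FedAvg aggregation
```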
Abstract: Accurate load forecasting is critical for electricity production, transmission, and maintenance. Deep learning (DL) models have replaced classical models as the most popular prediction models. However, deep prediction models require users to provide large amounts of private electricity consumption data, which carries potential privacy risks. With federated learning (FL), edge nodes can jointly train a global model through aggregation; as a novel distributed machine learning (ML) technique, FL exchanges only model parameters without sharing raw data. Nevertheless, existing FL-based forecasting methods still face challenges from data heterogeneity and privacy disclosure. Accordingly, we propose a user-level load forecasting system based on personalized federated learning (PFL) to address these issues; the resulting personalized model outperforms the global model on local data. We further introduce a novel differential privacy (DP) algorithm into the proposed system to provide an additional privacy guarantee. Built on the principle of generative adversarial networks (GANs), the algorithm balances privacy and prediction accuracy throughout the game. Simulation experiments on a real-world dataset show that the proposed system meets the accuracy and privacy requirements of real load forecasting scenarios.
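For intuition, the sketch below shows the most common way an additional DP guarantee is attached to federated updates: clipping each client update and adding calibrated Gaussian noise before aggregation. It is a generic illustration, not the GAN-based mechanism described in the abstract, and the clipping bound and noise multiplier are illustrative assumptions.

```python
# Generic sketch: clip a client's model update and add Gaussian noise before sending it.
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update to L2 norm `clip_norm`, then add noise scaled to that bound."""
    rng = rng or np.random.default_rng()
    flat = np.concatenate([w.ravel() for w in update])
    scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))  # L2 clipping factor
    noisy = []
    for w in update:
        clipped = w * scale
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
        noisy.append(clipped + noise)
    return noisy

# Example: a toy "update" consisting of two weight tensors.
update = [np.ones((3, 3)), np.ones(3)]
private_update = privatize_update(update)
```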
Abstract: With the development of information technology, massive amounts of data are generated every day. Collecting and analysing these data helps service providers improve their services and gain an advantage in fierce market competition. K-means clustering is widely used for cluster analysis in practice, but such analyses are based on users' data and therefore disclose users' privacy. Local differential privacy has attracted much attention recently owing to its strong privacy guarantee and has been applied to clustering analysis. However, existing K-means clustering methods with local differential privacy protection cannot achieve ideal clustering results because of the large amount of noise added to the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making users' records indistinguishable from each other in high-dimensional space, we map each user's record into a one-dimensional distance space and make the records indistinguishable in that distance space. Specifically, we first generate a noisy distance and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, satisfying the local differential privacy guarantee and the local dE-privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. Our experimental evaluation shows that our methods significantly outperform traditional methods.
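A rough sketch of the noisy-distance idea, assuming the standard Laplace mechanism rather than the paper's exact BLM or CIM samplers: the record's distance to a reference point (e.g., a centroid) is perturbed, and a synthetic record is placed at that noisy distance along the same direction. The sensitivity value and reference point are illustrative assumptions.

```python
# Sketch: perturb a record's distance to a reference point, then synthesize a record at that distance.
import numpy as np

def noisy_distance_record(record, reference, epsilon=1.0, sensitivity=1.0, rng=None):
    """Map the record to a 1-D distance, add Laplace noise, and rebuild a high-dimensional record."""
    rng = rng or np.random.default_rng()
    direction = record - reference
    true_dist = np.linalg.norm(direction)
    # Standard Laplace mechanism on the scalar distance (not the paper's bounded variant).
    noisy_dist = max(true_dist + rng.laplace(0.0, sensitivity / epsilon), 0.0)
    unit = direction / (true_dist + 1e-12)
    return reference + noisy_dist * unit  # synthetic record at the noisy distance

# Example with an illustrative 3-D record and reference centroid.
synthetic = noisy_distance_record(np.array([2.0, 1.0, 0.5]), np.zeros(3), epsilon=0.5)
```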
Abstract: The popularization of intelligent healthcare devices and big data analytics is significantly boosting the development of Smart Healthcare Networks (SHNs). To enhance diagnostic precision, different participants in SHNs share health data that contain sensitive information. The data exchange process therefore raises privacy concerns, especially when the integration of health data from multiple sources (a linkage attack) leads to further leakage. The linkage attack is a dominant attack in the privacy domain that can leverage various data sources for private data mining. Furthermore, adversaries launch poisoning attacks to falsify health data, which can lead to misdiagnosis or even physical harm. To protect private health data, we propose a personalized differential privacy model based on the trust levels among users. Trust is evaluated by a defined community density, and the corresponding privacy protection level is mapped to controllable randomized noise constrained by differential privacy. To prevent linkage attacks under personalized differential privacy, we design a noise correlation decoupling mechanism based on a Markov stochastic process. In addition, we build the community model on a blockchain, which mitigates the risk of poisoning attacks during differentially private data transmission over SHNs. Extensive experiments and analysis on real-world datasets validate the proposed model, which outperforms existing research in terms of both privacy protection and effectiveness.
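As a generic illustration of personalized differential privacy, the sketch below maps a trust score to a per-user privacy budget and perturbs a health value with Laplace noise. The linear mapping, budget range, and sensitivity are illustrative assumptions and do not reproduce the paper's community-density metric or its Markov-based noise decoupling mechanism.

```python
# Sketch: map a trust level to a per-user epsilon, then apply the Laplace mechanism.
import numpy as np

def epsilon_from_trust(trust, eps_min=0.1, eps_max=2.0):
    """Higher trust -> larger epsilon (less noise). The linear mapping is an assumption."""
    trust = min(max(trust, 0.0), 1.0)
    return eps_min + trust * (eps_max - eps_min)

def personalized_laplace(value, trust, sensitivity=1.0, rng=None):
    rng = rng or np.random.default_rng()
    eps = epsilon_from_trust(trust)
    return value + rng.laplace(0.0, sensitivity / eps)

# Example: the same reading shared with a highly trusted and a barely trusted participant.
reading = 120.0  # e.g., a blood-pressure value
print(personalized_laplace(reading, trust=0.9))  # light noise
print(personalized_laplace(reading, trust=0.1))  # heavy noise
```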
Abstract: We live in an age in which data are generated all around us at alarming rates, creating pressure to implement data storage and retrieval processes that are both affordable and straightforward. The MapReduce model is used to build parallel, distributed algorithms over clusters for large datasets. Hadoop's MapReduce framework, developed by a non-commercial community, is used here to propose a new algorithm that addresses such problems in commercial applications, drawing insights from imbalanced or skewed Hadoop cluster results. The work covers job scheduling, matching the data placement of matrices, clustering prior to selection, and keeping accurate mappings and related data close together to limit run and execution times. The mapper output has been implemented and passed to the reduce function, and the input and output key/value pairs of the execution have been defined. This paper focuses on evaluating this technique for the efficient retrieval of large volumes of data; the technique addresses a massive database of information, from storage and indexing techniques to query distribution, scalability, and performance in heterogeneous environments. The results show that the proposed work reduces data processing time by 30%.
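To make the key/value flow concrete, here is a minimal in-memory imitation of the MapReduce word-count pattern in plain Python (map emits key/value pairs, a shuffle groups them by key, reduce aggregates); it only illustrates the programming model, not the Hadoop implementation or the scheduling optimizations discussed in the paper.

```python
# Minimal in-memory imitation of the MapReduce key/value flow (word count).
from collections import defaultdict

def map_phase(record):
    """Map: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values for one key."""
    return key, sum(values)

records = ["big data needs big clusters", "map and reduce over data"]
intermediate = [pair for r in records for pair in map_phase(r)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'big': 2, 'data': 2, ...}
```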
Abstract: Real-world processes are increasingly replicated digitally to reduce paperwork and human labor costs, and machine learning (ML) models are being used to make predictions in these applications. Accurate forecasting requires an understanding of these models and their distinguishing features: the same dataset fed into different types of ML model yields different results, so the choice of model for a dataset is critical. A loan risk model is used here to show how ML models can be matched to a dataset. The purpose of this study is to examine how machine learning can be used to quantify or forecast mortgage credit risk, that is, to evaluate massive amounts of data and derive useful information for decision-making in a variety of fields. For credit risk, we analyse what caused mortgage credit risk and how it affected credit defaults during the economic crisis that was still ongoing in 2021. Various approaches to credit risk calculation are examined, from the most basic to the most complex. In addition, we conduct a case study on a sample of mortgage loans and compare the results of three analytical approaches, logistic regression, decision tree, and gradient boosting, to determine which produces the most commercially useful insights.
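A minimal sketch of the kind of three-model comparison the case study describes, assuming scikit-learn and a synthetic stand-in for the mortgage loan sample (the real loan features and labels are not given in the abstract).

```python
# Sketch: compare logistic regression, decision tree, and gradient boosting on a synthetic loan-like dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for mortgage loan records (label 1 = default, deliberately imbalanced).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "gradient_boosting": GradientBoostingClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```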
Abstract: The objective of this paper is to provide a better understanding of Generation Z's intentions to purchase retail products through Facebook. The study centered on how Generation Z's favorable attitude formation translates into intentions to purchase retail products through Facebook, and also explored the antecedents of attitude, namely enjoyment, credibility, and peer communication. The main purpose was to analyze the pervasiveness of F-commerce (retail purchases through Facebook) among Generation Z in India and how it could be materialized effectively. A conceptual façade was proposed after a review of the germane literature. The study focused exclusively on the Generation Z population, and the data were statistically analyzed using partial least squares structural equation modelling. The study found that the proposed conceptual model had high power in predicting Generation Z's intentions to purchase retail products through Facebook, verifying the materialization of F-commerce. Enjoyment, credibility, and peer communication proved to be good predictors of attitude (R² = 0.589), and attitude in turn was found to be a strong antecedent of purchase intentions (R² = 0.540).
Abstract: Brain disease classification is a vital issue in Computer-Aided Detection (CAD). Alzheimer's Disease (AD) and brain tumors are primary causes of death. These diseases are studied with Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), and Computed Tomography (CT) scans, which require expertise to interpret. AD is most prevalent in the elderly and can be fatal in its later stages. A preliminary assessment can be made by calculating the mini-mental state examination score, after which an MRI scan of the brain is performed. Beyond that, various classification algorithms, including machine learning and deep learning models, are useful for diagnosing from MRI scans, but they have limitations in terms of accuracy. This paper proposes pre-processing methods that significantly improve the classification performance on these MRI images and reduce the training time of various existing learning algorithms. A dataset was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and converted from a 4D format to a 2D format. Selective clipping, grayscale conversion, and histogram equalization were used to pre-process the images. After pre-processing, we applied three learning algorithms for AD classification: random forest, XGBoost, and Convolutional Neural Networks (CNN). Results computed on the dataset show that the approach outperforms existing work, with an accuracy of 97.57% and a sensitivity of 97.60%.
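A small sketch of the described pre-processing steps, assuming OpenCV and NumPy; the clipping percentiles are an illustrative guess, since the abstract does not specify how "selective clipping" is parameterized.

```python
# Sketch: grayscale conversion, selective intensity clipping, and histogram equalization with OpenCV.
import cv2
import numpy as np

def preprocess_slice(bgr_slice, low_pct=1, high_pct=99):
    """Pre-process one 2-D MRI slice (already extracted from the 4-D volume)."""
    gray = cv2.cvtColor(bgr_slice, cv2.COLOR_BGR2GRAY)
    # "Selective clipping": clip intensities to illustrative percentile bounds.
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    clipped = np.clip(gray, lo, hi).astype(np.uint8)
    # Histogram equalization spreads the remaining intensity range.
    return cv2.equalizeHist(clipped)

# Example on a synthetic 3-channel slice standing in for real ADNI data.
slice_img = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)
processed = preprocess_slice(slice_img)
```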
Abstract: A correct diagnosis of heart disease can save lives, while an incorrect diagnosis can be lethal. Using the UCI machine learning heart disease dataset, we compare the results and analyses of various machine learning approaches, including deep learning. The research was carried out on a dataset with 13 primary features. Support vector machine and logistic regression algorithms are used to process the dataset, with the latter showing the highest accuracy in predicting coronary disease; the processing is implemented in Python. Multiple research initiatives have used machine learning to accelerate the healthcare sector. In our investigation, we also used conventional machine learning approaches to uncover the links between the features available in the dataset and then used them effectively to anticipate heart disease risks. Evaluating accuracy and the confusion matrix yielded favorable outcomes. To obtain the best results, the unnecessary features in the dataset are handled by isolating them before logistic regression and Support Vector Machine (SVM) classification.
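A minimal sketch of the SVM versus logistic regression comparison, assuming scikit-learn and a local `heart.csv` export of the UCI dataset with 13 feature columns and a binary `target` column (the file path and column name are assumptions, not specified in the abstract).

```python
# Sketch: compare logistic regression and SVM on a local CSV export of the UCI heart dataset.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("heart.csv")                    # assumed local export: 13 features + 'target'
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: accuracy = {model.score(X_test, y_test):.3f}")
```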