Short-term Demand Prediction for Public Bikes Considering Environment, Noise, Demand Fluctuation, and Negative Output
With the rapid increase in the number of motor vehicles, many cities around the world have begun to vigorously develop public transportation in response to mounting traffic congestion, environmental pollution, and energy consumption. Bike sharing is green, energy-saving, and healthy; it not only solves the problem of connecting to public transportation systems but also meets other short-distance travel needs, so it has become an important supplement to public transportation. The purpose of predicting the short-term demand of a public bike system (PBS) is to provide a basis for setting the target inventory of each station when making a dynamic rebalancing plan. Accurate short-term demand prediction is therefore a prerequisite for an accurate dynamic rebalancing plan. Existing short-term demand prediction models for PBSs ignore the impacts on prediction accuracy of the difference between constant and variable environmental factors, the noise that may exist in demand data, demand fluctuation, and negative output. In this paper, a GCNN-GRU-E model that accounts for the variable environment, data noise, and demand fluctuation is proposed. It captures the spatial dependency of user demand with a Graph Convolutional Neural Network (GCNN) and the temporal dependencies of user demand and variable environmental factors with a Gated Recurrent Unit (GRU). Based on the GCNN-GRU-E, a GCNN-GRU-E-C model that can automatically identify and correct negative output is proposed, and a data noise reduction scheme and five data smoothing schemes (i.e., local fitting, moving average, weighted moving average, moving average + local fitting, and weighted moving average + local fitting) are developed. The datasets used to validate the proposed models include the transaction dataset of the PBS in New York, the station status dataset of the PBS in Xi'an, and the variable environmental factor datasets for New York and Xi'an. The transaction dataset of the PBS in New York is open data automatically recorded by the system, so it is intact and free of noise; the transaction dataset of the PBS in Xi'an is not open data, so the dataset used in this paper was scraped by a web crawler. The original scraped data contain a certain amount of noise due to technical causes such as temporary power outages, network congestion, and software and hardware failures. First, we preprocess the datasets of the PBSs in New York and Xi'an, as well as the datasets of variable environmental factors. We then use the preprocessed datasets to perform a series of comparative experiments between our two models and nine benchmark models. The experimental results show that: the prediction accuracy of the GCNN-GRU-E, which considers the temporal characteristics of the variable environment, is higher than that of all the benchmark models; both temporal granularity and data quality affect prediction accuracy; denoising and smoothing the data can significantly improve the prediction accuracy of the GCNN-GRU-E; weighted moving average + local fitting is the best data smoothing scheme; and the automatic identification and correction of negative output by the GCNN-GRU-E-C ensures that prediction results are reasonable, improves prediction accuracy, and supports the correct formulation of subsequent dynamic rebalancing plans. A further implication of this research is that data sources affect data quality. For example, the transaction data of a PBS are generally free of noise, but data scraped by a crawler may be noisy. The external environment affects the fluctuation of user demand: when a PBS has no competitors, the fluctuation of user demand is usually small, but it is usually larger when the PBS faces competitors. Therefore, the need for preprocessing before using a dataset should be considered on a case-by-case basis. A dataset with too much noise or large fluctuations in user demand significantly degrades the prediction accuracy of a model. In such cases, the data noise reduction and smoothing schemes developed in this paper should be used to preprocess the dataset to reduce the impact of noise and demand fluctuation on prediction accuracy.
public bike; demand prediction; variable environment; data noise; demand fluctuation; negative output
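
To make two of the smoothing schemes and the negative-output correction named in the abstract more concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the paper's implementation: the function names, the window size, the weights (1, 2, 1), and the edge-padding choice are all assumptions, and clipping negative predictions to zero is a simple stand-in for the correction step, whose actual mechanism in the GCNN-GRU-E-C is not described here.

```python
import numpy as np


def moving_average(demand, window=3):
    """Centered simple moving average (hypothetical helper).

    The series is padded by repeating its edge values so the output
    has the same length as the input. `window` must be odd.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(demand, dtype=float)
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def weighted_moving_average(demand, weights=(1, 2, 1)):
    """Centered weighted moving average (hypothetical helper).

    `weights` are assumed values, normalized to sum to 1; the middle
    weight applies to the current time step. Length must be odd.
    """
    w = np.asarray(weights, dtype=float)
    if len(w) % 2 == 0:
        raise ValueError("weights must have odd length")
    w = w / w.sum()
    x = np.asarray(demand, dtype=float)
    pad = len(w) // 2
    padded = np.pad(x, pad, mode="edge")
    # np.convolve flips its kernel, so reverse the weights to keep
    # them aligned with time order for asymmetric weight choices.
    return np.convolve(padded, w[::-1], mode="valid")


def correct_negative_output(predictions):
    """Replace negative predicted demand with zero (assumed rule)."""
    return np.maximum(np.asarray(predictions, dtype=float), 0.0)


# Usage on a made-up hourly demand series with one spike.
hourly_demand = [4, 6, 20, 5, 3]
smoothed_ma = moving_average(hourly_demand, window=3)
smoothed_wma = weighted_moving_average(hourly_demand)
corrected = correct_negative_output([3.2, -0.7, 0.0, 5.0])
```

Both smoothers dampen the spike at the third time step, and the weighted variant keeps more of the current observation than the unweighted one. The correction step only guarantees that a predicted demand is never below zero, which is the minimum needed for a target inventory to be meaningful.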