Study on Machine Learning Cloud Detection Considering Optimal Selection of Samples
Traditional threshold algorithms detect clouds with low accuracy because spectral signatures vary with cloud characteristics such as diurnal variation, cloud type, cloud phase, and cloud optical thickness. To address this problem, this paper proposes a cloud detection model that takes optimal sample selection into account and couples the physical threshold method with machine learning, and applies it to daytime cloud detection with Himawari-8 data. Through optimized sample selection, the samples cover cloud features under as many different conditions as possible, which provides a sound sample basis for the machine learning model and improves its generalization ability. In addition to albedo, brightness temperature, brightness temperature difference, and zenith angle, the input features include the cloud recognition results of a physical threshold method based on albedo and brightness temperature difference. Cloud detection is then carried out with an extremely randomized trees (ET) model. The results show that the cross-validation accuracy of the model is 96.41%, with a total omission error of 2.08% and a total commission error of 0.91%. When the results are compared with CALIPSO-based product data, the overall detection accuracy is 97.1%.
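The coupling described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes scikit-learn's ExtraTreesClassifier as the ET model and uses synthetic per-pixel values in place of Himawari-8 data. The threshold values, the function names threshold_flag and error_rates, and the definitions of omission and commission error (missed cloud pixels and false cloud pixels, each as a fraction of all samples) are hypothetical choices made for illustration and may differ from those used in the paper.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for per-pixel features (NOT real Himawari-8 data).
is_cloud = rng.integers(0, 2, size=n)                  # hypothetical truth labels
albedo = np.clip(0.15 + 0.45 * is_cloud + rng.normal(0, 0.12, n), 0, 1)
bt = 290.0 - 35.0 * is_cloud + rng.normal(0, 8.0, n)   # brightness temperature, K
btd = 1.0 + 2.0 * is_cloud + rng.normal(0, 1.0, n)     # brightness temperature difference, K
zenith = rng.uniform(0, 70, n)                         # zenith angle, degrees


def threshold_flag(albedo, btd, albedo_min=0.35, btd_min=2.0):
    """Hypothetical physical-threshold test: cloudy if either test fires."""
    return ((albedo > albedo_min) | (btd > btd_min)).astype(int)


# The threshold result is appended as an extra input feature for the ET model.
flag = threshold_flag(albedo, btd)
X = np.column_stack([albedo, bt, btd, zenith, flag])

# Out-of-fold predictions from 5-fold cross-validation.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
pred = cross_val_predict(model, X, is_cloud, cv=5)


def error_rates(truth, pred):
    """Return (accuracy, omission, commission) as fractions of all samples.

    Assumed definitions: omission = cloud labelled clear,
    commission = clear labelled cloud.
    """
    truth = np.asarray(truth)
    pred = np.asarray(pred)
    accuracy = float(np.mean(truth == pred))
    omission = float(np.mean((truth == 1) & (pred == 0)))
    commission = float(np.mean((truth == 0) & (pred == 1)))
    return accuracy, omission, commission


accuracy, omission, commission = error_rates(is_cloud, pred)
print(f"accuracy={accuracy:.4f} omission={omission:.4f} commission={commission:.4f}")
```

Because the data here are synthetic, the printed scores say nothing about the accuracy reported in the abstract; the sketch only shows how a threshold-based flag can sit alongside the spectral and geometric features as one more input column.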