Food Image Classification Based on Multi-Feature Fusion
With rising living standards, the demand for a healthy diet is growing steadily, and food image recognition has become an important research topic. Because foods are processed and cooked in different ways, the same kind of food can vary in shape and color, while different kinds of food can look alike. Food image recognition is therefore more challenging than general image recognition. To address these problems, a multi-feature fusion food image classification network, MTFNet, is proposed. First, the R, G, and B color channels of the image are fused with the texture features given by the corresponding local binary pattern (LBP), and the result is used as the input to the backbone Squeeze-and-Excitation Network (SENet). A detail attention module is then proposed to learn the weights of each channel at different spatial positions, which enhances the local information in the feature map of each layer and improves its local representation ability. Subsequently, a self-attention mechanism computes self-attention weights between the channels of the feature map, which captures the correlation between feature maps and extracts the global features of the image. Finally, the locally enhanced features and the global features are concatenated and fused to classify the image. Experimental results show that, compared with the Multi-scale Jigsaw Reconstruction Network (MJR-Net), MTFNet improves Top-1 accuracy by 0.44, 1.01, and 0.66 percentage points on the ETH Food-101, ChineseFoodNet, and ISIA Food-500 food image datasets, respectively, achieving the best recognition performance.
Keywords: food image classification; Local Binary Pattern (LBP); Squeeze-and-Excitation Network (SENet); detail attention; self-attention
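The input-fusion step described in the abstract, in which the R, G, and B channels are combined with an LBP texture map, can be illustrated with a minimal sketch. The abstract does not specify the LBP variant or how the channels are fused, so everything here is an assumption made for illustration: the basic 3x3 LBP operator, the luminance weights used for grayscale conversion, zero padding at the image border, the clockwise neighbor order, and the hypothetical helper names `to_gray`, `lbp_map`, and `fuse_rgb_lbp`. Fusion is shown as simple channel stacking, which may differ from the fusion used in MTFNet.

```python
# Minimal sketch: basic 3x3 LBP on a grayscale image, stacked with R, G, B.
# Assumptions (not from the paper): luminance weights, zero padding at the
# borders, and a clockwise neighbor order starting at the top-left pixel.

# Offsets (dy, dx) of the 8 neighbors, clockwise from top-left.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def to_gray(rgb):
    """Convert an H x W grid of (r, g, b) tuples to integer luminance."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb
    ]


def lbp_map(gray):
    """Return the 8-bit LBP code of every pixel.

    A neighbor contributes a 1 bit when its value is >= the center value.
    Neighbors outside the image are treated as 0 (zero padding).
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBORS):
                ny, nx = y + dy, x + dx
                value = gray[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
                if value >= center:
                    code |= 1 << bit
            out[y][x] = code
    return out


def fuse_rgb_lbp(rgb):
    """Stack R, G, B and the LBP map into a 4 x H x W channel-first input."""
    lbp = lbp_map(to_gray(rgb))
    r = [[p[0] for p in row] for row in rgb]
    g = [[p[1] for p in row] for row in rgb]
    b = [[p[2] for p in row] for row in rgb]
    return [r, g, b, lbp]
```

Under these assumptions, a network that consumes the fused input would need a first convolution layer with four input channels instead of three.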