Under very general assumptions, standard feedforward neural networks can approximate any continuous function to arbitrary accuracy, and discontinuous (Borel measurable) functions in a weaker sense, provided the hidden layer contains enough hidden units. When deep learning methods are used to solve differential equations, the idea is to construct a loss function, collect sample points, and train the neural network with stochastic gradient descent so that it approximates the solution of the equation directly at those sample points. This turns the problem of solving the equation into the optimization problem of minimizing the loss function. When time-fractional diffusion equations are solved by a deep learning method, the loss function measures how closely the neural network satisfies the fractional differential equation, the initial conditions, the boundary conditions, and so on. In theory, a neural network that reduces the loss function to zero is a solution of the equation. In this paper, we show that the loss function in the form of a mean square error can be reduced to zero and that the corresponding neural network converges uniformly to the exact solution, that is, the neural network is a solution of the equation. Numerical examples verify the theoretical analysis.
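To make the mean-square-error loss concrete, here is a minimal sketch. It is not the method used in the paper, which the abstract does not specify. It assumes a 1D model problem $D_t^\alpha u = u_{xx} + f$ on $(0,1)\times(0,T]$ with $u(x,0)=u_0(x)$ and $u(0,t)=u(1,t)=0$, where $D_t^\alpha$ is the Caputo derivative with $0<\alpha<1$. The Caputo derivative is approximated on a uniform time grid with the standard L1 scheme. The network is a one-hidden-layer tanh network, and its second $x$-derivative is computed in closed form. All names (`caputo_l1`, `TanhNet`, `mse_loss`) are hypothetical. Training by stochastic gradient descent is omitted; in practice an automatic-differentiation framework would minimize this loss over the network parameters.

```python
import math
import random

# Assumed model problem (illustrative only, not taken from the paper):
#   D_t^alpha u = u_xx + f(x, t),  0 < x < 1,  0 < t <= T,
#   u(x, 0) = u0(x),  u(0, t) = u(1, t) = 0,  Caputo derivative, 0 < alpha < 1.


def caputo_l1(values, tau, alpha):
    """L1-scheme approximation of the Caputo derivative on a uniform grid.

    values[k] holds u(t_k) with t_k = k * tau; entry n-1 of the result
    approximates D_t^alpha u(t_n) for n = 1 .. len(values) - 1.
    """
    c = tau ** (-alpha) / math.gamma(2.0 - alpha)
    b = [(k + 1) ** (1 - alpha) - k ** (1 - alpha) for k in range(len(values))]
    out = []
    for n in range(1, len(values)):
        s = sum(b[k] * (values[n - k] - values[n - k - 1]) for k in range(n))
        out.append(c * s)
    return out


class TanhNet:
    """Hypothetical network u(x, t) = sum_j c_j * tanh(a_j x + b_j t + d_j)."""

    def __init__(self, width, seed=0):
        rng = random.Random(seed)
        # Each hidden unit stores its parameters as [a, b, d, c].
        self.p = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(width)]

    def u(self, x, t):
        return sum(c * math.tanh(a * x + b * t + d) for a, b, d, c in self.p)

    def u_xx(self, x, t):
        # d^2/dz^2 tanh(z) = -2 tanh(z) (1 - tanh(z)^2); chain rule adds a^2.
        total = 0.0
        for a, b, d, c in self.p:
            th = math.tanh(a * x + b * t + d)
            total += c * a * a * (-2.0 * th * (1.0 - th * th))
        return total


def mse_loss(u, u_xx, f, u0, alpha, T=1.0, nx=9, nt=20):
    """Mean square error of the PDE residual, initial and boundary conditions."""
    tau = T / nt
    xs = [(i + 1) / (nx + 1) for i in range(nx)]  # interior sample points in x
    ts = [n * tau for n in range(nt + 1)]          # uniform time grid
    res = []
    for x in xs:
        dalpha = caputo_l1([u(x, t) for t in ts], tau, alpha)
        for n in range(1, nt + 1):
            res.append(dalpha[n - 1] - u_xx(x, ts[n]) - f(x, ts[n]))
    ini = [u(x, 0.0) - u0(x) for x in xs]
    bnd = [u(xb, t) for t in ts for xb in (0.0, 1.0)]  # zero Dirichlet data

    def mean_sq(r):
        return sum(v * v for v in r) / len(r)

    return mean_sq(res) + mean_sq(ini) + mean_sq(bnd)


# Manufactured solution u = t^2 sin(pi x), for which D_t^alpha t^2 = 2 t^(2-alpha) / Gamma(3-alpha).
alpha = 0.5
g = math.gamma(3 - alpha)
exact = lambda x, t: t * t * math.sin(math.pi * x)
exact_xx = lambda x, t: -math.pi ** 2 * exact(x, t)
f = lambda x, t: (2 * t ** (2 - alpha) / g + math.pi ** 2 * t * t) * math.sin(math.pi * x)
u0 = lambda x: 0.0

net = TanhNet(width=8)
print(mse_loss(exact, exact_xx, f, u0, alpha))  # small: only the L1 discretization error remains
print(mse_loss(net.u, net.u_xx, f, u0, alpha))  # untrained network: much larger
```

The exact solution drives the loss close to zero, and only the L1 truncation error of order $\tau^{2-\alpha}$ remains. A randomly initialized network does not, which mirrors the abstract's point that minimizing this loss pushes the network toward a solution of the equation.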