Artificial-intelligence accelerators such as the NPU (neural processing unit) and the GPGPU (general-purpose graphics processing unit) are needed to achieve fast computation for artificial intelligence and high-performance computing across many fields. Since matrix operations are the core operations of both artificial intelligence and high-performance computing, a resource-efficient matrix operation unit architecture is proposed. The matrix operation unit is accelerated by expanding the number of multipliers and adders in each of its sub-units and by broadcasting the input data to every sub-unit by row and by column. By sharing data between PE matrices and adopting a new PE-matrix interconnection scheme, the design reduces bandwidth consumption while increasing computing power. Compared with existing matrix-operation implementations in NPUs and GPGPUs, the proposed scheme achieves the same computing power with fewer adders and registers, and accelerates matrix operations of the same scale with lower clock latency and bandwidth consumption.
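The row/column broadcast idea can be illustrated with a minimal sketch (this is an illustrative model, not the paper's exact hardware design): each cycle, one element of A is broadcast along its row of processing elements and one element of B along its column, and every PE multiplies the two broadcast values and accumulates the product in its own register. The function name `pe_grid_matmul` is an assumption for illustration.

```python
# Hedged sketch of an output-stationary PE grid with row/column broadcast.
# Each cycle k: A[i][k] is driven onto the bus of row i, B[k][j] onto the
# bus of column j; PE(i, j) computes row_bus[i] * col_bus[j] and adds it
# to its local accumulator register, so no partial sums move between PEs.

def pe_grid_matmul(A, B):
    m, K = len(A), len(A[0])
    n = len(B[0])
    acc = [[0] * n for _ in range(m)]  # one accumulator register per PE
    for k in range(K):  # one broadcast step per "cycle"
        row_bus = [A[i][k] for i in range(m)]  # value shared by all PEs in row i
        col_bus = [B[k][j] for j in range(n)]  # value shared by all PEs in column j
        for i in range(m):
            for j in range(n):
                acc[i][j] += row_bus[i] * col_bus[j]
    return acc

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(pe_grid_matmul(A, B))  # [[19, 22], [43, 50]]
```

Because each operand element is fetched once and shared by an entire row or column of PEs, an m-by-n grid needs only m + n input values per cycle rather than m * n, which is the bandwidth saving the broadcast scheme exploits.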