Optimization of Direct Convolution Algorithms on ARM Processors for Deep Learning Inference
In deep learning, convolutional layers typically account for most of the computational workload and are often the primary performance bottleneck. The most widely used convolution algorithm relies on the IM2COL transform to take advantage of highly optimized GEMM (General Matrix Multiplication) kernels,