Tuning the Performance of Convolutional Neural Network for Image Classification on GPU

Agenda
• Adoption of image classification and image recognition at Alibaba
• Easy ways to improve the performance of Caffe
• Further performance optimization of the convolution layer
• Ongoing work

Confidential & Proprietary

Image classification at Alibaba
• Product Display Classification: Model-Upper / Item-Bottom / Multi-Object
• Fashion Style Classification: Sweet / Street / Office
• Buy-by-photo mobile app: search for visually similar products by images
• Leverage the Caffe framework

Profiling Caffe
• Most expensive part: Caffe spends more than 70% of its time on convolution layers!

Convolution layer
• How does the convolution layer work in Caffe?
  Image to Column, then SGEMM

The gap
• Is it really fast?
  [Figure legend] Blue: Caffe (ImageNet model); Red: SGEMM routine of cuBLAS; Green: peak of K20
  ImageNet model: refer to the ILSVRC12 challenge

How does cuBLAS SGEMM perform

Easiest way to narrow the gap
• To overcome the low efficiency of SGEMM at small scale
  [Diagram: two ways of processing one batch]
  Image to Column, then Gemm, on a single image every loop
  Image to Column, then Gemm, on batch-coalesced images every loop

Performance of Fast mode
• Titan Black, mini-batch size is 256

Moving forward
• How is cuBLAS SGEMM implemented?
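
The Image to Column plus GEMM scheme, and the batch-coalesced variant from the "Easiest way to narrow the gap" slide, can be sketched roughly as follows. This is a minimal illustration, not Caffe's code. It is pure Python on nested lists and assumes stride 1 and no padding. The names `im2col`, `gemm`, `conv_per_image` and `conv_batch_coalesced` are hypothetical, and `gemm` is only a stand-in for a cuBLAS SGEMM call. It shows the shapes involved rather than the speed: the per-image path issues one small matrix multiply per image, while the coalesced path issues a single multiply whose right-hand matrix is a whole batch wide.

```python
def im2col(image, k):
    """Unroll the k x k patches of a C x H x W image into columns.

    Returns a (C*k*k) x (out_h*out_w) matrix; stride 1, no padding (assumed).
    """
    c, h, w = len(image), len(image[0]), len(image[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    cols = []
    for ch in range(c):
        for dy in range(k):
            for dx in range(k):
                cols.append([image[ch][y + dy][x + dx]
                             for y in range(out_h) for x in range(out_w)])
    return cols


def gemm(a, b):
    """Plain matrix multiply, standing in for an SGEMM call."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def conv_per_image(batch, filters, k):
    """One im2col and one small GEMM per image in the batch."""
    return [gemm(filters, im2col(img, k)) for img in batch]


def conv_batch_coalesced(batch, filters, k):
    """im2col every image, join the columns, then run one large GEMM."""
    per_image = [im2col(img, k) for img in batch]
    n = len(per_image[0][0])  # output pixels per image
    # Row r of the wide matrix is row r of every image's column matrix, joined.
    wide = [sum((cols[r] for cols in per_image), [])
            for r in range(len(per_image[0]))]
    out = gemm(filters, wide)
    # Slice the wide result back into one (num_filters x n) block per image.
    return [[row[i * n:(i + 1) * n] for row in out]
            for i in range(len(batch))]
```

Here `filters` is a (number of filters) x (C*k*k) matrix whose rows are flattened in the same channel, row, column order that `im2col` uses. Both functions return the same values; only the number and size of the matrix multiplies differ, which is the property the slide relies on to avoid small, inefficient SGEMM calls.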