Our Migration from GPU to CPU Inferencing | Accubits Dev Blog
There is a growing demand for more sophisticated AI solutions in the market, especially from government agencies, businesses, enterprises, and tech startups. Their requirements are simple to state: enhanced computer vision (CV) capabilities, more accurate speech recognition or natural language processing (NLP), and so on. If there is one motto these buyers share, it is to 'never settle', which is good, because the technology grows as the demand grows! At the same time, the volume of data generated in every sector is growing tremendously. The hardware requirements for building more sophisticated AI solutions are getting more complex, and so are the costs of building them. In this article, I explain the thought process behind my shift from a GPU-based architecture to a CPU-based architecture for a computer vision solution.
Choosing the right hardware for deep learning tasks is a crucial decision. If you ask anyone working in deep learning, they will tell you right away that the Graphics Processing Unit (GPU) is the way to go. But if you ask me, I would say the choice should depend on the task at hand and on factors such as throughput requirements and cost. Choosing a GPU for every computer vision application might be a bad idea. And I'll tell you why!
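To make the "throughput requirements and cost" argument concrete, here is a minimal sketch of that decision in Python. The option names, throughput figures, and prices are hypothetical placeholders, not measurements from our projects; the point is only that the cheapest option that meets the required throughput wins, and that option is not always a GPU.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HardwareOption:
    name: str
    throughput_fps: float    # frames per second the option can sustain (hypothetical)
    monthly_cost_usd: float  # cost of running the option (hypothetical)


def pick_hardware(options: List[HardwareOption],
                  required_fps: float) -> Optional[HardwareOption]:
    """Return the cheapest option that meets the required throughput, or None."""
    viable = [o for o in options if o.throughput_fps >= required_fps]
    return min(viable, key=lambda o: o.monthly_cost_usd, default=None)


# Hypothetical numbers, for illustration only.
OPTIONS = [
    HardwareOption("cpu-server", throughput_fps=40.0, monthly_cost_usd=150.0),
    HardwareOption("gpu-server", throughput_fps=400.0, monthly_cost_usd=900.0),
]
```

With these made-up numbers, a single 25 FPS camera feed is served by the CPU option, and only a requirement above 40 FPS pushes the choice to the GPU option.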
As a computer vision engineer at Accubits Technologies, one of my jobs is to architect and build computer vision solutions on leading Original Equipment Manufacturer (OEM) supported frameworks such as TensorFlow, PyTorch, Caffe, and CNTK. From time to time, I deal with large computer vision requirements that include model training and further fine-tuning. I agree that GPUs are widely accepted as the right choice for deep learning training because of their significant speed advantage, and I use them myself for training models at scale. However, in my personal experience, GPUs run into bottlenecks in many cases. For instance, transferring data between the CPU and the GPU can be very costly.
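A rough way to see why the transfer cost matters is to add the copy time to the compute time. The sketch below is a back-of-the-envelope model only: the 8 GB/s effective bandwidth and the 2 ms GPU compute time are assumed values chosen for illustration, and the model ignores the copy of results back to the host.

```python
def transfer_ms(payload_bytes: int, bandwidth_gb_per_s: float) -> float:
    """Milliseconds needed to copy payload_bytes over a link of the given bandwidth."""
    return payload_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


def gpu_total_ms(payload_bytes: int, bandwidth_gb_per_s: float,
                 gpu_compute_ms: float) -> float:
    """Host-to-device copy time plus GPU compute time for one input."""
    return transfer_ms(payload_bytes, bandwidth_gb_per_s) + gpu_compute_ms


# One 1080p RGB frame stored as float32: width * height * channels * 4 bytes.
FRAME_BYTES = 1920 * 1080 * 3 * 4

# Assumed values: 8 GB/s effective bandwidth, 2 ms of GPU compute per frame.
copy_ms = transfer_ms(FRAME_BYTES, 8.0)
total_ms = gpu_total_ms(FRAME_BYTES, 8.0, 2.0)
```

Under these assumptions the copy alone takes longer than the compute step, so a faster GPU would not shorten the end-to-end time by much.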
Deep learning models that come out of a GPU training session have a large footprint on the local file system. The amount of data we need to deal with is so large that it sometimes cannot fit in main memory. I also learned that, in such cases, loading these huge model files alongside data that resides in virtual memory is very expensive. We have to sacrifice overall system performance just to get the model to run, which is a huge drawback. Moreover, TensorFlow models may not let us add multiprocessing to our code, simply because a single inferencing session of a deep learning model for computer vision can occupy all CPU threads and push the system to maximum utilization. It puts the GPU under full load as well.
A cheaper route to the same goal
What we need is a cheaper route to the same goal. Here the goal is to produce accurate results from the computer vision solution without compromising on output quality. GPU infrastructure is costly, and it also affects overall system performance, something buyers do not appreciate, especially for POCs or demos. So the challenge I faced was to find a lighter and more efficient architecture to run our CV use cases. What I observed is that inferencing is not as resource-heavy as training, and CPUs are more than enough for inferencing tasks. This can lead to significant cost savings.
My search for an alternative to GPUs started with trying out several CPU-based architectures and ended, for now, with the OpenVINO toolkit. Intel provides an array of highly scalable hardware and software solutions to meet the power, performance, and price requirements of different use cases, and the OpenVINO toolkit is one of them. The toolkit includes an inference engine designed to accelerate AI inferencing on computers, servers, and embedded devices. It allows us to fast-track the development of high-performance computer vision applications.
When it comes to computer vision solutions, buyers demand fast access to data at the edge. You need the power to process and analyze bandwidth-intensive visual data, and the agility and manageability to quickly turn that data into insights. Most of our customers prefer on-site deployment and edge computing, which can reduce bandwidth usage and improve real-time performance. Also, as I said before, a GPU-based deep learning solution costs several times more than a CPU-based one, and customers aren't ready to invest heavily in hardware. In such situations, I recommend CPU-based edge computing.
After switching to CPU inferencing, the performance we got was very impressive. The model files are lighter than before, which speeds up loading them into memory. Intel-powered CPU-based inferencing delivers comparable results and near real-time performance. Today, AI architectures are increasingly aligned with edge computing or edge-cloud computing. In such cases, we need solutions that are very light in terms of resource consumption and fast enough to process video feeds in real time. By moving inference to the edge with OpenVINO, my team and I were able to deploy smaller, power-efficient devices for on-site processing. We have powered many deep learning solutions that can run on an ordinary laptop with a current-generation CPU. These devices, thanks to their reduced footprint and weather-proof nature, make for a rugged edge platform.
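"Fast enough to process video feeds in real time" can be stated as a simple budget: each frame must be handled before the next one arrives. The sketch below is a minimal illustration of that check, assuming frames from all streams are processed one after another on a single device; the latency and frame-rate values in the usage note are hypothetical, not our benchmark results.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def keeps_up(latency_ms: float, fps: float, streams: int = 1) -> bool:
    """True if one device can process every frame of every stream in time.

    Assumes frames are processed sequentially, so the per-frame latency is
    multiplied by the number of streams sharing the device.
    """
    return latency_ms * streams <= frame_budget_ms(fps)
```

For example, at 25 FPS the budget is 40 ms per frame, so a hypothetical 30 ms inference latency keeps up with one camera but not with two cameras on the same device.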