In this post, we do a cost-benefit analysis comparing an Exxact Deep Learning Inference Server with a leading cloud instance from Google GCP. When deploying deep learning models at scale, GPUs can deliver efficiencies in both power and performance over CPUs. When selecting your infrastructure, we've seen that significant savings can be achieved by bringing your deep learning on premises (see our blog article here), but what about deploying inference models at scale using T4 inference servers?
Before we go in depth with the analysis, we must look at the NVIDIA T4 GPU. This is NVIDIA's latest server card for deep learning inference; its multi-precision computing for AI powers breakthrough performance across FP32, FP16, and INT8, as well as INT4 precision. This flexibility in multi-precision performance makes the T4 an excellent choice for inference tasks.
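To see why those lower precisions matter, here is a minimal sketch of symmetric INT8 quantization, the basic idea behind reduced-precision inference. The function names are illustrative only, not NVIDIA or TensorRT APIs:

```python
# Illustrative sketch of symmetric INT8 quantization -- the core idea
# behind reduced-precision inference. Not an NVIDIA/TensorRT API.

def quantize_int8(values):
    """Map floats to int8 codes plus a shared scale factor."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]

# Hypothetical weight values for illustration:
weights = [0.52, -1.3, 0.07, 0.99]
codes, scale = quantize_int8(weights)
approx = dequantize_int8(codes, scale)
# Each INT8 code takes 4x less memory than an FP32 weight,
# trading a small rounding error for throughput and bandwidth.
```

In practice, frameworks calibrate these scale factors per layer so accuracy loss stays negligible, which is what lets the T4 run INT8 (and even INT4) inference at much higher throughput than FP32.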
Google T4 Cloud Instance
Currently, Google Colab offers free access to a T4. Anyone with a Google account can access the T4, and we'd certainly encourage experimentation with these notebooks. Aside from this obvious gateway into GCP, you should consider on-premises hardware when it comes to price, flexibility, and performance.
Exxact T4 Inference Server
If your application is scaling, purchasing an Exxact inference server makes sense in terms of both performance and price.
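The cost-benefit intuition can be sketched as a simple break-even calculation. All dollar figures below are placeholders for illustration, not actual Exxact or GCP pricing:

```python
# Hedged break-even sketch. Every dollar figure here is a
# hypothetical placeholder, not real Exxact or GCP pricing.

def months_to_break_even(server_cost, cloud_monthly_cost, on_prem_monthly_cost):
    """Months until cumulative cloud spend exceeds the up-front
    server cost plus ongoing on-prem costs (power, hosting)."""
    savings_per_month = cloud_monthly_cost - on_prem_monthly_cost
    if savings_per_month <= 0:
        # Cloud is cheaper per month; on-prem never breaks even.
        return float("inf")
    return server_cost / savings_per_month

# Hypothetical numbers for illustration only:
months = months_to_break_even(server_cost=20_000,
                              cloud_monthly_cost=2_000,
                              on_prem_monthly_cost=500)
# 20000 / (2000 - 500) ~= 13.3 months under these assumptions
```

The faster your inference workload grows, the higher the monthly cloud figure climbs and the sooner an on-premises server pays for itself.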