Nvidia announces GH200, a chip designed for more cost-effective AI inference on LLMs.
Modern LLMs trained and operated by Microsoft, OpenAI, Google, and others run on Nvidia's data center GPUs optimized for such workloads (most notably, the A100). The company's market valuation has soared on the back of demand for LLM training hardware, even as more companies look at developing their own chips, such as Microsoft reportedly partnering with AMD (Nvidia's competitor) on a chip internally codenamed Athena.
On Tuesday, Nvidia announced a new chip specifically designed to run AI models. The new GH200 uses the same GPU as the H100 (Nvidia's highest-end AI chip) but pairs it with 141GB of memory and a 72-core Arm processor.
We’re giving this processor a boost. This processor is designed for the scale-out of the world’s data centers. You can take pretty much any large language model you want and put it in this and it will inference like crazy. The inference cost of large language models will drop significantly.
Jensen Huang, CEO, Nvidia
Larger AI models can run on the new GPU more easily owing to its higher memory capacity, reducing the need to split a model across multiple GPUs or systems, whether in data centers or for internal use. The H100, by comparison, has only 80GB of memory.
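To see why the jump from 80GB to 141GB matters, here is a rough back-of-envelope sketch (my own illustration, not figures from Nvidia): at 16-bit precision a model needs about two bytes of memory per parameter just to hold its weights, before accounting for the KV cache and activations.

```python
# Rough back-of-envelope estimate of the memory needed just to hold model
# weights at 16-bit precision (2 bytes per parameter). Real deployments also
# need room for the KV cache and activations, so treat these as lower bounds.

def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for params_b in (7, 30, 70):
    print(f"{params_b}B-parameter model: ~{weight_memory_gb(params_b):.0f} GB of weights at fp16")

# Roughly: 7B -> ~14 GB, 30B -> ~60 GB, 70B -> ~140 GB.
# A 70B-parameter model already exceeds the H100's 80GB for weights alone,
# which is where the GH200's 141GB (and the planned dual-GH200 system) comes in.
```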
Training an LLM is a one-time cost (repeated only when the model needs to be updated). Inference, by contrast, happens every time the model produces output, so the GPU is in use continuously for as long as the model is serving requests. The job of this new chip is to make inference cheaper and more cost-effective over the long run.
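As a minimal sketch of what "inference" means in practice, the example below loads a small pretrained model with the Hugging Face transformers library (an illustrative choice; the article does not name any software stack) and generates text. Every such call runs the model on the GPU, and that recurring per-request cost is what the GH200 is aimed at.

```python
# Minimal inference sketch using the Hugging Face transformers library
# (chosen for illustration only; Nvidia's announcement specifies no stack).
# Each call to generate() runs the model forward on the GPU -- that is the
# recurring "inference" cost, as opposed to the one-time cost of training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model; a production LLM would be far larger
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

inputs = tokenizer("Nvidia's GH200 is designed to", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=30)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```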
For even larger models, Nvidia also plans to offer a system that combines two GH200 chips.
Nvidia competitor AMD has announced an AI chip of its own, the MI300X, with 192GB of memory and a similar focus on cost-effective inference. Notably, Google and Amazon are also working on their own AI chip designs to reduce reliance on third-party providers and cut costs.