Google announces the Cloud TPU v5p, its most powerful AI accelerator yet

Key Takeaways:

– Google has announced the launch of its new Gemini large language model (LLM).
– Alongside Gemini, Google has introduced the Cloud TPU v5p, an updated version of its Cloud TPU v5e.
– The v5p pod consists of 8,960 chips and features Google’s fastest interconnect yet.
– Google claims that the v5p chips are significantly faster than the v4 TPUs, with a 2x improvement in FLOPS and 3x improvement in high-bandwidth memory.
– The TPU v5p can train a large language model like GPT3-175B 2.8 times faster than the TPU v4.
– The TPU v5p trains models more cost-effectively than the v4, though the slower v5e still offers more relative performance per dollar than the v5p.
– Google DeepMind and Google Research have observed 2x speedups for LLM training workloads using TPU v5p chips compared to the TPU v4.
– The TPU v5p is not yet generally available, and developers need to contact their Google account manager to get on the list.

TechCrunch:

Google today announced the launch of its new Gemini large language model (LLM), and alongside it, the company also launched its new Cloud TPU v5p, an updated version of its Cloud TPU v5e, which became generally available earlier this year. A v5p pod consists of a total of 8,960 chips and is backed by Google’s fastest interconnect yet, with up to 4,800 Gbps per chip.

It’s no surprise that Google promises these chips are significantly faster than the v4 TPUs. The team claims the v5p features a 2x improvement in FLOPS and a 3x improvement in high-bandwidth memory. That’s a bit like comparing the new Gemini model to the older OpenAI GPT 3.5 model, though: Google itself, after all, already moved the state of the art beyond the TPU v4. In many ways, the v5e pods were a bit of a downgrade from the v4 pods, with only 256 v5e chips per pod vs. 4,096 in the v4 pods, and 197 TFLOPs of 16-bit floating-point performance per v5e chip vs. 275 for the v4 chips. For the new v5p, Google promises up to 459 TFLOPs of 16-bit floating-point performance, backed by the faster interconnect.
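Those per-chip numbers compound at the pod level. As a back-of-the-envelope sketch, using only the chip counts and per-chip TFLOPs figures quoted above (peak numbers; sustained throughput in real training runs will be lower):

```python
# Rough peak 16-bit compute per pod, from the per-chip figures and chip
# counts quoted in the article; actual sustained performance will differ.
pods = {
    "v5e": (256, 197),   # (chips per pod, peak TFLOPs per chip)
    "v4":  (4096, 275),
    "v5p": (8960, 459),
}
for name, (chips, tflops) in pods.items():
    total_tflops = chips * tflops
    # 1 EFLOP = 1e6 TFLOPs
    print(f"TPU {name}: {total_tflops:,} TFLOPs ≈ {total_tflops / 1e6:.2f} EFLOPs peak")
```

By that rough math, a full v5p pod tops out around 4.1 exaflops of peak 16-bit compute, versus roughly 1.1 exaflops for a v4 pod and about 0.05 exaflops for a v5e pod.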


Google says all of this means the TPU v5p can train a large language model like GPT3-175B 2.8 times faster than the TPU v4 — and do so more cost-effectively, too (though the TPU v5e, while slower, actually offers more relative performance per dollar than the v5p).


“In our early stage usage, Google DeepMind and Google Research have observed 2X speedups for LLM training workloads using TPU v5p chips compared to the performance on our TPU v4 generation,” writes Jeff Dean, Chief Scientist, Google DeepMind and Google Research. “The robust support for ML Frameworks (JAX, PyTorch, TensorFlow) and orchestration tools enables us to scale even more efficiently on v5p. With the 2nd generation of SparseCores we also see significant improvement in the performance of embeddings-heavy workloads. TPUs are vital to enabling our largest-scale research and engineering efforts on cutting edge models like Gemini.”
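For a sense of what that framework support looks like in practice, here is a minimal, hypothetical JAX sketch (not from Google’s announcement) of the kind of XLA-compiled training step that runs unchanged across CPU, GPU, and TPU backends:

```python
import jax
import jax.numpy as jnp

# List whatever accelerators the runtime exposes; on a Cloud TPU VM these
# would be TPU cores rather than the default CPU device.
print(jax.devices())

@jax.jit  # XLA-compiles the step for whichever backend is available
def train_step(params, x, y):
    # Toy linear-regression loss and SGD update, stand-ins for a real model.
    def loss_fn(p):
        pred = x @ p["w"] + p["b"]
        return jnp.mean((pred - y) ** 2)
    grads = jax.grad(loss_fn)(params)
    return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

params = {"w": jnp.zeros((8, 1)), "b": jnp.zeros((1,))}
x = jnp.ones((32, 8))
y = jnp.ones((32, 1))
params = train_step(params, x, y)
```

On a TPU host, the same jit-compiled step executes on the TPU cores without code changes; scaling across a pod’s many chips would typically add sharding (for example via jax.pmap or sharding annotations on jit), which is omitted here for brevity.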

The new TPU v5p isn’t generally available yet, so developers will have to reach out to their Google account manager to get on the list.


AI Eclipse TLDR:

Google has announced the launch of its new Gemini large language model (LLM) and the Cloud TPU v5p. The TPU v5p is an updated version of the Cloud TPU v5e, offering significant improvements in performance. The v5p pod consists of 8,960 chips and features Google’s fastest interconnect, providing up to 4,800 Gbps per chip. The v5p offers a 2x improvement in FLOPS and a 3x improvement in high-bandwidth memory compared to the v4 TPUs. Google claims the TPU v5p can train a large language model like GPT3-175B 2.8 times faster than the TPU v4, while being more cost-effective than the v4 (though the v5e still offers more performance per dollar). The TPU v5p is not yet generally available, and developers will need to contact their Google account manager to access it.