vLLM: Extending inference innovation
Red Hat AI Inference Server builds on the industry-leading vLLM project, started by the University of California, Berkeley in mid-2023. The community project delivers high-throughput gen AI inference, support for large input contexts, multi-GPU model acceleration, continuous batching and more.
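To make those capabilities concrete, below is a minimal sketch of vLLM's offline inference API; the model ID and prompts are illustrative placeholders, not part of the Red Hat announcement:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Explain continuous batching in one sentence.",
    "Why does long-context inference stress GPU memory?",
]
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM schedules these prompts with continuous batching and PagedAttention;
# for larger models, passing tensor_parallel_size=<N> to LLM() shards the
# weights across N GPUs (multi-GPU model acceleration).
llm = LLM(model="facebook/opt-125m")  # illustrative model ID

for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```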
vLLM’s broad support for publicly available models – coupled with its day zero integration of leading frontier models including DeepSeek, Gemma, Llama, Mistral, Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron – positions it as a de facto standard for future AI inference innovation. Leading frontier model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen AI’s future.
Introducing Red Hat AI Inference Server
Red Hat AI Inference Server packages the leading innovation of vLLM into an enterprise-grade offering. It is available as a standalone containerized product or as part of both RHEL AI and Red Hat OpenShift AI.
Across any deployment environment, Red Hat AI Inference Server provides users with a hardened,
supported distribution of vLLM, along with:
- Intelligent LLM compression tools for dramatically reducing the size of both foundational and fine-tuned AI models, minimizing compute consumption while preserving, and potentially enhancing, model accuracy (see the first sketch after this list).
- An optimized model repository, hosted in the Red Hat AI organization on Hugging Face, offering instant access to a validated and optimized collection of leading AI models ready for inference deployment, and helping to improve inference efficiency by 2-4x without compromising model accuracy (see the second sketch after this list).
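The compression tooling is in the spirit of the open source LLM Compressor project from the vLLM community. As a minimal sketch of one-shot 4-bit weight quantization with that library – the model ID, calibration dataset and quantization scheme here are illustrative, not Red Hat guidance:

```python
# Import paths can differ across llm-compressor releases; older versions
# exposed oneshot as `from llmcompressor.transformers import oneshot`.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# W4A16 = 4-bit weights, 16-bit activations; the lm_head layer is left
# unquantized to help preserve output quality.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative model ID
    dataset="open_platypus",                     # calibration data
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
)
```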
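And a minimal sketch of pulling one of these pre-optimized models straight into vLLM; the repository ID below only illustrates the naming used in the Red Hat AI organization on Hugging Face (huggingface.co/RedHatAI) and is not a verified listing:

```python
from vllm import LLM, SamplingParams

# vLLM detects the quantization scheme from the checkpoint's config,
# so a pre-compressed model loads the same way as a full-precision one.
llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16")

result = llm.generate(
    ["Explain quantized inference in one sentence."],
    SamplingParams(max_tokens=64),
)
print(result[0].outputs[0].text)
```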

