Page 45 - Red Hat PR REPORT - MAY-JUNE 2025
scale, navigating underlying complexities with greater ease.
Red Hat directly addresses these challenges with Red Hat AI Inference Server — an open
inference solution engineered for high performance and equipped with leading model
compression and optimization tools. This innovation empowers organizations to fully tap
into the transformative power of gen AI by delivering dramatically more responsive user
experiences and unparalleled freedom in their choice of AI accelerators, models and IT
environments.
vLLM: Extending inference innovation
Red Hat AI Inference Server builds on the industry-leading vLLM project, which was
started at the University of California, Berkeley in mid-2023. The community project delivers
high-throughput gen AI inference, support for large input context, multi-GPU model
acceleration, support for continuous batching and more.
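Continuous batching is one of vLLM's key throughput techniques: finished sequences leave the batch immediately and waiting requests join mid-flight, rather than the whole batch draining before new work is admitted. The toy scheduler below illustrates the idea in plain Python; it is a simplified sketch, not vLLM's actual scheduler (which also manages a paged KV cache on the GPU).

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy model of continuous batching: admit waiting requests the
    moment a batch slot frees up, instead of waiting for the whole
    batch to finish (simplified illustration, not vLLM's scheduler)."""
    waiting = deque(requests)   # (request_id, tokens_to_generate)
    running = {}                # request_id -> tokens remaining
    steps = []                  # batch composition at each decode step
    while waiting or running:
        # Admit new requests whenever a slot is free (the key idea).
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        steps.append(sorted(running))
        # One decode step: every running sequence emits one token.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]  # slot freed for the next request
    return steps

steps = continuous_batching(
    [("a", 2), ("b", 1), ("c", 3), ("d", 1), ("e", 2)], max_batch=2)
```

With a batch size of 2, the five requests finish in 5 decode steps, versus 7 for static batching (where each batch must fully drain before the next starts).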
vLLM’s broad support for publicly available models – coupled with its day zero integration
of leading frontier models including DeepSeek, Gemma, Llama, Llama Nemotron, Mistral,
Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron
– positions it as a de facto standard for future AI inference innovation. Leading frontier
model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen
AI’s future.
Introducing Red Hat AI Inference Server
Red Hat AI Inference Server packages the leading innovation of vLLM with
enterprise-grade capabilities. Red Hat AI Inference
Server is available as a standalone containerized offering or as part of both RHEL AI and
Red Hat OpenShift AI.
Across any deployment environment, Red Hat AI Inference Server provides users with a
hardened, supported distribution of vLLM, along with:
• Intelligent LLM compression tools for dramatically reducing the size of both foundational
and fine-tuned AI models, minimizing compute consumption while preserving and
potentially enhancing model accuracy.
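The core idea behind such compression is quantization: storing weights in a narrower numeric format than float32. The sketch below shows naive symmetric int8 quantization for a single tensor; it is a toy illustration of the principle only — production compression tools use calibrated, per-channel schemes (e.g. GPTQ- or SmoothQuant-style methods) to preserve accuracy.

```python
import numpy as np

def quantize_int8(weights):
    """Toy symmetric int8 quantization: one scale per tensor, weights
    mapped to the range [-127, 127] (a sketch of the principle, not a
    production compression pipeline)."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for use at inference time.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_int8(w)      # int8 storage: 4x smaller than float32
w_hat = dequantize(q, s)     # close to the original weights
```

The memory saving is what reduces compute consumption: an int8 weight occupies a quarter of a float32 weight, shrinking both model footprint and memory bandwidth at inference time.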
https://www.arabbnews.com/english/Latest-News.asp?id=18378