Page 45 - Red Hat PR REPORT - MAY-JUNE 2025
scale, navigating underlying complexities with greater ease.
Red Hat directly addresses these challenges with Red Hat AI Inference Server — an open
inference solution engineered for high performance and equipped with leading model
compression and optimization tools. This innovation empowers organizations to fully tap
into the transformative power of gen AI by delivering dramatically more responsive user
experiences and unparalleled freedom in their choice of AI accelerators, models and IT
environments.
vLLM: Extending inference innovation
Red Hat AI Inference Server builds on the industry-leading vLLM project, which was
started at the University of California, Berkeley in mid-2023. The community project delivers
high-throughput gen AI inference, support for large input context, multi-GPU model
acceleration, support for continuous batching and more.
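Continuous batching is one of vLLM's key throughput techniques: finished sequences leave the batch immediately and waiting requests join mid-flight, rather than the whole batch draining before new work is admitted. The toy scheduler below illustrates the idea in plain Python; it is a simplified sketch, not vLLM's actual scheduler (which also manages a paged KV cache on the GPU).

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy model of continuous batching: admit waiting requests the
    moment a batch slot frees up, instead of waiting for the whole
    batch to finish (simplified illustration, not vLLM's scheduler)."""
    waiting = deque(requests)   # (request_id, tokens_to_generate)
    running = {}                # request_id -> tokens remaining
    steps = []                  # batch composition at each decode step
    while waiting or running:
        # Admit new requests whenever a slot is free (the key idea).
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        steps.append(sorted(running))
        # One decode step: every running sequence emits one token.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]  # slot freed for the next request
    return steps

steps = continuous_batching(
    [("a", 2), ("b", 1), ("c", 3), ("d", 1), ("e", 2)], max_batch=2)
```

With a batch size of 2, the five requests finish in 5 decode steps, versus 7 for static batching (where each batch must fully drain before the next starts).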
vLLM’s broad support for publicly available models – coupled with its day zero integration
of leading frontier models including DeepSeek, Gemma, Llama, Llama Nemotron, Mistral,
Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron
– positions it as a de facto standard for future AI inference innovation. Leading frontier
model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen
AI’s future.
Introducing Red Hat AI Inference Server
Red Hat AI Inference Server packages the leading innovation of vLLM with
enterprise-grade capabilities. Red Hat AI Inference
Server is available as a standalone containerized offering or as part of both RHEL AI and
Red Hat OpenShift AI.
Across any deployment environment, Red Hat AI Inference Server provides users with a
hardened, supported distribution of vLLM, along with:
• Intelligent LLM compression tools for dramatically reducing the size of both foundational
and fine-tuned AI models, minimizing compute consumption while preserving and
potentially enhancing model accuracy.
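The core idea behind such compression is quantization: storing weights in a narrower numeric format than float32. The sketch below shows naive symmetric int8 quantization for a single tensor; it is a toy illustration of the principle only — production compression tools use calibrated, per-channel schemes (e.g. GPTQ- or SmoothQuant-style methods) to preserve accuracy.

```python
import numpy as np

def quantize_int8(weights):
    """Toy symmetric int8 quantization: one scale per tensor, weights
    mapped to the range [-127, 127] (a sketch of the principle, not a
    production compression pipeline)."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for use at inference time.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_int8(w)      # int8 storage: 4x smaller than float32
w_hat = dequantize(q, s)     # close to the original weights
```

The memory saving is what reduces compute consumption: an int8 weight occupies a quarter of a float32 weight, shrinking both model footprint and memory bandwidth at inference time.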
https://www.arabbnews.com/english/Latest-News.asp?id=18378