Page 26 - Red Hat PR REPORT - OCTOBER 2025
Press Release
Kubernetes Gateway API Inference Extension, the NVIDIA Dynamo low latency data transfer library (NIXL),
and the DeepEP Mixture of Experts (MoE) communication library, allowing organizations to:
• Lower costs and improve response times with intelligent inference-aware model scheduling
and disaggregated serving.
• Improve response times and reduce latency with an intelligent, inference-aware load balancer built
to handle the variable nature of AI workloads.
• Deliver operational simplicity and maximum reliability with prescriptive "Well-lit Paths" that
streamline the deployment and optimization of massive models at scale.
llm-d builds on vLLM, evolving it from a single-node, high-performance inference engine into a distributed,
consistent and scalable serving system, tightly integrated with Kubernetes and designed to enable
predictable performance, measurable ROI and effective infrastructure planning. These enhancements
directly address the challenges of handling highly variable LLM workloads and serving massive models such
as Mixture-of-Experts (MoE) models.
A unified platform for collaborative AI
Red Hat AI 3 delivers a unified, flexible experience tailored to the collaborative demands of building
production-ready generative AI solutions. It is designed to deliver tangible value by fostering collaboration
and unifying workflows across teams, giving platform engineers and AI engineers a single platform on
which to execute their AI strategy. New capabilities focused on providing the productivity and
efficiency needed to scale from proof-of-concept to production include:
• Model as a Service (MaaS) capabilities build on distributed inference and enable IT teams
to act as their own MaaS providers, serving common models centrally and delivering
on-demand access for both AI developers and AI applications. This allows for better cost
management and supports use cases that cannot run on public AI services due to privacy or
data concerns.
• AI hub empowers platform engineers to explore, deploy and manage foundational AI assets.
It provides a central hub with a curated catalog of models, including validated and optimized
gen AI models, a registry to manage the lifecycle of models and a deployment environment to
configure and monitor all AI assets running on OpenShift AI.
• Gen AI studio provides a hands-on environment for AI engineers to interact with models and
rapidly prototype new gen AI applications. With the AI assets endpoint feature, engineers can
easily discover and consume available models and MCP servers, which are designed to
streamline how models interact with external tools. The built-in playground provides an
interactive, stateless environment to experiment with models, test prompts and tune
parameters for use cases like chat and retrieval-augmented generation (RAG).
• New Red Hat validated and optimized models are included to simplify development. The
curated selection includes popular open-source models like OpenAI’s gpt-oss, DeepSeek-R1,
and specialized models such as Whisper for speech-to-text and Voxtral Mini for voice-enabled
agents.

