
Press Release



               Kubernetes Gateway API Inference Extension, the NVIDIA Dynamo low-latency data transfer library
               (NIXL), and the DeepEP Mixture of Experts (MoE) communication library, allowing organizations to:

                       •  Lower costs and improve response times with intelligent inference-aware model scheduling
                          and disaggregated serving.
                        •  Improve response times and reduce latency with an intelligent, inference-aware load
                           balancer built to handle the variable nature of AI workloads.
                       •  Deliver operational simplicity and maximum reliability with prescriptive "Well-lit Paths" that
                          streamline the deployment and optimization of massive models at scale.

               llm-d builds on vLLM, evolving it from a single-node, high-performance inference engine into a
               distributed, consistent and scalable serving system, tightly integrated with Kubernetes and
               designed to enable predictable performance, measurable ROI and effective infrastructure planning.
               These enhancements directly address the challenges of handling highly variable LLM workloads and
               serving massive models such as Mixture-of-Experts (MoE) models.
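
               For context, the following minimal sketch shows the kind of single-node vLLM call that llm-d
               scales out across a Kubernetes cluster; the model name and prompt are illustrative assumptions,
               not part of the announcement.

                   # Minimal single-node vLLM sketch; llm-d distributes this engine
                   # across Kubernetes nodes. Model and prompt are illustrative only.
                   from vllm import LLM, SamplingParams

                   llm = LLM(model="facebook/opt-125m")  # small model, for illustration
                   params = SamplingParams(temperature=0.8, max_tokens=64)
                   outputs = llm.generate(
                       ["Explain Mixture-of-Experts inference in one sentence."], params
                   )
                   print(outputs[0].outputs[0].text)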

               A unified platform for collaborative AI

               Red Hat AI 3 delivers a unified, flexible experience tailored to the collaborative demands of
               building production-ready generative AI solutions. It is designed to deliver tangible value by
               unifying workflows across teams through a single platform on which both platform engineers and
               AI engineers can execute their AI strategy. New capabilities focused on providing the productivity
               and efficiency needed to scale from proof-of-concept to production include:


                    •  Model as a Service (MaaS) capabilities build on distributed inference and enable IT teams
                        to act as their own MaaS providers, serving common models centrally and delivering
                        on-demand access for both AI developers and AI applications (a sketch of consuming such
                        an endpoint follows this list). This allows for better cost management and supports use
                        cases that cannot run on public AI services due to privacy or data concerns.
                    •  AI hub empowers platform engineers to explore, deploy and manage foundational AI assets.
                        It provides a curated catalog of models, including validated and optimized gen AI models,
                        a registry to manage the model lifecycle and a deployment environment to configure and
                        monitor all AI assets running on OpenShift AI.
                    •  Gen AI studio provides a hands-on environment for AI engineers to interact with models and
                        rapidly prototype new gen AI applications. With the AI assets endpoint feature, engineers
                        can easily discover and consume available models and MCP servers, which are designed to
                        streamline how models interact with external tools. The built-in playground provides an
                        interactive, stateless environment to experiment with models, test prompts and tune
                        parameters for use cases like chat and retrieval-augmented generation (RAG).
                    •  New Red Hat validated and optimized models are included to simplify development. The
                        curated selection includes popular open source models like OpenAI’s gpt-oss and
                        DeepSeek-R1, as well as specialized models such as Whisper for speech-to-text and
                        Voxtral Mini for voice-enabled agents.
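
               To illustrate the consumption model described above, here is a hedged sketch of calling a
               centrally served model through an OpenAI-compatible endpoint, as a MaaS deployment or the
               Gen AI studio's AI assets endpoint might expose one; the URL, token and model ID are
               placeholders, not documented Red Hat values.

                   # Hypothetical sketch: querying a centrally served model via an
                   # OpenAI-compatible API. base_url, api_key and model are placeholders.
                   from openai import OpenAI

                   client = OpenAI(
                       base_url="https://models.internal.example.com/v1",  # placeholder endpoint
                       api_key="YOUR_PLATFORM_TOKEN",  # issued by the internal platform (assumption)
                   )

                   response = client.chat.completions.create(
                       model="gpt-oss-20b",  # illustrative model ID
                       messages=[{"role": "user", "content": "Summarize this incident report."}],
                   )
                   print(response.choices[0].message.content)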