
Red Hat Unlocks Generative AI for Any Model and Any Accelerator Across the Hybrid Cloud with Red Hat AI Inference Server

May 25, 2025

        Red Hat AI Inference Server, powered by vLLM and enhanced with Neural Magic technologies, delivers faster, higher-performing and
        more cost-efficient AI inference across the hybrid cloud





BOSTON – RED HAT SUMMIT – MAY 2025 — Red Hat, the world’s leading provider of open source solutions, announced Red Hat AI Inference Server, a significant step towards democratizing generative AI (gen AI) across the hybrid cloud. A new offering within Red Hat AI, the enterprise-grade inference server is born from the powerful vLLM community project and enhanced by Red Hat’s integration of Neural Magic technologies, offering greater speed, accelerator efficiency and cost-effectiveness to help deliver Red Hat’s vision of running any gen AI model on any AI accelerator in any cloud environment. Whether deployed standalone or as an integrated component of Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, this platform empowers organizations to more confidently deploy and scale gen AI in production.


        Inference is the critical execution engine of AI, where pre-trained models translate data into real-world impact. It’s the pivotal point of
        user interaction, demanding swift and accurate responses. As gen AI models explode in complexity and production deployments scale,
        inference can become a significant bottleneck, devouring hardware resources and threatening to cripple responsiveness and inflate
        operational costs. Robust inference servers are no longer a luxury, but a necessity for unlocking the true potential of AI at scale,
        navigating underlying complexities with greater ease.


        Red Hat directly addresses these challenges with Red Hat AI Inference Server — an open inference solution engineered for high
        performance and equipped with leading model compression and optimization tools. This innovation empowers organizations to fully tap
        into the transformative power of gen AI by delivering dramatically more responsive user experiences and unparalleled freedom in their
        choice of AI accelerators, models and IT environments.
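Because the inference server is built on vLLM, it serves models through vLLM's standard OpenAI-compatible API. The sketch below is not from the announcement: it assumes such a server is already running locally on port 8000, and the endpoint, placeholder API key and model ID are all illustrative assumptions.

from openai import OpenAI

# Assumption: a vLLM-based server is exposing an OpenAI-compatible API at
# localhost:8000; the model ID is a placeholder for whatever model is deployed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model ID
    messages=[{"role": "user", "content": "What does an inference server do?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)

Since the endpoint follows the OpenAI API convention, existing client code can typically be pointed at the server by changing only the base URL.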


        vLLM: Extending inference innovation:

Red Hat AI Inference Server builds on the industry-leading vLLM project, started at the University of California, Berkeley in mid-2023. The community project delivers high-throughput gen AI inference, support for large input contexts, multi-GPU model acceleration, continuous batching and more.
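As a rough illustration of those features, the sketch below uses vLLM's offline Python API; the model ID and the two-GPU tensor_parallel_size value are assumptions for the example, not details from this announcement.

from vllm import LLM, SamplingParams

# Assumptions: vLLM is installed and two GPUs are available; any supported
# Hugging Face model ID can replace the placeholder below.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,  # shard the model across two GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=256)

# Submitting many prompts at once lets vLLM's continuous batching keep the
# accelerator busy rather than serving requests strictly one at a time.
prompts = [
    "Explain continuous batching in one sentence.",
    "Why can inference become a bottleneck at scale?",
]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)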

vLLM’s broad support for publicly available models – coupled with its day zero integration of leading frontier models including DeepSeek, Gemma, Llama, Mistral, Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron – positions it as a de facto standard for future AI inference innovation. Leading frontier model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen AI’s future.


        Introducing Red Hat AI Inference Server:
