…scale, navigating underlying complexities with greater ease.




Red Hat directly addresses these challenges with Red Hat AI Inference Server, an open inference solution engineered for high performance and equipped with leading model compression and optimization tools. This innovation empowers organizations to fully tap into the transformative power of gen AI by delivering dramatically more responsive user experiences and unparalleled freedom in their choice of AI accelerators, models and IT environments.


vLLM: Extending inference innovation

Red Hat AI Inference Server builds on the industry-leading vLLM project, started by the University of California, Berkeley in mid-2023. The community project delivers high-throughput gen AI inference, support for large input contexts, multi-GPU model acceleration, continuous batching and more.
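
For readers who want to see what this looks like in practice, here is a minimal offline-inference sketch using vLLM's Python API. The model name is illustrative; any vLLM-supported checkpoint could be substituted.

    from vllm import LLM, SamplingParams

    # Two example prompts; vLLM schedules them with continuous batching,
    # which is where its high serving throughput comes from.
    prompts = [
        "Summarize the benefits of model quantization in two sentences.",
        "Explain continuous batching in one sentence.",
    ]
    params = SamplingParams(temperature=0.8, max_tokens=128)

    # The model name is illustrative; any vLLM-supported checkpoint works.
    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")

    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)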




vLLM’s broad support for publicly available models – coupled with its day zero integration of leading frontier models including DeepSeek, Gemma, Llama, Mistral, Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron – positions it as a de facto standard for future AI inference innovation. Leading frontier model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen AI’s future.

Introducing Red Hat AI Inference Server


Red Hat AI Inference Server packages the leading innovation of vLLM into enterprise-grade capabilities. It is available as a standalone containerized offering or as part of both RHEL AI and Red Hat OpenShift AI.
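
Because the server is built on vLLM, a deployment can be queried like any vLLM endpoint. The sketch below assumes vLLM's standard OpenAI-compatible API; the host, port and model name are illustrative.

    from openai import OpenAI

    # Host, port and model name are illustrative; vLLM's standard
    # OpenAI-compatible endpoint listens under /v1 and ignores the API
    # key unless one has been configured.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.3",
        messages=[{"role": "user", "content": "What does an inference server do?"}],
    )
    print(response.choices[0].message.content)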




Across any deployment environment, Red Hat AI Inference Server provides users with a hardened, supported distribution of vLLM, along with:

• Intelligent LLM compression tools for dramatically reducing the size of both foundational and fine-tuned AI models, minimizing compute consumption while preserving and potentially enhancing model accuracy.
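
As a hedged illustration of what such compression tooling looks like, the sketch below follows the one-shot FP8 quantization flow of the open source llm-compressor project; whether the server's tools expose exactly this API is an assumption, and import paths, scheme names and the model name may differ across versions.

    from llmcompressor.modifiers.quantization import QuantizationModifier
    from llmcompressor.transformers import oneshot

    # Quantize every Linear layer to FP8 with dynamic activation scales,
    # leaving the output head in full precision.
    recipe = QuantizationModifier(
        targets="Linear",
        scheme="FP8_DYNAMIC",
        ignore=["lm_head"],
    )

    # Apply the recipe in a single pass; the result is a compressed
    # checkpoint that vLLM can load directly. Model and output names
    # are illustrative.
    oneshot(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        recipe=recipe,
        output_dir="Meta-Llama-3-8B-Instruct-FP8",
    )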

