
Drive AI development and deployment while safeguarding all stages of the AI lifecycle.

Powered by NVIDIA NIM microservices, the Cloudera AI Inference service delivers market-leading performance: up to 36x faster inference on NVIDIA GPUs and nearly 4x the throughput on CPUs. It also streamlines AI management and governance across public and private clouds.

AI Inference service diagram

One service for all your enterprise AI inference needs

One-click deployment: Move your model from development to production quickly, regardless of environment.

One secured environment: Get robust end-to-end security covering all stages of your AI lifecycle.

One platform: Seamlessly manage all of your models through a single platform that covers your AI needs end to end.

One-stop support: Receive unified support from Cloudera for all your hardware and software questions.

AI Inference service key features

Hybrid and multi-cloud support

Enable deployment across on-premises*, public cloud, and hybrid environments to flexibly meet diverse enterprise infrastructure needs.

Detailed data & model lineage*

Provide comprehensive tracking and documentation of data transformations and model lifecycle events, enhancing reproducibility and auditability.

Enterprise-grade security

Implement robust security measures, including authentication, authorization*, and data encryption, to ensure data and models are protected in motion and at rest.

Real-time inference capabilities

Get real-time predictions with low latency, plus batch processing for larger datasets, giving you the flexibility to serve AI models under different performance requirements.

High availability & dynamic scaling

Efficiently handle varying loads while ensuring continuous service with high availability configurations and dynamic scaling capabilities.

Flexible integration

Easily integrate existing workflows and applications with Open Inference Protocol APIs for traditional ML models and an OpenAI-compatible API for LLMs.
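
To give a sense of what the OpenAI-compatible API means in practice, here is a minimal sketch using the standard openai Python client. The base URL, model name, and token below are placeholders rather than real Cloudera values; consult the service documentation for the actual endpoint details.

```python
# Minimal sketch: calling an OpenAI-compatible LLM endpoint.
# The base_url, api_key, and model name are placeholders -- substitute
# the values for your own Cloudera AI Inference deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-inference-endpoint.example.com/v1",  # placeholder
    api_key="YOUR_API_TOKEN",                                   # placeholder
)

response = client.chat.completions.create(
    model="your-deployed-llm",  # placeholder model endpoint name
    messages=[{"role": "user", "content": "Summarize our onboarding guide."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the OpenAI wire format, an existing application built on this client can be pointed at a private deployment by changing only the base URL and credentials.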

Support for multiple AI frameworks

Easily deploy a wide variety of model types with the integration of popular ML frameworks such as TensorFlow, PyTorch, Scikit-learn, and Hugging Face Transformers.

Advanced deployment patterns

Safely and incrementally roll out new versions of models with sophisticated deployment strategies like canary and blue-green deployments* as well as A/B testing*.
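
To illustrate the idea behind a canary rollout, here is a generic Python sketch of weighted traffic splitting between a stable and a candidate model version. This is not Cloudera's implementation (the platform manages routing at the endpoint level), and the ModelStub class is a hypothetical stand-in.

```python
# Generic illustration of canary traffic splitting between two model
# versions; not Cloudera's implementation.
import random

class ModelStub:
    """Hypothetical stand-in for a deployed model version."""
    def __init__(self, version):
        self.version = version

    def predict(self, request):
        return f"prediction from {self.version}"

def route_request(request, stable, canary, canary_weight=0.05):
    """Send roughly canary_weight of traffic to the canary, the rest to stable."""
    model = canary if random.random() < canary_weight else stable
    return model.version, model.predict(request)

stable, canary = ModelStub("v1"), ModelStub("v2-canary")
version, result = route_request({"feature": 1.0}, stable, canary)

# A rollout gradually raises canary_weight (e.g. 0.05 -> 0.25 -> 1.0)
# while watching error rates; rolling back means dropping it to 0.
```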

Open APIs

Deploy, manage, and monitor models and applications*, and integrate with CI/CD pipelines and other MLOps tools through standards-compliant open APIs.
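
Since the Open Inference Protocol mentioned above is an open standard (the KServe V2 inference protocol), a compliant request needs nothing more than an HTTP client. A minimal sketch, with the host, model name, and token as placeholders:

```python
# Sketch of an Open Inference Protocol (KServe V2) request for a
# traditional ML model. Host, model name, and token are placeholders.
import requests

ENDPOINT = "https://your-inference-endpoint.example.com"  # placeholder host
MODEL = "iris-classifier"                                 # placeholder name

# One row of four FP32 features, in the V2 "infer" request shape.
payload = {
    "inputs": [{
        "name": "input-0",
        "shape": [1, 4],
        "datatype": "FP32",
        "data": [5.1, 3.5, 1.4, 0.2],
    }]
}

resp = requests.post(
    f"{ENDPOINT}/v2/models/{MODEL}/infer",
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},  # placeholder token
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["outputs"][0]["data"])
```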

Business monitoring*

Continuously monitor key GenAI model metrics, such as sentiment, user feedback, and drift, that are crucial for maintaining model quality and performance; one common drift measure is sketched after this list.

* Feature coming soon. Please contact us for more information.
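
Drift, one of the business-monitoring metrics listed above, is commonly measured by comparing a live distribution of features or scores against a training-time baseline. The population stability index (PSI) below is one standard such measure; it is a generic illustration, not the service's built-in calculation.

```python
# Generic illustration of drift measurement with the population
# stability index (PSI); not Cloudera's built-in calculation.
import numpy as np

def psi(baseline, live, bins=10, eps=1e-6):
    """PSI between a baseline sample and a live sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected = np.histogram(baseline, bins=edges)[0] / len(baseline) + eps
    actual = np.histogram(live, bins=edges)[0] / len(live) + eps
    return float(np.sum((actual - expected) * np.log(actual / expected)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)  # training-time score distribution
live = rng.normal(0.3, 1.0, 10_000)      # shifted production distribution
print(f"PSI = {psi(baseline, live):.3f}")  # lands well above the 0.1 threshold
```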


Demo

Experience effortless model deployment for yourself

See how easily you can deploy large language models, and how Cloudera's tools help you manage large-scale AI applications effectively.

Model registry integration: Seamlessly access, store, version, and manage models through the centralized Cloudera AI Registry repository.

Easy configuration & deployment: Deploy models across cloud environments, set up endpoints, and adjust autoscaling for efficiency.

Performance monitoring: Troubleshoot and optimize based on key metrics such as latency, throughput, resource utilization, and model health.
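
To make the monitoring step concrete, this generic sketch computes two of the metrics named above, latency percentiles and throughput, from raw request timings. It illustrates the metrics themselves, not the service's dashboards.

```python
# Generic illustration of serving metrics: latency percentiles and
# throughput over one observation window. Not the service's dashboard code.
import numpy as np

def summarize(latencies_ms, window_s):
    """Latency percentiles and throughput for one window of requests."""
    lat = np.asarray(latencies_ms, dtype=float)
    return {
        "p50_ms": float(np.percentile(lat, 50)),
        "p99_ms": float(np.percentile(lat, 99)),
        "throughput_rps": len(lat) / window_s,
    }

# e.g. 1,000 request latencies observed over a 60-second window
rng = np.random.default_rng(1)
print(summarize(rng.gamma(shape=2.0, scale=20.0, size=1000), window_s=60.0))
```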

Cloudera AI Inference lets you unlock data’s full potential at scale with NVIDIA’s AI expertise and safeguard it with enterprise-grade security features so you can confidently protect your data and run workloads on-prem or in the cloud while deploying AI models efficiently with the necessary flexibility and governance.

—Sanjeev Mohan, Principal Analyst, SanjMo


Get engaged

Webinar

Scaling generative AI with Cloudera and NVIDIA: Deploying LLMs with AI Inference

News

Cloudera Unveils AI Inference Service with Embedded NVIDIA NIM Microservices to Accelerate GenAI Development and Deployment

Documentation

Resources and guides to get you started

The Cloudera AI Inference service documentation provides everything you need, from detailed feature descriptions to implementation guides, so you can get started faster.

Ready to get started?
Let’s connect.
