Welcome to the first installment of a series of posts discussing the recently announced Cloudera AI Inference service.
Today, Artificial Intelligence (AI) and Machine Learning (ML) are more crucial than ever for organizations to turn data into a competitive advantage. To unlock the full potential of AI, however, businesses need to deploy models and AI applications at scale, in real time, and with low latency and high throughput. This is where the Cloudera AI Inference service comes in. It is a powerful deployment environment that enables you to integrate and deploy generative AI (GenAI) and predictive models into your production environments, incorporating Cloudera’s enterprise-grade security, privacy, and data governance.
Over the next several weeks, we’ll explore the Cloudera AI Inference service in-depth, providing you with a comprehensive introduction to its capabilities, benefits, and use cases.
In this series, we’ll delve into topics such as the service’s architecture, its key capabilities and benefits, and the use cases it supports.
If you’re interested in unlocking the full potential of AI and ML in your organization, stay tuned for our next posts, where we’ll dig deeper into the world of Cloudera AI Inference.
The Cloudera AI Inference service is a highly scalable, secure, and high-performance deployment environment for serving production AI models and related applications. The service is targeted at the production-serving end of the MLOps/LLMOps pipeline, as shown in the following diagram:

It complements Cloudera AI Workbench (previously known as Cloudera Machine Learning Workspace), a deployment environment that is more focused on the exploration, development, and testing phases of the MLOps workflow.
The emergence of GenAI, sparked by the release of ChatGPT, has facilitated the broad availability of high-quality, open-source large language models (LLMs). Services like Hugging Face and the ONNX Model Zoo have made it easy to access a wide range of pre-trained models. This availability highlights the need for a robust service that enables customers to seamlessly integrate and deploy pre-trained models from various sources into production environments. To meet the needs of our customers, the service must be highly scalable, secure, and performant.
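To make that concrete, here is a minimal sketch of pulling an open-source model from the Hugging Face Hub and smoke-testing it locally before packaging it for a production serving platform. It uses the transformers library; the model name is purely illustrative, not a recommendation tied to the service.

```python
# Minimal sketch: download an open-source LLM from the Hugging Face Hub
# and run a quick local smoke test before promoting it to a production
# serving environment. The model name below is an illustrative example;
# any compatible text-generation model from the Hub would work.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example open model
)

result = generator(
    "Explain why low-latency model serving matters:",
    max_new_tokens=64,
)
print(result[0]["generated_text"])
```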
These and other considerations led us to create the Cloudera AI Inference service as a new, purpose-built service for hosting all production AI models and related applications. It is ideal for deploying always-on AI models and applications that serve business-critical use cases.
The diagram above shows a high-level architecture of the Cloudera AI Inference service in context.
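To give a flavor of what production serving looks like from the client side, here is a hypothetical sketch of querying a deployed LLM endpoint. It assumes an OpenAI-compatible API, which is common among modern LLM serving platforms; the base URL, model name, and token below are placeholders, not actual Cloudera endpoints.

```python
# Hypothetical sketch: querying a deployed LLM from an application.
# Assumes the serving endpoint speaks an OpenAI-compatible protocol.
# The base URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-inference-endpoint.example.com/v1",  # placeholder
    api_key="YOUR_API_TOKEN",  # placeholder; use your platform's auth token
)

response = client.chat.completions.create(
    model="your-deployed-model",  # placeholder model name
    messages=[{"role": "user", "content": "What is model inference?"}],
)
print(response.choices[0].message.content)
```

A design like this lets application code stay unchanged when models are swapped or upgraded behind the endpoint, which is one reason open, standardized inference protocols have become popular for production serving.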
In this first post, we introduced the Cloudera AI Inference service, explained why we built it, and took a high-level tour of its architecture. We also outlined many of its capabilities. We will dive deeper into the architecture in our next post, so please stay tuned.