Model serving is the deployment infrastructure that hosts trained AI models and returns predictions (inferences) on request, often in real time. It involves managing availability, scalability, and latency so that live applications such as chatbots, recommendation systems, and autonomous vehicles receive outputs reliably and efficiently. This component is critical for operational AI systems, allowing trained models to integrate seamlessly into production environments.
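To make the idea concrete, the sketch below stands in for a serving endpoint using only the Python standard library: a request arrives at a `/predict` route, the hosted model computes an inference, and a JSON response is returned. The `predict` function is a hypothetical stand-in for a real trained model; production systems would typically use a dedicated serving framework and add batching, health checks, and autoscaling on top of this basic request/response loop.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical "model": a trivial scoring function standing in for a
# trained model that would normally be loaded from disk at startup.
def predict(features):
    return {"score": sum(features) / max(len(features), 1)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Run inference on the parsed feature vector.
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging

# Port 0 asks the OS for any free port; a daemon thread serves requests.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Act as a client of the live endpoint.
req = Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"features": [1.0, 2.0, 3.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)  # {'score': 2.0}
server.shutdown()
```

Even this toy version surfaces the core serving concerns named above: the endpoint must stay available while requests arrive, and latency is dominated by how fast `predict` runs, which is why real deployments replicate the server and scale it horizontally.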