4Developers Wrocław 2024: Karol Horosin - Serverless Deployment of Large Language Models on AWS Lambda

youtube.com, 2 weeks ago


📣 4Developers Katowice 2025 takes place on October 15! Follow us to stay up to date.
🔗 More information about the conference: https://4developers.org.pl/4developers-katowice-2025/

Is AWS Lambda powerful enough to host a language model?

The rise in popularity of AI and Large Language Models (LLMs) has brought fresh challenges and opportunities to the DevOps field. This talk focuses on the practical aspects of deploying LLMs on serverless infrastructure, with a particular emphasis on AWS Lambda.

While the idea of deploying an LLM on a CPU-only architecture may seem unusual, smaller language models trained on large bodies of high-quality data now reach performance levels that let them run with minimal infrastructure, and to be used where they are needed most: in cases where data privacy rules out public services. If nothing else, this is also a wonderful learning opportunity.

We'll cover a real-world implementation of deploying Microsoft's Phi-3 model, a small yet powerful LLM, using serverless technologies. The session will provide an end-to-end overview of the process, from setting up the AWS environment and Docker containerization to optimizing performance and managing costs in a serverless deployment. We will also touch on Hugging Face resources and its community, and how to get started with them.
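To make the containerization step concrete, here is a minimal sketch of what a Lambda container image for CPU-only inference might look like. This is an illustrative assumption, not the speaker's actual setup: the model filename, the choice of llama-cpp-python as the inference library, and the handler module name are all placeholders.

```dockerfile
# Sketch: Lambda container image serving a quantized Phi-3 model on CPU.
# Model file, library choice, and handler name are illustrative assumptions.
FROM public.ecr.aws/lambda/python:3.12

# llama-cpp-python can run GGUF-quantized models on CPU only
RUN pip install llama-cpp-python

# Bake a quantized model (e.g. downloaded from Hugging Face) into the image;
# Lambda container images may be up to 10 GB, enough for a quantized Phi-3-mini
COPY phi-3-mini-q4.gguf ${LAMBDA_TASK_ROOT}/model.gguf
COPY handler.py ${LAMBDA_TASK_ROOT}/

CMD ["handler.lambda_handler"]
```

In a setup like this, the handler would typically load the model once outside the request path, so warm invocations skip the multi-second model load and only cold starts pay that cost.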

This talk is tailored for DevOps professionals, developers, and enthusiasts interested in the intersection of AI and serverless technologies. It aims to share knowledge and experience on leveraging AWS Lambda for AI deployments, addressing common challenges, and discussing best practices for efficient and effective model deployment in a cloud environment. Getting started is hard, as it was for me; join me for a good, gentle intro to this topic.