José Domingos: DevOps for Large Language Models: Optimising Operations... | DOD Warsaw 2023



"DevOps for Large Language Models: Optimising Operations and Cost Efficiency"

Large language models (LLMs) like GPT-4 and LLaMA have proliferated across use cases, powering everything from data analysis and customer service to content generation and fraud detection.

However, as these models continue to grow in complexity and capability, so do the challenges associated with deploying and managing them effectively. If you need to deploy LLMs in production, it is vital to understand how DevOps and Large Language Model Ops (LLMOps) intertwine. Essentially, LLMOps borrows key practices from the DevOps pipeline, streamlining the CI/CD process of the LLM development lifecycle. This leads to efficient deployment, monitoring and maintenance of LLMs.

In this talk, I’ll explore the process of operationalising LLMs and the nuances of deploying them. We’ll focus on evaluation, fine-tuning, optimisation, rigorous testing and deployment, with an emphasis on leveraging open-source options for cost-effectiveness and data privacy.

Each section of the presentation will be contextualised by real-world case studies, demonstrated on the Ori Global Cloud (OGC) Platform.

Here’s a brief outline of the talk structure:

Introduction
We’ll ask and answer the foundational question: "Why should DevOps care about LLMs and why are they important?" From there, we'll delve into why LLMOps has become vital to managing deployment, and showcase the benefits of open-source LLMs. We’ll also discuss how LLMs integrate into existing DevOps workflows.

LLM Evaluation
We’ll continue with how to evaluate different LLMs for performance and suitability. Participants will gain insights into evaluation criteria, covering both performance benchmarking and fit for domain-specific applications. This will be explained via a case study using benchmark suites for evaluation, such as the Open LLM Leaderboard and Bench, and the metrics to look for during evaluation.
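
As a taste of what a lightweight evaluation harness can look like, here is a minimal Python sketch that measures latency and applies a crude keyword check over a tiny hand-written prompt set; the model name and prompts are assumptions for illustration, not the benchmark suites above.

```python
# Minimal evaluation sketch: latency plus a crude correctness check
# over a tiny hand-written prompt set (illustrative only).
import time
from transformers import pipeline

# Assumed model for illustration; swap in the candidate you are evaluating.
generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

eval_set = [
    {"prompt": "What is the capital of France? Answer in one word.", "expected": "Paris"},
    {"prompt": "Is 17 a prime number? Answer yes or no.", "expected": "yes"},
]

correct, latencies = 0, []
for case in eval_set:
    start = time.perf_counter()
    output = generator(case["prompt"], max_new_tokens=20)[0]["generated_text"]
    latencies.append(time.perf_counter() - start)
    if case["expected"].lower() in output.lower():
        correct += 1

print(f"accuracy: {correct / len(eval_set):.2f}")
print(f"mean latency: {sum(latencies) / len(latencies):.2f}s")
```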

Fine-tuning the LLM
Fine-tuning is key to model performance, which can be further enhanced with new, customised data from a more specific domain. This process is crucial because it can be done with less data and fewer compute resources. We’ll dive into fine-tuning techniques and also cover the risks of fine-tuning.
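
To make that concrete, here is a hedged sketch of parameter-efficient fine-tuning with LoRA adapters via the Hugging Face peft library; the base model, dataset file and hyperparameters are placeholders rather than the configuration used in the talk.

```python
# LoRA fine-tuning sketch: train small adapter matrices instead of all
# model weights, cutting data and compute requirements.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # placeholder base model for illustration
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the model with low-rank adapters; only these are trained.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# Placeholder domain dataset; replace with your own customised data.
data = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```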

Optimising the LLM
Why does optimisation matter? How can we do it with cost-effective approaches? What are model pruning and quantisation? What tools are out there to optimise LLMs? We’ll see how SparseGPT can prune models.
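
To ground the quantisation side, here is a minimal sketch (quantisation with the transformers/bitsandbytes integration, not the SparseGPT pruning covered in the talk) of loading a model in 4-bit NF4 precision; the model name is an assumed placeholder.

```python
# Quantisation sketch: load model weights in 4-bit NF4 format to shrink the
# memory footprint, at some cost in accuracy that should be re-evaluated.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # normal-float 4-bit quantisation
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                   # spread layers across available GPUs
)

# Rough check of the memory saving: ~4 bits per weight instead of 16.
print(f"model footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
```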

Testing for Accuracy and Performance
In the age of LLMs, there is huge emphasis on experimentation, rigorous testing and the development of test branches. I’ll explain what metrics to look for in LLM accuracy and performance. I’ll also cover continuous testing strategies.
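
For example, a small regression suite can run in CI against every new model build; this pytest-style sketch, with a hypothetical generate() wrapper and illustrative thresholds, fails the pipeline if accuracy or latency regresses.

```python
# test_llm_regression.py -- continuous-testing sketch, runnable with pytest.
# generate() is a hypothetical wrapper around your deployed model endpoint.
import time

def generate(prompt: str) -> str:
    """Placeholder inference call; replace with your model client."""
    return "Paris"

GOLDEN_CASES = [
    ("What is the capital of France? One word.", "paris"),
]

MAX_LATENCY_SECONDS = 2.0  # illustrative thresholds, tune per service
MIN_ACCURACY = 0.9

def test_accuracy_and_latency():
    correct = 0
    for prompt, expected in GOLDEN_CASES:
        start = time.perf_counter()
        answer = generate(prompt)
        elapsed = time.perf_counter() - start
        # Fail fast on latency regressions introduced by a new model build.
        assert elapsed < MAX_LATENCY_SECONDS, f"latency regression: {elapsed:.2f}s"
        if expected in answer.lower():
            correct += 1
    assert correct / len(GOLDEN_CASES) >= MIN_ACCURACY
```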

Deploying and Monitoring
We’ll go through the different LLM deployment strategies and approaches to continuous monitoring: lowering latency and cost while boosting performance. This will also cover best practices for monitoring and how we can ensure resiliency through self-healing, thereby minimising the impact of issues on a running deployment.
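
As one illustration of continuous monitoring, here is a hedged sketch that exports inference latency and error counts with the Python prometheus_client library; the metric names and the infer() placeholder are assumptions, not specifics of the OGC Platform.

```python
# Monitoring sketch: export inference latency and error counts so alerts and
# self-healing automation (e.g. restarting an unhealthy replica) can react.
# Metric names and infer() are illustrative placeholders.
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUEST_LATENCY = Histogram("llm_request_latency_seconds", "Inference latency")
REQUEST_ERRORS = Counter("llm_request_errors_total", "Failed inference requests")

def infer(prompt: str) -> str:
    """Placeholder for the real model call."""
    time.sleep(random.uniform(0.05, 0.2))
    return "ok"

def handle(prompt: str) -> str:
    with REQUEST_LATENCY.time():      # records duration into the histogram
        try:
            return infer(prompt)
        except Exception:
            REQUEST_ERRORS.inc()
            raise

if __name__ == "__main__":
    start_http_server(9100)           # metrics scraped at :9100/metrics
    while True:
        handle("health-check prompt")
        time.sleep(1)
```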

DevOpsDays Warsaw: https://devopsdays.pl/