Exploring LLM latency
- Track: Data Engineering and MLOps
- Type: Poster
- Level: Intermediate
- Duration: 60 minutes
Abstract
Which LLM provider should you choose based on latency?
As LLMs become integral to modern applications, speed can make or break your UX. This poster highlights the performance trade-offs of OpenAI, Anthropic, local models, and other providers.
I will show how different models and providers perform across key latency metrics, helping you determine which option delivers the best user experience. Beyond the raw numbers, I will also show how each metric shapes user perception.
We will measure (a short timing sketch follows the lists below):
- Time to first token
- Time to last token
- Latency variability throughout the day
- How structured responses affect performance
We will evaluate:
- State-of-the-art models
- Open-source models hosted by cloud providers
- Local models you can run on your own infrastructure
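As a concrete starting point, here is a minimal sketch of how time to first token and time to last token can be measured for a single streaming request. It assumes the `openai` Python SDK with an OPENAI_API_KEY in the environment, and the model name is only illustrative; a real comparison would repeat the call many times per provider and time of day.

```python
# Minimal sketch: measure time-to-first-token (TTFT) and time-to-last-token
# (TTLT) for one streaming chat completion. Assumes the `openai` Python SDK
# and an OPENAI_API_KEY in the environment; the model name is illustrative.
import time
from openai import OpenAI

client = OpenAI()

def measure_latency(prompt: str, model: str = "gpt-4o-mini") -> dict:
    start = time.perf_counter()
    first_token_at = None

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta and first_token_at is None:
            first_token_at = time.perf_counter()  # first visible token arrived
    end = time.perf_counter()

    return {
        "ttft_s": (first_token_at or end) - start,  # time to first token
        "ttlt_s": end - start,                      # time to last token
    }

print(measure_latency("Summarize the benefits of streaming responses."))
```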
What you will learn
- Which model to use based on its latency
- How prompt caching affects performance (see the sketch below)
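One way to observe prompt caching is to time an identical long-prefix request twice and compare the first (cold) and second (warm) calls. This is only a hedged sketch: it assumes the `openai` SDK and that the provider caches long prompt prefixes automatically, and a meaningful comparison needs many repetitions rather than a single pair of calls.

```python
# Hedged sketch: estimate the effect of provider-side prompt caching by timing
# the same long prompt prefix twice. Assumes the `openai` SDK and that the
# provider caches long prefixes automatically; repeat many times in practice.
import time
from openai import OpenAI

client = OpenAI()

# A long, identical prefix gives the provider something to cache.
LONG_PREFIX = "You are a helpful assistant. " + ("Context paragraph. " * 400)

def time_to_first_token(question: str, model: str = "gpt-4o-mini") -> float:
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": LONG_PREFIX},
            {"role": "user", "content": question},
        ],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start  # first token arrived
    return time.perf_counter() - start

cold = time_to_first_token("What is prompt caching?")  # prefix likely uncached
warm = time_to_first_token("What is prompt caching?")  # prefix may now be cached
print(f"cold TTFT: {cold:.2f}s, warm TTFT: {warm:.2f}s")
```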
You’ll have a clearer view of which solution fits your needs and how to balance performance, costs, and practical considerations.