Exploring LLM latency

Track:
Data Engineering and MLOps
Type:
Poster
Level:
Intermediate
Duration:
60 minutes

Abstract

Which LLM provider should you choose when latency matters?

As LLMs become integral to modern applications, speed can make or break your UX. This poster highlights the performance trade-offs of OpenAI, Anthropic, local models, and other providers.

I will show how different models and providers perform across key metrics, helping you determine which option delivers the best user experience. Beyond the raw numbers, I will also explain how each metric influences user perception.

We will measure (see the sketch after this list):

  • Time to first token
  • Time to last token
  • Latency variability throughout the day
  • How structured responses affect performance
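To make the first two metrics concrete, below is a minimal sketch of how time to first token and time to last token can be captured from a streaming chat completion. It assumes the official OpenAI Python SDK; the model name and prompt are placeholders, and the same pattern applies to other providers' streaming APIs.

    import time

    from openai import OpenAI  # assumes the official OpenAI Python SDK (openai>=1.0)

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def measure_latency(model: str, prompt: str) -> dict:
        """Stream one completion and time the first and last generated token."""
        start = time.perf_counter()
        first_token_at = None

        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        for chunk in stream:
            # Each chunk carries an incremental piece of the response text.
            if chunk.choices and chunk.choices[0].delta.content:
                if first_token_at is None:
                    first_token_at = time.perf_counter()
        end = time.perf_counter()

        return {
            "time_to_first_token_s": (first_token_at or end) - start,
            "time_to_last_token_s": end - start,
        }

    # Example call with a placeholder model name.
    print(measure_latency("gpt-4o-mini", "Explain latency in one sentence."))

Running this repeatedly at different times of day, and with and without structured output options, is the same pattern used to collect the variability and structured-response numbers.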

We will evaluate:

  • State-of-the-art models
  • Open-source models hosted by cloud providers
  • Local models you can run on your own infrastructure

What you will learn

  • Which model to choose based on its latency profile
  • How prompt caching affects performance

You’ll have a clearer view of which solution fits your needs and how to balance performance, costs, and practical considerations.