Exploring LLM latency
- Track: Data Engineering and MLOps
- Type: Poster
- Level: Intermediate
- Duration: 60 minutes
Abstract
Which LLM provider should you choose based on latency?
As LLMs become integral to modern applications, speed can make or break your UX. This poster highlights the performance trade-offs of OpenAI, Anthropic, local models, and other providers.
I will show how different models and providers perform across key latency metrics, helping you determine which option delivers the best user experience. Beyond the raw numbers, I will also show how each metric shapes user perception.
We will measure (a short timing sketch follows the lists below):
- Time to first token
- Time to last token
- Latency variability throughout the day
- How structured responses affect performance
We will evaluate:
- State-of-the-art models
- Open-source models hosted by cloud providers
- Local models you can run on your own infrastructure
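As a concrete starting point, here is a minimal sketch of how time to first token and time to last token can be measured for a single streaming request. It assumes the `openai` Python SDK with an OPENAI_API_KEY in the environment, and the model name is only illustrative; a real comparison would repeat the call many times per provider and time of day.

```python
# Minimal sketch: measure time-to-first-token (TTFT) and time-to-last-token
# (TTLT) for one streaming chat completion. Assumes the `openai` Python SDK
# and an OPENAI_API_KEY in the environment; the model name is illustrative.
import time
from openai import OpenAI

client = OpenAI()

def measure_latency(prompt: str, model: str = "gpt-4o-mini") -> dict:
    start = time.perf_counter()
    first_token_at = None

    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta and first_token_at is None:
            first_token_at = time.perf_counter()  # first visible token arrived
    end = time.perf_counter()

    return {
        "ttft_s": (first_token_at or end) - start,  # time to first token
        "ttlt_s": end - start,                      # time to last token
    }

print(measure_latency("Summarize the benefits of streaming responses."))
```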
What you will learn
- Which model to use based on its latency
- How prompt caching affects performance (see the sketch below)
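One way to observe prompt caching is to time an identical long-prefix request twice and compare the first (cold) and second (warm) calls. This is only a hedged sketch: it assumes the `openai` SDK and that the provider caches long prompt prefixes automatically, and a meaningful comparison needs many repetitions rather than a single pair of calls.

```python
# Hedged sketch: estimate the effect of provider-side prompt caching by timing
# the same long prompt prefix twice. Assumes the `openai` SDK and that the
# provider caches long prefixes automatically; repeat many times in practice.
import time
from openai import OpenAI

client = OpenAI()

# A long, identical prefix gives the provider something to cache.
LONG_PREFIX = "You are a helpful assistant. " + ("Context paragraph. " * 400)

def time_to_first_token(question: str, model: str = "gpt-4o-mini") -> float:
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": LONG_PREFIX},
            {"role": "user", "content": question},
        ],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start  # first token arrived
    return time.perf_counter() - start

cold = time_to_first_token("What is prompt caching?")  # prefix likely uncached
warm = time_to_first_token("What is prompt caching?")  # prefix may now be cached
print(f"cold TTFT: {cold:.2f}s, warm TTFT: {warm:.2f}s")
```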
You’ll have a clearer view of which solution fits your needs and how to balance performance, costs, and practical considerations.