Benchmarking read latency of AWS S3, S3 Express, EBS and Instance store
That S3 access is slow is a well-known fact, but how slow, exactly? We compared read latency across different storage systems in AWS and found out that water is wet, the sky is blue, and S3 is slow.
I’m building Nixiesearch, an S3-based search engine - think Elasticsearch, but without the operational nightmares. I often end up arguing with strangers online that while “decoupling storage and compute” sounds amazing in theory, in practice it’s a massive pain in the ass for low-latency workloads.
Running Lucene on top of S3 works well at small scales. The latency in your posts is a bit optimistic: I have seen p99 S3 latencies in the low hundreds of milliseconds. At those latencies, querying directly from S3 is not a winning strategy.
As of Nixiesearch 0.8, we use S3 only for good old segment replication:
The actual search happens locally, over an index cached on an EBS volume.
If your index is under ~1M documents, this approach is fine. I even ran a LinkedIn poll across my bubble and learned that such tiny indexes are rare in the real world.
The problem with this approach is that for big 100M-document indexes, the initial index sync might take ages:
S3 throughput depends on your instance’s network interface, which is ~10 Gbit/s for most modern instance types.
A single document with 1024-dim embeddings and byte quantization is ~1 KB. So 100M documents produce a chunky 100 GB index.
Result: welcome to 100+ seconds of cold-start sync time.
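The arithmetic above can be sketched in a few lines (a back-of-the-envelope estimate using the numbers from this post; pure line-rate math gives ~82 s, and HTTP/TLS plus per-request overhead is what pushes the real number past 100 s):

```java
// Back-of-the-envelope cold-start sync estimate, using the numbers above.
public class SyncTimeEstimate {

    // Seconds to pull `indexBytes` over a NIC with the given line rate (Gbit/s).
    static double syncSeconds(double indexBytes, double nicGbit) {
        return indexBytes * 8 / (nicGbit * 1e9);
    }

    public static void main(String[] args) {
        double docBytes = 1024.0;              // 1024-dim embedding, byte-quantized
        double indexBytes = docBytes * 100e6;  // 100M docs -> ~102 GB
        // ~82 s at a perfect 10 Gbit/s; protocol overhead makes it 100+ s.
        System.out.printf("%.0f s%n", syncSeconds(indexBytes, 10.0));
    }
}
```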
“Why do you even care about initial sync?” you might ask. “Just run a fixed-size cluster 24/7 and forget about cold starts.” Because then you lose the ability to rapidly autoscale.
Elasticsearch is actually not that elastic.
In a perfect world you would skip the initial sync entirely and search directly over S3, but only if you can eat the access latency.
So let’s benchmark and see how bad it is.
Methodology
We focus on four storage options:
Regular S3 buckets: regional, standard latency.
S3 Directory (S3 Express) buckets: much lower latency but restricted to a single AZ.
EBS gp3 volumes: typical EC2 block storage.
Instance store NVMe: ephemeral but theoretically much faster.
For benchmarking, I wrote a tiny s3bench tool that:
Uses JVM NIO and O_DIRECT for EBS/local reads to bypass filesystem caching.
Uses raw S3 REST GetObject calls instead of the AWS SDK, for more control over request behavior.
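The O_DIRECT part looks roughly like this. This is a minimal sketch of the technique, not the actual s3bench code, and it assumes JDK 10+ for ExtendedOpenOption.DIRECT. O_DIRECT requires the buffer address, file offset and read length to all be aligned to the filesystem block size, which is what alignedSlice takes care of:

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectRead {
    static final int BLOCK = 4096; // read size and alignment unit

    // Read one 4 KB block at an aligned offset, bypassing the page cache.
    static ByteBuffer readBlock(Path file, long blockIndex) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, ExtendedOpenOption.DIRECT)) {
            // Over-allocate, then slice to a block-aligned address.
            ByteBuffer buf = ByteBuffer.allocateDirect(BLOCK * 2).alignedSlice(BLOCK);
            buf.limit(BLOCK);                  // length must be a BLOCK multiple
            ch.read(buf, blockIndex * BLOCK);  // offset must be a BLOCK multiple
            buf.flip();
            return buf;
        }
    }
}
```

Without the DIRECT flag, repeated reads would be served from the OS page cache and the benchmark would measure memory, not the storage device.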
We run the benchmark on EC2 instances of various sizes within the same region/AZ.
The workload mimics the access pattern of a search request: many random reads of 4 KB blocks, grouped in small bursts:
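A sketch of what generating such a workload might look like (class and parameter names are mine, not s3bench's): each simulated query becomes a burst of random block-aligned offsets, and for the S3 backends each offset maps to a ranged GET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BurstPattern {
    static final int BLOCK = 4096;

    // Generate `bursts` groups of `burstSize` random block-aligned offsets
    // within an object of `objectBytes`: one group per simulated query.
    static List<long[]> offsets(long objectBytes, int bursts, int burstSize, long seed) {
        long blocks = objectBytes / BLOCK;
        Random rnd = new Random(seed);
        List<long[]> result = new ArrayList<>();
        for (int b = 0; b < bursts; b++) {
            long[] burst = new long[burstSize];
            for (int i = 0; i < burstSize; i++) {
                burst[i] = rnd.nextLong(blocks) * BLOCK; // nextLong(bound) needs JDK 17+
            }
            result.add(burst);
        }
        return result;
    }

    // For S3, a 4 KB block read is an HTTP ranged GET with this header value.
    static String rangeHeader(long offset) {
        return "bytes=" + offset + "-" + (offset + BLOCK - 1);
    }
}
```

For EBS and instance store the same offsets go through positional FileChannel reads instead; keeping the pattern identical is what makes the four backends comparable.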
Results
After running the benchmark over S3, S3 Express, EBS gp3 and instance store on a m5id.large instance, we got these numbers:
Key observations from a search-engine perspective:
Yes, running search directly on S3 is painfully slow thanks to enormous 100+ ms p99 tail latency.
S3 Express is much, much faster than standard S3, but still 5–10× slower than EBS.
But does read latency depend on instance size? Can we make it better with a more expensive EC2 node?
Looks like a hard no. You only get better throughput thanks to a faster ENI, not better per-request latency.
Another interesting observation for S3 Express endpoints is that there’s a clear warm-up trend in latency:
You can clearly see that the more requests you send, the better latency you get, due to caching and autoscaling inside S3 itself.
Final words
I still think it’s possible to run search over S3-hosted indexes within a reasonable latency budget:
In practice, a cold search request needs ~5–10 round trips to S3.
If each round trip is ~5 ms, you’re looking at ~50 ms overall latency - which is perfectly acceptable for many workloads.
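That budget, spelled out as code (assuming the round trips happen sequentially, which is the worst case; concurrent fetches would shrink it further):

```java
public class LatencyBudget {
    // Cold-query latency if the S3 round trips happen one after another.
    static double coldQueryMs(int roundTrips, double perRequestMs) {
        return roundTrips * perRequestMs;
    }

    public static void main(String[] args) {
        // ~10 round trips at ~5 ms each (S3 Express territory) -> ~50 ms
        System.out.println(coldQueryMs(10, 5.0) + " ms");
    }
}
```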
Yes, S3 Express is more fragile from a reliability standpoint, and you’ll need to configure your indexer to publish to multiple S3 Express buckets to survive AZ outages.
But with the right setup, search-over-S3 can absolutely work.