Discussion about this post

Tomasz Elendt:

Regarding sharding, I think that one reason people use shards, concurrent retrieval and ranking, can be mitigated by using a concurrent searcher that processes segments in parallel. That parallelism is of course limited to the number of CPU cores available on a given node, but for most folks that should be enough. I'm not up to date and don't know whether Lucene already has this, but I remember AWS people discussing it a few years ago. The problem with segments is that they tend to be heavily unbalanced in size.
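
For what it's worth, a minimal sketch of that idea using Lucene's IndexSearcher constructor that accepts an Executor, which searches groups of segments concurrently within a single query; the index path and pool sizing here are placeholders, not a recommendation:

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ConcurrentSegmentSearch {
    public static void main(String[] args) throws Exception {
        // Thread pool sized to the available cores; Lucene splits the
        // index's segments into slices and searches them concurrently.
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
            // Passing an Executor enables intra-query concurrency across
            // segment slices on one node, instead of fanning out to shards.
            IndexSearcher searcher = new IndexSearcher(reader, pool);
            TopDocs top = searcher.search(new MatchAllDocsQuery(), 10);
            System.out.println("total hits: " + top.totalHits);
        } finally {
            pool.shutdown();
        }
    }
}
```

The speedup depends on how evenly the segments (or slices) are sized, which is exactly the imbalance problem mentioned above.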

Another reason for sharding, I think, is indexing throughput. There, some form of partitioning of the input might still be needed for some folks.

Amazing post!
