feat: support segmented inverted index build and search #6305
PR Review: feat: support segmented inverted index build and search

Overall this is a solid first slice for segmented FTS. The BM25 cross-segment scoring approach is correct (global corpus stats passed to per-segment search, top-k merge via min-heap). A few items worth addressing:

P1: Duplicated scorer-merge logic (3x copy-paste)

The pattern of merging

    let mut base_scorer = first_index.bm25_base_scorer(&tokens);
    for index in indices.iter().skip(1) {
        let segment_scorer = index.bm25_base_scorer(&tokens);
        base_scorer.total_tokens += segment_scorer.total_tokens;
        base_scorer.num_docs += segment_scorer.num_docs;
        for (token, count) in segment_scorer.token_docs {
            *base_scorer.token_docs.entry(token).or_insert(0) += count;
        }
    }

The same top-k heap merge is also duplicated between

P1:
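One way to remove the duplication flagged above would be to pull the merge loop into a single helper that every call site shares. The sketch below is an assumption, not the PR's code: `ScorerStats` and `merge_scorer_stats` are hypothetical names, and the field types are guessed from the snippet in the review.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-segment BM25 statistics. The field
/// names mirror the review snippet; the concrete types are assumptions.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ScorerStats {
    pub total_tokens: u64,
    pub num_docs: usize,
    pub token_docs: HashMap<String, usize>,
}

/// Folds per-segment statistics into one corpus-wide value.
/// Returns `None` when no segments are supplied.
pub fn merge_scorer_stats<I>(segments: I) -> Option<ScorerStats>
where
    I: IntoIterator<Item = ScorerStats>,
{
    let mut iter = segments.into_iter();
    // The first segment seeds the accumulator, like `first_index` in the review.
    let mut merged = iter.next()?;
    for segment in iter {
        merged.total_tokens += segment.total_tokens;
        merged.num_docs += segment.num_docs;
        // Per-token document frequencies are summed across segments.
        for (token, count) in segment.token_docs {
            *merged.token_docs.entry(token).or_insert(0) += count;
        }
    }
    Some(merged)
}
```

Each call site would then collect the per-segment statistics and call the helper once, so a later change to the merge rule touches one function instead of three copies.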
This PR teaches inverted/FTS indices to participate in the segment-based build workflow and to search across multiple committed segments with a shared BM25 scorer. It keeps the current on-disk inverted format intact while aligning FTS with the newer `execute_uncommitted() -> create_index_segment_builder() -> commit_existing_index_segments()` path.

This is the first vertical slice for segmented inverted indices: build, commit, and query now work end-to-end, and the follow-up work can focus on compaction and metadata acceleration instead of basic control-plane wiring.
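The cross-segment query step can be illustrated with a bounded min-heap that keeps only the best `k` hits while streaming through every segment's results. This is a minimal sketch, not the PR's implementation: `Hit` and `merge_top_k` are hypothetical names. It also assumes the per-segment scores were computed with the shared corpus statistics, since that is what makes them comparable across segments.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical scored hit; `row_id` and `score` are illustrative names.
#[derive(Debug, Clone, Copy)]
pub struct Hit {
    pub row_id: u64,
    pub score: f32,
}

// Total order: by score first, then by row id so that ties are deterministic.
impl Ord for Hit {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score
            .total_cmp(&other.score)
            .then_with(|| self.row_id.cmp(&other.row_id))
    }
}
impl PartialOrd for Hit {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Hit {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Hit {}

/// Merges per-segment results into the global top `k`, best hit first.
pub fn merge_top_k(segments: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    if k == 0 {
        return Vec::new();
    }
    // `Reverse` turns the max-heap into a min-heap: the root is the weakest kept hit.
    let mut heap: BinaryHeap<Reverse<Hit>> = BinaryHeap::new();
    for hit in segments.into_iter().flatten() {
        if heap.len() < k {
            heap.push(Reverse(hit));
            continue;
        }
        let beats_worst = match heap.peek() {
            Some(Reverse(worst)) => hit > *worst,
            None => false,
        };
        if beats_worst {
            heap.pop();
            heap.push(Reverse(hit));
        }
    }
    let mut hits: Vec<Hit> = heap.into_iter().map(|Reverse(h)| h).collect();
    hits.sort_by(|a, b| b.cmp(a)); // descending by score
    hits
}
```

The heap never holds more than `k` entries, so memory stays bounded by `k` regardless of how many segments are searched.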