Conversation
2 issues found across 15 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="backend/onyx/context/search/pipeline.py">
<violation number="1" location="backend/onyx/context/search/pipeline.py:22">
P1: Importing strip_stopwords from onyx.natural_language_processing.english_stopwords will raise ModuleNotFoundError because that module/file is missing in the repo. Ensure the module is added or update the import to the correct existing stopwords module.</violation>
</file>
<file name="backend/onyx/context/search/federated/slack_search_utils.py">
<violation number="1" location="backend/onyx/context/search/federated/slack_search_utils.py:18">
P1: The new import references `onyx.natural_language_processing.english_stopwords`, but that module is not present in the repository. This will raise `ModuleNotFoundError` at import time. Add the missing module or update the import to an existing stopword provider.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
from onyx.db.models import User
from onyx.document_index.interfaces import DocumentIndex
from onyx.llm.interfaces import LLM
from onyx.natural_language_processing.english_stopwords import strip_stopwords
P1: Importing strip_stopwords from onyx.natural_language_processing.english_stopwords will raise ModuleNotFoundError because that module/file is missing in the repo. Ensure the module is added or update the import to the correct existing stopwords module.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/onyx/context/search/pipeline.py, line 22:
<comment>Importing strip_stopwords from onyx.natural_language_processing.english_stopwords will raise ModuleNotFoundError because that module/file is missing in the repo. Ensure the module is added or update the import to the correct existing stopwords module.</comment>
<file context>
@@ -19,6 +19,7 @@
from onyx.db.models import User
from onyx.document_index.interfaces import DocumentIndex
from onyx.llm.interfaces import LLM
+from onyx.natural_language_processing.english_stopwords import strip_stopwords
from onyx.secondary_llm_flows.source_filter import extract_source_filter
from onyx.secondary_llm_flows.time_filter import extract_time_filter
</file context>
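The missing module's expected interface can be inferred from the two import sites: a `strip_stopwords` function and an `ENGLISH_STOPWORDS_SET` constant. A minimal sketch of what `english_stopwords.py` might contain follows — the word list is abbreviated and the function body is illustrative, not the author's actual code:

```python
# Hypothetical sketch of the missing english_stopwords module.
# The real module presumably carries the full NLTK English stopword
# list (~179 words); only a handful of entries are shown here.
ENGLISH_STOPWORDS_SET: frozenset[str] = frozenset(
    {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
     "in", "is", "it", "of", "on", "or", "the", "to", "was", "with"}
)


def strip_stopwords(text: str) -> str:
    """Remove English stopwords from a query, preserving word order."""
    kept = [w for w in text.split() if w.lower() not in ENGLISH_STOPWORDS_SET]
    # If every token is a stopword, fall back to the original query
    # rather than returning an empty keyword string.
    return " ".join(kept) if kept else text
```

Adding a module of this shape (with the full word list) would resolve both P1 violations without reintroducing the NLTK dependency.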
from onyx.llm.interfaces import LLM
from onyx.llm.models import UserMessage
from onyx.llm.utils import llm_response_to_string
from onyx.natural_language_processing.english_stopwords import ENGLISH_STOPWORDS_SET
P1: The new import references onyx.natural_language_processing.english_stopwords, but that module is not present in the repository. This will raise ModuleNotFoundError at import time. Add the missing module or update the import to an existing stopword provider.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/onyx/context/search/federated/slack_search_utils.py, line 18:
<comment>The new import references `onyx.natural_language_processing.english_stopwords`, but that module is not present in the repository. This will raise `ModuleNotFoundError` at import time. Add the missing module or update the import to an existing stopword provider.</comment>
<file context>
@@ -15,6 +15,7 @@
from onyx.llm.interfaces import LLM
from onyx.llm.models import UserMessage
from onyx.llm.utils import llm_response_to_string
+from onyx.natural_language_processing.english_stopwords import ENGLISH_STOPWORDS_SET
from onyx.onyxbot.slack.models import ChannelType
from onyx.prompts.federated_search import SLACK_DATE_EXTRACTION_PROMPT
</file context>
Greptile Summary

This PR attempts to remove the NLTK dependency and replace stopword handling with a custom implementation, but the replacement module is missing from the PR, causing critical import errors.

Key Changes

Critical Issues

The PR imports from `onyx.natural_language_processing.english_stopwords` in two files, but that module is not part of the diff. This will cause an immediate `ModuleNotFoundError` at import time.

Root Cause

The PR description states "Remove NLTK and replace the stopword handling. It's the same list as nltk anyway." This suggests the author intended to include a new `english_stopwords` module but it was never committed.

Confidence Score: 0/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as query_and_chat API
    participant ProcessSearch as process_search_query
    participant Pipeline as search_pipeline
    participant StopWords as english_stopwords (MISSING)
    participant Index as DocumentIndex
    Client->>API: SendSearchQueryRequest(query, num_hits=50)
    API->>ProcessSearch: stream_search_query(request)
    alt Single Query
        ProcessSearch->>ProcessSearch: _run_single_search(query, num_hits)
        ProcessSearch->>Pipeline: search_pipeline(ChunkSearchRequest)
        Pipeline->>StopWords: strip_stopwords(query)
        Note over StopWords: ❌ MODULE MISSING<br/>ImportError at runtime
        StopWords-->>Pipeline: query_keywords
        Pipeline->>Index: search_chunks(query_keywords)
        Index-->>Pipeline: chunks
        Pipeline-->>ProcessSearch: chunks
    else Multiple Queries
        ProcessSearch->>ProcessSearch: Run searches in parallel
        loop For each query
            ProcessSearch->>Pipeline: search_pipeline(query, num_hits)
            Pipeline->>StopWords: strip_stopwords(query)
            Note over StopWords: ❌ MODULE MISSING<br/>ImportError at runtime
            StopWords-->>Pipeline: query_keywords
            Pipeline->>Index: search_chunks(query_keywords)
            Index-->>Pipeline: chunks
        end
        ProcessSearch->>ProcessSearch: Merge with RRF
    end
    ProcessSearch->>ProcessSearch: merge_individual_chunks()
    ProcessSearch->>ProcessSearch: Truncate to num_hits
    ProcessSearch-->>API: sections
    API-->>Client: Search results
```
from onyx.llm.interfaces import LLM
from onyx.llm.models import UserMessage
from onyx.llm.utils import llm_response_to_string
from onyx.natural_language_processing.english_stopwords import ENGLISH_STOPWORDS_SET
syntax: The module onyx.natural_language_processing.english_stopwords doesn't exist in the codebase. This import will cause a runtime error.
Check that the english_stopwords.py file with ENGLISH_STOPWORDS_SET definition was included in this PR.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/onyx/context/search/federated/slack_search_utils.py
Line: 18:18
Comment:
**syntax:** The module `onyx.natural_language_processing.english_stopwords` doesn't exist in the codebase. This import will cause a runtime error.
Check that the `english_stopwords.py` file with `ENGLISH_STOPWORDS_SET` definition was included in this PR.
How can I resolve this? If you propose a fix, please make it concise.

from onyx.db.models import User
from onyx.document_index.interfaces import DocumentIndex
from onyx.llm.interfaces import LLM
from onyx.natural_language_processing.english_stopwords import strip_stopwords
syntax: The module onyx.natural_language_processing.english_stopwords doesn't exist in the codebase. This import will cause a runtime error.
Check that the english_stopwords.py file with strip_stopwords function was included in this PR.
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/onyx/context/search/pipeline.py
Line: 22:22
Comment:
**syntax:** The module `onyx.natural_language_processing.english_stopwords` doesn't exist in the codebase. This import will cause a runtime error.
Check that the `english_stopwords.py` file with `strip_stopwords` function was included in this PR.
How can I resolve this? If you propose a fix, please make it concise.

sections = merge_individual_chunks(chunks)

# Truncate to the requested number of hits
sections = sections[: request.num_hits]
logic: Slicing may fail if request.num_hits is None (from the model default). The slice sections[: None] returns all sections, but consider whether this is the intended behavior.
- sections = sections[: request.num_hits]
+ sections = sections[: request.num_hits] if request.num_hits else sections
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/ee/onyx/search/process_search_query.py
Line: 182:182
Comment:
**logic:** Slicing may fail if `request.num_hits` is `None` (from the model default). The slice `sections[: None]` returns all sections, but consider whether this is the intended behavior.
```suggestion
sections = sections[: request.num_hits] if request.num_hits else sections
```
How can I resolve this? If you propose a fix, please make it concise.
Description
Remove NLTK and replace the stopword handling. It's the same list as nltk anyway.
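The PR's replacement utilities are not visible in this diff, but the summary mentions n-gram generation was also moved off NLTK. A minimal sketch of how `nltk.util.ngrams` can be replaced with a few lines of stdlib Python — the function name and signature here are assumptions, not the PR's actual code:

```python
from collections.abc import Sequence


def generate_ngrams(tokens: Sequence[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams, matching nltk.util.ngrams for lists."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


tokens = "search federated slack channels".split()
assert generate_ngrams(tokens, 2) == [
    ("search", "federated"),
    ("federated", "slack"),
    ("slack", "channels"),
]
```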
How Has This Been Tested?
Ran queries, works
Additional Options
Summary by cubic
Removed NLTK by replacing stopword handling and n-gram generation with lightweight local utilities, simplifying deployment with no behavior change. Added num_hits to cap search results and wired it through the search pipeline.
New Features
Refactors
Written for commit 01b2e1f. Summary will update on new commits.