
chore(deps): upgrade numpy, unstructured, unstructured-client#7369

Merged
jmelahman merged 1 commit into main from jamison/upgrade-unstructured
Jan 12, 2026

Conversation


@jmelahman jmelahman commented Jan 12, 2026

Description

These version upgrades are required to support later versions of Python; see "chore(deps): remove requires-python < 3.13".

numpy: 1.26.4->2.4.1
unstructured: 0.15.1->0.18.27
unstructured-client: 0.25.4->0.42.6

How Has This Been Tested?

Captured by existing tests.

Additional Options

  • Override Linear Check

Summary by cubic

Upgrade numpy to 2.4.1 and bump unstructured and unstructured-client to support Python 3.13. Updated the Unstructured client call to the new API and made element parsing safe when the response has no elements.

  • Dependencies

    • Upgraded: numpy 1.26.4 → 2.4.1; unstructured 0.15.1 → 0.18.27; unstructured-client 0.25.4 → 0.42.6
    • Notable transitive updates: unstructured adds html5lib/webencodings/python-oxmsg and swaps chardet for charset-normalizer; unstructured-client drops requests/deepdiff/jsonpath-python and adds httpcore/pydantic/aiofiles/cryptography
  • Refactors

    • Updated UnstructuredClient.general.partition to use request=... and default to [] when elements is None

Written for commit 8e0caa8. Summary will update on new commits.
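
The refactor summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `StubResponse`, `StubGeneral`, and `partition_elements` are hypothetical stand-ins so the snippet runs without the `unstructured-client` package or an API key. It only assumes, per the summary, that `general.partition` is called with a `request=...` keyword and that the response's `elements` may be `None`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StubResponse:
    """Hypothetical stand-in for the client's partition response."""

    elements: Optional[list[dict[str, Any]]] = None


class StubGeneral:
    """Hypothetical stand-in exposing the `partition(request=...)` call shape."""

    def __init__(self, canned: StubResponse) -> None:
        self._canned = canned

    def partition(self, *, request: Any) -> StubResponse:
        # A real client would send `request` to the API; the stub returns a canned reply.
        return self._canned


def partition_elements(general: Any, request: Any) -> list[dict[str, Any]]:
    """Call partition with the keyword argument and never return None."""
    response = general.partition(request=request)
    # `elements` may be None when the service returns nothing to parse.
    return response.elements or []


empty = partition_elements(StubGeneral(StubResponse(elements=None)), request={})
filled = partition_elements(
    StubGeneral(StubResponse(elements=[{"type": "Title", "text": "Hello"}])),
    request={},
)
print(empty, len(filled))
```

Falling back to `[]` keeps downstream iteration over elements safe without a separate `None` check at every call site.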

@jmelahman jmelahman requested a review from a team as a code owner January 12, 2026 20:55

@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 6 files


greptile-apps bot commented Jan 12, 2026

Greptile Overview

Greptile Summary

This PR upgrades three major dependencies to support Python 3.13:

Dependency Changes:

  • numpy: 1.26.4 → 2.4.1 (major version bump from 1.x to 2.x)
  • unstructured: 0.15.1 → 0.18.27 (3 minor version bumps)
  • unstructured-client: 0.25.4 → 0.42.6 (17 minor version bumps)
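
The bump counts quoted above can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical helpers (`parse_version`, `minor_delta`, `is_major_bump`) and assumes plain `MAJOR.MINOR.PATCH` version strings. Note that the delta is the difference between minor numbers, which is not necessarily the number of releases actually published in between.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def minor_delta(old: str, new: str) -> int:
    """Difference in the minor component, valid only within one major line."""
    old_major, old_minor, _ = parse_version(old)
    new_major, new_minor, _ = parse_version(new)
    if old_major != new_major:
        raise ValueError("major versions differ; minor delta is not meaningful")
    return new_minor - old_minor


def is_major_bump(old: str, new: str) -> bool:
    """True when the major component increases."""
    return parse_version(new)[0] > parse_version(old)[0]


print(is_major_bump("1.26.4", "2.4.1"))      # numpy
print(minor_delta("0.15.1", "0.18.27"))      # unstructured
print(minor_delta("0.25.4", "0.42.6"))       # unstructured-client
```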

Key Findings:

  1. NumPy 2.x Migration: The upgrade from NumPy 1.26.4 to 2.4.1 is a major version change. NumPy 2.0 introduced breaking changes including:

    • Removal of deprecated aliases (np.float_, np.complex_, np.NaN, np.Inf, etc.)
    • Changes to type promotion rules (NEP 50) and to scalar representation
    • The codebase has minimal numpy usage (primarily in backend/onyx/kg/ for clustering and embeddings, and in commented-out legacy code), which reduces risk
  2. Unstructured Library Updates: The unstructured libraries saw significant version jumps:

    • API usage appears compatible (dict_to_elements, PartitionRequest, PartitionParameters APIs are still in use)
    • Transitive dependencies changed: removed deepdiff, jsonpath-python, orderly-set, tabulate; added html5lib, python-oxmsg, webencodings
  3. Transitive Dependency Impact: The lock file shows numerous transitive dependency updates, including removal of some packages that were only needed by older versions

  4. Test Coverage: The PR description states "Captured by existing" tests, indicating reliance on the existing test suite to validate compatibility
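
One low-cost way to gauge exposure to the NumPy 2.0 removals mentioned in the findings is to scan source code for aliases that no longer exist. The following is a sketch using only the standard library. `find_removed_aliases` is a hypothetical helper, the alias table is a small, non-exhaustive subset of names removed in NumPy 2.0, and the scan assumes NumPy is imported under the alias `np`.

```python
import ast

# Non-exhaustive subset of aliases removed in NumPy 2.0, with their replacements.
REMOVED_IN_NUMPY_2 = {
    "float_": "float64",
    "complex_": "complex128",
    "unicode_": "str_",
    "NaN": "nan",
    "Inf": "inf",
    "infty": "inf",
}


def find_removed_aliases(
    source: str, module_alias: str = "np"
) -> list[tuple[int, str, str]]:
    """Return (line, removed_name, replacement) for each `np.<removed>` access."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == module_alias
            and node.attr in REMOVED_IN_NUMPY_2
        ):
            findings.append((node.lineno, node.attr, REMOVED_IN_NUMPY_2[node.attr]))
    return sorted(findings)


# The sample is only parsed, never executed, so NumPy need not be installed.
sample = (
    "import numpy as np\n"
    "x = np.array([1.0, np.NaN], dtype=np.float_)\n"
    "y = np.float64(np.inf)\n"
)
findings = find_removed_aliases(sample)
for line, name, replacement in findings:
    print(f"line {line}: np.{name} -> np.{replacement}")
```

A static scan like this only catches direct attribute access; behavioral changes such as type promotion still need runtime tests.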

Potential Risks:

  • NumPy 2.x is a major version change that could introduce subtle behavioral differences
  • The large version jump in unstructured-client (17 minor versions) spans many potential API changes
  • No explicit migration testing or compatibility verification appears documented beyond "existing tests"

Confidence Score: 4/5

  • This PR is reasonably safe to merge with thorough testing, but requires careful validation due to numpy's major version change
  • Score of 4 reflects that this is a well-structured dependency upgrade PR with minimal code changes, but the numpy 1.x→2.x major version bump introduces some risk. The codebase has limited numpy usage which reduces exposure, and the PR relies on existing test coverage. However, NumPy 2.0 introduced breaking changes that could cause subtle runtime issues not caught by all tests. The unstructured library upgrades span many versions but appear API-compatible based on code inspection.
  • Pay close attention to backend/requirements/default.txt for the transitive dependency changes, and ensure thorough testing of numpy-dependent code in backend/onyx/kg/clustering/ and backend/onyx/kg/utils/embeddings.py

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| backend/requirements/default.txt | 4/5 | Updates numpy (1.26.4→2.4.1), unstructured (0.15.1→0.18.27), and unstructured-client (0.25.4→0.42.6) with transitive dependency changes |
| pyproject.toml | 5/5 | Updates numpy to 2.4.1, unstructured to 0.18.27, and unstructured-client to 0.42.6 in dependency specifications |
| uv.lock | 5/5 | Lock file updated with numpy 2.4.1, unstructured 0.18.27, and transitive dependency changes |
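
Because the same pins appear in several files, a small consistency check can confirm that a requirements file carries the versions this PR targets. The sketch below is illustrative only: `parse_pins` and `mismatched_pins` are hypothetical helpers, the parser handles simple `name==version` lines rather than the full requirements grammar, and the sample text is made up rather than taken from `backend/requirements/default.txt`.

```python
from typing import Optional

# Versions targeted by this PR.
EXPECTED_PINS = {
    "numpy": "2.4.1",
    "unstructured": "0.18.27",
    "unstructured-client": "0.42.6",
}


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Collect `name==version` pins, ignoring comments, markers, and extras."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Drop environment markers and trailing line continuations.
        version = version.split(";", 1)[0].strip().rstrip("\\").strip()
        # Drop extras such as `package[extra]`.
        name = name.split("[", 1)[0].strip().lower()
        pins[name] = version
    return pins


def mismatched_pins(
    requirements_text: str, expected: dict[str, str]
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package whose pin differs to (expected, found-or-None)."""
    pins = parse_pins(requirements_text)
    return {
        name: (want, pins.get(name))
        for name, want in expected.items()
        if pins.get(name) != want
    }


# Made-up sample with one stale pin.
sample_requirements = (
    "numpy==2.4.1\n"
    "unstructured==0.18.27\n"
    "unstructured-client==0.25.4  # stale\n"
)
print(mismatched_pins(sample_requirements, EXPECTED_PINS))
```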

Sequence Diagram

sequenceDiagram
    participant PR as PR #7369
    participant Deps as Dependency Update
    participant Numpy as numpy 1.26.4→2.4.1
    participant Unstructured as unstructured 0.15.1→0.18.27
    participant UnstructuredClient as unstructured-client 0.25.4→0.42.6
    participant Code as Codebase
    participant Tests as Existing Tests
    
    PR->>Deps: Upgrade dependencies
    Deps->>Numpy: Major version bump (1.x→2.x)
    Deps->>Unstructured: Version bump (3 minor releases)
    Deps->>UnstructuredClient: Version bump (17 minor releases)
    
    Note over Numpy: Breaking changes in 2.0+<br/>- Deprecated dtypes removed<br/>- API changes
    Note over Unstructured: API compatibility maintained<br/>dict_to_elements still available
    Note over UnstructuredClient: PartitionRequest API intact
    
    Numpy->>Code: Used in KG clustering/embeddings
    Unstructured->>Code: Used in file processing
    UnstructuredClient->>Code: Used in unstructured.py
    
    Code->>Tests: Validation via existing tests
    Tests-->>PR: Test coverage validates changes

@jmelahman jmelahman force-pushed the jamison/upgrade-unstructured branch from 4898630 to 72e53d7 Compare January 12, 2026 21:22
@jmelahman jmelahman force-pushed the jamison/upgrade-unstructured branch from 18d8eeb to 8e0caa8 Compare January 12, 2026 21:44
@jmelahman jmelahman added this pull request to the merge queue Jan 12, 2026
Merged via the queue into main with commit 157f672 Jan 12, 2026
74 checks passed
@jmelahman jmelahman deleted the jamison/upgrade-unstructured branch January 12, 2026 23:02
