
feat(filesys): data models and migration #7402

Merged
evan-onyx merged 15 commits into main from feat/file-struct1 on Jan 28, 2026


@evan-onyx (Contributor) commented Jan 14, 2026

Description

Addresses https://linear.app/onyx-app/issue/ENG-3415/migrations-and-data-models

Data models and migration for our initial filesys implementation.

How Has This Been Tested?

Tested that both the upgrade and the downgrade of the migration work.

Additional Options

  • [Optional] Override Linear Check

Summary by cubic

Adds hierarchy data models and a migration to represent source folder/space trees, link documents to their parent nodes, and let personas attach to hierarchy nodes and individual documents for scoped search. Addresses Linear ENG-3415; also tracks hierarchy fetch attempts and stores the last fetch time on connector credential pairs.
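To make "scoped search" concrete, here is a minimal sketch of how a persona's document scope could be resolved from the two new association tables: every document under an attached hierarchy node (at any depth), plus any directly attached document. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, keeps only the columns it needs, and ignores hierarchy nodes that are themselves documents. The `scoped_document_ids` helper, the query, and the sample data are hypothetical, not the PR's code.

```python
from __future__ import annotations

import sqlite3

# Hypothetical, trimmed-down schema: only the columns needed to resolve scope.
SCHEMA = """
CREATE TABLE hierarchy_node (
    id INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL,
    parent_id INTEGER REFERENCES hierarchy_node(id)
);
CREATE TABLE document (
    id TEXT PRIMARY KEY,
    parent_hierarchy_node_id INTEGER REFERENCES hierarchy_node(id)
);
CREATE TABLE persona__hierarchy_node (
    persona_id INTEGER NOT NULL,
    hierarchy_node_id INTEGER NOT NULL REFERENCES hierarchy_node(id),
    PRIMARY KEY (persona_id, hierarchy_node_id)
);
CREATE TABLE persona__document (
    persona_id INTEGER NOT NULL,
    document_id TEXT NOT NULL REFERENCES document(id),
    PRIMARY KEY (persona_id, document_id)
);
"""

# Walk down from every node attached to the persona, then collect documents
# parented anywhere in that subtree, plus documents attached directly.
SCOPED_DOCS_SQL = """
WITH RECURSIVE scope(id) AS (
    SELECT hierarchy_node_id FROM persona__hierarchy_node WHERE persona_id = :persona_id
    UNION
    SELECT child.id FROM hierarchy_node AS child JOIN scope ON child.parent_id = scope.id
)
SELECT d.id FROM document AS d
WHERE d.parent_hierarchy_node_id IN (SELECT id FROM scope)
UNION
SELECT document_id FROM persona__document WHERE persona_id = :persona_id
ORDER BY 1
"""


def scoped_document_ids(conn: sqlite3.Connection, persona_id: int) -> list[str]:
    """Return the sorted IDs of the documents a persona may search over."""
    rows = conn.execute(SCOPED_DOCS_SQL, {"persona_id": persona_id}).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO hierarchy_node VALUES (?, ?, ?)",
    [(1, "Google Drive", None), (2, "Engineering", 1), (3, "Design Docs", 2), (4, "Marketing", 1)],
)
conn.executemany(
    "INSERT INTO document VALUES (?, ?)",
    [("doc-a", 2), ("doc-b", 3), ("doc-c", 4), ("doc-d", 1)],
)
conn.execute("INSERT INTO persona__hierarchy_node VALUES (10, 2)")  # persona 10 -> Engineering
conn.execute("INSERT INTO persona__document VALUES (10, 'doc-c')")  # plus one individual document
print(scoped_document_ids(conn, 10))  # ['doc-a', 'doc-b', 'doc-c']
```

Using `UNION` rather than `UNION ALL` in the recursive step deduplicates node IDs, which also stops the walk if the tree ever contained a cycle.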

  • New Features

    • HierarchyNode model with node_type, parent/child tree, optional link, and optional document_id for nodes that are documents.
    • HierarchyNode now includes permission fields (external_user_emails, external_user_group_ids, is_public) matching Document.
    • Document now has parent_hierarchy_node_id, plus relationships to its parent node and, when the document is itself a hierarchy item, to the node that represents it.
    • HierarchyFetchAttempt model to track status, counts, errors, and timestamps per connector credential pair.
    • Persona__HierarchyNode association to link personas to hierarchy nodes; relationships added on Persona and HierarchyNode.
    • Persona__Document association to link personas to individual documents; relationships added on Persona and Document.
    • ConnectorCredentialPair now stores last_time_hierarchy_fetch.
  • Migration

    • Create hierarchy_node and hierarchy_fetch_attempt tables with FKs and indexes.
    • Seed SOURCE nodes for all DocumentSource values.
    • Backfill document.parent_hierarchy_node_id to the matching SOURCE node based on connector source (deterministic via MIN connector_id).
    • Create persona__hierarchy_node table with FKs and indexes.
    • Create persona__document table with FKs and indexes.
    • Add last_time_hierarchy_fetch column to connector_credential_pair.
    • Add partial unique index to enforce a single SOURCE node per source.
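A rough sketch of the main upgrade steps listed above, again using SQLite as a stand-in for PostgreSQL rather than the actual Alembic migration. The names ix_hierarchy_node_parent_id, uq_hierarchy_node_raw_id_source, and ix_document_parent_hierarchy_node_id come from the migration summary; the partial index name `uq_hierarchy_node_single_source`, the three sample sources, and the simplified `connector` and `document_by_connector_credential_pair` schemas are assumptions. The backfill takes each document's source from its lowest connector_id, which is what keeps it deterministic when several connectors index the same document.

```python
import sqlite3

# Hypothetical subset of DocumentSource values and their display names.
SOURCE_DISPLAY_NAMES = {
    "google_drive": "Google Drive",
    "confluence": "Confluence",
    "slack": "Slack",
}


def upgrade(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE hierarchy_node (
            id INTEGER PRIMARY KEY,
            raw_node_id TEXT NOT NULL,
            display_name TEXT NOT NULL,
            source TEXT NOT NULL,
            node_type TEXT NOT NULL,
            parent_id INTEGER REFERENCES hierarchy_node(id)
        );
        CREATE INDEX ix_hierarchy_node_parent_id ON hierarchy_node (parent_id);
        CREATE UNIQUE INDEX uq_hierarchy_node_raw_id_source
            ON hierarchy_node (raw_node_id, source);
        -- Partial unique index: at most one SOURCE node per source.
        CREATE UNIQUE INDEX uq_hierarchy_node_single_source
            ON hierarchy_node (source) WHERE node_type = 'source';
    """)
    # Seed one SOURCE root per source; raw_node_id is the source value itself.
    conn.executemany(
        "INSERT INTO hierarchy_node (raw_node_id, display_name, source, node_type)"
        " VALUES (?, ?, ?, 'source')",
        [(src, name, src) for src, name in SOURCE_DISPLAY_NAMES.items()],
    )
    conn.executescript("""
        ALTER TABLE document ADD COLUMN parent_hierarchy_node_id INTEGER
            REFERENCES hierarchy_node(id);
        CREATE INDEX ix_document_parent_hierarchy_node_id
            ON document (parent_hierarchy_node_id);
        -- Backfill: the document's lowest connector_id decides which SOURCE node it joins.
        UPDATE document SET parent_hierarchy_node_id = (
            SELECT hn.id
            FROM hierarchy_node AS hn
            JOIN connector AS c ON c.source = hn.source
            WHERE hn.node_type = 'source'
              AND c.id = (
                  SELECT MIN(dcc.connector_id)
                  FROM document_by_connector_credential_pair AS dcc
                  WHERE dcc.id = document.id
              )
        );
    """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE connector (id INTEGER PRIMARY KEY, source TEXT NOT NULL);
    CREATE TABLE document (id TEXT PRIMARY KEY);
    CREATE TABLE document_by_connector_credential_pair (
        id TEXT NOT NULL, connector_id INTEGER NOT NULL, credential_id INTEGER NOT NULL,
        PRIMARY KEY (id, connector_id, credential_id)
    );
    INSERT INTO connector VALUES (1, 'confluence'), (2, 'slack');
    INSERT INTO document VALUES ('doc-1'), ('doc-2'), ('doc-orphan');
    -- doc-1 is indexed by both connectors; MIN(connector_id) = 1, so it lands under Confluence.
    INSERT INTO document_by_connector_credential_pair VALUES
        ('doc-1', 2, 1), ('doc-1', 1, 1), ('doc-2', 2, 1);
""")
upgrade(conn)
rows = conn.execute("""
    SELECT d.id, hn.source FROM document AS d
    LEFT JOIN hierarchy_node AS hn ON hn.id = d.parent_hierarchy_node_id
    ORDER BY d.id
""").fetchall()
print(rows)  # [('doc-1', 'confluence'), ('doc-2', 'slack'), ('doc-orphan', None)]
```

Documents with no connector link keep a NULL parent, which is why the new column is nullable.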

Written for commit 9929509. Summary will update on new commits.

@evan-onyx requested a review from a team as a code owner on January 14, 2026 02:20
@greptile-apps bot commented Jan 14, 2026

Greptile Summary

This PR introduces the foundational data models and database migration for a hierarchical file system implementation. It adds two new tables (hierarchy_node and hierarchy_fetch_attempt) to represent the structural organization of documents across sources such as Google Drive, Confluence, and Slack.

Key changes:

  • Created HierarchyNode model to represent folders, spaces, channels, and other structural containers
  • Created HierarchyFetchAttempt model to track hierarchy sync operations similar to existing index attempts
  • Added HierarchyNodeType enum with source-specific types (folders, spaces, projects, channels, drives, etc.)
  • Extended Document model with parent_hierarchy_node_id to link documents to their containing hierarchy node
  • Migration automatically creates SOURCE-type root nodes for all existing document sources and backfills existing documents to point to their respective source root nodes
  • Proper bidirectional relationships established between documents and hierarchy nodes

The implementation is lightweight and focused on hierarchy structure only, keeping permissions and sync logic on the Document model. The migration includes both upgrade and downgrade paths with proper index cleanup.
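The parent/child structure can be pictured with a small in-memory model. This is an illustrative sketch, not the SQLAlchemy model from the PR: only the `source` node type value is confirmed by the migration, while the other enum members, the `add_child` and `path` helpers, and the sample IDs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class HierarchyNodeType(str, Enum):
    SOURCE = "source"  # confirmed by the migration's seeded root nodes
    FOLDER = "folder"  # the remaining values are illustrative guesses
    SPACE = "space"
    CHANNEL = "channel"


@dataclass
class HierarchyNode:
    raw_node_id: str
    display_name: str
    node_type: HierarchyNodeType
    # Excluded from repr/eq so the parent <-> children cycle never recurses.
    parent: HierarchyNode | None = field(default=None, repr=False, compare=False)
    children: list[HierarchyNode] = field(default_factory=list)

    def add_child(self, child: HierarchyNode) -> HierarchyNode:
        """Attach child under this node, keeping both directions of the link in sync."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Display names from the SOURCE root down to this node (a breadcrumb)."""
        names: list[str] = []
        node: HierarchyNode | None = self
        while node is not None:
            names.append(node.display_name)
            node = node.parent
        return list(reversed(names))


drive = HierarchyNode("google_drive", "Google Drive", HierarchyNodeType.SOURCE)
eng = drive.add_child(HierarchyNode("folder-123", "Engineering", HierarchyNodeType.FOLDER))
specs = eng.add_child(HierarchyNode("folder-456", "Specs", HierarchyNodeType.FOLDER))
print(" / ".join(specs.path()))  # Google Drive / Engineering / Specs
```

In the database the same shape is just a self-referencing parent_id column, with the SOURCE node as the root that has parent_id = NULL.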

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The migration is well-structured with proper foreign keys, indexes, and constraints. The upgrade creates tables in the correct order (hierarchy_node before document column addition), backfills existing data deterministically, and the downgrade properly cleans up in reverse order. The models follow existing SQLAlchemy patterns in the codebase with appropriate type hints and relationships. Author confirmed testing of both upgrade and downgrade paths.
  • No files require special attention
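One way to check the upgrade/downgrade round trip described by both the author and the review is to snapshot the schema before upgrading and compare it after downgrading. The sketch uses SQLite and a hypothetical, heavily trimmed subset of the DDL; it is not the Alembic migration itself. Unlike PostgreSQL, SQLite does not refuse to drop a referenced empty table, so this checks the end state rather than enforcing the drop order.

```python
from __future__ import annotations

import sqlite3

# Hypothetical (forward, backward) DDL pairs, applied in order on upgrade.
MIGRATION_STEPS = [
    (
        "CREATE TABLE hierarchy_node (id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES hierarchy_node(id))",
        "DROP TABLE hierarchy_node",
    ),
    (
        "CREATE INDEX ix_hierarchy_node_parent_id ON hierarchy_node (parent_id)",
        "DROP INDEX ix_hierarchy_node_parent_id",
    ),
    (
        "CREATE TABLE persona__hierarchy_node ("
        " persona_id INTEGER NOT NULL,"
        " hierarchy_node_id INTEGER NOT NULL REFERENCES hierarchy_node(id),"
        " PRIMARY KEY (persona_id, hierarchy_node_id))",
        "DROP TABLE persona__hierarchy_node",
    ),
]


def upgrade(conn: sqlite3.Connection) -> None:
    for forward, _ in MIGRATION_STEPS:
        conn.execute(forward)


def downgrade(conn: sqlite3.Connection) -> None:
    # Undo steps in reverse so dependents are dropped before what they depend on.
    for _, backward in reversed(MIGRATION_STEPS):
        conn.execute(backward)


def schema_snapshot(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Every table and index in the database, as sorted (type, name) pairs."""
    return sorted(conn.execute("SELECT type, name FROM sqlite_master").fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persona (id INTEGER PRIMARY KEY)")  # stands in for the existing schema
before = schema_snapshot(conn)
upgrade(conn)
after_upgrade = schema_snapshot(conn)
downgrade(conn)
print(schema_snapshot(conn) == before)  # True
```

Keeping each forward step paired with its inverse makes the reverse-order downgrade fall out of `reversed()` instead of being maintained by hand.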

Important Files Changed

Filename | Overview
backend/alembic/versions/81c22b1e2e78_hierarchy_nodes_v1.py | Added database migration to create hierarchy_node and hierarchy_fetch_attempt tables with proper indexes and relationships, plus data backfill for existing documents
backend/onyx/db/enums.py | Added HierarchyNodeType enum with types for different source hierarchies (folders, spaces, projects, channels, etc.)
backend/onyx/db/models.py | Added HierarchyNode and HierarchyFetchAttempt models with bidirectional relationships to Document table

Sequence Diagram

sequenceDiagram
    participant Migration as Alembic Migration
    participant DB as PostgreSQL Database
    participant HierarchyNode as hierarchy_node table
    participant Document as document table
    participant HierarchyFetch as hierarchy_fetch_attempt table

    Note over Migration,DB: Migration Upgrade Process

    Migration->>DB: Create hierarchy_node table
    DB-->>Migration: Table created with columns:<br/>id, raw_node_id, display_name,<br/>link, source, node_type,<br/>document_id, parent_id
    
    Migration->>DB: Add indexes on hierarchy_node
    DB-->>Migration: Created ix_hierarchy_node_parent_id<br/>Created ix_hierarchy_node_source_type<br/>Created uq_hierarchy_node_raw_id_source

    Migration->>DB: Create hierarchy_fetch_attempt table
    DB-->>Migration: Table created with columns:<br/>id, connector_credential_pair_id,<br/>status, nodes_fetched, nodes_updated,<br/>timestamps

    Migration->>DB: Add indexes on hierarchy_fetch_attempt
    DB-->>Migration: Created ix_hierarchy_fetch_attempt_status<br/>Created ix_hierarchy_fetch_attempt_time_created<br/>Created ix_hierarchy_fetch_attempt_cc_pair

    loop For each DocumentSource
        Migration->>HierarchyNode: Insert SOURCE-type node
        Note right of HierarchyNode: raw_node_id = source value<br/>display_name from lookup dict<br/>node_type = 'source'<br/>parent_id = NULL
    end

    Migration->>Document: Add parent_hierarchy_node_id column
    Document-->>Migration: Column added (nullable)

    Migration->>DB: Create foreign key constraint
    DB-->>Migration: fk_document_parent_hierarchy_node created

    Migration->>DB: Create index on document.parent_hierarchy_node_id
    DB-->>Migration: ix_document_parent_hierarchy_node_id created

    Migration->>DB: Execute UPDATE query to backfill documents
    Note over DB: Query joins document_by_connector_credential_pair<br/>with connector to get source,<br/>then joins hierarchy_node to set parent
    DB-->>Migration: All existing documents linked to SOURCE nodes

    Note over Migration,DB: Migration Complete - Ready for hierarchy data
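The diagram shows hierarchy_fetch_attempt carrying a status, node counts, and timestamps, and the cubic summary notes that connector credential pairs store last_time_hierarchy_fetch. A hedged sketch of how one attempt might be recorded follows; the status values, the `error_msg` field name, the `run_hierarchy_fetch` function, and the rule that only a successful fetch advances the timestamp are all assumptions, not behavior confirmed by the PR.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class HierarchyFetchStatus(str, Enum):
    # Hypothetical status values, loosely modeled on indexing-attempt statuses.
    IN_PROGRESS = "in_progress"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class ConnectorCredentialPair:
    id: int
    last_time_hierarchy_fetch: datetime | None = None


@dataclass
class HierarchyFetchAttempt:
    connector_credential_pair_id: int
    status: HierarchyFetchStatus = HierarchyFetchStatus.IN_PROGRESS
    nodes_fetched: int = 0
    nodes_updated: int = 0
    error_msg: str | None = None
    time_created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    time_updated: datetime | None = None


def run_hierarchy_fetch(
    cc_pair: ConnectorCredentialPair,
    fetch_nodes: Callable[[], Iterable[bool]],
) -> HierarchyFetchAttempt:
    """Record one fetch; fetch_nodes yields True for each new or changed node."""
    attempt = HierarchyFetchAttempt(connector_credential_pair_id=cc_pair.id)
    try:
        for changed in fetch_nodes():
            attempt.nodes_fetched += 1
            if changed:
                attempt.nodes_updated += 1
    except Exception as exc:  # keep partial counts and the error for debugging
        attempt.status = HierarchyFetchStatus.FAILED
        attempt.error_msg = str(exc)
    else:
        attempt.status = HierarchyFetchStatus.SUCCESS
    attempt.time_updated = datetime.now(timezone.utc)
    if attempt.status is HierarchyFetchStatus.SUCCESS:
        # Assumed rule: only a successful fetch advances the cc-pair timestamp.
        cc_pair.last_time_hierarchy_fetch = attempt.time_updated
    return attempt


cc_pair = ConnectorCredentialPair(id=7)
ok = run_hierarchy_fetch(cc_pair, lambda: [True, False, True])
print(ok.status.value, ok.nodes_fetched, ok.nodes_updated)  # success 3 2


def flaky():
    yield True
    raise RuntimeError("rate limited")


failed = run_hierarchy_fetch(cc_pair, flaky)
print(failed.status.value, failed.nodes_fetched, failed.error_msg)  # failed 1 rate limited
```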

@cubic-dev-ai bot left a comment


No issues found across 3 files

@cubic-dev-ai bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/alembic/versions/81c22b1e2e78_hierarchy_nodes_v1.py">

<violation number="1" location="backend/alembic/versions/81c22b1e2e78_hierarchy_nodes_v1.py:231">
P2: The index `ix_persona__hierarchy_node_persona_id` is redundant. The composite primary key `(persona_id, hierarchy_node_id)` already creates an index that efficiently supports lookups by `persona_id` alone. This extra index wastes disk space and adds write overhead. Consider removing this index (keep only the `hierarchy_node_id` index, which is necessary).</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
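The redundant-index point can be checked with a query planner. This sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in for PostgreSQL's EXPLAIN; in both databases the composite primary key is backed by a B-tree ordered by persona_id first, so it can serve lookups on persona_id alone but not on hierarchy_node_id alone. Only the table and column names come from the review; the index name on hierarchy_node_id is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persona__hierarchy_node ("
    " persona_id INTEGER NOT NULL,"
    " hierarchy_node_id INTEGER NOT NULL,"
    " PRIMARY KEY (persona_id, hierarchy_node_id))"
)


def plan(sql: str) -> str:
    """Return the planner's description (the 'detail' column) for a one-parameter query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(row[3] for row in rows)


by_persona = "SELECT hierarchy_node_id FROM persona__hierarchy_node WHERE persona_id = ?"
by_node = "SELECT persona_id FROM persona__hierarchy_node WHERE hierarchy_node_id = ?"

# Leading PK column: expect a SEARCH through the primary key's automatic index,
# so a separate persona_id index adds nothing.
print(plan(by_persona))

# Second PK column alone: expect a SCAN, because the PK index is ordered by persona_id first.
before_index = plan(by_node)
print(before_index)

# Hypothetical name for the hierarchy_node_id index the review says to keep.
conn.execute(
    "CREATE INDEX ix_persona__hierarchy_node_hierarchy_node_id"
    " ON persona__hierarchy_node (hierarchy_node_id)"
)
after_index = plan(by_node)
print(after_index)  # expect a SEARCH using the new index
```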

@evan-onyx force-pushed the feat/file-struct1 branch 10 times, most recently from a8d7850 to b6d0627 on January 23, 2026 23:52
@evan-onyx force-pushed the feat/file-struct1 branch 4 times, most recently from a8a8894 to 290e9e3 on January 27, 2026 05:16
@evan-onyx added this pull request to the merge queue on Jan 28, 2026
Merged via the queue into main with commit c2b11ca on Jan 28, 2026
78 of 80 checks passed
@evan-onyx deleted the feat/file-struct1 branch on January 28, 2026 00:10
