feat(filesys): data models and migration by evan-onyx · Pull Request #7402 · onyx-dot-app/onyx

evan-onyx · 2026-01-14T02:20:06Z

Description

Addresses https://linear.app/onyx-app/issue/ENG-3415/migrations-and-data-models

Data models and migration for our initial filesys implementation.

How Has This Been Tested?

Tested that the migration's upgrade and downgrade both work.

Additional Options

[Optional] Override Linear Check

Summary by cubic

Adds hierarchy data models and a migration to represent source folder/space trees, link documents to their parent nodes, and let personas attach to hierarchy nodes and individual documents for scoped search. Addresses Linear ENG-3415; also tracks hierarchy fetch attempts and stores the last fetch time on connector credential pairs.

New Features
- HierarchyNode model with node_type, parent/child tree, optional link, and optional document_id for nodes that are documents.
- HierarchyNode now includes permission fields (external_user_emails, external_user_group_ids, is_public) matching Document.
- Document now has parent_hierarchy_node_id and relationships to its parent node and to a node when the document itself is a hierarchy item.
- HierarchyFetchAttempt model to track status, counts, errors, and timestamps per connector credential pair.
- Persona__HierarchyNode association to link personas to hierarchy nodes; relationships added on Persona and HierarchyNode.
- Persona__Document association to link personas to individual documents; relationships added on Persona and Document.
- ConnectorCredentialPair now stores last_time_hierarchy_fetch.
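
As a rough illustration of how these pieces could fit together for scoped search, the sketch below models the parent/child tree, the document-to-parent link, and the persona association in an in-memory SQLite database, then resolves a persona's scope with a recursive query. This is not the PR's code: the real models are SQLAlchemy classes backed by PostgreSQL, the columns are trimmed to the minimum, the `'folder'` node type and all sample rows are made up, and `documents_in_scope` is a hypothetical helper.

```python
import sqlite3

# Trimmed-down, assumed versions of the tables named in the summary.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hierarchy_node (
        id INTEGER PRIMARY KEY,
        display_name TEXT NOT NULL,
        node_type TEXT NOT NULL,
        parent_id INTEGER REFERENCES hierarchy_node(id)
    );
    CREATE TABLE document (
        id TEXT PRIMARY KEY,
        parent_hierarchy_node_id INTEGER REFERENCES hierarchy_node(id)
    );
    CREATE TABLE persona__hierarchy_node (
        persona_id INTEGER NOT NULL,
        hierarchy_node_id INTEGER NOT NULL REFERENCES hierarchy_node(id),
        PRIMARY KEY (persona_id, hierarchy_node_id)
    );

    -- Sample tree: one SOURCE root with nested folders (illustrative data).
    INSERT INTO hierarchy_node VALUES
        (1, 'Google Drive', 'source', NULL),
        (2, 'Engineering', 'folder', 1),
        (3, 'Design Docs', 'folder', 2),
        (4, 'Marketing', 'folder', 1);
    INSERT INTO document VALUES
        ('doc-eng', 2), ('doc-design', 3), ('doc-mkt', 4);
    -- Persona 10 is attached to the Engineering folder only.
    INSERT INTO persona__hierarchy_node VALUES (10, 2);
    """
)


def documents_in_scope(persona_id: int) -> list[str]:
    """Return ids of documents under any node the persona is attached to."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT hierarchy_node_id
            FROM persona__hierarchy_node
            WHERE persona_id = ?
            UNION
            SELECT hn.id
            FROM hierarchy_node hn
            JOIN subtree s ON hn.parent_id = s.id
        )
        SELECT d.id
        FROM document d
        JOIN subtree s ON d.parent_hierarchy_node_id = s.id
        ORDER BY d.id
        """,
        (persona_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(documents_in_scope(10))
```

In this toy data set, persona 10 sees the Engineering document and the one in its Design Docs subfolder, but not the Marketing document. Whether the actual search path walks the tree this way is an assumption; the PR only adds the tables and relationships.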

Migration
- Create hierarchy_node and hierarchy_fetch_attempt tables with FKs and indexes.
- Seed SOURCE nodes for all DocumentSource values.
- Backfill document.parent_hierarchy_node_id to the matching SOURCE node based on connector source (deterministic via MIN connector_id).
- Create persona__hierarchy_node table with FKs and indexes.
- Create persona__document table with FKs and indexes.
- Add last_time_hierarchy_fetch column to connector_credential_pair.
- Add partial unique index to enforce a single SOURCE node per source.
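
The seeding step and the partial unique index can be sketched as follows. This is a minimal stand-in using SQLite (which also supports partial indexes), not the Alembic migration itself: the index name `uq_hierarchy_node_source_root`, the three sample sources, the display-name dictionary, and the `try_insert` helper are all assumptions made for illustration.

```python
import sqlite3

# Stand-in for the DocumentSource values and their display-name lookup (assumed).
SOURCE_DISPLAY_NAMES = {
    "google_drive": "Google Drive",
    "confluence": "Confluence",
    "slack": "Slack",
}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hierarchy_node (
        id INTEGER PRIMARY KEY,
        raw_node_id TEXT NOT NULL,
        display_name TEXT NOT NULL,
        source TEXT NOT NULL,
        node_type TEXT NOT NULL,
        parent_id INTEGER REFERENCES hierarchy_node(id)
    );
    -- Partial unique index: uniqueness on source applies only to SOURCE rows.
    CREATE UNIQUE INDEX uq_hierarchy_node_source_root
        ON hierarchy_node (source) WHERE node_type = 'source';
    """
)

# Seed one root node per source, with raw_node_id set to the source value.
conn.executemany(
    "INSERT INTO hierarchy_node (raw_node_id, display_name, source, node_type, parent_id) "
    "VALUES (?, ?, ?, 'source', NULL)",
    [(value, name, value) for value, name in SOURCE_DISPLAY_NAMES.items()],
)


def try_insert(raw_node_id: str, source: str, node_type: str) -> bool:
    """Insert a node; return False if the unique index rejects it."""
    try:
        conn.execute(
            "INSERT INTO hierarchy_node (raw_node_id, display_name, source, node_type) "
            "VALUES (?, ?, ?, ?)",
            (raw_node_id, raw_node_id, source, node_type),
        )
        return True
    except sqlite3.IntegrityError:
        return False


folder_ok = try_insert("folder-1", "slack", "folder")    # non-SOURCE rows are not constrained
duplicate_ok = try_insert("slack-dup", "slack", "source")  # second SOURCE row for the same source
print(folder_ok, duplicate_ok)
```

Because the index carries a `WHERE` clause, any number of folders, channels, or other nodes can share a source, while a second SOURCE root for that source is rejected.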

Written for commit 9929509. Summary will update on new commits.

greptile-apps · 2026-01-14T02:22:29Z

Greptile Summary

This PR introduces the foundational data models and database migration for a hierarchical file system implementation. It adds two new tables (hierarchy_node and hierarchy_fetch_attempt) to represent structural organization of documents across different sources like Google Drive, Confluence, Slack, etc.

Key changes:

- Created HierarchyNode model to represent folders, spaces, channels, and other structural containers
- Created HierarchyFetchAttempt model to track hierarchy sync operations, similar to existing index attempts
- Added HierarchyNodeType enum with source-specific types (folders, spaces, projects, channels, drives, etc.)
- Extended Document model with parent_hierarchy_node_id to link documents to their containing hierarchy node
- Migration automatically creates SOURCE-type root nodes for all existing document sources and backfills existing documents to point to their respective source root nodes
- Established bidirectional relationships between documents and hierarchy nodes

The implementation is lightweight and focused on hierarchy structure only, keeping permissions and sync logic on the Document model. The migration includes both upgrade and downgrade paths with proper index cleanup.

Confidence Score: 5/5

- This PR is safe to merge with minimal risk.
- The migration is well-structured with proper foreign keys, indexes, and constraints. The upgrade creates tables in the correct order (hierarchy_node before the document column is added) and backfills existing data deterministically, and the downgrade cleans up in reverse order. The models follow existing SQLAlchemy patterns in the codebase with appropriate type hints and relationships. The author confirmed testing of both upgrade and downgrade paths.
- No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/alembic/versions/81c22b1e2e78_hierarchy_nodes_v1.py | Added database migration to create `hierarchy_node` and `hierarchy_fetch_attempt` tables with proper indexes and relationships, plus data backfill for existing documents |
| backend/onyx/db/enums.py | Added `HierarchyNodeType` enum with types for different source hierarchies (folders, spaces, projects, channels, etc.) |
| backend/onyx/db/models.py | Added `HierarchyNode` and `HierarchyFetchAttempt` models with bidirectional relationships to `Document` table |

Sequence Diagram

sequenceDiagram
    participant Migration as Alembic Migration
    participant DB as PostgreSQL Database
    participant HierarchyNode as hierarchy_node table
    participant Document as document table
    participant HierarchyFetch as hierarchy_fetch_attempt table

    Note over Migration,DB: Migration Upgrade Process

    Migration->>DB: Create hierarchy_node table
    DB-->>Migration: Table created with columns:<br/>id, raw_node_id, display_name,<br/>link, source, node_type,<br/>document_id, parent_id
    
    Migration->>DB: Add indexes on hierarchy_node
    DB-->>Migration: Created ix_hierarchy_node_parent_id<br/>Created ix_hierarchy_node_source_type<br/>Created uq_hierarchy_node_raw_id_source

    Migration->>DB: Create hierarchy_fetch_attempt table
    DB-->>Migration: Table created with columns:<br/>id, connector_credential_pair_id,<br/>status, nodes_fetched, nodes_updated,<br/>timestamps

    Migration->>DB: Add indexes on hierarchy_fetch_attempt
    DB-->>Migration: Created ix_hierarchy_fetch_attempt_status<br/>Created ix_hierarchy_fetch_attempt_time_created<br/>Created ix_hierarchy_fetch_attempt_cc_pair

    loop For each DocumentSource
        Migration->>HierarchyNode: Insert SOURCE-type node
        Note right of HierarchyNode: raw_node_id = source value<br/>display_name from lookup dict<br/>node_type = 'source'<br/>parent_id = NULL
    end

    Migration->>Document: Add parent_hierarchy_node_id column
    Document-->>Migration: Column added (nullable)

    Migration->>DB: Create foreign key constraint
    DB-->>Migration: fk_document_parent_hierarchy_node created

    Migration->>DB: Create index on document.parent_hierarchy_node_id
    DB-->>Migration: ix_document_parent_hierarchy_node_id created

    Migration->>DB: Execute UPDATE query to backfill documents
    Note over DB: Query joins document_by_connector_credential_pair<br/>with connector to get source,<br/>then joins hierarchy_node to set parent
    DB-->>Migration: All existing documents linked to SOURCE nodes

    Note over Migration,DB: Migration Complete - Ready for hierarchy data
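
The backfill step in the diagram can be approximated with the sketch below. It runs against in-memory SQLite rather than PostgreSQL, so the SQL is not the migration's actual statement. The schemas are cut down to the columns the join needs, the column names on `document_by_connector_credential_pair` are assumed, and the sample rows are invented. The point it shows is the tie-break: when a document was indexed by several connectors, picking the minimum connector id makes the chosen SOURCE node deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE connector (id INTEGER PRIMARY KEY, source TEXT NOT NULL);
    CREATE TABLE hierarchy_node (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        node_type TEXT NOT NULL
    );
    CREATE TABLE document (
        id TEXT PRIMARY KEY,
        parent_hierarchy_node_id INTEGER REFERENCES hierarchy_node(id)
    );
    -- Assumed shape: one row per (document, connector) pairing.
    CREATE TABLE document_by_connector_credential_pair (
        id TEXT NOT NULL REFERENCES document(id),
        connector_id INTEGER NOT NULL REFERENCES connector(id),
        PRIMARY KEY (id, connector_id)
    );

    INSERT INTO connector VALUES (1, 'slack'), (2, 'google_drive');
    INSERT INTO hierarchy_node VALUES
        (100, 'slack', 'source'), (200, 'google_drive', 'source');
    INSERT INTO document VALUES
        ('doc-a', NULL), ('doc-b', NULL), ('doc-orphan', NULL);
    -- doc-a is reachable through both connectors; doc-orphan through none.
    INSERT INTO document_by_connector_credential_pair VALUES
        ('doc-a', 2), ('doc-a', 1), ('doc-b', 2);
    """
)

# Point each document at the SOURCE node of its lowest-id connector.
conn.execute(
    """
    UPDATE document
    SET parent_hierarchy_node_id = (
        SELECT hn.id
        FROM connector c
        JOIN hierarchy_node hn
          ON hn.source = c.source AND hn.node_type = 'source'
        WHERE c.id = (
            SELECT MIN(dcc.connector_id)
            FROM document_by_connector_credential_pair dcc
            WHERE dcc.id = document.id
        )
    )
    WHERE parent_hierarchy_node_id IS NULL
    """
)

parents = dict(conn.execute("SELECT id, parent_hierarchy_node_id FROM document"))
print(parents)
```

In this sketch a document with no connector pairing keeps a NULL parent, which is consistent with the column being added as nullable.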

cubic-dev-ai

No issues found across 3 files

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/alembic/versions/81c22b1e2e78_hierarchy_nodes_v1.py">

<violation number="1" location="backend/alembic/versions/81c22b1e2e78_hierarchy_nodes_v1.py:231">
P2: The index `ix_persona__hierarchy_node_persona_id` is redundant. The composite primary key `(persona_id, hierarchy_node_id)` already creates an index that efficiently supports lookups by `persona_id` alone. This extra index wastes disk space and adds write overhead. Consider removing this index (keep only the `hierarchy_node_id` index, which is necessary).</violation>
</file>
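
The reasoning behind this violation (a composite primary key's index already serves lookups on its leading column, but not on its trailing column) can be checked with a small experiment. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in; the PR targets PostgreSQL, whose planner output looks different, though the same B-tree leading-column behavior is the basis of the reviewer's claim. The `plan` helper and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE persona__hierarchy_node (
        persona_id INTEGER NOT NULL,
        hierarchy_node_id INTEGER NOT NULL,
        PRIMARY KEY (persona_id, hierarchy_node_id)
    )
    """
)


def plan(where_column: str) -> str:
    """Return the query-plan text for an equality lookup on one column."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        f"SELECT * FROM persona__hierarchy_node WHERE {where_column} = ?",
        (1,),
    ).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " ".join(row[3] for row in rows)


# Leading column of the primary key: served by the key's own index.
by_persona = plan("persona_id")
# Trailing column: no usable index yet, so the planner scans.
by_node_before = plan("hierarchy_node_id")

# Adding the one index the review says is necessary.
conn.execute(
    "CREATE INDEX ix_persona__hierarchy_node_hierarchy_node_id "
    "ON persona__hierarchy_node (hierarchy_node_id)"
)
by_node_after = plan("hierarchy_node_id")

print(by_persona)
print(by_node_before)
print(by_node_after)
```

The first and third plans report an index search while the second reports a scan, which matches the suggestion to drop the `persona_id` index and keep the `hierarchy_node_id` one.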

backend/alembic/versions/81c22b1e2e78_hierarchy_nodes_v1.py

evan-onyx requested a review from a team as a code owner January 14, 2026 02:20

cubic-dev-ai bot reviewed Jan 14, 2026

cubic-dev-ai bot reviewed Jan 15, 2026

evan-onyx force-pushed the feat/file-struct1 branch 10 times, most recently from a8d7850 to b6d0627, on January 23, 2026 23:52

evan-onyx force-pushed the feat/file-struct1 branch 4 times, most recently from a8a8894 to 290e9e3, on January 27, 2026 05:16

acaprau approved these changes Jan 27, 2026

evan-onyx added 11 commits January 27, 2026 15:11

feat(filesys): data models and migration

51b3c29

add connection table

37a05ae

cubic my goat

443309f

update head

db09ca7

hierarchyfetching time column

3609a60

update rev

0f13317

unique

9e942b2

update head

161dfe2

permissions

4ffa302

update

f535d04

document to persona connection

c3940b1

evan-onyx added 3 commits January 27, 2026 15:11

rebase migration

bbb228f

another migration rebase

ab9679d

cubic comments

86473fb

evan-onyx force-pushed the feat/file-struct1 branch from b7df988 to 86473fb on January 27, 2026 23:11

nit

9929509

evan-onyx added this pull request to the merge queue Jan 28, 2026

Merged via the queue into main with commit c2b11ca Jan 28, 2026
78 of 80 checks passed

evan-onyx deleted the feat/file-struct1 branch January 28, 2026 00:10

evan-onyx merged 15 commits into main from feat/file-struct1

2 participants
