
feat(posthog): track message origin analytics in posthog #7313

Merged: rohoswagger merged 5 commits into main from roshan/chat-message-origin-analytics on Jan 10, 2026

Conversation


@rohoswagger rohoswagger commented Jan 9, 2026

Description

Add message origin analytics in PostHog for all chat messages.

How Has This Been Tested?

in posthog

Additional Options

  • [Optional] Override Linear Check

Summary by cubic

Track the origin of every chat message (web app, Chrome extension, API, Slack) and send analytics to PostHog. Improves visibility into usage across surfaces without changing user behavior.

  • New Features
    • Added MessageOrigin enum and threaded it through SendMessageRequest/CreateChatMessageRequest.
    • Set origin automatically per entry point: webapp (default), chrome_extension (UI), api (server APIs), slackbot (Slack handler), unknown (fallback). Force API origin when authenticated via API key or PAT.
    • Backend emits user_message_sent with origin, has_files, has_project, has_persona, deep_research, and tenant_id.
    • Frontend (extension) emits extension_chat_query with extension_context, assistant_id, has_files, and deep_research.
    • Safe telemetry loading via versioned fetch with noop fallback to avoid runtime errors when telemetry is unavailable.
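A minimal sketch of how an origin enum might be threaded through the two request models named above. The enum values come from the summary; the dataclasses are simplified stand-ins (the real models have many more fields and are probably not dataclasses), and `to_create_request` and the defaults shown are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class MessageOrigin(str, Enum):
    """Where a chat message entered the system (values from the PR summary)."""

    WEBAPP = "webapp"
    CHROME_EXTENSION = "chrome_extension"
    API = "api"
    SLACKBOT = "slackbot"
    UNKNOWN = "unknown"


@dataclass
class SendMessageRequest:
    # Simplified stand-in; the default here is an assumption.
    message: str
    origin: MessageOrigin = MessageOrigin.UNKNOWN


@dataclass
class CreateChatMessageRequest:
    # Simplified stand-in; the default here is an assumption.
    message: str
    origin: MessageOrigin = MessageOrigin.UNKNOWN


def to_create_request(req: SendMessageRequest) -> CreateChatMessageRequest:
    """Hypothetical translation step that carries the origin along."""
    return CreateChatMessageRequest(message=req.message, origin=req.origin)


req = SendMessageRequest(message="hi", origin=MessageOrigin.CHROME_EXTENSION)
print(to_create_request(req).origin.value)  # chrome_extension
```

The point of the sketch is only that the translation step copies `origin` explicitly; if it were omitted there, the downstream default would silently win.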

Written for commit 3ef78fe. Summary will update on new commits.

@rohoswagger rohoswagger requested a review from a team as a code owner January 9, 2026 17:58

@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 9 files


@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Overview

This PR adds message origin analytics tracking to PostHog across all chat entry points (webapp, Chrome extension, API, Slack). The implementation includes:

What Changed

  1. Backend Models: Added MessageOrigin enum with 5 values (webapp, chrome_extension, api, slackbot, unknown)
  2. Request Models: Added origin field to both SendMessageRequest and CreateChatMessageRequest
  3. Analytics Tracking: Integrated PostHog telemetry in process_message.py with fallback for unavailable telemetry module
  4. Frontend Origin Detection: Detects whether requests come from webapp or Chrome extension using getExtensionContext()
  5. API Endpoint Updates: Set explicit origins (API, SLACKBOT) at relevant entry points

Critical Issue Found

Frontend Origin Not Propagated to Backend: In useChatController.ts, the origin (chrome_extension or webapp) is correctly determined and set in the local CurrentMessageFIFO stack (line 696), but this origin is never actually passed to the backend. The origin remains stuck in client-side state and isn't included in the sendMessage() API call parameters. This means:

  • All frontend messages (both webapp and extension) will arrive at the backend with the default origin (WEBAPP)
  • The Chrome extension origin tracking will fail silently
  • Analytics will show all frontend messages as "webapp" regardless of actual source
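The failure mode described above can be illustrated with a small sketch. `parse_origin` is a hypothetical stand-in for backend request validation, assuming a missing `origin` field falls back to a `webapp` default as the review states; it is not the project's actual code.

```python
from enum import Enum
from typing import Any, Mapping


class MessageOrigin(str, Enum):
    WEBAPP = "webapp"
    CHROME_EXTENSION = "chrome_extension"


def parse_origin(payload: Mapping[str, Any]) -> MessageOrigin:
    """Hypothetical validation step: a missing field becomes the default."""
    return MessageOrigin(payload.get("origin", MessageOrigin.WEBAPP.value))


# The client detects the extension but never adds the value to the body.
detected_origin = MessageOrigin.CHROME_EXTENSION
body_without_origin = {"message": "hello"}
body_with_origin = {"message": "hello", "origin": detected_origin.value}

print(parse_origin(body_without_origin).value)  # webapp
print(parse_origin(body_with_origin).value)  # chrome_extension
```

Nothing raises in the first case, which is why the review calls the failure silent: the request is valid, only mislabeled.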

Secondary Issues

  1. Inconsistent Default Origins: SendMessageRequest defaults to WEBAPP while CreateChatMessageRequest defaults to UNKNOWN, causing unpredictable behavior
  2. No Origin Override in API Endpoints: The OneShotQARequest doesn't accept an origin parameter, so hardcoding API origin may mask other use cases
  3. Type Duplication: MessageOrigin type is duplicated between Python and TypeScript, risking sync issues
  4. Missing Type Hints: PostHog properties dict lacks explicit typing for better IDE support and correctness checking
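For the missing type hints in item 4, one option is a `TypedDict` for the event properties. This is a sketch under assumptions: the key names come from the PR summary, while `UserMessageSentProperties`, `build_properties`, its parameters, and the rules used to derive each flag are hypothetical.

```python
from typing import Optional, TypedDict


class UserMessageSentProperties(TypedDict):
    """Hypothetical typed shape for the user_message_sent event properties."""

    origin: str
    has_files: bool
    has_project: bool
    has_persona: bool
    deep_research: bool
    tenant_id: str


def build_properties(
    origin: str,
    file_count: int,
    project_id: Optional[int],
    persona_id: Optional[int],
    deep_research: bool,
    tenant_id: str,
) -> UserMessageSentProperties:
    # The derivation of each flag is an assumption, not the PR's logic.
    return UserMessageSentProperties(
        origin=origin,
        has_files=file_count > 0,
        has_project=project_id is not None,
        has_persona=persona_id is not None,
        deep_research=deep_research,
        tenant_id=tenant_id,
    )


props = build_properties("api", 2, None, 7, False, "tenant-a")
print(props["has_files"], props["has_project"])  # True False
```

With this shape, a type checker can flag a misspelled key or a non-boolean flag before the event is emitted.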

Risk Assessment

  • High Risk: The frontend origin detection is completely non-functional due to missing parameter propagation. Users relying on extension origin tracking will get incorrect data
  • Medium Risk: Analytics accuracy depends on internal API endpoints properly setting the origin, which is only partially done
  • Low Risk: Fallback mechanism for PostHog module is well-designed and shouldn't cause runtime errors

The feature is incomplete and would send inaccurate telemetry if merged as-is.

Confidence Score: 2/5

  • NOT SAFE TO MERGE - Critical functionality gap: frontend origin detection is implemented but never sent to backend, causing silent failure of Chrome extension origin tracking.
  • Score reflects a critical logic bug where the frontend correctly detects the message origin (webapp vs extension) but fails to propagate this information to the backend API call. This means the entire Chrome extension origin tracking feature is non-functional. Additionally, inconsistent default values between request models and lack of origin parameter support in some API endpoints create further correctness issues. The PostHog telemetry fallback is well-designed (using fetch_versioned_implementation_with_fallback), which prevents runtime errors, but the data being sent will be inaccurate.
  • web/src/app/chat/hooks/useChatController.ts (critical), backend/onyx/server/query_and_chat/models.py (defaults), backend/onyx/server/query_and_chat/query_backend.py (hardcoded origin)

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| backend/onyx/server/query_and_chat/models.py | 4/5 | Added MessageOrigin enum with 5 values (webapp, chrome_extension, api, slackbot, unknown). Added origin field to SendMessageRequest (default: WEBAPP) and CreateChatMessageRequest (default: UNKNOWN). Inconsistent defaults between the two models could cause analytics confusion. |
| backend/onyx/chat/process_message.py | 4/5 | Added PostHog event_telemetry tracking with origin and other analytics properties. Uses fetch_versioned_implementation_with_fallback with noop_fallback for safe loading. The properties dict lacks explicit type hints; otherwise the pattern is good. |
| backend/onyx/server/query_and_chat/query_backend.py | 3/5 | Sets origin=MessageOrigin.API in get_answer_stream. However, it hardcodes the origin without checking whether OneShotQARequest might have its own origin field, which could cause the caller's intent to be ignored. |
| web/src/app/chat/hooks/useChatController.ts | 2/5 | Determines messageOrigin correctly (chrome_extension vs webapp) and sets it in CurrentMessageFIFO stack. However, this origin is never actually passed through to the sendMessage API call; it sits in the local stack but isn't included in the actual HTTP request parameters. |

Sequence Diagram

sequenceDiagram
    participant Frontend as Frontend (useChatController)
    participant API as Backend API<br/>(send-chat-message)
    participant Chat as Chat Processor<br/>(handle_stream_message_objects)
    participant PostHog as PostHog Analytics

    Frontend->>Frontend: getExtensionContext()<br/>Determine: webapp vs chrome_extension
    Frontend->>Frontend: Set origin in CurrentMessageFIFO
    Note over Frontend: ⚠️ ISSUE: origin not passed<br/>to sendMessage() params
    
    Frontend->>API: sendMessage(params)<br/>origin field MISSING!
    
    API->>Chat: SendMessageRequest<br/>with origin (defaults to WEBAPP)
    
    Chat->>Chat: Extract origin from request
    Chat->>PostHog: event_telemetry("user_message_sent")<br/>properties: {origin, has_files,<br/>has_project, has_persona,<br/>deep_research, tenant_id}
    
    PostHog-->>Chat: Success
    
    Chat->>Chat: Continue message processing
    Chat-->>API: Stream response
    API-->>Frontend: Streamed packets

@rohoswagger commented

@greptile


@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR adds comprehensive message origin tracking across all chat message entry points (webapp, Chrome extension, API, Slack) and emits telemetry events to PostHog for analytics.

What Changed

Backend:

  • Added MessageOrigin enum with values: webapp, chrome_extension, api, slackbot, unknown
  • Extended SendMessageRequest and CreateChatMessageRequest models with an origin field (defaults to UNKNOWN)
  • Modified process_message.py to emit user_message_sent PostHog events with origin and contextual properties (has_files, has_project, has_persona, deep_research, tenant_id)
  • Updated all backend entry points to set appropriate origin values:
    • API endpoints in ee/chat_backend.py and query_backend.py set origin=API
    • Slack handler sets origin=SLACKBOT
    • Main chat endpoint trusts client-provided origin value

Frontend:

  • Added MessageOrigin TypeScript type matching backend enum
  • Created getExtensionContext() utility to detect extension usage based on URL pathname
  • Modified useChatController.ts to determine origin (webapp vs chrome_extension) and pass it when sending messages
  • Added frontend PostHog tracking for extension queries with extension_chat_query event
  • Used safe telemetry loading pattern with fetch_versioned_implementation_with_fallback and noop fallback

How It Fits

The changes integrate cleanly with existing telemetry infrastructure (mt_cloud_telemetry, PostHog client). The origin field threads through the entire message flow from entry point → request models → processing → telemetry emission. The implementation follows established patterns for versioned feature loading and falls back gracefully when telemetry is unavailable.
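The graceful fallback described here can be sketched generically. This does not reproduce the signature of `fetch_versioned_implementation_with_fallback`; `load_or_fallback`, the module name, and the arguments passed to `event_telemetry` are all hypothetical and only illustrate the general pattern.

```python
import importlib
from typing import Any, Callable


def noop_fallback(*args: Any, **kwargs: Any) -> None:
    """Accept any arguments and do nothing."""
    return None


def load_or_fallback(
    module: str, attribute: str, fallback: Callable[..., Any]
) -> Callable[..., Any]:
    """Return module.attribute if it can be imported, else the fallback."""
    try:
        return getattr(importlib.import_module(module), attribute)
    except (ImportError, AttributeError):
        return fallback


# The module below does not exist, so the no-op is returned.
event_telemetry = load_or_fallback(
    "hypothetical_telemetry_module", "event_telemetry", noop_fallback
)

# Safe to call either way; message processing is never blocked by telemetry.
event_telemetry("user-1", "user_message_sent", {"origin": "webapp"})
```

Because the caller always receives a callable, the message-processing path needs no conditional around the telemetry call.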

Confidence Score: 5/5

  • Safe to merge - no breaking changes, proper fallback handling, non-blocking telemetry tracking
  • The implementation is well-structured with proper error handling (noop fallback for telemetry), type safety (enum on backend, union type on frontend), and follows existing codebase patterns. The only issue is a non-critical data integrity concern where the main web endpoint trusts client-provided origin values, but this doesn't break functionality.
  • No files require special attention - the single non-blocking issue flagged in chat_backend.py is a data integrity improvement, not a functional bug

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| backend/onyx/server/query_and_chat/models.py | 5/5 | Adds MessageOrigin enum and origin field to SendMessageRequest and CreateChatMessageRequest models with UNKNOWN default |
| backend/onyx/chat/process_message.py | 5/5 | Adds PostHog telemetry tracking for user_message_sent event with origin, files, project, persona, and deep_research properties; properly propagates origin through message flow |
| backend/onyx/chat/chat_utils.py | 5/5 | Adds optional origin parameter to prepare_chat_message_request, defaulting to UNKNOWN if not provided |
| backend/ee/onyx/server/query_and_chat/chat_backend.py | 5/5 | Sets origin=MessageOrigin.API for both simplified chat endpoints, correctly identifying API-sourced messages |
| backend/onyx/server/query_and_chat/query_backend.py | 5/5 | Sets origin=MessageOrigin.API for one-shot Q&A endpoint with helpful comment explaining the choice |
| backend/onyx/onyxbot/slack/handlers/handle_regular_answer.py | 5/5 | Sets origin=MessageOrigin.SLACKBOT when preparing chat messages from Slack |
| web/src/app/chat/services/lib.tsx | 5/5 | Adds MessageOrigin type and origin parameter to sendMessage function, defaults to 'unknown' with comment about explicit setting |
| web/src/app/chat/hooks/useChatController.ts | 5/5 | Determines origin via getExtensionContext, passes to sendMessage, and tracks extension_chat_query event in PostHog for extension users |
| web/src/lib/extension/utils.ts | 5/5 | Adds getExtensionContext function to detect extension usage based on pathname (/chat/nrf paths) |
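The pathname-based detection attributed to getExtensionContext could look roughly like the sketch below. It is written in Python purely for illustration (the actual utility is TypeScript), and everything beyond the `/chat/nrf` prefix is an assumption: the exact matching rule, the `ExtensionContext` shape, and the use of the pathname as a placeholder `context` value.

```python
from dataclasses import dataclass
from typing import Optional

EXTENSION_PATH_PREFIX = "/chat/nrf"


@dataclass(frozen=True)
class ExtensionContext:
    is_extension: bool
    context: Optional[str]  # placeholder; the real context values are not shown


def get_extension_context(pathname: str) -> ExtensionContext:
    """Treat /chat/nrf and anything beneath it as an extension page."""
    if pathname == EXTENSION_PATH_PREFIX or pathname.startswith(
        EXTENSION_PATH_PREFIX + "/"
    ):
        return ExtensionContext(is_extension=True, context=pathname)
    return ExtensionContext(is_extension=False, context=None)


def origin_for(pathname: str) -> str:
    """Map the detected context to an origin string."""
    if get_extension_context(pathname).is_extension:
        return "chrome_extension"
    return "webapp"


print(origin_for("/chat/nrf"))  # chrome_extension
print(origin_for("/chat"))  # webapp
```

Matching on the prefix plus a path separator avoids classifying an unrelated route such as `/chat/nrfx` as an extension page.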

Sequence Diagram

sequenceDiagram
    participant User
    participant Frontend
    participant ExtensionUtils
    participant SendMessage
    participant Backend
    participant ProcessMessage
    participant PostHog
    
    User->>Frontend: Sends chat message
    Frontend->>ExtensionUtils: getExtensionContext()
    ExtensionUtils-->>Frontend: {isExtension, context}
    Frontend->>Frontend: Determine origin<br/>(webapp or chrome_extension)
    
    Frontend->>SendMessage: sendMessage(message, origin)
    SendMessage->>Backend: POST /api/chat/send-chat-message<br/>SendMessageRequest{origin}
    
    Backend->>ProcessMessage: handle_stream_message_objects(new_msg_req)
    ProcessMessage->>ProcessMessage: Translate to CreateChatMessageRequest<br/>with origin field
    
    ProcessMessage->>PostHog: event_telemetry("user_message_sent",<br/>{origin, has_files, has_project, etc})
    PostHog-->>ProcessMessage: (async tracking)
    
    ProcessMessage-->>Backend: Stream packets
    Backend-->>Frontend: SSE stream
    
    alt Extension user
        Frontend->>PostHog: capture("extension_chat_query",<br/>{extension_context, assistant_id, etc})
    end
    
    Frontend-->>User: Display response


greptile-apps bot commented Jan 9, 2026

Additional Comments (1)

backend/onyx/server/query_and_chat/chat_backend.py (Line: 510:516)
[P2] The /send-chat-message endpoint trusts the client-provided origin field without validation. API clients calling this endpoint could set origin="webapp" or origin="chrome_extension", polluting telemetry data. Consider determining origin server-side based on authentication method or request characteristics (e.g., API key vs web session), or restrict this endpoint to web UI only and direct API clients to use dedicated API endpoints like those in ee/chat_backend.py.
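The server-side determination suggested in this comment might be sketched as follows. `AuthMethod` and `resolve_origin` are hypothetical names, and how the real endpoint learns the authentication method is not shown in this thread; the sketch only illustrates overriding the client-provided value when the caller authenticated with an API key or PAT.

```python
from enum import Enum


class MessageOrigin(str, Enum):
    WEBAPP = "webapp"
    CHROME_EXTENSION = "chrome_extension"
    API = "api"
    UNKNOWN = "unknown"


class AuthMethod(Enum):
    """Hypothetical classification of how the request was authenticated."""

    WEB_SESSION = "web_session"
    API_KEY = "api_key"
    PAT = "pat"


def resolve_origin(
    client_origin: MessageOrigin, auth_method: AuthMethod
) -> MessageOrigin:
    """Override the client-supplied origin for programmatic callers."""
    if auth_method in (AuthMethod.API_KEY, AuthMethod.PAT):
        return MessageOrigin.API
    return client_origin


print(resolve_origin(MessageOrigin.WEBAPP, AuthMethod.API_KEY).value)  # api
print(resolve_origin(MessageOrigin.CHROME_EXTENSION, AuthMethod.WEB_SESSION).value)  # chrome_extension
```

A web session can still claim either `webapp` or `chrome_extension`, so this narrows the spoofing surface rather than removing it.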


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 34 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="web/tailwind-themes/tailwind.config.js">

<violation number="1">
P3: The comment contradicts the code - it says the plugin "is not needed here" but then immediately adds it. Consider rewording to clarify that the plugin IS needed for Tailwind v3 and will become unnecessary after upgrading to v4.</violation>
</file>


@rohoswagger rohoswagger enabled auto-merge January 9, 2026 22:25

@wenxi-onyx wenxi-onyx left a comment


Two minor comments :)

const { forcedToolIds } = useForcedTools();
const { fetchProjects, setCurrentMessageFiles, beginUpload } =
  useProjectsContext();
const posthog = usePostHog();
@wenxi-onyx (Member):

Does it make sense to only do this if we know it's the chrome extension (since we only emit for chrome ext)? I.e. could there be unnecessary performance impact in web app?

TBH, I'm really not sure about this, so relying on you to decide :)

@rohoswagger (Contributor Author):

idt the performance impact would be measurable, our chat is not super quick anyways LOL

@rohoswagger rohoswagger added this pull request to the merge queue Jan 9, 2026
@wenxi-onyx wenxi-onyx removed this pull request from the merge queue due to a manual request Jan 9, 2026
@rohoswagger rohoswagger added this pull request to the merge queue Jan 10, 2026
Merged via the queue into main with commit e2b60bf Jan 10, 2026
79 of 81 checks passed
@rohoswagger rohoswagger deleted the roshan/chat-message-origin-analytics branch January 10, 2026 00:15

2 participants