feat: add optional JSON functions support #1466
Draft
crm26 wants to merge 1 commit into apache:main
Add `datafusion-functions-json` as an optional feature (`json`), giving Python users `json_get_str`, `json_get`, `->`, `->>` and other JSON operators in SQL queries.

When built with `--features json`, JSON functions are automatically registered with every `SessionContext`. Default builds are unaffected.

Tested locally: `json_get_str` extracts values, nested paths work, and `GROUP BY` on extracted JSON fields works.

Changes:
- Add `datafusion-functions-json` to workspace dependencies
- Add optional dependency and `json` feature flag to core crate
- Register JSON functions in `SessionContext` creation when feature is enabled

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
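For readers who want to picture the SQL usage described above, here is a minimal sketch. It assumes a `datafusion` wheel built with `--features json` as this PR proposes; the table name `events`, the column name `payload`, and the sample rows are hypothetical and not taken from the PR. The `json_get_str(column, key, key, ...)` call shape (one key per nesting level) is an assumption about how `datafusion-functions-json` addresses nested paths.

```python
import json

# Hypothetical sample data: each payload is a JSON document stored as a string.
ROWS = [
    {"user": {"name": "ada", "country": "UK"}},
    {"user": {"name": "linus", "country": "FI"}},
    {"user": {"name": "alan", "country": "UK"}},
]
PAYLOADS = [json.dumps(row) for row in ROWS]

# Assumption: json_get_str takes the JSON column followed by one key per level.
QUERY = """
SELECT json_get_str(payload, 'user', 'country') AS country,
       count(*) AS n
FROM events
GROUP BY json_get_str(payload, 'user', 'country')
ORDER BY country
"""


def run_query():
    """Run QUERY against an in-memory table; needs a build with the json feature."""
    # Imported lazily so the module still loads where datafusion is not installed.
    from datafusion import SessionContext

    ctx = SessionContext()
    ctx.from_pydict({"payload": PAYLOADS}, name="events")
    return ctx.sql(QUERY).to_pydict()


if __name__ == "__main__":
    print(run_query())
```

On a default build without the `json` feature, the `ctx.sql(...)` call would be expected to fail because `json_get_str` is not registered.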
Force-pushed from 7bac57e to 4c7d253
Member
Sounds like a very nice feature to have, but it would mean that we're now pulling in non-official code/functions into the official release. I don't know if that's a hard blocker, but I do want to bring the topic up on the mailing list for a wider audience before we merge this. I'm moving it to draft for that reason.
Member
Also, for this PR to go in, we would want first-class DataFrame API support, not just SQL support, plus unit tests to cover it. Since you're using Claude, you might be able to use at least portions of the skill I've started working on in #1460 to help write those pieces. But first, let's get a temperature read on the community. I'm 50/50 on the idea.
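As one possible starting point for the unit tests requested above, the sketch below is a pure-Python reference that a test could use to compute expected values for `json_get_str`. The helper name `expected_json_get_str` is hypothetical, and the semantics it encodes are assumptions to be checked against `datafusion-functions-json`: string keys index objects, non-negative integers index arrays, and the result is NULL (`None`) for invalid JSON, a missing path, or a non-string value.

```python
import json
from typing import Optional, Union


def expected_json_get_str(doc: Optional[str], *path: Union[str, int]) -> Optional[str]:
    """Reference result for json_get_str(doc, *path) under the stated assumptions."""
    if doc is None:
        return None
    try:
        value = json.loads(doc)
    except json.JSONDecodeError:
        # Assumption: invalid JSON yields NULL rather than an error.
        return None
    for key in path:
        if isinstance(key, str) and isinstance(value, dict) and key in value:
            value = value[key]
        elif (
            isinstance(key, int)
            and not isinstance(key, bool)
            and isinstance(value, list)
            and 0 <= key < len(value)
        ):
            value = value[key]
        else:
            # Missing key, out-of-range index, or wrong container type.
            return None
    # Only JSON strings are returned; numbers, objects, etc. map to NULL.
    return value if isinstance(value, str) else None
```

A test could then compare each row returned by the SQL query with `expected_json_get_str` applied to the same input, so expected values are not hand-written per case.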
Author
Thanks Tim. I have use cases that need JSON support. I am seeing a material
speedup using DataFusion over DuckDB with the unofficial library. Let me
know how I can help.
Thanks,
Christian
(Sent by email in reply to Tim Saucer's comment of Mon, Mar 30, 2026, 8:32 AM on apache/datafusion-python#1466.)
Which issue does this PR close?
Closes N/A — new feature request.
Rationale
DataFusion Python users currently have no way to query JSON fields in SQL. The `datafusion-functions-json` crate (under `datafusion-contrib`) provides `json_extract`, `json_get`, `->`, `->>` and other JSON operators, but these are only available in Rust. This PR exposes them to Python users via an optional feature flag.
What changes are included in this PR?
- Add `datafusion-functions-json` (v0.53) to workspace dependencies
- Add optional dependency and `json` feature flag to core crate
- Register JSON functions in `SessionContext` creation when feature is enabled

3 files changed, 11 insertions.
Are these changes tested?
Not yet. Requesting feedback on approach before adding tests. Tests would verify:
- `json_extract_string(col, '$.path')` works in SQL queries
- Default build (without the `json` feature) compiles and runs without regression
- JSON functions are registered on `SessionContext()` creation

Are there any user-facing changes?
When built with `--features json`, JSON functions are automatically registered with every `SessionContext`. Default builds are unaffected.
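Because the feature is optional, Python code may want to check at runtime whether the installed build includes it. The sketch below is one way to probe for that, not something this PR provides: the helper name `has_json_functions` is hypothetical, and it assumes that an unregistered function makes `ctx.sql(...)` or `.collect()` raise an exception.

```python
def has_json_functions(ctx) -> bool:
    """Return True if `ctx` can plan and run a trivial json_get_str query."""
    probe = """SELECT json_get_str('{"a": "b"}', 'a')"""
    try:
        ctx.sql(probe).collect()
    except Exception:
        # Assumption: a build without the json feature fails here because
        # json_get_str is not registered with the session.
        return False
    return True


if __name__ == "__main__":
    # Imported lazily so the helper can be exercised without datafusion installed.
    from datafusion import SessionContext

    print(has_json_functions(SessionContext()))
```

Catching a broad `Exception` keeps the probe independent of the exact error type, at the cost of also hiding unrelated failures, so a real implementation might narrow it.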
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>