feat: add storage_options to BasePath#6297
feat: add storage_options to BasePath#6297cmccabe wants to merge 7 commits intolance-format:mainfrom
Conversation
For Azure, different storage accounts can have buckets (containers) of the same name. Also storage account is the unit of rate limit cap. When expressed as base, today we try to match base based on the prefix, and that would mean it can only match 1 bucket but not the others if user defines multiple bases of the same buckets in different storage accounts. This PR adds the ability to define storage_options as a part of the base definition.
PR Review: feat: add storage_options to BasePathThe feature design is sound — storage options as part of base path identity is the right approach for Azure multi-account scenarios. Proto design, Java/Python bindings, conflict resolution, and ambiguity handling all look good. Two issues to flag: P1:
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Xuanwo
left a comment
There was a problem hiding this comment.
Thank you for working on this, only some questions.
| input_params: Option<&'a ObjectStoreParams>, | ||
| ) -> Cow<'a, ObjectStoreParams> { | ||
| if let Some(params) = input_params | ||
| && base_path.storage_options.is_empty() |
There was a problem hiding this comment.
if base_path.storage_options.is_empty(), we will return the input_params directly so that the base_path_prefix doesn't work anymore.
| { | ||
| return Cow::Borrowed(params); | ||
| } | ||
| let mut merged_storage_options = base_path.storage_options.clone(); |
There was a problem hiding this comment.
Hi, can you add a comments for object_store_params_for_base_path to declare the expected merge order? It seems we can to use input_params to override the base path's params.
| // Storage options relevant to identifying the base path. For example, azure_storage_account_name | ||
| // goes here. Credentials and options which don't affect the identity of the data being stored | ||
| // should not be placed here. | ||
| map<string, string> storage_options = 5; |
There was a problem hiding this comment.
inconveniently, this is a spec change which means it needs a vote
There was a problem hiding this comment.
In this file, we have a place that needs to be changed around ExternalBaseResolver::resolve_external_uri
This becomes interesting with this change, because I can now have multiple bases of the same prefix, and the resolution logic cannot determine which one to use.
I think there is a way to resolve this, by checking the external link's query parameters, for example I can write az://container/file.jpg?base=1 to indicate the exact base to use. @Xuanwo curious about your opinion on this.
Another alternative commonly used is az://container@account/... but that feels too Azure/OCI specific.
For Azure, different storage accounts can have buckets (containers) of the same name. Also storage account is the unit of rate limit cap.
When expressed as base, today we try to match base based on the prefix, and that would mean it can only match 1 bucket but not the others if user defines multiple bases of the same buckets in different storage accounts.
This PR adds the ability to define storage_options as a part of the base definition.