[WIP]HIVE-29228: Support vended credentials for S3#6458
Conversation
0e22fb9 to
56887bb
Compare
| <dependency> | ||
| <groupId>org.apache.hive</groupId> | ||
| <artifactId>hive-exec</artifactId> | ||
| <version>${hive.version}</version> | ||
| <classifier>core</classifier> | ||
| <scope>provided</scope> |
There was a problem hiding this comment.
rest-catalog now depends on a hive-exec-core? scope was test-only before
There was a problem hiding this comment.
Unfortunately, Hive's authorization framework is currently located in hive-exec.
There was a problem hiding this comment.
hm, we should avoid hard coupling between HMS and HS2. is there some common place we could extract auth functionality?
| /** | ||
| * An S3 location. | ||
| */ | ||
| class S3Location { |
There was a problem hiding this comment.
should this be in common or server?
There was a problem hiding this comment.
SPI classes are located in metastore-server. As this is package-private, we can easily move to another package or project.
https://docs.google.com/document/d/1zJedKkYdxulquj34sHMhyLn7D-AiGPcelADiK-AnPNo/edit?usp=sharing
| /** | ||
| * The list of I/O operations to access a storage system. | ||
| */ | ||
| public enum StorageOperation { | ||
| LIST, READ, CREATE, DELETE, |
There was a problem hiding this comment.
Is WRITE/PUT already covered by CREATE?
There was a problem hiding this comment.
Yes.
BTW, using CREATE here is intentional. Amazon S3 supports both creation and overwrite as a single PUT API. Some storage services provide separate API endpoints for them. I intentionally expressed a creation here. We may support PUT here in the future. (probably lakehouse systems don't need a overwrite)
|
|
||
| // AWS Integration tests don't run by default. You need to set up your environment. | ||
| // | ||
| // ACCOUNT_ID={your AWS account ID} |
There was a problem hiding this comment.
could we configure it with docker cluster?
please see 350c4d4#diff-c712864f40434ee3783c8a59a6a21c92f5c8406b687452fe78971291d54ec8f7R36-R46
There was a problem hiding this comment.
If it is a testing purpose, it sounds ok. If it is for real integrations, AWS provides some better ways than pure env variables.
https://docs.google.com/document/d/1zJedKkYdxulquj34sHMhyLn7D-AiGPcelADiK-AnPNo/edit?tab=t.0#heading=h.tjvz8btavp51
There was a problem hiding this comment.
yes, for quick-start dev setup
henrib
left a comment
There was a problem hiding this comment.
1-May be vending should be aware of the storage operation.
2-An interface would open to other integrations, delegation to a different/centralized vending mechanism (Knox + Ranger comes to mind).
| * The list of I/O operations to access a storage system. | ||
| */ | ||
| public enum StorageOperation { | ||
| LIST, READ, CREATE, DELETE, |
There was a problem hiding this comment.
I agree; we should have the 'meta' (LIST, CREATE, DELETE) and the 'data' (READ-ONLY, READ-WRITE).
There was a problem hiding this comment.
Metadata & Data are concepts for Iceberg. So, it should be handled in IcebergVendedCredentialProvider. Fine-grained control against meta and data could be a potential idea. I'm not sure if we should immediately introduce it, as Polaris or Gravitino doesn't give such detailed access.
|
|
||
| public IcebergVendedCredentialProvider(Configuration conf) { | ||
| this.catalog = MetaStoreUtils.getDefaultCatalog(conf); | ||
| this.vendedCredentialProvider = new CompositeVendedCredentialProvider(conf); |
There was a problem hiding this comment.
This should be an interface; implementation should be fully configurable (class name in config).
There was a problem hiding this comment.
This is a facade, and its content can be configurable.
https://docs.google.com/document/d/1zJedKkYdxulquj34sHMhyLn7D-AiGPcelADiK-AnPNo/edit?tab=t.0#heading=h.nfdkm2jftq6s
|
3c5e969 to
f54653a
Compare
| List<HivePrivilegeObject> output) { | ||
| var context = new HiveAuthzContext.Builder().build(); | ||
| try { | ||
| authorizer.checkPrivileges(HiveOperationType.QUERY, input, output, context); |
There was a problem hiding this comment.
This would do multiple hops with ranger. Instead we could use another API showPrivileges() in ranger to get what privileges the user has on a particular asset. We can then prioritize the privilege in Hive and proceed accordingly. This would result in 1 hop with ranger.



What changes were proposed in this pull request?
Credential vending implementation. The decision points are articulated here.
https://docs.google.com/document/d/1zJedKkYdxulquj34sHMhyLn7D-AiGPcelADiK-AnPNo/edit?usp=sharing
https://issues.apache.org/jira/browse/HIVE-29228
I plan to implement
/v1/{prefix}/namespaces/{namespace}/tables/{table}/credentialsin another PR.Why are the changes needed?
For better security control.
Does this PR introduce any user-facing change?
No. It is disabled by default.
How was this patch tested?
I added integration tests.