Skip to content

[WIP]HIVE-29228: Support vended credentials for S3#6458

Draft
okumin wants to merge 9 commits into
apache:masterfrom
okumin:HIVE-29228-vended-credentials
Draft

[WIP]HIVE-29228: Support vended credentials for S3#6458
okumin wants to merge 9 commits into
apache:masterfrom
okumin:HIVE-29228-vended-credentials

Conversation

@okumin

@okumin okumin commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Credential vending implementation. The decision points are articulated here.
https://docs.google.com/document/d/1zJedKkYdxulquj34sHMhyLn7D-AiGPcelADiK-AnPNo/edit?usp=sharing

https://issues.apache.org/jira/browse/HIVE-29228

I plan to implement /v1/{prefix}/namespaces/{namespace}/tables/{table}/credentials in another PR.

Why are the changes needed?

For better security control.

Does this PR introduce any user-facing change?

No. It is disabled by default.

How was this patch tested?

I added integration tests.

Comment on lines +29 to +34
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
<classifier>core</classifier>
<scope>provided</scope>

@deniskuzZ deniskuzZ Apr 30, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest-catalog now depends on a hive-exec-core? scope was test-only before

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, Hive's authorization framework is currently located in hive-exec.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, we should avoid hard coupling between HMS and HS2. is there some common place we could extract auth functionality?

/**
* An S3 location.
*/
class S3Location {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be in common or server?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPI classes are located in metastore-server. As this is package-private, we can easily move to another package or project.
https://docs.google.com/document/d/1zJedKkYdxulquj34sHMhyLn7D-AiGPcelADiK-AnPNo/edit?usp=sharing

Comment on lines +21 to +25
/**
* The list of I/O operations to access a storage system.
*/
public enum StorageOperation {
LIST, READ, CREATE, DELETE,

@deniskuzZ deniskuzZ Apr 30, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is WRITE/PUT already covered by CREATE?

@okumin okumin May 2, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

BTW, using CREATE here is intentional. Amazon S3 supports both creation and overwrite as a single PUT API. Some storage services provide separate API endpoints for them. I intentionally expressed a creation here. We may support PUT here in the future. (probably lakehouse systems don't need a overwrite)


// AWS Integration tests don't run by default. You need to set up your environment.
//
// ACCOUNT_ID={your AWS account ID}

@deniskuzZ deniskuzZ Apr 30, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we configure it with docker cluster?
please see 350c4d4#diff-c712864f40434ee3783c8a59a6a21c92f5c8406b687452fe78971291d54ec8f7R36-R46

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is a testing purpose, it sounds ok. If it is for real integrations, AWS provides some better ways than pure env variables.
https://docs.google.com/document/d/1zJedKkYdxulquj34sHMhyLn7D-AiGPcelADiK-AnPNo/edit?tab=t.0#heading=h.tjvz8btavp51

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, for quick-start dev setup

@henrib henrib left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1-May be vending should be aware of the storage operation.
2-An interface would open to other integrations, delegation to a different/centralized vending mechanism (Knox + Ranger comes to mind).

* The list of I/O operations to access a storage system.
*/
public enum StorageOperation {
LIST, READ, CREATE, DELETE,

@henrib henrib May 1, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; we should have the 'meta' (LIST, CREATE, DELETE) and the 'data' (READ-ONLY, READ-WRITE).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata & Data are concepts for Iceberg. So, it should be handled in IcebergVendedCredentialProvider. Fine-grained control against meta and data could be a potential idea. I'm not sure if we should immediately introduce it, as Polaris or Gravitino doesn't give such detailed access.


public IcebergVendedCredentialProvider(Configuration conf) {
this.catalog = MetaStoreUtils.getDefaultCatalog(conf);
this.vendedCredentialProvider = new CompositeVendedCredentialProvider(conf);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an interface; implementation should be fully configurable (class name in config).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sonarqubecloud

sonarqubecloud Bot commented May 3, 2026

Copy link
Copy Markdown

List<HivePrivilegeObject> output) {
var context = new HiveAuthzContext.Builder().build();
try {
authorizer.checkPrivileges(HiveOperationType.QUERY, input, output, context);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would do multiple hops with ranger. Instead we could use another API showPrivileges() in ranger to get what privileges the user has on a particular asset. We can then prioritize the privilege in Hive and proceed accordingly. This would result in 1 hop with ranger.

@okumin

okumin commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants