This set of governance standards provides a foundation for managing ML projects effectively and professionally. Adapt and extend the standards as the project evolves; a well-defined governance framework is crucial for long-term success.
I. Versioning and Metadata Management:
Project Versioning:
Semantic Versioning (SemVer) will be used for the overall project version (e.g., v1.2.3). This allows for clear communication of changes between releases.
The project version will be stored in the project_config.yaml file and updated with each release.
Git tags will be used to mark specific project versions in the repository.
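As a rough illustration of the versioning rule above, the sketch below parses and bumps a MAJOR.MINOR.PATCH string such as v1.2.3. The names Version, parse_version and bump are hypothetical helpers, not part of the project's scripts, and the pattern deliberately ignores the pre-release and build-metadata suffixes that the full SemVer specification allows.

```python
import re
from typing import NamedTuple

# Simplified pattern: MAJOR.MINOR.PATCH with an optional leading "v".
# Pre-release and build-metadata suffixes are intentionally not handled.
_SEMVER = re.compile(r"^v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"v{self.major}.{self.minor}.{self.patch}"


def parse_version(text: str) -> Version:
    """Parse 'v1.2.3' or '1.2.3' into a Version (hypothetical helper)."""
    match = _SEMVER.match(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def bump(version: Version, part: str) -> Version:
    """Return the next version; lower-order fields reset to zero."""
    if part == "major":
        return Version(version.major + 1, 0, 0)
    if part == "minor":
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown version part: {part!r}")
```

Because Version is a tuple, versions compare numerically field by field, so v1.10.0 sorts after v1.9.9. The string form of the bumped version could then be written to project_config.yaml and used as the Git tag name.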
Component Versioning:
Versioning will be applied to individual components (data schemas, models, etc.) where applicable.
Component versions will also follow SemVer and be stored in the project_config.yaml file.
The project_metadata.json file will track the specific versions of each component used in a given project instance.
Metadata Store:
A project_metadata.json file will serve as the central repository for all project metadata.
This file will include:
Project and component versions.
Data schema versions.
Model versions.
Docker image tags.
Creation timestamps.
Any other relevant metadata.
Metadata Updates:
Metadata will be updated programmatically using the update_metadata.py script (or a similar mechanism).
Updates will occur after key events, such as model training, data schema changes, or deployment.
Updating the metadata programmatically at these points keeps it consistent with the actual state of the project.
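The update step described above might look something like the following. This is only a sketch of what a script such as update_metadata.py could do, not its actual contents: the function signature and the key names (last_event, updated_at) are assumptions made for illustration.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def update_metadata(path, event, updates):
    """Merge `updates` into the metadata file and record when and why.

    `event` is a free-form label such as "model_training" or "deployment".
    The key names written here are hypothetical, not a fixed schema.
    """
    path = Path(path)
    metadata = {}
    if path.exists():
        metadata = json.loads(path.read_text(encoding="utf-8"))

    metadata.update(updates)
    metadata["last_event"] = event
    metadata["updated_at"] = datetime.now(timezone.utc).isoformat()

    # Write to a temporary file in the same directory, then swap it in,
    # so readers never observe a half-written metadata file.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, sort_keys=True)
    os.replace(tmp_name, path)
    return metadata
```

A training job could call update_metadata("project_metadata.json", "model_training", {"model_version": "1.1.0"}) as its final step; existing keys that are not mentioned in the update are preserved.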
II. Dependency Management:
Dependency Declaration:
All project dependencies will be explicitly declared in the requirements.txt file.
This file will be generated and managed by the create_project.py script based on the dependencies specified in project_config.yaml.
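One way the generation step could work is sketched below. It assumes the configuration has already been parsed into a dictionary and that it contains a dependencies mapping from package name to a pip version specifier; that schema and the function name render_requirements are assumptions, since the actual layout of project_config.yaml and the internals of create_project.py are not shown here.

```python
def render_requirements(config: dict) -> str:
    """Render requirements.txt content from an already-parsed config.

    Assumed (hypothetical) schema:
        {"dependencies": {"pandas": "==2.2.0", "requests": None}}
    A missing or empty specifier means the package is left unpinned.
    """
    dependencies = config.get("dependencies") or {}
    lines = []
    # Sort case-insensitively so the generated file is stable across runs.
    for name in sorted(dependencies, key=str.lower):
        specifier = dependencies[name] or ""
        lines.append(f"{name}{specifier}\n")
    return "".join(lines)
```

Generating the file from a single source keeps requirements.txt from drifting away from the configuration, and the stable ordering keeps diffs small when a dependency changes.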
Virtual Environments:
Virtual environments will be used for all projects to isolate dependencies and prevent conflicts.
The setup script will automatically create a virtual environment if one doesn't exist.
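A minimal sketch of the create-if-missing behavior, using the standard library venv module. The function name ensure_venv and the default directory .venv are assumptions; the real setup script may use a different location or tool.

```python
import venv
from pathlib import Path


def ensure_venv(env_dir=".venv", with_pip=True) -> bool:
    """Create a virtual environment at `env_dir` unless one already exists.

    Returns True if a new environment was created, False if one was found.
    The presence of pyvenv.cfg is used as the marker for an existing one.
    """
    env_path = Path(env_dir)
    if (env_path / "pyvenv.cfg").is_file():
        return False
    venv.create(env_path, with_pip=with_pip)
    return True
```

Checking for pyvenv.cfg rather than just the directory avoids treating an empty or unrelated folder as a valid environment.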
Dependency Updates:
Dependencies will be reviewed and updated periodically to ensure compatibility and access to the latest features and security patches.
Dependency updates will be tested thoroughly before being incorporated into the project.
III. Security:
Service Accounts (GCP):
Separate service accounts will be used for different components (e.g., Dataflow, Cloud Functions) to follow the principle of least privilege.
The governance section of the project_config.yaml file will define the allowed service accounts.
Components will reference these governance-defined service accounts using variable substitution (e.g., $governance.service_account_dataflow).
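The substitution mechanism could be implemented along these lines. This is a sketch under assumptions: it treats the governance section as a flat dictionary, resolves only references of the form $governance.<key>, and fails loudly on unknown keys so that a component cannot silently fall back to an undeclared service account. The function name resolve_references and the example account address are hypothetical.

```python
import re

# Matches references such as $governance.service_account_dataflow
_REFERENCE = re.compile(r"\$governance\.([A-Za-z_][A-Za-z0-9_]*)")


def resolve_references(value, governance):
    """Recursively replace $governance.<key> references in a config value."""
    if isinstance(value, dict):
        return {k: resolve_references(v, governance) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_references(v, governance) for v in value]
    if isinstance(value, str):
        def _lookup(match):
            key = match.group(1)
            if key not in governance:
                raise KeyError(f"undefined governance reference: {match.group(0)}")
            return str(governance[key])
        return _REFERENCE.sub(_lookup, value)
    # Numbers, booleans and None pass through unchanged.
    return value


# Illustrative values only; the account address is made up.
governance = {
    "service_account_dataflow": "dataflow-sa@example-project.iam.gserviceaccount.com",
}
component = {
    "name": "ingest",
    "service_account": "$governance.service_account_dataflow",
    "workers": 2,
}
resolved = resolve_references(component, governance)
```

Raising on an undefined reference, rather than leaving the placeholder in place, turns a misconfigured component into an early failure instead of a deployment that runs under the wrong identity.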
Secrets Management:
Sensitive information (API keys, passwords, etc.) will never be stored directly in the configuration files or code.
A secure secrets management solution (such as Vault or Google Cloud Secret Manager) will be used to store and access secrets.
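The rule against storing secrets in configuration can be checked automatically. The sketch below is one possible heuristic, not an existing project tool: it walks a parsed configuration and reports keys whose names suggest a secret and whose values are literal strings. The secret:// prefix used to mark a reference into a secrets manager is a hypothetical convention introduced only for this example, and key-name matching will produce both false positives and false negatives.

```python
import re

# Hypothetical convention: values with this prefix point into a secrets
# manager and are therefore not plaintext secrets.
SECRET_REF_PREFIX = "secret://"

# Heuristic only: key names that commonly hold sensitive values.
_SENSITIVE_KEY = re.compile(r"(password|passwd|secret|token|api[_-]?key)", re.IGNORECASE)


def find_plaintext_secrets(config, prefix=""):
    """Return dotted paths of config entries that look like inline secrets."""
    findings = []
    if isinstance(config, dict):
        for key, value in config.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if isinstance(value, (dict, list)):
                findings.extend(find_plaintext_secrets(value, path))
            elif isinstance(value, str) and value and _SENSITIVE_KEY.search(str(key)):
                if not value.startswith(SECRET_REF_PREFIX):
                    findings.append(path)
    elif isinstance(config, list):
        for index, item in enumerate(config):
            findings.extend(find_plaintext_secrets(item, f"{prefix}[{index}]"))
    return findings
```

A check like this could run in code review or CI and fail the build when the returned list is non-empty.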
Code Reviews:
All code changes will undergo a code review process before being merged into the main branch.
Code reviews will focus on security best practices, code quality, and adherence to the project's coding standards.
IV. Reproducibility:
Configuration Management:
All project configurations will be stored in version-controlled configuration files (e.g., project_config.yaml).
This ensures that the project can be easily reproduced in different environments.
Automated Build and Deployment:
Automated build and deployment pipelines will be used to minimize manual steps and ensure consistency.
These pipelines will use the configuration files and metadata to build and deploy the project.
Data Versioning:
Data versioning will be implemented using appropriate tools (e.g., DVC, Git LFS) to track changes in datasets.
This allows for reproducibility of experiments and models.
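Tools in this space generally identify a dataset version by a hash of its content. The sketch below shows that underlying idea with the standard library only; it is not how DVC or Git LFS is invoked, and the function name dataset_fingerprint is hypothetical. The resulting digest could be recorded alongside the model version so that an experiment can later be matched to the exact data it used.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(path, chunk_size=1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks.

    Chunked reading keeps memory use flat for large dataset files.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes produce the same fingerprint regardless of name or location, and any change to the content produces a different one.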
V. Component Standards:
Naming Conventions:
Components will follow a consistent naming convention to clearly indicate their purpose and version (e.g., dataflow_process_files_and_load_to_bq_v1_2).
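A naming rule is easiest to enforce when it can be checked mechanically. The pattern below is inferred from the single example above (lowercase snake_case followed by _v<major>_<minor>) and is therefore an assumption about the convention rather than its official definition; parse_component_name is a hypothetical helper.

```python
import re

# Inferred from the example name: snake_case purpose, then _v<major>_<minor>.
_COMPONENT_NAME = re.compile(
    r"(?P<base>[a-z][a-z0-9]*(?:_[a-z0-9]+)*)_v(?P<major>\d+)_(?P<minor>\d+)"
)


def parse_component_name(name: str):
    """Split a component name into (base, major, minor) or raise ValueError."""
    match = _COMPONENT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"component name does not follow the convention: {name!r}")
    return match.group("base"), int(match.group("major")), int(match.group("minor"))
```

For the example name, this yields the base dataflow_process_files_and_load_to_bq with major version 1 and minor version 2; names with uppercase letters, hyphens, or a missing version suffix are rejected.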
Code Style:
Code will adhere to a defined style guide (e.g., PEP 8 for Python) to maintain consistency and readability.
Documentation:
All components will be well-documented, including their purpose, inputs, outputs, and dependencies.
VI. Governance Process:
Governance Committee:
A governance committee will be responsible for defining, maintaining, and enforcing these governance standards.
Standard Updates:
These standards will be reviewed and updated periodically to reflect best practices and evolving project needs.
Enforcement:
The governance committee will ensure that all projects adhere to these standards.