Summary
Draft a dataset card for CPPsite 2.0 or another public cell-penetrating peptide (CPP) source that may be useful for future delivery benchmark review.
The card should describe the source, label meaning, assay context, release posture, limitations, and benchmark-readiness status. This is a documentation contribution, not a request to add row-level data.
Why this matters
Permea is building an open technical foundation for sequence-first delivery and expression engineering. Dataset cards help make delivery evidence easier to inspect before it is used in a benchmark surface.
CPP and membrane-penetration evidence belongs to an important adjacent task family. A careful dataset card can help contributors understand what a source supports, what it does not support, and what would be required before any benchmark task is proposed.
Suggested scope
Create a draft dataset card in an appropriate documentation location or propose the card content in the issue discussion first.
The draft should cover:
- dataset or source name
- source URL or citation
- molecule or sequence type
- label definition
- positive and negative criteria, if available
- assay or evidence context
- public release posture
- known limitations
- possible benchmark task family
- review status and open questions
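As a starting point, the checklist above could be captured as a small template. The following is only a sketch: the `DatasetCard` class, its field names, and the `render_card` helper are hypothetical and are not defined anywhere in the Permea docs. Every value defaults to a `TODO` placeholder so that nothing about the source is asserted before it has been checked.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetCard:
    """Hypothetical card layout: one field per item in the checklist above."""
    source_name: str = "TODO"
    source_url_or_citation: str = "TODO"
    molecule_or_sequence_type: str = "TODO"
    label_definition: str = "TODO"
    positive_and_negative_criteria: str = "TODO (if available)"
    assay_or_evidence_context: str = "TODO"
    public_release_posture: str = "TODO"
    known_limitations: str = "TODO"
    possible_benchmark_task_family: str = "TODO"
    review_status_and_open_questions: str = "TODO"


def render_card(card: DatasetCard) -> str:
    """Render the card as plain 'Field name: value' lines for review."""
    lines = []
    for f in fields(card):
        title = f.name.replace("_", " ").capitalize()
        lines.append(f"{title}: {getattr(card, f.name)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Only the source name is filled in; everything else stays a visible TODO.
    print(render_card(DatasetCard(source_name="CPPsite 2.0")))
```

Leaving unfilled fields as explicit `TODO` text keeps gaps visible in review instead of hiding them behind empty strings.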
Acceptance criteria
- A draft dataset card is proposed with clear source attribution.
- Label meaning is described conservatively.
- Assay context and limitations are included where known.
- Release posture is stated without assuming row-level redistribution rights.
- Benchmark-readiness status is marked as draft, candidate, or review-needed.
- The draft avoids unsupported biological, therapeutic, or clinical claims.
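The mechanical parts of these criteria could be checked with a short script before human review. This is a minimal sketch under stated assumptions: the dictionary keys, `REQUIRED_FIELDS`, and `check_card` are hypothetical names, and the script only checks that fields are present and that the readiness status is one of the three values listed above. Whether labels are described conservatively, or whether a claim is unsupported, still needs a reviewer.

```python
# Status values taken from the acceptance criteria above.
ALLOWED_STATUSES = {"draft", "candidate", "review-needed"}

# Hypothetical key names for the fields the criteria require.
REQUIRED_FIELDS = (
    "source_attribution",
    "label_meaning",
    "release_posture",
    "readiness_status",
)


def check_card(card: dict) -> list:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not str(card.get(name, "")).strip():
            problems.append(f"missing: {name}")
    status = str(card.get("readiness_status", "")).strip().lower()
    if status and status not in ALLOWED_STATUSES:
        problems.append(f"unknown readiness_status: {status}")
    return problems


if __name__ == "__main__":
    example = {
        "source_attribution": "TODO: citation",
        "label_meaning": "TODO: as defined by the source",
        "release_posture": "unclear; row-level redistribution not assumed",
        "readiness_status": "draft",
    }
    print(check_card(example))
```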
Claim boundaries
This issue does not ask for wet-lab validation, clinical interpretation, or a claim that a CPP dataset proves delivery behavior. The goal is source-aware documentation for possible future benchmark review.
References to relevant docs/files
docs/CONTRIBUTION_OBJECTS.md
docs/DELIVERY_DATASET_COMMONS.md
docs/SCIENTIFIC_THESIS.md
docs/BENCHMARK_EXECUTION_LAYER.md
Notes for contributors
Keep the first pass narrow and reviewable. If source rights or row-level release posture are unclear, document the uncertainty rather than uploading data.