Add a dataset card draft for CPPsite 2.0 #19

@Permea-lab-admin


Summary

Draft a dataset card for CPPsite 2.0 or another public cell-penetrating peptide (CPP) source that may be useful for future delivery benchmark review.

The card should describe the source, label meaning, assay context, release posture, limitations, and benchmark-readiness status. This is a documentation contribution, not a request to add row-level data.

Why this matters

Permea is building an open technical foundation for sequence-first delivery and expression engineering. Dataset cards help make delivery evidence easier to inspect before it is used in a benchmark surface.

Evidence on CPPs and membrane penetration belongs to an important adjacent task family. A careful dataset card can help contributors understand what a source supports, what it does not support, and what would be required before any benchmark task is proposed.

Suggested scope

Create a draft dataset card in an appropriate documentation location or propose the card content in the issue discussion first.

The draft should cover:

  • dataset or source name
  • source URL or citation
  • molecule or sequence type
  • label definition
  • positive and negative criteria, if available
  • assay or evidence context
  • public release posture
  • known limitations
  • possible benchmark task family
  • review status and open questions
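
As one way to keep a first draft narrow, the fields above could be captured in a small structure and rendered to plain text. This is only a sketch: the class name `DatasetCardDraft`, the field names, the placeholder defaults, and `render_card` are all hypothetical and are not defined anywhere in the repository docs. Every value is left as a placeholder so that nothing about the source is asserted before it has been checked.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetCardDraft:
    """Hypothetical container mirroring the suggested-scope bullets."""

    source_name: str = "TODO"
    source_url_or_citation: str = "TODO"
    molecule_or_sequence_type: str = "TODO"
    label_definition: str = "TODO"
    positive_and_negative_criteria: str = "not stated by source"
    assay_or_evidence_context: str = "TODO"
    # Default to uncertainty rather than assuming redistribution rights.
    public_release_posture: str = "unclear; no row-level redistribution assumed"
    known_limitations: List[str] = field(default_factory=list)
    possible_benchmark_task_family: str = "TODO"
    review_status: str = "draft"
    open_questions: List[str] = field(default_factory=list)


def render_card(card: DatasetCardDraft) -> str:
    """Render the card as plain text, one field per line or list."""
    lines = []
    for f in fields(card):
        value = getattr(card, f.name)
        heading = f.name.replace("_", " ").capitalize()
        if isinstance(value, list):
            lines.append(f"{heading}:")
            items = value or ["none recorded yet"]
            lines.extend(f"  - {item}" for item in items)
        else:
            lines.append(f"{heading}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Only the source name is filled in; everything else stays a placeholder.
    print(render_card(DatasetCardDraft(source_name="CPPsite 2.0")))
```

Leaving the defaults as "TODO" or "unclear" makes unfilled sections visible in review instead of silently omitted.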

Acceptance criteria

  • A draft dataset card is proposed with clear source attribution.
  • Label meaning is described conservatively.
  • Assay context and limitations are included where known.
  • Release posture is stated without assuming row-level redistribution rights.
  • Benchmark-readiness status is marked as draft, candidate, or review-needed.
  • The draft avoids unsupported biological, therapeutic, or clinical claims.
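
The mechanical parts of these criteria could be checked with a few lines of code before human review. The sketch below assumes a card represented as a plain dictionary; the key names, `REQUIRED_KEYS`, and `check_card` are hypothetical. It only tests that key sections are non-empty and that the readiness status is one of the three values named above. Whether labels are described conservatively, or whether a claim is unsupported, still needs a reviewer.

```python
# Status values taken from the acceptance criteria above.
ALLOWED_STATUSES = {"draft", "candidate", "review-needed"}

# Hypothetical key names for the sections a reviewer would expect to be filled in.
REQUIRED_KEYS = (
    "source_attribution",
    "label_meaning",
    "release_posture",
    "readiness_status",
)


def check_card(card: dict) -> list:
    """Return a list of mechanical problems; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        if not str(card.get(key, "")).strip():
            problems.append(f"missing or empty: {key}")
    status = str(card.get("readiness_status", "")).strip()
    if status and status not in ALLOWED_STATUSES:
        problems.append(f"unrecognized readiness_status: {status}")
    return problems


if __name__ == "__main__":
    draft = {
        "source_attribution": "TODO: citation for the chosen source",
        "label_meaning": "TODO",
        "release_posture": "unclear",
        "readiness_status": "draft",
    }
    print(check_card(draft))
```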

Claim boundaries

This issue does not ask for wet-lab validation, clinical interpretation, or a claim that a CPP dataset proves delivery behavior. The goal is source-aware documentation for possible future benchmark review.

References to relevant docs/files

  • docs/CONTRIBUTION_OBJECTS.md
  • docs/DELIVERY_DATASET_COMMONS.md
  • docs/SCIENTIFIC_THESIS.md
  • docs/BENCHMARK_EXECUTION_LAYER.md

Notes for contributors

Keep the first pass narrow and reviewable. If source rights or row-level release posture are unclear, document the uncertainty rather than uploading data.

Metadata


Labels

  • dataset-card: Dataset card contributions and dataset documentation.
  • documentation: Improvements or additions to documentation
  • good first issue: Good for newcomers
