Loader downloads audio even when InputMode.TEXT_ONLY is specified #2

@cleanshafay

Description

When the MMUSEDFallacy loader is used with InputMode.TEXT_ONLY, it still downloads and processes audio files. This is unexpected and wastes bandwidth and storage, especially in constrained environments.


Steps to Reproduce:

  1. Install and import the dataset loader.

  2. Use the following snippet to load the dataset in TEXT_ONLY mode:

    from pathlib import Path
    from mamkit.datasets import MMUSEDFallacy, InputMode
    
    base_data_path = Path('./data')
    loader = MMUSEDFallacy(
        task_name='afc',
        input_mode=InputMode.TEXT_ONLY,
        base_data_path=base_data_path
    )
    split_info = loader.get_splits('mm-argfallacy-2025')

  3. Observe that the loader initiates:

    • Downloading audio files.
    • Extracting and building audio clips.

Expected Behavior:

When input_mode=InputMode.TEXT_ONLY is set, the loader should:

  • Skip downloading audio (download_audio)
  • Skip generating clips (generate_clips)
  • Avoid referencing audio paths like snippet_paths and dialogue_paths

Actual Behavior:

Despite requesting text-only mode:

  • The loader downloads audio via self.download_audio() in load().
  • Audio clips are generated via self.generate_clips().

These steps run whenever the audio_clips directory (checked through self.clips_path) does not already exist, regardless of input_mode:

if not self.clips_path.exists():
    logging.info('Downloading audio data...')
    self.download_audio()
    logging.info('Download completed!')

    logging.info('Building audio clips...')
    self.generate_clips()
    logging.info('Build completed')

    shutil.rmtree(self.audio_path)

Suggested Fix:

Conditionally skip the audio download/clip generation in load() based on the input_mode value:

if self.input_mode != InputMode.TEXT_ONLY and not self.clips_path.exists():
    ...
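
To illustrate the effect of that guard without touching the real package, here is a minimal, self-contained sketch. Everything in it is a stand-in: `LoaderSketch`, the `calls` list, the `audio_steps_for` helper, and the `AUDIO_ONLY` enum member are hypothetical names introduced for this example, and the `audio_clips` directory name is assumed from the description above. It is not mamkit's actual implementation, only the control flow being proposed.

```python
import enum
import tempfile
from pathlib import Path


class InputMode(enum.Enum):
    # Stand-in for mamkit's InputMode. Only TEXT_ONLY appears in this issue;
    # AUDIO_ONLY is an assumed second member, used here for contrast.
    TEXT_ONLY = "text-only"
    AUDIO_ONLY = "audio-only"


class LoaderSketch:
    """Hypothetical stand-in for the loader; not mamkit's actual class."""

    def __init__(self, input_mode, base_data_path):
        self.input_mode = input_mode
        self.clips_path = Path(base_data_path) / "audio_clips"
        self.calls = []  # records which audio steps actually ran

    def download_audio(self):
        # Placeholder: the real method would fetch audio files.
        self.calls.append("download_audio")

    def generate_clips(self):
        # Placeholder: the real method would cut clips from the audio.
        self.calls.append("generate_clips")
        self.clips_path.mkdir(parents=True, exist_ok=True)

    def load(self):
        # Proposed guard: do audio work only when audio is requested
        # and the clips have not been built yet.
        if self.input_mode != InputMode.TEXT_ONLY and not self.clips_path.exists():
            self.download_audio()
            self.generate_clips()


def audio_steps_for(input_mode):
    """Run load() in a fresh temporary directory and return the audio steps taken."""
    with tempfile.TemporaryDirectory() as tmp:
        loader = LoaderSketch(input_mode, tmp)
        loader.load()
        return loader.calls
```

In this sketch, `audio_steps_for(InputMode.TEXT_ONLY)` returns an empty list, while the audio mode runs both `download_audio` and `generate_clips`. The same kind of check could serve as a regression test for the real loader once the guard is in place.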
