When using the MMUSEDFallacy loader with InputMode.TEXT_ONLY, the loader still proceeds to download and process audio files, which is unexpected behavior. This causes unnecessary bandwidth and storage usage, especially in constrained environments.
Steps to Reproduce:
- Install and import the dataset loader.
- Use the following snippet to load the dataset in TEXT_ONLY mode:
from pathlib import Path
from mamkit.datasets import MMUSEDFallacy, InputMode
base_data_path = Path('./data')
loader = MMUSEDFallacy(
    task_name='afc',
    input_mode=InputMode.TEXT_ONLY,
    base_data_path=base_data_path
)
split_info = loader.get_splits('mm-argfallacy-2025')
- Observe that the loader initiates:
  - Downloading audio files.
  - Extracting and building audio clips.
Expected Behavior:
When input_mode=InputMode.TEXT_ONLY is set, the loader should:
- Skip downloading audio (download_audio)
- Skip generating clips (generate_clips)
- Avoid referencing audio paths like snippet_paths and dialogue_paths
Actual Behavior:
Despite requesting text-only mode:
- The loader downloads audio via self.download_audio() in load().
- Audio clips are generated via self.generate_clips().
These steps run regardless of input_mode whenever audio_clips does not already exist:
if not self.clips_path.exists():
    logging.info('Downloading audio data...')
    self.download_audio()
    logging.info('Download completed!')
    logging.info('Building audio clips...')
    self.generate_clips()
    logging.info('Build completed')
    shutil.rmtree(self.audio_path)
Suggested Fix:
Conditionally skip the audio download/clip generation in load() based on the input_mode value:
if self.input_mode != InputMode.TEXT_ONLY and not self.clips_path.exists():
    ...
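
To make the proposed guard concrete, here is a minimal, self-contained sketch that does not import mamkit. `SketchLoader` and `run_sketch` are hypothetical stand-ins, the local `InputMode` enum only mimics the real one (the `AUDIO_ONLY` member is an assumption added for contrast), and the `audio_clips` location under the base path is assumed. It only illustrates the condition and is not the library's actual implementation.

```python
from enum import Enum, auto
from pathlib import Path
import tempfile


class InputMode(Enum):
    """Stand-in for mamkit's InputMode (assumption: not the real enum)."""
    TEXT_ONLY = auto()
    AUDIO_ONLY = auto()  # assumed member, used here only for contrast


class SketchLoader:
    """Hypothetical stand-in for MMUSEDFallacy, reduced to the audio guard."""

    def __init__(self, input_mode, base_data_path):
        self.input_mode = input_mode
        # Assumed location of the clips folder, for illustration only.
        self.clips_path = Path(base_data_path) / 'audio_clips'
        self.calls = []  # records which audio steps actually ran

    def download_audio(self):
        self.calls.append('download_audio')

    def generate_clips(self):
        self.calls.append('generate_clips')
        self.clips_path.mkdir(parents=True, exist_ok=True)

    def load(self):
        # Proposed guard: do audio work only when the mode uses audio
        # and the clips have not been built yet.
        if self.input_mode != InputMode.TEXT_ONLY and not self.clips_path.exists():
            self.download_audio()
            self.generate_clips()
        return self.calls


def run_sketch(input_mode):
    """Run load() against a fresh temporary directory and return the calls made."""
    with tempfile.TemporaryDirectory() as tmp:
        return SketchLoader(input_mode, tmp).load()


print(run_sketch(InputMode.TEXT_ONLY))   # []
print(run_sketch(InputMode.AUDIO_ONLY))  # ['download_audio', 'generate_clips']
```

With the guard in place, the text-only run performs no audio steps, while a mode that needs audio still downloads and builds clips on first use. The same check would also need to cover any later code that reads snippet_paths or dialogue_paths in text-only mode.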