Decoded-image cache (flash) for the training data pipeline

Decoded-image cache for DermaDetectDataset

Why

Fine-tuning in ddtrain is CPU/JPEG-decode bound, not GPU bound — during the Round-1 ablations the GPU sat at ~0% utilization while dataloader workers spent ~140 ms per image doing Image.open(...).convert("RGB") + downscale-from-original. The ResNet50 forward/backward finishes almost instantly and then waits on the next batch, so an expensive GPU idles behind the CPU decoder. This hits all three Round-1 streams equally (they share the same dataset/trainer over the same ~370k-image corpus).

What

A lazy, self-populating decoded-image cache keyed by dataset version + image size. The expensive decode + resize happens once per image; the result (a uint8 [image_size, image_size, 3] array) is written to flash and read back on every later epoch (and by every other stream).

  • Path: cache_dir/<version>/<image_size>/<uuid[:2]>/<uuid>.npy (sharded by uuid prefix so no directory holds 370k files).
  • Lazy: no build step. First epoch decodes + writes; later epochs read. Populates only images actually used (so it composes with subsampling).
  • Atomic writes (temp file + os.replace) → safe for concurrent dataloader workers and multiple streams writing the same flash cache.
  • Augmentation-safe: only the deterministic decode+resize is cached; the random train augments (flip / affine / normalize) still run per-epoch on the cached base image, so augmentation diversity is unchanged.
  • Fails open: a corrupt entry is rebuilt; a full/unwritable cache silently falls back to live decode — the cache can never break training.
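The properties listed above can be sketched in a few lines. This is a minimal illustration, not the ddtrain code: the function names `cache_path`, `write_cache`, and `load_image`, and the `decode_fn` callback standing in for the live decode + resize, are hypothetical. It assumes entries are plain `.npy` files written with `np.save`.

```python
import os
import tempfile
from pathlib import Path

import numpy as np


def cache_path(cache_dir, version, image_size, uuid):
    # Layout from the list above: <cache_dir>/<version>/<image_size>/<uuid[:2]>/<uuid>.npy
    return Path(cache_dir) / str(version) / str(image_size) / uuid[:2] / f"{uuid}.npy"


def write_cache(path, array):
    # Write to a temp file in the same directory, then rename into place.
    # os.replace is atomic on one filesystem, so readers never see a partial file.
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            np.save(f, array)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_image(uuid, decode_fn, cache_dir, version, image_size):
    # decode_fn(uuid) is a hypothetical stand-in for the live decode + resize;
    # it is expected to return a uint8 [image_size, image_size, 3] array.
    if cache_dir is None:
        return decode_fn(uuid)  # cache disabled: old live-decode behaviour

    path = cache_path(cache_dir, version, image_size, uuid)
    try:
        cached = np.load(path)
        if cached.shape == (image_size, image_size, 3) and cached.dtype == np.uint8:
            return cached  # hit
    except (OSError, ValueError, EOFError):
        pass  # miss, or corrupt entry: fall through and rebuild it

    array = decode_fn(uuid)
    try:
        write_cache(path, array)
    except OSError:
        pass  # full or unwritable cache: fail open and keep training
    return array
```

Random augmentations would then be applied to the array returned by `load_image`, so they still vary per epoch. Because the temp file lives in the same directory as the final path, the rename never crosses filesystems.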

Storing at the target size means the transform pipeline’s Resize is a no-op on a hit, so cached output is bit-identical to the live-decode path (verified: max|Δ| = 0).

How to enable

Add one line to any training config’s dataset: block (works for every model type and both streams’ configs):

dataset:
  cache_dir: /mnt/flash/dermadetect_cache  # flash SSD, ~55 GB for the full v1 corpus

/mnt/flash is an SSD-backed LVM volume (414 GB free) on the dev box. Leave cache_dir unset to keep the old live-decode behaviour.
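As a rough sanity check on the disk footprint, the size of a full cache can be estimated from the array shape. The sketch below assumes `image_size = 224` and a 128-byte `.npy` header per file; neither value is stated in this document, and filesystem block overhead is ignored. Under those assumptions the estimate lands close to the ~55 GB quoted above.

```python
NUM_IMAGES = 370_000      # approximate corpus size from the text
IMAGE_SIZE = 224          # ASSUMPTION: not stated in this document
CHANNELS = 3              # RGB, one uint8 byte per channel
NPY_HEADER_BYTES = 128    # ASSUMPTION: typical .npy header size for a small shape


def cache_bytes(num_images, image_size, channels=CHANNELS, header=NPY_HEADER_BYTES):
    # One file per image: raw uint8 pixels plus the .npy header.
    per_image = image_size * image_size * channels + header
    return num_images * per_image


total = cache_bytes(NUM_IMAGES, IMAGE_SIZE)
print(f"{total / 1e9:.1f} GB")  # prints "55.7 GB"
```

The footprint scales with the square of `image_size`, so a second cached resolution for the same corpus adds its own subtree under `<version>/<image_size>/`.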

Measured effect

Per-image image-loading cost on the v1 val split, eval transforms:

path                         ms/image
live JPEG decode + resize    ~140
cached .npy read             ~1–5

25–120× faster image loading (varies with OS page cache). Correctness: max|nocache − cache| = 0.000 over the sampled images (miss and hit). First epoch is unchanged (it populates); every subsequent epoch should be GPU-bound instead of decode-bound.
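A check of this kind can be reproduced with two small helpers. This is a hedged sketch, not the script used for the numbers above: `max_abs_diff` and `ms_per_image` are hypothetical names, and `load_fn` stands for whichever loading path (live decode or cached read) is being measured.

```python
import time

import numpy as np


def max_abs_diff(a, b):
    # Cast to float64 before subtracting: uint8 arithmetic wraps around
    # (0 - 255 would give 1), which would hide real differences.
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).max())


def ms_per_image(load_fn, keys):
    # Average wall-clock milliseconds per call of load_fn over the given keys.
    keys = list(keys)
    start = time.perf_counter()
    for key in keys:
        load_fn(key)
    return (time.perf_counter() - start) * 1000.0 / len(keys)
```

Timing the cached path twice in a row is worth doing, since the second pass is likely to be served from the OS page cache and will look faster than a cold read from flash.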

Changes

  • ddtrain/datasets/dataset.py: _load_image / _decode_image / _write_cache; new image_size, cache_dir, dataset_version params on DermaDetectDataset.
  • ddtrain/config.py: DatasetConfig.cache_dir.
  • ddtrain/training/trainer.py: passes cache_dir / image_size / version to both datasets.

No model, loss, or training-loop changes. Branch: feat/decode-tensor-cache off main, so Streams 1 & 2 can git merge it independently.
