Loading IIRC and incremental datasets

Usage with incremental-CIFAR100, IIRC-CIFAR100, incremental-Imagenet, and IIRC-Imagenet

[1]:
import sys
sys.path.append("../..")

from iirc.datasets_loader import get_lifelong_datasets
from iirc.definitions import PYTORCH, IIRC_SETUP
from iirc.utils.download_cifar import download_extract_cifar100

To use these datasets with the preset task schedules, the original CIFAR100 and/or ImageNet2012 datasets need to be downloaded first.

In the case of CIFAR100, the dataset can be downloaded using the following method

[2]:
download_extract_cifar100("../../data")
downloading CIFAR 100
dataset downloaded
extracting CIFAR 100
dataset extracted

In the case of ImageNet, it has to be downloaded manually and arranged in the following manner:

  • Imagenet
      • train
          • n01440764
          • n01443537
          • …
      • val
          • n01440764
          • n01443537
          • …

Then the get_lifelong_datasets function should be used. The task schedules/configurations preset per dataset are:

  • Incremental-CIFAR100: 10 configurations, each starting with 50 classes in the first task, followed by 10 tasks each having 5 classes

  • IIRC-CIFAR100: 10 configurations, each starting with 10 superclasses in the first task, followed by 21 tasks each having 5 classes

  • Incremental-Imagenet-full: 5 configurations, each starting with 160 classes in the first task, followed by 28 tasks each having 30 classes

  • Incremental-Imagenet-lite: 5 configurations, each starting with 160 classes in the first task, followed by 9 tasks each having 30 classes

  • IIRC-Imagenet-full: 5 configurations, each starting with 63 superclasses in the first task, followed by 34 tasks each having 30 classes

  • IIRC-Imagenet-lite: 5 configurations, each starting with 63 superclasses in the first task, followed by 9 tasks each having 30 classes

Although these configurations might seem limiting, the point is to have a standard set of tasks and class orders so that results are comparable across different works. If needed, new task configurations can also be added manually in the metadata folder.

We also need a transformations function that takes a PIL image and converts it to a tensor, and optionally normalizes the image, applies augmentations, etc.

There are two such functions that can be provided: essential_transforms_fn and augmentation_transforms_fn

essential_transforms_fn should include any essential transformations that should be applied to the PIL image (such as convert to tensor), while augmentation_transforms_fn should also include the essential transformations, in addition to any augmentations that need to be applied (such as random horizontal flipping, etc)

[3]:
import torchvision.transforms as transforms

essential_transforms_fn = transforms.ToTensor()
augmentation_transforms_fn = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])
[4]:
# The datasets supported are ("incremental_cifar100", "iirc_cifar100", "incremental_imagenet_full", "incremental_imagenet_lite",
# "iirc_imagenet_full", "iirc_imagenet_lite")
lifelong_datasets, tasks, class_names_to_idx = \
    get_lifelong_datasets(dataset_name = "iirc_cifar100",
                          dataset_root = "../../data", # the ImageNet folder (where the train and val folders reside), or the parent directory of the cifar-100-python folder
                          setup = IIRC_SETUP,
                          framework = PYTORCH,
                          tasks_configuration_id = 0,
                          essential_transforms_fn = essential_transforms_fn,
                          augmentation_transforms_fn = augmentation_transforms_fn,
                          joint = False
                         )
Creating iirc_cifar100
Setup used: IIRC
Using PyTorch
Dataset created

joint can also be set to True for joint training (all classes come in a single task)

The result of the previous function has the following form:

[5]:
lifelong_datasets # four splits
[5]:
{'train': <iirc.lifelong_dataset.torch_dataset.Dataset at 0x20f7746a670>,
 'intask_valid': <iirc.lifelong_dataset.torch_dataset.Dataset at 0x20f7567bf70>,
 'posttask_valid': <iirc.lifelong_dataset.torch_dataset.Dataset at 0x20f7567bfa0>,
 'test': <iirc.lifelong_dataset.torch_dataset.Dataset at 0x20f7567bfd0>}
[6]:
print(tasks[:3])
[['flowers', 'small_mammals', 'trees', 'aquatic_mammals', 'fruit_and_vegetables', 'people', 'food_containers', 'vehicles', 'large_carnivores', 'insects'], ['television', 'spider', 'shrew', 'mountain', 'hamster'], ['road', 'poppy', 'household_furniture', 'woman', 'bee']]
[7]:
print(class_names_to_idx)
{'flowers': 0, 'small_mammals': 1, 'trees': 2, 'aquatic_mammals': 3, 'fruit_and_vegetables': 4, 'people': 5, 'food_containers': 6, 'vehicles': 7, 'large_carnivores': 8, 'insects': 9, 'television': 10, 'spider': 11, 'shrew': 12, 'mountain': 13, 'hamster': 14, 'road': 15, 'poppy': 16, 'household_furniture': 17, 'woman': 18, 'bee': 19, 'tulip': 20, 'clock': 21, 'orange': 22, 'beaver': 23, 'rocket': 24, 'bicycle': 25, 'can': 26, 'squirrel': 27, 'wardrobe': 28, 'bus': 29, 'whale': 30, 'sweet_pepper': 31, 'telephone': 32, 'leopard': 33, 'bowl': 34, 'skyscraper': 35, 'baby': 36, 'cockroach': 37, 'boy': 38, 'lobster': 39, 'motorcycle': 40, 'forest': 41, 'tank': 42, 'orchid': 43, 'chair': 44, 'crab': 45, 'girl': 46, 'keyboard': 47, 'otter': 48, 'bed': 49, 'butterfly': 50, 'lawn_mower': 51, 'snail': 52, 'caterpillar': 53, 'wolf': 54, 'pear': 55, 'tiger': 56, 'pickup_truck': 57, 'cup': 58, 'reptiles': 59, 'train': 60, 'sunflower': 61, 'beetle': 62, 'apple': 63, 'palm_tree': 64, 'plain': 65, 'large_omnivores_and_herbivores': 66, 'rose': 67, 'tractor': 68, 'crocodile': 69, 'mushroom': 70, 'couch': 71, 'lamp': 72, 'mouse': 73, 'bridge': 74, 'turtle': 75, 'willow_tree': 76, 'man': 77, 'lizard': 78, 'maple_tree': 79, 'lion': 80, 'elephant': 81, 'seal': 82, 'sea': 83, 'dinosaur': 84, 'worm': 85, 'bear': 86, 'castle': 87, 'plate': 88, 'dolphin': 89, 'medium_sized_mammals': 90, 'streetcar': 91, 'bottle': 92, 'kangaroo': 93, 'snake': 94, 'house': 95, 'chimpanzee': 96, 'raccoon': 97, 'porcupine': 98, 'oak_tree': 99, 'pine_tree': 100, 'possum': 101, 'skunk': 102, 'fish': 103, 'fox': 104, 'cattle': 105, 'ray': 106, 'aquarium_fish': 107, 'cloud': 108, 'flatfish': 109, 'rabbit': 110, 'trout': 111, 'camel': 112, 'table': 113, 'shark': 114}
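class_names_to_idx assigns each class (superclass or subclass) a fixed integer label that stays consistent across tasks, so a task's class names can be converted to training targets by a simple lookup. A small illustration using an excerpt of the mapping printed above:

```python
# Excerpt of the class_names_to_idx mapping shown above
class_names_to_idx = {'flowers': 0, 'small_mammals': 1, 'television': 10}

# Convert a (hypothetical) task's class names to integer targets
task_classes = ['television', 'flowers']
targets = [class_names_to_idx[c] for c in task_classes]
print(targets)  # [10, 0]
```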

lifelong_datasets has four splits:

  • train: the training data

  • intask_valid: for validation during task training (in the IIRC setup, this split uses incomplete information, like the train split)

  • posttask_valid: for validation after each task's training (in the IIRC setup, this split uses complete information, like the test split)

  • test: the test data
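The splits are typically consumed one task at a time: a task is selected on each split, and iteration then only sees that task's samples. The MockSplit class below is a hypothetical stand-in that sketches this pattern, not the iirc Dataset API itself (see the lifelong_dataset documentation for the real task-selection methods):

```python
# Hedged sketch of the per-task access pattern; MockSplit is a
# hypothetical stand-in for one split of lifelong_datasets.
class MockSplit:
    def __init__(self, tasks):
        self.tasks = tasks            # one list of samples per task
        self.current = self.tasks[0]  # samples of the active task

    def choose_task(self, task_id):
        # Restrict the split to the samples of a single task
        self.current = self.tasks[task_id]

    def __len__(self):
        return len(self.current)

    def __getitem__(self, idx):
        return self.current[idx]

# Two toy tasks of (sample, label) pairs
split = MockSplit([
    [("img_a", 0), ("img_b", 0)],
    [("img_c", 1), ("img_d", 1), ("img_e", 1)],
])

sizes = []
for task_id in range(2):
    split.choose_task(task_id)
    sizes.append(len(split))  # only the active task's samples are visible

print(sizes)  # [2, 3]
```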
