edflow.data.dataset module

Datasets TLDR

Datasets contain examples, which can be accessed by an index:

example = Dataset[index]

Each example is annotated by labels. These can be accessed via the labels attribute of the dataset:

label = Dataset.labels[key][index]

To make a working dataset, you need to implement a get_example() method that returns a dict, implement a __len__() method, and define the labels attribute, which must be a dict (it may be empty).
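The three requirements above can be sketched in plain Python. This is only an illustration and does not import edflow: the class name SquaresDataset, the config key "n_examples" and the label key "is_even" are hypothetical, and the __getitem__ method is added here only so that indexing works in a standalone script (in a real project you would presumably inherit indexing from the edflow base class instead).

```python
class SquaresDataset:
    """Hypothetical toy dataset; a plain-Python stand-in, not an edflow class."""

    def __init__(self, config):
        # One positional argument, the config. The key "n_examples" is
        # made up for this sketch.
        self.n = config.get("n_examples", 5)
        # labels must be a dict (it may be empty). Each value here is
        # indexable by the example index.
        self.labels = {"is_even": [i % 2 == 0 for i in range(self.n)]}

    def get_example(self, idx):
        # Must return a dict.
        return {"index": idx, "square": idx * idx}

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Assumption: added so the sketch runs on its own; it simply
        # delegates to get_example().
        return self.get_example(idx)


dataset = SquaresDataset({"n_examples": 4})
example = dataset[3]                  # example = Dataset[index]
label = dataset.labels["is_even"][3]  # label = Dataset.labels[key][index]
```

Keeping labels as a dict of index-aligned sequences means a label can be read without building the full example, which matches the access pattern Dataset.labels[key][index] shown earlier.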

Warning

Datasets that are specified in the edflow config must accept one positional argument, config!

If data loading is a concern, take a look at the LateLoadingDataset. You can define datasets that return examples containing callables for the heavy data loading; these callables are only executed by the LateLoadingDataset. Placing this class last in your dataset pipeline can potentially speed up your data loading.
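The idea of deferring heavy loading can be illustrated with a small standalone sketch. It is not the LateLoadingDataset implementation and does not import edflow: LazyFiles and LateLoadingSketch are hypothetical names, the "expensive" load is simulated, and only top-level callables in the example dict are resolved.

```python
class LazyFiles:
    """Hypothetical dataset whose examples carry a callable instead of data."""

    def __init__(self, n):
        self.n = n
        self.labels = {}
        self.load_count = 0  # counts how often the heavy load really ran

    def _load(self, idx):
        # Stand-in for an expensive operation such as reading an image.
        self.load_count += 1
        return [idx] * 3

    def get_example(self, idx):
        # The heavy part is wrapped in a callable and not executed here.
        return {"index": idx, "data": lambda: self._load(idx)}

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return self.get_example(idx)


class LateLoadingSketch:
    """Hypothetical last pipeline stage that executes the callables."""

    def __init__(self, base):
        self.base = base
        self.labels = base.labels

    def get_example(self, idx):
        example = self.base[idx]
        # Call every top-level callable, pass other values through unchanged.
        return {k: v() if callable(v) else v for k, v in example.items()}

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        return self.get_example(idx)


lazy = LazyFiles(10)
pending = lazy[2]                      # cheap: nothing has been loaded yet
loads_before = lazy.load_count
loaded = LateLoadingSketch(lazy)[2]    # the callable is executed here
```

Because intermediate stages only pass callables along, any filtering or reordering done before the last stage does not pay for loading data that is later discarded.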

Summary

Reference