edflow.data.dataset module
Datasets TLDR
Datasets contain examples, which can be accessed by an index:
example = Dataset[index]
Each example is annotated by labels. These can be accessed via the
labels attribute of the dataset:
label = Dataset.labels[key][index]
To make a working dataset you need to implement a get_example() method, which must return a dict,
implement a __len__() method, and define the labels attribute, which must
be a dict but may be empty.
Warning
Datasets that are specified in the edflow config must accept one
positional argument, config!
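The requirements above can be illustrated with a minimal sketch. It is plain Python and deliberately does not import edflow. The class name SquaresDataset and the config key "n_examples" are made up for illustration, and the __getitem__ method is only a stand-in for the indexing that edflow's own dataset base class is assumed to provide.

```python
class SquaresDataset:
    """Hypothetical toy dataset: example i holds the number i and its square."""

    def __init__(self, config):
        # Datasets named in the edflow config receive the config as their
        # single positional argument. "n_examples" is a made-up key.
        self.n = config.get("n_examples", 4)
        # labels must be a dict (it may be empty); each entry can be
        # indexed by the example index.
        self.labels = {"is_even": [i % 2 == 0 for i in range(self.n)]}

    def get_example(self, idx):
        # Must return a dict.
        return {"x": idx, "x_squared": idx ** 2}

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Stand-in (assumption) for the indexing a base class would supply.
        return self.get_example(idx)


dataset = SquaresDataset({"n_examples": 3})
example = dataset[2]                  # example = Dataset[index]
label = dataset.labels["is_even"][2]  # label = Dataset.labels[key][index]
```

Keeping the labels in a separate dict means they can be read without building the full example, which is useful when examples are expensive to construct.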
If data loading is a bottleneck, take a look at the
LateLoadingDataset. You can define datasets that return examples
containing callables for heavy data loading, which are only executed by the
LateLoadingDataset. Placing this class last in your dataset
pipeline can potentially speed up your data loading.
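The idea of deferring heavy loading can be sketched in plain Python. This is not the LateLoadingDataset implementation, only an illustration of the pattern described above. The classes LazyExamples and ExecuteCallables and the helper _heavy_load are hypothetical names.

```python
class LazyExamples:
    """Hypothetical dataset whose examples defer the expensive work."""

    def __init__(self, n):
        self.n = n
        self.labels = {}
        self.load_calls = 0  # counts how often the heavy step actually runs

    def _heavy_load(self, idx):
        self.load_calls += 1
        return [idx] * 3  # placeholder for e.g. reading a file from disk

    def get_example(self, idx):
        # Return a callable instead of the loaded data.
        return {"index": idx, "data": lambda: self._heavy_load(idx)}

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return self.get_example(idx)


class ExecuteCallables:
    """Hypothetical last pipeline stage: resolves callables in each example."""

    def __init__(self, base):
        self.base = base
        self.labels = base.labels

    def get_example(self, idx):
        example = self.base[idx]
        # Call every callable value; pass all other values through unchanged.
        return {k: v() if callable(v) else v for k, v in example.items()}

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        return self.get_example(idx)


lazy = LazyExamples(5)
deferred = lazy[1]                  # nothing heavy has run yet
calls_before = lazy.load_calls
loaded = ExecuteCallables(lazy)[1]  # heavy step runs only here
calls_after = lazy.load_calls
```

Because intermediate stages only pass callables along, the expensive step runs once, and only for examples that actually reach the end of the pipeline.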