edflow.data.dataset module
Datasets TLDR
Datasets contain examples, which can be accessed by an index:
example = Dataset[index]
Each example is annotated by labels. These can be accessed via the labels attribute of the dataset:
label = Dataset.labels[key][index]
To make a working dataset you need to implement a get_example() method, which must return a dict, implement a __len__() method, and define the labels attribute, which must be a dict (it may be empty).
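The three requirements above can be sketched in plain Python. This is a minimal illustration, not edflow's own implementation: the class name SquaresDataset and its example keys are made up, and the __getitem__ method is only an assumed stand-in for whatever indexing behaviour the library supplies.

```python
class SquaresDataset:
    """Hypothetical dataset: example i holds the number i and its square."""

    def __init__(self):
        self.n = 4
        # labels must be a dict; each entry is indexable as labels[key][index].
        self.labels = {"is_even": [i % 2 == 0 for i in range(self.n)]}

    def get_example(self, index):
        # Every example must be a dict.
        return {"x": index, "x_squared": index ** 2}

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        # Assumed stand-in so that dataset[index] works in this sketch.
        return self.get_example(index)


dataset = SquaresDataset()
example = dataset[2]
label = dataset.labels["is_even"][2]
```

Here example is a dict and label is read straight from the labels attribute, without building the example first.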
Warning
Datasets that are specified in the edflow config must accept one positional argument, config!
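As a rough sketch of this constraint, a constructor can take config as its only positional argument and read its settings from it. The class name RangeDataset and the "length" key are hypothetical, and config is assumed here to behave like a dict.

```python
class RangeDataset:
    """Hypothetical dataset configured entirely through one `config` argument."""

    def __init__(self, config):
        # "length" is a made-up key used only for this illustration.
        self.length = config["length"]
        self.labels = {"index": list(range(self.length))}

    def get_example(self, index):
        return {"value": index}

    def __len__(self):
        return self.length


config = {"length": 5}
configured = RangeDataset(config)
```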
If data loading is a bottleneck, take a look at the LateLoadingDataset. You can define datasets that return examples containing callables for heavy data loading, which are only executed by the LateLoadingDataset. Having this class as the last step in your dataset pipeline can potentially speed up your data loading.
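The idea of deferring heavy work can be illustrated without edflow. In the sketch below, the dataset returns examples whose expensive fields are zero-argument callables, and a separate helper calls them at the very end. LazyDataset and resolve_callables are hypothetical names; the helper only mimics the role described for LateLoadingDataset and is not its actual code.

```python
class LazyDataset:
    """Hypothetical dataset whose examples hold callables instead of loaded data."""

    def __init__(self, paths):
        self.paths = paths
        self.labels = {"path": list(paths)}
        self.load_count = 0  # counts how often the expensive loader runs

    def _load(self, path):
        # Stand-in for expensive I/O such as decoding an image.
        self.load_count += 1
        return "contents of " + path

    def get_example(self, index):
        path = self.paths[index]
        # Defer the heavy work: store a zero-argument callable, not the data.
        return {"index": index, "data": lambda: self._load(path)}

    def __len__(self):
        return len(self.paths)


def resolve_callables(example):
    """Call every callable value in an example dict; keep other values unchanged."""
    return {key: (value() if callable(value) else value)
            for key, value in example.items()}


lazy = LazyDataset(["a.png", "b.png"])
cheap = lazy.get_example(1)        # no loading has happened yet
loads_before = lazy.load_count
loaded = resolve_callables(cheap)  # the loader runs only here
```

Because the loader runs only in the final step, any filtering or subsetting applied earlier in a pipeline never pays for examples that are later discarded.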