edflow.data.util.util_dsets module

Summary

Classes:

DataFolder

Given the root of a possibly nested folder containing datafiles and a Callable that generates the labels to the datafile from its full name, this class creates a labeled dataset.

RandomlyJoinedDataset

Load multiple examples which have the same label.

Functions:

JoinedDataset

Concat n_joins random samples based on the condition that example_i[key] == example_j[key] for all i,j.

getDebugDataset

Loads a dataset from the config and makes it reasonably small.

Reference

edflow.data.util.util_dsets.JoinedDataset(dataset, key, n_joins)

Concat n_joins random samples based on the condition that example_i[key] == example_j[key] for all i,j. Key must be in labels of dataset.
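The joining condition can be illustrated with a minimal sketch in plain Python. It does not use edflow: the helper name sample_joined_indices and the toy labels are hypothetical, and the actual implementation may differ in how it picks the samples.

```python
import random
from collections import defaultdict


def sample_joined_indices(labels, n_joins, rng):
    """Pick an anchor index, then draw n_joins indices that share its label.

    `labels` is a list with one label value per example; it stands in for
    the label array that `key` would select in a real dataset.
    """
    # Group example indices by their label value.
    groups = defaultdict(list)
    for idx, value in enumerate(labels):
        groups[value].append(idx)

    anchor = rng.randrange(len(labels))
    candidates = groups[labels[anchor]]
    # Draw with replacement so that small groups still yield n_joins indices.
    return [rng.choice(candidates) for _ in range(n_joins)]


toy_labels = ["cat", "dog", "cat", "bird", "dog", "cat"]
joined = sample_joined_indices(toy_labels, n_joins=3, rng=random.Random(0))

# All sampled indices carry the same label value.
same_label = len({toy_labels[i] for i in joined}) == 1
```

Grouping the indices once up front keeps each draw cheap, which matters when many joined examples are requested.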

edflow.data.util.util_dsets.getDebugDataset(config)

Loads a dataset from the config and makes it reasonably small. The config syntax works as in getSeqDataset(). See there for more extensive documentation.

Parameters

config (dict) –

An edflow config with at least the key debugdataset and, nested inside it, the keys dataset and debug_length, which define the base dataset and its size.

Returns

A dataset of the specified length, based on the base dataset.

Return type

SubDataset
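A config with the keys described above might look like the following sketch. Only the key names come from this page; the import path of the base dataset is a hypothetical placeholder, and the idea that the debug dataset keeps the first debug_length examples is an assumption made for illustration. The exact spelling of the keys should be checked against the installed edflow version.

```python
# Hypothetical config; only the key names are taken from the description above.
config = {
    "debugdataset": {
        # Placeholder import path of the base dataset (assumed format).
        "dataset": "my_package.datasets.MyDataset",
        # Desired number of examples in the debug dataset.
        "debug_length": 25,
    },
}

# Stand-in for the base dataset: 1000 dummy examples.
base_dataset = [{"index_": i} for i in range(1000)]

# Assumption: the debug dataset keeps the first `debug_length` examples.
debug_length = config["debugdataset"]["debug_length"]
debug_view = base_dataset[:debug_length]
print(len(debug_view))  # 25
```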

class edflow.data.util.util_dsets.RandomlyJoinedDataset(config)

Bases: edflow.data.dataset_mixin.DatasetMixin, edflow.util.PRNGMixin

Load multiple examples which have the same label.

Required config parameters:
RandomlyJoinedDataset/dataset

The dataset from which to load examples.

RandomlyJoinedDataset/key

The key of the label to join on.

Optional config parameters:
test_mode=False

If True, behaves deterministically.

RandomlyJoinedDataset/n_joins=2

How many examples to load.

RandomlyJoinedDataset/balance=False

If True and not in test_mode, sample join labels uniformly.

RandomlyJoinedDataset/avoid_identity=True

If True and not in test_mode, never return a pair containing the same image if possible.
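As an illustration, the parameters listed above might be collected in a config like the following sketch. It assumes that the slash in a name such as RandomlyJoinedDataset/key denotes a key nested inside RandomlyJoinedDataset; the dataset import path and the label name class_id are hypothetical placeholders.

```python
# Hypothetical config; the nesting is an assumption about the slash notation.
config = {
    # Top-level switch: True would make the sampling deterministic.
    "test_mode": False,
    "RandomlyJoinedDataset": {
        # Required parameters.
        "dataset": "my_package.datasets.MyDataset",  # placeholder import path
        "key": "class_id",                           # placeholder label name
        # Optional parameters, shown with their documented defaults.
        "n_joins": 2,
        "balance": False,
        "avoid_identity": True,
    },
}

# The documented defaults apply whenever an optional key is left out.
n_joins = config["RandomlyJoinedDataset"].get("n_joins", 2)
```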

The i-th example returns:
‘examples’

A list of examples, where each example has the same label as specified by key. If balance is False, the first element of the list will be the i-th example of the dataset.

The dataset’s labels are the same as those of dataset. Be careful: examples[j] of the i-th example does not correspond to the i-th entry of the labels but to the examples[j]["index_"]-th entry.
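The index caveat can be made concrete with a small sketch in plain Python. The label name class_id, the toy values and the exact shape of the returned dict are assumptions for illustration; only the examples and index_ keys come from the description above.

```python
# Toy labels of the underlying dataset; "class_id" is a hypothetical key.
labels = {"class_id": [7, 3, 7, 3]}

# What the i-th example (here i = 0) might look like after joining:
# the first entry is example 0 itself, the second a partner with the same label.
i = 0
joined = {"examples": [{"index_": 0}, {"index_": 2}]}

# Indexing the labels with i only describes the original item.
label_of_original = labels["class_id"][i]

# To get the label of each joined example, go through its own "index_".
labels_of_joined = [labels["class_id"][ex["index_"]] for ex in joined["examples"]]
print(labels_of_joined)  # [7, 7]
```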

__init__(config)

Initialize self. See help(type(self)) for accurate signature.

property labels

Careful: this can only give the labels of the original item, not those of the joined ones. Use examples[j]["index_"] to get the correct label index.

get_example(i)

Note

Please see the documentation of DatasetMixin to avoid confusion.

Add default behaviour for datasets defining an attribute data, which in turn is a dataset. This happens often when stacking several datasets on top of each other.

The default behaviour now is to return self.data.get_example(idx) if possible, and otherwise revert to the original behaviour.
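The delegation described here can be sketched in plain Python as follows. Both classes are hypothetical stand-ins and do not reproduce the DatasetMixin code; they only show the pattern of forwarding get_example to an attribute named data.

```python
class ListDataset:
    """Hypothetical base dataset backed by a plain list."""

    def __init__(self, items):
        self.items = items

    def get_example(self, idx):
        return {"value": self.items[idx]}


class StackedDataset:
    """Hypothetical wrapper that only defines the attribute `data`."""

    def __init__(self, data=None):
        if data is not None:
            self.data = data

    def get_example(self, idx):
        # Forward to the wrapped dataset if there is one ...
        data = getattr(self, "data", None)
        if data is not None and hasattr(data, "get_example"):
            return data.get_example(idx)
        # ... otherwise fall back to the "original behaviour",
        # which in this sketch simply means: not implemented.
        raise NotImplementedError("get_example must be overridden")


stacked = StackedDataset(StackedDataset(ListDataset([10, 20, 30])))
result = stacked.get_example(1)
print(result)  # {'value': 20}
```

Because each wrapper forwards the call, several datasets can be stacked without every layer having to implement get_example itself.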

class edflow.data.util.util_dsets.DataFolder(image_root, read_fn, label_fn, sort_keys=None, in_memory_keys=None, legacy=True, show_bar=False)

Bases: edflow.data.dataset_mixin.DatasetMixin

Given the root of a possibly nested folder containing datafiles and a Callable that generates the labels to the datafile from its full name, this class creates a labeled dataset.

Unwanted data can be filtered out by having label_fn return None for the files in question. The actual files are only read when __getitem__ is called.

If for example label_fn returns a dict with the keys ['a', 'b', 'c'] and read_fn returns one with keys ['d', 'e'] then the dict returned by __getitem__ will contain the keys ['a', 'b', 'c', 'd', 'e', 'file_path_', 'index_'].
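The way the keys are combined can be sketched without edflow. The functions label_fn, read_fn and build_example below are hypothetical: the reader returns placeholder data instead of opening a file, and build_example only imitates the merging described above, not the actual DataFolder code.

```python
import os


def label_fn(path):
    """Hypothetical label function: keep only .txt files."""
    if not path.endswith(".txt"):
        return None  # returning None filters the file out
    name = os.path.basename(path)
    return {"a": name, "b": len(name), "c": os.path.splitext(name)[1]}


def read_fn(path):
    """Hypothetical reader: returns placeholder data, no file access."""
    return {"d": "payload for " + path, "e": 0}


def build_example(path, index):
    """Merge labels, datum and bookkeeping keys as the text describes."""
    labels = label_fn(path)
    if labels is None:
        return None  # filtered file, no example
    example = dict(labels)
    example.update(read_fn(path))  # only evaluated when the example is requested
    example["file_path_"] = path
    example["index_"] = index
    return example


example = build_example("root/sub/sample.txt", 0)
skipped = build_example("root/sub/notes.md", 1)
print(sorted(example))
```

With the real class, the two callables would be passed as the read_fn and label_fn arguments of DataFolder, together with image_root.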

__init__(image_root, read_fn, label_fn, sort_keys=None, in_memory_keys=None, legacy=True, show_bar=False)
Parameters
  • image_root (str) – Root containing the files of interest.

  • read_fn (Callable) – Given the path to a file, returns the datum as a dict.

  • label_fn (Callable) – Given the path to a file, returns a dict of labels. If label_fn returns None, this file is ignored.

  • sort_keys (list) – A hierarchy of keys by which the data in this Dataset are sorted.

  • in_memory_keys (list) – Keys that will be collected from examples when the dataset is cached.

  • legacy (bool) – Use the old read method, where only the path to the current file is passed to the reader. The new version also sees all labels that have been collected previously.

  • show_bar (bool) – Show a loading bar when loading labels.

get_example(i)

Load the files specified in example i.