edflow.data.util.util_dsets module
Summary
Classes:
DataFolder | Given the root of a possibly nested folder containing datafiles and a Callable that generates the labels to the datafile from its full name, this class creates a labeled dataset.
RandomlyJoinedDataset | Load multiple examples which have the same label.
Functions:
JoinedDataset | Concat n_joins random samples based on the condition that example_i[key] == example_j[key] for all i, j.
getDebugDataset | Loads a dataset from the config and makes it reasonably small.
Reference
edflow.data.util.util_dsets.JoinedDataset(dataset, key, n_joins)
Concat n_joins random samples based on the condition that example_i[key] == example_j[key] for all i, j. key must be in the labels of dataset.
edflow.data.util.util_dsets.getDebugDataset(config)
Loads a dataset from the config and makes it reasonably small. The config syntax works as in getSeqDataset(). See there for more extensive documentation.
- Parameters
config (dict) – An edflow config with at least the key debugdataset and, nested inside it, the keys dataset and debug_length, which define the base dataset and its size.
- Returns
A dataset based on the base dataset, of the specified length.
- Return type
SubDataset
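As a rough illustration of the config layout described above, the sketch below builds a minimal config dict and checks that the required nested keys are present. Only the key names debugdataset, dataset and debug_length come from this documentation; the value `my_package.MyDataset` and the helper `check_debug_config` are hypothetical and not part of edflow. How the dataset entry is interpreted is documented with getSeqDataset().

```python
# Hypothetical config; only the key names come from the documentation above.
config = {
    "debugdataset": {
        "dataset": "my_package.MyDataset",  # assumed placeholder for the base dataset
        "debug_length": 10,  # desired size of the debug dataset
    }
}


def check_debug_config(config):
    """Return the required keys missing from the config (hypothetical helper)."""
    if "debugdataset" not in config:
        return ["debugdataset"]
    nested = config["debugdataset"]
    missing = []
    for key in ("dataset", "debug_length"):
        if key not in nested:
            missing.append("debugdataset/" + key)
    return missing


missing_keys = check_debug_config(config)  # empty list when the layout is complete
```

Checking the layout before handing the config over makes a missing nested key show up as a readable message rather than a KeyError deep inside the loading code.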
class edflow.data.util.util_dsets.RandomlyJoinedDataset(config)
Bases: edflow.data.dataset_mixin.DatasetMixin, edflow.util.PRNGMixin
Load multiple examples which have the same label.
- Required config parameters:
RandomlyJoinedDataset/dataset – The dataset from which to load examples.
RandomlyJoinedDataset/key – The key of the label to join on.
- Optional config parameters:
test_mode=False – If True, behaves deterministically.
RandomlyJoinedDataset/n_joins=2 – How many examples to load.
RandomlyJoinedDataset/balance=False – If True and not in test_mode, sample join labels uniformly.
RandomlyJoinedDataset/avoid_identity=True – If True and not in test_mode, never return a pair containing the same image if possible.
- The i-th example returns:
'examples' – A list of examples, where each example has the same label as specified by key. If balancing is disabled (balance is False), the first element of the list will be the i-th example of the dataset.
The dataset's labels are the same as those of dataset. Be careful: examples[j] of the i-th example does not correspond to the i-th entry of the labels but to the examples[j]["index_"]-th entry.
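The joining behaviour described above can be sketched in plain Python. This is not edflow's implementation: `sample_joined_indices` is a hypothetical helper that assumes balancing is off, draws partners with replacement, and falls back to the example itself when no other example shares its label.

```python
import random
from collections import defaultdict


def sample_joined_indices(labels, i, n_joins=2, avoid_identity=True, rng=None):
    """Pick n_joins indices whose label equals labels[i]; the first one is i.

    Hypothetical helper, only meant to illustrate the documented behaviour.
    """
    if rng is None:
        rng = random.Random()

    # Group dataset indices by their label value.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    candidates = groups[labels[i]]
    if avoid_identity:
        others = [idx for idx in candidates if idx != i]
        # "If possible": keep the full group when i is its only member.
        if others:
            candidates = others

    # Partners are drawn with replacement, so they may repeat each other.
    partners = [rng.choice(candidates) for _ in range(n_joins - 1)]
    return [i] + partners


labels = ["cat", "dog", "cat", "cat", "dog", "bird"]
indices = sample_joined_indices(labels, i=0, n_joins=3, rng=random.Random(0))
# Every sampled index carries the same label as example 0.
same_label = all(labels[j] == labels[0] for j in indices)
```

Because the partners are other dataset entries, their labels have to be looked up through the returned indices, which mirrors the warning above about using examples[j]["index_"] rather than i.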
property labels
Careful: this can only give the labels of the original item, not those of the joined ones. Use examples[j]["index_"] to get the correct label index.
get_example(i)
Note
Please see the documentation of DatasetMixin to avoid confusion.
Adds default behaviour for datasets defining an attribute data, which in turn is a dataset. This often happens when stacking several datasets on top of each other.
The default behaviour is to return self.data.get_example(idx) if possible, and otherwise to revert to the original behaviour.
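The delegation described above can be sketched with a few stand-in classes. `BaseDataset`, `DelegatingMixin` and `Wrapper` are hypothetical names, not edflow classes, and raising NotImplementedError is only an assumed placeholder for "the original behaviour".

```python
class BaseDataset:
    """Hypothetical stand-in for the innermost dataset."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def get_example(self, idx):
        return {"value": self.items[idx]}


class DelegatingMixin:
    """Hypothetical mixin: forward get_example to self.data when it exists."""

    def get_example(self, idx):
        data = getattr(self, "data", None)
        if data is not None and hasattr(data, "get_example"):
            return data.get_example(idx)
        # Assumed fallback; the real fallback is whatever the base class does.
        raise NotImplementedError("No data attribute to delegate to.")


class Wrapper(DelegatingMixin):
    """A dataset stacked on top of another dataset via the data attribute."""

    def __init__(self, data):
        self.data = data


# Two wrappers stacked on a base dataset; the call is passed down the stack.
stacked = Wrapper(Wrapper(BaseDataset([10, 20, 30])))
example = stacked.get_example(1)
```

With this pattern a wrapper only needs to override get_example when it actually changes the examples; otherwise the call falls through to the wrapped dataset.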
class edflow.data.util.util_dsets.DataFolder(image_root, read_fn, label_fn, sort_keys=None, in_memory_keys=None, legacy=True, show_bar=False)
Bases: edflow.data.dataset_mixin.DatasetMixin
Given the root of a possibly nested folder containing datafiles and a Callable that generates the labels to the datafile from its full name, this class creates a labeled dataset.
Unwanted data can be filtered out by having label_fn return None for those specific files. The actual files are only read when __getitem__ is called.
If, for example, label_fn returns a dict with the keys ['a', 'b', 'c'] and read_fn returns one with the keys ['d', 'e'], then the dict returned by __getitem__ will contain the keys ['a', 'b', 'c', 'd', 'e', 'file_path_', 'index_'].
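The key merging and filtering described above can be sketched without touching the file system. Everything here is hypothetical and not DataFolder's actual code: `label_fn`, `read_fn`, `build_index` and `get_item` are illustrative names, the paths are made up, and the reader fakes its datum instead of opening a file.

```python
import os


def label_fn(path):
    """Hypothetical label function: ignore non-.txt files, label the rest."""
    name = os.path.basename(path)
    if not name.endswith(".txt"):
        return None  # returning None filters the file out
    return {"a": name, "b": len(name), "c": os.path.dirname(path)}


def read_fn(path):
    """Hypothetical reader: a real one would open the file at path."""
    return {"d": "contents of " + path, "e": 0}


def build_index(paths, label_fn):
    """Collect (path, labels) pairs up front, dropping files labeled None."""
    entries = []
    for path in paths:
        labels = label_fn(path)
        if labels is not None:
            entries.append((path, labels))
    return entries


def get_item(entries, read_fn, idx):
    """Read lazily and merge labels, datum and the bookkeeping keys."""
    path, labels = entries[idx]
    example = dict(labels)
    example.update(read_fn(path))
    example["file_path_"] = path
    example["index_"] = idx
    return example


paths = ["root/x/one.txt", "root/x/skip.png", "root/y/two.txt"]
entries = build_index(paths, label_fn)
item = get_item(entries, read_fn, 1)
```

Labels are computed once for every file, while read_fn only runs for the requested index, which is what keeps the expensive file reads out of the indexing step.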
__init__(image_root, read_fn, label_fn, sort_keys=None, in_memory_keys=None, legacy=True, show_bar=False)
- Parameters
image_root (str) – Root containing the files of interest.
read_fn (Callable) – Given the path to a file, returns the datum as a dict.
label_fn (Callable) – Given the path to a file, returns a dict of labels. If label_fn returns None, this file is ignored.
sort_keys (list) – A hierarchy of keys by which the data in this dataset are sorted.
in_memory_keys (list) – Keys which will be collected from examples when the dataset is cached.
legacy (bool) – Use the old read method, where only the path to the current file is passed to the reader. The new version will see all labels that have been previously collected.
show_bar (bool) – Show a loading bar when loading labels.