edflow.eval.pipeline module

To produce consistent results we adopt the following pipeline:

Step 1: Evaluate model on a test dataset and write out all data of interest:

  • generated image

  • latent representations

Step 2: Load the generated data into a Datafolder using the EvalDataset

Step 3: Pass both the test Dataset and the Datafolder to the evaluation scripts

Planned for the future (Step 4): Generate a report:

  • LaTeX tables

  • paths to videos

  • plots

Usage

The pipeline is easily set up: in your Iterator (Trainer or Evaluator), add the EvalHook and as many callbacks as you like. You can also pass no callbacks at all.

Warning

To use the output with edeval you must set config=config.

from edflow.eval.pipeline import EvalHook

def my_callback(root, data_in, data_out, config):
    # Do something fancy with the data
    results = ...

    return results

class MyIterator(PyHookedModelIterator):

    def __init__(self, config, root, model, **kwargs):

        self.model = model

        self.hooks += [EvalHook(self.dataset,
                                callbacks={'cool_cb': my_callback},
                                config=config,  # Must be specified for edeval
                                step_getter=self.get_global_step)]

    def eval_op(self, inputs):
        return {'generated': self.model(inputs)}

    def step_ops(self):
        return self.eval_op

Next you run your evaluation on your data using your favourite edflow command.

edflow -n myexperiment -e the_config.yaml -p path_to_project

This will create a new evaluation folder inside your project’s eval directory. Inside this folder everything returned by your step ops is stored. In the case above this would mean your outputs would be stored as generated:index.something. But you don’t need to concern yourself with that, as the outputs can now be loaded using the EvalDataFolder.

All you need to do is pass the EvalDataFolder the root folder in which the data has been saved, which is the folder where you can find the model_outputs.csv. Now you have all the generated data easily usable at hand. The indices of the data in the EvalDataFolder correspond to the indices of the data in the dataset, which was used to create the model outputs. So you can directly compare inputs, targets etc, with the outputs of your model!

If you specified a callback, this all happens automatically. Each callback receives at least four parameters: the root, where the data lives; the two datasets data_in, which was fed into the model, and data_out, which was generated by the model; and the config. You can specify additional keyword arguments by defining them in the config under eval_pipeline/callback_kwargs.
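Such a configuration could look like the following in the_config.yaml. The import path and the extra keyword argument below are placeholders, and nesting the kwargs per callback name under callback_kwargs is an assumption based on the description above:

```yaml
eval_pipeline:
  callbacks:
    # name: import path to a callable with the signature
    # (root, data_in, data_out, config, **kwargs)
    cool_cb: mypackage.callbacks.my_callback  # placeholder import path
  callback_kwargs:
    cool_cb:
      some_threshold: 0.5  # hypothetical extra keyword argument
```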

Should you want to run evaluations on the generated data after it has been generated, you can run the edeval command while specifying the path to the model outputs csv and the callbacks you want to run.

edeval -c path/to/model_outputs.csv -cb name1:callback1 name2:callback2

The callbacks must be supplied using name:callback pairs. Names must be unique as edeval will construct a dictionary from these inputs.
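The name:callback parsing can be pictured as a small helper that splits each argument on the first colon; this is a sketch of the idea, not edflow's actual implementation:

```python
def cbargs2cbdict_sketch(arglist):
    """Turn ['name1:pkg.mod.fn', ...] into {'name1': 'pkg.mod.fn', ...}.

    Splitting on the first colon only keeps import paths that themselves
    contain colons intact on the value side.
    """
    callbacks = {}
    for arg in arglist:
        name, import_path = arg.split(":", 1)
        callbacks[name] = import_path
    return callbacks


print(cbargs2cbdict_sketch(["name1:callback1", "name2:callback2"]))
# {'name1': 'callback1', 'name2': 'callback2'}
```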

If at some point you need to specify new parameters in your config or change existing ones, you can do so exactly like you would when running the edflow command. Simply pass the parameters you want to add/change via the commandline like this:

edeval -c path/to/model_outputs.csv -cb name1:callback1 --key1 val1 --key/path/2 val2

Warning

Changing config parameters from the commandline adds some dangers to the eval workflow: e.g., you can change parameters which determine the construction of the generating dataset, which potentially breaks the mapping between inputs and outputs.
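The --key/path/2 val2 style of override can be thought of as setting a nested config entry along a /-separated keypath. The helper below is an illustrative sketch; edflow's actual commandline parsing may differ:

```python
def set_keypath(config, keypath, value):
    """Set a nested dict entry addressed by a '/'-separated keypath,
    creating intermediate dicts as needed."""
    keys = keypath.split("/")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


config = {"key1": "old"}
set_keypath(config, "key1", "val1")
set_keypath(config, "key/path/2", "val2")
print(config)
# {'key1': 'val1', 'key': {'path': {'2': 'val2'}}}
```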

Summary

Classes:

EvalHook

Stores all outputs in a reusable fashion.

TemplateEvalHook

EvalHook that disables itself when the eval op returns None.

Functions:

add_meta_data

Prepends kwargs of interest to a csv file as comments (#)

apply_callbacks

Runs all given callbacks on the datasets in_data and out_data.

cbargs2cbdict

Turns a list of name:callback into a dict {name: callback}

config2cbdict

Extracts the callbacks inside a config and returns them as dict.

decompose_name

param name

determine_loader

Returns a loader name for a given file extension

determine_saver

Applies some heuristics to save an object.

image_saver

param savepath

is_loadable

param filename

isimage

param np_arr

load_callbacks

Loads all callbacks, i.e.

main

np_saver

param savepath

save_example

Manages the writing process of a single datum: (1) Determine type, (2) Choose saver, (3) save.

save_output

Saves the output of some model contained in example in a reusable manner.

standalone_eval_meta_dset

Runs all given callbacks on the data in the EvalDataFolder constructed from the given csv.

Reference

class edflow.eval.pipeline.EvalHook(datasets, sub_dir_keys=[], labels_key=None, callbacks={}, config=None, step_getter=None, keypath='step_ops')[source]

Bases: edflow.hooks.hook.Hook

Stores all outputs in a reusable fashion.

__init__(datasets, sub_dir_keys=[], labels_key=None, callbacks={}, config=None, step_getter=None, keypath='step_ops')[source]

Warning

To work with edeval you must specify config=config when instantiating the EvalHook.

Parameters
  • datasets (dict(split: DatasetMixin)) – The Datasets used for creating the new data.

  • sub_dir_keys (list(str)) – Keys found in example, which will be used to make a subdirectory for the stored example. Subdirectories are made in a nested fashion in the order of the list. The keys will be removed from the example dict and not be stored explicitly.

  • labels_key (str) – All data behind the key found in the examples will be stored in large arrays and later loaded as labels. This should be small data types like int or str or small numpy arrays.

  • callbacks (dict(name: str or Callable)) – All callbacks are called at the end of the epoch. Must accept root as argument as well as the generating dataset and the generated dataset and a config (in that order). Additional keyword arguments found at eval_pipeline/callback_kwargs will also be passed to the callbacks. You can also leave this empty and supply import paths via config.

  • config (object, dict) – An object containing metadata. Must be dumpable by yaml. Usually the edflow config. You can define callbacks here as well. These must be under the keypath eval_pipeline/callbacks. Also you can define additional keyword arguments passed to the callbacks as described in callbacks.

  • step_getter (Callable) – Function which returns the global step as int.

  • keypath (str) – Path in result which will be stored.

before_epoch(epoch)[source]

Sets up the dataset for the current epoch.

before_step(step, fetches, feeds, batch)[source]

Get dataset indices from batch.

after_step(step, last_results)[source]

Save examples and store label values.

at_exception(*args, **kwargs)[source]

Save all meta data. Data already written to disk is not lost, even if this fails.

after_epoch(epoch)[source]

Save meta data for reuse and then start the evaluation callbacks

save_meta()[source]
class edflow.eval.pipeline.TemplateEvalHook(*args, **kwargs)[source]

Bases: edflow.eval.pipeline.EvalHook

EvalHook that disables itself when the eval op returns None.

__init__(*args, **kwargs)[source]

Warning

To work with edeval you must specify config=config when instantiating the EvalHook.

Parameters
  • datasets (dict(split: DatasetMixin)) – The Datasets used for creating the new data.

  • sub_dir_keys (list(str)) – Keys found in example, which will be used to make a subdirectory for the stored example. Subdirectories are made in a nested fashion in the order of the list. The keys will be removed from the example dict and not be stored explicitly.

  • labels_key (str) – All data behind the key found in the examples will be stored in large arrays and later loaded as labels. This should be small data types like int or str or small numpy arrays.

  • callbacks (dict(name: str or Callable)) – All callbacks are called at the end of the epoch. Must accept root as argument as well as the generating dataset and the generated dataset and a config (in that order). Additional keyword arguments found at eval_pipeline/callback_kwargs will also be passed to the callbacks. You can also leave this empty and supply import paths via config.

  • config (object, dict) – An object containing metadata. Must be dumpable by yaml. Usually the edflow config. You can define callbacks here as well. These must be under the keypath eval_pipeline/callbacks. Also you can define additional keyword arguments passed to the callbacks as described in callbacks.

  • step_getter (Callable) – Function which returns the global step as int.

  • keypath (str) – Path in result which will be stored.

before_epoch(*args, **kwargs)[source]

Sets up the dataset for the current epoch.

before_step(*args, **kwargs)[source]

Get dataset indices from batch.

after_step(step, last_results)[source]

Save examples and store label values.

after_epoch(*args, **kwargs)[source]

Save meta data for reuse and then start the evaluation callbacks

at_exception(*args, **kwargs)[source]

Save all meta data. Data already written to disk is not lost, even if this fails.

edflow.eval.pipeline.save_output(root, example, index, sub_dir_keys=[], keypath='step_ops')[source]

Saves the output of some model contained in example in a reusable manner.

Parameters
  • root (str) – Storage directory

  • example (dict) – name: datum pairs of outputs.

  • index (list(int)) – dataset index corresponding to example.

  • sub_dir_keys (list(str)) – Keys found in example, which will be used to make a subdirectory for the stored example. Subdirectories are made in a nested fashion in the order of the list. The keys will be removed from the example dict and not be stored. Directories are named key:val to be able to completely recover the keys. (Default value = [])

Returns

path_dics – Name: path pairs of the saved outputs.

Warning

Make sure the values behind the sub_dir_keys are compatible with the file system you are saving data on.

Return type

dict

edflow.eval.pipeline.add_meta_data(eval_root, metadata, description=None)[source]

Prepends kwargs of interest to a csv file as comments (#)

Parameters
  • eval_root (str) – Where the meta.yaml will be written.

  • metadata (dict) – config like object, which will be written in the meta.yaml.

  • description (str) – Optional description string. Will be added unformatted as a yaml multiline literal.

Returns

meta_path – Full path of the meta.yaml.

Return type

str

edflow.eval.pipeline.save_example(savepath, datum)[source]

Manages the writing process of a single datum: (1) Determine type, (2) Choose saver, (3) save.

Parameters
  • savepath (str) – Where to save. Must end with .{} to put in the file ending via .format().

  • datum (object) – Some python object to save.

Returns

  • savepath (str) – Where the example has been saved. This string has been formatted and can be used to load the file at the described location.

  • loader_name (str) – The name of a loader, which can be passed to the meta.yaml's loaders entry.
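The '.{}' convention means the extension is formatted into the path only after a saver has been chosen for the datum. A self-contained sketch of the idea; the type heuristics and loader names here are illustrative stand-ins, not edflow's actual tables:

```python
def choose_saver_sketch(datum):
    """Toy type heuristic mapping a datum to (extension, loader_name).
    The real determine_saver also inspects e.g. numpy arrays and images."""
    if isinstance(datum, str):
        return "txt", "txt"
    if isinstance(datum, (list, tuple)):
        return "npy", "np"  # stand-in for array-like data
    raise NotImplementedError(f"No saver for type {type(datum)!r}")


def save_example_sketch(savepath, datum):
    """savepath must end with '.{}'; the chosen extension is formatted in."""
    ext, loader_name = choose_saver_sketch(datum)
    final_path = savepath.format(ext)
    # ...the actual file writing would happen here...
    return final_path, loader_name


print(save_example_sketch("out/generated:000001.{}", "hello"))
# ('out/generated:000001.txt', 'txt')
```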

edflow.eval.pipeline.determine_saver(py_obj)[source]

Applies some heuristics to save an object.

Parameters

py_obj (object) – Some python object to be saved.

Raises

NotImplementedError – If py_obj is of unrecognized type. Feel free to implement your own savers and publish them to edflow.

edflow.eval.pipeline.determine_loader(ext)[source]

Returns a loader name for a given file extension

Parameters

ext (str) – File ending excluding the leading dot. Same as what would be returned by os.path.splitext().

Returns

name – Name of the meta loader (see meta_loaders).

Return type

str

Raises

ValueError – If the file extension cannot be handled by the implemented loaders. Feel free to implement your own and publish them to edflow.

edflow.eval.pipeline.decompose_name(name)[source]
Parameters

name

edflow.eval.pipeline.is_loadable(filename)[source]
Parameters

filename

edflow.eval.pipeline.isimage(np_arr)[source]
Parameters

np_arr

edflow.eval.pipeline.image_saver(savepath, image)[source]
Parameters
  • savepath

  • image

edflow.eval.pipeline.np_saver(savepath, np_arr)[source]
Parameters
  • savepath

  • np_arr

edflow.eval.pipeline.standalone_eval_meta_dset(path_to_meta_dir, callbacks, additional_kwargs={}, other_config=None)[source]

Runs all given callbacks on the data in the EvalDataFolder constructed from the given csv.

Parameters
  • path_to_meta_dir (str) – Path to the directory containing the meta data of the evaluation run (i.e. where the meta.yaml can be found).

  • callbacks (dict(name: str or Callable)) – Import commands used to construct the functions applied to the data extracted from path_to_meta_dir.

  • additional_kwargs (dict) – Keypath-value pairs added to the config, which is extracted from the model_outputs.csv. These will overwrite parameters in the original config extracted from the csv.

  • other_config (str) – Path to additional config used to update the existing one as taken from the model_outputs.csv . Cannot overwrite the dataset. Only used for callbacks. Parameters in this other config will overwrite the parameters in the original config and those of the commandline arguments.

Returns

outputs – The collected outputs of the callbacks.

Return type

dict

edflow.eval.pipeline.load_callbacks(callbacks)[source]

Loads all callbacks, i.e. if the callback is given as str, will load the module behind the import path, otherwise will do nothing.

edflow.eval.pipeline.apply_callbacks(callbacks, root, in_data, out_data, config, callback_kwargs={})[source]

Runs all given callbacks on the datasets in_data and out_data.

Parameters
  • callbacks (dict(name: Callable)) – List of all callbacks to apply. All callbacks must accept at least the signature callback(root, data_in, data_out, config). If supplied via the config, additional keyword arguments are passed to the callback. These are expected under the keypath eval_pipeline/callback_kwargs.

  • in_data (DatasetMixin) – Dataset used to generate the content in out_data.

  • out_data (DatasetMixin) – Generated data. Example i is expected to be generated using in_data[i].

  • config (dict) – edflow config dictionary.

  • callback_kwargs (dict) – Keyword Arguments for the callbacks.

Returns

outputs – All results generated by the callbacks at the corresponding key.

Return type

dict(name: callback output)
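The control flow of apply_callbacks can be sketched as a plain loop that collects each callback's result under its name. This is a simplified stand-in without edflow's dataset types, and the per-name lookup in callback_kwargs is an assumption:

```python
def apply_callbacks_sketch(callbacks, root, in_data, out_data, config,
                           callback_kwargs=None):
    """Call every callback with the shared positional arguments plus any
    per-callback keyword arguments, collecting results by name."""
    callback_kwargs = callback_kwargs or {}
    outputs = {}
    for name, callback in callbacks.items():
        kwargs = callback_kwargs.get(name, {})
        outputs[name] = callback(root, in_data, out_data, config, **kwargs)
    return outputs


def count_cb(root, data_in, data_out, config):
    return {"n_in": len(data_in), "n_out": len(data_out)}


results = apply_callbacks_sketch({"count": count_cb}, "eval_root",
                                 [1, 2, 3], ["a", "b", "c"], {})
print(results)
# {'count': {'n_in': 3, 'n_out': 3}}
```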

edflow.eval.pipeline.cbargs2cbdict(arglist)[source]

Turns a list of name:callback into a dict {name: callback}

edflow.eval.pipeline.config2cbdict(config)[source]

Extracts the callbacks inside a config and returns them as dict. Callbacks must be defined at eval_pipeline/callbacks.

Parameters

config (dict) – A config dictionary.

Returns

callbacks – All name:callback pairs as dict {name: callback}

Return type

dict

edflow.eval.pipeline.main()[source]