edflow.eval.pipeline module¶
To produce consistent results we adopt the following pipeline:
Step 1: Evaluate model on a test dataset and write out all data of interest:
generated image
latent representations
Step 2: Load the generated data in a Datafolder using the EvalDataset
Step 3: Pass both the test Dataset and the Datafolder to the evaluation scripts
Sometime in the future (Step 4): Generate a report:
latex tables
paths to videos
plots
Usage¶
The pipeline is easily set up: in your Iterator (Trainer or Evaluator), add the EvalHook and as many callbacks as you like. You can also pass no callbacks at all.
Warning
To use the output with edeval you must set config=config.
from edflow.eval.pipeline import EvalHook

def my_callback(root, data_in, data_out, config):
    # Do something fancy with the data
    results = ...
    return results

class MyIterator(PyHookedModelIterator):
    def __init__(self, config, root, model, **kwargs):
        self.model = model
        self.hooks += [EvalHook(self.dataset,
                                callbacks={'cool_cb': my_callback},
                                config=config,  # Must be specified for edeval
                                step_getter=self.get_global_step)]

    def eval_op(self, inputs):
        return {'generated': self.model(inputs)}

    def step_ops(self):
        return self.eval_op
Next you run your evaluation on your data using your favourite edflow command.
edflow -n myexperiment -e the_config.yaml -p path_to_project
This will create a new evaluation folder inside your project’s eval directory.
Inside this folder everything returned by your step ops is stored. In the case
above this would mean your outputs would be stored as
generated:index.something. But you don’t need to concern yourself with that, as the outputs can now be loaded using the EvalDataFolder.
All you need to do is pass the EvalDataFolder the root folder in which the data
has been saved, which is the folder where you can find the
model_outputs.csv. Now you have all the generated data easily usable at
hand. The indices of the data in the EvalDataFolder correspond to the indices
of the data in the dataset, which was used to create the model outputs. So
you can directly compare inputs, targets, etc., with the outputs of your model!
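Because the indices line up, a comparison callback reduces to a loop over i. The sketch below is a minimal illustration of this; the keys "image" and "generated" are placeholder names, not guaranteed by edflow:

```python
# Minimal callback sketch exploiting the index correspondence between
# data_in and data_out. The keys "image" and "generated" are placeholders;
# use whatever keys your dataset and step ops actually produce.
def mse_callback(root, data_in, data_out, config):
    errors = []
    for i in range(len(data_out)):
        target = data_in[i]["image"]
        generated = data_out[i]["generated"]
        squared = [(t - g) ** 2 for t, g in zip(target, generated)]
        errors.append(sum(squared) / len(squared))
    return {"mse": sum(errors) / len(errors)}
```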
If you specified a callback, this all happens automatically. Each callback receives at least 4 parameters: the root, where the data lives; the two datasets data_in, which was fed into the model, and data_out, which was generated by the model; and the config. You can specify additional keyword arguments by defining them in the config under eval_pipeline/callback_kwargs.
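A config fragment along these lines would wire that up. The import path and the threshold argument are made-up placeholders, and the flat layout of callback_kwargs is an assumption based on the description above:

```yaml
eval_pipeline:
  callbacks:
    cool_cb: mypackage.callbacks.my_callback  # placeholder import path
  callback_kwargs:
    threshold: 0.5  # forwarded to the callbacks as threshold=0.5
```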
Should you want to run evaluations on the generated data after it has been generated, you can run the edeval command while specifying the path to the model outputs csv and the callbacks you want to run.
edeval -c path/to/model_outputs.csv -cb name1:callback1 name2:callback2
The callbacks must be supplied using name:callback pairs. Names must be unique as edeval will construct a dictionary from these inputs.
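Conceptually, the parsing works like this re-implementation (for illustration only; the actual work is done by edflow's cbargs2cbdict):

```python
# Illustrative re-implementation of the name:callback argument parsing.
# Splitting on the first ":" keeps dotted import paths intact.
def parse_callback_args(arglist):
    callbacks = {}
    for arg in arglist:
        name, import_path = arg.split(":", 1)
        if name in callbacks:
            # names must be unique, otherwise later entries would win silently
            raise ValueError("duplicate callback name: {}".format(name))
        callbacks[name] = import_path
    return callbacks
```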
If at some point you need to specify new parameters in your config or change
existing ones, you can do so exactly like you would when running the edflow
command. Simply pass the parameters you want to add/change via the commandline
like this:
edeval -c path/to/model_outputs.csv -cb name1:callback1 --key1 val1 --key/path/2 val2
Warning
Changing config parameters from the commandline adds some dangers to the eval workflow: e.g. you can change parameters which determine the construction of the generating dataset, which potentially breaks the mapping between inputs and outputs.
Summary¶
Classes:
EvalHook – Stores all outputs in a reusable fashion.
TemplateEvalHook – EvalHook that disables itself when the eval op returns None.
Functions:
add_meta_data – Prepends kwargs of interest to a csv file as comments (#).
apply_callbacks – Runs all given callbacks on the datasets in_data and out_data.
cbargs2cbdict – Turns a list of name:callback into a dict {name: callback}.
config2cbdict – Extracts the callbacks inside a config and returns them as dict.
determine_loader – Returns a loader name for a given file extension.
determine_saver – Applies some heuristics to save an object.
load_callbacks – Loads all callbacks, i.e. resolves callbacks given as import path strings.
save_example – Manages the writing process of a single datum: (1) Determine type, (2) Choose saver, (3) save.
save_output – Saves the output of some model contained in example in a reusable manner.
standalone_eval_meta_dset – Runs all given callbacks on the data in the EvalDataFolder constructed from the given csv.
Reference¶
class edflow.eval.pipeline.EvalHook(datasets, sub_dir_keys=[], labels_key=None, callbacks={}, config=None, step_getter=None, keypath='step_ops')[source]¶
Bases: edflow.hooks.hook.Hook
Stores all outputs in a reusable fashion.
__init__(datasets, sub_dir_keys=[], labels_key=None, callbacks={}, config=None, step_getter=None, keypath='step_ops')[source]¶
Warning
To work with edeval you must specify config=config when instantiating the EvalHook.
- Parameters
datasets (dict(split: DatasetMixin)) – The Datasets used for creating the new data.
sub_dir_keys (list(str)) – Keys found in example, which will be used to make a subdirectory for the stored example. Subdirectories are made in a nested fashion in the order of the list. The keys will be removed from the example dict and not be stored explicitly.
labels_key (str) – All data behind the key found in the examples will be stored in large arrays and later loaded as labels. This should be small data types like int or str or small numpy arrays.
callbacks (dict(name: str or Callable)) – All callbacks are called at the end of the epoch. Must accept root as argument as well as the generating dataset and the generated dataset and a config (in that order). Additional keyword arguments found at eval_pipeline/callback_kwargs will also be passed to the callbacks. You can also leave this empty and supply import paths via config.
config (object, dict) – An object containing metadata. Must be dumpable by yaml. Usually the edflow config. You can define callbacks here as well. These must be under the keypath eval_pipeline/callbacks. Also you can define additional keyword arguments passed to the callbacks as described in callbacks.
step_getter (Callable) – Function which returns the global step as int.
keypath (str) – Path in result which will be stored.
class edflow.eval.pipeline.TemplateEvalHook(*args, **kwargs)[source]¶
Bases: edflow.eval.pipeline.EvalHook
EvalHook that disables itself when the eval op returns None.
__init__(*args, **kwargs)[source]¶
Warning
To work with edeval you must specify config=config when instantiating the EvalHook.
- Parameters
datasets (dict(split: DatasetMixin)) – The Datasets used for creating the new data.
sub_dir_keys (list(str)) – Keys found in example, which will be used to make a subdirectory for the stored example. Subdirectories are made in a nested fashion in the order of the list. The keys will be removed from the example dict and not be stored explicitly.
labels_key (str) – All data behind the key found in the examples will be stored in large arrays and later loaded as labels. This should be small data types like int or str or small numpy arrays.
callbacks (dict(name: str or Callable)) – All callbacks are called at the end of the epoch. Must accept root as argument as well as the generating dataset and the generated dataset and a config (in that order). Additional keyword arguments found at eval_pipeline/callback_kwargs will also be passed to the callbacks. You can also leave this empty and supply import paths via config.
config (object, dict) – An object containing metadata. Must be dumpable by yaml. Usually the edflow config. You can define callbacks here as well. These must be under the keypath eval_pipeline/callbacks. Also you can define additional keyword arguments passed to the callbacks as described in callbacks.
step_getter (Callable) – Function which returns the global step as int.
keypath (str) – Path in result which will be stored.
edflow.eval.pipeline.save_output(root, example, index, sub_dir_keys=[], keypath='step_ops')[source]¶
Saves the output of some model contained in example in a reusable manner.
- Parameters
root (str) – Storage directory
example (dict) – name: datum pairs of outputs.
index (list(int)) – dataset index corresponding to example.
sub_dir_keys (list(str)) – Keys found in example, which will be used to make a subdirectory for the stored example. Subdirectories are made in a nested fashion in the order of the list. The keys will be removed from the example dict and not be stored. Directories are named key:val to be able to completely recover the keys. (Default value = [])
- Returns
path_dics – Name: path pairs of the saved outputs.
Warning
Make sure the values behind the sub_dir_keys are compatible with the file system you are saving data on.
- Return type
dict
edflow.eval.pipeline.add_meta_data(eval_root, metadata, description=None)[source]¶
Prepends kwargs of interest to a csv file as comments (#)
- Parameters
eval_root (str) – Where the meta.yaml will be written.
metadata (dict) – config like object, which will be written in the meta.yaml.
description (str) – Optional description string. Will be added unformatted as yaml multiline literal.
- Returns
meta_path – Full path of the meta.yaml.
- Return type
str
edflow.eval.pipeline.save_example(savepath, datum)[source]¶
Manages the writing process of a single datum: (1) Determine type, (2) Choose saver, (3) save.
- Parameters
savepath (str) – Where to save. Must end with .{} to put in the file ending via .format().
datum (object) – Some python object to save.
- Returns
savepath (str) – Where the example has been saved. This string has been formatted and can be used to load the file at the described location.
loader_name (str) – The name of a loader, which can be passed to the meta.yaml’s loaders entry.
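As a small illustration of the savepath contract described above (the concrete directory and index are invented for this sketch), the trailing .{} is filled with the extension chosen by the saver:

```python
# The savepath template must end in ".{}" so the chosen file extension can
# be inserted via str.format(). The directory and index here are invented.
savepath = "eval_root/generated/000042.{}"
final_path = savepath.format("png")
```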
edflow.eval.pipeline.determine_saver(py_obj)[source]¶
Applies some heuristics to save an object.
- Parameters
py_obj (object) – Some python object to be saved.
- Raises
NotImplementedError – If py_obj is of unrecognized type. Feel free to implement your own savers and publish them to edflow.
edflow.eval.pipeline.determine_loader(ext)[source]¶
Returns a loader name for a given file extension
- Parameters
ext (str) – File ending excluding the ".". Same as what would be returned by os.path.splitext()
- Returns
name – Name of the meta loader (see meta_loaders).
- Return type
str
- Raises
ValueError – If the file extension cannot be handled by the implemented loaders. Feel free to implement your own and publish them to edflow.
edflow.eval.pipeline.standalone_eval_meta_dset(path_to_meta_dir, callbacks, additional_kwargs={}, other_config=None)[source]¶
Runs all given callbacks on the data in the EvalDataFolder constructed from the given csv.
- Parameters
path_to_csv (str) – Path to the csv file.
callbacks (dict(name: str or Callable)) – Import commands used to construct the functions applied to the Data extracted from path_to_csv.
additional_kwargs (dict) – Keypath-value pairs added to the config, which is extracted from the model_outputs.csv. These will overwrite parameters in the original config extracted from the csv.
other_config (str) – Path to additional config used to update the existing one as taken from the model_outputs.csv. Cannot overwrite the dataset. Only used for callbacks. Parameters in this other config will overwrite the parameters in the original config and those of the commandline arguments.
- Returns
outputs – The collected outputs of the callbacks.
- Return type
dict
edflow.eval.pipeline.load_callbacks(callbacks)[source]¶
Loads all callbacks, i.e. if the callback is given as str, will load the module behind the import path, otherwise will do nothing.
edflow.eval.pipeline.apply_callbacks(callbacks, root, in_data, out_data, config, callback_kwargs={})[source]¶
Runs all given callbacks on the datasets in_data and out_data.
- Parameters
callbacks (dict(name: Callable)) – List of all callbacks to apply. All callbacks must accept at least the signature callback(root, data_in, data_out, config). If supplied via the config, additional keyword arguments are passed to the callback. These are expected under the keypath eval_pipeline/callback_kwargs.
in_data (DatasetMixin) – Dataset used to generate the content in out_data.
out_data (DatasetMixin) – Generated data. Example i is expected to be generated using in_data[i].
.config (dict) – edflow config dictionary.
callback_kwargs (dict) – Keyword Arguments for the callbacks.
- Returns
outputs – All results generated by the callbacks at the corresponding key.
- Return type
dict(name: callback output)
edflow.eval.pipeline.cbargs2cbdict(arglist)[source]¶
Turns a list of name:callback into a dict {name: callback}
edflow.eval.pipeline.config2cbdict(config)[source]¶
Extracts the callbacks inside a config and returns them as dict. Callbacks must be defined at eval_pipeline/callbacks.
- Parameters
config (dict) – A config dictionary.
- Returns
callbacks – All name:callback pairs as dict {name: callback}
- Return type
dict