Serialization Tutorial
In this package there are two ways of performing serialization and
deserialization: the “classic” and “new” methods. The classic method predates
the built-in JSON serialization of param, while the new method extends
the built-in serialization to new file types. The new method is still in beta.
Classic method
As an example, suppose we have parameterized classes and instances:
import param
class TrainingHyperparameters(param.Parameterized):
    lr = param.Number(1e-5, doc='The learning rate')
    max_epochs = param.Integer(10)
    model_regex = param.String(
        "model-{epoch:05d}.pkl",
        doc='Regular exp for storing model weights after every epoch')

t_params = TrainingHyperparameters()

class ModelHyperparameters(param.Parameterized):
    layers = param.ListSelector(
        [], objects=['conv', 'fc', 'recurrent'],
        doc='Sequence of layers by type, bottom-first')
    activations = param.ObjectSelector('relu', objects=['tanh', 'relu'])

m_params = ModelHyperparameters()
m_params.layers = ['conv', 'conv', 'fc']

param_dict = {
    'training': t_params,
    'model': m_params,
}
We can serialize these easily into JSON, YAML, or INI using pydrobert.param.serialization:
import pydrobert.param.serialization as serial
serial.serialize_to_json('conf.json', param_dict)
serial.serialize_to_yaml('conf.yaml', param_dict) # requires ruamel.yaml or pyyaml
serial.serialize_to_ini('conf.ini', param_dict)
where we get
{
  "training": {
    "lr": 1e-05,
    "max_epochs": 10,
    "model_regex": "model-{epoch:05d}.pkl"
  },
  "model": {
    "activations": "relu",
    "layers": [
      "conv",
      "conv",
      "fc"
    ]
  }
}
or
training:
  lr: 1e-05 # The learning rate
  max_epochs: 10
  model_regex: model-{epoch:05d}.pkl # Regular exp for storing model weights after every epoch
model:
  activations: relu # Choices: "tanh", "relu"
  layers: # Sequence of layers by type, bottom-first. Element choices: "conv", "fc", "recurrent"
  - conv
  - conv
  - fc
or
# == Help ==
# [training]
# lr: The learning rate
# model_regex: Regular exp for storing model weights after every epoch
# [model]
# activations: Choices: "tanh", "relu"
# layers: Sequence of layers by type, bottom-first. A JSON string. Element choices: "conv", "fc", "recurrent"
[training]
lr = 1e-05
max_epochs = 10
model_regex = model-{epoch:05d}.pkl
[model]
activations = relu
layers = ["conv", "conv", "fc"]
respectively.
Deserialization proceeds similarly. Files can be used to populate parameters in existing parameterized instances.
t_params.lr = 10000.
assert t_params.lr == 10000.
serial.deserialize_from_yaml('conf.yaml', param_dict)
assert t_params.lr == 1e-05
pydrobert.param.argparse contains convenience functions for
(de)serializing config files right from the command line.
import argparse, pydrobert.param.argparse as pargparse
parser = argparse.ArgumentParser()
pargparse.add_parameterized_read_group(parser, parameterized=param_dict)
pargparse.add_parameterized_print_group(parser, parameterized=param_dict)
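The general mechanism behind such command-line flags can be sketched with the standard library alone. The following is not pydrobert.param's implementation: the ReadJsonAction class, the --read-json flag as defined here, and the config destination are hypothetical stand-ins showing how a custom argparse.Action might load a config file while arguments are being parsed.

```python
import argparse
import json
import os
import tempfile


class ReadJsonAction(argparse.Action):
    """Hypothetical action: parse the named JSON file and store the result."""

    def __call__(self, parser, namespace, values, option_string=None):
        try:
            with open(values) as f:
                config = json.load(f)
        except (OSError, ValueError) as e:
            # Report bad paths or malformed JSON as a usage error
            parser.error(f"could not read {values!r}: {e}")
        setattr(namespace, self.dest, config)


parser = argparse.ArgumentParser()
parser.add_argument(
    "--read-json", action=ReadJsonAction, dest="config", default={},
    metavar="PATH", help="Populate the configuration from a JSON file")

# Write a small config file, then load it through the command-line flag
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "conf.json")
    with open(path, "w") as f:
        json.dump({"training": {"lr": 1e-05, "max_epochs": 10}}, f)
    namespace = parser.parse_args(["--read-json", path])

assert namespace.config["training"]["max_epochs"] == 10
```

Doing the file read inside the action means the rest of the program only ever sees an already-populated namespace, which is the convenience the helper functions above aim for.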
Sometimes, the default (de)serialization routines are unsuited for the data. For example, INI files do not have a standard format for lists of values. For this, and many other container types, values are parsed with JSON syntax. If we want to parse lists differently, for example as comma-delimited lists, we can design a custom serializer and deserializer for handling our layers parameter:
class CommaSerializer(serial.DefaultListSelectorSerializer):
    def help_string(self, name, parameterized):
        choices_help_string = super(CommaSerializer, self).help_string(name, parameterized)
        return 'Elements separated by commas. ' + choices_help_string

    def serialize(self, name, parameterized):
        val = super(CommaSerializer, self).serialize(name, parameterized)
        return ','.join(str(x) for x in val)

class CommaDeserializer(serial.DefaultListSelectorDeserializer):
    def deserialize(self, name, block, parameterized):
        block = block.split(',')
        super(CommaDeserializer, self).deserialize(name, block, parameterized)

serial.serialize_to_ini(
    'conf.ini', param_dict,
    # (de)serialize by type
    serializer_type_dict={param.ListSelector: CommaSerializer()},
)
serial.deserialize_from_ini(
    'conf.ini', param_dict,
    # or by name!
    deserializer_name_dict={'model': {'layers': CommaDeserializer()}},
)
With conf.ini:
# == Help ==
# [training]
# lr: The learning rate
# model_regex: Regular exp for storing model weights after every epoch
# [model]
# activations: Choices: "tanh", "relu"
# layers: Sequence of layers by type, bottom-first. Elements separated by commas. Element choices: "conv", "fc", "recurrent"
[training]
max_epochs = 10
model_regex = model-{epoch:05d}.pkl
lr = 1e-05
[model]
activations = relu
layers = conv,conv,fc
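To see what the two list conventions look like at the file level, the sketch below uses only the standard library (configparser and json), not pydrobert.param itself. The section and option names mirror the example above; the helpers parse_json_list and parse_comma_list are made up for illustration.

```python
import configparser
import json

# Two INI snippets holding the same list, one per convention
JSON_STYLE = """
[model]
layers = ["conv", "conv", "fc"]
"""
COMMA_STYLE = """
[model]
layers = conv,conv,fc
"""


def parse_json_list(text):
    """Read [model] layers as a JSON-encoded list (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return json.loads(parser["model"]["layers"])


def parse_comma_list(text):
    """Read [model] layers as a comma-delimited list (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser["model"]["layers"]
    # Strip whitespace around each element and drop empty entries
    return [item.strip() for item in raw.split(",") if item.strip()]


json_layers = parse_json_list(JSON_STYLE)
comma_layers = parse_comma_list(COMMA_STYLE)
assert json_layers == comma_layers == ["conv", "conv", "fc"]
```

The comma form is easier to type by hand, but it cannot represent an element that itself contains a comma, whereas the JSON form can.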
New method
Because (de)serialization is straightforward in most cases, the built-in
serialization protocol of param produces the same JSON values as the classic
protocol above for most parameters:
t_params = TrainingHyperparameters()
with open("conf.json", "w") as f:
    f.write(t_params.param.serialize_parameters())
yielding
{"name": "TrainingHyperparameters00002", "lr": 1e-05, "max_epochs": 10, "model_regex": "model-{epoch:05d}.pkl"}
Note the additional inclusion of the “name” parameter. Deserialization is similarly performed:
with open("conf.json") as f:
    # deserialize_parameters returns a dict of values for the constructor
    t_params = TrainingHyperparameters(
        **TrainingHyperparameters.param.deserialize_parameters(f.read()))
Using a strategy similar to the one param uses for JSON, I have extended
serialization to YAML. The custom protocol must be registered once at
runtime before it can be used:
serial.register_serializer("yaml")
Afterwards, files can be read and written in YAML:
with open("conf.yaml", "w") as f:
    f.write(t_params.param.serialize_parameters(mode="yaml"))
yielding
name: TrainingHyperparameters00002 # String identifier for this object.
lr: 1e-05 # The learning rate
max_epochs: 10
model_regex: model-{epoch:05d}.pkl # Regular exp for storing model weights after every epoch
There are a few other goodies as well. Once again, there are convenience functions for (de)serialization to and from different file types (including JSON):
parser = argparse.ArgumentParser()
pargparse.add_deserialization_group_to_parser(
    parser, TrainingHyperparameters, 't_params')
pargparse.add_serialization_group_to_parser(parser, t_params)
namespace = parser.parse_args(['--read-json', 'conf.json'])
assert namespace.t_params.pprint() == t_params.pprint()
parser.parse_args(['--print-yaml']) # prints to stdout and exits
You’ll note that the new style does away with the dictionary of parameterized
objects. param prefers to recreate this structure by nesting
parameterized instances as parameters. As of this writing, nesting cannot be
serialized by default in param. pydrobert.param offers a solution in the
form of “reckless” parsing. Once registered, the 'reckless_json' and
'reckless_yaml' modes act as drop-in replacements for the 'json' and 'yaml'
modes that can also handle nesting. Unfortunately, they do so by
making assumptions which aren’t always correct. See
pydrobert.param.serialization.register_serializer() for more discussion.