Software Design¶
Betty provides an easy-to-use, modular, and maintainable programming interface for multilevel optimization (MLO) by breaking MLO down into two high-level concepts, (1) optimization problems and (2) problem dependencies, for which we design two abstract Python classes:
- Problem class: an abstraction of optimization problems.
- Engine class: an abstraction of problem dependencies.
In this chapter, we will introduce each of these concepts/classes in depth.
Problem¶
Under our abstraction, each optimization problem \(P\) in MLO is defined by (1) the module, (2) the optimizer, (3) the data loader, (4) the sets of upper and lower constraining problems, (5) the loss function, (6) the problem (or optimization) configuration, (7) the name, and (8) other optional components. An example usage of the Problem class is shown below:
""" Setup of module, optimizer, and data loader """
my_module, my_optimizer, my_data_loader = problem_setup()
class MyProblem(ImplicitProblem):
def training_step(self, batch):
""" Users define the loss function here """
loss = loss_fn(batch, self.module, self.other_probs, ...)
acc = get_accuracy(batch, self.module, ...)
return {'loss': loss, 'acc': acc}
""" Optimization Configuration """
config = Config(type="darts", steps=5, first_order=True, retain_graph=True)
""" Problem Instantiation """
prob = MyProblem(
name='myproblem',
module=my_module,
optimizer=my_optimizer,
train_data_loader=my_data_loader,
config=config,
device=device
)
To better understand the Problem class, we take a deeper dive into each of its components.
(0) Problem type¶
Automatic differentiation for multilevel optimization can be roughly categorized into two types: iterative differentiation (ITD) and approximate implicit differentiation (AID). While AID allows users to use native PyTorch modules and optimizers, ITD requires patching both modules and optimizers to follow a functional programming paradigm. Due to this difference, we provide two separate classes, IterativeProblem and ImplicitProblem, for ITD and AID respectively. Empirically, we observe that AID often achieves better memory efficiency, training wall time, and final accuracy. Thus, we highly recommend using the ImplicitProblem class as the default.
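As a minimal sketch of this choice, both classes are subclassed in the same way; note that the betty.problems import path below is assumed from Betty's public examples rather than stated in this section.

# Minimal sketch; the 'betty.problems' import path is an assumption based on
# Betty's public examples.
from betty.problems import ImplicitProblem, IterativeProblem

# Recommended default: subclass ImplicitProblem (AID) and implement training_step.
class MyProblem(ImplicitProblem):
    ...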
(1) Module¶
The module defines the parameters to be learned in the current optimization problem, and corresponds to \(\theta_k\) in our mathematical formulation (Chapter). In practice, the module is usually defined using PyTorch’s torch.nn.Module, and is passed to the Problem class through the constructor.
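For instance, a toy module could look like the following (the architecture is purely illustrative):

import torch.nn as nn

# A toy classifier; its parameters play the role of theta_k for this problem.
my_module = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))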
(2) Optimizer¶
The optimizer updates the parameters of the above module. In practice, the optimizer is most commonly defined using PyTorch’s torch.optim.Optimizer, and is also passed to the Problem class through the constructor.
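Continuing the toy example above:

import torch.optim as optim

# A standard PyTorch optimizer over the module's parameters.
my_optimizer = optim.SGD(my_module.parameters(), lr=0.1, momentum=0.9)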
(3) Data loader¶
The data loader defines the associated training data, denoted \(\mathcal{D}_k\) in our mathematical formulation. It is normally defined using PyTorch’s torch.utils.data.DataLoader, but it can be any Python Iterator. The data loader can also be provided through the class constructor.
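Continuing the toy example, a minimal loader might be:

import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy dataset standing in for D_k; any iterator over batches would also work.
dataset = TensorDataset(torch.randn(1000, 784), torch.randint(0, 10, (1000,)))
my_data_loader = DataLoader(dataset, batch_size=64, shuffle=True)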
(4) Upper & Lower Constraining Problem Sets¶
While the upper & lower constraining problem sets \(\mathcal{U}_k\;\&\;\mathcal{L}_k\) are at the core of our mathematical formulation, we don’t allow users to directly specify them in the Problem class. Rather, we design Betty so that the constraining sets are provided directly from Engine, the class where all problem dependencies are handled. In doing so, users need to provide the hierarchical problem dependencies only once when they initialize Engine, and can avoid the potentially error-prone and cumbersome process of specifying constraining problems manually every time they define a new problem.
(5) Loss function¶
The loss function defines the optimization objective \(\mathcal{C}_k\) in our formulation. Unlike the previous components, the loss function is defined through the training_step method, as shown above. In addition, the training_step method provides an option to define other metrics (e.g. accuracy in image classification), which can be returned in a Python dictionary. When the return type is not a Python dictionary, the API assumes that the returned value is the loss by default. Furthermore, the returned dictionary/value of training_step is automatically logged with our logger to a visualization tool (e.g. TensorBoard) as well as to the standard output stream (i.e. printed in the terminal). Our training_step method is highly inspired by PyTorch Lightning’s training_step method.
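For instance, returning a bare tensor instead of a dictionary is also valid; in the sketch below, the import path and the loss itself are illustrative assumptions.

import torch.nn.functional as F
from betty.problems import ImplicitProblem  # import path assumed from Betty's examples

class Classifier(ImplicitProblem):
    def training_step(self, batch):
        inputs, targets = batch
        # A bare tensor (not a dict) is treated as the loss itself.
        return F.cross_entropy(self.module(inputs), targets)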
(6) Optimization Configuration¶
Unlike automatic differentiation in neural networks, autodiff in MLO requires approximating gradients with, for example, implicit differentiation. Since there can be different approximation methods and configurations, we allow users to specify all such choices through the Config data class. In addition, Config allows users to specify other training details such as gradient accumulation steps, logging steps, and fp16 training options. We provide a default value for each attribute in Config, so in most cases users only need to specify three or four attributes based on their needs.
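As a sketch, a configuration might combine approximation and training options as below; attribute names beyond those in the example above (type, steps, first_order, retain_graph), such as fp16 and log_step, are assumptions about the Config API rather than confirmed names.

from betty.configs import Config  # import path assumed from Betty's examples

config = Config(
    type="darts",     # best-response Jacobian approximation method
    steps=5,          # number of unrolled inner steps
    first_order=True,
    fp16=True,        # assumed attribute name for mixed-precision training
    log_step=100,     # assumed attribute name for logging frequency
)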
(7) Name¶
Users oftentimes need to access constraining problems \(\mathcal{U}_k\;\&\;\mathcal{L}_k\) when defining the loss function in training_step. However, since constraining problems are provided directly by the Engine class, users lack a direct way to access them from the current problem. Thus, we design the name attribute, through which users can access other problems from within the Problem and Engine classes. For example, when your MLO program involves Problem1(name='prob1', ...) and Problem2(name='prob2', ...), you can access Problem2 from Problem1 with self.prob2.
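A minimal sketch of this access pattern is shown below; the loss computation and the use of the other problem's module output are illustrative only.

class Problem1(ImplicitProblem):
    def training_step(self, batch):
        # Problem2 was registered with name='prob2', so it is reachable as self.prob2;
        # here its module's output is used inside Problem1's loss (illustrative).
        other_output = self.prob2.module(batch)
        loss = loss_fn(batch, self.module, other_output)
        return loss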
(8) Other Optional Components¶
While not considered essential components, learning rate schedulers or parameter callbacks (e.g. parameter clipping/clamping) can optionally be provided by users as well. Interested users can refer to the API documentation for these features.
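As one hedged sketch of how such a component might be supplied, the example below passes a standard PyTorch learning rate scheduler to the problem; the 'scheduler' constructor argument name is an assumption, and the API documentation should be consulted for the exact interface.

from torch.optim.lr_scheduler import CosineAnnealingLR

my_scheduler = CosineAnnealingLR(my_optimizer, T_max=1000)

# 'scheduler' as a Problem constructor argument is an assumption about the API.
prob = MyProblem(
    name='myproblem',
    module=my_module,
    optimizer=my_optimizer,
    scheduler=my_scheduler,
    train_data_loader=my_data_loader,
    config=config,
)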
Engine¶
While Problem manages each optimization problem, Engine handles a dataflow graph based on the user-provided hierarchical problem dependencies. An example usage of the Engine class is provided below:
class MyEngine(Engine):
    @torch.no_grad()
    def validation(self):
        val_loss = loss_fn(self.prob1, self.prob2, test_loader)
        val_acc = acc_fn(self.prob1, self.prob2, test_loader)
        return {'loss': val_loss, 'acc': val_acc}

p1 = Problem1(name='prob1', ...)
p2 = Problem2(name='prob2', ...)
dependencies = {"u2l": {p1: [p2]}, "l2u": {p2: [p1]}}

engine_config = EngineConfig(train_iters=5000, valid_step=100)
engine = MyEngine(problems=[p1, p2], dependencies=dependencies, config=engine_config)
engine.run()
Here, we take a deeper look into each component of Engine.
(1) Problems¶
Users should provide all of the involved optimization problems through the problems argument.
(2) Dependencies¶
As discussed previously, MLO has two types of dependencies between problems: upper-to-lower and lower-to-upper. We allow users to define two separate graphs, one for each type of edge, using a Python dictionary in which keys/values respectively represent start/end nodes of an edge. When the user-defined dependency graphs are provided, Engine compiles them and finds all paths required for automatic differentiation with a modified depth-first search algorithm. Moreover, Engine determines the constraining problem sets for each problem based on the dependency graphs, as mentioned above.
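To make the edge semantics concrete, here is a sketch for a hypothetical three-level hierarchy; the problem names and the exact edge set are illustrative.

# Hypothetical three-level hierarchy: 'outer' constrains 'middle', which constrains 'inner'.
# In each dictionary, keys are start nodes and values are lists of end nodes.
u2l = {outer: [middle], middle: [inner]}   # upper-to-lower edges
l2u = {inner: [middle], middle: [outer]}   # lower-to-upper edges
dependencies = {"u2l": u2l, "l2u": l2u}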
(3) Validation¶
We currently allow users to define one validation stage for the whole multilevel optimization program. This can be achieved by implementing the validation method in Engine, as shown above. As in the training_step method of the Problem class, users can return whichever metrics they want to log in a Python dictionary.
(4) Engine Configuration¶
Users can specify several configurations for the whole multilevel optimization program, such as the total training iterations, the validation step, and the logger type.
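A sketch of such a configuration is shown below; train_iters and valid_step appear in the example above, while logger_type is an assumed attribute name for selecting the logging backend.

from betty.configs import EngineConfig  # import path assumed from Betty's examples

engine_config = EngineConfig(
    train_iters=5000,            # total training iterations
    valid_step=100,              # run validation every 100 iterations
    logger_type='tensorboard',   # assumed attribute name for the logging backend
)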
(5) Run¶
Once all initialization steps are complete, users can run the MLO program by calling Engine’s run method, which repeatedly calls the step methods of the lowermost problems. The step methods of upper-level problems are automatically called from the step methods of lower-level problems, following lower-to-upper edges.
To summarize, Betty provides a PyTorch-like programming interface for defining multiple optimization problems that scales to large MLO programs with complex dependencies, as well as a modular interface for a variety of best-response Jacobian algorithms, without requiring deep mathematical or programming expertise from the user.