Software Design
===============

Betty provides an easy-to-use, modular, and maintainable programming interface for multilevel optimization (MLO) by breaking MLO down into two high-level concepts --- (1) optimization problems, and (2) problem dependencies --- for which we design two abstract Python classes:

- ``Problem`` class: an abstraction of optimization problems.
- ``Engine`` class: an abstraction of problem dependencies.

In this chapter, we will introduce each of these concepts/classes in depth.

Problem
-------

Under our abstraction, each optimization problem :math:`P_k` in MLO is defined by (1) the module, (2) the optimizer, (3) the data loader, (4) the sets of upper and lower constraining problems, (5) the loss function, (6) the problem (or optimization) configuration, (7) the name, and (8) other optional components. An example usage of the ``Problem`` class is shown below:

.. code:: python

    """ Setup of module, optimizer, and data loader """
    my_module, my_optimizer, my_data_loader = problem_setup()

    class MyProblem(ImplicitProblem):
        def training_step(self, batch):
            """ Users define the loss function here """
            loss = loss_fn(batch, self.module, self.other_probs, ...)
            acc = get_accuracy(batch, self.module, ...)
            return {'loss': loss, 'acc': acc}

    """ Optimization Configuration """
    config = Config(type="darts", steps=5, first_order=True, retain_graph=True)

    """ Problem Instantiation """
    prob = MyProblem(
        name='myproblem',
        module=my_module,
        optimizer=my_optimizer,
        train_data_loader=my_data_loader,
        config=config,
        device=device
    )

To better understand the ``Problem`` class, we take a deeper dive into each component.

(0) Problem type
~~~~~~~~~~~~~~~~

Automatic differentiation for multilevel optimization can be roughly categorized into two types: iterative differentiation (ITD) and approximate implicit differentiation (AID). While AID allows users to use native PyTorch modules and optimizers, ITD requires patching both modules and optimizers to follow a functional programming paradigm. Due to this difference, we provide two separate classes, ``IterativeProblem`` and ``ImplicitProblem``, for ITD and AID respectively. Empirically, we observe that AID often achieves better memory efficiency, training wall time, and final accuracy. Thus, we highly recommend using the ``ImplicitProblem`` class as the default choice.

(1) Module
~~~~~~~~~~

The module defines the parameters to be learned in the current optimization problem, and corresponds to :math:`\theta_k` in our mathematical formulation (:doc:`Chapter `). In practice, the module is usually defined using PyTorch's ``torch.nn.Module``, and is passed to the ``Problem`` class through the constructor.

(2) Optimizer
~~~~~~~~~~~~~

The optimizer updates the parameters of the above module. In practice, the optimizer is most commonly defined using PyTorch's ``torch.optim.Optimizer``, and is also passed to the ``Problem`` class through the constructor.

(3) Data loader
~~~~~~~~~~~~~~~

The data loader defines the associated training data, denoted :math:`\mathcal{D}_k` in our mathematical formulation. It is normally defined using PyTorch's ``torch.utils.data.DataLoader``, but it can be any Python ``Iterator``. The data loader is likewise provided through the class constructor.
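
For concreteness, the ``problem_setup()`` helper in the example above could be implemented with plain PyTorch components along the following lines; the architecture, optimizer settings, and synthetic dataset are illustrative assumptions rather than part of Betty's API.

.. code:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def problem_setup():
        # Module: the learnable parameters of this problem (theta_k).
        module = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10),
        )
        # Optimizer: any torch.optim.Optimizer over the module's parameters.
        optimizer = torch.optim.SGD(module.parameters(), lr=0.1, momentum=0.9)
        # Data loader: any Python iterator over batches; a DataLoader is typical.
        dataset = TensorDataset(
            torch.randn(1000, 28 * 28), torch.randint(0, 10, (1000,))
        )
        data_loader = DataLoader(dataset, batch_size=64, shuffle=True)
        return module, optimizer, data_loader
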

(4) Upper & Lower Constraining Problem Sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While the upper and lower constraining problem sets :math:`\mathcal{U}_k` and :math:`\mathcal{L}_k` are at the core of our mathematical formulation, we don't allow users to specify them directly in the ``Problem`` class. Rather, we design Betty so that the constraining sets are provided directly by ``Engine``, the class where all problem dependencies are handled. In doing so, users need to provide the hierarchical problem dependencies only once, when they initialize ``Engine``, and can avoid the potentially error-prone and cumbersome process of provisioning constraining problems manually every time they define a new problem.

(5) Loss function
~~~~~~~~~~~~~~~~~

The loss function defines the optimization objective :math:`\mathcal{C}_k` in our formulation. Unlike the previous components, the loss function is defined through the ``training_step`` method, as shown above. In addition, the ``training_step`` method provides an option to define other metrics (e.g. accuracy in image classification), which can be returned in a Python dictionary. When the return type is not a Python dictionary, the API assumes that the returned value is the loss. Furthermore, the returned dictionary/value of ``training_step`` is automatically logged with our logger to a visualization tool (e.g. TensorBoard) as well as to the standard output stream (i.e. printed in the terminal). Our ``training_step`` method is highly inspired by PyTorch Lightning's ``training_step`` method.

(6) Optimization Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unlike automatic differentiation in neural networks, autodiff in MLO requires approximating gradients with, for example, implicit differentiation. Since there can be different approximation methods and configurations, we allow users to specify all such choices through the ``Config`` data class. In addition, ``Config`` allows users to specify other training details such as gradient accumulation steps, logging steps, and fp16 training options. We provide a default value for each attribute in ``Config``, so in most cases users only need to specify the 3-4 attributes relevant to their needs.

(7) Name
~~~~~~~~

Users oftentimes need to access the constraining problems :math:`\mathcal{U}_k` and :math:`\mathcal{L}_k` when defining the loss function in ``training_step``. However, since constraining problems are provided directly by the ``Engine`` class, users otherwise have no way to access them from the current problem. Thus, we provide the ``name`` attribute, through which users can access other problems from within the ``Problem`` and ``Engine`` classes. For example, when your MLO program involves ``Problem1(name='prob1', ...)`` and ``Problem2(name='prob2', ...)``, you can access ``Problem2`` from ``Problem1`` with ``self.prob2``.

(8) Other Optional Components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While not considered essential components, learning rate schedulers and parameter callbacks (e.g. parameter clipping/clamping) can optionally be provided by users as well. Interested users can refer to the API documentation for these features.

Engine
------

While ``Problem`` manages each optimization problem, ``Engine`` handles the dataflow graph built from the user-provided hierarchical problem dependencies. An example usage of the ``Engine`` class is provided below:

.. code:: python

    class MyEngine(Engine):
        @torch.no_grad()
        def validation(self):
            val_loss = loss_fn(self.prob1, self.prob2, test_loader)
            val_acc = acc_fn(self.prob1, self.prob2, test_loader)
            return {'loss': val_loss, 'acc': val_acc}

    p1 = Problem1(name='prob1', ...)
    p2 = Problem2(name='prob2', ...)
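    # The dependency dictionaries below assume that p1 is the upper-level
    # problem and p2 the lower-level problem: "u2l" holds upper-to-lower
    # edges and "l2u" lower-to-upper edges, with each dictionary key as an
    # edge's start node and each value as a list of its end nodes.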
    dependencies = {"u2l": {p1: [p2]}, "l2u": {p2: [p1]}}
    engine_config = EngineConfig(train_iters=5000, valid_step=100)
    engine = MyEngine(problems=[p1, p2], dependencies=dependencies, config=engine_config)
    engine.run()

Here, we take a deeper look into each component of ``Engine``.

(1) Problems
~~~~~~~~~~~~

Users should provide all of the involved optimization problems through the ``problems`` argument.

(2) Dependencies
~~~~~~~~~~~~~~~~

As discussed in :doc:`this section `, MLO has two types of dependencies between problems: upper-to-lower and lower-to-upper. We allow users to define two separate graphs, one for each type of edge, using Python dictionaries in which keys and values respectively represent the start and end nodes of edges. When the user-defined dependency graphs are provided, ``Engine`` compiles them and finds all paths required for automatic differentiation with a modified depth-first search algorithm. Moreover, ``Engine`` determines the constraining problem sets for each problem based on the dependency graphs, as mentioned above.

(3) Validation
~~~~~~~~~~~~~~

We currently allow users to define one validation stage for the *whole* multilevel optimization program. This is achieved by implementing the ``validation`` method in ``Engine``, as shown above. As in the ``training_step`` method of the ``Problem`` class, users can return whichever metrics they want to log in a Python dictionary.

(4) Engine Configuration
~~~~~~~~~~~~~~~~~~~~~~~~

Users can specify several configurations for the whole multilevel optimization program, such as the total number of training iterations, the validation interval, and the logger type, through the ``EngineConfig`` data class.

(5) Run
~~~~~~~

Once all initialization steps are complete, users can run the MLO program by calling ``Engine``'s ``run`` method, which repeatedly calls the ``step`` methods of the lowermost problems. The ``step`` methods of upper-level problems are automatically called from the ``step`` methods of lower-level problems by following lower-to-upper edges.

To summarize, Betty provides a PyTorch-like programming interface for defining multiple optimization problems, which scales to large MLO programs with complex dependencies, as well as a modular interface for a variety of best-response Jacobian algorithms, all without requiring in-depth mathematical or programming expertise.
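
To tie everything together, below is a minimal end-to-end sketch of a two-level program in which an upper-level problem learns an L2-regularization strength for a lower-level classifier. It follows the constructor arguments and dictionary formats shown in this chapter, but the toy task, the import paths, the specific ``Config``/``EngineConfig`` values, and the ``self.<name>.module`` access pattern are illustrative assumptions rather than prescriptions; please consult the tutorials and API documentation for exact usage.

.. code:: python

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader, TensorDataset

    # Import paths follow Betty's public examples; adjust them if your
    # installed version differs.
    from betty.configs import Config, EngineConfig
    from betty.engine import Engine
    from betty.problems import ImplicitProblem

    class HyperParams(torch.nn.Module):
        """Upper-level module holding a single learnable regularization strength."""

        def __init__(self):
            super().__init__()
            self.log_reg = torch.nn.Parameter(torch.zeros(1))

        def forward(self):
            return self.log_reg

    def make_loader():
        # Synthetic data, purely for illustration.
        x, y = torch.randn(512, 20), torch.randint(0, 2, (512,))
        return DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

    class Inner(ImplicitProblem):
        def training_step(self, batch):
            x, y = batch
            loss = F.cross_entropy(self.module(x), y)
            # Access the upper problem through its name ("outer") and use its
            # parameter as an L2-regularization strength (an assumed pattern).
            reg = F.softplus(self.outer.module()).squeeze()
            l2 = sum((p ** 2).sum() for p in self.module.parameters())
            return loss + reg * l2

    class Outer(ImplicitProblem):
        def training_step(self, batch):
            x, y = batch
            # Validation-style loss of the lower problem's classifier.
            return F.cross_entropy(self.inner.module(x), y)

    inner_module = torch.nn.Linear(20, 2)
    outer_module = HyperParams()

    inner = Inner(
        name="inner",
        module=inner_module,
        optimizer=torch.optim.SGD(inner_module.parameters(), lr=0.1),
        train_data_loader=make_loader(),
        config=Config(type="darts"),
    )
    outer = Outer(
        name="outer",
        module=outer_module,
        optimizer=torch.optim.Adam(outer_module.parameters(), lr=0.01),
        train_data_loader=make_loader(),  # stands in for a validation loader
        config=Config(type="darts"),
    )

    # Dependency graphs follow the key/value (edge start/end node) convention
    # described above: the outer problem constrains the inner problem.
    dependencies = {"u2l": {outer: [inner]}, "l2u": {inner: [outer]}}

    engine = Engine(
        problems=[inner, outer],
        dependencies=dependencies,
        config=EngineConfig(train_iters=500),
    )
    engine.run()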