betty.hypergradient¶
Unlike traditional automatic differentiation techniques, which calculate an analytic Jacobian for each operation, multilevel optimization requires approximating the best-response Jacobian of each level's optimization problem. The approximation techniques supported by Betty are listed below; each is selected through the problem's Config.
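As a minimal sketch of how one might select among them (the type keyword is an assumption about Config's API, not guaranteed; the alpha/iterations fields are the ones documented in the entries below):

    from betty.configs import Config

    # Hypothetical selection of a hypergradient method per problem. The `type`
    # keyword is an assumption about Config's API; the alpha/iterations fields
    # are those documented in the entries below.
    darts_config = Config(type="darts")
    neumann_config = Config(type="neumann", neumann_alpha=1e-3, neumann_iterations=3)
    cg_config = Config(type="cg", cg_alpha=1e-3, cg_iterations=5)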
finite difference¶
- betty.hypergradient.darts.darts(vector, curr, prev, sync)[source]¶
Approximate the matrix-vector multiplication with the best-response Jacobian by the finite difference method. More specifically, we modify the finite difference method proposed in DARTS: Differentiable Architecture Search by re-interpreting it from the implicit differentiation perspective. Empirically, this method achieves better memory efficiency, shorter training wall time, and higher test accuracy than the other methods. A sketch of the underlying central-difference estimate is given after this entry.
- Parameters:
  - vector (Sequence of Tensor) – vector to be multiplied with the best-response Jacobian
  - curr (Problem) – the current-level problem
  - prev (Problem) – the parent problem of the current problem
  - sync (bool) – whether to synchronize the hypergradient computation across distributed processes
- Returns:
(Intermediate) gradient
- Return type:
Sequence of Tensor
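Below is a self-contained sketch of the central-difference estimate behind this method, not Betty's internal implementation; w, lam, and loss_fn are hypothetical stand-ins for the lower-level parameters, the upper-level parameters, and the lower-level loss.

    import torch

    def darts_mvp(vector, w, lam, loss_fn, alpha=0.01, r=0.01):
        # Scale the perturbation by the norm of `vector`, as in DARTS
        eps = r / (torch.cat([v.reshape(-1) for v in vector]).norm() + 1e-12)

        with torch.no_grad():  # w+ = w + eps * vector
            for p, v in zip(w, vector):
                p.add_(eps * v)
        grad_p = torch.autograd.grad(loss_fn(w, lam), lam)

        with torch.no_grad():  # w- = w - eps * vector
            for p, v in zip(w, vector):
                p.sub_(2 * eps * v)
        grad_m = torch.autograd.grad(loss_fn(w, lam), lam)

        with torch.no_grad():  # restore the original w
            for p, v in zip(w, vector):
                p.add_(eps * v)

        # Central difference for -alpha * vector^T (d^2 L / dw dlam)
        return [-alpha * (gp - gm) / (2 * eps) for gp, gm in zip(grad_p, grad_m)]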
neumann series¶
- betty.hypergradient.neumann.neumann(vector, curr, prev, sync)[source]¶
Approximate the matrix-vector multiplication with the best-response Jacobian by the Neumann series, as proposed in Optimizing Millions of Hyperparameters by Implicit Differentiation, based on the implicit function theorem (IFT). Users may specify the learning rate (neumann_alpha) and the number of unrolling steps (neumann_iterations) in Config. A sketch of the series computation is given after this entry.
- Parameters:
  - vector (Sequence of Tensor) – vector to be multiplied with the best-response Jacobian
  - curr (Problem) – the current-level problem
  - prev (Problem) – the parent problem of the current problem
  - sync (bool) – whether to synchronize the hypergradient computation across distributed processes
- Returns:
(Intermediate) gradient
- Return type:
Sequence of Tensor
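A minimal sketch of the Neumann-series approximation of the inverse Hessian-vector product, the core step of the IFT-based hypergradient; w and loss are hypothetical stand-ins for the lower-level parameters and loss, and this is not Betty's internal implementation.

    import torch

    def neumann_inverse_hvp(vector, w, loss, alpha=1e-3, iterations=3):
        # H^{-1} v ~= alpha * sum_{i=0}^{K} (I - alpha * H)^i v, where H is
        # the Hessian of `loss` w.r.t. the lower-level parameters `w`
        grads = torch.autograd.grad(loss, w, create_graph=True)
        v = [x.clone() for x in vector]    # current term (I - alpha*H)^i v
        acc = [x.clone() for x in vector]  # partial sum, seeded with the i=0 term
        for _ in range(iterations):
            hvp = torch.autograd.grad(grads, w, grad_outputs=v, retain_graph=True)
            v = [x - alpha * h for x, h in zip(v, hvp)]
            acc = [a + x for a, x in zip(acc, v)]
        return [alpha * a for a in acc]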
conjugate gradient¶
- betty.hypergradient.cg.cg(vector, curr, prev, sync)[source]¶
Approximate the matrix-vector multiplication with the best-response Jacobian by the conjugate gradient method based on the implicit function theorem (IFT). Users may specify the learning rate (cg_alpha) and the number of conjugate gradient iterations (cg_iterations) in Config. A sketch of the underlying linear solve is given after this entry.
- Parameters:
  - vector (Sequence of Tensor) – vector to be multiplied with the best-response Jacobian
  - curr (Problem) – the current-level problem
  - prev (Problem) – the parent problem of the current problem
  - sync (bool) – whether to synchronize the hypergradient computation across distributed processes
- Returns:
(Intermediate) gradient
- Return type:
Sequence of Tensor
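A minimal sketch of the conjugate-gradient solve H x = v that underlies this approximation, using double backward for Hessian-vector products rather than materializing H; w and loss are hypothetical stand-ins, not Betty's API.

    import torch

    def cg_inverse_hvp(vector, w, loss, iterations=5, tol=1e-10):
        grads = torch.autograd.grad(loss, w, create_graph=True)

        def hvp(x):  # Hessian-vector product via double backward
            return torch.autograd.grad(grads, w, grad_outputs=x, retain_graph=True)

        x = [torch.zeros_like(v) for v in vector]
        r = [v.clone() for v in vector]   # residual: r = v - H x, with x = 0
        p = [ri.clone() for ri in r]      # search direction
        rs = sum((ri * ri).sum() for ri in r)
        for _ in range(iterations):
            Hp = hvp(p)
            a = rs / (sum((pi * hi).sum() for pi, hi in zip(p, Hp)) + 1e-12)
            x = [xi + a * pi for xi, pi in zip(x, p)]
            r = [ri - a * hi for ri, hi in zip(r, Hp)]
            rs_new = sum((ri * ri).sum() for ri in r)
            if rs_new < tol:
                break
            p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
            rs = rs_new
        return x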
reinforce¶
- betty.hypergradient.reinforce.reinforce(vector, curr, prev)[source]¶
Approximate the matrix-vector multiplication with the best-response Jacobian by the REINFORCE method. The REINFORCE algorithm allows users to differentiate through optimization pipelines that contain non-differentiable operations such as sampling. This method is not yet fully implemented. A sketch of the score-function estimator it builds on is given after this entry.
- Parameters:
  - vector (Sequence of Tensor) – vector to be multiplied with the best-response Jacobian
  - curr (Problem) – the current-level problem
  - prev (Problem) – the parent problem of the current problem
- Returns:
(Intermediate) gradient
- Return type:
Sequence of Tensor
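For reference, a minimal sketch of the score-function (REINFORCE) estimator itself, d/dtheta E[f(x)] = E[f(x) * d/dtheta log p_theta(x)], which avoids differentiating through the sampling step; all names here are illustrative, and this is not Betty's (unfinished) implementation.

    import torch

    logits = torch.randn(4, requires_grad=True)   # hypothetical parameters theta
    dist = torch.distributions.Categorical(logits=logits)
    samples = dist.sample((128,))                 # non-differentiable sampling step
    f = (samples.float() - 1.5) ** 2              # hypothetical objective f(x)

    # Surrogate whose gradient w.r.t. `logits` is the REINFORCE estimate
    surrogate = (f.detach() * dist.log_prob(samples)).mean()
    grad = torch.autograd.grad(surrogate, logits)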