betty.hypergradient¶
Unlike traditional automatic differentiation techniques, which calculate an analytic Jacobian for each operation, multilevel optimization requires approximating the best-response Jacobian of each level's optimization problem. The approximation techniques supported by Betty are listed below; each is selected through the problem's Config.
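As a minimal sketch of how one might select among them (the type keyword is an assumption about Config's API, not guaranteed; the alpha/iterations fields are the ones documented in the entries below):

    from betty.configs import Config

    # Hypothetical selection of a hypergradient method per problem. The `type`
    # keyword is an assumption about Config's API; the alpha/iterations fields
    # are those documented in the entries below.
    darts_config = Config(type="darts")
    neumann_config = Config(type="neumann", neumann_alpha=1e-3, neumann_iterations=3)
    cg_config = Config(type="cg", cg_alpha=1e-3, cg_iterations=5)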
finite difference¶
- betty.hypergradient.darts.darts(vector, curr, prev, sync)[source]¶
Approximate the matrix-vector multiplication with the best-response Jacobian by the finite difference method. More specifically, we modify the finite difference method proposed in DARTS: Differentiable Architecture Search by re-interpreting it from the implicit differentiation perspective. Empirically, this method achieves better memory efficiency, shorter training wall time, and higher test accuracy than the other methods. A sketch of the underlying central-difference estimate is given after this entry.
- Parameters:
  - vector (Sequence of Tensor) – vector to be multiplied with the best-response Jacobian
  - curr (Problem) – the current-level problem
  - prev (Problem) – the parent problem of the current problem
  - sync (bool) – whether to synchronize the hypergradient computation across distributed processes
- Returns:
(Intermediate) gradient
- Return type:
Sequence of Tensor
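Below is a self-contained sketch of the central-difference estimate behind this method, not Betty's internal implementation; w, lam, and loss_fn are hypothetical stand-ins for the lower-level parameters, the upper-level parameters, and the lower-level loss.

    import torch

    def darts_mvp(vector, w, lam, loss_fn, alpha=0.01, r=0.01):
        # Scale the perturbation by the norm of `vector`, as in DARTS
        eps = r / (torch.cat([v.reshape(-1) for v in vector]).norm() + 1e-12)

        with torch.no_grad():  # w+ = w + eps * vector
            for p, v in zip(w, vector):
                p.add_(eps * v)
        grad_p = torch.autograd.grad(loss_fn(w, lam), lam)

        with torch.no_grad():  # w- = w - eps * vector
            for p, v in zip(w, vector):
                p.sub_(2 * eps * v)
        grad_m = torch.autograd.grad(loss_fn(w, lam), lam)

        with torch.no_grad():  # restore the original w
            for p, v in zip(w, vector):
                p.add_(eps * v)

        # Central difference for -alpha * vector^T (d^2 L / dw dlam)
        return [-alpha * (gp - gm) / (2 * eps) for gp, gm in zip(grad_p, grad_m)]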
neumann series¶
- betty.hypergradient.neumann.neumann(vector, curr, prev, sync)[source]¶
Approximate the matrix-vector multiplication with the best-response Jacobian by the Neumann series, as proposed in Optimizing Millions of Hyperparameters by Implicit Differentiation, based on the implicit function theorem (IFT). Users may specify the learning rate (neumann_alpha) and the number of unrolling steps (neumann_iterations) in Config. A sketch of the series computation is given after this entry.
- Parameters:
  - vector (Sequence of Tensor) – vector to be multiplied with the best-response Jacobian
  - curr (Problem) – the current-level problem
  - prev (Problem) – the parent problem of the current problem
  - sync (bool) – whether to synchronize the hypergradient computation across distributed processes
- Returns:
(Intermediate) gradient
- Return type:
Sequence of Tensor
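A minimal sketch of the Neumann-series approximation of the inverse Hessian-vector product, the core step of the IFT-based hypergradient; w and loss are hypothetical stand-ins for the lower-level parameters and loss, and this is not Betty's internal implementation.

    import torch

    def neumann_inverse_hvp(vector, w, loss, alpha=1e-3, iterations=3):
        # H^{-1} v ~= alpha * sum_{i=0}^{K} (I - alpha * H)^i v, where H is
        # the Hessian of `loss` w.r.t. the lower-level parameters `w`
        grads = torch.autograd.grad(loss, w, create_graph=True)
        v = [x.clone() for x in vector]    # current term (I - alpha*H)^i v
        acc = [x.clone() for x in vector]  # partial sum, seeded with the i=0 term
        for _ in range(iterations):
            hvp = torch.autograd.grad(grads, w, grad_outputs=v, retain_graph=True)
            v = [x - alpha * h for x, h in zip(v, hvp)]
            acc = [a + x for a, x in zip(acc, v)]
        return [alpha * a for a in acc]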
conjugate gradient¶
- betty.hypergradient.cg.cg(vector, curr, prev, sync)[source]¶
Approximate the matrix-vector multiplication with the best-response Jacobian by the conjugate gradient method based on the implicit function theorem (IFT). Users may specify the learning rate (cg_alpha) and the number of conjugate gradient iterations (cg_iterations) in Config. A sketch of the underlying linear solve is given after this entry.
- Parameters:
  - vector (Sequence of Tensor) – vector to be multiplied with the best-response Jacobian
  - curr (Problem) – the current-level problem
  - prev (Problem) – the parent problem of the current problem
  - sync (bool) – whether to synchronize the hypergradient computation across distributed processes
- Returns:
(Intermediate) gradient
- Return type:
Sequence of Tensor
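A minimal sketch of the conjugate-gradient solve H x = v that underlies this approximation, using double backward for Hessian-vector products rather than materializing H; w and loss are hypothetical stand-ins, not Betty's API.

    import torch

    def cg_inverse_hvp(vector, w, loss, iterations=5, tol=1e-10):
        grads = torch.autograd.grad(loss, w, create_graph=True)

        def hvp(x):  # Hessian-vector product via double backward
            return torch.autograd.grad(grads, w, grad_outputs=x, retain_graph=True)

        x = [torch.zeros_like(v) for v in vector]
        r = [v.clone() for v in vector]   # residual: r = v - H x, with x = 0
        p = [ri.clone() for ri in r]      # search direction
        rs = sum((ri * ri).sum() for ri in r)
        for _ in range(iterations):
            Hp = hvp(p)
            a = rs / (sum((pi * hi).sum() for pi, hi in zip(p, Hp)) + 1e-12)
            x = [xi + a * pi for xi, pi in zip(x, p)]
            r = [ri - a * hi for ri, hi in zip(r, Hp)]
            rs_new = sum((ri * ri).sum() for ri in r)
            if rs_new < tol:
                break
            p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
            rs = rs_new
        return x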
reinforce¶
- betty.hypergradient.reinforce.reinforce(vector, curr, prev)[source]¶
Approximate the matrix-vector multiplication with the best-response Jacobian by the REINFORCE method. The REINFORCE algorithm allows users to differentiate through optimization pipelines that contain non-differentiable operations such as sampling. This method is not yet fully implemented. A sketch of the score-function estimator it builds on is given after this entry.
- Parameters:
  - vector (Sequence of Tensor) – vector to be multiplied with the best-response Jacobian
  - curr (Problem) – the current-level problem
  - prev (Problem) – the parent problem of the current problem
- Returns:
(Intermediate) gradient
- Return type:
Sequence of Tensor
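For reference, a minimal sketch of the score-function (REINFORCE) estimator itself, d/dtheta E[f(x)] = E[f(x) * d/dtheta log p_theta(x)], which avoids differentiating through the sampling step; all names here are illustrative, and this is not Betty's (unfinished) implementation.

    import torch

    logits = torch.randn(4, requires_grad=True)   # hypothetical parameters theta
    dist = torch.distributions.Categorical(logits=logits)
    samples = dist.sample((128,))                 # non-differentiable sampling step
    f = (samples.float() - 1.5) ** 2              # hypothetical objective f(x)

    # Surrogate whose gradient w.r.t. `logits` is the REINFORCE estimate
    surrogate = (f.detach() * dist.log_prob(samples)).mean()
    grad = torch.autograd.grad(surrogate, logits)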