Gradient Descender

The GradientDescender class represents a gradient descent algorithm.

Use in initialising a NeuralNet

When initialising a NeuralNet, the gradient descender is passed to NeuralNetFactory.
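
For illustration, initialisation might look roughly like the sketch below. The factory method name CreateNeuralNet and the layerConfigs argument are hypothetical placeholders; the real method and its arguments are described on the NeuralNetFactory page.

// Hypothetical sketch only: see the NeuralNetFactory page for the actual method and its arguments.
GradientDescender gradientDescender = new AdamGradientDescender();
NeuralNet neuralNet = NeuralNetFactory.CreateNeuralNet(layerConfigs, gradientDescender);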

Provided Gradient Descenders

Two gradient descenders are provided by default: Adam gradient descent and stochastic gradient descent.

Adam Gradient Descender

Adam gradient descent is the go-to gradient descent algorithm for most projects. It is a more complex algorithm that, in effect, maintains many different learning rates, each of which speeds up or slows down independently during learning.
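
For reference, the sketch below shows roughly what Adam computes for a single weight entry, following the original paper. This is our own scalar illustration, not the library's code: the method name AdamStep and the epsilon constant (the usual small value added for numerical stability) are assumptions made for the example, and it assumes using System; for Math.

// Illustrative scalar sketch of one Adam update (not library code).
// m and v are running estimates kept between calls; t counts the steps taken so far.
double AdamStep(double gradient, ref double m, ref double v, int t,
    double learningRate, double momentumDecay, double varianceDecay, double epsilon = 1e-8)
{
    m = momentumDecay * m + (1 - momentumDecay) * gradient;            // running momentum
    v = varianceDecay * v + (1 - varianceDecay) * gradient * gradient; // running variance
    double mHat = m / (1 - Math.Pow(momentumDecay, t));                // bias corrections
    double vHat = v / (1 - Math.Pow(varianceDecay, t));
    return -learningRate * mHat / (Math.Sqrt(vHat) + epsilon);         // step added to the weight
}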

To create an AdamGradientDescender with default settings, simply run

AdamGradientDescender gradientDescender = new();

This uses the settings recommended by the creators of the algorithm, which work well for most projects. If you would like to set your own values, however, specify any of the three hyper-parameters like so:

AdamGradientDescender gradientDescender = new(learningRate: ..., momentumDecay: ..., varianceDecay: ...);

With respect to the original article:

- learningRate corresponds to α
- momentumDecay corresponds to β1
- varianceDecay corresponds to β2
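
For illustration, the call below passes all three explicitly, using the values recommended in the original paper (α = 0.001, β1 = 0.9, β2 = 0.999); substitute whatever values suit your project.

AdamGradientDescender gradientDescender = new(learningRate: 0.001, momentumDecay: 0.9, varianceDecay: 0.999);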

Stochastic Gradient Descender

Stochastic gradient descent (also known as vanilla mini-batch gradient descent) is the simplest gradient descent algorithm. Adam gradient descent is usually faster, but stochastic gradient descent is a safe fall-back if you suspect Adam gradient descent is not suited to your project.

For example, if you think a learning rate of 0.001 is best for your project, you can create a StochasticGradientDescender like so:

StochasticGradientDescender gradientDescender = new(learningRate: 0.001);

Stochastic gradient descent is far simpler than Adam gradient descent. Instead of dynamically speeding up or slowing down learning rates, stochastic gradient descent sticks with one single learning rate throughout the learning process. This means that picking the right learning rate is crucial, and the best choice depends heavily on your project.
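
In terms of the GradientDescentStep method described later on this page, stochastic gradient descent essentially amounts to the following sketch (our own illustration, not the library's actual source; the field name _learningRate is assumed for the example):

public override Parameter GradientDescentStep(Parameter gradient)
{
    // The same fixed learning rate is applied to every entry, on every step
    return -_learningRate * gradient;
}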

Making your own Gradient Descender (Technical)

If you have a gradient descent algorithm in mind, you can make your own gradient descender class.

To work with NeuralNet, and in particular with serialization, your class will have to:

- inherit from the abstract class GradientDescender
- implement all the hyper-parameters you need as fields
- annotate all the hyper-parameters you need with SerializableHyperParameter
- have a default constructor
- implement Parameter GradientDescentStep(Parameter gradient)

You will need to have read Parameter and SerializableHyperParameter before continuing.

Let's implement momentum gradient descent as an example:
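
As a reminder, momentum gradient descent computes each step from the current gradient and the previous step: with learning rate η and momentum rate γ, the step taken at time t is

step(t) = -η · gradient(t) + γ · step(t-1)

This is exactly what the GradientDescentStep implementation at the end of this example computes.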

First, we import the required namespaces for Parameter, GradientDescender and SerializableHyperParameter respectively, and declare our class as inheriting from the abstract GradientDescender class.

using NeuralNetLearning.Maths;
using NeuralNetLearning.Maths.GradientDescent;
using NeuralNetLearning.Serialization;

public class MomentumGradientDescender : GradientDescender
{

Next, we mark all the hyper-parameters that we want to be able to save and load with SerializableHyperParameter. These hyper-parameters should be enough to fully reconstruct our gradient descender.

In this example, _learningRate corresponds to η in the momentum gradient descent description, and _momentumRate corresponds to γ.

    [SerializableHyperParameter("learning rate")]
    private readonly double _learningRate;

    [SerializableHyperParameter("momentum rate")]
    private readonly double _momentumRate;

    [SerializableHyperParameter("past step")]
    private Parameter _pastStep = null;

We then create our constructor for these fields. A default constructor is also required for the automatic serialization.


    public MomentumGradientDescender(double learningRate, double momentumRate)
    {
        _learningRate = learningRate;
        _momentumRate = momentumRate;
    }

    private MomentumGradientDescender()
        : this(learningRate: 0.01, momentumRate: 0.9)
    { }

Finally, we implement the momentum gradient descent algorithm in Parameter GradientDescentStep(Parameter gradient). The method receives the cost gradient of the weights and biases as a Parameter, which holds the cost gradient of each corresponding weight/bias entry.

GradientDescentStep(...) returns the step that should be taken in parameter space to reduce the cost. In other words, the NeuralNet will add the return value to the Parameter holding its weights and biases.


    public override Parameter GradientDescentStep(Parameter gradient)
    {
        // On the first call there is no previous step yet, so start from zero
        if (_pastStep == null)
            _pastStep = ParameterFactory.Zero(gradient.LayerSizes);

        Parameter step = -_learningRate * gradient + _momentumRate * _pastStep;
        _pastStep = step;
        return step; // in the NeuralNet class: parameter += step;
    }
}
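
Our MomentumGradientDescender can now be constructed and passed to NeuralNetFactory just like the provided gradient descenders, and the SerializableHyperParameter annotations together with the default constructor let NeuralNet save and load it automatically. For example:

MomentumGradientDescender gradientDescender = new(learningRate: 0.01, momentumRate: 0.9);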