Parameter (Technical)

The Parameter object is used internally by NeuralNet. A Parameter encapsulates the weight matrices and bias vectors of each layer of the network.
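As a rough mental model, a Parameter can be pictured as bundling one weight matrix and one bias vector per layer. The sketch below is purely illustrative; the real Parameter does not expose these members, and the Matrix and Vector types are stand-ins.

public class Parameter
{
    // Illustrative sketch only: in practice a Parameter's contents are
    // accessed through its mathematical operations, not through fields.
    private Matrix[] _layerWeights; // one weight matrix per layer
    private Vector[] _layerBiases;  // one bias vector per layer
    ...
}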

Mathematical Operations

By supporting basic mathematical operations, such as addition, scalar multiplication and component-wise operations, Parameter can be treated like a single vector in calculations. This makes it easy to implement custom gradient descent algorithms.
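For example, expressions like the following read exactly as they would for plain vectors. (The variable names here are illustrative; the operators shown are the addition and scalar multiplication described above.)

Parameter a = ... ;
Parameter b = ... ;
Parameter sum = a + b;                 // component-wise addition
Parameter scaled = 2.0 * a;            // scalar multiplication
Parameter combination = a + 0.5 * b;   // arbitrary linear combinations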

Example: Gradient Descent in NeuralNet

Here is an overview of how Parameter is used in NeuralNet to minimise cost through gradient descent.

The Parameter object param stores the weights and biases used in the NeuralNet. These are used to compute the output of each layer.

In each gradient descent step, gradient stores the cost gradients of the corresponding entries in param. More explicitly, the ith entry of gradient is the partial derivative of the cost with respect to the ith weight/bias entry of param.

step is then added to param to adjust the weights/biases and minimise cost.
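In symbols, with C the cost and i indexing the entries of param, the two relationships above are:

\[
\texttt{gradient}_i = \frac{\partial C}{\partial \, \texttt{param}_i},
\qquad
\texttt{param} \leftarrow \texttt{param} + \texttt{step}.
\]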

The gradient descent algorithm itself is contained in gradientDescender. For instance, gradientDescender.GradientDescentStep(...) could use stochastic gradient descent, Adam, or any other gradient descent algorithm. To find out more, read Gradient Descender.

public class NeuralNet
{
    private Parameter param = ... ; // holds the weights and biases
    private GradientDescender gradientDescender = ... ;
    ...
    public void Fit(...)
    {
        Parameter gradient = ... ; // holds the cost gradients of the corresponding entries in `param`
        Parameter step = gradientDescender.GradientDescentStep(gradient); // compute the update step from the gradients
        param += step; // adjust the weights and biases to reduce cost
    }
}

The way Parameter objects are used here abstracts away the weight matrices and bias vectors they contain: the syntax reads like operations on single vectors, which makes the algorithm clearer.
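For context, GradientDescender can be thought of as declaring just the step method that NeuralNet calls above. The following is a minimal sketch of that shape, not the library's actual definition; see Gradient Descender for the real one.

public abstract class GradientDescender
{
    // Given the cost gradients of the entries of the net's Parameter,
    // return the step to add to that Parameter.
    public abstract Parameter GradientDescentStep(Parameter gradient);
}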

Example: Stochastic Gradient Descent

Here is an overview of a possible definition of GradientDescender.GradientDescentStep(...). This example uses the stochastic gradient descent algorithm, in which the step is simply the cost gradient multiplied by the negative of a scalar learning rate.
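In symbols, with learning rate \(\eta\) (written _learningRate in the code below):

\[
\texttt{step} = -\eta \cdot \texttt{gradient}.
\]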

public class StochasticGradientDescender : GradientDescender
{
    ...
    private double _learningRate = 0.001;
    ...
    public override Parameter GradientDescentStep(Parameter gradient)
    {
        Parameter step = -_learningRate * gradient; // scale the gradient by the negative learning rate
        return step;
    }
}

(Some implementation details have been omitted: to see these, read Gradient Descender.) Again, using Parameter objects abstracts away the weight matrices and bias vectors that they contain.
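To illustrate how easily other algorithms slot in, here is a hypothetical momentum-based descender. It is not part of the library; it is a sketch built only from the addition and scalar multiplication operations shown above.

public class MomentumGradientDescender : GradientDescender
{
    private double _learningRate = 0.001;
    private double _momentumDecay = 0.9;
    private Parameter _velocity; // decaying running sum of past scaled gradients

    public override Parameter GradientDescentStep(Parameter gradient)
    {
        Parameter scaledGradient = -_learningRate * gradient;
        // On the first step there is no velocity yet, so fall back to plain SGD.
        _velocity = _velocity == null
            ? scaledGradient
            : _momentumDecay * _velocity + scaledGradient;
        return _velocity;
    }
}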

See the Parameter API for a full list of mathematical operations, and Gradient Descender for further examples of this usage.