Parameter (Technical)
The Parameter object is used internally by NeuralNet. Parameter encapsulates the weight matrices and the bias vectors that correspond to each layer.
Mathematical Operations
By supporting basic mathematical operations, such as addition, scalar multiplication, and component-wise operations, Parameter can be treated like a single vector in calculations. This makes it easy to implement custom gradient descent algorithms.
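The operations above can be sketched with a toy Parameter backed by a flat array. This illustrative class (its constructor, indexer, and operator overloads) is an assumption made for the sketch, not the library's actual implementation; it only shows how the vector-like syntax behaves.

```csharp
using System;
using System.Linq;

// Toy Parameter: stores its entries in one flat array. The real class
// wraps weight matrices and bias vectors, but the operators read the same.
public class Parameter
{
    private readonly double[] _entries;
    public Parameter(params double[] entries) { _entries = entries; }
    public double this[int i] => _entries[i];

    // Component-wise addition of two Parameters.
    public static Parameter operator +(Parameter a, Parameter b) =>
        new Parameter(a._entries.Zip(b._entries, (x, y) => x + y).ToArray());

    // Scalar multiplication.
    public static Parameter operator *(double scalar, Parameter p) =>
        new Parameter(p._entries.Select(x => scalar * x).ToArray());
}

public static class Demo
{
    public static void Main()
    {
        var param = new Parameter(1.0, 2.0);
        var gradient = new Parameter(10.0, -10.0);

        // One gradient descent step, written as if param were a single vector.
        param += -0.1 * gradient;

        Console.WriteLine(param[0]); // 0  (1.0 - 0.1 * 10.0)
        Console.WriteLine(param[1]); // 3  (2.0 + 0.1 * 10.0)
    }
}
```

Because the arithmetic lives in operator overloads, a descent step stays a one-liner regardless of how many layers the entries come from.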
Example: Gradient Descent in NeuralNet
Here is an overview of how Parameter is used in NeuralNet to minimise cost through gradient descent.
The parameter object param stores the weights and biases used in the NeuralNet. These are used to calculate the layer outputs.
In each gradient descent step, gradient stores the cost gradients of the corresponding entries in param. More explicitly, the i-th entry in gradient is the cost gradient of the i-th weight/bias entry in param.
step is then added to param to adjust the weights/biases and minimise cost.
The gradient descent algorithm is contained in gradientDescender. For instance, gradientDescender.GradientDescentStep(...) could use stochastic gradient descent, Adam, or any other gradient descent algorithm. To find out more, read Gradient Descender.
public class NeuralNet
{
    private Parameter param = ... ; // holds the weights and biases
    private GradientDescender gradientDescender = ... ;

    ...

    public void Fit(...)
    {
        Parameter gradient = ... ; // holds the cost gradients of the corresponding entries in `param`
        Parameter step = gradientDescender.GradientDescentStep(gradient);
        param += step;
    }
}
The way Parameters are used here abstracts away the weight matrices and bias vectors they contain: the syntax reads like operations on single vectors, making the algorithm clearer.
Example: Stochastic Gradient Descent
Here is an overview of a possible definition of GradientDescender.GradientDescentStep(...). This example uses the stochastic gradient descent algorithm: the cost gradient is simply scaled by the (negated) learning rate.
public class StochasticGradientDescender : GradientDescender
{
    ...

    private double _learningRate = 0.001;

    ...

    public Parameter GradientDescentStep(Parameter gradient)
    {
        Parameter step = -_learningRate * gradient;
        return step;
    }
}
(Some implementation details have been omitted: to see these, read Gradient Descender.)
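To illustrate how another algorithm could slot into the same interface, here is a hypothetical momentum-based descender. MomentumGradientDescender, _momentum, _velocity, and the minimal Parameter and GradientDescender stand-ins below are illustrative assumptions for this sketch, not part of the library.

```csharp
using System;
using System.Linq;

// Minimal stand-ins so the sketch compiles on its own; the library's
// real Parameter and GradientDescender are richer than this.
public class Parameter
{
    private readonly double[] _entries;
    public Parameter(params double[] entries) { _entries = entries; }
    public double this[int i] => _entries[i];
    public static Parameter operator +(Parameter a, Parameter b) =>
        new Parameter(a._entries.Zip(b._entries, (x, y) => x + y).ToArray());
    public static Parameter operator *(double s, Parameter p) =>
        new Parameter(p._entries.Select(x => s * x).ToArray());
}

public abstract class GradientDescender
{
    public abstract Parameter GradientDescentStep(Parameter gradient);
}

// Hypothetical momentum variant: the step is a decaying running average
// of past gradients rather than the current gradient alone.
public class MomentumGradientDescender : GradientDescender
{
    private readonly double _learningRate = 0.1;
    private readonly double _momentum = 0.9;

    // For simplicity this sketch assumes parameters with two entries.
    private Parameter _velocity = new Parameter(0.0, 0.0);

    public override Parameter GradientDescentStep(Parameter gradient)
    {
        // velocity <- momentum * velocity - learningRate * gradient
        _velocity = _momentum * _velocity + (-_learningRate) * gradient;
        return _velocity;
    }
}

public static class Demo
{
    public static void Main()
    {
        var descender = new MomentumGradientDescender();
        var step1 = descender.GradientDescentStep(new Parameter(1.0, -2.0));
        var step2 = descender.GradientDescentStep(new Parameter(1.0, -2.0));
        // step1[0] is -0.1; step2[0] is about -0.19, since the previous
        // velocity carries over through the momentum term.
        Console.WriteLine(step1[0]);
        Console.WriteLine(step2[0]);
    }
}
```

Note that the momentum update is still written entirely with Parameter addition and scalar multiplication, so no knowledge of the underlying matrices and vectors is needed.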
Again, the usage of Parameter objects abstracts away the weight matrices and bias vectors that they contain.
See the Parameter API for a list of mathematical operations, and Gradient Descender for examples of this usage.