ABSTRACT

Let's recall the network we built a few chapters ago. Its purpose was regression, but the function it computed was not linear: an activation function (ReLU, for “rectified linear unit”) introduced a nonlinearity between the single hidden layer and the output layer. The “layers” in this original implementation were just tensors: weights and biases. You won't be surprised to hear that these will be replaced by modules.
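
To make the starting point concrete, here is a minimal sketch of such a tensor-only network, assuming a PyTorch-style API; the dimensions, learning rate, and toy data are illustrative, not the values used in the earlier chapter.

```python
import torch

# Illustrative sizes and toy regression data (assumptions, not the book's).
d_in, d_hidden, d_out = 3, 32, 1
x = torch.randn(100, d_in)
y = torch.randn(100, d_out)

# The "layers" are just tensors: weights and biases with gradient tracking.
w1 = torch.randn(d_in, d_hidden, requires_grad=True)
b1 = torch.zeros(d_hidden, requires_grad=True)
w2 = torch.randn(d_hidden, d_out, requires_grad=True)
b2 = torch.zeros(d_out, requires_grad=True)

learning_rate = 1e-4
for epoch in range(200):
    # Forward pass: ReLU sits between the hidden layer and the output layer.
    hidden = torch.relu(x @ w1 + b1)
    y_pred = hidden @ w2 + b2
    loss = ((y_pred - y) ** 2).mean()  # mean squared error

    # Backward pass and manual parameter updates.
    loss.backward()
    with torch.no_grad():
        for param in (w1, b1, w2, b2):
            param -= learning_rate * param.grad
            param.grad.zero_()
```

Everything here, from parameter creation to the update step, is spelled out by hand; modules will bundle the parameters and the forward computation into reusable objects.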