ABSTRACT

The neuroscience, and cognitive psychology, involved are definitely out of scope for this book; but on a high level, we can assume that there are at least two prerequisites: First, that our visual system be able to build up complex representations out of lower-level ones, and second, that we have a set of concepts available we can map those high-level representations. On the other hand, assume the image were showing a house built of bricks. Maybe there'd be a different arrangement of edges, triangle-shaped, the roof. Neural networks are all about learning feature detectors, not having them programmed up-front. Naturally, then, learning a filter means having a layer type whose weights embody this logic. Pooling layers compute aggregates over neighboring pixels. The number of pixels of aggregate over in every dimension is specified in the layer constructor's first argument.