ABSTRACT

Artificial intelligence, machine learning, and deep learning are large fields of study. We briefly show how they relate to one another and define key concepts that serve as background for the rest of the book. Machine learning is one approach to AI, and deep learning is a type of machine learning. Deep learning models are trained to perform a given task using an algorithm called stochastic gradient descent (or a variant thereof). These models come in different shapes, usually referred to as neural network architectures, and we describe the most common ones. Convolutional neural networks have been widely used for image tasks, and recurrent neural networks saw strong adoption for text processing. The transformer architecture, however, has rapidly gained ground in both areas and is currently the dominant architecture, especially in large language models. Neural networks are prone to overfitting: they tend to memorize the noise in the data rather than just the signal. This can be mitigated with techniques such as dropout and other forms of regularization. Ancillary topics are also discussed.