ABSTRACT

A key factor in the performance of a machine learning model is the availability of a sufficiently large training set. Increasing the amount of data available for training allows more complex models to be used without fear of overfitting. In practice, it often turns out that the data required to train a model are distributed among several agents. No single agent has enough data to train the model to an adequate level of performance, yet there is no way to pool the data, since access is restricted (or even impossible) for legal, proprietary, or ethical reasons. Federated learning paradigms are designed to address this type of problem. This chapter discusses the basics of federated learning and its application to categorization.