ABSTRACT

This chapter discusses data.table and pandas. SQL is a query language designed for managing data in a relational database management system (RDBMS). Popular RDBMSs include MS SQL Server, MySQL, and PostgreSQL. SQL is a powerful tool for data analysis, but it operates inside the RDBMS, and R or Python functions generally cannot be applied to database tables directly. A data.frame is similar to a database table, except that it can be manipulated within the corresponding language. A data.frame is usually stored in memory, although it can also be serialized for storage on disk. When a database table is large, a database index is used to improve the performance of data retrieval operations. In an RDBMS, a join combines columns from one or more tables; the same operation is available in data.table and pandas. The chapter covers three types of joins: inner join, left join, and right join.
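As a minimal sketch of the three join types named above, the following pandas example may help. The tables `employees` and `departments` and their columns are hypothetical and were made up for illustration; they are not taken from the chapter. The sketch uses `DataFrame.merge` with the `how` argument set to `"inner"`, `"left"`, and `"right"`.

```python
import pandas as pd

# Hypothetical example tables (assumed for illustration only).
employees = pd.DataFrame({"emp_id": [1, 2, 3], "dept_id": [10, 20, 30]})
departments = pd.DataFrame(
    {"dept_id": [10, 20, 40], "dept_name": ["Sales", "IT", "HR"]}
)

# Inner join: keep only rows whose dept_id appears in both tables.
inner = employees.merge(departments, on="dept_id", how="inner")

# Left join: keep every row of employees; unmatched dept_name becomes NaN.
left = employees.merge(departments, on="dept_id", how="left")

# Right join: keep every row of departments; unmatched emp_id becomes NaN.
right = employees.merge(departments, on="dept_id", how="right")

print(inner)
print(left)
print(right)
```

With these tables, the inner join keeps the two matching departments (10 and 20), the left join keeps all three employees, and the right join keeps all three departments.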