ABSTRACT

In 2001, Enron Corporation filed for bankruptcy. During the related legal investigation into accounting fraud and corruption, the Federal Energy Regulatory Commission made public a large set of email messages concerning the corporation. This data set, known as the Enron corpus, contains over 600,000 messages belonging to 158 users, mostly senior management of Enron. After duplicates are removed, about 200,000 messages remain. The data set is valuable to researchers interested in how email is used within an organization and in better understanding organizational structure. If we represent each user as a node and create an edge between two nodes whenever there is sufficient email correspondence between the two corresponding individuals, we arrive at a data graph, or social network.
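
The graph construction described above can be illustrated with a minimal Python sketch. The text does not say what counts as "sufficient" correspondence, so the sketch assumes a simple rule: two users are connected when the total number of messages exchanged between them, in either direction, reaches a threshold. The function name build_email_graph, the parameter min_messages, and the sample messages are all hypothetical and are not taken from the Enron corpus.

```python
from collections import Counter


def build_email_graph(messages, min_messages=5):
    """Build an undirected graph from (sender, recipient) pairs.

    Assumption: an edge exists when the two users exchanged at least
    `min_messages` messages in total, counting both directions.
    Returns a set of nodes and a set of edges (each edge a frozenset).
    """
    pair_counts = Counter()
    nodes = set()
    for sender, recipient in messages:
        # Every user who appears in a message becomes a node.
        nodes.update((sender, recipient))
        # Ignore self-addressed mail when counting correspondence.
        if sender != recipient:
            pair_counts[frozenset((sender, recipient))] += 1

    # Keep only the pairs whose message count reaches the threshold.
    edges = {pair for pair, count in pair_counts.items() if count >= min_messages}
    return nodes, edges


# Made-up sample data: alice and bob exchange five messages in total,
# alice writes to carol once.
messages = [("alice", "bob")] * 3 + [("bob", "alice")] * 2 + [("alice", "carol")]
nodes, edges = build_email_graph(messages, min_messages=5)
```

With this sample data and threshold, alice and bob are joined by an edge while carol remains an isolated node. Other definitions of sufficiency, such as requiring messages in both directions, would yield a different graph from the same messages.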