ABSTRACT

Categorical variables have a limited number of discrete values, such as gender, political party, country, or distinct clusters generated by machine learning. Multiple categories can be visualized with Venn and Euler diagrams, mosaic plots, and graphs, but each has tradeoffs. Instead, text can represent sets, with a different format indicating each category. This text can then represent elements as stacks in Venn diagrams, words in a graph, areas of names in a mosaic plot, lists of names in a bar chart, and so on. Layouts may need to be adjusted to separate text items efficiently and to maintain areas that support summary-level perception. The approach can scale to represent thousands of elements, 10 to 15 unique categories, and categories with up to 10 classes. While many simultaneous formats may be difficult to decode, a difference in formats may still be noticeable, thus indicating a difference in category values. Examples include sets of named entities, such as regions, politicians, passengers, or Pokémon; emotion words; and intrusion alarms.