ABSTRACT

The increased use of data in the legal system has been much anticipated, with both positive and negative effects expected. Most of the applications and systems expected to transform the legal system are driven by data in one form or another, and creating custom datasets to use as inputs is frequently prohibitively expensive. Beyond the availability of legal documents, there are also often limits on the information available about the analysis that is done, especially analysis carried out by companies. Human languages are syntactically ambiguous; this ambiguity serves no legal purpose, and it may be advantageous to find ways to remove it. Sampling is the process of collecting and selecting which data points will be included in an analysis. Case law is one of the most commonly analyzed sources of data in the legal space. To date, legislation has been a less popular source of data for analysis than case law.