ABSTRACT

Python Script

    #!/usr/local/bin/python
    import re

    sickle_match = re.compile('D57')
    lst = ("mort96us.dat", "mort99us.dat", "mort02us.dat", "mort04us.dat")
    for file in lst:
        intext = open(file, "r")
        popcount = 0
        counter = 0
        codesection = ""
        for line in intext:
            # The condition-code section sits at a different offset in each file
            if file == lst[0]:
                codesection = line[448:588]
            if file == lst[1]:
                codesection = line[161:301]
            if file == lst[2]:
                codesection = line[162:302]
            if file == lst[3]:
                codesection = line[164:304]
            popcount = popcount + 1
            p = sickle_match.search(codesection)
            if p:
                counter = counter + 1
        intext.close()
        # Rate per 100,000 records, truncated to five characters for display
        rate = float(counter) / float(popcount) * 100000
        rate = str(rate)
        rate = rate[0:5]
        print('\n\nRecords listing sickle cell is ')
        print(str(counter) + ' in ' + file + ' file')
        print('\nSickle cell rate per 100,000 records is ')
        print(str(rate) + ' in ' + file + ' file')

Ruby Script

    #!/usr/local/bin/ruby
    filearray = Array.new
    filearray = "mort96us.dat mort99us.dat mort02us.dat mort04us.dat".split
    filearray.each do |file|
      text = File.open(file, "r")
      counter = 0
      popcount = 0
      text.each_line do |line|
        codesection = line[448,140] if (file == filearray.fetch(0))
        codesection = line[161,140] if (file == filearray.fetch(1))
        codesection = line[162,140] if (file == filearray.fetch(2))
        codesection = line[164,140] if (file == filearray.fetch(3))
        popcount = popcount + 1
        counter = (counter + 1) if (codesection =~ /D57/i)
      end
      text.close
      rate = ((counter.to_f / popcount.to_f) * 100000).to_s[0,5]
      puts "\nRecords listing sickle cell is #{counter} in #{file} file"
      puts "Sickle cell rate per 100,000 records is #{rate} in #{file} file"
    end
    exit
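The per-file logic shared by the two scripts can be sketched as a pair of small functions that work on any iterable of lines, which makes the counting step easy to check without the full data files. This is only an illustrative sketch, not the authors' code: the names `CODE_SECTION`, `count_sickle`, `rate_per_100k`, and `fake_record` are hypothetical, and the column offsets are simply copied from the scripts above.

```python
import re

# Slice bounds of the condition-code section in each file,
# copied from the scripts above (assumed correct as given there).
CODE_SECTION = {
    "mort96us.dat": (448, 588),
    "mort99us.dat": (161, 301),
    "mort02us.dat": (162, 302),
    "mort04us.dat": (164, 304),
}

SICKLE = re.compile("D57")


def count_sickle(lines, start, end):
    """Return (matching records, total records) for an iterable of lines."""
    total = 0
    matches = 0
    for line in lines:
        total += 1
        # Search only inside the code section, as the scripts do.
        if SICKLE.search(line[start:end]):
            matches += 1
    return matches, total


def rate_per_100k(matches, total):
    """Rate per 100,000 records; returns 0.0 when there are no records."""
    if total == 0:
        return 0.0
    return 100000.0 * matches / total


def fake_record(code, position, width=600):
    """Hypothetical helper: a blank fixed-width line with `code` at `position`."""
    return (" " * position + code).ljust(width)


# Synthetic demonstration on three made-up records (not CDC data).
start, end = CODE_SECTION["mort99us.dat"]
sample = [
    fake_record("D571", start),  # code inside the section: counted
    fake_record("I219", start),  # different code: not counted
    fake_record("D57", 0),       # pattern outside the section: not counted
]
matches, total = count_sickle(sample, start, end)
print(matches, total, rate_per_100k(matches, total))
```

Keeping the offsets in one table means a new year's file needs only one new entry, and the counting function can be exercised on a handful of synthetic lines before it is run against the full files.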

The script parses through about 5 GB of CDC records and compiles the following results:

In 1996, U.S. cases with sickle cell disease in death certificates is 708
In 1996, U.S. rate of sickle cell disease in death certificates is 30.54 per 100,000
In 1999, U.S. cases with sickle cell disease in death certificates is 799
In 1999, U.S. rate of sickle cell disease in death certificates is 33.36 per 100,000
In 2002, U.S. cases with sickle cell disease in death certificates is 827
In 2002, U.S. rate of sickle cell disease in death certificates is 33.79 per 100,000
In 2004, U.S. cases with sickle cell disease in death certificates is 876
In 2004, U.S. rate of sickle cell disease in death certificates is 36.47 per 100,000

Across the four years examined, the number of death certificates listing sickle cell disease as a cause of death or a significant condition at the time of death rose steadily, from 708 in 1996 to 876 in 2004. Likewise, the overall rate (per 100,000 certificates) increased in every sampled year, from 30.54 in 1996 to 36.47 in 2004.
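The trend claim can be checked directly against the reported figures. The sketch below is illustrative only: the function names `strictly_increasing` and `percent_change` are hypothetical, and the numbers are the counts and rates reported above, not values recomputed from the CDC files.

```python
# Counts and rates per 100,000 certificates, as reported in the results above.
counts = {1996: 708, 1999: 799, 2002: 827, 2004: 876}
rates = {1996: 30.54, 1999: 33.36, 2002: 33.79, 2004: 36.47}


def strictly_increasing(series):
    """True if every sampled year's value exceeds the previous sampled year's."""
    values = [series[year] for year in sorted(series)]
    return all(a < b for a, b in zip(values, values[1:]))


def percent_change(series):
    """Relative change, in percent, from the first to the last sampled year."""
    years = sorted(series)
    first, last = series[years[0]], series[years[-1]]
    return (last - first) / first * 100.0


print("counts increasing:", strictly_increasing(counts))
print("rates increasing:", strictly_increasing(rates))
print("count change (%):", round(percent_change(counts), 1))
print("rate change (%):", round(percent_change(rates), 1))
```

Because the sampled years are unevenly spaced (1996, 1999, 2002, 2004), the check compares consecutive samples only and says nothing about the years in between.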