Tuesday, March 26, 2013

The Abuse of Math in the Courtroom

Mathematics in Forensic Science 

Posted: 03/25/2013 11:16 am 

Mathematics is fast becoming one of the most important techniques in crime
detection. Where once a Sherlock Holmes would have had to be content with a
magnifying glass, or a jury with gut instinct and rational discussion, now a
range of methods from probability and statistics is available to help.
Today, mathematics lies behind expert conclusions on a hundred forensic
matters from fingerprints to DNA.

Statistics can be a precious tool when identifying the patterns behind
confusing or misleading phenomena. The University of California at Berkeley
was sued for gender bias when it was observed that just 35 percent of female
applicants to graduate school were being accepted, versus 44 percent of
males. The investigators began by narrowing the problem down to six major
departments for which, combined, the inequality shifted to an even more
incriminating 46 percent of males versus just 30 percent of females. But
then, a department-by-department analysis showed the exact contrary of a
bias against women: in four of the six departments, they were actually
accepted at a higher rate than males, and in the other two, the acceptance
rates were 37 percent for males versus 34 percent for females, and 28 percent
versus 24 percent, discrepancies too small to have caused the overall
appearance of inequality.

This curious problem, known as Simpson's paradox, shows up in all kinds of
situations: for example, a recent analysis of national SAT scores showed an
improvement in the average scores of every single ethnic group, yet the
overall average had not budged by a single point in 20 years. Another
example was a particular treatment of kidney stones, whose success rate in a
controlled study was higher than that of all other treatments, in spite of
doctors' observations to the contrary.

These things sound like paradoxes, yet they all happen. To explain them, we
can consider a simplified version of the Berkeley example. Imagine a tiny
school with just two departments: an engineering department that receives 10
applications from women and 50 from men, and a humanities department that
receives 30 applications from women and 10 from men. In engineering, the
school accepts 90 percent of the women and 80 percent of the men, and in
humanities, 30 percent of the women and 20 percent of the men; no gender
bias is apparent (or if anything, it goes the other way). But what this
means in total is that 9+9=18 women are accepted from the 40 female
applicants, and 40+2=42 men from the 60 male applicants -- which makes the
overall success rate of males 42/60=70 percent, whereas for females it's
just 18/40=45 percent! Now it does look like sex bias, yet we saw that it is
not. What's going on is that a hidden variable is playing an important role:
one has to take not just acceptance rates, but application numbers into
account. In the SAT score case above, the hidden variable was the number of
members of each ethnic group; the size of certain lower-scoring groups had
greatly increased over the years with respect to the higher-scoring groups,
preventing the overall average from rising even when the averages of each
group did rise. In the kidney stone example, it turned out to be the size of
the kidney stones that mattered.
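
The arithmetic of the tiny-school example can be checked with a short Python sketch. The application counts and acceptance rates are the ones given above; the names `school` and `overall_rate` are illustrative only.

```python
from fractions import Fraction

# Figures from the tiny-school example in the text.
# Layout: department -> gender -> (number of applicants, acceptance rate)
school = {
    "engineering": {"women": (10, Fraction(90, 100)), "men": (50, Fraction(80, 100))},
    "humanities": {"women": (30, Fraction(30, 100)), "men": (10, Fraction(20, 100))},
}

def overall_rate(data, gender):
    """Pooled acceptance rate for one gender across all departments."""
    applicants = sum(dept[gender][0] for dept in data.values())
    accepted = sum(dept[gender][0] * dept[gender][1] for dept in data.values())
    return accepted / applicants

women = overall_rate(school, "women")
men = overall_rate(school, "men")

# Women do better inside each department, yet worse once the departments are pooled.
for name, dept in school.items():
    print(f"{name}: women {float(dept['women'][1]):.0%}, men {float(dept['men'][1]):.0%}")
print(f"overall: women {float(women):.0%}, men {float(men):.0%}")
```

The pooled rate is a weighted average of the departmental rates, with the application numbers as weights. Most women applied to the department with the low acceptance rate, and that is what drags their overall figure down.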

Another area which can involve rather subtle mathematics is DNA
identification. For detection purposes, thirteen particular pairs of genes
are identified, amongst the many thousand that make up our DNA, and these
thirteen pairs are so varied from person to person that the estimated chance
of two people (not identical twins) having the same thirteen is just one in
400 trillion, a number far greater than the population of the world. Thus, when
forensic biologists have a good quality sample to work with, they can make
an unchallenged identification. But they often have to work with crime scene
samples that are very tiny, mixed, or degraded. In these cases, an
identification can be made to a given individual only with a certain
probability, and it is essential to be able to interpret this probability
correctly.

A man was recently tried in San Francisco for a 30-year-old rape and murder,
on the grounds that a DNA match was found between a semen sample stored in
the cold-case files and an entry in a database of California sex offenders.
Furthermore, the crime sample was degraded, so that it would actually match
about one person in a million, roughly 300 people in the general population.
There was virtually no other evidence against the defendant. 

The defense held that with a chance in a million of a match in the general
population, running the sample through a database containing about one-third
of a million individuals led to a chance of 1 in 3 of finding a random match
to an innocent person. As for the prosecution, they cited the one in a
million figure, which runs the risk of being misinterpreted as the
defendant's chance of being innocent (the "prosecutor's fallacy"). The
trouble is that both conclusions are wrong. The defense argument ignores two
essential facts: firstly, that the 300 matching individuals are evenly
distributed in age and geography around the country, not concentrated in a
database of California sex offenders, and secondly the non-negligible
probability that the original murderer may actually have been in the
database for other offenses. The prosecution, when using the one in a
million figure, must specify that the DNA alone only narrows the pool
of potential murderers down to about 300 individuals. It must then use the
fact that the single database match turned out to be a man who shared
several characteristics with the original murderer, namely age, race
(according to an eyewitness statement), location, and being a sex offender
(whether registered or not), to narrow the field. Using these factors, the
probability of the defendant's innocence can be assessed as being less than
about one in seventy.
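
The raw numbers behind the two courtroom arguments can be reproduced with a few lines of Python. This is only a sketch of the naive arithmetic. The population figure of 300 million is an assumption, chosen so that one in a million corresponds to the roughly 300 people mentioned above. The last line also assumes that the database profiles are independent random draws from that population, which is exactly the simplification the defense argument is criticized for. The sketch does not attempt to reproduce the one in seventy figure, since the inputs for that assessment are not given here.

```python
# Figures quoted in the text, plus one labeled assumption (population).
match_prob = 1 / 1_000_000       # chance that a random person matches the degraded sample
database_size = 1_000_000 // 3   # "about one-third of a million" profiles
population = 300_000_000         # ASSUMED size of the general population

# Expected number of matching people in the whole population (about 300).
expected_in_population = match_prob * population

# Expected number of chance matches in the database (about one third),
# which is the "1 in 3" figure the defense quoted.
expected_in_database = match_prob * database_size

# Probability of at least one chance match, ASSUMING independent random profiles.
p_at_least_one = 1 - (1 - match_prob) ** database_size

print(round(expected_in_population), round(expected_in_database, 3), round(p_at_least_one, 3))
```

Under these assumptions, the chance of at least one coincidental hit comes out somewhat below the expected count of one third. Neither number says anything about a particular defendant's guilt or innocence until the other characteristics are taken into account.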

The fate of defendants can hinge on such calculations being made rigorously.
It is essential to examine the errors that are most frequently made, to learn
to avoid them, and to establish controlled mathematical procedures that will
be valid in a court of law.

Leila Schneps and Coralie Colmez are the authors of Math on Trial: How
Numbers Get Used and Abused in the Courtroom
<http://www.amazon.com/Math-Trial-Numbers-Abused-Courtroom/dp/0465032923>,
available now from Basic Books.