How Machines Learn to Discriminate

Please join us for the Information Science colloquium with guest Solon Barocas. Solon is a Postdoctoral Research Associate at the Center for Information Technology Policy at Princeton University. His research explores issues of fairness in machine learning, methods for bringing accountability to automated decisions, the privacy implications of inference, and the role that privacy plays in mitigating economic inequality. Solon completed his doctorate in the department of Media, Culture, and Communication at New York University, where he remains an affiliate of the Information Law Institute. He also works with the Data & Society Research Institute and serves on the National Science Foundation-sponsored Council for Big Data, Ethics, and Society.

Title: How Machines Learn to Discriminate

Abstract: While machine learning might seem like a way to overcome the prejudices, implicit biases, and faulty heuristics that plague human decision-making, this talk will show that it is remarkably vulnerable to a number of problems that can render its models discriminatory. These models can inherit the prejudices of prior decision makers, reflect the widespread biases that persist in society, or discover useful regularities in a dataset that are really just preexisting patterns of exclusion and inequality.

I will start by showing that the resulting discrimination in each of these cases is unintentional, an artifact of the way machine learning works rather than of conscious choices by programmers. In particular, I will trace discrimination back to difficulties in (1) determining what, exactly, machine learning should be used to infer or predict; (2) collecting datasets that offer a proper statistical representation of the constituent parts of a population; (3) assembling historical examples that have not been tainted by past or persistent prejudice; (4) considering a sufficiently rich set of factors to achieve equal rates of accuracy across different sub-populations; and (5) deciding how to deal with criteria that are highly correlated with legally proscribed features.

I will then explain why attempts to address discrimination stemming from each of these problems will be difficult, costly, or controversial. Discrimination law is unlikely to reach most of these cases, and efforts to simply correct the underlying problem will only go so far. More aggressive solutions that attempt to engineer concerns with fairness into the machine learning process must relax the strict distinction between procedural fairness and distributive justice, opening these solutions to attack. Taking this into account, I will suggest practical paths forward for technologists, policymakers, and regulators.