As developments in artificial intelligence (AI) accelerate, advocates have rightly focused on the technology’s potential to cause or exacerbate discrimination. Policymakers are considering a host of protections, from disclosure and audit requirements for AI products to prohibitions on the use of AI in sensitive contexts. At the top of their list should be an underappreciated but vital measure: legal liability for discrimination under the doctrine of disparate impact.
Disparate impact laws allow people to sue for discrimination based on race, sex, or another protected characteristic without having to prove that a decisionmaker intended to discriminate against them. This form of liability will be critical to preventing discrimination in a world where high-stakes decisions are increasingly made by complex algorithms. But current disparate impact protection is not up to the task. It is found in a patchwork of federal statutes, many of which the courts have weakened over the years.
To protect Americans from algorithmic discrimination—from the workplace to the marketplace, from health care to the criminal justice system—Congress should pass a new disparate impact law that covers any use of AI that impacts people’s rights and opportunities.
AI can and does produce discriminatory results
AI works by using algorithms (i.e., instructions for computers) to process large amounts of data and identify patterns in it (“training”), and then applying those patterns to make predictions or decisions when given new information.
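To make that train-then-predict loop concrete, here is a minimal sketch, not drawn from any system discussed in this piece, using scikit-learn; the file names and feature columns are hypothetical. The point is that the model learns whatever patterns the historical decisions contain, including biased ones.

```python
# A minimal sketch of the train-then-predict pattern described above.
# The data files, column names, and use case are hypothetical illustrations.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# "Training": fit a model to patterns in historical decisions.
history = pd.read_csv("past_hiring_decisions.csv")      # hypothetical file
features = history[["years_experience", "degree_level", "referral"]]
outcomes = history["was_hired"]                          # past human decisions

model = LogisticRegression()
model.fit(features, outcomes)   # learns whatever patterns the historical data
                                # contains, including any biased ones

# "Prediction": apply those learned patterns to new cases.
new_applicants = pd.read_csv("new_applicants.csv")       # hypothetical file
scores = model.predict_proba(
    new_applicants[["years_experience", "degree_level", "referral"]]
)[:, 1]                                                  # probability of "hire"
```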
Researchers and technologists have repeatedly demonstrated that algorithmic systems can produce discriminatory outputs. Sometimes, this is a result of training on unrepresentative data. In other cases, an algorithm will find and replicate hidden patterns of human discrimination it finds in the training data. Examples abound:
- In 2017, Amazon scrapped a resume-screening algorithm after it disproportionately filtered out female applicants. The system had been trained on 10 years of resumes previously submitted to Amazon. It identified and then replicated a pattern of the company preferring men, downgrading resumes with indications that the applicant was a woman, such as references to having played on a women’s sports team or graduated from a women’s college. Other companies that use screening algorithms are currently facing legal scrutiny. For example, Workday is defending against a lawsuit alleging that its screening software discriminates against job applicants based on race, age, and disability.
- In a landmark 2018 study, technologists Joy Buolamwini and Timnit Gebru evaluated three commercial facial recognition tools’ performance at identifying the gender of diverse people in images. The tools had nearly perfect accuracy in classifying lighter-skinned men, but error rates as high as 34% for darker-skinned women. A National Institute of Standards and Technology study of 189 algorithms later corroborated this research, using 18 million photographs from law enforcement and immigration databases. It found the tools were 10 to 100 times more likely to return false positives—that is, incorrectly match two images of different people—for East Asian and Black faces than for white faces. The study also found elevated false positive rates for Native American, Black, and Asian American people when analyzing mugshot images. These disparate failure rates likely resulted from training datasets that underrepresented women and people of color. The consequences of facial recognition failure can be severe: Reporters have documented repeated instances of Black men being arrested for crimes they did not commit based on a facial recognition “match.”
- The health innovation company Optum created a widely used algorithm for hospitals to identify which patients would benefit from additional care over time. In 2019, Ziad Obermeyer and his fellow researchers discovered that the algorithm vastly understated the needs of Black patients. This occurred because it used health care costs as a proxy for illness. Black patients generated lower costs—fewer surgeries and fewer specialist visits, likely due to lower-quality health care—than equally sick white patients, so the algorithm assumed they needed less care.
- A 2021 analysis by The Markup of mortgage lenders who used underwriting algorithms found that the lenders were far more likely to reject applicants of color than white applicants: 40% more likely for Latino or Hispanic Americans, 50% more likely for Asian Americans and Pacific Islanders, 70% more likely for Native Americans, and 80% more likely for Black Americans. “In every case, the prospective borrowers of color looked almost exactly the same on paper as the White applicants, except for their race,” according to the investigation. The lenders used proprietary, closed software; applicants had no visibility into how the algorithms worked.
Importantly, these examples predate today’s most powerful generative AI systems: large language models (LLMs) such as OpenAI’s GPT-4, Anthropic’s Claude, and Meta’s Llama, as well as the commercial applications that are being built on top of them. These systems can perform more complicated tasks, such as analyzing huge amounts of text and data, writing code, communicating decisions that simulate authoritative human decisionmakers, and creating audio and video outputs. They are trained on far more data, have more sophisticated algorithms, and use much more computing power. As a result, they could be even better at identifying and replicating—or “baking in”—patterns of discrimination.
Developers have tried to mitigate bias through post-training methods such as fine-tuning and reinforcement learning from human feedback (RLHF). But early research indicates that LLMs indeed produce stereotyped and otherwise biased results. For instance, in one study LLMs tended to associate successful women with empathy and patience, and successful men with knowledge and intelligence. In another, the model associated Muslims with terrorism and violence. In still another, LLMs associated the word “black” with weapons and terms connoting guilt, while associating the word “white” with innocuous items and terms connoting innocence.
Scenario-based testing has suggested how such associations could cause discrimination as LLMs are integrated into real-world decision systems. For example, Stanford Law School researchers recently tested whether LLMs would provide different advice for navigating common life interactions based on whether the scenario involved a Black person or a white person. In most of the scenarios, the models’ responses disadvantaged people with names typically perceived as Black. One notable example involved buying a used bike or a used car: The models suggested offering a higher price if the seller’s name sounded white than if the seller’s name sounded Black.
In another study, a team of computer science and medical researchers fed GPT-4 patient vignettes used to train medical clinicians—varying the race and gender of the patient, holding symptoms and other inputs constant, and then asking the model for likely diagnoses. They found that the model “perpetuates stereotypes about demographic groups when providing diagnostic and treatment recommendations.” For example, GPT-4 was less likely to recommend MRIs and other advanced imaging for Black patients than for white patients. It was also less likely to recommend stress testing for female cardiology patients than male cardiology patients. When human cardiologists evaluated the same vignette, “there were no significant differences in assessment of stress testing importance by patient gender”—indicating the model treated women differently without any medical basis.
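A rough sketch of this kind of counterfactual audit appears below. The query_model() function is a hypothetical stand-in for whichever LLM is being tested, and the vignette and name lists are invented illustrations, not the researchers’ actual materials.

```python
# Sketch of a paired-prompt (counterfactual) audit: hold the scenario constant,
# vary only a demographic signal, and compare the model's answers.
# query_model() is a hypothetical wrapper around the LLM under test.

VIGNETTE = (
    "A patient named {name} presents with intermittent chest pain on exertion. "
    "Vitals and history are otherwise identical across cases. "
    "Should a cardiac stress test be ordered? Answer yes or no."
)

# Illustrative name lists used as demographic proxies.
NAME_GROUPS = {
    "group_a": ["Emily Walsh", "Greg Baker"],
    "group_b": ["Lakisha Washington", "Jamal Robinson"],
}

def query_model(prompt: str) -> str:
    """Hypothetical call to the model being audited."""
    raise NotImplementedError

def run_audit() -> dict:
    """Return the rate of 'yes' answers for each name group."""
    yes_rates = {}
    for group, names in NAME_GROUPS.items():
        answers = [query_model(VIGNETTE.format(name=n)).strip().lower() for n in names]
        yes_rates[group] = sum(a.startswith("yes") for a in answers) / len(answers)
    return yes_rates  # a gap between groups flags a potential disparity
```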
Understanding disparate impact law
Anti-discrimination protections broadly come in two forms: prohibitions on disparate treatment based on protected characteristics, and prohibitions on disparate impact based on those characteristics. Depending on the statute, protected characteristics can include race, color, national origin, religion, sex (including gender identity and sexual orientation), familial status, disability, age, and marital status.
“Disparate treatment” refers to intentional discrimination in which a decisionmaker treats some people worse than others—whether explicitly or covertly, whether with hostility or without—based on a protected characteristic. Examples include a landlord who advertises that units are not available to families with children (explicit familial status discrimination) and an employer who accepts applications from anyone but only hires white applicants (covert racial discrimination). Civil rights statutes broadly prohibit these forms of intentional discrimination.
“Disparate impact” refers to unintentional discrimination that occurs when a “facially neutral” practice—i.e., something that does not appear, on the surface, to discriminate—ends up disproportionately hurting a class of people, and that adverse impact can’t be justified. Both of those elements are critical: the focus on impact regardless of intent and the question of whether the disparate impact can be justified or not. Although the rules vary by statute and jurisdiction, disparate impact claims usually involve what the legal community calls a “burden-shifting analysis” along these lines:
- Adverse impact: A plaintiff shows, generally through a statistical disparity, that a practice disproportionately hurts people who share a particular protected characteristic.
- Legitimate interest: The defendant shows that the practice is necessary to serve a substantial and legitimate interest.
- Less discriminatory alternative: The plaintiff shows that the defendant’s interest could be satisfied by a less discriminatory alternative.
After showing an adverse impact, the plaintiff will prevail if there is no valid interest for the challenged practice or if a less discriminatory alternative exists. Otherwise, the defendant wins.
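To make the “adverse impact” showing concrete, the short calculation below uses invented numbers to compute the kind of statistical disparity a plaintiff might present. The four-fifths threshold is the EEOC’s rule of thumb for employment selection, used here only as a familiar benchmark, not a universal legal standard.

```python
# Illustrative adverse-impact calculation with invented numbers.
# Selection rate = share of each group's applicants who pass the practice.
applicants = {"women": 200, "men": 300}
passed     = {"women": 40,  "men": 180}   # e.g., satisfied the challenged requirement

rates = {g: passed[g] / applicants[g] for g in applicants}
# women: 0.20, men: 0.60

impact_ratio = rates["women"] / rates["men"]   # about 0.33

# The EEOC's "four-fifths rule" treats a ratio below 0.8 as evidence of
# adverse impact in employment selection; here 0.33 falls well below it.
print(f"Selection rates: {rates}")
print(f"Impact ratio: {impact_ratio:.2f} (benchmark 0.80)")
```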
Consider an employment example: A woman applies for a position in a warehouse and is rejected because she cannot meet the requirement that workers be able to lift 50 pounds. The requirement is gender neutral on its face because it applies to all workers. However, the job applicant shows that it has the effect of disproportionately disqualifying women. The employer responds that the requirement is job-related because employees move heavy boxes all day. The woman then establishes that most of the packages in the warehouse are under 25 pounds, or that employees can use machines to lift them. So either the requirement is not necessary to serve a legitimate interest, or at a minimum, that interest could be served by lowering the requirement and not unnecessarily excluding women.
Or take a housing example: Suppose a landlord screens rental applicants for criminal history and rejects anyone with an arrest or conviction record. A Black applicant sues under the Fair Housing Act, arguing that the practice disproportionately excludes Black residents, given disparate arrest and prosecution rates in that area. The landlord asserts an interest in safety. The applicant could prevail by showing that an outright ban on criminal history—including wrongful arrests, charges that are dropped, minor offenses, decades-old convictions—does not actually advance the landlord’s interest. The applicant could point to the less discriminatory alternative of considering the details of a prospective tenant’s specific record.
This burden-shifting legal architecture can seem complex. But the gist of disparate impact law is simple: Practices should not unnecessarily harm people based on race, gender, or any other protected trait.
Disparate impact liability helps root out discrimination that is unintentional but unjustified—precisely the risk with AI
Why do we need disparate impact liability? One reason is that it helps smoke out intentional discrimination. Bad actors are savvy about hiding their intent to discriminate; requiring them to identify a legitimate interest for a challenged practice will expose their discrimination if they don’t have one or if they cite an interest that is obviously a cover for bias. Inquiring about less discriminatory alternatives can reveal the ease with which discrimination could have been avoided.
The more salient reason—especially in the context of AI—is that disparate impact allows us to address the vast amount of harmful and unnecessary discrimination that results from thoughtless or overly broad policies. Recall the warehouse’s unnecessary and sexist job requirement or the landlord’s ban on tenants with a criminal history in the above examples. Or to use a disability example, consider a company that sets up an online-only job application through a portal that is not accessible to screen readers, thereby excluding candidates who are blind or have low vision.
Often, the unintentional discrimination results from the accumulation of disadvantage rooted in a history of outright subordination and divestment. With respect to race, this phenomenon is known as “structural racism.” Discriminatory policing and prosecution have contributed to the disproportionate rate at which Black Americans have criminal records, causing a racially disparate impact when employers or landlords reject applicants with any criminal history. Racially restrictive real estate covenants and redlining by mortgage lenders depressed Black homeownership and home valuations, inhibiting Black Americans’ ability to build good credit and obtain loans at favorable rates. Generations of discrimination in education denied career opportunities to Black people, women, and others, reinforcing intergenerational disparities in income and wealth generation.
Disparate impact liability helps combat these harmful outcomes and interrupt structural racism and other forms of systemic discrimination and compounded disadvantage. It is designed to ensure that irrelevant or inappropriate characteristics are not used to deny equal opportunity. In this way, disparate impact helps us ensure that policies and practices—in hiring, housing, lending, education, and beyond—are fair for all people. This should not be confused with affirmative action, which still exists in certain narrow employment and contracting circumstances and has a different function and purpose: using protected characteristics to remedy the effects of past discrimination.
Disparate impact allows us to prevent and address algorithmic discrimination
When algorithms make decisions (or produce analyses or recommendations that humans use to make decisions), identifying intentional discrimination can be particularly challenging. Intent is an expression of will that we ascribe to human beings. Inanimate machines are not capable of intent. They simply do what their designers and operators instruct them to do, even if the instruction is not deterministic. That means that disparate treatment claims often will fail in challenging AI systems’ discriminatory outputs. Disparate impact claims will be critical.
To be clear, some algorithmic discrimination can indeed be attacked under disparate treatment law. If a developer programs a system to treat people differently based on a protected characteristic—or to be able to do so—that is intentional discrimination. In 2022, the Justice Department sued Meta for disparate treatment in violation of the Fair Housing Act because its ad-targeting tool deliberately classified users based on race, color, religion, sex, disability, familial status, and national origin, and Meta allowed advertisers to target housing ads to users based on those characteristics. Similarly, Amazon’s aforementioned resume-screening algorithm intentionally discriminated against women; although Amazon did not expect it, the algorithm looked for indications that applicants were women and downgraded them on that basis. That’s explicit classification based on sex, which the law would regard as disparate treatment. Disparate treatment claims may also find success when a company knows its algorithmic system causes disparities yet it chooses to continue using it; one may be able to infer that the company intends the discriminatory result.
But most algorithmic decisionmaking systems are not designed with the goal of producing discriminatory outputs. The facial recognition companies discussed above wanted to be able to tout as close to 100% accuracy as possible. Optum and the hospitals using its model would have preferred accurate assessments of health care needs. These companies probably did not want the tools to discriminate, and the tools did not expressly rely on race or gender.
In these cases, having to prove disparate treatment could frustrate relief for victims of discrimination. Disparate impact, on the other hand, would afford them a fair shot at redress.
Disparate impact liability is all the more crucial because today’s transformer-based AI models are still black boxes. The developers of GPT-4, Claude, Llama, and similar models don’t understand exactly how the models produce sophisticated and creative answers to inquiries. The complexity and opacity of the systems’ inner workings mean that we won’t know whether or how they are considering protected characteristics. It is hard to establish that discrimination is intentional if there is no transparency into how and why decisions were made.
If such technology is integrated into creditworthiness assessments, reviews of employment and housing applications, assessments of dangerousness for pretrial release, criminal sentencing decisions, or other rights-impacting processes, people who are harmed will not be able to prove disparate treatment. Disparate impact analysis, by contrast, will allow them to look at aggregate outputs to examine whether some people are treated worse because of their identity characteristics.
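As a sketch of what examining aggregate outputs can look like in practice, the snippet below runs a standard two-proportion z-test on invented denial counts. The figures and group labels are hypothetical; the test is simply one common way to ask whether an observed gap is larger than chance would explain.

```python
# Compare adverse-decision rates across two groups in a system's aggregate
# outputs. All counts are invented for illustration.
from math import sqrt, erfc

denied_a, total_a = 120, 1000   # group A: 12% denial rate
denied_b, total_b = 190, 1000   # group B: 19% denial rate

p_a, p_b = denied_a / total_a, denied_b / total_b
p_pool = (denied_a + denied_b) / (total_a + total_b)          # pooled rate
se = sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))  # standard error
z = (p_b - p_a) / se
p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value

print(f"Denial rates: {p_a:.1%} vs {p_b:.1%}, z = {z:.2f}, p = {p_value:.4f}")
# A small p-value indicates the gap is unlikely to be random noise, which is
# the kind of statistical showing a disparate impact claim typically starts with.
```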
Disparate impact liability also creates the right incentives for AI developers and deployers. It will not be enough for them to claim they don’t intend to discriminate. Instead, they will have to demonstrate that they have a valid nondiscriminatory purpose for how their AI systems work and that there was no less discriminatory way to design them. These requirements move protection against discrimination upstream, into the design phase.
That is crucial and powerful: Developers will be more likely to prioritize diverse, representative datasets. They may do more to diversify their engineering teams and consult external stakeholders to better anticipate bias and avoid unforced errors. They will invest more in internal and external red-teaming and audits, testing their models for discriminatory results and adjusting them accordingly. They will refine their post-training techniques to better prevent discrimination. The result will be safer, more accurate, and more trusted models. And that’s good for business too: It will mean more customers and higher profits for companies whose AI products we can rely on.
Existing disparate impact law is inadequate to address algorithmic discrimination
Several federal statutes provide for disparate impact liability that can address AI-based harm. These include Title VII of the Civil Rights Act of 1964 (employment), the Age Discrimination in Employment Act (employment), the Fair Housing Act (housing and housing financing), the Equal Credit Opportunity Act (lending and other financial services), the Voting Rights Act of 1965 (voting rules and election procedures), Title VI of the Civil Rights Act of 1964 (prohibiting discrimination by recipients of federal financial assistance), Title IX of the Education Amendments of 1972 (prohibiting sex discrimination in education), and the Americans with Disabilities Act, among others.
Unfortunately, these provisions are insufficient to guard against many common forms of algorithmic discrimination.
First, many of these statutes are under attack in the courts. In 2021, the Supreme Court made it harder to challenge voting rules that have discriminatory effects under the Voting Rights Act by raising the legal and evidentiary standards plaintiffs must meet. In 2015, the Supreme Court held that plaintiffs could bring disparate impact claims under the Fair Housing Act, but the vote was 5 to 4, with Justice Anthony Kennedy joining four justices appointed by Democratic presidents. Kennedy has since retired, and that 2015 decision is one more pivotal ruling at risk of being overturned: The Court’s conservatives now command a 6-3 supermajority and regularly revisit settled precedents.
Title VI’s implementing regulations impose disparate impact liability if the defendant receives federal financial assistance. It’s common for schools, police departments, hospitals, transit authorities, and other institutions to receive federal funds, bringing them under the statute’s coverage. In 2001, however, the Supreme Court undercut Title VI enforcement by ruling that there is no private right of action for disparate impact claims under the statute. This means the federal government alone can file such lawsuits. But federal enforcement faces serious obstacles: limited resources, a requirement of pursuing voluntary compliance before going to court, and the risk that its primary remedy of cutting off federal funds will hurt vulnerable Americans such as underserved students and sick patients. We are also seeing fresh attacks on federal agencies’ ability to enforce Title VI. Just last month, a district judge permanently blocked the Environmental Protection Agency and Justice Department from enforcing Title VI’s disparate impact regulations anywhere in Louisiana.
Second, government enforcement has varied based on which party controls the presidency. Civil rights enforcement overall tends to atrophy (or be used to attack equity-focused programs) in Republican administrations and increase during Democratic ones. The Trump administration undid the Obama-era disparate impact regulations under the Fair Housing Act (President Joe Biden restored them), and in its final days, tried to rescind Title VI’s decades-old disparate impact regulations. We need clearer mandates for agencies to enforce disparate impact law and for private litigants to be able to sue when they are harmed.
Third, in most spheres of our society, no federal statute prohibits disparate impact discrimination. There is no freestanding federal disparate impact prohibition that applies to health care, for example, or criminal justice, education, environmental justice, public accommodations, or the sale of goods and services. Most of the companies developing or using AI in these spaces do not receive federal financial assistance and therefore are not covered by Title VI.
Enforcement actions under separate authorities by the Consumer Financial Protection Bureau (CFPB) and Federal Trade Commission (FTC) are helping fill these gaps. But they too are being challenged: A federal district judge in Texas recently ruled that the CFPB’s authority to combat unfair and deceptive practices does not include the power to regulate discrimination in financial services. The case is now on appeal. Here too, effective anti-discrimination enforcement will depend on these agencies’ political leadership, which can shift dramatically. In any scenario, such enforcement cannot be expected to cover the waterfront of AI-based discrimination.
Congress should enact legislation to strengthen disparate impact protection and enforcement
Congress should take at least three steps to better prepare our legal regime to deter and combat algorithmic discrimination.
First, Congress should enact a federal law that prohibits discrimination—specifically including disparate impact—in the deployment of AI and other automated technology. Such legislation should cover the use of algorithms to make or inform decisions that impact people’s rights and opportunities.
The law should apply to a broad range of contexts—activities like housing and lending covered by existing statutes as well as activities without their own statutes—to ensure consistency in AI regulation. Ideally, such legislation would also require model deployers to test for disparate impact and less discriminatory alternatives, thereby preventing discrimination on the front end of model design. Civil rights groups have championed such requirements, such as in the Lawyers’ Committee for Civil Rights Under Law’s model Online Civil Rights Act.
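One way such front-end testing might work in practice is a simple search over alternative configurations, recording each candidate’s accuracy alongside its disparity. The sketch below sweeps decision thresholds over synthetic placeholder data; a real search would also vary features, training data, and model choices.

```python
# Sketch of a "less discriminatory alternative" search: sweep decision
# thresholds and record accuracy alongside the impact ratio for each.
# scores, labels, and groups are synthetic placeholders for real data.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(1000)                              # model scores in [0, 1]
labels = (scores + rng.normal(0, 0.2, 1000)) > 0.5     # synthetic ground truth
groups = rng.choice(["a", "b"], 1000)                  # protected-class proxy

def evaluate(threshold: float) -> tuple[float, float]:
    """Return (accuracy, impact ratio) for one candidate threshold."""
    selected = scores >= threshold
    accuracy = float(np.mean(selected == labels))
    rate_a = selected[groups == "a"].mean()
    rate_b = selected[groups == "b"].mean()
    impact_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
    return accuracy, impact_ratio

for t in np.arange(0.3, 0.8, 0.1):
    acc, ratio = evaluate(t)
    print(f"threshold={t:.1f}  accuracy={acc:.2f}  impact_ratio={ratio:.2f}")
# A deployer would look for alternatives (thresholds, feature sets, models)
# that serve the business purpose nearly as well with a smaller disparity.
```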
Some may object that stronger disparate impact laws will drive up developers’ costs and stifle innovation. But the expense of setting up internal testing regimes is a fraction of overall corporate costs in industries already governed by disparate impact, such as lending. These investments also help avert the expense of later defending against discrimination lawsuits and avoid costly public relations fiascos. Far from hindering innovation, disparate impact pushes companies to make products that work well for all of us—building trust with consumers and increasing their customer base. In short, greater scrutiny of AI products before releasing them is time and money well spent.
Congress has shown interest. The bipartisan proposed American Privacy Rights Act contained a disparate impact provision aimed at large data collectors and brokers as recently as May 2024. Unfortunately, it was excised from the draft bill on the eve of a House Energy and Commerce Committee markup in June, leading civil rights groups to object and the committee to cancel the markup. This is despite the fact that a nearly identical provision was included in the bill’s predecessor, the American Data Privacy and Protection Act, which passed out of committee with a bipartisan vote of 53 to 2 in 2022. The provision should be restored to the American Privacy Rights Act.
Second, Congress should provide a private right of action to allow individuals who have suffered discrimination to file suit. The country is too vast—and the number of discriminatory actions too large—to rely on agency enforcement alone. That’s a main reason private rights of action are common in our legal system. Indeed, private litigation has been critical to making real the promise of civil rights statutes such as Title VII, the Fair Housing Act, the Equal Credit Opportunity Act, and the Voting Rights Act. Its unavailability for Title VI disparate impact claims has stunted that statute’s impact.
Third, Congress should significantly increase funding for federal enforcement agencies. These include civil rights offices at the Justice Department, Department of Housing and Urban Development, Department of Health and Human Services, Equal Employment Opportunity Commission, and other agencies, but also independent agencies that regulate commerce, such as the CFPB and FTC. Although these offices have generally seen budget increases during the Biden administration, they need additional resources to tackle algorithmic discrimination on top of their already ample caseloads. These agencies also need that expanded funding to hire more technologists (who would otherwise go to industry) to help them understand the AI systems they will be scrutinizing.
These new authorities and resources aren’t a panacea for algorithmic discrimination. We’ll need comprehensive privacy legislation, AI governance standards, transparency requirements, and a host of other measures. But we need our policymakers to get started and prioritize strong liability rules. We must strengthen our legal system’s ability to prevent discrimination before AI is integrated into rights-impacting systems across our economy. The next president and Congress should prioritize enacting comprehensive disparate impact rules for AI in early 2025.
Source: Brookings