Thursday, June 20, 2019

No, you can’t use predictive analytics to reduce racial bias in child welfare

And if you’re claiming success in reducing racial disparities by ensnaring more white children in the system instead of fewer children of color, you’re missing the point.

Pittsburgh's supposed success in reducing child welfare racial disparities
consists mostly of slapping scarlet number "risk scores" on more children such as these.

“When it comes to stopping state-sanctioned violence – whether an unjustified police shooting or child removal – shouldn’t we use the most advanced tools at hand?” Daniel Heimpel, publisher of the Chronicle of Social Change, asks in the conclusion of a recent column. 

Since he’s long been one of the most ardent supporters of using predictive analytics in child welfare, [UPDATE: In a tweet, Heimpel takes issue with this characterization, which is based on my impression of years of Chronicle stories] his answer is unsurprising: “It seems to me that predictive analytics – which has been so maligned as the harbinger of automated racism – could actually be a key to eroding its hold.”

But the principal child welfare study Heimpel cites teaches a very different lesson.

Whodunit vs. who might do it

Heimpel begins by suggesting that predictive analytics could be used to find caseworkers who are racially biased – as demonstrated, presumably, by the fact that they are outliers in the number of times they “substantiate” alleged child abuse or neglect or remove children from families of color.  He cites research showing that it is possible to pinpoint which police officers stop and frisk African-Americans at a disproportionate rate.

But that’s not predictive analytics. That’s just math.  You’re not predicting what people are going to do – you’re just looking at what they’ve actually done. In other words, you’re looking for whodunit, not who might do it next week or next year. If all the other variables – such as the nature of the allegations and the family’s income – are the same, and a few workers are far more “trigger happy” about removing children of color than most others, odds are those workers have a bias problem.
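To make the "just math" point concrete, here is a minimal sketch of that kind of retrospective outlier check. Everything in it is invented for illustration – the worker names, the case counts, and the two-times-the-median threshold are assumptions, not anything an actual agency uses.

```python
# Hypothetical sketch: flag caseworkers whose removal rates for children
# of color are outliers relative to their peers. No prediction involved --
# this only summarizes decisions the workers have already made.
# All names, counts, and the threshold below are invented for illustration.
from statistics import median

# (removals of children of color, total such cases handled)
workers = {
    "worker_a": (4, 100),
    "worker_b": (5, 110),
    "worker_c": (22, 105),  # far more "trigger happy" than the rest
    "worker_d": (3, 90),
    "worker_e": (6, 120),
}

rates = {w: removed / total for w, (removed, total) in workers.items()}

# Use the median as a robust baseline and flag anyone removing children
# at more than double that rate (an assumed cutoff, for illustration only).
baseline = median(rates.values())
outliers = sorted(w for w, r in rates.items() if r > 2 * baseline)
print(outliers)  # ['worker_c']
```

In a real analysis one would also control for the other variables mentioned above (nature of allegations, family income, and so on) before calling anyone an outlier; the point here is only that this is descriptive statistics, not prediction.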

Of course, there’s also an underlying assumption that child protective services agency administrators want to find such workers and change their behavior.  It is at least as likely that many CPS agencies would seek out and punish workers who are more cautious than most about substantiating alleged abuse and removing children – because take-the-child-and-run is a terrible policy for children but it’s often good politics.  That’s one reason why we have foster-care panics.

In any event, predictive analytics applied to families is very different. As I discuss in detail here, it’s more like the dystopian sci-fi movie Minority Report.

When the images happen to be true

Heimpel writes that “The idea of using predictive analytics in child welfare easily conjures images of child abuse investigators targeting parents a machine deems likely to harm their children.”
Yes, it does. Because those images are accurate.

The “machine” uses a series of data points, many involving whether a family is poor, and uses them to “predict” whether that family will abuse or neglect a child in the future.  But if the data points are biased – confusing poverty with neglect, for example – then the predictions are likely to be biased.  Virginia Eubanks, author of Automating Inequality, aptly calls it poverty profiling.  And Prof. Dorothy Roberts, an NCCPR board member, advances Eubanks’ analysis to show the racial bias as well.

Furthermore, when actually put into effect, these models have been shown to have enormously high rates of false positives – predicting that terrible harm would come to children when in fact it did not.
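The arithmetic behind those false positives is worth spelling out: when the thing being predicted is rare, even a model that looks accurate will flag mostly families who would never have harmed a child. The numbers below are invented purely to illustrate the base-rate effect – they are not AFST's actual figures.

```python
# Illustration with invented numbers: why a low base rate produces
# mostly false positives, even for a seemingly accurate model.
population = 100_000        # families screened (assumed)
base_rate = 0.02            # assume 2% would actually harm a child
sensitivity = 0.80          # model catches 80% of true cases (assumed)
false_positive_rate = 0.20  # and wrongly flags 20% of the rest (assumed)

true_cases = population * base_rate                              # 2,000
flagged_true = true_cases * sensitivity                          # 1,600
flagged_false = (population - true_cases) * false_positive_rate  # 19,600

share_false = flagged_false / (flagged_true + flagged_false)
print(f"{share_false:.0%} of flagged families are false positives")
# prints: 92% of flagged families are false positives
```

Under these assumed numbers, more than nine out of ten flagged families are flagged wrongly – which is why a "risk score" can look rigorous while mostly sweeping in families who pose no danger.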

But what about Pittsburgh?

Heimpel cites a recent evaluation of the nation’s most advanced predictive analytics model, one I’ve criticized often, the Allegheny Family Screening Tool (AFST) used in Pittsburgh and surrounding Allegheny County, Pa. For every neglect call received by the county, AFST generates a risk score between 1 and 20 – an invisible “scarlet number” that supposedly predicts how likely it is that a given child will be harmed.  The number then helps call screeners decide when to screen out a call and when to send a caseworker out to investigate.
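The mechanics can be sketched in a few lines. This is not AFST's actual code or its actual thresholds – the cutoffs and labels below are assumptions for illustration – but it shows how a 1-to-20 score is turned into a screening recommendation, including a "mandatory" screen-in band like the one the evaluation describes.

```python
# Hypothetical sketch of a screening tool's decision logic. Each call
# gets a risk score from 1 to 20; above an assumed "mandatory" cutoff,
# the tool is supposed to force an investigation. The thresholds and
# category names here are invented, not AFST's real values.
MANDATORY_SCREEN_IN = 17   # assumed cutoff, for illustration only
DISCRETION_THRESHOLD = 10  # assumed cutoff, for illustration only

def screening_recommendation(risk_score: int) -> str:
    if not 1 <= risk_score <= 20:
        raise ValueError("risk score must be between 1 and 20")
    if risk_score >= MANDATORY_SCREEN_IN:
        return "mandatory screen-in"
    if risk_score >= DISCRETION_THRESHOLD:
        return "screener discretion"
    return "screen out"

print(screening_recommendation(18))  # mandatory screen-in
print(screening_recommendation(6))   # screen out
```

The family never sees the number, and – as discussed below – neither do the investigators who are sent out, which is exactly what makes it an invisible “scarlet number.”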

The evaluation suggests that AFST reduced racial disparities at one child welfare decision point – opening a case for investigation.  And it did.  But in the worst possible way.

As the evaluation itself acknowledges, this achievement was accomplished through

increases in the rate of white children determined to be in need of further child welfare intervention coupled with slight declines in the rate at which black children were screened-in for investigation. Specifically, there was an increase in the number of white children who had cases opened for services, reducing case disparities between black and white children. [Emphasis added.]

In other words, what they’re really saying in Pittsburgh is: Great news!  We’re running around labeling so many more white parents as child abusers that we’ve reduced racial disparities!  (“Opened for services” is a euphemism, by the way. It means the caseworker decided the allegation should be “substantiated” and the family put under the thumb of the child protective services agency.)

This is rather like a child welfare system suddenly throwing thousands more children into foster care, sending those children home after only a few days and then saying “Great news, folks!  Our average length of stay in foster care has plummeted!”

Given all we know about the enormous harm of needless child abuse investigations and needless foster care, the solution to racial disparities should involve treating black families more like white families, not the other way around.

And nowhere mentioned in the evaluation is something else that happened after AFST was implemented – something deeply disturbing: There was a sharp, sudden spike in the number of children torn from their parents in 2017.  In a typical year, Allegheny County tears children from their parents about 1,000 times. In 2017 that spiked to 1,200 before returning to 1,019 in 2018. 

We don’t know if AFST contributed to the spike – the evaluation never addresses it.  But in the past the longtime director of the Allegheny County Department of Human Services (DHS), Marc Cherna, has taken pride in avoiding such spikes in entries.  This time, there is silence.

And even the usual number of removals in Pittsburgh, about 1,000 per year, is disturbingly high. Measured against the number of impoverished children, it represents a rate of removal as bad as that of Phoenix, which has the highest rate of removal among child welfare systems in America’s largest cities, and worse than that of Philadelphia, which is second worst.  If anything, all this raises questions about whether Cherna, the one-time reformer who has led Allegheny County DHS for decades, has stayed too long.

AFST widens the net

Indeed, among the deeply disturbing findings of this evaluation is that AFST is widening the net of coercive, traumatic state intervention into families, with no actual evidence that children are safer.  And the results would be even worse if not for the fact that the human beings who screen calls are “standing up to the algorithm” more often than the county seems to have expected.  But DHS appears to want to prevent this, so the effects of AFST on families are only likely to worsen.

A flawed measure of accuracy ...

The evaluators made their case that AFST has improved accuracy based on the following premise: Workers who go out to investigate cases are concluding that a greater proportion of them warrant further intervention.  And since the investigators don’t know the actual scarlet number – somewhere between 1 and 20 for each child in the family – the evaluation assumes AFST must be singling out a greater proportion of cases where there really is a need for DHS to intervene.

Here’s the problem.  The investigators don’t know whether the scarlet number was, say, a 6 or an 18. But they know enough for the very existence of AFST to bias their decision-making.  They know that the algorithm that is the pride of Allegheny County, and that has gotten an avalanche of favorable national attention, is probably what sent them into this home in the first place. That alone is probably enough to make them more skittish about “defying” the algorithm and saying there’s no problem here.  So what the report claims is an increase in accuracy is more likely a self-fulfilling prophecy.

... And the net grows wider

A child abuse investigation is not a benign act.  Even when it does not lead to removal it can be enormously traumatic for children.  But under AFST this trauma is increasing. According to the evaluation, before AFST the proportion of reports “screened in” was declining.  AFST stopped that decline.  That is deeply disturbing in itself, all the more so when combined with the one-year increase in entries into care noted earlier.

The human factor

The one bit of good news in this evaluation is that the human beings who do the actual screening have been less afraid to stand up to the algorithm than I’d expected.  But what’s interesting here is the fact that DHS seems to be upset by this.

One of the biggest selling points for AFST has been that it’s supposedly just a tool, something that gives advice to the screeners who still, with their supervisors, are making the actual decisions.  According to the evaluation:

“…there is considerable lack of concurrence with the AFST by call screeners … only 61 percent of the referrals that scored in the ‘mandatory’ screen-in range were, in fact, screened in.  Therefore, the county will continue to work with call screeners to understand why they might be making these decisions.”

That does not sound like DHS is happy with the screeners daring to question the algorithm.  It’s frightening to think of the effects on the poorest communities in Allegheny County if DHS takes this one “brake” off AFST.