Don’t blame the AI algorithm for biases

The classic critique of computer technology, “garbage in, garbage out,” holds as true today as when tech was in its infancy. The dazzling power of today’s systems, including AI-powered human capital management systems (HCMS), can mask that vulnerability to the unwary. So cautions Nathan Mondragon, the chief industrial and organizational (I/O) psychologist for HireVue, the provider of a sophisticated video-based hiring assessment tool. Mondragon is a pioneer in blending talent management and technology solutions. Employee Benefit News recently spoke with him to explore this topic; an edited excerpt of that conversation follows.

Employee Benefit News: I/O psychologists apply psychological principles to many corners of the HR realm. What’s your focus?

Nathan Mondragon: I always liked selection or hiring science, and the technology enablement of it. Back in 1996, a colleague and I built the very first online assessment ever delivered for hiring. I’ve always been blending selection techniques with technology enablement, and trying to make sure that the two come together in a scientifically valid way.

EBN: Is there a problem in this field?

Mondragon: Yes, there’s a lot of snake oil, especially when technology comes into play. People are saying, “I can do that,” and they put together a test and then launch it, and it’s not designed right and it’s not valid.

EBN: The knock on artificial intelligence (AI) in hiring systems is that they make recommendations based on historical hiring patterns. Critics say that’s analogous to a child imitating its parents’ behavior rather than doing what the parents say it should do. Is that a valid critique?

Mondragon: It’s true that people can take technology and do what they want with it, even if it’s not scientifically valid. There are a lot of people building algorithms to predict something, but they really don’t check the algorithm or the data that’s driving the algorithm for bias, whatever it might be.

EBN: Can you offer an example?

Mondragon: There was a case of someone trying to train a machine to distinguish between members of the canine family, using wolves and huskies. And they did it with something like 90% accuracy. But they never went back and looked at how the machine was doing it. Well, it turns out the computer was looking at snow in the background to identify the husky, not visual characteristics of the dog. So it had nothing to do with the actual canine characteristics of the two animals; it had to do with the background. The data was given and they ran with it.

EBN: So it might call a polar bear a husky. What would be a good analogy in the employee candidate selection world?

Mondragon: Humans have a lot of biases, and we can’t leave them at the door. It’s kind of who we are. So, for example, we built a model one time and the customer said, “We can’t give you the job performance data, we don’t have enough of it, but we’ve done video interviews on thousands and thousands of people, and we’ve hired hundreds of them. And we do a really good job with who we hire. We think we screen them right, we test them, and we think our hiring decisions are a good reflection of a quality person who gets on the job and performs well. So can you build a model to predict who we hire, so we can skip some of the other screening steps and save a bunch of time?”

EBN: What happened?

Mondragon: So you’re training the machine to make an assessment based on the decisions the humans made about who they hired and who they didn’t hire, correct? And so we built the algorithm and we ran it. It was a great result; the statistics were really strong. It predicted who would be hired, based on the historic pattern. But then we ran “adverse impact” checks. We always want to make sure that there are no race, gender, or age differences in how people score. We found a significant difference in the algorithm’s scores for black versus white individuals, in the direction you don’t want: they were hiring more whites than blacks.
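To make the idea of an “adverse impact” check concrete, here is a minimal sketch in Python built around the four-fifths rule, a common benchmark for comparing selection rates across groups. The data, field names, and flagging threshold are illustrative assumptions for this article, not HireVue’s actual implementation.

```python
# Minimal sketch of an adverse-impact check using the "four-fifths" rule.
# Input: (group, passed) records from a hypothetical screening algorithm.
from collections import defaultdict

def selection_rates(records):
    """Return the pass rate per group from (group, passed) pairs."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for group, ok in records:
        total[group] += 1
        passed[group] += int(ok)
    return {g: passed[g] / total[g] for g in total}

def adverse_impact_ratio(records, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's.
    Ratios below 0.8 are commonly flagged under the four-fifths rule."""
    rates = selection_rates(records)
    return rates[protected] / rates[reference]

# Illustrative outcomes: 60% of white applicants and 35% of black applicants
# passed the hypothetical screen.
records = ([("white", True)] * 60 + [("white", False)] * 40
           + [("black", True)] * 35 + [("black", False)] * 65)
ratio = adverse_impact_ratio(records, protected="black", reference="white")
print(f"impact ratio = {ratio:.2f} -> {'flag for review' if ratio < 0.8 else 'ok'}")
```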

EBN: So that means there was racial bias in the hiring process that got baked into the algorithm?

Mondragon: Yes, it was the human decision on the hiring that caused the bias in the algorithm.

EBN: What was your take-away from that experience?

Mondragon: We had to go back to the customer and say, basically, “There’s no bias in the algorithm, but you have a bias in your hiring decisions, so you need to fix that or this is going to be a vicious cycle; the system will just perpetuate itself.” So now there’s a bunch of analysis that takes place to look at the datasets and make sure they’re as clean as possible. Then you bring them together, build the model, and then recheck to make sure there’s no adverse impact in it.

EBN: How does this work with gender discrimination?

Mondragon: There are about 20,000 data points that can drive the algorithms from the video data — the words, like “emotion” and “aggression,” the sound of the voice, the facial expressions, all that. We can go back in and see if there’s a male-female difference on any of them. The ones that show a difference are easily just a few hundred data points, so once we remove them we still have more than 19,000 data points in the algorithm, and that male-female difference goes away.
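As an illustration of the kind of screen Mondragon describes, the following sketch measures the male-female difference on each feature and keeps only the features whose difference stays small. The synthetic data, the standardized-mean-difference measure, and the threshold are assumptions made for illustration, not HireVue’s method.

```python
import numpy as np

def drop_gender_skewed_features(X, gender, threshold=0.2):
    """X: (n_candidates, n_features) matrix of assessment-derived data points.
    gender: boolean array marking one group (True) vs. the other (False).
    Returns indices of features whose standardized group-mean difference
    is below the threshold."""
    mean_a = X[gender].mean(axis=0)
    mean_b = X[~gender].mean(axis=0)
    sd = X.std(axis=0) + 1e-9              # overall spread per feature; avoids divide-by-zero
    smd = np.abs(mean_a - mean_b) / sd     # per-feature standardized difference
    return np.where(smd < threshold)[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))            # stand-in for the ~20,000 real data points
gender = rng.random(1000) < 0.5
X[gender, :3] += 1.0                       # make the first three features gender-skewed
kept = drop_gender_skewed_features(X, gender)
print(f"kept {kept.size} of {X.shape[1]} features")  # the skewed ones drop out
```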

EBN: What if I want employees who are more aggressive than emotional? If you eliminate that point of distinction because it could result in gender bias, where does that leave me, even if I’m not trying to weed out female applicants — which would be easy enough to do anyway?

Mondragon: Think of a job with safety requirements, like a warehouse or driver job. When people get hurt on the job, it’s usually because they didn’t follow procedure, took too many risks, or were too aggressive. So we built the questions in the video assessment to separate the behaviors of people who would take risks from those who wouldn’t. Then we had a bunch of the customer’s employees go through the video assessment, but we already knew which ones had safety violations and which ones did not, and we built a model to predict that. That gave us a more reliable way to predict risky behavior in job candidates while limiting the potential for gender bias.
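A minimal sketch of that setup, assuming the assessment yields numeric features and each current employee carries a known safety-violation label; the synthetic data and the use of scikit-learn’s logistic regression here are illustrative assumptions, not HireVue’s actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_employees = 800
X_train = rng.normal(size=(n_employees, 12))   # assessment-derived features (synthetic)

# Synthetic labels: 1 means the employee had a safety violation on record.
risk_signal = X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(size=n_employees)
y_train = (risk_signal > 1.0).astype(int)

# Fit a simple classifier on employees whose outcomes are already known.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score new candidates' assessments by predicted likelihood of risky behavior.
X_candidates = rng.normal(size=(5, 12))
risk_scores = model.predict_proba(X_candidates)[:, 1]
print(np.round(risk_scores, 2))
```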

EBN: Can the system help employers increase diversity, without using a limited set of criteria?

Mondragon: Yes. Sometimes it isn’t purely a technology solution. First you make sure the algorithm you’re going to use to screen people in isn’t biased. Then you cast a wider net in your sourcing efforts. Unilever did this. Instead of mainly going to the Ivy League and similar U.S. campuses they had previously recruited from, which could limit the diversity of the applicant pool, they went to a bunch of other campuses as well. They didn’t have to visit them physically, but they would advertise the recruitment effort and have candidates come in and do their video interviews.

EBN: We’ve just been talking about this assessment technology for talent selection. What about other applications?

Mondragon: It can also be used for employees who are applying for other internal positions, people looking for greater responsibility.

EBN: What’s the future of algorithm-based candidate assessment support technology?

Mondragon: We think we can make it even better. Even though we’re getting 20,000 data points, we’re still just scratching the surface. I think in five or 10 years, we’re going to have prediction accuracy levels that people haven’t seen before. And what we do in the question design area is going to change as well. It’s not just going to be asking candidates questions on video. We’ll be asking candidates to do exercises that are really immersive and interactive, and yield even more data, and that will tell us a lot more.
