Systematic prejudice produces bias, and artificial intelligence (AI) systems are no exception. AI systems often reflect biases present in the data sets they are trained on, and the modelling method itself can introduce new ones. A new initiative led by Professor Parham Aarabi (ECE) aims to measure this bias as a first step toward mitigating it.
“Every AI system has some kind of a bias,” says Aarabi. “I say that as someone who has worked on AI systems and algorithms for over 20 years.”
Aarabi is one of the experts from academia and industry behind U of T’s HALT AI group, which tests other organizations’ AI systems using diverse input sets and then provides a diversity report — including a diversity chart for key metrics — that points out areas of failure and provides insights on how to do better.
“We found that most AI teams do not perform actual quantitative validation of their system,” says Aarabi. “We are able to say, for example, ‘Look, your app works 80 per cent successfully on native English speakers, but only 40 per cent for people whose first language is not English.’”
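The kind of per-group comparison Aarabi describes can be illustrated with a short Python sketch. This is not HALT's actual tooling, which the article does not detail. The function names, group labels and sample results here are hypothetical, and the counts were made up to mirror the 80 and 40 per cent figures in the quote.

```python
from collections import defaultdict


def success_rate_by_group(results):
    """Return {group: fraction of successful trials} for (group, succeeded) pairs."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for group, succeeded in results:
        totals[group] += 1
        if succeeded:
            successes[group] += 1
    return {group: successes[group] / totals[group] for group in totals}


def largest_gap(rates):
    """Difference between the best-served and worst-served group."""
    return max(rates.values()) - min(rates.values())


# Hypothetical test outcomes, invented for illustration only:
# 8 of 10 trials succeed for one group, 4 of 10 for the other.
sample_results = (
    [("native English speaker", True)] * 8
    + [("native English speaker", False)] * 2
    + [("non-native English speaker", True)] * 4
    + [("non-native English speaker", False)] * 6
)

rates = success_rate_by_group(sample_results)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%}")
print(f"largest gap between groups: {largest_gap(rates):.0%}")
```

A single overall success rate for this sample would be 60 per cent, which hides the fact that one group is served twice as well as the other. Breaking the results down by group is what exposes the gap.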
HALT was launched in May 2021 as a free service for anyone to measure diversity in their AI systems. So far, it has conducted studies of a number of popular AI systems — including those of Apple, Google and Microsoft — providing statistical reports across a variety of diversity dimensions, such as gender, age and race.
“For instance, in our own testing we found that Microsoft’s age-estimation AI does not perform well for certain age groups,” says Aarabi. “So too with Apple and Google’s voice to text systems: if you have a certain dialect, an accent, they can work poorly. But you do not know which dialect until you test. Similar apps fail in different ways — which is interesting, and likely indicative of the type and limitation of the training data that was used for each app.”
HALT started in early 2021 when AI researchers within and outside the ECE department began sharing their concerns about bias in AI systems. By May, the group had coalesced, bringing aboard external experts in diversity from the private and academic sectors.
“To truly understand and measure bias, it can’t just be a few people from U of T,” says Aarabi. “HALT is a broad group of individuals, including the heads of diversity at Fortune 500 companies as well as AI diversity experts at other academic institutions such as University College London and Stanford University.”
As AI systems are deployed in a wider range of applications, bias in AI becomes an even more critical issue.
For example, an AI-driven HR system designed to filter resumes might leave out excellent candidates from marginalized or underrepresented communities. A facial-recognition program that systematically misrecognizes a segment of the racialized population can have dire and unjust consequences if its results are presented as evidence in court.
Developers, of course, want their AI systems to work well and train them to reach a performance threshold. But it’s important to ask (and measure) who exactly they work well for.
“The majority of the time, there is a training set problem,” says Aarabi. “The developers simply don’t have enough training data across all representative demographic groups.”
If more diverse training data doesn’t improve the AI’s performance, then the model itself is flawed and requires reprogramming.
ECE Chair Professor Deepa Kundur says the new initiative is timely.
“Our push for diversity starts at home, in our department, but also extends to the electrical and computer engineering community at large — including the tools that researchers innovate for society,” she says. “HALT AI is helping to ensure a way forward for equitable and fair AI.”
“Right now is the right time for researchers and practitioners to be thinking about this,” says Aarabi. “They need to move from high-level abstractions and be definitive about how bias reveals itself. I think we can shed some light on that.”