How to Audit AI Systems for Bias and Fairness?
- Raul Smith
- Nov 17, 2025
- 3 min read
I never really thought about it until AI bias showed up in the least expected place: my own workplace. I was using a model to sort applications for a small, low-stakes internal project. Nothing huge or dangerous. Just a time-saving tool. Then I noticed something weird: certain groups kept getting pushed further down the list even though their qualifications matched everyone else’s.
That was the moment I stopped believing “the model knows what it’s doing.” It doesn’t. Not by default.
AI isn’t always fair. It reflects whatever it saw, however it learned, and whatever assumptions were made along the way. And the mistakes are not small if that system is unaudited. They creep into decisions assumed to be objective.
To understand why the results looked odd, I began learning to audit these systems, not as an engineer trying to reverse-engineer algorithms, but as someone who simply needed to trust the output.

Most problems begin with the data, so begin there.
The first time I heard the phrase “biased data,” I thought it meant something obvious, like a spreadsheet whose categories are grossly lopsided. It is seldom that way. Bias sits there silently.
It shows up when:
- The data over-represents a dominant group.
- Historical decisions get treated as “correct,” even if they were erroneous.
- Labels were applied hastily, inconsistently, or under the influence of presumptions.
- Some categories are missing from the dataset entirely.
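Some of those patterns can be checked mechanically before the model ever runs. Here is a minimal sketch that flags underrepresented groups in a tabular dataset; the `region` field and the 5% threshold are hypothetical stand-ins, not anything from a real audit.

```python
from collections import Counter

def representation_report(rows, group_key, min_share=0.05):
    """Count each group's share of the dataset and flag groups
    whose share falls below min_share (a hypothetical cutoff)."""
    counts = Counter(row[group_key] for row in rows)
    total = sum(counts.values())
    report = {}
    for group, n in counts.items():
        share = n / total
        report[group] = {
            "count": n,
            "share": round(share, 3),
            "underrepresented": share < min_share,
        }
    return report

# Toy data with a hypothetical 'region' field:
applications = (
    [{"region": "north"}] * 80
    + [{"region": "south"}] * 17
    + [{"region": "east"}] * 3
)
report = representation_report(applications, "region")
print(report)  # 'east' sits at a 3% share and gets flagged
```

A low share doesn’t prove bias by itself, but it tells you which groups the model will have the least evidence about, which is exactly where the questions below start to matter.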
The first phase of an audit doesn’t involve any interaction with the model at all. It looks at what the model will learn from.
Typically, I pose questions such as:
- Why was this information collected?
- Who requested its collection?
- Does the dataset reflect a true cross-section of the intended audience?
- Which groups are missing or barely visible in the dataset?
- Are previous trends being followed as if they were rules?
Even those few questions raise more problems than most people ever expect.
Then, Challenge the Model
Once I have completed my own analysis of the data, I deliberately set about stress-testing the model.
I give it borderline cases, edge cases, and representative inputs from often-overlooked groups. I create input pairs that are almost identical, changing only one attribute. Then I observe how the system behaves.
If a particular type of input consistently produces bad results, I know something is wrong.
Sometimes it’s a subtle problem.
And sometimes it simply cannot be missed.
Testing a model this way feels less like “engineering” and more like interviewing someone who is trying a bit too hard to hide a mistake.
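The paired-input trick above can be sketched in a few lines. Everything here is hypothetical: `toy_score` stands in for the real model, and `postcode` is just an example of an attribute that shouldn’t move the score.

```python
def counterfactual_gap(model_score, base_input, field, values):
    """Score near-identical inputs that differ only in one field and
    report the spread; a large gap suggests that field leaks into
    the decision."""
    scores = {}
    for v in values:
        probe = dict(base_input, **{field: v})  # copy, swap one field
        scores[v] = model_score(probe)
    gap = max(scores.values()) - min(scores.values())
    return scores, gap

# Stand-in scorer with a bias baked in on purpose
# (a real audit would call the deployed model instead):
def toy_score(applicant):
    score = applicant["years_experience"] * 10
    if applicant["postcode"] == "46202":
        score -= 15
    return score

base = {"years_experience": 5, "postcode": "46201"}
scores, gap = counterfactual_gap(toy_score, base, "postcode",
                                 ["46201", "46202"])
print(scores, gap)  # a nonzero gap flags the postcode dependence
```

If swapping a single irrelevant field moves the score, the interview is over: the model just told you what it’s hiding.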
Why Local Developers Are Paying Closer Attention
It’s interesting that many mobile app development teams in Indianapolis have started delivering bias audits as part of their process. Partly because they are building tools for end users who don’t want their applications making biased decisions, and partly because most local application teams work closely with the healthcare, logistics, and banking sectors, where a mistake isn’t just a “bug” but real harm.
They aren’t waiting for big companies to lead the conversation on fairness.
Early model checks, guardrails, multi-group testing, and human review whenever there’s doubt are already part of their routine.
It’s a practical approach.
Frankly, it’s the only reasonable option.
A real audit cannot rely on a single test.
I learned the hard way that fairness auditing is an ongoing process. Once a model is deployed in a product and used in practice, it continues to change, both directly and indirectly, through feedback loops, updates, and new data. Bias can therefore creep in at any stage or iteration of the product.
Which brings me to my checklist, because this essentially becomes standard operating procedure:
- Check after every update.
- Check when the user base changes.
- Check when a specific segment’s performance starts to decline.
- Check when the model begins acting strangely.
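The “segment’s performance starts to decline” item is the easiest one to automate. Here is a minimal sketch, assuming you log a label and a prediction per record; the `segment` field and the 5-point tolerance are illustrative choices, not a standard.

```python
def segment_accuracy(records, segment_key):
    """Accuracy per segment; each record carries 'label' and 'prediction'."""
    totals, hits = {}, {}
    for r in records:
        seg = r[segment_key]
        totals[seg] = totals.get(seg, 0) + 1
        hits[seg] = hits.get(seg, 0) + int(r["label"] == r["prediction"])
    return {seg: hits[seg] / totals[seg] for seg in totals}

def drift_alerts(before, after, tolerance=0.05):
    """Segments whose accuracy dropped by more than `tolerance`
    between two audit snapshots."""
    return [seg for seg in before
            if seg in after and before[seg] - after[seg] > tolerance]

# Two toy audit snapshots with a hypothetical 'segment' field:
before = segment_accuracy(
    [{"segment": "a", "label": 1, "prediction": 1}] * 9
    + [{"segment": "a", "label": 1, "prediction": 0}]
    + [{"segment": "b", "label": 1, "prediction": 1}] * 9
    + [{"segment": "b", "label": 1, "prediction": 0}],
    "segment",
)
after = segment_accuracy(
    [{"segment": "a", "label": 1, "prediction": 1}] * 9
    + [{"segment": "a", "label": 1, "prediction": 0}]
    + [{"segment": "b", "label": 1, "prediction": 1}] * 6
    + [{"segment": "b", "label": 1, "prediction": 0}] * 4,
    "segment",
)
print(drift_alerts(before, after))  # segment 'b' fell from 0.9 to 0.6
```

Run it after every update and every user-base shift, and the checklist stops depending on someone remembering to look.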
Being fair is not something that can be "achieved."
It’s something you maintain.
My last takeaway
Biased AI is more common than people assume because the world is complex and messy, and datasets reflect that mess. An audit doesn’t need to find perfection; it needs to establish accountability. Accountable enough to pause and answer the question:
Who could this system be treating unfairly, and why?
Learning how to audit AI never turned me into a data scientist.
However, it did help me better appreciate why these algorithms, and the decisions they make, sometimes inconspicuously and sometimes mistakenly, require human oversight. AI needs to be reviewed if it is to benefit humanity rather than hinder it. And it turns out that, if we look closely, “someone” could be any of us.