On humans correcting computer decisions
11 Aug, 2020
"Due to an unexpected software error..." - if you get to read this, something terrible has happened. Police stopping people of color more often because of erroneous algorithms? Already happening. The latest IoT device turning out to be an expensive paperweight or locking you out of your home? Sounds familiar. The respective PR department does its best to tell you that this exact circumstance was clearly unforeseeable and that nobody is responsible. What they don't tell you is that quite the opposite is true, and that more and more parts of our lives depend on complex and often secret algorithms working correctly. While most of these algorithms, such as credit scores, the tracking and advertising industry and social media, face very little scrutiny or transparency, we still feel uneasy letting them decide our fate when our most basic rights are affected. Especially in the context of law enforcement and the justice system, politicians and startup CEOs assure us that the final decision will always be made by a human, which supposedly reduces the risk of failure immensely. But how good can a human be at correcting an algorithmic decision? A recent New York Times investigation gives you an idea: "I guess the computer got it wrong".
In many cases, a massive increase in available data precedes the introduction of algorithms into a decision process, because manual processing and decision making would become too expensive. The federal police justified a test run of biometric surveillance at a train station in Berlin by claiming that manual processing is too monotonous and tiring, while simultaneously demanding that more cameras be installed.
The mostly cost-driven implementation of these new technologies creates an environment of cost-awareness that has a negative influence on the quality of the decisions themselves. Under time pressure, the evaluation might not be as thorough. In the human-rights-sensitive context of police using facial recognition to quickly identify potential suspects, this effect is all the more detrimental when things go wrong. Let's have a closer look at the local test run in Berlin:
From August 2017 to July 2018, the federal police, together with the national rail service, tested facial recognition software made by three different companies at the Berlin Südkreuz railway station. The station is visited by about 90,000 people a day. With a logically COMBINED false-positive rate of 0.67 %, meaning a match is only valid if all three systems agree, there would still be about 600 false positives every day (90,000 × 0.0067 ≈ 603). In a fully autonomous system, all these people would be unjustly stopped by police, delayed on their commute and stigmatized in front of others. To bridge the time gap until the biometric systems perform better, the politicians and developers responsible for introducing these invasive surveillance devices are eager to assure the public that the final decision will always be made by a human, giving the process a compassionate element that can correct the failures of the machines involved. But how good are we at correcting machines that are supposed to make our job easier?
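The estimate above is simple arithmetic: the expected number of false positives equals the number of people scanned times the false-positive rate. A minimal Python sketch, assuming (as a hypothetical simplification) that every visitor passes the cameras exactly once a day and that the number of visitors stays constant over a whole year:

```python
# Back-of-the-envelope estimate of false positives at Berlin Südkreuz.
# Figures from the text: ~90,000 visitors a day, combined false-positive rate of 0.67 %.
# Hypothetical simplification: every visitor is scanned exactly once per day.

DAILY_VISITORS = 90_000
FALSE_POSITIVE_RATE = 0.0067  # 0.67 % expressed as a fraction


def expected_false_positives(people_scanned: int, false_positive_rate: float) -> float:
    """Expected number of innocent people flagged: n * p."""
    return people_scanned * false_positive_rate


daily = expected_false_positives(DAILY_VISITORS, FALSE_POSITIVE_RATE)
yearly = daily * 365  # assumes the same number of visitors on every day of the year

print(f"Expected false positives per day:  {daily:.0f}")
print(f"Expected false positives per year: {yearly:.0f}")
```

Even a rate that sounds tiny turns into hundreds of people a day, and six figures a year, once it is multiplied by the traffic of a large station.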
When it comes to recommendations for police action, there are additional questions. Given the current media landscape, who has the strength to overrule a computer decision, exposing themselves to a huge backlash if that decision turns out to have been correct and a crime could have been prevented? Who has the strength to do that more than once? Over time, sending out officers whenever in doubt will establish itself as the norm.
A 2019 study examined how humans and machines interact in a decision process and which effects occur:
In short: humans tend to trust machines too much. This is especially the case if the machine performs well in 99 out of 100 cases. Given enough cases, that 1 % can prove dangerous. One example is the level two driver assistance systems in modern cars. Steering, accelerating and braking are performed by a computer in everyday situations. In 99 % of the cases the driver does not have to intervene, and trust in the system builds up accordingly. However, the driver must always be ready to intervene in a dangerous situation. The accumulated trust makes us justify small deviations from the norm. "I can answer that message, nothing is going to happen" becomes that much more justifiable when the system has been doing a great job for four hours. This effect can have disastrous consequences, and not only while driving a car.
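The "given enough cases" argument can be made concrete. Assuming, as a simplification, that every case is independent and that the system is right in 99 % of them, the chance of at least one failure is 1 - 0.99^n for n cases, which grows quickly. A short Python sketch; the function name is hypothetical and only serves as an illustration:

```python
# How a system that is right 99 % of the time still fails "eventually".
# Hypothetical assumption: cases are independent and the system is correct
# with the same probability in every single case.

RELIABILITY = 0.99  # correct in 99 out of 100 cases, as in the text


def probability_of_at_least_one_failure(cases: int, reliability: float = RELIABILITY) -> float:
    """P(at least one failure in `cases` independent cases) = 1 - reliability**cases."""
    return 1 - reliability ** cases


for cases in (10, 100, 500):
    p = probability_of_at_least_one_failure(cases)
    expected_failures = cases * (1 - RELIABILITY)
    print(f"{cases:>4} cases: {p:.1%} chance of at least one failure, "
          f"{expected_failures:.1f} failures expected")
```

After only a hundred cases a failure is more likely than not, which is exactly the moment when the accumulated trust makes a human least likely to be paying attention.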
According to the study, there are only limited solutions to this problem. Neither training nor experience offers a way to counter it. The study does, however, identify specific circumstances in which a computer recommendation or decision system can coexist with humans:
- The supporting system must perform worse than a human would. In this case, trust in the system does not grow as strong as it would in a system with great performance.
- The whole decision or recommendation process is divided into its subparts. If an automatic system performs much better than a human in one subpart of the process, it is okay to use it there. The other parts must still be performed by a human. Your car's cruise control is an example where technology clearly outperforms humans without endangering others.
- If the system outperforms humans in its specific area, it can be used from a performance point of view. However, other considerations must be taken into account when introducing autonomous decisions that affect the rights of just about everybody.
In the case of biometric surveillance by police, two of these three circumstances contradict the initial justification of saving costs and automating monotonous work. The third offers a view into a dystopian future where, if you live in London for example, your every move might be tracked and evaluated.
- A machine performing worse than a human does not offer much cost savings or reduction in workload.
- Automating only parts of the process might not offer enough savings or reduction either. A system that only outputs silhouettes and faces for an officer to compare against a large list does not reduce the monotonous workload enough.
- If in the future a system is developed that clearly outperforms a human in recognizing a suspect, new problems arise. Whether such a system can be lawfully operated is questionable. Who is responsible if an "unforeseen software error" jails a few citizens? Is it okay to potentially track everybody's movements in order to solve more crimes? While within the EU the answer might be no, other jurisdictions might come to different conclusions, opening Pandora's box to a dystopian future for many.
The study presents three different risk categories for such implementations. If we categorize the test run in Berlin, we end up in the most dangerous category, together with the level two driver assistance systems: a mostly reliable, correctly working machine that leads to overconfidence in its decisions and recommendations. This can lead to citizens being unjustly detained and charged with a crime they could not have committed, as in the New York Times story mentioned at the beginning.
With surveillance powers ever increasing in many countries, we as a society must ask ourselves whether increasingly invasive technologies should have a place in our everyday lives. Once they are established, there might not always be a way back.