Real world. Real mistakes.
Examples of AI incidents.
AI incidents can cause large-scale damage
AI technologies promise to deliver unprecedented business value by automating processes, improving personalization, and opening up possibilities for novel products and services. However, this landscape of opportunity is also rife with pitfalls. AI developers and users across use cases and industries face a variety of risks and uncertainties. What happens when the AI system produces unexpected results? Is the input data secured and anonymized? Is the system protected from malicious attacks? Is the AI output integrated into any critical decisions, and does it treat all human stakeholders involved fairly?
Failing to consider these questions, and thus the trustworthiness of the AI system, can expose an organization to severe consequences. Financial, legal and reputational damage can result when an AI system behaves in unintended ways. Even large, experienced companies have suffered major setbacks after neglecting aspects of Trustworthy AI development and operation, as the following examples illustrate.
Meta AI
Meta AI released a demo of Galactica, its new Large Language Model (LLM) for science, accompanied by impressive claims. Meta touted the model's potential to assist scientists with tasks ranging from summarizing academic papers to solving math problems and writing scientific code. Within hours of release, it became clear that Galactica could easily be exploited to generate biased and misleading yet authoritative-sounding content. Scientists began sharing examples and ethical criticisms online, and within 48 hours Meta had paused the demo. Although the model's developers described running a "toxicity and bias" evaluation before release, that analysis was clearly insufficient to prevent potential harms. A deeper audit informed by the application's context of use could have given a clearer picture of the model's limitations before public release.
Source: Why Meta's latest large language model survived only three days online - MIT Technology Review
Kentucky Fried Chicken
German customers of popular fast food chain Kentucky Fried Chicken were shocked by the insensitive nature of an automated push notification in the KFC mobile app. The message invited customers to commemorate the anniversary of Kristallnacht, the 1938 pogrom that is widely considered to mark the start of the Holocaust, by enjoying a serving of “tender cheese with crispy chicken”. KFC quickly apologized, calling the text “unacceptable” and blaming a failure in an internal review process that governs semi-automatic content creation. Nevertheless, the incident had already been widely shared and criticized. Here, an independent process assessment could have indicated the potential for mistakes with sensitive content and helped KFC to avoid a reputational stain.
Source: KFC Apologizes for Linking Chicken Promotion to Kristallnacht – New York Times
Clearview AI
Damage: Financial, Legal, Reputational
Facial recognition company Clearview AI was fined for breaching British privacy laws after failing to disclose its noncompliant data practices. The company had processed personal data without permission, training its models on billions of photos scraped from social media profiles. In this case, evaluating the ethics of its data collection and storage practices could have highlighted the lack of privacy safeguards and averted the resulting fine and negative publicity.
Source: Clearview AI, a facial recognition company, is fined for breach of Britain's privacy laws - New York Times
Zillow
Damage: Financial ($569M), Reputational
US real estate giant Zillow staked a big claim on the "iBuying" trend: tech-enabled house flipping with fast turnover, driven by algorithmic house pricing. However, this strategy came to a crashing halt when Zillow was forced to shut down the unit, lay off a quarter of its employees, and take $569 million in write-downs. A major cause? Home prices proved highly unpredictable, and as pandemic-era volatility and inflation shifted market conditions, the algorithm's forecasts did not hold up at scale. In this case, a comprehensive analysis of the model's robustness could have detected the shifting data distributions and other problems with prediction quality earlier. With a clearer picture of the algorithmic risk, business leaders could have moved to a more profitable strategy sooner.
Source: Flip Flop: Why Zillow's Algorithmic Home Buying Venture Imploded | Stanford Graduate School of Business
Amazon
Amazon implemented an AI-based hiring tool and discovered an unfortunate drawback very late in the game: the model had learned to discriminate against resumes from female applicants. Executives lost confidence in the project, and when Reuters eventually broke the story, the biased system drew public outrage. In this case, a closer look at the model development process and an analysis of the model's training features could have identified the fairness problems early. Without a proactive quality assessment, the project ended in financial and reputational loss.
Source: The AI Recruitment Evolution - from Amazon's Biased Algorithm to Contextual Understanding | TalentLyft
Zoom
Widely used video conferencing platform Zoom was forced into damage-control mode by allegations of racial bias. In a viral social media post, a researcher pointed out a critical flaw in Zoom's virtual background algorithm that erased the faces of Black users. While Zoom's internal checks had not revealed this weakness ahead of time, a rigorous, detailed independent assessment could have improved the fairness of the algorithm and spared users distress.
Source: Twitter and Zoom's algorithmic bias issues | TechCrunch
Estée Lauder
Damage: Financial, Reputational
Cosmetics company Estée Lauder was forced to pay a settlement to three make-up artists who had been wrongfully fired. The women were terminated following interviews assessed by automated hiring software from a third-party vendor, HireVue. However, the process was completely opaque: it offered no explanation or justification for the result and no demonstrated link to job performance. HireVue has since discontinued the use of visual analysis in its video assessment software. As an AI procurer, Estée Lauder could have saved money and protected its integrity by purchasing a product that came with verifiable proof of quality.
Source: Payout for Estée Lauder women 'sacked by algorithm' | News | The Times