Does AI dream of electric sheep? Magistrate cautions against blind trust in AI-generated content

Don’t believe everything you read on the internet.

The lawyers acting for Michelle Parker, a woman who claimed she was defamed by a body corporate in Parkwood, Johannesburg, were embarrassingly reminded of this platitude recently when judgments cited as reference material to strengthen the plaintiff’s case turned out to be fictitious. The source used to find these so-called judgments: ChatGPT.

The artificial intelligence chatbot was developed by OpenAI and launched on 30 November 2022. Four months later, OpenAI released a new large language model, or LLM, called GPT-4 with markedly improved capabilities. By May, Anthropic’s generative AI, Claude, could process 100 000 tokens of text in a minute – about 75 000 words, roughly the length of an average novel – up from about 9 000 tokens when it was introduced in March. Also in May, Google announced several new features powered by generative AI, including Search Generative Experience and a new LLM called PaLM 2 to power its Bard chatbot.

These generative AI applications have captured the imagination of people around the world, but they are also known to generate mistakes, or “hallucinations”. As a result, they generally come with clearly displayed disclaimers warning of this problem.

In other words, caveat lector (“let the reader beware”).

Magistrate Arvin Chaitram’s judgment, delivered in the Johannesburg Regional Court, was first reported on by the Sunday Times. The case first came before the magistrate in March, when Claire Avidon – counsel for the trustees of the body corporate – argued that a body corporate cannot be sued for defamation.

Jurie Hayes, counsel for Parker, argued there were earlier judgments that answered this question. Chaitram granted a postponement to late May, “as the question appeared to be a novel one that could be dispositive of the entire action”, requesting both parties to source the “suggested authorities”.

What followed was a “concerted effort” to do so, which included “extensive discussions with the various High Courts throughout the country, discussions with court registrars in multiple provinces, and conversing with law librarians at the Legal Practice Council library”.

When help from the Johannesburg Bar’s library was called in, the librarians said they could not find the judgments either. They were the first to raise the suspicion that the cases might have been generated by ChatGPT.

This was later confirmed in court, according to the judgment, when “the plaintiff’s counsel explained that his attorney had sourced the cases through the medium of ChatGPT”. The judgment said Parker’s attorneys had accepted the AI chatbot’s result “without satisfying themselves as to its accuracy”.

“The names and citations are fictitious, the facts are fictitious, and the decisions are fictitious,” said the judgment.

This could have turned out very badly for the plaintiff’s team of lawyers had they submitted these fictitious judgments to the court. But because the cases had been supplied only to the opposing side, the magistrate found the plaintiff’s attorneys had not intended to mislead the court; they were “simply overzealous and careless”.

Parker, however, was ordered by the court to pay punitive costs.

“Indeed, the court does not even consider it to be punitive. It is simply appropriate. The embarrassment associated with this incident is probably sufficient punishment for the plaintiff’s attorneys,” said Chaitram.

Does AI have a role to play in legal practice?

Commenting on the case, law firm Norton Rose Fulbright said AI tools such as ChatGPT can be useful for exploring legal concepts and possible legal arguments or replies at a high level.

“They provide a starting point for further research and analysis. However, they should not be the sole or final source of legal research. All information and sources, AI generated or not, must be cross-verified independently.”

The firm also emphasises that client confidentiality remains paramount. “AI tools should not be privy to any client confidential data, as their security may not be guaranteed.”

The firm advised against banning the use of AI in a practice outright, saying a ban would be neither practical nor beneficial.

“The impressive utility of these tools, when used correctly, can be a valuable asset for legal professionals and, even with a ban, staff are likely to still try to utilise it. Instead, we should focus on learning how to use AI correctly and safely and understanding its potential and limitations to prevent further instances like the one here.”

When ChatGPT gets it wrong, who pays?

This is not the only case in which ChatGPT getting it terribly wrong has made it into the media … or the courts.

A legal case currently under way in Georgia in the United States could help establish a standard in the field of generative AI.

As first reported by Bloomberg Law, Georgia radio host Mark Walters found that ChatGPT was spreading false information about him, accusing him of embezzling money. As a result, he sued OpenAI, in what is the company’s first defamation lawsuit.

According to the lawsuit, Walters v OpenAI LLC, the misinformation arose when Fred Riehl, the editor-in-chief of gun publication AmmoLand, asked ChatGPT for a summary of Second Amendment Foundation v Ferguson as background for a case he was reporting on.

ChatGPT provided Riehl with a summary of the case, which stated that Alan Gottlieb, the founder of the Second Amendment Foundation (SAF), accused Walters of “defrauding and embezzling funds from the SAF”.

As reported in June by ZDNET, a business technology news website, Walters claimed that every single “fact” in the summary was false. He is seeking compensation from OpenAI through general and punitive damages, as well as reimbursement for the expenses incurred during the lawsuit.

Some of the questions this lawsuit may shine a light on include who should be held liable and whether the chatbot’s disclaimers about hallucinations are sufficient to remove liability.

The what and the why of AI hallucination

Futurist Bernard Marr describes hallucination in AI as the generation of outputs that may sound plausible but are either factually incorrect or unrelated to the given context.

“These outputs often emerge from the AI model’s inherent biases, lack of real-world understanding, or training data limitations. In other words, the AI system ‘hallucinates’ information that it has not been explicitly trained on, leading to unreliable or misleading responses,” writes Marr in the article “ChatGPT: What are hallucinations and why are they a problem for AI systems” published online on 22 March.

In the article, he lists four reasons why “hallucination” is a problem: erosion of trust, ethical concerns, impact on decision-making, and legal implications.

“When AI systems produce incorrect or misleading information, users may lose trust in the technology, hampering its adoption across various sectors,” he writes.

Under ethical concerns, he adds, hallucinated outputs can potentially perpetuate harmful stereotypes or misinformation.

When it comes to decision-making, AI systems are increasingly being used to inform critical decisions in fields such as finance, health care, and law. Marr writes that hallucinations can lead to poor choices with serious consequences.

And, as both legal cases mentioned above show, inaccurate or misleading outputs may expose AI developers and users to potential legal liabilities.

Marr recommends four ways in which these models could be improved to reduce hallucinations.

The first is improved training data.

“Ensuring that AI systems are trained on diverse, accurate, and contextually relevant datasets can help minimise the occurrence of hallucinations,” he writes.

Then there is “red teaming”, in which AI developers simulate adversarial scenarios to test the AI system’s vulnerability to hallucinations and iteratively improve the model.

Transparency – providing users with information on how the AI model works and its limitations – is a third. Marr writes this can help users to understand when to trust the system and when to seek additional verification.

Last, he suggests putting “humans in the loop”: incorporating human reviewers to validate the AI system’s outputs can mitigate the impact of hallucinations and improve the overall reliability of the technology.

“By understanding the causes of hallucination and investing in research to mitigate its occurrence, AI developers and users can help ensure that these powerful tools are used responsibly and effectively,” concludes Marr.

Or users of ChatGPT and similar chatbots can follow Chaitram’s advice.

In his judgment, the magistrate said that the incident was a “timely reminder” that “when it comes to legal research, the efficiency of modern technology still needs to be infused with a dose of good old-fashioned independent reading”.