AI builds its bedside manner

US scientists say ChatGPT-4 has aced their Turing test, proving itself indistinguishable from a real human, even when statistical methods were used to try to detect it.  

In fact, ChatGPT-4 displayed more humanity than some of the humans it was tested against, as it was more cooperative, altruistic, trusting, generous, and likely to return a favour than the average human included in the trial.  

The team asked ChatGPT to answer psychological survey questions and play interactive games that assess trust, fairness, risk aversion, altruism, and cooperation. Next, they compared ChatGPT’s choices to the choices of 108,314 humans from more than 50 countries.  

Statistically, ChatGPT was indistinguishable from randomly selected humans, and its behaviour shifted with context in characteristically human ways.  

For example, both humans and chatbots became more generous when told that their choices would be observed by a third party and modified their behaviours after experiencing different roles in a game or in response to different framings of the same strategic situation. 

Stanford University’s Dr Matthew Jackson, the lead author, explained that because some roles for AI involve decision-making and strategic interactions with humans, it is imperative to understand AI behavioural tendencies.  

“Surprisingly, the chatbots’ behaviours tended to be more cooperative and altruistic than the median human behaviour, exhibiting increased trust, generosity, and reciprocity. Our findings suggest that such tendencies may make AI well-suited for roles requiring negotiation, dispute resolution, customer service, and caregiving,” he said. 

While this could be great news for patients, given the potential of AI to integrate seamlessly across multiple healthcare sites, it raises questions about the trajectory of medical AI development: the existing literature stresses the diagnostic role of AI, alongside the need for more empathetic doctors to provide better personalised care.  

“As Alan Turing foresaw to be inevitable, modern AI has reached the point of emulating humans: holding conversations, providing advice, drafting poems, and proving theorems. Turing proposed an intriguing test, ‘the imitation game’: whether an interrogator who interacts with an AI and a human can distinguish which one is artificial,” Dr Jackson said. 

“This goes beyond simply asking whether AI can produce an essay that looks like it was written by a human or can answer a set of factual questions, and instead involves assessing its behavioural tendencies and ‘personality.’” 

The team asked variations of ChatGPT to answer psychological survey questions and play a suite of interactive games that have become standard tools for assessing behavioural tendencies, and for which there is extensive human subject data.   

“Beyond eliciting a ‘Big Five’ personality profile, we had the chatbots play a variety of games that elicited different traits: a dictator game, an ultimatum bargaining game, a trust game, a bomb risk game, a public goods game, and a finitely repeated Prisoner’s Dilemma game,” Dr Jackson explained. 

“Each game was designed to reveal different behavioural tendencies and traits, such as cooperation, trust, reciprocity, altruism, spite, fairness, strategic thinking, and risk aversion. 

“We also investigated the extent to which the chatbots’ behaviours change as they gain experience in different roles in a game, as if they were learning from such experience, as this is something that is true of humans.” 

In games with multiple roles, the AIs’ decisions were influenced by previous exposure to another role: if ChatGPT-3 had previously acted as the responder in the Ultimatum Game, it tended to propose a higher offer when it later played the proposer, while ChatGPT-4’s proposals remained unchanged.  

“Conversely, when ChatGPT-4 had previously been the proposer, it tended to request a smaller split as the responder,” Dr Jackson said.