Is ChatGPT better than a chat with a GP?

In research enough to make a general practitioner cry, a study has suggested that ChatGPT may be better than doctors at giving advice for treating depression.

A team of international researchers put eight patient vignettes, which varied by gender, social class and depression severity, to the artificial intelligence language model ChatGPT as well as to more than 1,200 French doctors.

The team found that, compared with human doctors, ChatGPT was more likely to offer recommendations in line with clinical guidelines. ChatGPT also did not display the gender or social class biases sometimes seen in the primary care doctor-patient relationship.

The researchers acknowledged there were ethical and security risks that came with using artificial intelligence, and that the study did not take into account ongoing visits and care. They also cautioned there was no substitute for human clinical judgment.

However, they argued the results showed ChatGPT had the potential to enhance decision-making in healthcare.

Writing in the journal Family Medicine and Community Health (DOI: 10.1136/fmch-2023-002391), the researchers said further research was needed into how well the technology might manage severe cases, as well as into the potential risks and ethical issues arising from its use.

Depression is very common, they noted, and many of those affected turn first to their family doctors for help. The recommended course of treatment should be guided by evidence-based clinical guidelines, which usually suggest a tiered approach to care, in line with the severity of the depression.

ChatGPT had the potential to offer fast, objective, data-derived insights that could supplement traditional diagnostic methods, as well as provide confidentiality and anonymity, the researchers said.

They drew on carefully designed and previously validated vignettes centred on patients with symptoms of sadness, sleep problems and loss of appetite during the preceding three weeks, and a diagnosis of mild to moderate depression.

Eight versions of these vignettes were developed with different variations of patient characteristics, such as gender, social class, and depression severity. Each vignette was repeated 10 times for ChatGPT versions 3.5 and 4.

For each of the eight vignettes, ChatGPT was asked: ‘What do you think a primary care physician should suggest in this situation?’ The possible responses were: watchful waiting; referral for psychotherapy; prescribed drugs (for depression/anxiety/sleep problems); referral for psychotherapy plus prescribed drugs; or none of these.
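To make the protocol concrete, here is a minimal sketch, not the authors' code, of how such repeated vignette queries might be scripted against OpenAI's public API. The vignette text, model names and output file are illustrative assumptions added for the example.

```python
# Minimal sketch of the study's prompting protocol, assuming the
# openai Python client (>=1.0). Vignette text, model names and the
# output file are illustrative placeholders, not the authors' setup.
from openai import OpenAI
import csv

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "What do you think a primary care physician should suggest in this situation?"

# Hypothetical stand-ins for the eight validated vignettes.
vignettes = {
    "mild_female": "A 35-year-old woman reports three weeks of sadness, ...",
    # ... seven further variants of gender, social class and severity
}

MODELS = ["gpt-3.5-turbo", "gpt-4"]
REPEATS = 10  # each vignette was put to each model version 10 times

with open("responses.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "vignette", "run", "answer"])
    for model in MODELS:
        for name, text in vignettes.items():
            for run in range(REPEATS):
                reply = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": f"{text}\n\n{PROMPT}"}],
                )
                writer.writerow([model, name, run, reply.choices[0].message.content])
```

Each recorded answer would then be coded against the five response options before tallying the percentages reported below.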

Just over 4% of the family doctors exclusively recommended referral for psychotherapy for mild cases, in line with clinical guidance, compared with ChatGPT-3.5 and ChatGPT-4, which selected this option in 95% and 97.5% of cases, respectively.

Most of the medical practitioners proposed either drug treatment exclusively (48%) or psychotherapy plus prescribed drugs (32.5%).

In severe cases, most of the doctors recommended psychotherapy plus prescribed drugs (44.5%). ChatGPT proposed this option, in line with clinical guidelines, more frequently than the doctors did (ChatGPT-3.5, 72%; ChatGPT-4, 100%). Four in 10 of the doctors proposed prescribed drugs exclusively, which neither ChatGPT version recommended.

When medication was recommended, the AI and human participants were asked to specify which types of drugs they would prescribe.

The doctors recommended a combination of antidepressants, anti-anxiety drugs and sleeping pills in 67.5% of cases, antidepressants exclusively in 18%, and anti-anxiety drugs and sleeping pills exclusively in 14%.

ChatGPT was more likely than the doctors to recommend antidepressants exclusively: 74% for version 3.5 and 68% for version 4. ChatGPT-3.5 (26%) and ChatGPT-4 (32%) suggested the combination of antidepressants, anti-anxiety drugs and sleeping pills less frequently than the doctors did.

The researchers acknowledged that the study was limited to iterations of ChatGPT-3.5 and ChatGPT-4 at specific points in time, and that the ChatGPT data were compared with data from a representative sample of primary care doctors in France, so the findings might not be more widely applicable.

Lastly, the cases described in the vignettes covered an initial visit for a complaint of depression, so they did not represent ongoing treatment of the disease or other information the doctor would know about the patient.

There were ethical issues to consider too, particularly around ensuring data privacy and security, given the sensitive nature of mental health data.

“However, it underlines the need for ongoing research to verify the dependability of its suggestions, and implementing such AI systems could bolster the quality and impartiality of mental health services,” the study concluded.