INTRODUCTION: Radiology has a significant environmental impact, but guidance on how to implement sustainable practices effectively in this field is limited. This study investigated the performance of large language models (LLMs) in providing sustainability advice for radiology.

METHODS: Four state-of-the-art LLMs, namely ChatGPT-4.0 (CGT), Claude 3.5 Sonnet (CS), Gemini Advanced (GA), and Meta Llama 3.1 405b (ML), were evaluated on their answers to 30 standardized questions covering sustainability topics such as energy consumption, waste management, digitalization, best practices, and carbon footprint. Three experienced readers rated each response for overall quality (OQS), understandability (US), and implementability (IS) on a 4-point scale. A mean quality score (MQS) was derived from these three ratings.

RESULTS: Overall inter-reader agreement was good (intraclass correlation coefficient, ICC = 0.702). Across the 30 questions on sustainability in radiology, all four LLMs showed good to very good performance, with the highest ratings achieved for understandability (CGT/GA/ML 3.91 ± 0.29; CS 3.99 ± 0.11), underlining the excellent language skills of these models. CS emerged as the top performer across most topics, with an MQS of 3.95 ± 0.22, frequently achieving the highest scores. ML ranked second with an MQS of 3.84 ± 0.37, followed by CGT (MQS 3.78 ± 0.42) and GA (MQS 3.73 ± 0.44). CGT and GA thus performed comparably, although GA consistently received the lowest mean scores among the four models. None of the LLMs provided answers that were rated as insufficient.

CONCLUSION: Our findings highlight the potential of LLMs such as Claude 3.5 Sonnet, ChatGPT-4.0, Meta Llama 3.1, and Gemini Advanced to advance sustainable practices in radiology; given the variation between models, thoughtful model selection can further enhance their positive impact.