Qualitative and quantitative analysis of automatically generated "hallucinations" in a corpus of medical reformulations

Open Access

Issue		SHS Web Conf. Volume 191, 2024, 9e Congrès Mondial de Linguistique Française


Article Number		11001
Number of page(s)		20
Section		Ressources et outils pour l’analyse linguistique
DOI		https://doi.org/10.1051/shsconf/202419111001
Published online		28 June 2024

Alkaissi, H. and McFarlane, SI. (2023). Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus 15(2):e35179. DOI: 10.7759/cureus.35179
Buhnila, I. (2023). Une méthode automatique de construction de corpus de reformulation. Doctoral thesis, Université de Strasbourg, June 2023.
Athaluri, SA., Manthena, SV., Kesapragada VSR, KM., Yarlagadda, V., Tirth, D. and Rama, TSD. (2023). Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References. Cureus 15(4):e37432. DOI: 10.7759/cureus.37432
Bender, EM., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21). Association for Computing Machinery, New York, NY, USA, p. 610–623. https://doi.org/10.1145/3442188.3445922
Bender, E., Koller, A. (2020). Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, p. 5185–5198.
Bidu-Vrânceanu, A. (2007). Lexicul specializat în mișcare. De la dicționare la texte. București. Editura Universității din București. 266 pages.
Bruno, A., Mazzeo, PL., Chetouani, A., Tliba, M., Kerkouri, MA. (2023). Insights into Classifying and Mitigating LLMs’ Hallucinations. arXiv:2311.08117v1 [cs.CL]
Buhnila, I. (2022). Le rôle des marqueurs et indicateurs dans l’analyse lexicale et sémantico-pragmatique de reformulations médicales. 8e Congrès Mondial de Linguistique Française (CMLF), 4–8 July 2022, Orléans, France, SHS Web of Conferences 138: 10005. https://doi.org/10.1051/shsconf/202213810005.
Bybee, J. (2006). From Usage to Grammar: The Mind’s Response to Repetition. Language, 82(4), p. 711–733.
Chomsky, N. (1957). Syntactic Structures. Mouton.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educ. Psychol. Meas., 20, p. 27–46.
Copara, J., Knafou, J., Naderi, N., Moro, C., Ruch, P. and Teodoro, D. (2020). Contextualized French language models for biomedical named entity recognition. Actes de la 6e conférence conjointe Journées d’Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes, p. 36–48.
Culbertson, J., Schouwstra, M. and Kirby, S. (2020). From the world to word order: Deriving biases in noun phrase order from statistical properties of the world. Language 96(3), p. 1–22.
De Castro, M., Zona, U. (2022). A vigotskijan perspective on machine learning. How cultural stereotypes are involved in education of algorithms. Academia Letters, Article 4638. https://doi.org/10.20935/AL4638.
Dechêne, A., Stahl, C., Hansen, J. and Wänke, M. (2010). The truth about the truth: A meta-analytic review of the truth effect. Personality and Social Psychology Review 14(2), p. 238–257. doi:10.1177/1088868309352251.
Devlin, J., Chang, M-W., Lee, K. and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Durt, C., Froese, T., Fuchs, T. (2023). Against AI Understanding and Sentience: Large Language Models, Meaning, and the Patterns of Human Language Use. [Preprint] PhilSci Archive.
Eddine, M. K., Tixier, A., Vazirgiannis, M. (2021). BARThez: a Skilled Pretrained French Sequence-to-Sequence Model. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, p. 9369–9390.
Emsley, R. (2023). ChatGPT: these are not hallucinations – they’re fabrications and falsifications. Schizophrenia 9:52. https://doi.org/10.1038/s41537-023-00379-4.
Eshkol-Taravella, I., Grabar, N. (2017). Taxinomie dans les reformulations du point de vue de la linguistique de corpus. Syntaxe et Sémantique, vol. 18, no. 1, p. 149–184.
Fuchs, C. (1982). La paraphrase entre la langue et le discours. Langue française, La vulgarisation (53), p. 22–33.
Goldberg, A. (2019). Explain Me This: Creativity, competition, and the partial productivity of constructions. Princeton University Press.
Grabar, N., Cardon, R. (2018). CLEAR – Simple Corpus for Medical French. Proceedings of the 1st Workshop on Automatic Text Adaptation (ATA), Tilburg, the Netherlands. Association for Computational Linguistics, p. 3–9.
Gülich, E., Kotschi, T. (1983). Les marqueurs de la reformulation paraphrastique. Cahiers de linguistique française 5, p. 305–351.
Hasher, L., Goldstein, D., Toppino, T. (1977). Frequency and the conference of referential validity. Journal of Verbal Learning and Verbal Behavior 16(1), p. 107–112. doi:10.1016/S0022-5371(77)80012-1
Hatem, R., Simmons, B., Thornton, JE. (2023). Chatbot Confabulations Are Not Hallucinations. JAMA Intern Med. 2023, 183(10):1177. doi:10.1001/jamainternmed.2023.4231
Heidegger, M. (2010). Being and Time. Translated by Joan Stambaugh and Dennis J. Schmidt. SUNY Series in Contemporary Continental Philosophy. Albany: State University of New York Press.
Hoey, M. (2005). Lexical Priming: A new theory of words and language. Abingdon, England: Routledge.
Hopper, P., Bybee, J. (2001). Frequency and the Emergence of Linguistic Structure. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P. and Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography 1, p. 7–36.
Labrak, Y., Bazoge, A., Dufour, R., Rouvier, M., Morin, E., Daille, B. and Gourraud, P. A. (2023). DrBERT: Un modèle robuste pré-entraîné en français pour les domaines biomédical et clinique. 18e Conférence en Recherche d’Information et Applications, 16e Rencontres Jeunes Chercheurs en RI, 30e Conférence sur le Traitement Automatique des Langues Naturelles, 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, p. 109–120.
Lin, C. Y. (2004). Rouge: A package for automatic evaluation of summaries. Text summarization branches out, p. 74–81.
Martin, L., Muller, B., Ortiz Suárez, P.J., Dupont, Y., Romary, L., de la Clergerie, E., Seddah, D., Sagot, B. (2020). CamemBERT: a Tasty French Language Model. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online. Association for Computational Linguistics, p. 7203–7219.
Nie, Y., Williams, A., Dinan, E., Bansal, M., Weston, J. and Kiela, D. (2020). Adversarial nli: A new benchmark for natural language understanding (https://arxiv.org/abs/1910.14599).
Nighojkar, A., Licato, J. (2021). Improving paraphrase detection with the adversarial paraphrasing task. arXiv preprint. arXiv:2106.07691.
Østergaard, SD., Nielbo, KL. (2023). False Responses From Artificial Intelligence Models Are Not Hallucinations. Schizophrenia Bulletin, Volume 49, Issue 5, p. 1105–1107, https://doi.org/10.1093/schbul/sbad068
Palivela, H. (2021). Optimization of paraphrase generation and identification using language models in natural language processing. International Journal of Information Management Data Insights, 1(2), 100025.
Piantadosi, ST. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review 21, p. 1112–1130. https://doi.org/10.3758/s13423-014-0585-6
Post, M. (2018). A Call for Clarity in Reporting BLEU Scores. Proceedings of the Third Conference on Machine Translation: Research Papers, p. 186–191.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W. and Liu, PJ. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1), p. 5485–5551.
Rastier, F. (1985). L’isotopie sémantique, du mot au texte. Paris.
Săpoiu, C. (2013). Hiponimia în terminologia medicală. Modalităţi de abordare în semantică şi lexicografie. Piteşti, Editura Trend, 199 pages.
Sellam, T., Das, D. and Parikh, AP. (2020). Bleurt: Learning robust metrics for text generation. arXiv preprint arXiv:2004.04696.
Sinclair, J. (1996). The search for units of meaning. Textus 9, p. 75–106.
Tchechmedjiev, A., Abdaoui, A., Emonet, V., Zevio, S. and Jonquet, C. (2018). SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes. BMC bioinformatics, 19(1), 405.
Todirascu, A., Padó, S., Krisch, J., Kisselew, M. and Heid, U. (2012). French and german corpora for audience-based text type classification. LREC, volume 2012, p. 1591–1597.
Touchent, R., Romary, L. and De La Clergerie, E. (2023). CamemBERT-bio: Un modèle de langue français savoureux et meilleur pour la santé. Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1: travaux de recherche originaux – articles longs, p. 323–334.
Vassiliadou, H. (2020). Peut-on aborder la notion de “reformulation” autrement que par la typologie des marqueurs? pour une analyse sémasiologique et onomasiologique. In Olga Inkova (Ed.), Autour de la Reformulation, Droz, p. 77–94.
Vernikos, G., Popescu-Belis, A. (2024). Don’t Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation. arXiv preprint arXiv:2401.06688.
Witteveen, S., Andrews, M. (2019). Paraphrasing with Large Language Models. In Proceedings of the 3rd Workshop on Neural Generation and Translation, EMNLP-IJCNLP 2019, p. 215–220.
Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., Zhong, S., Yin, B. and Hu, X. (2024). Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. ACM Trans. Knowl. Discov. Data Just Accepted (February 2024). https://doi.org/10.1145/3649506
Ye, H., Liu, T., Zhang, A., Hua, W. and Jia, W. (2023). Cognitive Mirage: A Review of Hallucinations in Large Language Models. arXiv:2309.06794v1 [cs.CL]
Zhang, Y., Li, Y., Cui, L., Cai, D., Liu, L., Fu, T., Huang, X., Zhao, E., Zhang, Y., Chen, Y., Wang, L., Luu, AT, Bi, W., Shi, F. and Shi, S. (2023). Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. arXiv cs.CL eprint 2309.01219, https://doi.org/10.48550/arXiv.2309.01219
