Limitations of Large Language Models
False attribution is the incorrect representation that someone is the author of a piece of work when they are not. The problem is not only an ethical or moral one but also a legal one: false attribution is considered illegal in some countries.
Objective
To demonstrate the strengths and limitations of large language models (LLMs) in zero- and few-shot settings with regard to the task of author attribution for chunks of text, and to introduce a simple hallucination metric for their evaluation, called the simple hallucination index (SHI).
Methodology
Unlike the typical binary (correct/incorrect) classes in author attribution tasks or existing hallucination metrics, SHI distinguishes three outcomes for an LLM's attribution: correct, incorrect, and unknown. We evaluated three state-of-the-art (SotA) LLMs: LLaMA-2-13B, Mixtral 8x7B, and Gemma-7B-In.
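As a rough illustration of this three-way scoring, the sketch below assumes that SHI is the share of chunks for which the model confidently names a wrong author, while "unknown" responses are counted separately rather than as hallucinations. The labelling scheme, function names, and example values are assumptions for illustration, not the paper's exact formulation.

```python
from collections import Counter

def score_attribution(predictions, gold_authors):
    """Label each per-chunk attribution as 'correct', 'incorrect',
    or 'unknown' (model declines to name an author).
    Assumed labelling scheme for illustration only."""
    labels = []
    for pred, gold in zip(predictions, gold_authors):
        if pred is None or pred.strip().lower() in {"unknown", "i don't know"}:
            labels.append("unknown")
        elif pred.strip().lower() == gold.strip().lower():
            labels.append("correct")
        else:
            labels.append("incorrect")
    return labels

def accuracy_and_shi(labels):
    """Accuracy = correct / total. SHI is assumed here to be the
    fraction of confidently wrong (hallucinated) attributions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return counts["correct"] / total, counts["incorrect"] / total

# Hypothetical example (placeholder values, not results from the paper):
labels = score_attribution(
    predictions=["Tobias Smollett", "unknown", "Jane Austen"],
    gold_authors=["Tobias Smollett", "Tobias Smollett", "Charles Dickens"],
)
print(accuracy_and_shi(labels))  # -> (0.333..., 0.333...)
```

Under this assumed scheme, a model that answers "unknown" lowers its accuracy without raising its SHI, which is what separates honest uncertainty from hallucination.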
Results and Future Projects
Mixtral 8x7B achieves the best average performance, i.e. the best average accuracy and the lowest average SHI, while Gemma-7B-In has the lowest average accuracy and the highest average SHI. Despite its overall lead, Mixtral 8x7B hallucinates strongly on all three books by the author Smollett, an issue observed for all the LLMs. As future work, it would be interesting to evaluate closed LLMs, such as ChatGPT.

Figure: Correlation of accuracy and SHI for Mixtral 8x7B.
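A correlation of this kind can be computed per book with a standard Pearson test, as in the minimal sketch below; the per-book scores are placeholders for illustration, not the values reported in the study.

```python
from scipy.stats import pearsonr

# Placeholder per-book accuracy and SHI scores for one model
# (illustrative only, not the paper's data).
accuracy = [0.90, 0.75, 0.40, 0.85]
shi = [0.05, 0.20, 0.55, 0.10]

r, p_value = pearsonr(accuracy, shi)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
```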