Bipol
We address the challenge of measuring bias in text data along some of its many axes (e.g. race and gender) and determine whether social bias exists in some important NLP datasets, using a new metric called bipol.
Recent advances in artificial intelligence (AI), large language models (LLMs), and chatbots (such as ChatGPT) have raised considerable concerns about potential risks to users and society in general. One important concern is social bias, particularly in the data these models are trained on. Bias, which can be harmful, is unfair prejudice in favor of or against an entity.
Methodology
Bipol is computed in two steps and gives a final score between 0.0 (zero or undetected bias) and 1.0 (extreme bias). The first step classifies the data points into biased and unbiased classes using a trained model. The second step evaluates the biased samples using the sensitive terms listed in multi-axes lexica. For the first step, we trained state-of-the-art classifiers (e.g. DeBERTa and mT5) on the large multi-axes bias (MAB) dataset, and we investigated benchmark datasets in different languages, including BoolQ, CB, and RTE, among others. Furthermore, we confirmed the assumption that toxic comments contain bias, based on the MAB dataset.
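To make the two steps concrete, below is a minimal Python sketch of how such a score could be computed. The toy lexica, the stub classifier, and the aggregation chosen here (the corpus-level ratio of biased samples multiplied by their mean per-axis term imbalance) are illustrative assumptions, not the exact formula or lexica from the paper.

from collections import Counter

# Toy multi-axes lexica: each axis maps "types" (e.g. male/female) to
# sensitive terms. These tiny lists are illustrative; real lexica are
# far larger.
LEXICA = {
    "gender": {
        "male": ["he", "him", "his", "man"],
        "female": ["she", "her", "hers", "woman"],
    },
    "racial": {
        "black": ["black"],
        "white": ["white"],
    },
}

def classify_biased(samples):
    # Step 1 (stub): in the real pipeline, a classifier trained on MAB
    # (e.g. DeBERTa) flags biased samples; here, purely for
    # illustration, we flag any sample containing a lexicon term.
    all_terms = {t for axis in LEXICA.values()
                 for terms in axis.values() for t in terms}
    return [s for s in samples if set(s.lower().split()) & all_terms]

def sentence_score(sample):
    # Step 2: per-axis imbalance of sensitive-term counts, averaged
    # over the axes that actually occur in the sample.
    tokens = Counter(sample.lower().split())
    axis_scores = []
    for axis in LEXICA.values():
        counts = [sum(tokens[t] for t in terms) for terms in axis.values()]
        if sum(counts):  # axis present in this sample
            axis_scores.append((max(counts) - min(counts)) / sum(counts))
    return sum(axis_scores) / len(axis_scores) if axis_scores else 0.0

def bipol(samples):
    # Final score in [0.0, 1.0]: ratio of biased samples multiplied by
    # their mean sentence-level imbalance.
    biased = classify_biased(samples)
    if not biased:
        return 0.0
    corpus_level = len(biased) / len(samples)
    sentence_level = sum(sentence_score(s) for s in biased) / len(biased)
    return corpus_level * sentence_level

print(bipol(["he is a doctor", "she is kind", "they read books"]))  # ~0.67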
Results and Future Projects
Our findings show that all the datasets contain bias, and a similar trend is observed across the models used (RoBERTa, DeBERTa, and Electra). Future work may explore ways of improving classifier performance, minimising false positives, addressing the data imbalance in the MAB training data, and scaling this work to additional languages.

Figure: Top-10 most frequent gender terms influencing bipol in the MAB test set after Electra classification. Terms like love and old are associated with the female gender according to the lexica. However, when such subjective words are removed or placed in both the male and female lexica, they cancel out and no longer influence bipol (paired terms are shown only for comparison).
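Continuing the toy sketch above (with the same caveat that the lexica and formula are illustrative assumptions), the cancellation effect can be demonstrated: a subjective term listed under only one gender drives the gender-axis imbalance, while listing it under both genders makes its counts cancel.

# Continues the sketch above; "old" is one such subjective term.
LEXICA["gender"]["female"].append("old")
print(bipol(["the old professor retired"]))  # 1.0: "old" counts as female-only

LEXICA["gender"]["male"].append("old")
print(bipol(["the old professor retired"]))  # 0.0: "old" in both lists cancels out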