Northvolt-ML-Group Data project
The company's high hiring rate means that many employees have questions, and a large share of those questions are the same. A study of how a chatbot can draw on the company's internal information to answer users' requests is therefore worthwhile.
Objective
Create high-quality question-and-answer (QnA) pairs from the HowTo and other Confluence pages that can be added to the knowledge base of Northvolt's IT Service bot.
Methodology
This involved extracting QnA pairs and saving them as CSV. The final answer for each question was summarized using LLaMA-2, and Personally Identifiable Information (PII) was removed. The quality of the final answers was then evaluated, and synthetic data augmentation was used to expand the original dataset.
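The CSV-saving and PII-removal steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the regex patterns, the helper names `redact_pii` and `write_qna_csv`, and the two-column `question,answer` layout are all hypothetical. The LLaMA-2 summarization, quality evaluation, and augmentation steps are not shown.

```python
import csv
import io
import re

# Hypothetical patterns: they only catch e-mail addresses and long digit runs.
# Real PII removal would also need to handle names, employee IDs, addresses, etc.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace e-mail addresses and phone-number-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def write_qna_csv(pairs, out) -> None:
    """Write (question, answer) pairs to a text file object as CSV, redacting PII first."""
    writer = csv.writer(out)
    writer.writerow(["question", "answer"])
    for question, answer in pairs:
        writer.writerow([redact_pii(question), redact_pii(answer)])


# Usage with an in-memory buffer and an invented example pair.
pairs = [
    ("How do I reset my VPN password?",
     "Use the self-service portal or e-mail it-support@example.com."),
]
buf = io.StringIO()
write_qna_csv(pairs, buf)
print(buf.getvalue())
```

Writing to a file instead would use `open("qna.csv", "w", newline="", encoding="utf-8")` as the `out` argument; `newline=""` is what the `csv` module expects so that line endings are not translated twice.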
Results
This resulted in (1) better-quality data for training chatbot models, (2) more data points through augmentation, and (3) structured data, so that IT Support no longer has to search through more than 7,000 “how-tos”.