An LLM-based log parsing, variable annotation, and anomaly detection system
Project Description
Overall Project Goal:
This project aims to design and develop a system based on Large Language Models (LLMs) that integrates log parsing, variable annotation, and anomaly detection functionalities to optimize code maintenance work, especially in low-sample and low-resource environments. The system will enhance the understanding, processing, and maintenance of log data in vertical domains, facilitating efficient data analysis and problem diagnosis.
Specific Task Requirements:
1. Develop a log parsing algorithm that can cope with the quantity and complexity of log data and lay the foundation for subsequent log analysis. Each log line will be assigned a template that distinguishes constants from variables. A Human-in-the-Loop (HITL) enhanced LLM method will be used: in each iteration, the LLM predicts log templates by few-shot inference, and its predictions are refined with human annotations until all log data is parsed correctly.
2. Generate semantic labels for each variable in the log template to help explain the log parsing results and facilitate efficient querying and analysis. An open-set generation method will be used for log variable annotation: the LLM's few-shot inference capabilities generate initial labels, which are then fused and standardized to reduce the number of labels and improve semantic consistency.
3. Develop a method for automatically detecting anomalous logs to help quickly identify software issues and support troubleshooting. The system will combine prefix tuning with large language models to acquire domain-specific knowledge of log data, and apply domain generalization techniques so that anomalous logs are detected accurately across different production environments.
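To make the idea of a template that separates constants from variables concrete, the following is a minimal Python sketch. It is an illustration only, not the project's method: the `<*>` placeholder, the helper names `template_to_regex` and `extract_variables`, and the sample log line are all assumptions introduced here. It takes a template that is already known and recovers the variable values from a matching log line; producing the template itself is the job of the LLM-based parser described in task 1.

```python
import re

# Assumed wildcard marker for variable positions; the project text does not specify one.
PLACEHOLDER = "<*>"


def template_to_regex(template: str) -> re.Pattern:
    """Turn a template such as 'User <*> logged in' into an anchored regex.

    Constant parts are escaped literally; each placeholder becomes a
    non-greedy capture group.
    """
    constant_parts = [re.escape(part) for part in template.split(PLACEHOLDER)]
    return re.compile("^" + "(.+?)".join(constant_parts) + "$")


def extract_variables(template: str, line: str):
    """Return the variable values of `line` under `template`, or None if it does not match."""
    match = template_to_regex(template).match(line)
    return list(match.groups()) if match else None


# Hypothetical example data, not taken from any real log.
template = "Connection from <*> closed after <*> ms"
line = "Connection from 10.0.0.5 closed after 231 ms"
print(extract_variables(template, line))
```

In this sketch the two extracted values are the positions that task 2 would then annotate with semantic labels, for example an address and a duration.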
Supervisor
YI, Ke
Co-Supervisor
CHEN, Lei
Quota
2
Course type
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
The project requires 1-2 undergraduate students to assist with data processing, model testing, system development, and evaluation.
Applicant's Learning Objectives
Understand how LLMs work and become familiar with developing systems that involve LLMs.
Complexity of the project
Challenging