MLGroup-JLU / LLM-data-aug-survey
The official GitHub page for the survey paper "A Survey on Data Augmentation in Large Model Era"
Related projects:
- A reading list on LLM-based Synthetic Data Generation 🔥
- A curated reading list for large language model (LLM) alignment. Take a look at our new survey "Large Language Model Alignment: A Survey"…
- Awesome papers for role-playing with language models
- Official GitHub repo for AutoDetect, an automated weakness-detection framework for LLMs.
- A Toolkit for Table-based Question Answering
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
- [SIGIR'24] The official implementation of MOELoRA.
- LLaMA Factory documentation
- Official repository for the SIGIR 2024 demo paper "An Integrated Data Processing Framework for Pretraining Foundation Models"
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
- Project for the paper entitled `Instruction Tuning for Large Language Models: A Survey`
- The complete training code for the open-source high-performance Llama model, covering the full process from pre-training to RLHF.
- As the name suggests: a hand-rolled RAG
- This is a personal reimplementation of Google's Infini-transformer, using a small 2B model. The project includes both model and train…
- LongQLoRA: Extend Context Length of LLMs Efficiently
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings
- The Hugging Face implementation of Fine-grained Late-interaction Multi-modal Retriever.
- This project covers dataset synthesis, model training, and evaluation for the mathematical problem-solving abilities of large models, along with notes on related articles.
- Pretrain, decay, and SFT a CodeLLM from scratch 🧙‍♂️
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
- Fantastic Data Engineering for Large Language Models
- Token-level visualization tools for large language models
- Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning