onesuper / HuggingFace-Datasets-Text-Quality-Analysis
Retrieves parquet files from Hugging Face and uses pandas to identify and quantify junk data, duplication, contamination, and biased content in a dataset.
☆49 · Updated last year
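As a rough illustration of the kind of check the description refers to, the sketch below counts empty, very short, and exactly duplicated rows in a text column with pandas. It is not the repository's actual code: the function name `text_quality_report`, the `text` column name, and the `min_chars` threshold are all hypothetical choices made for this example, and contamination and bias detection are not covered.

```python
import pandas as pd


def text_quality_report(df: pd.DataFrame, column: str = "text", min_chars: int = 10) -> dict:
    """Count empty, very short, and exactly duplicated rows in a text column.

    Hypothetical helper for illustration only; the column name and the
    length threshold are assumptions, not values taken from the repository.
    """
    # Normalise missing values to empty strings and trim whitespace.
    text = df[column].fillna("").astype(str).str.strip()

    empty = text.eq("")
    # Short but non-empty rows are counted separately from empty ones.
    too_short = text.str.len().lt(min_chars) & ~empty
    # Flag every repeat after the first occurrence of a non-empty string.
    duplicate = text.duplicated(keep="first") & ~empty

    return {
        "rows": len(df),
        "empty": int(empty.sum()),
        "too_short": int(too_short.sum()),
        "duplicate": int(duplicate.sum()),
    }


# In practice the frame would come from a downloaded parquet file,
# for example via pd.read_parquet; a small in-memory sample is used here.
sample = pd.DataFrame(
    {
        "text": [
            "A full sentence of text.",
            "A full sentence of text.",
            "ok",
            "",
            None,
            "Another complete example.",
        ]
    }
)
report = text_quality_report(sample)
print(report)
```

Exact-match deduplication is the simplest possible choice here; near-duplicate detection would need a different technique.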
Related projects:
- MultilingualShareGPT, the free multi-language corpus for LLM training ☆72 · Updated last year
- Light local website for displaying performances from different chat models. ☆85 · Updated 10 months ago
- ⏳ ChatLog: Recording and Analysing ChatGPT Across Time ☆94 · Updated 3 months ago
- Leveraging passage embeddings for efficient listwise reranking with large language models. ☆27 · Updated 2 months ago
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark ☆98 · Updated 2 weeks ago
- [EMNLP 2023 Demo] CLEVA: Chinese Language Models EVAluation Platform ☆55 · Updated 9 months ago
- Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances. ☆132 · Updated 10 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆37 · Updated 6 months ago
- Unofficial implementation of AlpaGasus ☆83 · Updated 11 months ago
- Chinese large language model evaluation, phase 2 ☆68 · Updated 10 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆101 · Updated last week
- [ACL24] Official repo for "Synthesizing Text-to-SQL Data from Weak and Strong LLMs" ☆58 · Updated last month
- Reformatted Alignment ☆111 · Updated 4 months ago
- YuLan-IR: Information Retrieval Boosted LMs ☆211 · Updated 6 months ago
- Leveraging large language models for text-to-SQL synthesis, this project fine-tunes WizardLM/WizardCoder-15B-V1.0 with QLoRA on a custom … ☆43 · Updated 9 months ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆58 · Updated last year
- LongAlign: A Recipe for Long Context Alignment Encompassing Data, Training, and Evaluation ☆194 · Updated 4 months ago
- code for Scaling Laws of RoPE-based Extrapolation ☆68 · Updated 11 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models. ☆209 · Updated this week
- Counting-Stars (★) ☆70 · Updated 3 weeks ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆196 · Updated last year
- Imitate OpenAI with Local Models ☆83 · Updated 3 weeks ago
- Evaluation tools for Retrieval-augmented Generation (RAG) methods. ☆112 · Updated 2 months ago