onesuper / HuggingFace-Datasets-Text-Quality-Analysis
Retrieves parquet files from Hugging Face, then identifies and quantifies junk data, duplication, contamination, and biased content in the dataset using pandas.
☆53 Updated 2 years ago
Alternatives and similar repositories for HuggingFace-Datasets-Text-Quality-Analysis
Users who are interested in HuggingFace-Datasets-Text-Quality-Analysis are comparing it to the libraries listed below.
- Lightweight local website for displaying the performance of different chat models. ☆87 Updated last year
- MultilingualShareGPT, the free multi-language corpus for LLM training ☆72 Updated 2 years ago
- Evaluating LLMs' multi-round chatting capability by assessing conversations generated by two LLM instances. ☆154 Updated last month
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark ☆148 Updated 6 months ago
- A Multi-Turn Dialogue Corpus based on Alpaca Instructions ☆172 Updated 2 years ago
- Summary of open-source large language models and low-cost replication methods for ChatGPT. ☆137 Updated 2 years ago
- Leveraging passage embeddings for efficient listwise reranking with large language models. ☆45 Updated 7 months ago
- The code and data for "StructGPT: A general framework for Large Language Model to Reason on Structured Data" ☆103 Updated last year
- Imitate OpenAI with Local Models ☆87 Updated 10 months ago
- Chinese Large Language Model Evaluation, Phase 2 ☆70 Updated last year
- Unofficial implementation of AlpaGasus ☆92 Updated last year
- ☆30 Updated 11 months ago
- ⏳ ChatLog: Recording and Analysing ChatGPT Across Time ☆100 Updated last year
- Large language model fine-tuning for bloom, opt, gpt, gpt2, llama, llama-2, cpmant, and so on ☆97 Updated last year
- Chinese Large Language Model Evaluation, Phase 1 ☆109 Updated last year
- Counting-Stars (★) ☆83 Updated last month
- [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform" [ACL 2025 Findings] "C2LEVA: Toward Comprehensive and Contaminatio… ☆63 Updated 2 months ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆66 Updated 2 years ago
- Open efforts to implement ChatGPT-like models and beyond. ☆108 Updated 11 months ago
- A fine-tuned LLaMA that is good at arithmetic tasks ☆178 Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 Updated last year
- ☆128 Updated 2 years ago
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] ☆85 Updated 5 months ago
- Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models" ☆99 Updated last year
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆85 Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆185 Updated last year
- ☆124 Updated last year
- Official code for the publication "Large Language Models as Zero-shot Dialogue State Tracker through Function Calling" https//arxiv.org/a… ☆62 Updated 11 months ago
- 1.4B sLLM for Chinese and English - HammerLLM🔨 ☆44 Updated last year
- Silk Road will be the dataset zoo for Luotuo (骆驼). Luotuo is an open-source Chinese-LLM project founded by 陈启源 @ 华中师范大学 & 李鲁鲁 @ 商汤科技 & 冷子… ☆39 Updated last year