boson-ai / RPBench-Auto
An automated pipeline for evaluating LLMs for role-playing.
☆118 · Updated this week
Related projects:
- A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks ☆239 · Updated last month
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step ☆209 · Updated 5 months ago
- ☆148 · Updated 10 months ago
- LongAlign: A Recipe for Long Context Alignment Encompassing Data, Training, and Evaluation ☆194 · Updated 4 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆121 · Updated 3 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆196 · Updated last year
- Evaluating LLMs' multi-round chatting capability by assessing conversations generated by two LLM instances ☆131 · Updated 10 months ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆337 · Updated 2 months ago
- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024) ☆292 · Updated last month
- Code for "Scaling Laws of RoPE-based Extrapolation" ☆68 · Updated 11 months ago
- ☆196 · Updated 4 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: an open-source framework for evaluating foundation models ☆208 · Updated last month
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ☆200 · Updated last week
- ☆185 · Updated last month
- [ICML'24] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆114 · Updated 2 months ago
- A SOTA open-source math LLM ☆296 · Updated 9 months ago
- Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718 ☆244 · Updated last week
- ☆125 · Updated this week
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆190 · Updated 4 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆281 · Updated last week
- ☆158 · Updated 3 months ago
- ☆180 · Updated 4 months ago
- LongQLoRA: Extend Context Length of LLMs Efficiently ☆156 · Updated 10 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆37 · Updated 6 months ago
- Rectified Rotary Position Embeddings ☆329 · Updated 3 months ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆224 · Updated 2 months ago
- Train a Chinese LLM from scratch as a personal project ☆145 · Updated last week
- ☆310 · Updated 2 months ago
- InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencie… ☆276 · Updated this week
- GPT-Fathom is an open-source and reproducible LLM evaluation suite, benchmarking 10+ leading open-source and closed-source LLMs as well a… ☆350 · Updated 5 months ago