open-compass / CompassJudger
☆78 · Updated 2 months ago
Alternatives and similar repositories for CompassJudger:
Users who are interested in CompassJudger are comparing it to the libraries listed below.
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆209 · Updated 3 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆52 · Updated 9 months ago
- ☆252 · Updated 6 months ago
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step ☆257 · Updated 9 months ago
- Reformatted Alignment ☆113 · Updated 4 months ago
- [ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning ☆203 · Updated 2 weeks ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆172 · Updated 10 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆110 · Updated 2 months ago
- The demo, code and data of FollowRAG ☆68 · Updated last month
- A series of technical reports on Slow Thinking with LLM ☆359 · Updated this week
- Evaluating LLMs' multi-round chatting capability by assessing conversations generated by two LLM instances ☆143 · Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆237 · Updated last year
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆64 · Updated last month
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆138 · Updated 4 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆104 · Updated last month
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆239 · Updated last month
- ☆172 · Updated 3 weeks ago
- ☆301 · Updated 4 months ago
- ☆116 · Updated 2 months ago
- awesome llm plaza: daily tracking of all sorts of awesome LLM topics, e.g. LLMs for coding, robotics, reasoning, multimodality, etc. ☆183 · Updated this week
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆36 · Updated 2 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆93 · Updated last month
- ☆88 · Updated last month
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆46 · Updated 3 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs ☆41 · Updated 7 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆40 · Updated 2 months ago
- [NAACL 2025] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents ☆195 · Updated this week
- ☆120 · Updated 7 months ago
- Code repo for the paper "Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents" ☆102 · Updated 3 months ago
- The code and data of DPA-RAG ☆55 · Updated last week