SAILResearch / awesome-foundation-model-leaderboards
A curated list of awesome leaderboard-oriented resources for foundation models
☆253Updated last month
Alternatives and similar repositories for awesome-foundation-model-leaderboards:
Users who are interested in awesome-foundation-model-leaderboards are comparing it to the repositories listed below
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning (NeurIPS 2024)☆178Updated last month
- The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://a…☆287Updated 3 months ago
- The official repo for paper, LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods.☆266Updated 2 months ago
- The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"☆152Updated last month
- MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a…☆172Updated 3 months ago
- "GraphAgent: Agentic Graph Language Assistant"☆258Updated 2 weeks ago
- A Contamination-free Multi-task Language Understanding Benchmark☆113Updated last month
- Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin…☆155Updated 2 months ago
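- Train an LLM from scratch with DeepSpeed, through the pretrain and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions☆141Updated 7 months ago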
- Grimoire is All You Need for Enhancing Large Language Models☆111Updated 11 months ago
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS☆618Updated this week
- GraphRAG-survey: A curated list of resources on graph-based retrieval-augmented generation for customized large language models.☆558Updated this week
- "AnyGraph: Graph Foundation Model in the Wild"☆205Updated 5 months ago
- The official implementation of Self-Play Preference Optimization (SPPO)☆481Updated 3 weeks ago
- A Comprehensive Benchmark for Code Information Retrieval.☆60Updated this week
- TxBKG - Knowledge Graph Generation for Any PDFs☆180Updated 3 months ago
- This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use".☆201Updated this week
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models☆170Updated 3 months ago
- Pytorch Library for Relational Table Learning with LLMs.☆316Updated this week
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions☆253Updated 10 months ago
- The official code for "Olympus: A Universal Task Router for Computer Vision Tasks"☆46Updated 2 months ago
- STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (NeurIPS D&B 2024)☆298Updated last month
- [ICLR 2025] The official implementation of our ICLR2025 paper "AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak…☆213Updated 3 weeks ago
- ☆216Updated 2 months ago
- A deployment, monitoring and autoscaling service towards serverless LLM serving.☆148Updated this week
- [ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.☆158Updated 3 months ago
- Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"☆193Updated 6 months ago
- Multilingual Corpus of Web Fiction☆189Updated 7 months ago