SAILResearch / awesome-foundation-model-leaderboards
A curated list of awesome leaderboard-oriented resources for foundation models
☆241Updated last week
Alternatives and similar repositories for awesome-foundation-model-leaderboards:
Users who are interested in awesome-foundation-model-leaderboards are comparing it to the repositories listed below
- The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://a…☆286Updated 2 months ago
- "GraphAgent: Agentic Graph Language Assistant"☆235Updated 2 weeks ago
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning (NeurIPS 2024)☆171Updated this week
- The official repo for the paper "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods".☆231Updated 3 weeks ago
- MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a…☆166Updated 2 months ago
- "AnyGraph: Graph Foundation Model in the Wild"☆186Updated 3 months ago
- PyTorch library for relational table learning with LLMs.☆256Updated this week
- Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin…☆149Updated last month
- The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"☆151Updated 3 weeks ago
- ☆192Updated this week
- Grimoire is All You Need for Enhancing Large Language Models☆109Updated 10 months ago
- TxBKG - Knowledge Graph Generation for Any PDFs☆181Updated last month
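- Train an LLM from scratch with DeepSpeed, through the pretraining and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions☆135Updated 6 months ago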
- The official implementation of Self-Play Preference Optimization (SPPO)☆452Updated last month
- [KDD 2024] A project for training explicit graph reasoning large language models.☆83Updated 3 weeks ago
- Benchmarking LLMs via Uncertainty Quantification☆201Updated 11 months ago
- "MiniRAG: Making RAG Simpler with Small and Free Language Models"☆215Updated this week
- A Contamination-free Multi-task Language Understanding Benchmark☆110Updated last week
- Multilingual Corpus of Web Fiction☆187Updated 6 months ago
- LLM Benchmark for Code☆31Updated 5 months ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models☆157Updated 2 months ago
- A deployment, monitoring and autoscaling service towards serverless LLM serving.☆144Updated 3 weeks ago
- This includes the original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control.☆56Updated 3 months ago
- We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that …☆93Updated last year
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.☆71Updated 6 months ago
- An interpretable large language model (LLM) for medical diagnosis.☆106Updated 4 months ago
- [NeurIPS 2024] BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models☆228Updated last month
- This tool (enhance_long) aims to enhance the Llama 2 long-context extrapolation capability with the lowest-cost approach, preferably without …☆44Updated last year
- This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use".☆179Updated this week
- [ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.☆159Updated 2 months ago