aiwaves-cn / Dive-into-LLMs
The official GitHub repo for the open online courses: "Dive into LLMs".
☆10 Updated last year
Alternatives and similar repositories for Dive-into-LLMs:
Users who are interested in Dive-into-LLMs are comparing it to the repositories listed below.
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆23 Updated 6 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆84 Updated last month
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆24 Updated 5 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆18 Updated 3 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆30 Updated 2 weeks ago
- o1 Chain of Thought Examples ☆33 Updated 5 months ago
- ☆52 Updated 3 weeks ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆97 Updated 3 weeks ago
- ☆34 Updated 3 months ago
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆39 Updated 4 months ago
- ☆22 Updated 3 months ago
- The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St… ☆21 Updated 10 months ago
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment ☆19 Updated last year
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs ☆25 Updated 3 weeks ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- ☆24 Updated 8 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆33 Updated last year
- A framework for evolving and testing question-answering datasets with various models. ☆14 Updated last year
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆58 Updated 3 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆35 Updated last month
- ☆29 Updated 4 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning? ☆23 Updated 2 weeks ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 Updated last month
- Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner" ☆59 Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆33 Updated 6 months ago
- ☆13 Updated last month
- The official GitHub repository for paper "R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation" (EMNLP 2024 Fin… ☆30 Updated 3 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆46 Updated 3 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆50 Updated 5 months ago