☆150Jan 4, 2024Updated 2 years ago
Alternatives and similar repositories for gemini-benchmark
Users that are interested in gemini-benchmark are comparing it to the libraries listed below
Sorting:
- ☆14Oct 28, 2023Updated 2 years ago
- 服务器 GPU 监控程序,当 GPU 属性满足预设条件时通过微信发送提示消息☆34Aug 10, 2021Updated 4 years ago
- THOUGHTSCULPT, a general reasoning and search method for complex tasks☆13Dec 13, 2024Updated last year
- Collections of RLxLM experiments using minimal codes☆14Feb 17, 2025Updated last year
- Latent Large Language Models☆19Aug 24, 2024Updated last year
- Source code for the paper "Prefix Language Models are Unified Modal Learners"☆44Apr 30, 2023Updated 2 years ago
- ☆55Apr 1, 2024Updated last year
- ☆101Dec 22, 2023Updated 2 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆64Jul 8, 2024Updated last year
- ☆165Nov 23, 2024Updated last year
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Sep 26, 2024Updated last year
- Web-grounded natural language instructions☆18Nov 25, 2024Updated last year
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":☆44Feb 18, 2026Updated 2 weeks ago
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025)☆38Oct 8, 2025Updated 4 months ago
- [NeurlPS D&B 2024] Generative AI for Math: MathPile☆419Apr 4, 2025Updated 11 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]☆79Nov 14, 2024Updated last year
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…☆826Feb 3, 2025Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆63Mar 26, 2024Updated last year
- 800,000 step-level correctness labels on LLM solutions to MATH problems☆2,094Jun 1, 2023Updated 2 years ago
- ☆33Jun 24, 2024Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆589Dec 9, 2024Updated last year
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- [NeurIPS 2023] Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning☆16Apr 15, 2024Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆149Oct 27, 2024Updated last year
- ☆134Dec 22, 2023Updated 2 years ago
- GPT-4V(ision) as A Social Media Analysis Engine☆38Dec 20, 2024Updated last year
- The implementation of the paper "Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters".☆17May 24, 2022Updated 3 years ago
- Multimodal computer agent data collection program☆164Dec 5, 2025Updated 2 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆64Oct 19, 2024Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models☆201Dec 8, 2025Updated 2 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆147Sep 20, 2024Updated last year
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022.☆24Nov 23, 2022Updated 3 years ago
- [CVPR 2023] Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning☆22Jun 11, 2023Updated 2 years ago
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents☆555Oct 28, 2023Updated 2 years ago
- Code for NeurIPS LLM Efficiency Challenge☆60Apr 9, 2024Updated last year
- ☆19May 23, 2024Updated last year
- A clone of OpenAI's Tokenizer page for HuggingFace Models☆46Nov 13, 2023Updated 2 years ago
- Data and tools for generating and inspecting OLMo pre-training data.☆1,416Nov 5, 2025Updated 3 months ago
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…☆329Oct 14, 2025Updated 4 months ago