reka-ai / reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
☆168 Updated 5 months ago
Alternatives and similar repositories for reka-vibe-eval
Users who are interested in reka-vibe-eval compare it to the libraries listed below.
- LL3M: Large Language and Multi-Modal Model in JAX ☆72 Updated last year
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆110 Updated 2 weeks ago
- Self-Alignment with Principle-Following Reward Models ☆161 Updated 3 weeks ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆223 Updated 7 months ago
- M4 experiment logbook ☆57 Updated last year
- ☆174 Updated last month
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆84 Updated last year
- ☆97 Updated 11 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆101 Updated 2 months ago
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆49 Updated this week
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆136 Updated 8 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆126 Updated this week
- Replicating O1 inference-time scaling laws ☆87 Updated 6 months ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning. ☆90 Updated 3 months ago
- The HELMET Benchmark ☆149 Updated last month
- EvaByte: Efficient Byte-level Language Models at Scale ☆98 Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆171 Updated 4 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆207 Updated this week
- ☆63 Updated 8 months ago
- ☆92 Updated 8 months ago
- ☆67 Updated 2 months ago
- This is the official repository for Inheritune. ☆111 Updated 3 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆55 Updated 3 months ago
- ☆46 Updated 3 months ago
- Language models scale reliably with over-training and on downstream tasks ☆97 Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆170 Updated 5 months ago
- Code and example data for the paper: Rule Based Rewards for Language Model Safety ☆187 Updated 10 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆53 Updated 7 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆51 Updated 5 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆146 Updated 4 months ago