reka-ai / reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
☆181 · Updated 10 months ago
Alternatives and similar repositories for reka-vibe-eval
Users interested in reka-vibe-eval are comparing it to the repositories listed below.
- This repository is maintained to release dataset and models for multimodal puzzle reasoning. ☆110 · Updated 8 months ago
- LL3M: Large Language and Multi-Modal Model in JAX ☆74 · Updated last year
- M4 experiment logbook ☆57 · Updated 2 years ago
- ☆149 · Updated last year
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆117 · Updated 3 months ago
- Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M d… ☆209 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆169 · Updated last month
- Benchmarking LLMs with Challenging Tasks from Real Users ☆244 · Updated last year
- ☆100 · Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆179 · Updated 4 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆53 · Updated 11 months ago
- ☆88 · Updated this week
- This is the official repository for Inheritune. ☆115 · Updated 9 months ago
- Language models scale reliably with over-training and on downstream tasks ☆100 · Updated last year
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆56 · Updated 4 months ago
- [NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆127 · Updated last week
- Python library to evaluate VLMs' robustness across diverse benchmarks ☆219 · Updated 3 weeks ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆144 · Updated last year
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆198 · Updated 2 weeks ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 10 months ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆161 · Updated last month
- Evaluating LLMs with fewer examples ☆167 · Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆147 · Updated 11 months ago
- ☆81 · Updated this week
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆108 · Updated 8 months ago
- ☆156 · Updated last year
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆128 · Updated 6 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆143 · Updated last year
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆218 · Updated 2 weeks ago
- ☆129 · Updated last year