gkamradt / SnakeBench
☆84Updated last week
Alternatives and similar repositories for SnakeBench:
Users who are interested in SnakeBench are comparing it to the repositories listed below
- Repository for the paper Stream of Search: Learning to Search in Language☆145Updated 2 months ago
- Train your own SOTA deductive reasoning model☆88Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆171Updated 3 months ago
- A repository for research on medium sized language models.☆76Updated 11 months ago
- ☆56Updated last week
- ☆114Updated 2 months ago
- ☆85Updated 2 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆134Updated 5 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning☆139Updated this week
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆50Updated 4 months ago
- ☆48Updated 5 months ago
- Testing paligemma2 fine-tuning on a reasoning dataset☆18Updated 3 months ago
- ☆101Updated 2 weeks ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆182Updated last week
- Accompanying material for the sleep-time compute paper☆17Updated last week
- ☆60Updated 11 months ago
- EvaByte: Efficient Byte-level Language Models at Scale☆88Updated this week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆100Updated last week
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆13Updated 2 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆84Updated last month
- ☆107Updated 3 months ago
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging☆71Updated last month
- ☆24Updated 7 months ago
- Official repo of paper LM2☆37Updated 2 months ago
- RWKV-7: Surpassing GPT☆83Updated 5 months ago
- ☆142Updated 11 months ago
- Evaluating LLMs with CommonGen-Lite☆89Updated last year
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"☆60Updated 3 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆63Updated last month
- NanoGPT (124M) quality in 2.67B tokens☆28Updated last week