Alice in Wonderland (AIW) codebase for experiments and raw experiment data
☆130 · Feb 4, 2026 · Updated last month
Alternatives and similar repositories for AIW
Users interested in AIW are comparing it to the libraries listed below.
- Train your own SOTA deductive reasoning model · ☆108 · Mar 6, 2025 · Updated last year
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] · ☆147 · Sep 20, 2024 · Updated last year
- Code and data from the paper 'Human Feedback is not Gold Standard' · ☆20 · Mar 6, 2026 · Updated 2 weeks ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. · ☆97 · May 16, 2025 · Updated 10 months ago
- tuimorphic choose-your-own-adventure story game · ☆18 · Mar 3, 2026 · Updated 2 weeks ago
- The original Shared Recurrent Memory Transformer implementation · ☆34 · Jul 11, 2025 · Updated 8 months ago
- Language models scale reliably with over-training and on downstream tasks · ☆100 · Apr 2, 2024 · Updated last year
- A testbed for agents and environments that can automatically improve models through data generation. · ☆28 · Mar 4, 2025 · Updated last year
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs · ☆98 · Nov 17, 2024 · Updated last year
- ☆16 · Jul 23, 2024 · Updated last year
- Un-*** 50 billions multimodality dataset · ☆23 · Sep 14, 2022 · Updated 3 years ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 · ☆33 · May 1, 2025 · Updated 10 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… · ☆157 · Apr 7, 2025 · Updated 11 months ago
- Code for reproducing the experiments on large-scale pre-training and transfer learning for the paper "Effect of large-scale pre-training … · ☆19 · May 29, 2022 · Updated 3 years ago
- ☆19 · Nov 4, 2025 · Updated 4 months ago
- Repository for the ACL 2024 conference website · ☆18 · Feb 3, 2025 · Updated last year
- Clue inspired puzzles for testing LLM deduction abilities · ☆46 · Updated this week
- Code for · ☆28 · Dec 16, 2024 · Updated last year
- Docker image NVIDIA GH200 machines - optimized for vllm serving and hf trainer finetuning · ☆55 · Feb 22, 2025 · Updated last year
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] · ☆62 · Dec 10, 2024 · Updated last year
- command loom interface · ☆111 · Feb 8, 2025 · Updated last year
- The message passing GAN https://arxiv.org/abs/2106.11535 and generative adversarial particle transformer https://arxiv.org/abs/2211.10295 … · ☆13 · Jan 5, 2026 · Updated 2 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] · ☆149 · Oct 27, 2024 · Updated last year
- LLM Skirmish · ☆45 · Feb 3, 2026 · Updated last month
- ☆23 · Dec 17, 2024 · Updated last year
- ☆16 · Feb 22, 2025 · Updated last year
- ☆21 · Mar 14, 2026 · Updated last week
- ☆115 · Dec 1, 2024 · Updated last year
- Multimodal language model benchmark, featuring challenging examples · ☆185 · Dec 18, 2024 · Updated last year
- ☆40 · Jul 26, 2024 · Updated last year
- Translate Python code to Coq code for formal verification. Applied to the reference implementation of Ethereum in Python. · ☆42 · Sep 10, 2024 · Updated last year
- Demo of fine-tuning QA models for answering FAQ of cloud providers documentation · ☆11 · Mar 7, 2023 · Updated 3 years ago
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs. EMNLP 2024 · ☆27 · Nov 13, 2024 · Updated last year
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? · ☆90 · Mar 18, 2025 · Updated last year
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 · ☆89 · Sep 26, 2024 · Updated last year
- ☆124 · Feb 21, 2025 · Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 · ☆35 · Oct 3, 2024 · Updated last year
- ☆337 · Mar 5, 2026 · Updated 2 weeks ago
- ☆18 · Jul 10, 2024 · Updated last year