facebookresearch / PhysicsLM4
Physics of Language Models, Part 4
☆67 · Updated this week
Alternatives and similar repositories for PhysicsLM4
Users interested in PhysicsLM4 are comparing it to the libraries listed below.
- ☆101 · Updated 10 months ago
- [COLM 2025] Code for the paper "Learning Adaptive Parallel Reasoning with Language Models" · ☆116 · Updated 3 months ago
- ☆78 · Updated 5 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch · ☆178 · Updated last month
- A brief and partial summary of RLHF algorithms. · ☆131 · Updated 4 months ago
- The evaluation framework for training-free sparse attention in LLMs · ☆86 · Updated last month
- 🔥 A minimal training framework for scaling FLA models · ☆209 · Updated last month
- Code for "Reasoning to Learn from Latent Thoughts" · ☆114 · Updated 4 months ago
- A large-scale, high-quality math dataset for reinforcement learning in language models · ☆59 · Updated 5 months ago
- The code for creating the iGSM datasets in the papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…" · ☆60 · Updated 6 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" · ☆218 · Updated 4 months ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws · ☆56 · Updated 10 months ago
- ☆82 · Updated 11 months ago
- Code for the NeurIPS 2024 Spotlight "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆77 · Updated 9 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] · ☆139 · Updated 10 months ago
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" · ☆92 · Updated last week
- Replicating o1 inference-time scaling laws · ☆89 · Updated 8 months ago
- The HELMET Benchmark · ☆161 · Updated 3 months ago
- Normalized Transformer (nGPT) · ☆185 · Updated 8 months ago
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM · ☆85 · Updated 7 months ago
- Understand and test language model architectures on synthetic tasks. · ☆221 · Updated 2 weeks ago
- Long Context Extension and Generalization in LLMs · ☆58 · Updated 10 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) · ☆78 · Updated last year
- The simplest implementation of recent sparse attention patterns for efficient LLM inference. · ☆82 · Updated 2 weeks ago
- Some preliminary explorations of Mamba's context scaling. · ☆216 · Updated last year
- Language models scale reliably with over-training and on downstream tasks · ☆97 · Updated last year
- [NeurIPS 2024] Low-rank, memory-efficient optimizer without SVD · ☆30 · Updated last month
- AnchorAttention: improved attention for LLM long-context training · ☆212 · Updated 6 months ago
- ☆95 · Updated 3 months ago
- ☆187 · Updated 3 months ago