ablghtianyi / ICL_Modular_Arithmetic
☆18 Updated 4 months ago
Alternatives and similar repositories for ICL_Modular_Arithmetic:
Users who are interested in ICL_Modular_Arithmetic are comparing it to the repositories listed below.
- ☆18 Updated 8 months ago
- ☆30 Updated 2 months ago
- ☆16 Updated last month
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆35 Updated 4 months ago
- Official Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach" ☆11 Updated 7 months ago
- ☆22 Updated last month
- ☆12 Updated last year
- The repository contains code for Adaptive Data Optimization ☆20 Updated 3 months ago
- ☆30 Updated last year
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆44 Updated last month
- Lightweight Adapting for Black-Box Large Language Models ☆21 Updated last year
- Stick-breaking attention ☆48 Updated last week
- ☆18 Updated last month
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆26 Updated 11 months ago
- source code for paper "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models" ☆23 Updated 9 months ago
- ☆30 Updated 5 months ago
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆25 Updated last month
- Universal Neurons in GPT2 Language Models ☆27 Updated 9 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆21 Updated last month
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆26 Updated last month
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆23 Updated this week
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆16 Updated 9 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆74 Updated 5 months ago
- ☆47 Updated 7 months ago
- ☆25 Updated last month
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆23 Updated 8 months ago
- Test-time-training on nearest neighbors for large language models ☆39 Updated 11 months ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆19 Updated 3 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆71 Updated 4 months ago
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… ☆15 Updated 4 months ago