ablghtianyi / ICL_Modular_Arithmetic
☆18Updated 4 months ago
Alternatives and similar repositories for ICL_Modular_Arithmetic:
Users who are interested in ICL_Modular_Arithmetic are comparing it to the libraries listed below:
- ☆12Updated last year
- ☆29Updated 2 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)☆33Updated 4 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)☆72Updated 4 months ago
- The repository contains code for Adaptive Data Optimization☆20Updated 3 months ago
- ☆17Updated 8 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"☆43Updated 3 weeks ago
- Lightweight Adapting for Black-Box Large Language Models☆20Updated last year
- ☆28Updated 4 months ago
- Unofficial Implementation of Selective Attention Transformer☆16Updated 4 months ago
- ☆16Updated 2 months ago
- Official Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"☆11Updated 6 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆25Updated 2 weeks ago
- ☆24Updated last year
- ☆30Updated last year
- Official code for the paper "Attention as a Hypernetwork"☆25Updated 8 months ago
- Universal Neurons in GPT2 Language Models☆27Updated 9 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear"☆70Updated 3 months ago
- Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yi…☆18Updated 2 months ago
- [NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training☆32Updated last year
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…☆16Updated 9 months ago
- [NeurIPS2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging☆51Updated 3 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning☆22Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"☆71Updated 4 months ago
- Stick-breaking attention☆48Updated this week