Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
☆58Nov 11, 2025Updated 3 months ago
Alternatives and similar repositories for retrofitting-recurrence
Users that are interested in retrofitting-recurrence are comparing it to the libraries listed below
- LCA-on-the-line (ICML 2024 Oral)☆13Feb 13, 2025Updated last year
- Measuring the Signal to Noise Ratio in Language Model Evaluation☆28Aug 19, 2025Updated 6 months ago
- Transformers components but in Triton☆34May 9, 2025Updated 9 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…☆21Mar 15, 2025Updated 11 months ago
- Code and data for paper "(How) do Language Models Track State?"☆20Mar 31, 2025Updated 11 months ago
- Repository for Sparse Universal Transformers☆20Oct 23, 2023Updated 2 years ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…☆54Jan 12, 2026Updated last month
- This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu.☆17Sep 13, 2024Updated last year
- Research work aimed at addressing the problem of modeling infinite-length context☆46Dec 18, 2025Updated 2 months ago
- Experiments on the impact of depth in transformers and SSMs.☆40Oct 23, 2025Updated 4 months ago
- The evaluation framework for training-free sparse attention in LLMs☆121Jan 27, 2026Updated last month
- Data Wrangling, Linear Models & other misc. Inferential Statistics.☆14Jul 16, 2022Updated 3 years ago
- Code for the paper Don't Pay Attention☆54Sep 25, 2025Updated 5 months ago
- defaultMODE is a Python framework for creating Discord AI agents with persistent memory and evolving behavior through brain-inspired sele…☆13Dec 18, 2025Updated 2 months ago
- ☆12Jul 7, 2022Updated 3 years ago
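- Re-identification of campus bicycles at surveillance-camera image quality, including ReID model recognition, vector database retrieval, and a UI display☆10Feb 13, 2024Updated 2 years ago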
- A Swiss Army Knife for computational social choice research☆16Feb 23, 2026Updated last week
- Statistical discontinuous constituent parsing☆11Feb 15, 2018Updated 8 years ago
- Generating a cover letter using LLM given the job description and your resume☆10Feb 1, 2025Updated last year
- Repository for SPECTRA: Sparse Structured Text Rationalization, accepted at EMNLP 2021 main conference.☆10Feb 14, 2024Updated 2 years ago
- ☆12Jun 15, 2023Updated 2 years ago
- Official implementation of "Learning Proposals for Practical Energy-Based Regression", AISTATS 2022.☆13Feb 4, 2023Updated 3 years ago
- Code in support of the paper Continuous Mixtures of Tractable Probabilistic Models☆12Oct 12, 2024Updated last year
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment☆16Aug 6, 2024Updated last year
- JavaScript WebSocket Client☆12Mar 19, 2024Updated last year
- Source code for "N-ary Constituent Tree Parsing with Recursive Semi-Markov Model" published at ACL 2021☆10May 27, 2021Updated 4 years ago
- Advanced Formal Language Theory (263-5352-00L; Spring 2023)☆10Feb 21, 2023Updated 3 years ago
- source code for NAACL2022 main conference "Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs"☆10Sep 26, 2022Updated 3 years ago
- Code for NeurIPS 2024 work "MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps"☆17Dec 11, 2024Updated last year
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue…☆17Mar 21, 2025Updated 11 months ago
- Culturally-compliant video storage. Embeds searchable text chunks into pixelated media for lightning-fast semantic search. Zero-database,…☆20Jun 8, 2025Updated 8 months ago
- A minimal Rust library for styling terminal text using ANSI escape codes.☆26Aug 21, 2025Updated 6 months ago
- PyTorch implementation for PaLM: A Hybrid Parser and Language Model.☆10Jan 7, 2020Updated 6 years ago
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 7 months ago
- Static site generator framework tailored for building nerdish personal websites.☆28Feb 3, 2026Updated 3 weeks ago
- FPsolve: solver for polynomial equations over omega-continuous semirings☆11Aug 15, 2015Updated 10 years ago
- An implementation of the maxflow algorithm by Yuri Boykov and Vladimir Kolmogorov.☆12Nov 26, 2014Updated 11 years ago
- Manipulate tensors with PackedSequence and CattedSequence☆12Jan 4, 2026Updated last month
- 30 must-read deep learning papers recommended by Ilya Sutskever (bilingual Chinese-English translated edition)☆13Dec 18, 2024Updated last year