Beyond Straight-Through
☆109 · Apr 29, 2023 · Updated 2 years ago
Alternatives and similar repositories for ReinMax
Users who are interested in ReinMax are comparing it to the libraries listed below.
- Sparse Backpropagation for Mixture-of-Expert Training ☆29 · Jul 2, 2024 · Updated last year
- ☆14 · Mar 4, 2022 · Updated 3 years ago
- [ICLR 2023, ICLR DG oral] PAIR, the optimizer and model selection criteria for OOD Generalization ☆54 · Apr 12, 2024 · Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Jan 4, 2024 · Updated 2 years ago
- Simple notebooks to learn diffusion models on toy datasets ☆17 · Feb 9, 2023 · Updated 3 years ago
- Code for Mind the Label Shift of Augmentation-based Graph OOD generalization (LiSA) in CVPR 2023. LiSA is a model-agnostic Graph OOD fram… ☆16 · Jun 24, 2023 · Updated 2 years ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆138 · Apr 30, 2024 · Updated last year
- Code relative to "Adversarial robustness against multiple and single $l_p$-threat models via quick fine-tuning of robust classifiers" ☆19 · Nov 30, 2022 · Updated 3 years ago
- Directed masked autoencoders ☆14 · Feb 20, 2026 · Updated last week
- ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021. ☆63 · Nov 18, 2021 · Updated 4 years ago
- Codebase for the EMNLP 2021 paper "HittER: Hierarchical Transformers for Knowledge Graph Embeddings". ☆12 · Nov 1, 2021 · Updated 4 years ago
- The demo for "Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem". ☆12 · Oct 25, 2021 · Updated 4 years ago
- ☆23 · Jul 23, 2021 · Updated 4 years ago
- Here we collect trick questions and failed tasks for open source LLMs to improve them. ☆32 · Apr 20, 2023 · Updated 2 years ago
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- Official implementation of Our NeurIPS 2024 Paper "Boundary Matters: A Bi-Level Active Finetuning Method" ☆14 · Feb 11, 2025 · Updated last year
- Recursive Bayesian Networks ☆11 · May 11, 2025 · Updated 9 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated last year
- ☆32 · Jan 7, 2024 · Updated 2 years ago
- ☆29 · Jul 12, 2022 · Updated 3 years ago
- University of Illinois letterhead template ☆13 · Jun 4, 2019 · Updated 6 years ago
- ☆19 · Mar 28, 2022 · Updated 3 years ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- Materials of public talks given By SJTU X-LANCE members ☆14 · Dec 3, 2022 · Updated 3 years ago
- What Has Been Enhanced in my Knowledge-Enhanced Language Model? ☆13 · Oct 26, 2022 · Updated 3 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆87 · Mar 7, 2023 · Updated 2 years ago
- Open-Pandora: On-the-fly Control Video Generation ☆35 · Nov 28, 2024 · Updated last year
- Torch-based tool for quantizing high-dimensional vectors using additive codebooks ☆54 · May 25, 2022 · Updated 3 years ago
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆59 · Oct 29, 2023 · Updated 2 years ago
- ☆32 · Dec 29, 2020 · Updated 5 years ago
- Analytical solutions for logistic regression and single-layer softmax ☆12 · Jul 29, 2021 · Updated 4 years ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆28 · Aug 19, 2025 · Updated 6 months ago
- Sampling-Based Minimum Bayes-Risk Decoding for Neural Machine Translation ☆16 · Oct 14, 2022 · Updated 3 years ago
- The official implementation of AAAI'24 paper: Self-Interpretable Graph Learning with Sufficient and Necessary Explanations. ☆14 · Jan 29, 2024 · Updated 2 years ago
- Implementation of "Analyzing and Improving the Training Dynamics of Diffusion Models" ☆97 · Feb 12, 2024 · Updated 2 years ago
- ☆18 · Mar 10, 2023 · Updated 2 years ago
- Prototype for a Category Theory-based GNN Library ☆15 · Apr 20, 2022 · Updated 3 years ago
- list of related work on AI DJ research ☆15 · Apr 4, 2020 · Updated 5 years ago