Beyond Straight-Through
☆110Apr 29, 2023Updated 3 years ago
Alternatives and similar repositories for ReinMax
Users who are interested in ReinMax are comparing it to the libraries listed below.
- ☆14Mar 4, 2022Updated 4 years ago
- Sparse Backpropagation for Mixture-of-Expert Training☆30Jul 2, 2024Updated 2 years ago
- Code for Mind the Label Shift of Augmentation-based Graph OOD generalization (LiSA) in CVPR 2023. LiSA is a model-agnostic Graph OOD fram…☆16Jun 24, 2023Updated 3 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- [ICLR 2023, ICLR DG oral] PAIR, the optimizer and model selection criteria for OOD Generalization☆54Apr 12, 2024Updated 2 years ago
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang☆14Jan 4, 2024Updated 2 years ago
- Chrome extension for OA sites like arXiv and OpenReview: 1. PDF back to abstract page, 2. Rename PDF page with paper title.☆18Oct 12, 2023Updated 2 years ago
- A QQ bot based on mirai and Graia that can execute code in Python, Mathematica, C++, and other languages, and can call Copilot for completion and Stable Diffusion (NovelAI) for text-to-image generation☆16Oct 14, 2022Updated 3 years ago
- Recursive Bayesian Networks☆11May 11, 2025Updated last year
- ☆32Jan 7, 2024Updated 2 years ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023☆139Apr 30, 2024Updated 2 years ago
- ☆18Apr 19, 2024Updated 2 years ago
- The repository for 'Unsupervised Learning for Combinatorial Optimization with Principled Proxy Design'☆16Oct 9, 2022Updated 3 years ago
- ☆29Jul 12, 2022Updated 3 years ago
- ☆21Mar 4, 2024Updated 2 years ago
- The demo for "Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem".☆12Oct 25, 2021Updated 4 years ago
- ☆23Jul 23, 2021Updated 4 years ago
- ☆31Jun 28, 2022Updated 4 years ago
- We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench show…☆65Feb 4, 2026Updated 5 months ago
- Sampling-Based Minimum Bayes-Risk Decoding for Neural Machine Translation☆16Oct 14, 2022Updated 3 years ago
- ☆54Jan 19, 2023Updated 3 years ago
- Directed masked autoencoders☆15Mar 25, 2026Updated 3 months ago
- ☆18Mar 10, 2023Updated 3 years ago
- Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)"☆20Jun 22, 2023Updated 3 years ago
- [NeurIPS 2022] Learning Causally Invariant Representations for Out-of-Distribution Generalization on Graphs☆123Aug 28, 2023Updated 2 years ago
- Open-Pandora: On-the-fly Control Video Generation☆35Nov 28, 2024Updated last year
- An implementation of the DISP-LLM method from the NeurIPS 2024 paper: Dimension-Independent Structural Pruning for Large Language Models.☆24Aug 6, 2025Updated 10 months ago
- AdaCat☆48Aug 4, 2022Updated 3 years ago
- maximal update parametrization (µP)☆1,734Jul 17, 2024Updated last year
- [VLDB'22] SUREL is a novel walk-based computation framework for efficient subgraph-based graph representation learning.☆20Apr 10, 2025Updated last year
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"☆13Jul 18, 2024Updated last year
- Here we collect trick questions and failed tasks for open source LLMs to improve them.☆32Apr 20, 2023Updated 3 years ago
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts☆41Sep 29, 2024Updated last year
- ☆10May 9, 2021Updated 5 years ago
- Implementation of "Analyzing and Improving the Training Dynamics of Diffusion Models"☆96Feb 12, 2024Updated 2 years ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition☆12Mar 14, 2025Updated last year
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆18May 29, 2023Updated 3 years ago
- Implementation of Bit Diffusion, Hinton's group's attempt at discrete denoising diffusion, in Pytorch☆357Oct 14, 2023Updated 2 years ago
- [ICML 2024 Oral] LSH-Based Efficient Point Transformer (HEPT)☆27Jan 24, 2025Updated last year