DaShenZi721 / HRA
☆27Updated last month
Alternatives and similar repositories for HRA:
Users interested in HRA are comparing it to the repositories listed below
- Source code for the paper "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models"☆24Updated 10 months ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024☆22Updated 10 months ago
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)☆30Updated 5 months ago
- Official Code Repository for the paper "Continuous Diffusion Model for Language Modeling".☆25Updated last month
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆37Updated 6 months ago
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [EMNLP 2024]☆25Updated 5 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆66Updated 6 months ago
- Official PyTorch implementation for "Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data" (ICLR…☆37Updated 2 months ago
- PyTorch implementation of the ICML 2024 paper "Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching"☆24Updated 10 months ago
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task☆33Updated last week
- Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)☆28Updated last year
- A list of papers for group meeting☆16Updated 3 months ago
- Codes for Merging Large Language Models☆29Updated 8 months ago
- Data distillation benchmark☆58Updated last week
- Official repository of the "Transformer Fusion with Optimal Transport" paper, published as a conference paper at ICLR 2024.☆27Updated last year
- Welcome to the 'In Context Learning Theory' Reading Group☆27Updated 5 months ago
- Source code for (quasi-)Givens Orthogonal Fine Tuning, integrated into the peft library☆16Updated last month
- A list of diffusion papers in the NLP domain that I have read, mainly on text generation; the table will continue to be updated.☆40Updated last month
- AnchorAttention: Improved attention for LLMs long-context training☆206Updated 3 months ago
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior.☆40Updated last year
- [NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training☆34Updated 2 weeks ago
- [ICLR'24] "DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training" by Aochuan Chen*, Yimeng Zhang*, Jinghan Jia, James Di…☆57Updated 6 months ago
- Official PyTorch implementation for "Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations"☆37Updated last year
- Code for "Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective"☆20Updated last year
- Official implementation of GOAT model (ICML2023)☆37Updated last year
- Code Repository for the NeurIPS 2022 paper: "Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights".☆16Updated 9 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging☆56Updated last month