BenChaliah / Superposition-Transformer
A novel architecture that uses autoencoders to superimpose the hidden representations of a base model and a fine-tuned model within a shared parameter space. It relies on B-spline-based blending coefficients and on autoencoders that adaptively reconstruct the original hidden states based on the input data distribution.
☆42 · Updated last week
Alternatives and similar repositories for Superposition-Transformer:
Users who are interested in Superposition-Transformer are comparing it to the libraries listed below
- Official repository of DialSim ☆16 · Updated 2 months ago
- Official implementation of ECCV24 paper: POA ☆24 · Updated 5 months ago
- MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities ☆15 · Updated this week
- This is a simple torch implementation of the high performance Multi-Query Attention ☆16 · Updated last year
- [NeurIPS 2024] RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models ☆14 · Updated 2 months ago
- ☆19 · Updated last month
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆29 · Updated 2 months ago
- RuleRAG: Rule-guided Retrieval-Augmented Generation with Language Models for Question Answering ☆18 · Updated 2 months ago
- The official implementation of Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models. ☆11 · Updated last month
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆23 · Updated 2 months ago
- ☆23 · Updated 4 months ago
- ☆16 · Updated 2 months ago
- LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba (Official Implementation) ☆10 · Updated 2 months ago
- Code for paper: Long cOntext aliGnment via efficient preference Optimization ☆13 · Updated this week
- ☆16 · Updated 6 months ago
- ☆29 · Updated this week
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆16 · Updated this week
- ☆13 · Updated 2 months ago
- ☆60 · Updated 3 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆15 · Updated last week
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆18 · Updated 2 months ago
- Official Implementation Of The Paper: "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" ☆22 · Updated 5 months ago
- Dataset of paper: On the Compositional Generalization of Multimodal LLMs for Medical Imaging ☆28 · Updated 2 weeks ago
- Representing Rule-based Chatbots with Transformers ☆19 · Updated 6 months ago
- ☆12 · Updated 2 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆46 · Updated 2 months ago
- The official repo of continuous speculative decoding ☆21 · Updated 2 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆15 · Updated 3 months ago
- ☆21 · Updated 3 months ago
- Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yi… ☆16 · Updated 3 weeks ago