BenChaliah / Superposition-Transformer
A novel architecture that uses autoencoders to superimpose the hidden representations of a base model and a fine-tuned model within a shared parameter space. It relies on B-spline-based blending coefficients and on autoencoders that adaptively reconstruct the original hidden states based on the input data distribution.
☆43Updated 4 months ago
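As a rough illustration of the blending idea in the description, the sketch below computes a B-spline-valued coefficient and uses it to mix two hidden-state vectors. It is not taken from the repository: the function names, the use of normalized layer depth `t` as the spline input, the clamped knot vector, and the convex-combination form are all assumptions made for illustration, and the autoencoder reconstruction step is left out entirely.

```python
import numpy as np

def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th degree-k B-spline basis at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + k] - knots[i]
    if denom > 0:  # skip terms over zero-width knot spans
        left = (t - knots[i]) / denom * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    denom = knots[i + k + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k + 1] - t) / denom * bspline_basis(i + 1, k - 1, t, knots)
    return left + right

def blend_coefficient(t, control_points, degree=3):
    """Evaluate a clamped B-spline curve at t in [0, 1] (hypothetical parametrization)."""
    n = len(control_points)
    if n < degree + 1:
        raise ValueError("need at least degree + 1 control points")
    # Clamped knot vector: the curve starts at the first control point
    # and ends at the last one.
    inner = np.linspace(0.0, 1.0, n - degree + 1)
    knots = np.concatenate([np.zeros(degree), inner, np.ones(degree)])
    # Keep t inside the half-open final knot interval.
    t = min(max(float(t), 0.0), 1.0 - 1e-9)
    return sum(c * bspline_basis(i, degree, t, knots)
               for i, c in enumerate(control_points))

def superpose(h_base, h_ft, t, control_points):
    """Convex blend of base and fine-tuned hidden states (assumed form)."""
    alpha = blend_coefficient(t, control_points)
    return (1.0 - alpha) * h_base + alpha * h_ft

# Example: blend two toy hidden states halfway through the network.
h_base = np.zeros(4)
h_ft = np.ones(4)
control_points = [0.0, 0.2, 0.8, 1.0]  # illustrative values, not learned
h_mix = superpose(h_base, h_ft, t=0.5, control_points=control_points)
```

Because the B-spline basis functions sum to one on the clamped interval, control points chosen in [0, 1] keep the coefficient in [0, 1], so the result stays a convex mix of the two hidden states. In a trained model the control points would presumably be learned parameters rather than fixed values.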
Alternatives and similar repositories for Superposition-Transformer
Users who are interested in Superposition-Transformer are comparing it to the repositories listed below.
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Updated 10 months ago
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆14Updated 6 months ago
- Control LLM☆14Updated last month
- Interface Design for Self-Supervised Speech Models, Accepted to Interspeech2024☆15Updated 6 months ago
- Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆15Updated 2 months ago
- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models☆13Updated last month
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories☆14Updated last week
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)☆20Updated 9 months ago
- [ACM MM 2023] Official PyTorch implementation of "Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Reco…☆11Updated last year
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆13Updated last month
- The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"☆27Updated last year
- Add n-gram and large language model support to Whisper models.☆15Updated 2 weeks ago
- Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"☆15Updated last month
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model"☆34Updated 3 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆26Updated 6 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax☆14Updated 11 months ago
- User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou…☆15Updated 2 weeks ago
- Official implementation of ECCV24 paper: POA☆24Updated 9 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆54Updated 9 months ago
- Repo of FocusedAD☆12Updated last month
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …☆16Updated last year
- MIO: A Foundation Model on Multimodal Tokens☆25Updated 5 months ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Updated last year
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.☆29Updated last month
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains☆42Updated 2 weeks ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".☆14Updated 3 weeks ago