Implementation of the proposed DeepCrossAttention by Heddes et al. at Google Research, in Pytorch
☆96Feb 24, 2025Updated last year
Alternatives and similar repositories for deep-cross-attention
Users that are interested in deep-cross-attention are comparing it to the libraries listed below.
- Explorations into adversarial losses on top of autoregressive loss for language modeling☆41Dec 21, 2025Updated 3 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch☆25Jan 21, 2025Updated last year
- Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster☆71May 18, 2025Updated 10 months ago
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper☆799Aug 15, 2025Updated 7 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch☆135Oct 15, 2025Updated 5 months ago
- The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch☆73Nov 18, 2025Updated 4 months ago
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind☆59May 31, 2025Updated 9 months ago
- open source alpha evolve☆68May 19, 2025Updated 10 months ago
- Unofficial implementation of GotenNet, new SOTA 3d equivariant transformer, in Pytorch☆67Apr 7, 2025Updated 11 months ago
- Implementation of the proposed minGRU in Pytorch☆319Dec 10, 2025Updated 3 months ago
- PyTorch Implementation of ViT-TTS (EMNLP'23)☆11Oct 20, 2023Updated 2 years ago
- (WIP) long form speech generation☆31Apr 2, 2025Updated 11 months ago
- Implementation of the new SOTA for model based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in Pytorch☆153May 2, 2025Updated 10 months ago
- Implementation of the dynamic chunking mechanism in H-net by Hwang et al. of Carnegie Mellon☆68Feb 8, 2026Updated last month
- Inference server for MioTTS, a lightweight and fast LLM-based TTS model.☆121Feb 14, 2026Updated last month
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- Official repository for the paper "Fast Predictive Uncertainty for Classification with Bayesian Deep Networks". Accepted at UAI 2022. htt…☆12May 25, 2022Updated 3 years ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"☆15Apr 30, 2025Updated 10 months ago
- Axial Positional Embedding for Pytorch☆84Feb 25, 2025Updated last year
- recipe for training fully-featured self supervised image jepa models☆12Jun 4, 2025Updated 9 months ago
- FREECODEC: A DISENTANGLED NEURAL SPEECH CODEC WITH FEWER TOKENS☆24Sep 9, 2024Updated last year
- A simple command line tool to calculate WER for ASR.☆14Oct 14, 2024Updated last year
- Implementation of Kronecker Attention in Pytorch☆19Sep 12, 2020Updated 5 years ago
- The implementation of g2pL with a new open dataset.☆16May 14, 2023Updated 2 years ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one☆26Aug 5, 2024Updated last year
- ☆88Jan 23, 2025Updated last year
- Code for the paper AdvST: Revisiting Data Augmentations for Single Domain Generalization (AAAI 2024)☆13May 6, 2024Updated last year
- Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers"☆166Jan 31, 2025Updated last year
- Integrates Imbue's Cost Aware pareto-Region Bayesian Search (CARBS) with Weights and Biases (WandB)☆12Mar 17, 2025Updated last year
- ☆22Sep 16, 2025Updated 6 months ago
- Test-Time Memory Framework: Control Hallucinations in Foundation Models☆11Nov 4, 2025Updated 4 months ago
- Trying to deconstruct RWKV in understandable terms☆14May 6, 2023Updated 2 years ago
- A simple Transformer where the softmax has been replaced with normalization☆20Sep 11, 2020Updated 5 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention☆220Feb 13, 2023Updated 3 years ago
- From the paper "Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition"☆27Nov 20, 2024Updated last year
- A neural speech codec based on discrete WavLM representations☆26Aug 28, 2024Updated last year
- Contrastive Reinforcement Learning☆58Jan 31, 2026Updated last month
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"☆70Apr 10, 2023Updated 2 years ago
- ☆23Oct 17, 2024Updated last year