Sara-Ahmed/ASiT

Sara-Ahmed / ASiT

ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation

☆30

Alternatives and similar repositories for ASiT

Users who are interested in ASiT are comparing it to the libraries listed below.

ta012 / DTFAT
[AAAI 2024] DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification
☆12 · Mar 10, 2025 · Updated last year
y-chan / hifi-gan-misrnet
Unofficial PyTorch implementation of HiFi-GAN with fast MISR.
☆15 · Mar 21, 2023 · Updated 3 years ago
haoheliu / diffres-python
Learning differentiable temporal resolution on time-series data.
☆36 · Nov 12, 2022 · Updated 3 years ago
shengcanxu / canoSpeech
Text to speech
☆10 · Mar 19, 2024 · Updated 2 years ago
Sreyan88 / LAPE
A unified framework for Low-resource Audio Processing and Evaluation (SSL Pre-training and Downstream Fine-tuning)
☆29 · Jul 9, 2024 · Updated 2 years ago
Sara-Ahmed / GMML
☆14 · Aug 12, 2022 · Updated 3 years ago
mbrotos / SoundSeg
Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation
☆13 · Feb 18, 2026 · Updated 5 months ago
Infinity-INF / fast-phasr
Phoneme and duration labeling based on Whisper small
☆11 · Jul 7, 2024 · Updated 2 years ago
YuanGongND / ssast
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
☆428 · Aug 14, 2022 · Updated 3 years ago
sony / diffiner
☆68 · Aug 16, 2023 · Updated 2 years ago
icon-lab / HST
Official implementation of Hierarchical Spectrogram Transformers (HST)
☆20 · Oct 10, 2022 · Updated 3 years ago
huutuongtu / Lightvoc
LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform
☆18 · May 17, 2024 · Updated 2 years ago
umbertocappellazzo / PETL_AST
This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" [IEEE MLSP 2024] …
☆41 · Jul 31, 2024 · Updated last year
dberghi / AV-SELD
Python implementation of the paper "Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection"
☆31 · Apr 26, 2024 · Updated 2 years ago
MorenoLaQuatra / ARCH
ARCH: Audio Representations benCHmark
☆57 · Aug 26, 2024 · Updated last year
nttcslab / eval-audio-repr
EVAR ~ Evaluation package for Audio Representations
☆81 · Feb 19, 2026 · Updated 5 months ago
audiodemo / voice-conversion
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17 · Aug 18, 2023 · Updated 2 years ago
reppy4620 / convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by Lightning
☆18 · Oct 20, 2024 · Updated last year
SarthakYadav / audio-mamba-official
Official implementation for our paper "Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations"
☆44 · Aug 14, 2025 · Updated 11 months ago
RicherMans / SAT
Streaming Audio Transformers for online audio tagging
☆57 · Jun 14, 2024 · Updated 2 years ago
brightwon / chord-generator-attention-lstm
Keras implementation of "Chord Generation from Symbolic Melody Using BLSTM Networks"
☆13 · Aug 8, 2021 · Updated 4 years ago
vinceasvp / meta-sc
☆11 · May 30, 2023 · Updated 3 years ago
WangHelin1997 / Aty-TTS
Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
☆11 · May 14, 2025 · Updated last year
fss1t / CausalStarGANv2-VC
☆22 · Apr 4, 2023 · Updated 3 years ago
RetroCirce / HTS-Audio-Transformer
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
☆503 · Sep 18, 2025 · Updated 10 months ago
fschmid56 / PretrainedSED
☆145 · May 13, 2025 · Updated last year
innnky / FreeSVC
Singing voice conversion based on FreeVC
☆21 · Dec 16, 2022 · Updated 3 years ago
ishine / Mutiband-HIFIGAN
Multiband version of HiFi-GAN
☆19 · Nov 6, 2020 · Updated 5 years ago
saurjya / EnsembleSep
This branch of Asteroid contains code for the vocal harmony and chamber ensemble separation related papers.
☆12 · Nov 7, 2024 · Updated last year
RicherMans / Dasheng
Source for the Interspeech 2024 paper "Scaling up masked audio encoder learning for general audio classification"
☆86 · Nov 7, 2025 · Updated 8 months ago
guilhermedonizetti / OCR_Python
Python application for Optical Character Recognition (OCR), a technique for extracting text from images. Additionally, the program …
☆12 · Aug 13, 2021 · Updated 4 years ago
sungnyun / ARMHuBERT
(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT
☆41 · Aug 29, 2024 · Updated last year
chenllliang / CTDNN
MMM 2021: Crossed-Time Delay Neural Network for Speaker Recognition
☆11 · Dec 4, 2021 · Updated 4 years ago
nttcslab / msm-mae
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
☆99 · Feb 20, 2026 · Updated 5 months ago
anton-kashkin / hifi_vc
☆25 · Jan 24, 2023 · Updated 3 years ago
AlanBaade / MAE-AST-Public
Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
☆93 · Jun 9, 2022 · Updated 4 years ago
DeepLearn-lab / Acoustic-Feature-Fusion_Chime18
Code for our paper "Acoustic Features Fusion using Attentive Multi-channel Deep Architecture" in Keras and TensorFlow
☆26 · Nov 23, 2018 · Updated 7 years ago
samsad35 / code-ancogen
[ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
☆14 · Mar 11, 2025 · Updated last year
KentoNishi / torch-time-stretch
Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included…
☆40 · Sep 5, 2022 · Updated 3 years ago