v-manhlt3/m-LTM-Audio-Text-Retrieval

v-manhlt3 / m-LTM-Audio-Text-Retrieval

☆13

Alternatives and similar repositories for m-LTM-Audio-Text-Retrieval

Users who are interested in m-LTM-Audio-Text-Retrieval are comparing it to the repositories listed below.

9rum / flatflow
View on GitHub
Fast and exact parallel training of neural networks
☆13 · Updated this week
XinhaoMei / audio-text_retrieval
View on GitHub
Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval'
☆51 · May 17, 2022 · Updated 4 years ago
jaeyeonkim99 / EnCLAP
View on GitHub
Official Implementation of EnCLAP (ICASSP 2024)
☆96 · Jun 2, 2024 · Updated 2 years ago
JinhuaLiang / lam4fsl
View on GitHub
An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"
☆31 · May 31, 2023 · Updated 3 years ago
Lilidamowang / T2VIndexer-generativeSearch
View on GitHub
☆16 · Aug 28, 2024 · Updated last year
Labbeti / aac-metrics
View on GitHub
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
☆75 · Mar 22, 2026 · Updated 4 months ago
shinhyeokoh / rwen
View on GitHub
☆14 · Jun 16, 2023 · Updated 3 years ago
ZerQAQ / Zroutinue
View on GitHub
A coroutine library written in pure C
☆10 · Feb 24, 2021 · Updated 5 years ago
Labbeti / conette-audio-captioning
View on GitHub
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
☆23 · Dec 17, 2025 · Updated 7 months ago
microsoft / AudioEntailment
View on GitHub
Audio Entailment: Deductive Reasoning for Audio Understanding
☆17 · Dec 10, 2024 · Updated last year
miccio-dk / NISQA
View on GitHub
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
☆16 · Apr 13, 2022 · Updated 4 years ago
WangHelin1997 / Aty-TTS
View on GitHub
Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
☆11 · May 14, 2025 · Updated last year
kjw11 / CSEnet-ASR
View on GitHub
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12 · Mar 14, 2025 · Updated last year
llm-lab-org / CLASP
View on GitHub
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
☆13 · Jun 27, 2025 · Updated last year
maum-ai / sane-tts
View on GitHub
SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech
☆11 · Jun 30, 2023 · Updated 3 years ago
freds0 / kabooks
View on GitHub
KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a…
☆13 · Mar 24, 2023 · Updated 3 years ago
pkufool / simple-wer
View on GitHub
A simple command line tool to calculate WER for ASR.
☆14 · Oct 14, 2024 · Updated last year
audiodemo / voice-conversion
View on GitHub
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17 · Aug 18, 2023 · Updated 2 years ago
reppy4620 / convnext_tts
View on GitHub
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18 · Oct 20, 2024 · Updated last year
V-Sense / 360AudioVisual
View on GitHub
This repository contains materials for the paper: Towards generating ambisonics using audio-visual cue for virtual reality
☆13 · Jul 2, 2019 · Updated 7 years ago
ashi-ta / speechGLUE
View on GitHub
SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.
☆13 · Jun 2, 2023 · Updated 3 years ago
jaeyeonkim99 / visage
View on GitHub
Official implementation of "ViSAGe: Video-to-Spatial Audio Generation" (ICLR 2025)
☆47 · Sep 10, 2025 · Updated 10 months ago
jiangshdd / ReviewCritique
View on GitHub
☆13 · Sep 26, 2024 · Updated last year
huutuongtu / Lightvoc
View on GitHub
LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform
☆18 · May 17, 2024 · Updated 2 years ago
zelaki / DisfluentFA
View on GitHub
A Weakly Supervised Forced Alignment for disfluent speech
☆15 · Nov 12, 2023 · Updated 2 years ago
maxencenoble / tree-diffusion-schrodinger-bridge
View on GitHub
Tree-Based Diffusion Schrödinger Bridge with Applications to Wasserstein Barycenters
☆10 · Mar 5, 2024 · Updated 2 years ago
Jackson-Kang / Prosody-augmentation-for-Text-to-speech
View on GitHub
Simple tool for speech dataset augmentation for modeling various prosodies.
☆14 · Jan 14, 2021 · Updated 5 years ago
GeWu-Lab / MWAFM
View on GitHub
Multi-Scale Attention for Audio Question Answering
☆28 · Jul 19, 2023 · Updated 3 years ago
HS-YN / PanoAVQA
View on GitHub
Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)
☆16 · Oct 12, 2021 · Updated 4 years ago
JmlrOrg / jmlr-coverletter
View on GitHub
JMLR Cover Letter Template
☆10 · Dec 15, 2021 · Updated 4 years ago
Labbeti / aac-datasets
View on GitHub
Audio Captioning datasets for PyTorch.
☆129 · Mar 25, 2026 · Updated 4 months ago
baptiste-genest / NESOTS
View on GitHub
Source code of the article "Non Euclidean Sliced Optimal Transport Sampling" published at Eurographics 2024, authors: Baptiste GENEST, Ni…
☆12 · Aug 28, 2024 · Updated last year
ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26 · Aug 5, 2024 · Updated last year
zhai-lw / SQCodec
View on GitHub
A lightweight audio codec based on a single quantizer
☆72 · Aug 15, 2025 · Updated 11 months ago
minguinho26 / Prefix_AAC_ICASSP2023
View on GitHub
Official Implementation of "Prefix tuning for Automated Audio Captioning" (ICASSP 2023)
☆30 · Dec 6, 2023 · Updated 2 years ago
miaoYuanyuan / gen_melSpec_from_wav
View on GitHub
Thanks to auspicious3000's great work! https://github.com/auspicious3000/autovc This is the implementation of generating mel-spectrogram fr…
☆13 · Oct 21, 2019 · Updated 6 years ago
csmliu / pretrained-GANs
View on GitHub
A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration
☆17 · Jul 22, 2022 · Updated 4 years ago
exeex / vocoder_eva
View on GitHub
Used to evaluate a WaveNet vocoder by RMSE F0, MCD, RMSE AP...
☆15 · Jan 20, 2020 · Updated 6 years ago
Madhuvod / VoxLingua
View on GitHub
A Model (maybe an app) that translates the audio of a video from one language to another language, cloning the voice of original video wi…
☆17 · May 19, 2025 · Updated last year