The official implementation of ImageBind-LLM and Whisper-LLM from the paper "Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech".
☆21 · Oct 30, 2023 · Updated 2 years ago
Alternatives and similar repositories for multimodal-llama
Users that are interested in multimodal-llama are comparing it to the libraries listed below.
- AdvSV stands as the first dataset developed specifically for evaluating Speaker Verification (SV) systems against adversarial attacks. I… ☆11 · Nov 21, 2023 · Updated 2 years ago
- This repository is the open source code for our latest feasibility work: "Human Anomalous Gait Termination Recognition Via Through-the-Wa… ☆26 · Jun 4, 2025 · Updated last year
- This is the implementation of the paper "Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification". ☆42 · Mar 9, 2023 · Updated 3 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆58 · Apr 17, 2024 · Updated 2 years ago
- ☆11 · May 7, 2022 · Updated 4 years ago
- The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a … ☆13 · Apr 28, 2026 · Updated last month
- Interface Design for Self-Supervised Speech Models, accepted to Interspeech 2024 ☆16 · Nov 19, 2024 · Updated last year
- Super Flappy Bird in p5.js ☆10 · Mar 8, 2021 · Updated 5 years ago
- A simple, casual ChatGPT Python script that runs in the terminal (as long as you have an API). Supports the Azure API. ☆20 · May 3, 2025 · Updated last year
- Official implementation of the ICASSP 2023 paper "HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields" ☆27 · Dec 3, 2023 · Updated 2 years ago
- The alignment of feature distributions between visible and thermal domains is crucial for achieving effective object detection. Our propo… ☆14 · Jul 16, 2025 · Updated 10 months ago
- ☆14 · Dec 15, 2022 · Updated 3 years ago
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆271 · May 19, 2024 · Updated 2 years ago
- Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals ☆21 · Dec 21, 2024 · Updated last year
- ☆19 · Feb 15, 2023 · Updated 3 years ago
- A high-performance acceleration library dedicated to large-scale model training on AMD GPUs ☆64 · Updated this week
- A casual implementation of a classic clustering metric ☆12 · Mar 14, 2023 · Updated 3 years ago
- ☆14 · Mar 23, 2023 · Updated 3 years ago
- COG-MHEAR Audio-Visual Speech Enhancement Challenge ☆48 · Feb 17, 2026 · Updated 3 months ago
- A complete academic research Skill suite. Supports Claude Code, ChatGPT / Codex CLI, and Gemini CLI. ☆89 · Apr 4, 2026 · Updated 2 months ago
- Official repository for the paper Multimodal Transformer Distillation for Audio-Visual Synchronization (ICASSP 2024). ☆29 · Apr 3, 2024 · Updated 2 years ago
- A PyTorch implementation of the paper "SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification" ☆34 · Jun 25, 2021 · Updated 4 years ago
- A random collection of research papers and projects that interest me ☆20 · May 20, 2021 · Updated 5 years ago
- An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what. ☆231 · Jun 4, 2026 · Updated last week
- Changes in this fork have been merged upstream. ☆16 · Jun 10, 2025 · Updated last year
- Full stack data-science project ☆12 · Jan 13, 2022 · Updated 4 years ago
- Multimodal Instruction Tuning for Llama 3 ☆52 · Apr 25, 2024 · Updated 2 years ago
- "Tail-Aware Sperm Analysis for Transparent Tracking of Spermatozoa" Official Implementation ☆10 · Jan 21, 2026 · Updated 4 months ago
- The official repository of Dynamic-SUPERB. ☆200 · Jun 24, 2025 · Updated 11 months ago
- Source code for "BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement" ☆14 · Feb 13, 2022 · Updated 4 years ago
- Yeast, a lite and light beamer theme ☆18 · Dec 6, 2020 · Updated 5 years ago
- ☆29 · Jun 21, 2023 · Updated 2 years ago
- Add function calling to text-generation-inference ☆13 · Oct 10, 2023 · Updated 2 years ago
- Examples in the MLX framework ☆11 · Sep 23, 2024 · Updated last year
- Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation" ☆19 · Oct 9, 2023 · Updated 2 years ago
- Get up and running with Llama 2 and other large language models locally ☆15 · Jun 8, 2026 · Updated last week
- Adaptive Multimodal Reasoning via Reinforcement Learning ☆23 · Jan 11, 2026 · Updated 5 months ago
- This is a chatbot built using Gradio that can access Google Search and webpages to answer questions. Supports GPT-3.5, GPT-4, Claude 2, … ☆13 · Aug 31, 2023 · Updated 2 years ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆17 · Nov 7, 2024 · Updated last year