EvolvingLMMs-Lab/Aero-1

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/EvolvingLMMs-Lab/Aero-1)

EvolvingLMMs-Lab / Aero-1

☆79

Alternatives and similar repositories for Aero-1

Users that are interested in Aero-1 are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

synvo-ai / local-cocoa
View on GitHub
A local AI assistant running on your device. It turns your files into actionable memory.
☆55Mar 24, 2026Updated 4 months ago
EvolvingLMMs-Lab / sae
View on GitHub
A framework that allows you to apply Sparse AutoEncoder on any models
☆53Jul 11, 2025Updated last year
EvolvingLMMs-Lab / engram
View on GitHub
Privacy-first AI memory layer - Signal for AI Memory. E2EE, local-first, works with Claude, Cursor, and any MCP-compatible AI.
☆23Jun 12, 2026Updated last month
pufanyi / syphus
View on GitHub
Syphus: Automatic Instruction-Response Generation Pipeline
☆14Dec 14, 2023Updated 2 years ago
EvolvingLMMs-Lab / VideoMMMU
View on GitHub
Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
☆72Sep 5, 2025Updated 10 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Luodian / nano-hevc
View on GitHub
A minimal, educational HEVC (H.265) encoder written in Python.
☆53Feb 23, 2026Updated 5 months ago
ddlBoJack / Omni-Captioner
View on GitHub
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142Apr 7, 2026Updated 3 months ago
penghao-wu / ProxyV
View on GitHub
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆20May 22, 2025Updated last year
synvo-ai / HippoCamp
View on GitHub
A benchmark for evaluating contextual agents on realistic multimodal personal-computer environments with profiling and factual-retention …
☆29Apr 2, 2026Updated 3 months ago
slp-rl / SLM-Discrete-Representations
View on GitHub
This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…
☆20Jan 3, 2023Updated 3 years ago
EvolvingLMMs-Lab / OpenMMReasoner
View on GitHub
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆164Mar 30, 2026Updated 3 months ago
penghao-wu / visual_jigsaw
View on GitHub
☆78Apr 9, 2026Updated 3 months ago
EvolvingLMMs-Lab / multimodal-sae
View on GitHub
[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.
☆199Sep 26, 2025Updated 10 months ago
Hunterhuan / sphereface2_speaker_verification
View on GitHub
Exploring Binary Classification Loss for Speaker Verification
☆18Jul 18, 2023Updated 3 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
cnaigithub / SpeechDewarping
View on GitHub
Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023
☆27Apr 27, 2023Updated 3 years ago
Nicous20 / FunQA
View on GitHub
FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …
☆104Dec 25, 2025Updated 7 months ago
NVIDIA / audio-intelligence
View on GitHub
Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with syntheti…
☆137Mar 3, 2026Updated 4 months ago
EvolvingLMMs-Lab / EgoLife
View on GitHub
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆451Mar 19, 2025Updated last year
AlanBaade / SyllableLM
View on GitHub
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
☆63Jul 1, 2025Updated last year
EvolvingLMMs-Lab / MGPO
View on GitHub
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆55Jul 23, 2025Updated last year
dongyh20 / Demo-ICL
View on GitHub
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
☆41Mar 3, 2026Updated 4 months ago
merlresearch / sebbs
View on GitHub
Prediction of sound event bounding boxes (SEBBs)
☆35Aug 2, 2024Updated last year
ASLP-lab / FlashTTS
View on GitHub
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆67Jun 16, 2026Updated last month
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
LAION-AI / scaled-echo-tts
View on GitHub
Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024…
☆24Mar 29, 2026Updated 3 months ago
JusperLee / Gull-Codec-Training
View on GitHub
☆12Mar 11, 2025Updated last year
MotrixLab / InfiniteDance
View on GitHub
☆36Jul 10, 2026Updated 2 weeks ago
ardamamur / EgoExOR
View on GitHub
Official code of the paper "EgoExOR: EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding" accepted at …
☆28May 6, 2026Updated 2 months ago
XinyuZhou2000 / Spoken-Dialogue
View on GitHub
☆18Dec 7, 2023Updated 2 years ago
zeyuxie29 / AudioTime
View on GitHub
☆39Jul 4, 2024Updated 2 years ago
EvolvingLMMs-Lab / multimodal-search-r1
View on GitHub
[ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal…
☆470Apr 7, 2026Updated 3 months ago
ChenYi99 / EgoPlan
View on GitHub
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
☆85Dec 6, 2024Updated last year
RanaCM / DSU-AVO
View on GitHub
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
☆12May 13, 2024Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
inclusionAI / Ming-UniAudio
View on GitHub
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆451Nov 27, 2025Updated 8 months ago
camenduru / TANGO-jupyter
View on GitHub
☆13Oct 14, 2024Updated last year
ZhikangNiu / A-DMA
View on GitHub
[INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"
☆67Jun 16, 2025Updated last year
qiuqiangkong / audio_understanding
View on GitHub
☆131Feb 6, 2025Updated last year
arthur-qiu / FreeTraj
View on GitHub
Code for FreeTraj, a tuning-free method for trajectory-controllable video generation
☆114Sep 19, 2025Updated 10 months ago
EvolvingLMMs-Lab / OneVision-Encoder
View on GitHub
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆386Jun 20, 2026Updated last month
cliangyu / Cola
View on GitHub
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆106Nov 9, 2023Updated 2 years ago