Generative Fusion Decoding (GFD) is a novel framework for integrating Large Language Models (LLMs) into multi-modal text recognition systems like ASR and OCR, improving performance and efficiency by enabling seamless fusion without requiring re-training.
☆87Jul 31, 2025Updated 8 months ago
Alternatives and similar repositories for generative-fusion-decoding
Users who are interested in generative-fusion-decoding are comparing it to the libraries listed below.
- ⚙️Tool for NLP - handle file and text☆15Feb 16, 2025Updated last year
- Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5☆19Nov 29, 2022Updated 3 years ago
- PyTorch toolkit for streaming speech recognition, speech translation and simultaneous translation based on fairseq.☆25Oct 3, 2022Updated 3 years ago
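- MediaTek Research (聯發創新基地) is dedicated to researching foundation models. We put our research into models suited to Traditional Chinese users and, where licensing permits, provide those models to academia for research or to industry for use.☆269Sep 8, 2025Updated 7 months ago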
- A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenizat…☆115Sep 3, 2025Updated 7 months ago
- Build your own customer-service chatbot GPT in one hour with no code☆17May 28, 2024Updated last year
- sharing and learning python skills☆15Jun 19, 2023Updated 2 years ago
- Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202…☆27May 17, 2023Updated 2 years ago
- REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR☆15Dec 11, 2024Updated last year
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition☆19Jul 16, 2024Updated last year
- Modified Beam Search with periodical restart☆12Sep 12, 2024Updated last year
- GraphRAG indexed files & visual tutorials for cost-saving, quick start!☆33Jul 22, 2024Updated last year
- Entry for the 11th ironman contest: an introduction to k8s and how to use it on EKS.☆11Jun 8, 2025Updated 10 months ago
- ☆13Sep 25, 2024Updated last year
- A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/ab…☆36Feb 10, 2024Updated 2 years ago
- Accompanying code for paper "Attention-Based Contextual Language Model Adaptation for Speech Recognition", submitted to ACL 2021.☆14Jul 25, 2023Updated 2 years ago
- provide SPHERE-formatted output as well as RIFF, AU, AIFF and raw☆14Dec 18, 2021Updated 4 years ago
- ☆237Aug 25, 2025Updated 7 months ago
- ☆17Jul 22, 2024Updated last year
- ☆17May 5, 2024Updated last year
- Yet another frontend for LLM, written using .NET and WinUI 3☆11Sep 14, 2025Updated 7 months ago
- Scrape stock information and send it to me daily☆19Feb 2, 2025Updated last year
- This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…☆20Jan 3, 2023Updated 3 years ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization.☆17Aug 30, 2024Updated last year
- Unsupervised spoken sentence embeddings☆14Dec 14, 2022Updated 3 years ago
- A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck☆10Sep 9, 2022Updated 3 years ago
- This repository collects drivers for peripherals used in Arduino with LinkIt 7697.☆12Aug 14, 2018Updated 7 years ago
- ☆31Jul 13, 2023Updated 2 years ago
- LINEBot☆13Apr 7, 2025Updated last year
- Textless (ASR-transcript free) Spoken Question Answering. The official release of NMSQA dataset and the implementation of "DUAL: Textless…☆35Aug 10, 2023Updated 2 years ago
- Fine-Tune Model Data Arrangement/Annotation via Simon's tool.☆11Aug 6, 2025Updated 8 months ago
- A Model Agnostic function to directly remove specified layers from the LLM☆10May 23, 2024Updated last year
- Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra☆16Dec 10, 2024Updated last year
- Taiwanese Translation with BERT based model and RNN. Collection of Taiwanese text corpus☆13Oct 15, 2022Updated 3 years ago
- Technical Analysis on Cryptocurrency☆25Oct 14, 2025Updated 6 months ago
- ☆13Jan 9, 2024Updated 2 years ago
- A publishing website of a table collecting meta-learning-related papers in the area of human language processing.☆17Aug 2, 2021Updated 4 years ago
- Awesome Traditional Chinese Datasets☆48Dec 25, 2025Updated 3 months ago
- Repository for "LLM-based speaker diarization correction: A generalizable approach" paper☆21Jul 31, 2024Updated last year