daniel3303 / StoryReasoning
Code for the paper: "StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation"
☆35 Updated 8 months ago
Alternatives and similar repositories for StoryReasoning
Users who are interested in StoryReasoning are comparing it to the repositories listed below.
- VLLM Port of the Chatterbox TTS model ☆365 Updated 3 months ago
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆29 Updated last month
- SoTA open-source TTS ☆150 Updated last month
- ACE-Step: A Step Towards Music Generation Foundation Model ☆49 Updated 8 months ago
- A pipeline parallel training script for LLMs. ☆166 Updated 9 months ago
- Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G… ☆26 Updated 10 months ago
- An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal. ☆266 Updated 11 months ago
- B-Llama3o: a Llama 3 with vision and audio understanding, as well as text, audio, and animation data output. ☆26 Updated last year
- ☆178 Updated 5 months ago
- Dia-JAX: A JAX port of Dia, the text-to-speech model for generating realistic dialogue from text with emotion and tone control. ☆30 Updated 9 months ago
- Sparse Inferencing for transformer based LLMs ☆217 Updated 5 months ago
- ☆95 Updated 8 months ago
- ☆51 Updated 3 months ago
- ☆112 Updated 7 months ago
- Orpheus Chat WebUI ☆76 Updated 10 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆49 Updated 3 months ago
- A GTK4-based text-to-speech and AI assistant app in Rust, featuring PDF reading and LLM chat powered by Kokoro TTS ☆18 Updated 5 months ago
- Service for testing out the new Qwen2.5 omni model ☆63 Updated 9 months ago
- ☆386 Updated 3 months ago
- Automated speech dataset creator ☆215 Updated 7 months ago
- ☆83 Updated 11 months ago
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… ☆55 Updated last week
- Enhancing LLMs with LoRA ☆206 Updated 3 months ago
- Cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities. ☆128 Updated 5 months ago
- AudioStory: Generating Long-Form Narrative Audio with Large Language Models ☆301 Updated 4 months ago
- Human-taught Computer-use Agent Designed for Real Windows and MacOS Desktops. ☆177 Updated 3 weeks ago
- gguf (GPT-Generated Unified Format) connector ☆50 Updated 3 weeks ago
- A Conversational Speech Generation Model with Gradio UI and OpenAI compatible API. UI and API support CUDA, MLX and CPU devices. ☆211 Updated 9 months ago
- This extension enhances the capabilities of textgen-webui by integrating advanced vision models, allowing users to have contextualized co… ☆57 Updated last year
- A highly compressive and high-quality neural audio codec for speech models. ☆250 Updated 2 weeks ago