FareedKhan-dev / text2video-from-scratch
A Straightforward, Step-by-Step Implementation of a Video Diffusion Model
☆42Updated 3 months ago
Alternatives and similar repositories for text2video-from-scratch
Users who are interested in text2video-from-scratch are comparing it to the repositories listed below.
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts☆40Updated 5 months ago
- A novel multi-modality (Vision) RAG architecture☆27Updated 7 months ago
- Building LLaMA 4 MoE from Scratch☆43Updated last month
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation☆74Updated last month
- Maximizing the Performance of a Simple RAG using RL☆57Updated last month
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]☆82Updated 3 months ago
- ☆57Updated 5 months ago
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆58Updated 11 months ago
- Easy-to-use, high-performance Knowledge Distillation for LLMs☆81Updated last week
- World's Smallest Vision-Language Model☆27Updated last year
- Tutorials from AutoGen Basics to Use Cases☆31Updated last year
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆37Updated last year
- Recipes for creating AI applications with APIs from DashScope (and friends)!☆51Updated 7 months ago
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking☆38Updated 4 months ago
- ☆68Updated 10 months ago
- Small Multimodal Vision Model "Imp-v1-3b" trained using Phi-2 and Siglip.☆17Updated last year
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…☆62Updated 11 months ago
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts".☆30Updated 2 months ago
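- The simplest reproduction of R1 results on a small model, explaining the most important essence of O1-like models and DeepSeek R1. "Think is all your need". Experiments support the claim that, for strong reasoning ability, the content of the thinking process is the core of AGI/ASI.☆45Updated 3 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct☆29Updated 2 months ago
- An open-source chatbot architecture for voice/vision (and multimodal) assistants that can run locally (CPU/GPU bound) or remotely (I/O bound).☆49Updated this week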
- 🤗 HF Downloader (Hugging Face Downloader) 📦 A user-friendly GUI tool for downloading Hugging Face resources with enhanced connectivity…☆11Updated 4 months ago
- Composition of Multimodal Language Models From Scratch☆14Updated 9 months ago
- A simple MLLM, built on a 14B LLM, that surpasses QwenVL-Max using open-source data only.☆37Updated 8 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆37Updated 8 months ago
- An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning☆32Updated 3 months ago
- Hybrid-RAG is a hybrid Retrieval-Augmented Generation (RAG) model that leverages BERT for retrieving relevant documents and GPT-2 for gen…☆26Updated 3 months ago
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆81Updated last year
- ☆89Updated last week
- FuseAI Project☆86Updated 3 months ago