kyegomez / Paper-Implementation-Template
A simple reproducible template to implement AI research papers
☆24Updated 9 months ago
Alternatives and similar repositories for Paper-Implementation-Template
Users who are interested in Paper-Implementation-Template are comparing it to the repositories listed below.
- An open-source implementation of multi-grouped query attention from the paper "GQA: Training Generalized Multi-Query Transformer Model…☆15Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks☆16Updated 7 months ago
- ☆84Updated 2 weeks ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆98Updated 8 months ago
- ☆73Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆205Updated 5 months ago
- ☆50Updated last year
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆44Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆44Updated 11 months ago
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind☆53Updated 3 weeks ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆95Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆104Updated last month
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…☆47Updated 3 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆143Updated 7 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay☆79Updated 3 weeks ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024☆60Updated 4 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context☆32Updated 10 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆91Updated last month
- Official repo for StableLLAVA☆95Updated last year
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment☆26Updated last year
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆51Updated 6 months ago
- ☆115Updated 10 months ago
- A repository for research on medium sized language models.☆76Updated last year
- ☆29Updated 10 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆27Updated last month
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆70Updated 4 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)☆158Updated 2 months ago
- ☆44Updated 5 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models☆83Updated last year
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"☆25Updated last month