kyegomez / Paper-Implementation-Template
A simple reproducible template to implement AI research papers
☆23 Updated 5 months ago
Alternatives and similar repositories for Paper-Implementation-Template:
Users who are interested in Paper-Implementation-Template are comparing it to the repositories listed below
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 Updated last week
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆48 Updated 2 weeks ago
- Official repo for StableLLAVA ☆94 Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆15 Updated 3 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆189 Updated last month
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆86 Updated last month
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆73 Updated 3 months ago
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 Updated this week
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 Updated 7 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context ☆25 Updated 6 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆149 Updated last month
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆127 Updated 3 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 Updated 4 months ago
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆128 Updated last month
- ☆47 Updated last year
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆149 Updated last month
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆120 Updated last month
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆104 Updated 3 weeks ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆33 Updated 7 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆144 Updated 2 weeks ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ☆117 Updated this week
- Reproduction of LLaVA-v1.5 based on Llama-3-8b LLM backbone. ☆62 Updated 3 months ago
- Implementation of the premier Text to Video model from OpenAI ☆57 Updated 3 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆28 Updated 3 weeks ago
- ☆58 Updated last month
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆88 Updated 10 months ago
- ☆68 Updated 4 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆81 Updated this week
- Code for Paper: Harnessing Webpage UIs For Text Rich Visual Understanding ☆46 Updated 2 months ago