kyegomez / Kosmos2.5

My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

☆74

Alternatives and similar repositories for Kosmos2.5

Users who are interested in Kosmos2.5 compare it to the libraries listed below.

kyegomez / KosmosG
My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"
☆14 · Nov 11, 2024 · Updated last year
kyegomez / Qwen-VL
My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel…
☆12 · Jan 29, 2024 · Updated 2 years ago
kyegomez / AlphaDev
Implementation of the model from "Faster sorting algorithms discovered using deep reinforcement learning" that discovered an all-new ult…
☆11 · Aug 29, 2023 · Updated 2 years ago
Agora-Lab-AI / Atom
a suite of finetuned LLMs for atomically precise function calling 🧪
☆17 · Feb 6, 2026 · Updated last week
vis-nlp / OpenCQA
☆12 · Jun 20, 2023 · Updated 2 years ago
NExTplusplus / TAT-DQA
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
☆23 · Sep 17, 2024 · Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆50 · Jun 16, 2023 · Updated 2 years ago
hint-lab / doctrack
Dataset for EMNLP'23 Paper "DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading"
☆11 · Oct 25, 2023 · Updated 2 years ago
theAdamColton / ijepa-enhanced
recipe for training fully-featured self supervised image jepa models
☆12 · Jun 4, 2025 · Updated 8 months ago
phucty / wtabhtml
Tool to parse wiki tables from the HTML dump of Wikipedia
☆11 · Jun 12, 2022 · Updated 3 years ago
SCUT-DLVCLab / Document-AI-Recommendations
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
☆203 · Mar 1, 2025 · Updated 11 months ago
WenjinW / LATIN-Prompt
☆51 · May 28, 2024 · Updated last year
RQLuo / MixTeX-DataHub
LaTeXDataHub is an open-source platform dedicated to the sharing and contribution of real-world LaTeX image datasets and their annotation…
☆12 · Aug 13, 2024 · Updated last year
kyegomez / MAGVIT2
Open source community's implementation of the model from "LANGUAGE MODEL BEATS DIFFUSION — TOKENIZER IS KEY TO VISUAL GENERATION"
☆15 · Nov 11, 2024 · Updated last year
kyegomez / Pegasus
PegasusX: The Future of Multimodal Embeddings 🦄 🦄
☆14 · Oct 16, 2024 · Updated last year
elsatch / daily_hf_papers_abstracts
This repository includes the code to download the curated HuggingFace papers into a single markdown formatted file
☆16 · Jul 26, 2024 · Updated last year
kyegomez / TinyGPTV
Simple Implementation of TinyGPTV in super simple Zeta lego blocks
☆16 · Nov 11, 2024 · Updated last year
ZackBradshaw / ikigAI
☆14 · Mar 28, 2024 · Updated last year
kyegomez / NeVA
The open source implementation of "NeVA: NeMo Vision and Language Assistant"
☆17 · Aug 26, 2023 · Updated 2 years ago
SALT-NLP / LLaVAR
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
☆269 · Jun 12, 2024 · Updated last year
adlnlp / pdfvqa
☆17 · Jun 12, 2024 · Updated last year
OSU-slatelab / MapQA
☆14 · Jan 9, 2026 · Updated last month
kyegomez / HRTX
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
☆15 · Jun 27, 2025 · Updated 7 months ago
Ucas-HaoranWei / Vary-family
☆57 · Jan 23, 2024 · Updated 2 years ago
kyegomez / MultiModal-ToT
Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement
☆17 · Nov 11, 2024 · Updated last year
kyegomez / EAOT
The open source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers"
☆19 · Mar 11, 2024 · Updated last year
kyegomez / PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
☆93 · Mar 20, 2024 · Updated last year
OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆248 · Apr 3, 2024 · Updated last year
kyegomez / GPT3
An implementation of the base GPT-3 Model architecture from the paper by OPENAI "Language Models are Few-Shot Learners"
☆20 · Jun 29, 2024 · Updated last year
GradientHQ / lattica
💥 Make peer-2-peer global works
☆46 · Jan 29, 2026 · Updated 2 weeks ago
HCIILAB / M6Doc
☆156 · May 8, 2025 · Updated 9 months ago
kyegomez / EXA-1
An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!
☆40 · Feb 1, 2024 · Updated 2 years ago
kyegomez / autogpt-tot
Simple Autogpt with tree of thoughts
☆14 · May 25, 2023 · Updated 2 years ago
buptlihang / CDLA
CDLA: A Chinese document layout analysis (CDLA) dataset
☆288 · Sep 13, 2021 · Updated 4 years ago
uhh-lt / wsd
A system for unsupervised knowledge-free interpretable word sense disambiguation based on distributional semantics
☆19 · Mar 25, 2018 · Updated 7 years ago
LlamaGenAI / note-ai
An open-source Notion-style WYSIWYG editor with AI-powered autocompletions.
☆24 · Jul 13, 2023 · Updated 2 years ago
togethercomputer / flash-attention-3
Fast and memory-efficient exact attention
☆29 · Dec 2, 2024 · Updated last year
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆40 · Nov 11, 2024 · Updated last year
kyegomez / AnyMAL
The open source implementation of "AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model"
☆22 · Jan 27, 2025 · Updated last year
