CoreJT / NLPPapersSpider
☆11Updated 5 years ago
Alternatives and similar repositories for NLPPapersSpider
Users who are interested in NLPPapersSpider are comparing it to the repositories listed below
- A paper list about diffusion models for natural language processing.☆182Updated last year
- ☆35Updated 10 months ago
- [MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models☆287Updated 3 months ago
- A modified LLaVA framework for MOSS2 that makes MOSS2 a multimodal model.☆13Updated 7 months ago
- ☆39Updated 3 weeks ago
- A collection of omni-mllm☆28Updated this week
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆94Updated 3 months ago
- Recent Advances on MLLM's Reasoning Ability☆25Updated last month
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓☆34Updated last month
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"☆236Updated last year
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆50Updated last year
- Keras implementation of Finite Scalar Quantization☆71Updated last year
- ☆91Updated last year
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…☆121Updated this week
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆24Updated 4 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆140Updated 10 months ago
- Explanation of the llama2 repo.☆10Updated 10 months ago
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)☆162Updated 8 months ago
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆69Updated 3 months ago
- A PyTorch Implementation of Finite Scalar Quantization☆132Updated last year
- The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation☆196Updated last year
- ☆11Updated 2 years ago
- A Python implementation of Certifiable Robust Multi-modal Training☆18Updated 9 months ago
- Narrative movie understanding benchmark☆70Updated last year
- ☆30Updated last month
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation"☆27Updated last year
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi…☆22Updated last week
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated last year
- ☆42Updated 4 years ago
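- PyTorch model training code for single precision, half precision, mixed precision, single GPU, multi-GPU (DP / DDP), FSDP, and DeepSpeed, with a comparison of training speed and GPU memory usage across the methods☆99Updated last year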