CoreJT / NLPPapersSpider
☆11Updated 4 years ago
Related projects
Alternatives and complementary repositories for NLPPapersSpider
- Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆40Updated 4 months ago
- ☆38Updated last year
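- PyTorch model training code for single precision, half precision, mixed precision, single-GPU, multi-GPU (DP / DDP), FSDP, and DeepSpeed, with a comparison of training speed and GPU memory usage across methods☆76Updated 7 months ago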
- A modified LLaVA framework for MOSS2 that makes MOSS2 a multimodal model.☆12Updated last month
- official repository for DiffCap: Exploring Continuous Diffusion on Image Captioning☆7Updated last year
- Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes"☆51Updated 7 months ago
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation"☆26Updated 8 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆31Updated 2 weeks ago
- A paper list about diffusion models for natural language processing.☆175Updated last year
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆47Updated last year
- ☆24Updated 4 months ago
- Narrative movie understanding benchmark☆58Updated 6 months ago
- a thin wrapper of chatgpt for improving paper writing.☆253Updated last year
- ☆34Updated 2 years ago
- Build a daily academic subscription pipeline! Get daily Arxiv papers and corresponding chatGPT summaries with pre-defined keywords. It is…☆30Updated last year
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"☆29Updated 4 months ago
- ☆30Updated last month
- [CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding☆42Updated 3 months ago
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.☆14Updated 2 weeks ago
- Notes on multimodal knowledge for large language model (LLM) algorithm (application) engineers☆81Updated 6 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆52Updated last year
- The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"☆14Updated 6 months ago
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.☆207Updated last week
- ☆84Updated 11 months ago
- ☆30Updated 4 months ago
- The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". Th…☆32Updated this week
- Grab GPU whenever available☆278Updated 2 years ago
- ☆102Updated last year
- Official implementation for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆42Updated last year
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆81Updated 4 months ago