Image captioning with weight pruning in PyTorch
☆22Jan 14, 2022Updated 4 years ago
Alternatives and similar repositories for sparse-image-captioning
Users that are interested in sparse-image-captioning are comparing it to the libraries listed below
Sorting:
- A paper list of image captioning.☆22Apr 23, 2022Updated 3 years ago
- ☆24Apr 4, 2022Updated 3 years ago
- implementation of paper https://arxiv.org/abs/2210.04559☆56Nov 26, 2025Updated 3 months ago
- From Gradient Leakage to Adversarial Attacks in Federated Learning☆16Sep 21, 2021Updated 4 years ago
- Implementation of paper "Improving Image Captioning with Better Use of Caption"☆33Sep 15, 2020Updated 5 years ago
- Controllable mage captioning model with unsupervised modes☆21Apr 14, 2023Updated 2 years ago
- Official Code for 'RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words' (CVPR 2021)☆123Dec 17, 2022Updated 3 years ago
- Official Code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM2022)☆19Oct 15, 2022Updated 3 years ago
- ☆85Dec 4, 2022Updated 3 years ago
- [CVPR 2022] This repository is for the paper ``DIFNet: Boosting Visual Information Flow for Image Captioning'' .☆21Nov 28, 2022Updated 3 years ago
- Implementation for CVPR 2022 paper " Injecting Semantic Concepts into End-to-End Image Captionin".☆43May 28, 2022Updated 3 years ago
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation☆23Jul 30, 2025Updated 7 months ago
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.☆24Aug 5, 2023Updated 2 years ago
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, ECCV 2024☆22Feb 15, 2024Updated 2 years ago
- The official implementation codes of greedy residuals for the paper Watermarking Deep Neural Networks with Greedy Residuals (ICML 2021).☆24May 21, 2022Updated 3 years ago
- Removing Adversarial Noise in Class Activation Feature Space☆14Oct 12, 2023Updated 2 years ago
- ☆59Aug 30, 2023Updated 2 years ago
- Repository for an end-to-end image captioning method PTSN(ACM MM22).☆60Dec 11, 2022Updated 3 years ago
- Code for Paper 'Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach'☆35Jan 2, 2026Updated 2 months ago
- Code for paper "Image Captioning with End-to-End Attribute Detection and Subsequent Attributes Prediction". IEEE Transactions on Image Pr…☆26Mar 24, 2021Updated 4 years ago
- ☆26Jun 25, 2021Updated 4 years ago
- [CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…☆67Jun 11, 2024Updated last year
- Towards Local Visual Modeling for Image Captioning☆29Mar 31, 2023Updated 2 years ago
- [ACL 2025] The official pytorch implement of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".☆26May 26, 2025Updated 9 months ago
- Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …☆61Oct 21, 2022Updated 3 years ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)☆198May 9, 2023Updated 2 years ago
- The implementation of multi-branch attentive Transformer (MAT).☆33Aug 27, 2020Updated 5 years ago
- [NeurIPS 2025] VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning☆64Jan 6, 2026Updated last month
- Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).☆202Jun 8, 2022Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆12Nov 14, 2025Updated 3 months ago
- Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation☆12Feb 16, 2025Updated last year
- ☆10Apr 2, 2022Updated 3 years ago
- MCFL for Pedestrian attribute recognition☆13Jul 20, 2020Updated 5 years ago
- ☆23Jun 19, 2025Updated 8 months ago
- STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models☆26Feb 12, 2026Updated 2 weeks ago
- ☆22Dec 11, 2025Updated 2 months ago
- PyTorch implementation of Chinese image captioning on AI_challenger dataset☆34Dec 25, 2019Updated 6 years ago
- Official training code for MUG-V 10B video generation model. Built on Megatron-LM (v0.14.0) with production-ready distributed training fo…☆19Oct 20, 2025Updated 4 months ago