NimrodShabtay / LiveXivLinks
☆13 · Updated 5 months ago
Alternatives and similar repositories for LiveXiv
Users who are interested in LiveXiv are comparing it to the libraries listed below.
- Official PyTorch Implementation of "Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generati… ☆10 · Updated 4 months ago
- 2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models (WWW 2025) ☆10 · Updated 8 months ago
- The implementation for FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts (ICCV25) ☆12 · Updated 6 months ago
- This project is mainly the assessment project for the 2025 Zhejiang University School of Software summer camp (AI camp) ☆11 · Updated 10 months ago
- ☆10 · Updated last year
- ☆25 · Updated 6 months ago
- ☆26 · Updated 5 months ago
- A Comprehensive Dataset for Advanced Image Generation and Editing ☆30 · Updated 3 months ago
- KV cache compression via sparse coding ☆17 · Updated 2 months ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆19 · Updated last year
- [WACV2025] source code of StrDA: https://arxiv.org/abs/2410.09913 ☆12 · Updated 8 months ago
- Code for the paper "ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions" published at CVPR 2025 ☆19 · Updated 9 months ago
- ☆16 · Updated 9 months ago
- Self Evolving Large Multimodal Models with Continuous Rewards ☆17 · Updated last month
- unofficial ☆12 · Updated last year
- Instagram Automation Tool is a framework that automates various Instagram tasks, including file-based operations and web automation (via … ☆15 · Updated 8 months ago
- ☆21 · Updated 3 months ago
- Context-Informed Machine Translation of Manga using Multimodal Large Language Models ☆14 · Updated last year
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆79 · Updated last year
- The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders ☆41 · Updated 7 months ago
- ☆14 · Updated last year
- CodeRepoQA dataset ☆15 · Updated 10 months ago
- [Arxiv 2025] In-Video Instructions: Visual Signals as Generative Control ☆46 · Updated last month
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆47 · Updated 10 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆51 · Updated 5 months ago
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning ☆32 · Updated 4 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆73 · Updated last month
- Text-Only Data Synthesis for Vision Language Model Training ☆23 · Updated 7 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆24 · Updated last year
- ☆39 · Updated 3 weeks ago