yangcaoai/Awesome-Large-Vision-Language-Models

yangcaoai / Awesome-Large-Vision-Language-Models

😎 Awesome lists of papers and codes about Large Vision-Language Models

☆13

Alternatives and similar repositories for Awesome-Large-Vision-Language-Models

Users who are interested in Awesome-Large-Vision-Language-Models are comparing it to the repositories listed below.

yangcaoai / Awesome-Open-Vocabulary-Perception
😎 Awesome lists of papers and codes about open-vocabulary perception, including both 3D and 2D
☆64 · Jul 27, 2025 · Updated 11 months ago
yangcaoai / 3DGS-DET
Official codes for paper: 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for Indoor 3D Object …
☆165 · Mar 16, 2026 · Updated 4 months ago
yangcaoai / CoDA_NeurIPS2023
Official code for NeurIPS2023 paper CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detec…
☆223 · May 28, 2026 · Updated last month
yangcaoai / VGGT-Det-CVPR2026
Official code for CVPR 2026 paper: VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
☆145 · Jul 15, 2026 · Updated last week
jzh15 / SpatialStack
[CVPR 2026]SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
☆31 · Jul 15, 2026 · Updated last week
prismformore / DiffusionMTL
Code of our CVPR2024 paper - DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
☆60 · Mar 25, 2024 · Updated 2 years ago
QingZhong1996 / Awesome-Video-Instance-Segmentation-Papers
☆36 · Oct 21, 2022 · Updated 3 years ago
W-Ted / UDC-NeRF
Official code for ICCV2023 paper: Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis
☆34 · Dec 27, 2023 · Updated 2 years ago
SenZHANG-GitHub / InfoOdometry
[IJCV 2022] Information-Theoretic Odometry Learning
☆16 · Apr 19, 2023 · Updated 3 years ago
westfish / Awesome-Video-Diffusion-Models
A collection of resources and papers on diffusion models of video generation.
☆10 · Feb 11, 2023 · Updated 3 years ago
MingSun-Tse / Caffe_IncReg
[IJCNN'19, IEEE JSTSP'19] Caffe code for our paper "Structured Pruning for Efficient ConvNets via Incremental Regularization"; [BMVC'18] …
☆14 · Feb 14, 2020 · Updated 6 years ago
PengtaoJiang / TSP6K
The official PyTorch code for "Traffic Scene Parsing through the TSP6K Dataset".
☆34 · Jul 6, 2025 · Updated last year
zhonghangqiu / EGASR
☆10 · Jan 19, 2024 · Updated 2 years ago
meteorshowers / Sora-Generates-Videos-with-Stunning-Geometrical-Consistency
Sora Generates Videos with Stunning Geometrical Consistency
☆51 · Mar 24, 2024 · Updated 2 years ago
AberHu / ImageNet-training
Pytorch ImageNet training codes with various tricks, lr schedulers, distributed training, mixed precision training, DALI dataloader etc.
☆18 · Aug 12, 2020 · Updated 5 years ago
LAION-AI / laion50BU
Un-*** 50 billions multimodality dataset
☆24 · Sep 14, 2022 · Updated 3 years ago
tayden / geotiff-crop-dataset
A Pytorch Dataloader for tif image files that dynamically crops the image.
☆13 · Aug 21, 2020 · Updated 5 years ago
btma48 / AutoLA
Code of our Neurips2020 paper "Auto Learning Attention", coming soon
☆22 · Apr 14, 2021 · Updated 5 years ago
xyq7 / Human-Contribution-Measurement
☆13 · Jun 4, 2025 · Updated last year
pipilurj / bootstrapped-preference-optimization-BPO
code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆63 · Aug 23, 2024 · Updated last year
mever-team / visloc-estimation
Authors official PyTorch implementation of the "Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estim…
☆13 · Feb 28, 2024 · Updated 2 years ago
houqb / SeeNet
Self-Erasing Network for Integral Object Attention
☆54 · Nov 27, 2018 · Updated 7 years ago
NiaBie / FreeLive
Managed L2D tool libs. (In Dev)
☆14 · Apr 20, 2019 · Updated 7 years ago
ht014 / SG2HOI
☆12 · Sep 19, 2021 · Updated 4 years ago
WeihongLi-ac / Learning-to-impute
Learning to Impute: A General Framework for Semi-supervised Learning
☆19 · Sep 19, 2020 · Updated 5 years ago
jinpeng0528 / SEFE
☆13 · May 6, 2025 · Updated last year
NKU-MetautoAI / awesome-large-vision-language-models
Advances in recent large vision language models (LVLMs)
☆15 · Sep 23, 2024 · Updated last year
sdkfinance / sdk-finance-frontend
Legacy frontend repository associated with early SDK.finance implementations. Not under active development.
☆12 · Feb 25, 2026 · Updated 5 months ago
vivoCameraResearch / PPC-Official
Code for our NeurIPS25 paper "Photography Perspective Composition: Towards Aesthetic Perspective Recommendation"
☆39 · Mar 6, 2026 · Updated 4 months ago
zhongyingji / CVT-xRF
CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs (CVPR2024)
☆17 · Jun 14, 2024 · Updated 2 years ago
ChengyueGongR / PatchVisionTransformer
☆74 · Dec 8, 2022 · Updated 3 years ago
discord / cassandra-rs
Cassandra (CQL) driver for Rust, using the DataStax C/C++ driver under the covers.
☆13 · Jun 17, 2022 · Updated 4 years ago
CodeJjang / multiscale-attention-patch-matching
☆13 · Nov 26, 2023 · Updated 2 years ago
AdamRain / YFCC15M_downloader
A subset of YFCC100M. Tools, checking scripts and links of web drive to download datasets(uncompressed).
☆19 · Nov 13, 2024 · Updated last year
etrulls / slurm-gcp
Slurm on Google Cloud Platform
☆13 · Sep 21, 2020 · Updated 5 years ago
ShaelynZ / synergize-motion-appearance
[CVPR 2025] Official code for "Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation"
☆65 · Jul 3, 2026 · Updated 3 weeks ago
XiaRho / SEMat
☆55 · Jan 6, 2025 · Updated last year
qizhust / esceme
☆24 · Mar 9, 2023 · Updated 3 years ago
backseason / DFI
Code for our IEEE TIP 2020 paper "Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton"
☆52 · Dec 13, 2021 · Updated 4 years ago