2tianyao1 / ActionLLM
Multimodal Large Models Are Effective Action Anticipators (IEEE TMM)
☆24Updated 4 months ago
Alternatives and similar repositories for ActionLLM
Users who are interested in ActionLLM are comparing it to the repositories listed below.
- Official repository of the GraSP dataset and implementation of TAPIS☆45Updated 11 months ago
- [ECCV 2024] Official Implementation of "OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding"☆59Updated 5 months ago
- ☆17Updated 2 months ago
- ☆25Updated last year
- [MICCAI 2024] Surgformer: Surgical Transformer with Hierarchical Temporal Attention for Surgical Phase Recognition☆40Updated 3 months ago
- Long Surgical Phase Recognition☆23Updated last year
- ☆36Updated 8 months ago
- ☆13Updated 3 years ago
- ☆38Updated last month
- Official Code for "Large-scale Self-supervised Video Foundation Model for Intelligent Surgery"☆25Updated 6 months ago
- Papers and Public Datasets for Medical Vision-Language Learning☆19Updated 2 years ago
- [CVPR 2024 Oral] MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation.☆181Updated last year
- MICCAI 2022: Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions☆12Updated 3 years ago
- ☆16Updated 4 years ago
- [TMI'2021] Temporal Memory Relation Network for Workflow Recognition from Surgical Video☆65Updated 3 years ago
- Official implementation of "Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Ro…☆24Updated last year
- Official Code for Contrastive Learning with Counterfactual Explanations for Radiology Report Generation (ECCV 2024)☆14Updated 8 months ago
- Compilations of surgery-related tasks, datasets, and papers.☆126Updated last month
- ☆43Updated 2 weeks ago
- [TMI'22] Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation☆23Updated 2 years ago
- Official implementation of “CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surg…☆17Updated last year
- PyTorch implementation of the MICCAI 2020 paper ISINet: An Instance-Based Approach for Surgical Instrument Segmentation.☆24Updated 4 years ago
- Surgical Visual Question Answering. A transformer-based surgical VQA model. Official Implementation of "Surgical-VQA: Visual Question Answ…☆60Updated 2 years ago
- [CVPR 2024] Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation☆28Updated 2 months ago
- ☆90Updated 3 years ago
- ☆46Updated 6 months ago
- CV codes from CIAM Group at SUSTech, Shenzhen, China☆11Updated last year
- Multi-Aspect Vision Language Pretraining - CVPR2024☆85Updated last year
- Reading list and publicly available datasets for surgical vision☆37Updated 4 years ago
- Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding (ICLR 2025)☆111Updated 8 months ago