mbzuai-oryx / AIN
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.
☆51 · Mar 13, 2025 · Updated 11 months ago
Alternatives and similar repositories for AIN
Users who are interested in AIN are comparing it to the repositories listed below.
- [ACL 2025 🔥] Time Travel is a Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts ☆18 · May 22, 2025 · Updated 8 months ago
- VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos ☆22 · Jan 26, 2026 · Updated 2 weeks ago
- [MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology ☆12 · Jun 17, 2025 · Updated 7 months ago
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo… ☆50 · Aug 23, 2024 · Updated last year
- [ACCV 2024] ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes ☆37 · Jan 21, 2025 · Updated last year
- [ICCVW 2025 (Oral)] Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models ☆28 · Oct 20, 2025 · Updated 3 months ago
- [BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models ☆15 · Nov 1, 2024 · Updated last year
- [MICCAI 2023][Early Accept] Official code repository of paper titled "Cross-modulated Few-shot Image Generation for Colorectal Tissue Cla… ☆47 · Sep 28, 2023 · Updated 2 years ago
- Official repository of paper titled "D3Former: Debiased Dual Distilled Transformer for Incremental Learning". ☆25 · Jul 10, 2023 · Updated 2 years ago
- [CVPRW 2025] Official repository of paper titled "Towards Evaluating the Robustness of Visual State Space Models" ☆25 · Jun 8, 2025 · Updated 8 months ago
- Reinforcement Training of Robot ☆11 · Dec 1, 2019 · Updated 6 years ago
- A new multi-task learning framework using Vision Transformers ☆11 · Jun 19, 2024 · Updated last year
- ☆70 · Jul 2, 2025 · Updated 7 months ago
- Composed Video Retrieval ☆62 · May 2, 2024 · Updated last year
- [NeurIPS2023] 3D-OWIS is capable of detecting unknown instances in inference, and progressively learning novel classes in the process of … ☆68 · Dec 3, 2023 · Updated 2 years ago
- ☆18 · Sep 23, 2024 · Updated last year
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆20 · Jan 11, 2026 · Updated last month
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ☆96 · Apr 14, 2025 · Updated 10 months ago
- [MICCAI 2023] Official code repository of paper titled "Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation"… ☆52 · Nov 14, 2023 · Updated 2 years ago
- Official repository of paper titled "UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalitie… ☆154 · Jan 19, 2026 · Updated 3 weeks ago
- [ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding ☆60 · May 24, 2025 · Updated 8 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆36 · Nov 27, 2025 · Updated 2 months ago
- RayGen: Multi-Modal Dataset Reinforcement for MobileCLIP and MobileCLIP2 ☆38 · Aug 29, 2025 · Updated 5 months ago
- Bilingual Medical Mixture of Experts LLM ☆32 · Nov 23, 2024 · Updated last year
- Context-Sensitive Neural Spelling Checker ☆20 · Sep 25, 2024 · Updated last year
- [BMVC 2025] Official Implementation of the paper "PerSense: Personalized Instance Segmentation in Dense Images" ☆28 · Dec 18, 2025 · Updated last month
- Official code repository of paper titled "Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Visio… ☆31 · May 11, 2025 · Updated 9 months ago
- Language Grounded Single Source Domain Generalization in Medical Image Segmentation [ISBI2024] ☆32 · Oct 27, 2024 · Updated last year
- [ECCV 2024] Soft Prompt Generation for Domain Generalization ☆31 · Oct 1, 2024 · Updated last year
- ☆31 · Oct 2, 2025 · Updated 4 months ago
- [AAAI'25, CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models". ☆121 · Dec 17, 2024 · Updated last year
- Code for "Enhancing In-context Learning via Linear Probe Calibration" ☆37 · Apr 24, 2024 · Updated last year
- [WACV 2025] Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models" ☆84 · Jan 2, 2026 · Updated last month
- Source code for MICCAI 2022 paper entitled: 'Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification' ☆37 · Jan 13, 2023 · Updated 3 years ago
- All the details for our UAVs Jamming Detection project ☆11 · Feb 20, 2023 · Updated 2 years ago
- Original VinVL visual backbone with simplified APIs to easily extract features, boxes, object detections, in a few lines of Python code. ☆11 · Nov 27, 2022 · Updated 3 years ago
- [CVPR 2025] Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation ☆19 · Dec 18, 2025 · Updated last month
- Official implementation of the paper "LTrack: Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Rep… ☆12 · Jul 26, 2023 · Updated 2 years ago
- Interview questions asked in Data Science/ Machine Learning interviews ☆19 · Jan 15, 2020 · Updated 6 years ago