clinicalml / onboarding_human_ai

Onboarding Humans to Work with AI: algorithms that find regions and describe them in natural language to show how humans should collaborate with AI (NeurIPS 2023)

☆12

Alternatives and similar repositories for onboarding_human_ai:

Users who are interested in onboarding_human_ai are comparing it to the repositories listed below

kyegomez / OmniByteFormer
OmniByteFormer is a generalized Transformer model that can process any type of data by converting it into byte sequences, bypassing tradi…
☆10 Updated this week
wgcban / apt
PyTorch Implementation of Attention Prompt Tuning: Parameter-Efficient Adaptation of Pre-Trained Models for Action Recognition
☆14 Updated last year
EternityYW / Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆36 Updated last year
apple / ml-mia-bench
This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
☆30 Updated last month
kyegomez / MC-ViT
Implementation of the MC-ViT model from the paper: "Memory Consolidation Enables Long-Context Video Understanding"
☆21 Updated 3 weeks ago
Netflix / clove
☆13 Updated 7 months ago
Huage001 / Paint-Anything
An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting.
☆35 Updated 2 years ago
kyegomez / LIMoE
Implementation of "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with…
☆29 Updated 2 weeks ago
TencentARC / Plot2Code
☆19 Updated 8 months ago
microsoft / klite
[NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222
☆51 Updated last year
kyegomez / EXA-1
An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries!
☆42 Updated last year
langchain-ai / multi-modal-code-agent
☆14 Updated last year
apple / ml-ogen
☆13 Updated last year
shulin16 / MMInA
Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"
☆42 Updated last month
microsoft / MageBench
Official Repo for MageBench: Bridging Large Multimodal Models to Agents
☆21 Updated 3 months ago
kyegomez / forest-of-thoughts
A forest of autonomous agents.
☆19 Updated 2 months ago
TIGER-AI-Lab / VisualWebInstruct
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"
☆24 Updated last month
showlab / Show-Anything-3D
Edit and Generate Anything in 3D world!
☆13 Updated 2 years ago
NVlabs / STL
Official PyTorch Implementation of Self-emerging Token Labeling
☆33 Updated last year
01yzzyu / wikiautogen
☆14 Updated last month
FactoDeepLearning / MultitaskVLFM
☆24 Updated last year
PirateforFreedom / TypeAgent
Luann allows you to create an LLM agent, which has a complete memory module (long-term memory, short-term memory) and knowledge module (Variou…
☆21 Updated last month
kyegomez / dev-swarm
A swarm of LLM agents that will help you test, document, and productionize your code!
☆15 Updated last week
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42 Updated 8 months ago
cyzus / thoughtsculpt
☆13 Updated 4 months ago
NExT-GPT / NExT-GPT.github.io
NExT-GPT: Any-to-Any Multimodal Large Language Model
☆19 Updated 5 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆35 Updated 10 months ago
KastanDay / video-pretrained-transformer
Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra…
☆53 Updated 2 years ago
showlab / assistgpt
☆66 Updated last year
om-ai-lab / ZoomEye
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆30 Updated 3 months ago