WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
☆36Jun 10, 2025Updated 8 months ago
Alternatives and similar repositories for WeThink
Users who are interested in WeThink are comparing it to the repositories listed below
- [SIGGRAPH Asia 2025] Official Implementation of "ConsistEdit: Highly Consistent and Precise Training-free Visual Editing"☆69Dec 2, 2025Updated 3 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Jan 1, 2026Updated 2 months ago
- [CVPR Challenge Rank 2nd] The codes and related files to reproduce the results for Video Similarity Challenge Descriptor Track.☆20Apr 15, 2025Updated 10 months ago
- Implementation and checkpoints of Imagen, Google's text-to-image synthesis neural network, in Pytorch☆17Dec 22, 2022Updated 3 years ago
- [CIKM-2024] Official code for work "ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance"☆19Aug 14, 2024Updated last year
- ☆34Jul 8, 2025Updated 7 months ago
- [ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding☆73Jun 26, 2025Updated 8 months ago
- [TPAMI 2024] Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding☆29Sep 11, 2024Updated last year
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆32Nov 22, 2025Updated 3 months ago
- [Arxiv 2024] MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms☆14Dec 1, 2024Updated last year
- [CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection☆29Sep 26, 2024Updated last year
- [ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant☆246Aug 14, 2024Updated last year
- Find strongest response of convolutional layers on an image dataset. Automatically compute receptive field for any CNN layer.☆14Feb 19, 2021Updated 5 years ago
- ☆11Dec 6, 2024Updated last year
- Infrared and visible light image fusion☆10Apr 17, 2019Updated 6 years ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆19Jul 10, 2025Updated 7 months ago
- build vgg16 with pytorch 0.4.0 for classification of CIFAR datasets☆10Mar 31, 2019Updated 6 years ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆142Jun 30, 2025Updated 8 months ago
- OpenTMA: support text-motion alignment for HumanML3D, Motion-X, and UniMoCap☆46May 22, 2024Updated last year
- ☆10Nov 25, 2020Updated 5 years ago
- UMB: Understanding Model Behavior for Open-World object Detection (NeurIPS 2024)☆11May 26, 2024Updated last year
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation☆30Dec 22, 2025Updated 2 months ago
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval☆21Jun 23, 2025Updated 8 months ago
- Official implementation of our CVPR'22 paper.☆13Nov 18, 2022Updated 3 years ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 7 months ago
- ALAS: Autonomous Learning Agent System☆15Aug 14, 2025Updated 6 months ago
- Python bindings for NVIDIA CUDA APIs.☆13Mar 2, 2024Updated 2 years ago
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding☆19Mar 2, 2025Updated last year
- ☆26Oct 16, 2025Updated 4 months ago
- ☆13Jun 21, 2025Updated 8 months ago
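- This project provides XLNet pre-trained models for Chinese, aiming to enrich Chinese natural language processing resources and offer a diverse selection of Chinese pre-trained models. We welcome experts and scholars to download and use them, and to help promote and develop Chinese language resources together.☆11May 30, 2023Updated 2 years ago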
- codes for ICML2021 paper iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients☆10May 27, 2021Updated 4 years ago
- This is an official implementation of our CVPR 2020 paper "Non-Local Neural Networks With Grouped Bilinear Attentional Transforms".☆12Jan 30, 2021Updated 5 years ago
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.☆11Jul 16, 2024Updated last year
- Some commonly used functions and modules☆10Jan 15, 2024Updated 2 years ago
- Ling-Coder-Lite is a MoE LLM provided and open-sourced by CodeFuse and InclusionAI.☆14Apr 22, 2025Updated 10 months ago
- The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin…☆10Feb 9, 2025Updated last year
- Code for reproducing our paper: LMSOC: An Approach for Socially Sensitive Pretraining☆13Oct 22, 2021Updated 4 years ago
- A PyTorch implementation of ResNet-preact☆11Aug 5, 2019Updated 6 years ago