AI-confused / arxiv_auto_crawler
Automatic crawler for arXiv data
☆16Updated 3 years ago
Alternatives and similar repositories for arxiv_auto_crawler
Users who are interested in arxiv_auto_crawler are comparing it to the repositories listed below
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)☆189Updated 2 years ago
- [ACM MM 2022 Oral] This is the official implementation of "SER30K: A Large-Scale Dataset for Sticker Emotion Recognition"☆29Updated 3 years ago
- WuDaoMM: a data project☆74Updated 3 years ago
- A light-weight script for maintaining a LOT of machine learning experiments.☆92Updated 3 years ago
- ☆70Updated 7 months ago
- ☆72Updated 2 years ago
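- PyTorch model-training code for single precision, half precision, mixed precision, single GPU, multi-GPU (DP / DDP), FSDP, and DeepSpeed, comparing the training speed and GPU memory usage of the different methods☆129Updated last year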
- Keras implementation of Finite Scalar Quantization☆83Updated 2 years ago
- Research Code for Multimodal-Cognition Team in Ant Group☆171Updated 2 months ago
- The official GitHub page for the survey paper "Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey". And this paper is unde…☆76Updated 5 months ago
- ☆64Updated 7 months ago
- ☆168Updated 2 years ago
- ☆10Updated 2 years ago
- ☆42Updated 11 months ago
- Narrative movie understanding benchmark☆76Updated 7 months ago
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks☆301Updated 2 years ago
- Video dataset dedicated to portrait-mode video recognition.☆55Updated 2 months ago
- ☆37Updated last year
- ☆36Updated 2 months ago
- ☆22Updated 2 months ago
- [MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models☆291Updated 5 months ago
- A convenient script for grabbing GPUs☆391Updated 3 weeks ago
- Lion: Kindling Vision Intelligence within Large Language Models☆51Updated last year
- Product1M☆90Updated 3 years ago
- ☆118Updated 2 years ago
- 😎 A simple and easy-to-use toolkit for GPU scheduling.☆45Updated 8 months ago
- Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks☆209Updated 11 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆279Updated last year
- LLaVA combined with the MAGVIT image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation.☆39Updated last year
- official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"☆39Updated 6 months ago