tianruochen / MultimodalVideoTag
Multimodal video classification model
☆21 Updated 2 years ago
Alternatives and similar repositories for MultimodalVideoTag
Users that are interested in MultimodalVideoTag are comparing it to the libraries listed below
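- A video captioning deep learning model implemented in PyTorch with the Transformer architecture. Video captioning takes a video as input and outputs one sentence describing the content of the whole video (assuming the video is short and can be described in one sentence). The main purpose of this repo is to help visually impaired… ☆92 Updated 3 years ago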
- Tiny Kinetics-400 for test ☆92 Updated last year
- Papers, codes collection of video summarization / video highlight detection / video key frame selection ☆36 Updated 4 years ago
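- A multimodal classification task implemented in PyTorch. The text and image branches extract features with BERT and ResNet respectively (multiple models can be combined in the config), and it achieves good performance on my small-scale dataset (96% validation accuracy) ☆79 Updated 2 years ago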
- An image captioning model based on ClipCap ☆306 Updated 3 years ago
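- Object detection algorithms fall into two main categories, two-stage and one-stage. Two-stage detectors split detection into two phases: first generating region proposals, then classifying those proposals (usually refining their positions as well). Typical representatives of this category are R-CNN… ☆13 Updated 3 years ago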
- Frames Extraction With OpenCV and Python ☆15 Updated 4 years ago
- Face recognition, fine-grained facial expression recognition, and abnormal behavior detection and recognition ☆11 Updated 3 years ago
- Efficient dual attention SlowFast networks for video action recognition ☆24 Updated 3 years ago
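- A complete PyTorch codebase for image classification: training, prediction, TTA, model ensembling, model deployment, CNN feature extraction, classification with SVM or random forest, and model distillation ☆31 Updated 4 years ago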
- Multimodal short video classification task, integrating video, image, audio and text modes for short video classification ☆19 Updated 5 years ago
- Internet image-text matching based on multimodal retrieval ☆14 Updated last year
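- A dynamic data dashboard built with django + pyecharts + PP-Human, with practical features such as collecting and storing foot-traffic data, alerts for events such as fighting and falling, and mask detection. The edge version uses ONNX inference for efficiency, and the server version supports pushing and pulling video streams ☆32 Updated 2 years ago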
- Action_Recognition_Surveillance, for human fall-down action recognition and helmet, smoke, and cell-phone detection. ☆18 Updated 2 years ago
- ☆85 Updated 4 years ago
- (TIP'2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information ☆29 Updated 6 months ago
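- Uses YOLOv3 for person detection, then SPPE (AlphaPose) for skeleton extraction, then ST-GCN over 30 consecutive frames for action recognition. Recognizes standing, walking, and falling. In ./models of https://github.com/GajuuzZ/Human-Falling-Detect-Tracks … ☆72 Updated 4 years ago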
- ☆115 Updated 2 years ago
- Video classification annotation and video spatio-temporal annotation ☆42 Updated last year
- In the consideration that the college student behavior dataset is scarce, we set up a novel college students’ action dataset in the class… ☆18 Updated 2 years ago
- Implementation of ViViT: A Video Vision Transformer - Zipping Coding Challenge ☆32 Updated 4 years ago
- Solution for both tracks (fall detection and crowd density counting) of the targeted competition in the DALC Hualu Cup ☆27 Updated 3 years ago
- This project aims to retrieve images that match an input text description. ☆40 Updated last year
- A general-purpose image classification template; ranked 3/688 in the Tianchi / CVPR AliProducts Challenge ☆86 Updated 4 years ago
- Workshop on Foundation Model 1st foundation model challenge Track1 codebase (Open TransMind v1.0) ☆18 Updated 2 years ago
- [AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition ☆63 Updated 6 months ago
- Multimodal fusion sentiment analysis ☆131 Updated 5 years ago
- Code for the paper: "Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM" ☆61 Updated last year
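- China Software Cup baseline: single/multi-camera pedestrian tracking based on Baidu PaddlePaddle, developed with the PP-YOLO + Sort algorithm from the Baidu PaddlePaddle PaddleDetection toolkit ☆56 Updated 2 years ago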
- Multimodal sentiment analysis: multiple fusion methods based on BERT + ResNet ☆314 Updated 2 years ago