forbes110 / PLEDGE--Paragraph-LEvel-image-Description-GEneration
Applies an end-to-end model (ViT + GPT) to describe images in paragraph-level detail, unlike traditional image captioning, which provides only object detections or a few simple sentences.
☆11Updated 8 months ago
Alternatives and similar repositories for PLEDGE--Paragraph-LEvel-image-Description-GEneration
Users that are interested in PLEDGE--Paragraph-LEvel-image-Description-GEneration are comparing it to the libraries listed below
- A version of the Temporal Fusion Transformer in TF2 that is lightweight, utilizes Keras layers, and ultimately readable and modifiable.☆16Updated last year
- ☆14Updated 2 years ago
- Optimal Planning for NTU YouBike Assignment with Operations Research and Machine Learning Techniques☆10Updated last year
- TrustAi website☆12Updated last year
- Simple, Unified Repository for Retrieval-based Voice Conversion☆17Updated last year
- I created some notebooks about different concepts of financial engineering☆10Updated 6 months ago
- A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features☆10Updated 2 years ago
- Apply pre-trained models to help quickly grasp investment news, including three tasks: 1. summarization, 2. sentiment analysis, 3. domain …☆13Updated last year
- A curated list of resources in audio visual question answering and related areas. :-)☆12Updated 2 months ago
- This repo contains the code for "Voice Disorder Analysis: A Transformer-based Approach", accepted at Interspeech 2024☆12Updated last year
- The official code of "Parameter-Efficient Learning for Text-to-Speech Accent Adaptation"☆13Updated 2 years ago
- GiMeFive: Towards Interpretable Facial Emotion Classification 😄😲😭😡🤢😨 (PyTorch Implementation)☆15Updated last year
- ☆19Updated 4 years ago
- ☆14Updated 2 years ago
- An implementation of the paper titled "Arabic Speech Emotion Recognition Employing Wav2vec2.0 and HuBERT Based on BAVED Dataset" https://…☆12Updated 3 years ago
- We archive data because we are interested in the diffs. All data is from https://video-api.cartoonnetwork.com. We run the check every min…☆10Updated this week
- Code for "Learning an adaptation function to assess image visual similarities", ICIP'21☆10Updated 2 years ago
- Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022☆11Updated 5 months ago
- Run Retrieval-based Voice Conversion training and inference with ease.☆11Updated 7 months ago
- SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings☆14Updated last year
- Deep Learning Methods for Identifying Human Postures from Hip-Worn Accelerometer Data☆11Updated 6 months ago
- Repository for "Zero is Not Hero Yet: Benchmarking Zero-Shot Performance of LLMs for Financial Tasks"☆24Updated 2 years ago
- ☆11Updated 4 years ago
- Inference model for stock prediction using Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting☆13Updated last year
- Official PyTorch Implementation for Continual Learning For On-Device Environmental Sound Classification☆14Updated 3 years ago
- Scripts, data, and research related to cow weight and breed prediction☆12Updated 3 weeks ago
- Diffusion Model for Voice Conversion☆17Updated 2 years ago
- ☆10Updated last year
- Cantonese Selfish Project 廣東話自肥企劃 at PYCON HK 2021☆15Updated 3 years ago
- Indic-Conformer models for ASR☆18Updated last year