forbes110 / PLEDGE--Paragraph-LEvel-image-Description-GEneration
Applies an end-to-end model (ViT + GPT) to describe images in paragraph-level detail, in contrast to traditional image captioning, which yields only object detections or a few simple sentences.
☆11Updated 8 months ago
Alternatives and similar repositories for PLEDGE--Paragraph-LEvel-image-Description-GEneration
Users who are interested in PLEDGE--Paragraph-LEvel-image-Description-GEneration are comparing it to the repositories listed below
- This project predicts wind turbine failure from the data of numerous sensors by applying classification-based ML models that improve prediction…☆10Updated 2 years ago
- I created some notebooks about different concepts of financial engineering☆10Updated last week
- A version of the Temporal Fusion Transformer in TF2 that is lightweight, utilizes Keras layers, and ultimately readable and modifiable.☆17Updated last year
- ☆15Updated 2 years ago
- TrustAi website☆12Updated last year
- Optimal Planning for NTU YouBike Assignment with Operations Research and Machine Learning Techniques☆10Updated last year
- A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features☆10Updated 3 years ago
- ☆10Updated 3 years ago
- Code and Data for M3A: Multimodal Multi-speaker Mergers & Acquisitions at ACL-IJCNLP 2021 (main)☆15Updated 4 years ago
- ☆19Updated 4 years ago
- A curated list of resources in audio-visual question answering and related areas. :-)☆14Updated 3 months ago
- The official code of "Parameter-Efficient Learning for Text-to-Speech Accent Adaptation"☆13Updated 2 years ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a…☆12Updated 2 years ago
- GiMeFive: Towards Interpretable Facial Emotion Classification 😄😲😭😡🤢😨 (PyTorch Implementation)☆15Updated last year
- SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings☆14Updated last year
- This repository relates to the paper "Measuring Financial Time Series Similarity With a View to Identifying Profitable Stock Market Oppor…☆21Updated 4 years ago
- Run Retrieval-based Voice Conversion training and inference with ease.☆11Updated 8 months ago
- ☆14Updated 2 years ago
- An implementation of the paper titled "Arabic Speech Emotion Recognition Employing Wav2vec2.0 and HuBERT Based on BAVED Dataset" https://…☆12Updated 3 years ago
- ☆14Updated 2 years ago
- [InterSpeech'2023] "Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion"☆13Updated last year
- Diffusion Model for Voice Conversion☆17Updated 2 years ago
- This repo contains the code for "Voice Disorder Analysis: A Transformer-based Approach", accepted at Interspeech 2024☆12Updated last year
- Cantonese Selfish Project 廣東話自肥企劃 at PYCON HK 2021☆15Updated 3 years ago
- Official PyTorch Implementation for Continual Learning For On-Device Environmental Sound Classification☆14Updated 3 years ago
- This is a repository dedicated to pre-trained acoustic models of Hong Kong Cantonese and Cantonese forced alignment.☆17Updated 10 months ago
- (NeurIPS 2023 Workshop on DGM4H) Official Implementation of "Adversarial Fine-tuning using Generated Respiratory Sound to Address Class I…☆19Updated 10 months ago
- Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"☆26Updated 2 years ago
- Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.☆10Updated 2 years ago
- The Land-Diffuser is a novel application of the Denoising Diffusion Probabilistic Model (DDPM) in the realm of 3D Talking Head generation…☆13Updated last year