forbes110 / PLEDGE--Paragraph-LEvel-image-Description-GEneration
Applies an end-to-end model structure (ViT + GPT) to describe images in more detail than traditional image captioning, which only provides object detections or a few simple sentences.
☆11Updated 6 months ago
Alternatives and similar repositories for PLEDGE--Paragraph-LEvel-image-Description-GEneration
Users that are interested in PLEDGE--Paragraph-LEvel-image-Description-GEneration are comparing it to the libraries listed below
- A version of the Temporal Fusion Transformer in TF2 that is lightweight, utilizes Keras layers, and ultimately readable and modifiable.☆15Updated 11 months ago
- Information Retrieval project.☆9Updated 3 years ago
- ☆13Updated 2 years ago
- Simple, Unified Repository for Retrieval-based Voice Conversion☆17Updated last year
- Optimal Planning for NTU YouBike Assignment with Operations Research and Machine Learning Techniques☆10Updated 10 months ago
- WindTurbineHighSpeedBearingPrognosis-Data☆10Updated 4 years ago
- In this project, I used a Decision Tree learning model as the main algorithm to build the model. Due to the large amount of flight data, we i…☆12Updated 3 years ago
- I created some notebooks about different concepts of financial engineering☆10Updated 4 months ago
- ⚡ From Zero to Monitoring LLMs in 5 minutes ⚡☆6Updated last year
- A Streamlit-based web app that aims to convert voices into different languages (Hindi to English, for now), keeping the voice in…☆8Updated 3 years ago
- A curated list of resources in audio-visual question answering and related areas. :-)☆10Updated 3 weeks ago
- Kinship Face Synthesis☆4Updated 2 years ago
- A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features☆11Updated 2 years ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a…☆12Updated 2 years ago
- ☆14Updated last year
- Indic-Conformer models for ASR☆17Updated last year
- Official implementation of "Confidence-Calibrated Face and Kinship Verification"☆21Updated last year
- Code and Data for M3A: Multimodal Multi-speaker Mergers & Acquisitions at ACL-IJCNLP 2021 (main)☆15Updated 4 years ago
- Monetize.ai is a web-based chatbot that provides personalized investment advice using GPT-3.5 and Yahoo Finance API. It's built using Fla…☆15Updated 2 years ago
- This is a repository dedicated to pre-trained acoustic models of Hong Kong Cantonese and Cantonese forced alignment.☆15Updated 8 months ago
- This repo contains the code for "Voice Disorder Analysis: A Transformer-based Approach", accepted at Interspeech 2024☆10Updated last year
- Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.☆10Updated last year
- Dataset accompanying the paper titled "Pothole detection and dimension estimation system using deep learning (YOLO) and image processing"☆11Updated 2 years ago
- SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings☆14Updated last year
- A complete end-to-end Deep Learning system to generate high-quality, human-like speech in English for Korean Drama (WIP)☆13Updated 2 years ago
- Cantonese Selfish Project 廣東話自肥企劃 at PYCON HK 2021☆15Updated 3 years ago
- Bounding and Filling: A Fast and Flexible Framework for Image Captioning☆9Updated last year
- ☆18Updated 4 years ago
- The official code of "Parameter-Efficient Learning for Text-to-Speech Accent Adaptation"☆13Updated last year
- A Python neural network made with TensorFlow that converts one person's voice into another.☆10Updated 4 years ago