Bhashini-IITJ / IndicPhotoOCR
Comprehensive Scene Text Recognition Toolkit across 11 Indian Languages
☆37Updated last week
Alternatives and similar repositories for IndicPhotoOCR
Users that are interested in IndicPhotoOCR are comparing it to the libraries listed below
- Implementation of Baseline for Scene Text-to-Scene Text Translation☆18Updated 8 months ago
- This project predicts wind turbine failure using data from numerous sensors by applying classification-based ML models that improve prediction…☆10Updated 2 years ago
- An open-source framework designed to adapt pre-trained Language Models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of dom…☆22Updated last year
- Indic-Conformer models for ASR☆20Updated last year
- Apply an end-to-end model structure (ViT + GPT) to describe images in more detail, rather than traditional image captioning that only pro…☆11Updated 11 months ago
- Scripts, data, and research related to cow weight and breed prediction☆13Updated 3 months ago
- Sample and Computation Redistribution for Efficient Face Detection☆15Updated last year
- Simple, Unified Repository for Retrieval-based Voice Conversion☆17Updated last year
- Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022☆11Updated 8 months ago
- CalBERT - Code-mixed Adaptive Language representations using BERT, published at AAAI-MAKE 2022☆13Updated last year
- This repository is a comprehensive project that leverages the XLM-Roberta model for intent detection. This repository is a valuable resou…☆16Updated last year
- An implementation of the paper titled "Arabic Speech Emotion Recognition Employing Wav2vec2.0 and HuBERT Based on BAVED Dataset" https://…☆13Updated 3 years ago
- This repo contains the code for "Voice Disorder Analysis: A Transformer-based Approach", accepted at Interspeech 2024☆13Updated last year
- The Land-Diffuser is a novel application of the Denoising Diffusion Probabilistic Model (DDPM) in the realm of 3D Talking Head generation…☆13Updated last year
- Faceprecision is a comprehensive face analysis project leveraging advanced deep learning and computer vision techniques. This project inc…☆14Updated last year
- The main objective of this experiment is to detect blur in face pictures to improve the results of the face recognition process.☆14Updated 2 years ago
- ☆16Updated last year
- A curated list of resources in audio-visual question answering and related areas. :-)☆16Updated 5 months ago
- GiMeFive: Towards Interpretable Facial Emotion Classification 😄😲😭😡🤢😨 (PyTorch Implementation)☆16Updated last year
- With one whole audio and corresponding text, the audio can be split line by line and saved with exact sentence using comparison with the …☆10Updated 6 years ago
- Mission to create a Hebrew TTS model as powerful and user-friendly as WaveNet☆38Updated 11 months ago
- A Machine Learning tool to create the training dataset very quickly & easily by using a smart chrome extension☆14Updated 2 years ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…☆84Updated 4 months ago
- This repository is the implementation of the paper, "Score-balanced Loss for Multi-aspect Pronunciation Assessment" (Interspeech 2023).☆22Updated last year
- Generate interleaved text and image content in a structured format you can directly pass to downstream APIs.☆29Updated last year
- [ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding☆60Updated 6 months ago
- A pipeline for transliteration, spell correction, POS tagging and word sense disambiguation of Hinglish code mixed data to Hindi Devanaga…☆36Updated last year
- POC for creating an AI Video Editor for Content Creators with Various Capabilities.☆11Updated 10 months ago
- ML algorithms implementations that are good for learning the underlying principles☆26Updated last year
- Wav2vec2 Large XLSR 53 fine-tuned for Malayalam☆11Updated 4 years ago