malaysia-ai / dataset
Recipes to prepare datasets!
☆15Updated last week
Alternatives and similar repositories for dataset
Users who are interested in dataset are comparing it to the libraries listed below.
- ☆28Updated 2 years ago
- ☆17Updated 4 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs…☆32Updated 4 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.☆32Updated 3 months ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages☆13Updated 3 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models☆33Updated 3 years ago
- Finetune LayoutLM on SROIE dataset using W&B tools☆19Updated 4 years ago
- Deploy DL/ML inference pipelines with minimal extra code.☆102Updated last year
- A collection of building blocks for fine-tunable metric learning models☆35Updated 2 months ago
- Repo for "TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets" at SDU@AAAI-22☆14Updated 2 years ago
- Implementation of the DocLLM paper for Llama models.☆13Updated 8 months ago
- ☆20Updated 4 years ago
- Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages☆11Updated 2 years ago
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training☆74Updated 2 months ago
- ☆33Updated 6 years ago
- Evaluation of the LayoutLM model on the CORD dataset☆32Updated 3 years ago
- A series of notebooks demonstrating how to build simple NLP web apps with Gradio and Hugging Face transformers☆44Updated 2 months ago
- All my experiments with the various transformers and various transformer frameworks available☆14Updated 4 years ago
- NLP Examples using the 🤗 libraries☆40Updated 4 years ago
- Using short models to classify long texts☆21Updated 2 years ago
- The search for the best Conversational AI pipeline☆14Updated 5 years ago
- minimal scripts for 24GB VRAM GPUs. training, inference, whatever☆50Updated last month
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te…☆44Updated last year
- Check for data drift between two OpenAI multi-turn chat jsonl files.☆39Updated last year
- [Computer Speech & Language] A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages☆14Updated last year
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod…☆15Updated 2 years ago
- Document Classification and Post-OCR Key Value Extraction☆63Updated 6 years ago
- Final training script from the HuggingFace Whisper fine-tuning event, for getting the best results from a fine-tuned model.☆12Updated 2 years ago
- Cross-lingual learning in scene text recognition (ICASSP2024)☆18Updated last year
- PyLate efficient inference engine☆68Updated 3 months ago