malaysia-ai / dataset
Recipes to prepare datasets!
☆14Updated 3 weeks ago
Alternatives and similar repositories for dataset
Users that are interested in dataset are comparing it to the libraries listed below
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages☆13Updated 3 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models☆33Updated 3 years ago
- ☆28Updated 2 years ago
- Deploy DL/ ML inference pipelines with minimal extra code.☆100Updated 11 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.☆33Updated last month
- Comparing PyTorch, JIT and ONNX for inference with Transformers☆20Updated 4 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs…☆32Updated 4 years ago
- All my experiments with the various transformers and various transformer frameworks available☆14Updated 4 years ago
- Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages☆11Updated 2 years ago
- NLP Examples using the 🤗 libraries☆40Updated 4 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod…☆15Updated 2 years ago
- ☆33Updated 6 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…☆35Updated 2 years ago
- Exploring NLP weak supervision approaches to train text classification models. The project is also a prototype for a semi-automated text …☆22Updated last year
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir…☆58Updated last year
- The search for the best Conversational AI pipeline☆14Updated 5 years ago
- Open Source Speech Inferencing Library for Indic Languages☆13Updated 3 years ago
- Using short models to classify long texts☆21Updated 2 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.☆96Updated 2 years ago
- Large Scale BERT Distillation☆33Updated 2 years ago
- Multi-Modal Language Modeling with Image, Audio and Text Integration, including multiple images and multiple audio inputs in a single multi-turn conversation.☆18Updated last year
- Implementation of the DocLLM paper for Llama models.☆13Updated 7 months ago
- minimal scripts for 24GB VRAM GPUs. training, inference, whatever☆48Updated last week
- ☆14Updated last year
- ☆15Updated last year
- BERT Probe: A python package for probing attention based robustness to character and word based adversarial evaluation. Also, with recipe…☆18Updated 3 years ago
- Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training☆73Updated 3 weeks ago
- Using open-source LLM Llama2 by Meta on local CPU inference for document question-and-answer☆15Updated 2 years ago
- Finetune multiple pre-trained Transformer-based models to solve Vietnamese Fake News Detection problem (ReINTEL) in VLSP2020 shared task☆18Updated 4 years ago
- Check for data drift between two OpenAI multi-turn chat jsonl files.☆38Updated last year