Tools for managing datasets for governance and training.
☆90 · May 25, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for data_tooling
Users that are interested in data_tooling are comparing it to the libraries listed below.
- Code used for sourcing and cleaning the BigScience ROOTS corpus ☆318 · Mar 20, 2023 · Updated 3 years ago
- Personal information identification standard ☆21 · Jan 24, 2024 · Updated 2 years ago
- All-in-one text de-duplication ☆760 · Mar 9, 2026 · Updated 3 months ago
- Recent experiments with the MLP-Mixer model on sentiment recognition (sentiment analysis) ☆13 · Jul 9, 2021 · Updated 4 years ago
- Library for fast text representation and classification. ☆31 · Jan 9, 2024 · Updated 2 years ago
- Code and Data for Evaluation WG ☆42 · May 4, 2022 · Updated 4 years ago
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ☆1,017 · Jul 29, 2024 · Updated last year
- A ducttape workflow for neural machine translation ☆14 · Mar 23, 2021 · Updated 5 years ago
- Multilingual BERT retrained on news + SQuAD2 for Vietnamese ☆24 · Feb 16, 2020 · Updated 6 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆74 · Mar 2, 2024 · Updated 2 years ago
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment ☆17 · Jan 16, 2025 · Updated last year
- ☆78 · Dec 7, 2023 · Updated 2 years ago
- Pre-training script for BART in JAX/Flax ☆38 · Aug 4, 2022 · Updated 3 years ago
- SemEval-2021 Multilingual and Cross-lingual Word-in-Context Task ☆18 · May 27, 2021 · Updated 5 years ago
- CMU Linguistic Annotation Backend ☆15 · Sep 22, 2025 · Updated 8 months ago
- Cheong Wa Dae (Blue House) national petition data archive ☆15 · Aug 29, 2020 · Updated 5 years ago
- Pipeline for pulling and processing online language model pretraining data from the web ☆179 · Jul 31, 2023 · Updated 2 years ago
- Viewer for text datasets in formats like HuggingFace, JSONL, etc. ☆15 · Feb 25, 2025 · Updated last year
- ☆12 · Dec 9, 2015 · Updated 10 years ago
- A compact audio-to-phoneme aligner for singing voice ☆12 · Jan 17, 2024 · Updated 2 years ago
- ☆12 · Dec 15, 2022 · Updated 3 years ago
- ☆12 · Oct 22, 2019 · Updated 6 years ago
- An ensemble system with a search engine for relevant document retrieval and a deep learning model (BERT) for machine comprehension in Vie… ☆14 · Oct 17, 2019 · Updated 6 years ago
- The pipeline for the OSCAR corpus ☆178 · Nov 9, 2025 · Updated 6 months ago
- IBM Molecule Generation Experience (MolGX) is a tool to accelerate an AI-driven design of new materials. ☆15 · Oct 26, 2022 · Updated 3 years ago
- Machine Reading Comprehension special for the Vietnamese language ☆41 · Mar 13, 2022 · Updated 4 years ago
- Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?" ☆26 · Jun 3, 2025 · Updated last year
- Preprocessing of datasets of chemical reactions: standardization, filtering, augmentation, tokenization, etc. ☆16 · Sep 10, 2025 · Updated 8 months ago
- ☆1,273 · Jul 30, 2024 · Updated last year
- Intro to Machine Learning and Deep Learning for Earth-Life Sciences ☆14 · Jun 29, 2019 · Updated 6 years ago
- Model Behavior Study Group ☆30 · May 22, 2026 · Updated 2 weeks ago
- Anh - LAION's multilingual assistant datasets and models ☆28 · Apr 5, 2023 · Updated 3 years ago
- Example of fine-tuning kogpt with oslo. ☆23 · Aug 26, 2022 · Updated 3 years ago
- ☆95 · Jul 16, 2022 · Updated 3 years ago
- Translation demonstrator ☆38 · May 12, 2020 · Updated 6 years ago
- Slides, notebooks, and material from talks, workshops, and the study group ☆12 · Apr 24, 2024 · Updated 2 years ago
- ☆13 · Aug 29, 2020 · Updated 5 years ago
- Experiments related to LLMs for Vietnamese (inspired by the Physics of LLMs series) ☆11 · Oct 21, 2024 · Updated last year
- Korean Named Entity Corpus ☆25 · May 12, 2023 · Updated 3 years ago