decontamination
☆26 Dec 3, 2025 Updated 3 months ago
Alternatives and similar repositories for decon
Users who are interested in decon are comparing it to the libraries listed below.
- docker for HF wav2vec2-sprint ☆13 Mar 26, 2021 Updated 4 years ago
- Experiments for XLM-V Transformers Integration ☆13 Feb 8, 2023 Updated 3 years ago
- ☆16 Feb 18, 2026 Updated 2 weeks ago
- Train LLM on Hugging Face infra ☆68 Nov 13, 2025 Updated 3 months ago
- [NeurIPS 2022] MorphTE: Injecting Morphology in Tensorized Embeddings ☆17 Oct 29, 2022 Updated 3 years ago
- Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models' ☆17 Mar 14, 2022 Updated 3 years ago
- ☆30 Sep 27, 2021 Updated 4 years ago
- [EMNLP 2022] Adapting a Language Model While Preserving its General Knowledge ☆21 Feb 12, 2023 Updated 3 years ago
- ☆48 Jan 20, 2026 Updated last month
- Code for ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models" ☆29 Jul 14, 2023 Updated 2 years ago
- A Streamlit app to add structured tags to a dataset card ☆22 Jun 30, 2022 Updated 3 years ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆33 Feb 7, 2025 Updated last year
- Robust Self-augmentation for NER with Meta-reweighting ☆29 Nov 8, 2022 Updated 3 years ago
- Tooling for exact and MinHash deduplication of large-scale text datasets ☆72 Feb 19, 2026 Updated last week
- Staged Training for Transformer Language Models ☆33 Mar 31, 2022 Updated 3 years ago
- A tiny BERT for low-resource monolingual models ☆31 Dec 24, 2025 Updated 2 months ago
- Data mapping framework for rust stuff ☆46 Feb 26, 2026 Updated last week
- Ultrafast & lightweight realtime package for React ☆69 Updated this week
- UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi… ☆31 Dec 5, 2022 Updated 3 years ago
- Meta Representation Transformation for Low-resource Cross-lingual Learning ☆41 May 5, 2021 Updated 4 years ago
- This Rust-powered backend leverages Axum, AES, RSA encryption, and block-modes to secure file sharing with end-to-end encryption. Easily … ☆39 Sep 30, 2024 Updated last year
- MiniMax-Provider-Verifier offers a rigorous, vendor-agnostic way to verify whether third-party deployments of the Minimax M2 model are co… ☆29 Feb 18, 2026 Updated 2 weeks ago
- A Python framework for interacting with in-browser DOM via websockets ☆11 Mar 28, 2018 Updated 7 years ago
- OLMost every training recipe you need to perform data interventions with the OLMo family of models. ☆64 Updated this week
- Training framework with a goal to explore the frontier of sample efficiency of small language models ☆98 Jan 25, 2026 Updated last month
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆96 Feb 9, 2023 Updated 3 years ago
- Issue tracker for the Open Targets Platform ☆13 Jul 8, 2025 Updated 7 months ago
- ☆12 Apr 21, 2025 Updated 10 months ago
- ☆17 Aug 5, 2025 Updated 6 months ago
- A minimalistic deployment software focused on simplicity and clarity. ☆11 Feb 12, 2022 Updated 4 years ago
- ☆10 Sep 4, 2025 Updated 6 months ago
- Some kind of TidalCycles implementation for SuperCollider ☆14 May 29, 2020 Updated 5 years ago
- Creates Random Coding Sequences with specified GC content and Amino Acid usage ☆10 Jun 21, 2022 Updated 3 years ago
- ☆20 Sep 11, 2025 Updated 5 months ago
- Analysis on stop reasons ☆10 Jun 17, 2024 Updated last year
- ☆10 Oct 2, 2024 Updated last year
- Simple Flutter State Management ☆10 Oct 30, 2025 Updated 4 months ago
- CWTS OpenAlex ETL data pipeline. ☆16 Oct 29, 2025 Updated 4 months ago
- Trading signals processing solution that supports signals filtering and posting to broker or exchanges that are not integrated into your … ☆10 May 9, 2021 Updated 4 years ago