Pretraining data reconstruction scripts for Apertus
☆120 · Oct 27, 2025 · Updated 4 months ago
Alternatives and similar repositories for pretrain-data
Users who are interested in pretrain-data are comparing it to the libraries listed below
- Muon fsdp 2 ☆55 · Aug 8, 2025 · Updated 7 months ago
- This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users… ☆14 · Jan 2, 2026 · Updated 2 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆34 · Feb 11, 2026 · Updated last month
- Create embeddings for LLM using the Nomic API ☆23 · Nov 21, 2024 · Updated last year
- The test set for Koala ☆45 · Mar 31, 2023 · Updated 2 years ago
- This is a dataset that aligns piano music MIDI with their corresponding textual descriptions and comments. It can be used for multi-modal… ☆12 · Nov 21, 2023 · Updated 2 years ago
- ☆42 · Mar 10, 2026 · Updated last week
- OpenFaaS function demonstrating how CloudEvents0.1 may be handled within the function itself. ☆13 · Jun 8, 2018 · Updated 7 years ago
- European Parliament website Python scraper ☆12 · Oct 19, 2016 · Updated 9 years ago
- Highly concurrent and fast content processing for Mighty Inference Server ☆10 · Feb 6, 2023 · Updated 3 years ago
- OpenFaaS anonymize function. Use MachineBox service. ☆35 · Jan 19, 2018 · Updated 8 years ago
- ☆57 · Feb 10, 2025 · Updated last year
- German Parliamentary Corpus (GerParCor) ☆30 · Jan 14, 2026 · Updated 2 months ago
- Tiny evaluation of leading LLMs on competitive programming problems ☆14 · Nov 28, 2024 · Updated last year
- generate css sprites automagically ☆14 · Mar 31, 2016 · Updated 9 years ago
- Ukrainian ELECTRA model ☆12 · Mar 11, 2023 · Updated 3 years ago
- Badgers: Bad Data Generators ☆14 · Jan 29, 2026 · Updated last month
- A living document about DIY room correction ☆15 · Feb 10, 2020 · Updated 6 years ago
- MCPatcher as a 1.7.10 forge mod, using mixins ☆12 · Jan 25, 2024 · Updated 2 years ago
- ☆10 · Oct 20, 2023 · Updated 2 years ago
- Web-based spreadsheet editor with support for real-time collaboration ☆17 · Mar 17, 2022 · Updated 4 years ago
- Personal website ☆10 · Mar 13, 2026 · Updated last week
- ☆11 · Nov 2, 2024 · Updated last year
- Debiasing Through Data Attribution ☆12 · May 23, 2024 · Updated last year
- an advanced level shifting neighbor ☆11 · Dec 6, 2021 · Updated 4 years ago
- Official Implementation of Knowledge Flow Prompting ☆35 · Oct 20, 2025 · Updated 5 months ago
- A diff tool for language models ☆44 · Dec 28, 2023 · Updated 2 years ago
- ACL24 ☆11 · Jun 7, 2024 · Updated last year
- Developing a legal research tool leveraging ChatGPT / GPT-4 ☆15 · Mar 10, 2024 · Updated 2 years ago
- Example code for the NNGeometry PyTorch library ☆10 · Aug 20, 2025 · Updated 7 months ago
- ☆10 · Oct 16, 2017 · Updated 8 years ago
- Easily convert HuggingFace models to GGUF-format for llama.cpp ☆23 · Jul 27, 2024 · Updated last year
- The #1 modpack to use for all common versions of Minecraft Java Edition (1.7.10 - 1.20.6) ☆14 · Nov 7, 2024 · Updated last year
- A game in MacRuby and OpenGL developed during 2010 Super Game Dev Weekend ☆49 · Dec 17, 2010 · Updated 15 years ago
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation ☆14 · Jan 2, 2026 · Updated 2 months ago
- An Empirical Study of Memorization in NLP (ACL 2022) ☆13 · Jun 22, 2022 · Updated 3 years ago
- ☆19 · Jul 4, 2025 · Updated 8 months ago
- CQL parser for Java ☆15 · Feb 10, 2026 · Updated last month
- ☆10 · Apr 26, 2021 · Updated 4 years ago