Pretraining data reconstruction scripts for Apertus
☆121 · Oct 27, 2025 · Updated 5 months ago
Alternatives and similar repositories for pretrain-data
Users that are interested in pretrain-data are comparing it to the libraries listed below.
- ☆19 · Feb 25, 2024 · Updated 2 years ago
- ☆23 · May 4, 2025 · Updated 11 months ago
- This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users… ☆14 · Jan 2, 2026 · Updated 3 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆34 · Feb 11, 2026 · Updated 2 months ago
- The test set for Koala ☆45 · Mar 31, 2023 · Updated 3 years ago
- [CVPR2026] Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens ☆53 · Mar 20, 2026 · Updated 3 weeks ago
- StyleGAN - Official TensorFlow Implementation ☆11 · Jun 2, 2019 · Updated 6 years ago
- Wurst Client for Minecraft 1.7.2 - 1.7.10 ☆11 · Mar 10, 2017 · Updated 9 years ago
- Tiny evaluation of leading LLMs on competitive programming problems ☆14 · Nov 28, 2024 · Updated last year
- win32 native frontend for llama-cli ☆13 · Nov 2, 2024 · Updated last year
- ☆36 · Sep 22, 2025 · Updated 6 months ago
- Reusable AI coding agent skills for building voice AI with LiveKit ☆39 · Feb 25, 2026 · Updated last month
- ☆10 · Oct 20, 2023 · Updated 2 years ago
- ☆16 · Apr 12, 2024 · Updated last year
- A framework for evaluating semantic search across custom datasets, metrics, and embedding backends. ☆38 · May 26, 2025 · Updated 10 months ago
- Debiasing Through Data Attribution ☆13 · May 23, 2024 · Updated last year
- A suite of tools for text preparation, vectorization and processing for deep learning with Keras. ☆13 · Jul 29, 2023 · Updated 2 years ago
- ACL24 ☆11 · Jun 7, 2024 · Updated last year
- Developing a legal research tool leveraging ChatGPT / GPT-4 ☆14 · Mar 10, 2024 · Updated 2 years ago
- Example code for the NNGeometry PyTorch library ☆10 · Aug 20, 2025 · Updated 7 months ago
- Easily convert HuggingFace models to GGUF-format for llama.cpp ☆23 · Jul 27, 2024 · Updated last year
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation ☆14 · Jan 2, 2026 · Updated 3 months ago
- Hackathon: HackHarvard 2017 Project | Wanting to change the way teams catalog brainstorming sessions, we developed Team Bo[AR]d, a native… ☆14 · Oct 28, 2017 · Updated 8 years ago
- An Empirical Study of Memorization in NLP (ACL 2022) ☆13 · Jun 22, 2022 · Updated 3 years ago
- Utilities for working with RDF/Linked Data in JavaScript / TypeScript ☆10 · Sep 12, 2022 · Updated 3 years ago
- A smarter way to deal with timers. Intended for LÖVE ☆10 · Aug 21, 2017 · Updated 8 years ago
- An AI-powered GitHub search tool utilising Generative UI ☆14 · Jul 20, 2024 · Updated last year
- ☆14 · May 14, 2025 · Updated 10 months ago
- ☆13 · Jun 5, 2024 · Updated last year
- ☆23 · Apr 2, 2026 · Updated last week
- ☆12 · Mar 31, 2026 · Updated last week
- ☆27 · Aug 16, 2025 · Updated 7 months ago
- Synthetic Data Generation with Execution-Based Verification and Grounding for LLM Training. ☆20 · Feb 7, 2025 · Updated last year
- Open-source, knowledge-grounded conversational assistant ☆14 · Jun 30, 2025 · Updated 9 months ago
- WikiTableSet: A largest publicly available image-based table recognition dataset in three languages built from Wikipedia ☆32 · Jun 12, 2025 · Updated 9 months ago
- The Programmers Open Workbench ☆13 · Dec 19, 2011 · Updated 14 years ago
- The Conceptual Coverage Across Languages Benchmark for Text-to-Image Models ☆12 · Oct 28, 2024 · Updated last year
- Code for react youtube tutorial ☆31 · Feb 14, 2024 · Updated 2 years ago
- Research code and scripts used in the Silburt et al. (2021) EMNLP 2021 paper 'FANATIC: FAst Noise-Aware TopIc Clustering' ☆11 · Jul 6, 2023 · Updated 2 years ago