Data preparation code for CrystalCoder 7B LLM
☆45 · May 10, 2024 · Updated 2 years ago
Alternatives and similar repositories for crystalcoder-data-prep
Users who are interested in crystalcoder-data-prep are comparing it to the libraries listed below.
- Pre-training code for CrystalCoder 7B LLM ☆59 · May 10, 2024 · Updated 2 years ago
- Data preparation code for Amber 7B LLM ☆96 · May 10, 2024 · Updated 2 years ago
- Pre-training code for Amber 7B LLM ☆175 · May 10, 2024 · Updated 2 years ago
- Open Implementations of LLM Analyses ☆109 · Oct 8, 2024 · Updated last year
- ☆238 · May 10, 2024 · Updated 2 years ago
- A list where most values will be None (or default) ☆11 · Jun 22, 2026 · Updated last week
- An open-source conversational language model developed by the Knowledge Works Research Laboratory at Fudan University. ☆64 · Oct 12, 2023 · Updated 2 years ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated 2 years ago
- [NeurIPS 2024 poster] Cross-model Control: Improving Multiple Large Language Models in One-time Training ☆14 · Oct 25, 2024 · Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 · Feb 9, 2025 · Updated last year
- Deep Learning course starting from the basics of Python, Machine Learning fundamentals, mathematical foundations of ML and DL, Neural Network… ☆16 · Jun 24, 2021 · Updated 5 years ago
- INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness ☆15 · Jun 2, 2026 · Updated last month
- ☆10 · Apr 15, 2023 · Updated 3 years ago
- ☆13 · Oct 11, 2024 · Updated last year
- [ICML'25] MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents ☆34 · Jul 31, 2025 · Updated 11 months ago
- Minimum Description Length probing for neural network representations ☆20 · Jan 28, 2025 · Updated last year
- ☆15 · Oct 2, 2024 · Updated last year
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆40 · Nov 11, 2024 · Updated last year
- Text-2-SQL ☆19 · Feb 21, 2025 · Updated last year
- [Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models ☆19 · Mar 16, 2023 · Updated 3 years ago
- An open-source framework for building monolithic or distributed agentic systems, ranging from simple LLM calls to compositional workflows… ☆29 · Jan 14, 2026 · Updated 5 months ago
- A fine-tuned LLaMA that is good at arithmetic tasks ☆178 · Sep 15, 2023 · Updated 2 years ago
- Source code for paper: Knowledge Inheritance for Pre-trained Language Models ☆37 · Apr 24, 2022 · Updated 4 years ago
- BH hackathon ☆14 · Apr 4, 2024 · Updated 2 years ago
- OOPSLA 2019 Artifact for AutoPandas. Website at https://rbavishi.github.io/autopandas ☆31 · Nov 21, 2022 · Updated 3 years ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. ☆13 · Jun 7, 2023 · Updated 3 years ago
- Ongoing research training transformer models at scale ☆49 · Jun 25, 2026 · Updated last week
- awesome-LLM-controlled-constrained-generation ☆57 · Aug 16, 2024 · Updated last year
- Tiny evaluation of leading LLMs on competitive programming problems ☆14 · Apr 10, 2026 · Updated 2 months ago
- OpenSource deployment made easy ☆10 · Jun 13, 2015 · Updated 11 years ago
- Official repository for "Reweighting Strategy based on Synthetic Data Identification for Sentence Similarity (COLING2022)" ☆18 · Sep 4, 2022 · Updated 3 years ago
- Build a level 1 coding agent. ☆17 · Jan 28, 2025 · Updated last year
- Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML ☆59 · Dec 12, 2025 · Updated 6 months ago
- Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon ☆16 · May 8, 2025 · Updated last year
- This is the repo for our work "An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation" (ACL 2023). ☆14 · Jul 23, 2023 · Updated 2 years ago
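- A record of my understanding and summary of the book 《代码审计》 (Code Auditing), with an in-depth analysis of dangerous functions and takeaways from p牛's blog and the code-auditing community ☆10 · Feb 27, 2018 · Updated 8 years ago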
- ☆19 · Dec 31, 2025 · Updated 6 months ago
- A Dataset of 600k Java Source Code Changes Categorized by Diff Size http://arxiv.org/pdf/2108.04631 ☆22 · Mar 22, 2024 · Updated 2 years ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ☆48 · Jul 17, 2025 · Updated 11 months ago