Repository for analysis and experiments in the BigCode project.
☆128 Mar 20, 2024 Updated last year
Alternatives and similar repositories for bigcode-analysis
Users who are interested in bigcode-analysis are comparing it to the repositories listed below.
- ☆489 Aug 15, 2024 Updated last year
- ☆15 Oct 24, 2023 Updated 2 years ago
- A framework for the evaluation of autoregressive code generation language models. ☆1,020 Jul 22, 2025 Updated 7 months ago
- ☆19 Aug 10, 2024 Updated last year
- User-friendly viewer for Parquet files ☆10 Jan 10, 2026 Updated last month
- Code used for sourcing and cleaning the BigScience ROOTS corpus ☆318 Mar 20, 2023 Updated 2 years ago
- ☆23 Jul 10, 2023 Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆27 Nov 30, 2024 Updated last year
- Demonstrates how to formulate the n-queens problem as a QUBO, which we then solve using Leap’s hybrid solvers. ☆10 Oct 31, 2023 Updated 2 years ago
- DARPA Cyber Grand Challenge Linux source code ☆17 Jul 9, 2015 Updated 10 years ago
- Python Module implementing SRP ☆12 Jul 29, 2022 Updated 3 years ago
- Scaling Data-Constrained Language Models ☆340 Jun 28, 2025 Updated 8 months ago
- Hugging Face Download (Cache) Manager ☆21 Aug 7, 2022 Updated 3 years ago
- All-in-one text de-duplication ☆744 Jan 2, 2026 Updated last month
- Fun project to run your own LLM chat bot using llama.cpp ☆11 Jun 9, 2023 Updated 2 years ago
- Standalone commandline CLI tool for compiling Triton kernels ☆20 Sep 13, 2024 Updated last year
- Haskell bindings for libNVVM ☆20 Apr 1, 2014 Updated 11 years ago
- FlexAttention w/ FlashAttention3 Support ☆27 Oct 5, 2024 Updated last year
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?" ☆18 May 15, 2025 Updated 9 months ago
- ANE accelerated embedding models! ☆20 Dec 11, 2024 Updated last year
- Guides and examples to help achieve optimal performance on a NVIDIA Grace CPU ☆16 Aug 9, 2024 Updated last year
- Flexibly track outputs and grad-outputs of torch.nn.Module. ☆13 Oct 6, 2023 Updated 2 years ago
- Development containers for triton and triton-cpu ☆24 Feb 16, 2026 Updated last week
- ☆16 Jul 16, 2024 Updated last year
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 Apr 20, 2024 Updated last year
- Operations Research Algorithms ☆19 Mar 20, 2024 Updated last year
- A Proxy service using FastAPI and Protocol Buffers (Proto3) ☆13 Jun 17, 2023 Updated 2 years ago
- ☆12 Jun 2, 2023 Updated 2 years ago
- Official code for TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations ☆36 Jan 24, 2026 Updated last month
- Gradio UI for a Cog API ☆70 Apr 8, 2024 Updated last year
- JAX implementation of the Mistral 7b v0.2 model ☆35 Jul 3, 2024 Updated last year
- Fine-tune SantaCoder for Code/Text Generation. ☆196 Apr 11, 2023 Updated 2 years ago
- A basic pure pytorch implementation of flash attention ☆16 Oct 28, 2024 Updated last year
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆40 Jul 13, 2024 Updated last year
- Ongoing research training transformer models at scale ☆395 Aug 20, 2024 Updated last year
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆104 Jan 15, 2024 Updated 2 years ago
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) ☆20 Aug 20, 2024 Updated last year
- [NAACL 2024] Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers https://arxiv.org/abs/2307.… ☆17 Jan 27, 2024 Updated 2 years ago
- various experiments for scaling inference time compute with small reasoning models ☆17 Jan 16, 2025 Updated last year