bigcode-project / bigcode-tokenizer

☆15

Alternatives and similar repositories for bigcode-tokenizer

Users who are interested in bigcode-tokenizer are comparing it to the libraries listed below.

IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆32Updated 2 months ago
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆32Updated 2 years ago
jxmorris12 / bm25_pt
minimal pytorch implementation of bm25 (with sparse tensors)
☆104Updated last month
sgugger / torchdynamo-tests
☆19Updated 3 years ago
luyug / magix
Supercharge huggingface transformers with model parallelism.
☆77Updated 4 months ago
nateraw / spaces-docker-templates
🚀🤗 A collection of templates for Hugging Face Spaces
☆35Updated 2 years ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset.
☆96Updated 2 years ago
Knowledgator / FlashDeBERTa
Truly flash implementation of DeBERTa disentangled attention mechanism.
☆67Updated 2 months ago
nbroad1881 / strideformer
Using short models to classify long texts
☆21Updated 2 years ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆35Updated 2 years ago
bminixhofer / tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
☆55Updated 4 months ago
ChrisHayduk / qlora-multi-gpu
QLoRA with Enhanced Multi GPU Support
☆37Updated 2 years ago
argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24Updated last year
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31Updated 2 years ago
huggingface / data-measurements-tool
Developing tools to automatically analyze datasets
☆75Updated last year
AnswerDotAI / ModernBERT-Instruct-mini-cookbook
☆52Updated 9 months ago
Upaya07 / NeurIPS-llm-efficiency-challenge
Code for NeurIPS LLM Efficiency Challenge
☆59Updated last year
choosewhatulike / case2code
☆17Updated 7 months ago
Hannibal046 / nanoColBERT
Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).
☆79Updated last year
illuin-tech / contextual-embeddings
Model implementation for the contextual embeddings project
☆36Updated 6 months ago
google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…
☆49Updated 2 years ago
mixedbread-ai / mxbai-rerank
Crispy reranking models by Mixedbread
☆42Updated 2 months ago
huggingface / hffs
**ARCHIVED** Filesystem interface to 🤗 Hub
☆58Updated 2 years ago
ChrisHayduk / QLoRA-for-MLM
QLoRA for Masked Language Modeling
☆22Updated 2 years ago
mungg / FABLES
☆58Updated last year
CarperAI / squeakily
A library for squeakily cleaning and filtering language datasets.
☆49Updated 2 years ago
MeLeLBGU / SaGe
Code for SaGe subword tokenizer (EACL 2023)
☆27Updated last year
NielsRogge / awesome-huggingface
Repository containing awesome resources regarding Hugging Face tooling.
☆48Updated last year
bloomberg / minilmv2.bb
Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)
☆61Updated 2 years ago
bminixhofer / zett
Code for Zero-Shot Tokenizer Transfer
☆142Updated 10 months ago