Tokun to can tokens
☆18Jun 19, 2025Updated 8 months ago
Alternatives and similar repositories for tokun
Users who are interested in tokun are comparing it to the libraries listed below.
- Arabic edition of ALBERT pretrained language models☆16Apr 25, 2021Updated 4 years ago
- Testing PaliGemma 2 fine-tuning on a reasoning dataset☆18Dec 28, 2024Updated last year
- QLoRA for Masked Language Modeling☆23Sep 11, 2023Updated 2 years ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.☆27Feb 15, 2024Updated 2 years ago
- Few-shot Learning with Auxiliary Data☆31Dec 8, 2023Updated 2 years ago
- QALD-9-Plus Dataset for Knowledge Graph Question Answering☆29Jun 5, 2024Updated last year
- Plug-and-play Search Interfaces with Pyserini and Hugging Face☆32Aug 5, 2023Updated 2 years ago
- ☆30Dec 6, 2021Updated 4 years ago
- A Node.js / Neo4j tool that translates words and relations into network graphs and shows you how it all connects.☆11Oct 24, 2019Updated 6 years ago
- Arabic News Stance Corpus☆11Feb 5, 2021Updated 5 years ago
- ☆18Jun 25, 2025Updated 8 months ago
- Conversion of audio files to text using Whisper from OpenAI with a simple tkinter GUI☆10Apr 13, 2023Updated 2 years ago
- a blog starter project☆11Oct 29, 2018Updated 7 years ago
- LLM Building Blocks for Python Course☆15Nov 17, 2025Updated 3 months ago
- ☆10May 1, 2025Updated 10 months ago
- A web interface for SleekDB written in PHP☆11Jan 22, 2022Updated 4 years ago
- The official implementation of the paper "Text Classification in the Wild: a Large-scale Long-tailed Name Normalization Dataset"(ICASSP 2…☆12Feb 19, 2023Updated 3 years ago
- Code Roberta version of RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder☆10Mar 16, 2023Updated 2 years ago
- ☆40Dec 25, 2022Updated 3 years ago
- Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-…☆11Jul 1, 2025Updated 8 months ago
- T5Patches is a set of tools for fast and targeted editing of generative language models built with T5X.☆12May 31, 2024Updated last year
- Radix Primitives Cheatsheet☆12Mar 11, 2022Updated 3 years ago
- A context-aware embedding similarity score☆11Aug 23, 2023Updated 2 years ago
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper.☆12Mar 6, 2023Updated 2 years ago
- ☆10Mar 22, 2024Updated last year
- Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial)☆11Jul 19, 2024Updated last year
- ☆12Mar 3, 2023Updated 3 years ago
- Seamless Voice Interactions with LLMs☆12Oct 28, 2023Updated 2 years ago
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e…☆11Dec 27, 2024Updated last year
- PyTorch implementation of standard metrics for clustering☆10Mar 21, 2023Updated 2 years ago
- Extending the laughbot project to an encoder-based transformer model fine-tuned on the same dataset for humor classification☆10Jan 4, 2023Updated 3 years ago
- ☆11Feb 25, 2025Updated last year
- ☆11Jul 19, 2018Updated 7 years ago
- ☆14Oct 8, 2025Updated 4 months ago
- ☆10Apr 3, 2024Updated last year
- Fine-tuning a codegen model with a Python instruction set using the QLoRA technique for better efficacy☆11Aug 31, 2023Updated 2 years ago
- This repository contains the implementation code for paper: Mixup Your Own Pairs☆12Oct 1, 2023Updated 2 years ago
- The contrastive token loss function for reducing generative repetition of autoregressive neural language models.☆13May 11, 2022Updated 3 years ago
- Main Panax Documentation☆11Feb 12, 2016Updated 10 years ago