DunZhang/Jasper-Token-Compression-Training

The training code for Jasper-Token-Compression-600M

☆20

Alternatives and similar repositories for Jasper-Token-Compression-Training

Users who are interested in Jasper-Token-Compression-Training are comparing it to the repositories listed below.

DSBA-Lab / Contrastive-Accumulation
☆14 · Jul 7, 2024 · Updated 2 years ago
instructkr / reranker-simple-benchmark
Make running benchmarks simple yet maintainable, again. Currently supports only Korean-based cross-encoders.
☆35 · Dec 2, 2025 · Updated 7 months ago
realsigridjin / crisp-py
The Python implementation of CRISP: Clustering Multi-Vector Representations for Denoising and Pruning
☆27 · Jul 27, 2025 · Updated 11 months ago
Debrup-61 / RaDeR
Official code repository for "RaDeR: Reasoning-aware Dense Retrieval Models", accepted at the EMNLP 2025 Main Conference
☆18 · Jun 23, 2025 · Updated last year
Marker-Inc-Korea / AutoRAG-example-korean-embedding-benchmark
AutoRAG example for benchmarking Korean embeddings.
☆45 · Oct 2, 2024 · Updated last year
sanderland / script_tok
Code for the papers "BPE stays on SCRIPT" and "Which Pieces Does Unigram Tokenization Really Need?", and for MinGram
☆18 · Jun 26, 2026 · Updated 3 weeks ago
roipony / flash-maxsim
☆27 · Jun 11, 2026 · Updated last month
JHU-CLSP / mmBERT
A massively multilingual modern encoder language model
☆145 · Jan 20, 2026 · Updated 6 months ago
frinkleko / LIMIT-Sparse-Embedding
Evaluate state-of-the-art sparse embedding models on the LIMIT dataset (`limit-small` and `limit`) from Google's paper `On the Theoretica…
☆16 · Sep 4, 2025 · Updated 10 months ago
Zerohertz / Instruct_KR_2025_Summer_Meetup_vLLM
🎹 Instruct.KR 2025 Summer Meetup: Open-Source LLMs, All the Way to Production with vLLM 🎹
☆23 · Aug 2, 2025 · Updated 11 months ago
enjalot / latent-sae
Training code for Sparse Autoencoders on embedding models
☆39 · Jul 11, 2026 · Updated last week
facebookresearch / mexma
MEXMA: Token-level objectives improve sentence representations
☆43 · Jan 6, 2025 · Updated last year
facebookresearch / MetaEmbed
[ICLR 2026 Oral] Official Implementation of the paper "MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interactio…
☆18 · Jul 2, 2026 · Updated 2 weeks ago
microsoft / multifield-adaptive-retrieval
Code for the paper "Multi-Field Adaptive Retrieval," a research project on semi-structured document retrieval
☆18 · Feb 13, 2026 · Updated 5 months ago
yjoonjang / rebuttal-skills
Draft grounded rebuttals to your paper's reviews, with the experiments actually run in your workspace
☆16 · Updated this week
OnAnd0n / ko-embedding-leaderboard
Korean-MTEB
☆94 · May 12, 2026 · Updated 2 months ago
uiuctml / MergeBench
[NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs
☆47 · Feb 11, 2026 · Updated 5 months ago
allenai / DrawEduMath
Can VLMs understand students' hand-drawn math work?
☆19 · Jan 20, 2026 · Updated 6 months ago
hanxiao / embedding-compatibility-adapters
Bridge incompatible embedding spaces with a single SVD. When your embedding provider deprecates a model, adapt instead of re-embedding.
☆36 · Apr 28, 2026 · Updated 2 months ago
kensho-technologies / pathpiece
PathPiece tokenizer
☆14 · Nov 10, 2024 · Updated last year
JHU-CLSP / ettin-encoder-vs-decoder
State-of-the-art paired encoder and decoder models (17M-1B params)
☆74 · Aug 6, 2025 · Updated 11 months ago
daekeun-ml / evaluate-llm-on-korean-dataset
Performs benchmarking on two Korean datasets with minimal time and effort.
☆45 · Jan 22, 2026 · Updated 5 months ago
kyopark2014 / llm-agent
Shows how to deploy and use an agent with an LLM.
☆19 · Mar 1, 2025 · Updated last year
flairNLP / familiarity
Label shift estimation for transfer difficulty with Familiarity.
☆10 · Feb 4, 2025 · Updated last year
marepilc / pink-parquet
User-friendly viewer for Parquet files
☆16 · May 8, 2026 · Updated 2 months ago
gangiswag / llm-reranker
☆63 · Jan 26, 2025 · Updated last year
suhan1433 / LLM-as-a-judge-using-G-eval
LLM-as-a-judge using G-eval Scratch
☆15 · Oct 12, 2025 · Updated 9 months ago
stephantul / pynife
Nearly Inference Free Embeddings: make your RAG queries 500x faster
☆80 · Apr 27, 2026 · Updated 2 months ago
jina-ai / embedding-inversion-demo
Embedding Inversion via Conditional Masked Diffusion: recover original text from embedding vectors using parallel denoising. Live demo + …
☆59 · Mar 7, 2026 · Updated 4 months ago
jina-ai / jzip-compressor
Compression for unit-norm embedding vectors using spherical coordinates
☆83 · Jan 23, 2026 · Updated 5 months ago
jina-ai / embedding-fingerprints
Identify which embedding model produced a vector using digit-level tokenization and a tiny transformer
☆21 · Mar 7, 2026 · Updated 4 months ago
stanford-futuredata / colbert-serve
☆23 · May 30, 2025 · Updated last year
cisnlp / multypo
A Multilingual Keyboard Layout-Based Typo Generator
☆17 · Nov 23, 2025 · Updated 7 months ago
HanxiangQin / omni-col-press
A modular framework for training and inference of (compressed) multi-vector retrieval across any modality.
☆22 · Apr 4, 2026 · Updated 3 months ago
whybe-choi / kovidore-benchmark
[ACL'26 Workshop] KoViDoRe: Korean Visual Document Retrieval Benchmark
☆24 · Jul 2, 2026 · Updated 2 weeks ago
vincentamato / mlx-esm-2
An MLX implementation of Meta AI's ESM-2 protein language model
☆16 · Aug 16, 2025 · Updated 11 months ago
ssisOneTeam / Korean-Embedding-Model-Performance-Benchmark-for-Retriever
Korean Sentence Embedding Model Performance Benchmark for RAG
☆49 · Jan 27, 2025 · Updated last year
s-sahoo / scaling-dllms
[ICML 2026] Scaling Beyond Masked Diffusion Language Models
☆31 · Jul 3, 2026 · Updated 2 weeks ago
J-Seo / KoCommonGEN-V2
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
☆25 · Aug 24, 2024 · Updated last year