Training code for Jasper-Token-Compression-600M
☆19Nov 19, 2025Updated 6 months ago
Alternatives and similar repositories for Jasper-Token-Compression-Training
Users who are interested in Jasper-Token-Compression-Training are comparing it to the repositories listed below.
- Make running benchmarks simple yet maintainable again. Currently supports only Korean cross-encoders.☆34Dec 2, 2025Updated 6 months ago
- The Python Implementation of CRISP: Clustering Multi-Vector Representations for Denoising and Pruning☆27Jul 27, 2025Updated 10 months ago
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval☆38Aug 4, 2025Updated 10 months ago
- AutoRAG example about benchmarking Korean embeddings.☆45Oct 2, 2024Updated last year
- PathPiece tokenizer☆14Nov 10, 2024Updated last year
- Compression for unit-norm embedding vectors using spherical coordinates☆82Jan 23, 2026Updated 4 months ago
- Shows how to deploy and use an agent with an LLM.☆20Mar 1, 2025Updated last year
- ☆65Feb 6, 2026Updated 4 months ago
- Code snippets and reproductions from JustAByte☆48Apr 6, 2026Updated 2 months ago
- MEXMA: Token-level objectives improve sentence representations☆43Jan 6, 2025Updated last year
- Training code for Sparse Autoencoders on Embedding models☆39May 9, 2026Updated 3 weeks ago
- ☆15Jan 6, 2025Updated last year
- [EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"☆15Aug 26, 2025Updated 9 months ago
- Performs benchmarking on two Korean datasets with minimal time and effort.☆45Jan 22, 2026Updated 4 months ago
- Can VLMs understand students' hand-drawn math work?☆19Jan 20, 2026Updated 4 months ago
- Kor-IR: Korean Information Retrieval Benchmark☆87Jul 3, 2024Updated last year
- [ICML 2026] Transform Trained Transformer for Accelerating Native 4K Video Generation☆41Dec 16, 2025Updated 5 months ago
- Internal utility libraries for Pkl☆16Updated this week
- BPE modification that removes intermediate tokens during tokenizer training.☆27Nov 25, 2024Updated last year
- A framework aiming to bridge fast robot prototyping, predefined motion primitives, heterogeneous teleoperation, data collection, and flex…☆27Apr 4, 2026Updated 2 months ago
- ☆34Feb 27, 2024Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023)☆28Nov 30, 2024Updated last year
- 🚀🤗 A collection of templates for Hugging Face Spaces☆34Oct 9, 2023Updated 2 years ago
- 🎹 Instruct.KR 2025 Summer Meetup: Open-source LLMs, all the way to production with vLLM 🎹☆23Aug 2, 2025Updated 10 months ago
- ☆12Jun 12, 2024Updated last year
- Multilingual and Multiculture Benchmark and LLM☆40May 18, 2026Updated 3 weeks ago
- A framework for benchmarking embedding models in hybrid search scenarios (BM25 + vector search) using Weaviate.☆40May 20, 2026Updated 2 weeks ago
- ☆13Mar 5, 2025Updated last year
- ☆63Jan 26, 2025Updated last year
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards☆36Oct 3, 2025Updated 8 months ago
- An extensive and commented list of resources on Learned Sparse Retrieval.☆61Apr 27, 2026Updated last month
- Generate fixed-dimensional embeddings for multi-dimensional vectors in Python, based on Muvera from Google.☆20Jun 28, 2025Updated 11 months ago
- Client for PRE:FOLIO, an IT credentials repository for university students☆10Jul 19, 2023Updated 2 years ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba☆38Oct 16, 2025Updated 7 months ago
- ☆42Apr 21, 2026Updated last month
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆28Oct 3, 2021Updated 4 years ago
- Tree-based speculative decoding for Apple Silicon (MLX). ~10-15% faster than DFlash on code, ~1.5x over autoregressive. First MLX port wi…☆138Apr 15, 2026Updated last month
- [WWW24-UrbanCLIP] A comprehensive toolkit designed to facilitate the collection, processing, and integration of satellite imagery and ass…☆18Oct 6, 2024Updated last year
- Korean Sentence Embedding Model Performance Benchmark for RAG☆50Jan 27, 2025Updated last year