bethelmelesse / UnifiedCrawl
☆13Updated 7 months ago
Alternatives and similar repositories for UnifiedCrawl
Users who are interested in UnifiedCrawl are comparing it to the libraries listed below.
- Small python package to measure OCR quality and other related metrics.☆23Updated last year
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).☆80Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡☆66Updated 7 months ago
- Python library to use Pleias-RAG models☆57Updated last month
- Using modal.com to process FineWeb-edu data☆20Updated 2 months ago
- ☆47Updated 4 months ago
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆59Updated last month
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆24Updated last year
- utilities for loading and running text embeddings with onnx☆44Updated 10 months ago
- A tool to assist in the interpretation of learned features in sparse autoencoders (in particular the four SAEs trained by Joseph Bloom o…☆19Updated 8 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆75Updated 8 months ago
- ☆61Updated last week
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆54Updated 4 months ago
- Tokun to can tokens☆17Updated last week
- The first dense retrieval model that can be prompted like an LM☆74Updated last month
- XmodelLM☆39Updated 7 months ago
- ☆67Updated last year
- Code and data releases for the paper -- DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory☆44Updated 4 months ago
- Analysis on the cost of encoder based models☆11Updated 4 months ago
- ☆48Updated 5 months ago
- Simple GRPO scripts and configurations.☆59Updated 4 months ago
- Pre-train Static Word Embeddings☆80Updated 3 weeks ago
- BPE modification that implements removing of the intermediate tokens during tokenizer training.☆25Updated 7 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆22Updated 7 months ago
- ☆30Updated 11 months ago
- Nexusflow function call, tool use, and agent benchmarks.☆20Updated 6 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents☆13Updated 10 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models☆80Updated last month
- ☆55Updated this week
- Training code for Sparse Autoencoders on Embedding models☆38Updated 4 months ago