ispras / dedoc
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML pars…
☆236Updated 3 weeks ago
Alternatives and similar repositories for dedoc
Users who are interested in dedoc compare it to the libraries listed below
- The tiniest sentence encoder for Russian language☆224Updated 9 months ago
- "Ruformers": a list of popular transformer-based base models for Russian natural language processing tasks☆36Updated last year
- LangChain-compatible integrations with YandexGPT and YandexGPT Embeddings☆42Updated 2 weeks ago
- SAGE: Spelling correction, corruption and evaluation for multiple languages☆151Updated 4 months ago
- ☆88Updated 7 months ago
- GigaChain telegram bot example for technical support☆30Updated 4 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament…☆62Updated 7 months ago
- The course "Developing AI/LLM Applications in Python: From Idea to Release" gives students the opportunity to go through the full cycle of creating L…☆18Updated 2 months ago
- Augmentex — a library for augmenting texts with errors☆63Updated 10 months ago
- Language modeling and instruction tuning for Russian☆468Updated 8 months ago
- ☆26Updated last week
- ☆27Updated last month
- Modified Arena-Hard-Auto LLM evaluation toolkit with an emphasis on Russian language☆42Updated last month
- RAG pipeline implementation example for the Russian language☆22Updated last year
- This repository contains example implementations of a question-answering bot over documentation, built on YandexGPT and other Yandex Cloud services☆33Updated last year
- A library for extracting statistics from Russian-language texts.☆120Updated 2 years ago
- Lecture notes for the MIPT "Data Science" master's program☆22Updated 5 months ago
- Effective LLM Alignment Toolkit☆128Updated last month
- Overview of pipelines related to PDF to Markdown document processing.☆61Updated last month
- ☆82Updated last year
- ☆58Updated 3 months ago
- Telegram bot for different language models. Supports system prompts and images☆54Updated last week
- Embedding Studio is a framework that lets you transform your Vector Database into a feature-rich Search Engine.☆381Updated 2 weeks ago
- Russian Corpus of Linguistic Acceptability☆43Updated 7 months ago
- ⚡ A set of solutions for developing Russian-language LLM applications with GigaChat support ⚡☆396Updated last week
- ML Course created for Bauman Moscow State Technical University☆61Updated 2 years ago
- A Python wrapper for the RuWordNet thesaurus☆63Updated 5 months ago
- A benchmark comparing Russian analogues of ChatGPT: Saiga, YandexGPT, Gigachat☆61Updated last year
- Rule-based facts extraction for Russian language☆323Updated last year
- Deep Learning based NLP modeling for Russian language☆234Updated last year