ispras / dedoc
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML pars…
☆209Updated last month
Alternatives and similar repositories for dedoc:
Users who are interested in dedoc compare it to the libraries listed below
- "Ruformers" (Руформеры): a list of popular transformer-based base models for Russian natural language processing tasks☆36Updated last year
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament…☆61Updated 4 months ago
- SAGE: Spelling correction, corruption and evaluation for multiple languages☆146Updated last month
- LangChain-compatible integrations with YandexGPT and YandexGPT Embeddings☆38Updated 3 months ago
- ☆66Updated 4 months ago
- The tiniest sentence encoder for Russian language☆205Updated 6 months ago
- RAG pipeline implementation example for the Russian language☆19Updated last year
- ☆43Updated 2 years ago
- Effective LLM Alignment Toolkit☆113Updated last week
- Compression and acceleration of machine learning models☆18Updated last year
- Bunch of notebooks for pre-training custom Saiga-like LLM☆13Updated 11 months ago
- Handwritten Text Generation☆16Updated 2 years ago
- GigaChain telegram bot example for technical support☆29Updated last month
- Augmentex — a library for augmenting texts with errors☆61Updated 7 months ago
- Enterprise RAG Challenge to test accuracy of different LLM-driven assistants☆38Updated 3 weeks ago
- A benchmark comparing Russian analogues of ChatGPT: Saiga, YandexGPT, Gigachat☆59Updated last year
- A set of notebooks that solve various natural language processing (NLP) tasks.☆16Updated 4 years ago
- Handwritten Kazakh and Russian (HKR) database for text recognition☆65Updated 3 years ago
- Code for fine-tuning LMs (rugpt, LLaMa, FRED T5) using transformers + deepspeed + LoRa☆15Updated last year
- Jupyter Notebooks and other files from my video tutorial series about GigaChat API☆48Updated 3 months ago
- ☆23Updated 2 months ago
- Telegram bot for different language models. Supports system prompts and images☆44Updated 2 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode…☆20Updated 2 months ago
- Modified Arena-Hard-Auto LLM evaluation toolkit with an emphasis on Russian language☆35Updated last month
- ☆130Updated last year
- Top ML papers of the week.☆24Updated this week
- Large silver-standard Russian corpus with NER, morphology and syntax markup☆63Updated last year
- Russian Corpus of Linguistic Acceptability☆42Updated 4 months ago
- ML Course created for Bauman Moscow State Technical University☆60Updated 2 years ago