☆198 · Dec 7, 2025 · Updated 2 months ago
Alternatives and similar repositories for POINTS-Reader
Users who are interested in POINTS-Reader are comparing it to the libraries listed below.
- Industrial-grade Chinese speech recognition system e-book ☆13 · Oct 30, 2020 · Updated 5 years ago
- Compression for Foundation Models ☆35 · Jul 21, 2025 · Updated 7 months ago
- [arXiv 2025] ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models ☆36 · Aug 26, 2025 · Updated 6 months ago
- Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202… ☆29 · Jan 18, 2026 · Updated last month
- ☆18 · Sep 25, 2025 · Updated 5 months ago
- Fine-tuned LLMs generate accurate 3D human avatars from textual descriptions using the SMPL-X model, enhancing customization and simulati… ☆37 · Feb 5, 2025 · Updated last year
- ☆883 · Feb 13, 2026 · Updated 2 weeks ago
- (NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation" ☆41 · Feb 11, 2026 · Updated 3 weeks ago
- Official repository for the 1st DAFx Parameter Estimation Challenge ☆35 · Feb 16, 2026 · Updated 2 weeks ago
- Benchmark Large Language Models Reliably On Your Data ☆18 · Dec 27, 2025 · Updated 2 months ago
- [CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆84 · Feb 13, 2026 · Updated 2 weeks ago
- ☆77 · May 4, 2025 · Updated 10 months ago
- A Python package for interacting with the MinerU Vision-Language Model. ☆106 · Feb 5, 2026 · Updated last month
- PyTorch CRNN with center loss to solve the near word problem ☆16 · Jan 27, 2022 · Updated 4 years ago
- Code for "StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model", AAAI2026 Oral ☆45 · Jan 16, 2026 · Updated last month
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning" ☆29 · Jun 3, 2025 · Updated 9 months ago
- Readability-aware automatic lyrics transcription (ALT) evaluation toolkit ☆43 · Aug 29, 2024 · Updated last year
- ☆29 · Dec 22, 2025 · Updated 2 months ago
- ☆20 · May 7, 2025 · Updated 9 months ago
- LLM-Powered Semi-Structured Table Question Answering ☆294 · Jan 30, 2026 · Updated last month
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation ☆432 · Nov 27, 2025 · Updated 3 months ago
- Official Repository for "Efficient Vocal Source Separation Through Windowed RoFormer" ☆43 · Oct 30, 2025 · Updated 4 months ago
- ☆20 · Mar 3, 2025 · Updated last year
- The code in "SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design" ☆42 · Oct 20, 2025 · Updated 4 months ago
- Deploy the Duguang (读光) ticket/certificate detection and rectification model with OpenCV; includes C++ and Python versions that depend only on the OpenCV library ☆26 · Dec 23, 2024 · Updated last year
- Implementation for Enhancing Tampered Text Detection Through Frequency Feature Fusion and Decomposition ☆28 · Feb 26, 2025 · Updated last year
- ☆190 · Feb 5, 2026 · Updated last month
- Analysis of Chinese and English layouts ☆267 · Feb 25, 2026 · Updated last week
- ☆19 · Jan 3, 2025 · Updated last year
- This repository is a collection of legal instruction datasets ☆26 · Jul 12, 2024 · Updated last year
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆57 · Mar 6, 2025 · Updated 11 months ago
- [AAAI 2026] SlideTailor: Personalized Presentation Slide Generation for Scientific Papers ☆43 · Jan 1, 2026 · Updated 2 months ago
- GenExam: A Multidisciplinary Text-to-Image Exam ☆56 · Feb 26, 2026 · Updated last week
- Your personal ArXiv Feed ☆23 · Dec 18, 2024 · Updated last year
- DELT: Data Efficacy for Language Model Training ☆43 · Feb 12, 2026 · Updated 3 weeks ago
- Use OpenCV API to run ONNX model by ONNXRuntime. ☆24 · Jan 26, 2026 · Updated last month
- Audio Demo for "FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation" ☆21 · Apr 7, 2021 · Updated 4 years ago
- Various algorithms for voice activity detection ☆22 · Jan 31, 2017 · Updated 9 years ago
- DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement, Conference on Neural Information Processing Systems (NeurIPS), 2023 ☆59 · May 16, 2025 · Updated 9 months ago