Index of URLs to pdf files all over the internet and scripts
☆25 · May 2, 2023 · Updated 2 years ago
Alternatives and similar repositories for CCpdf
Users that are interested in CCpdf are comparing it to the libraries listed below.
- The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models. ☆41 · Dec 7, 2023 · Updated 2 years ago
- JSON Schema format for storing datasets details, documents processed contents, and documents annotations in the document understanding do… ☆14 · Nov 5, 2024 · Updated last year
- Training data for the NLPContributionGraph Shared Task 11 at SemEval-2021 ☆14 · Jan 11, 2021 · Updated 5 years ago
- Web archiving utility library ☆11 · Mar 11, 2026 · Updated last month
- ☆18 · Jul 7, 2025 · Updated 9 months ago
- TensorFlow implementation of the paper "Fast Compressive Sensing Using Generative Model with Structured Latent Variables" ☆10 · Apr 7, 2020 · Updated 6 years ago
- The code related to the baselines from NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark." ☆36 · Mar 2, 2023 · Updated 3 years ago
- We enable LLM with personalization capability ☆11 · Nov 16, 2023 · Updated 2 years ago
- Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif… ☆12 · Oct 21, 2022 · Updated 3 years ago
- multimodal document analysis ☆165 · Feb 28, 2026 · Updated last month
- Original VinVL visual backbone with simplified APIs to easily extract features, boxes, object detections, in a few lines of Python code. ☆12 · Nov 27, 2022 · Updated 3 years ago
- I have created a dataset of Image-Text-Pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f… ☆16 · Apr 22, 2021 · Updated 4 years ago
- utilities for loading and running text embeddings with onnx ☆45 · Aug 16, 2025 · Updated 7 months ago
- Structured Multi-task Learning for Molecular Property Prediction, AISTATS'22 (https://proceedings.mlr.press/v151/liu22e.html) ☆14 · Jul 6, 2022 · Updated 3 years ago
- Curated list of awesome datasets for various table understanding tasks ☆18 · Sep 5, 2025 · Updated 7 months ago
- Record animations on HTML5 canvas ☆14 · Apr 16, 2024 · Updated last year
- ☆17 · Dec 11, 2023 · Updated 2 years ago
- Code for the paper "Modeling Information Change in Science Communication with Semantically Matched Paraphrases" from EMNLP 2022 ☆13 · Oct 20, 2022 · Updated 3 years ago
- ☆18 · Mar 27, 2020 · Updated 6 years ago
- pytorch crnn with centerloss to solve the near word problem ☆16 · Jan 27, 2022 · Updated 4 years ago
- Collecting good beginner tasks and project ideas. ☆16 · Apr 23, 2018 · Updated 7 years ago
- [Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document" ☆17 · Dec 1, 2023 · Updated 2 years ago
- GC4LM: A Colossal (Biased) language model for German ☆13 · May 2, 2021 · Updated 4 years ago
- A simple wrapper for lmdb. Support dict-like operations. ☆23 · Apr 20, 2023 · Updated 2 years ago
- DSIR large-scale data selection framework for language model training ☆272 · Apr 7, 2024 · Updated 2 years ago
- IPAdic packaged for easy use from Python. ☆24 · Oct 31, 2021 · Updated 4 years ago
- Python toolbox to load, parse and process Official Journals of the European Union (EU). ☆22 · May 3, 2024 · Updated last year
- ☆10 · Apr 4, 2023 · Updated 3 years ago
- The pipeline for the OSCAR corpus ☆176 · Nov 9, 2025 · Updated 5 months ago
- Code for the paper "Text2Math: End-to-end Parsing Text into Math Expressions" accepted by EMNLP 2019 ☆16 · Aug 20, 2019 · Updated 6 years ago
- Program Translator AI built on Pytorch ☆15 · Dec 19, 2019 · Updated 6 years ago
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 · Sep 8, 2022 · Updated 3 years ago
- Search the biomedical literature for protein interactions and protein associations ☆11 · Nov 24, 2023 · Updated 2 years ago
- A template primarily for PhD theses but also suitable for Bachelor's or Master's theses ☆11 · Nov 10, 2021 · Updated 4 years ago
- Conversational Recommender System Evaluation via Simulation ☆19 · Apr 7, 2026 · Updated last week
- Code for "Learning Unitary Operators with Help From u(n)", AAAI-17. (https://arxiv.org/abs/1607.04903) ☆17 · Jan 10, 2017 · Updated 9 years ago
- Shared repo for EM connectomics and Array Tomography render based image processing modules ☆17 · Mar 2, 2026 · Updated last month
- Counts people in the video image ☆12 · May 21, 2015 · Updated 10 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Jul 17, 2024 · Updated last year