PDF & Document Extraction
Extract text from PDFs and scanned documents. Use web_extract for remote URLs, pymupdf for local text-based PDFs, and marker-pdf for OCR on scanned documents. For DOCX use python-docx; for PPTX see the powerpoint skill.
For DOCX: use `python-docx` (parses actual document structure, far better than OCR). For PPTX: see the `powerpoint` skill (uses `python-pptx` with full slide/notes support). This skill covers **PDFs and scanned documents**.
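
The routing described here can be sketched as a small dispatcher. This is a minimal illustration rather than part of the skill itself: the function name `choose_extractor` and the `scanned` flag are hypothetical, and it only returns the name of the recommended tool without calling any of them. It also assumes the caller already knows whether a local PDF is scanned.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def choose_extractor(source: str, scanned: bool = False) -> str:
    """Return the name of the tool recommended for `source`.

    `choose_extractor` and `scanned` are hypothetical names used for
    illustration; the mapping itself follows the guidance in this skill.
    """
    # Remote URLs go to web_extract regardless of file type.
    if urlparse(source).scheme in ("http", "https"):
        return "web_extract"

    suffix = PurePosixPath(source).suffix.lower()

    # Office formats have dedicated parsers that read real document structure.
    if suffix == ".docx":
        return "python-docx"
    if suffix == ".pptx":
        return "powerpoint skill (python-pptx)"

    # Local PDFs: OCR only when the document is scanned.
    if suffix == ".pdf":
        return "marker-pdf" if scanned else "pymupdf"

    # Assumption: any other scanned document is handed to the OCR path.
    if scanned:
        return "marker-pdf"
    raise ValueError(f"no extractor known for {source!r}")


if __name__ == "__main__":
    for src, is_scan in [
        ("https://example.com/paper.pdf", False),
        ("report.pdf", False),
        ("scan.pdf", True),
        ("notes.docx", False),
    ]:
        print(src, "->", choose_extractor(src, scanned=is_scan))
```

Checking the URL scheme first means a remote `.pdf` is never opened as if it were a local file, and keeping the scanned/text-based decision as an explicit flag avoids guessing from the filename.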