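OK, I want to build a PDF reader that converts text to speech. I already built this for .txt files, but I'm stuck on how to convert a PDF file to text.
And what about PDF files that are scans?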
Best answer
To do this, you have to use something that recognizes the text from within your code. According to Wikipedia:
Optical character recognition
Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into machine-encoded/computer-readable text. It is widely used as a form of data entry from some sort of original paper data source, whether passport documents, invoices, bank statement, receipts, business card, mail, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech, key data extraction and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
Some references:
- There are some tutorials available: http://kurup87.blogspot.nl/2012/03/android-ocr-tutorial-image-to-text.html
- Sample apps: https://github.com/rmtheis/android-ocr https://github.com/GautamGupta/Simple-Android-OCR
- API: http://ocrapiservice.com
- Library: http://www.abbyy.com/mobileocr/android/
If you can't decide which one to use, there are plenty of Stack Overflow posts about this; just google "android ocr stackoverflow".
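Since the question mentions that only some of the PDFs are scans, one possible arrangement is to try the page's embedded text first and fall back to OCR only when nothing usable comes back. The following is a minimal sketch of that idea, not a definitive implementation. `TextLayerExtractor` and `OcrEngine` are hypothetical interfaces invented for this illustration; in a real app they would wrap whichever PDF library and OCR library (for example one of those linked above) you choose.

```java
class PdfTextPipeline {

    /** Hypothetical: returns the embedded text of a page, or "" when the page has no text layer. */
    interface TextLayerExtractor {
        String extract(int pageIndex);
    }

    /** Hypothetical: renders the page to an image and runs OCR on it. */
    interface OcrEngine {
        String recognize(int pageIndex);
    }

    /** Uses the embedded text when it is non-blank, otherwise falls back to OCR. */
    static String pageText(int pageIndex, TextLayerExtractor extractor, OcrEngine ocr) {
        String embedded = extractor.extract(pageIndex);
        if (embedded != null && !embedded.trim().isEmpty()) {
            return embedded;
        }
        // Assumption: a blank text layer means the page is a scanned image.
        return ocr.recognize(pageIndex);
    }

    /** Joins the text of all pages with newlines, ready to hand to a text-to-speech engine. */
    static String documentText(int pageCount, TextLayerExtractor extractor, OcrEngine ocr) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pageCount; i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(pageText(i, extractor, ocr));
        }
        return sb.toString();
    }
}
```

Because OCR is usually much slower than reading an existing text layer, running it only on pages that need it keeps ordinary PDFs fast; the resulting string can then go through the same text-to-speech path already built for .txt files.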
Regarding "android - How to convert a PDF to text in an Android app?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22794264/