Building a tool to OCR and translate scanned PDFs without losing the formatting
Thread poster: Kyle Corbitt

Kyle Corbitt United States Local time: 00:19 Spanish to English + ...
Hi everyone,

I've started building a system that combines OCR and MT to quickly produce a draft translation of scanned images and PDFs. It keeps all the formatting of the original document and just adds editable text boxes on top, which saves a ton of time on prep/formatting. It's particularly useful for simple forms like birth certificates (it doesn't work well yet for documents with longer prose).

The URL is https://translato.ai

My wife and I actually built this because we wanted a tool like this for ourselves but couldn't find one. We had to manually translate her birth certificate and other documentation when we moved to the US, and I was surprised that there was no way to do that conveniently.

I initially planned for the tool to be used by individuals, but I've since shown it to a few professional translators, and they mentioned that there wasn't a good tool for translating scanned documents for professionals either, so I decided to share it here as well.

I'd really appreciate any feedback on whether this helps your workflow. The service is totally free.
[Edited at 2023-05-30 18:10 GMT]
[Edited at 2023-05-30 21:21 GMT]

Stepan Konev Russian Federation Local time: 10:19 English to Russian
PDF output format | May 31, 2023
I have tested your tool. Thank you for your effort and work. However, I doubt I can find a use for it. I put in a non-editable JPG file and got a non-editable PDF file back. Any machine translation requires post-editing. That means I have to OCR the output PDF to make it editable.

Samuel Murray Netherlands Local time: 09:19 Member (2006) English to Afrikaans + ...
I have tested your tool and it works well for its intended purpose. It takes a bit of experimentation to learn all of its features, as the way some users might expect it to work is not how it works. E.g. some might expect to be able to download an editable file. When I tested it, I selected English to English as the language combination, so as to not have a machine translation inserted into the segments.

Kyle Corbitt United States Local time: 00:19 Spanish to English + ... TOPIC STARTER
Stepan Konev wrote:
I have tested your tool. Thank you for your effort and work. However, I doubt I can find a use for it. I put in a non-editable JPG file and got a non-editable PDF file back. Any machine translation requires post-editing. That means I have to OCR the output PDF to make it editable.

Hi Stepan, thanks so much for your feedback! The intention is for all editing to happen in the tool itself. Once your file has been imported, you'll see an interface where all the text is editable. You can click on any of the text boxes to move them, resize them, etc. Once you're satisfied, you can export the final version as a PDF.

That said, I understand you may have a workflow where it's more convenient to export in an editable format and do further post-processing that way. Is there a particular export format that would be most convenient and useful for you?
Kyle Corbitt United States Local time: 00:19 Spanish to English + ... TOPIC STARTER
English to English | May 31, 2023
Samuel Murray wrote:
I have tested your tool and it works well for its intended purpose. It takes a bit of experimentation to learn all of its features, as the way some users might expect it to work is not how it works. E.g. some might expect to be able to download an editable file. When I tested it, I selected English to English as the language combination, so as to not have a machine translation inserted into the segments.

Hi Samuel, thanks so much for your feedback! I assume you selected English to English because you intend to use it mostly for the OCR capabilities, not for the MT pass? What type of document did you use for your test, and did it have any trouble identifying the text and making it editable?