The History of VelOCRaptor

In February 2009 I bought a scanner for my Mac. I dutifully installed the software that came with it and played around with the OCR (ABBYY FineReader, since you ask ;-). It was a complicated beast, but buried among the many options was a cute feature – it would write PDFs of your scans, in which the original image was preserved with the recognized text overlaid transparently on top. So I could read the original but select the recognized text in Preview and, vitally, Spotlight could find the scan by searching for words in the text.

This looked very useful, so I went to upgrade the free version of FineReader to the latest, only to find that whilst the Windows version of FineReader was 9, the Mac version was 5. It was the same situation with OmniPage and Readiris. All the dedicated Mac OCR programs are expensive and complicated, and have not been updated for years, leaving them looking like poor copies of old Windows programs.

It looked like the Mac product idea I had been searching for had just dropped into my lap – a simple OCR app that would write indexable PDFs. I found the open-source OCRopus OCR engine, a project funded by Google to let them index the millions of pages of image-only PDFs on the web, and enlisted my friend and proper Mac developer, Simon Taylor, to get it compiling on the Mac. I took the output from the engine and set about learning how to write PDF files. Within two weeks we had a working prototype and a product name: VelOCRaptor. I announced the project on MacInTouch in March and invited people to download our work in progress and give feedback.

Four months after its conception, we have shipped VelOCRaptor 1.0. I am really happy with the product’s simple workflow – drag a scanned image, in any of the most popular formats, onto the big drop target and it is converted to a PDF that you can save with one click. We even feature No Click OCR (TM ;-), which allows easy integration with scanner programs and Mac OS Image Capture.

To be frank, I am less happy with the accuracy. On good-quality documents the output is acceptable; with poor-quality scans and odd fonts it can still struggle. For many uses, in particular locating scanned documents and copying small amounts of text, the accuracy is good enough. For verbatim transcriptions of complicated documents, we will improve as the OCRopus engine improves. In the meantime, for occasional use, for integration with other applications, or if you only have $29, we are the only game in town!