HP Easy Scan is HP's scanning software for the Mac, designed for use with HP scanners and multifunction printers. Its main features include multi-page document scanning, automatic image detection, PDF file output, and text recognition (OCR). Black-and-white scan speed is measured at 300 dpi using the included HP Smart Document Scan Software; color scan speed is also measured at 300 dpi. Actual processing speeds may vary depending on scan resolution, network conditions, computer performance, and application software. Optical Character Recognition (OCR) software converts the letters in a graphic file to editable text (TXT or RTF files), and it is required to use this feature. To use it, open the HP Scan software on the computer and look for a Save as Editable Text (OCR) shortcut.
10.6: How to use OCR with HP multi-function printers
Apart from the introduction dealing with HP dropping support for OCR, this has nothing to do with HP devices. The hint, as it relies on the Tesseract software, will work with any OS that supports the software.
Coumerelli, the folder-actions tricks should work with all OS X versions that support folder actions. I'd imagine that includes 10.5.
The command-line stuff should work with all versions of OS X, can't see any reason it wouldn't.
I've also found a GUI interface to the Tesseract OCR script for 10.5 and later: http://download.dv8.ro/files/TesseractGUI/
Keep in mind that the basic Tesseract script takes uncompressed TIFF files only. So, whatever your scanner produces, you'll need to convert to uncompressed TIFF. The folder action trick does that when fed a .png.
There are ways to make Tesseract work with other formats if you really need to, and you can find those with a little googling and implement them with more command-line fussing. More trouble than it's worth, IMHO, given how easy it is to do uncompressed TIFF conversions under OS X.
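For anyone who'd rather do the TIFF conversion from Terminal than from Preview, here's a minimal sketch using sips, the image tool that ships with OS X. The function names tif_name_for and to_tif are made up for this example, and I'm assuming sips writes TIFFs uncompressed unless you ask for a compression option, so check the result against tesseract before relying on it.

```shell
# Work out the .tif output path for any input image path.
# Only the extension of the file name is replaced; folders are untouched.
tif_name_for() (
    dir=$(dirname "$1")
    base=$(basename "$1")
    printf '%s/%s.tif\n' "$dir" "${base%.*}"
)

# Convert one image to TIFF next to the original.
# Assumption: sips leaves the TIFF uncompressed by default.
to_tif() {
    sips -s format tiff "$1" --out "$(tif_name_for "$1")"
}
```

Something like to_tif ~/Desktop/scan.png should then leave a scan.tif beside the original, with the single-f extension.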
One thing I've found is that the folder action for the OCR doesn't like to be fed multiple files all at once. It seems to prefer to have the first file converted and no other folder actions underway. This is no problem if your intent is to have it auto-OCR images as they come from the scanner (and any conversion process). But if you drag a whole bunch of TIFF files into the folder-action-enabled 'OCR me' folder, some of the files will be missed. This appears to possibly point to a bug in the folder-actions mechanism.
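If you already have a pile of TIFFs, one way around the folder-action hiccup is to skip folder actions and loop over the files in Terminal, one at a time. This is only a sketch: ocr_each and the OCR_CMD override are names invented for this example, and it assumes tesseract is on your PATH and is called as 'tesseract image outputbase'.

```shell
# Run OCR on every .tif in a folder, strictly one file after another.
# OCR_CMD is a hypothetical hook for swapping in another command;
# it defaults to tesseract.
ocr_each() {
    for f in "$1"/*.tif; do
        [ -e "$f" ] || continue   # no .tif files: the pattern stays literal
        "${OCR_CMD:-tesseract}" "$f" "${f%.tif}" || echo "failed: $f" >&2
    done
}
```

Calling ocr_each ~/Scans would write a .txt next to each .tif and keep going if one file fails.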
Although this was the case when Snow Leopard launched last year, HP quietly updated their printer software and supporting scanning/fax applications for many of their Officejet printers some months later to be compatible with Snow Leopard.
No OCR is yet available from HP for my top-of-the-line, 2006-vintage OfficeJet multifunction.
Maybe they're helping customers with other models, but not me.
I have no complaint about the printer or what functionality is available right now, but when they declined to support OCR after the dust settled following SL's release, they took away part of what I paid for.
Meanwhile, gotta say, it makes no sense to have to go to System Preferences, Print & Fax, and then hit a 'Scan' tab in order to access my scanner. A very un-Mac user experience.
This may depend on the specific printer type, but after the upgrade to 10.6.2 (I believe; it may also have been 10.6.1) the driver for my HP 1350 AIO was updated to allow the use of its scanner through the standard SL imaging interface. This means I can now open Preview, choose File->Import from scanner->HP 1350, and it will let me scan from the application (and save as uncompressed TIFF right away).
This will work for all apps that support the imaging interface, which includes at least Preview and Image Capture.
In addition you can use the Capture Image Service in other apps to scan/import the image through Image Capture. This works well in Pages for instance.
I use the HP software too, both for scanning images and for OCR. Works fine. If I remember correctly, VelOCRaptor also worked for me while I was awaiting the new software from HP, but the HP Scan stuff is easier to use, IMHO.
I got the HP Scan.app software to work without any problems, including its IRIS OCR functionality. After the upgrade to Snow Leopard, I simply tried to install the full-featured HP software dated Sep 2009 and available on HP's website at http://tinyurl.com/dmjgvn. I was pleasantly surprised by how easy this app is to use and how well it performs. The PDFs that are generated are a bit larger than necessary, but I just post-process the documents with the Quartz filter in Preview to reduce the file size.
I am using HP Scan.app v2.1.3 (7) on Mac OS X v10.6.2 on a MacBook Pro.
Hope this helps.
Y'know, if I had tried that, and it worked, I wouldn't have had to go down the path that led me to Tesseract. But I didn't try it because HP has all sorts of warnings not to do so.
Anybody have any idea why? They lost a ton of customer goodwill in this episode, and why do that if the old software works? Are there hidden consequences somewhere?
Yeah, I was confused as well since the HP apps are dated Sep of last year and even now after five months one gets the impression from the chats on the Web that no HP solution exists. But perhaps I just had dumb luck with these drivers, while they may not work for others?
Nevertheless, I much appreciate your effort and that you have shared your workaround with this community. Don't worry, it is just a matter of time until the next upgrade and HP software not working. Also, your hint is handy if one needs to do OCR on existing tiff files.
Again, thanks and keep the hints coming!
Apparently it would work sometimes, and not in other cases (models?).
Rather than fixing the problematic cases, HP chose to tell everyone to stop using the old software.
Before the driver update I tried every tip I could find and reinstalled the same HP software. It would install and start a scan, then report some obscure error that I couldn't find any info on.
Finally I just gave up, vowing to never buy an HP printer ever again. Then after a SL upgrade suddenly I could access the scanner from Preview (see my other comment of today).
I don't have an HP printer/scanner, but I tried unsuccessfully to install Google's tesseract last year. I'll try these steps, and see if I get better results. Thanks.
If you run into trouble, please post all details. It worked well for me but, as you experienced, the instructions found on-line elsewhere are pretty terrible.
Hope the process I documented works for you. As an OCR engine, Tesseract really rocks.
Works like a champ. And in French and German, after I'd downloaded the dictionaries from Google Code. HP not required: I used it with TIFF output obtained from a Canon scanner using Image Capture. It's not entirely house-trained: give it a file name it does not like, and it crashes. Maybe I should be a good citizen and submit a patch...
Can't say for sure if it needs Snow Leopard, but a glance at the code suggests it should be pretty portable. (In particular, it doesn't use threads, which would make it faster on modern Macs. Not that it's a slouch anyway.)
As you can tell, I'm not a Terminal guy. Did not make it past the make command. What do I need to install to make this work?
iMac:tesseract-2.04 tcsdoc$ ./configure
checking build system type... i686-apple-darwin10.2.0
checking host system type... i686-apple-darwin10.2.0
checking for cl.exe... no
checking for g++... no
checking for C++ compiler default output file name...
configure: error: C++ compiler cannot create executables
See `config.log' for more details.
iMac:tesseract-2.04 tcsdoc$ make
-bash: make: command not found
You need to install Apple's Xcode developer tools. They include the GNU C compiler (gcc/g++) and GNU make, which are both required to build software from source.
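A quick way to check this before running ./configure is to ask the shell whether the build tools are on your PATH. This is just a sketch, and missing_tools is a name made up for the example.

```shell
# Print the name of each listed tool that is not on the PATH.
# Prints nothing when everything is present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running missing_tools g++ make should print nothing once the developer tools are installed; any name it does print is what configure or make will trip over.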
Thanks for pointing that out. I'd installed XCode long ago so had no idea this wasn't part of the standard OS X configuration.
Also available for Fink users: fink install tesseract
The Folder Action OCR step needs clarification, please. What is, on Snow Leopard, the exact procedure (from beginning to end) for using the script you've written/modified and shown in this textarea?
When replying, please keep in mind that I am speaking from the standpoint of a complete newbie. Please indicate the applications to open (for example: there is no such thing as a 'Folder Actions script editor' in my Utilities folder) and the steps necessary to get to the place where one can paste in the script you've provided.
You might be talking about context menu items, but that still isn't apparent to a complete newbie.
Oh, Folder Actions are wonnnnderful. Sorry to have been terse on the how-to aspect. (Have you seen my recipe for chicken pie? First you catch a chicken, then you bake it in a pie!)
Some resources and examples that will get you started:
...The second one notes, 'As of Snow Leopard (OS 10.6), Script Editor.app has been renamed AppleScript Editor.app and is located in your /Applications/Utilities/ folder.' So, depending on what version of OS X you're using, now you know what you want to use and where to look for it.
Open that app up. Paste in the code from the post here. Save it. Easiest to just save it in your Documents folder, then drag it to ~/Library/Scripts/Folder Action Scripts. OS X will ask for authentication if needed. Give that, and voila, your new script is now available to be attached to any folder.
So let's do that. Make a folder somewhere handy. Right-click on it. (On a Mac laptop, press the trackpad with two fingers and click.) In the menu that pops up, scroll all the way down to Folder Actions Setup. In the box that pops up, click on the name of the script you just created. Click the Attach button. Done.
Now anytime you drag a file into that folder, it'll get processed by that script. Gad, it's a wonderful feature of OS X. Go crazy with it, you'll love it.
I just want to point out a little more specifically: TIFF files are generally saved with the .tiff extension in OS X. If you use Preview, for example, to save your JPEG as an uncompressed TIFF for tesseract, it'll make a file ending in .tiff, which tesseract won't open. It wants .tif only.
Remember, kids: .jpg is a JPEG, and .tif is a TIFF. Thank DOS for its 3-character file extensions causing that. If you're going to automate converting JPEG to TIFF and passing it on to this script, be sure to enforce a single letter f in the extension.
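If you're automating this from the shell rather than inside the folder action, the rename can be done in a few lines. A minimal sketch, with force_tif_ext being a name invented for the example; it only handles a lowercase .tiff, and mv will overwrite an existing .tif of the same name.

```shell
# Rename FILE.tiff to FILE.tif; leave any other name alone.
# Prints the resulting path either way, so it can feed the next step.
force_tif_ext() {
    case "$1" in
        *.tiff)
            mv "$1" "${1%f}" && printf '%s\n' "${1%f}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

You could then pass its output straight on, for example tesseract "$(force_tif_ext scan.tiff)" scan_text.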
To tcsdoc and others who had problems getting it to build:
Based on posts I found searching Google, I reinstalled Xcode and the install command worked.
(I don't know whether Xcode has to be installed in the first place to make this work, but it looks like something about my Xcode installation was messing with some of the commands or command paths. My installation had been imported from my previous Mac, so maybe that's why.)
sjinsjca's script seems to be set up for making the adjustment, but it doesn't quite do it. Here's what you do to edit the script to change '.tiff' files to '.tif' before they get fed to the tesseract shell script.
1. change the line: to
2. change the line: to
3. after the line: add this line:
The script should now successfully process files ending in '.tiff' as well as '.tif'.
Quick Folder Action Script Creation Steps
1. Copy the script text from the hint.
2. Open the application 'AppleScript Editor' (in Applications > Utilities)
3. Paste the script text into the script editing window.
4. Hit 'compile' and it will probably give you an error message because there are line breaks from your pasted text that shouldn't be there. In most cases you can just hit 'ok' and then hit the space bar to replace the highlighted linebreak with a space. Sometimes it requires manually fixing a linebreak-- in this script, 'giving up after 120' should not be on its own line, but should finish the line before it.
5. When you can hit 'compile' without an error message, consider making the edits I suggested.
6. Save the script in your User folder > Library > Scripts > Folder Action Scripts. If you don't have a 'Folder Action Scripts' folder, create one there.
7. Do a Spotlight search for 'Folder Actions Setup.app' and fire it up.
8. Select the folder (create it first in Finder if need be) you want to add a folder action script to. On the right-hand pane, hit the + sign and select the script you just saved from the available list.
9. Be sure 'Enable Folder Actions' is checked, and quit.
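Step 6 can also be done from Terminal if you've saved the script somewhere else first. A minimal sketch, where install_folder_action and the file name ocr.scpt are made up for the example:

```shell
# Copy a saved script into the per-user Folder Action Scripts folder,
# creating that folder first if it does not exist yet.
install_folder_action() {
    dest="$HOME/Library/Scripts/Folder Action Scripts"
    mkdir -p "$dest" && cp "$1" "$dest/"
}
```

After something like install_folder_action ~/Documents/ocr.scpt, the script should show up in the list in Folder Actions Setup (steps 7 to 9).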
Thanks for the help in getting the make command to work. Installing Xcode did the trick. This leads to my next problem. Tesseract runs but I get the error message below:
iMac:SCRATCH tcsdoc$ tesseract scan.tif scan_text
Tesseract Open Source OCR Engine
read_tif_image:Error:Illegal image format:Compression
tesseract:Error:Read of file failed:Scan.tif
I have an Epson scanner and use Image Capture to scan the document. I've loaded the scan.tif file into Preview and saved it with no compression but still get the same error. Any ideas on this?
>read_tif_image:Error:Illegal image format:Compression
You need to either save as an uncompressed TIFF (open in Preview and Save As uncompressed TIF), or install libTIFF, then re-install tesseract (see my comment below).
Thanks for the tip. One 'out-of-the-box' limitation is the lack of support for multi-page TIFFs. However, if you install libTIFF (BEFORE installing tesseract), you get support not only for multi-page TIFFs but also for compressed TIFFs.
Get libTIFF 3.9.2 here: http://download.osgeo.org/libtiff/
libTIFF home page: http://www.remotesensing.org/libtiff/
note, this is mentioned in the FAQ: http://code.google.com/p/tesseract-ocr/wiki/FAQ
Does it support multi-page tiff files?
Only with 2.03 and later, and only if you have libtiff installed. See Compressed Tiff above.
Here's how I did an OCR scan in Snow Leopard using my HP 7210 all in one:
1st I updated the driver.
2nd I clicked on /Applications/Hewlett-Packard/HP Scan.app
3rd I chose Scan Documents
4th I hit the Save icon at the top, chose format: TXT, and made sure Contents were saved to a single file...
Works like a charm. It still uses Readiris software behind the scenes.
Does anyone know how to get scanning working on an HP 1312nfi under 10.6.x?
For example, I can't scan using Preview.app. I don't see my scanner in the Print & Fax pane, even after selecting the printer.