Praise and Suggestions
Posted: 24 September 2013 12:28 AM
Newbie
Total Posts: 10
Joined 2013-09-22

Praise:

• I downloaded PDF Nomad specifically to try its OCR features, and so far I’m rather impressed!

• My 1,300-page document took several hours to process, but before I was able to save it, something else caused my Mac to freeze, requiring a restart. I expected to have lost the OCR processing, but to my surprise, even after restarting, the OCR data was still there! Nicely done!

Questions/Suggestions:

• Can PDFN repaginate (renumber) PDFs to match visible page numbers?

• Once OCR processing is complete, is that data saved with the PDF, such that it’s searchable by any PDF viewer? Or is PDF Nomad required?

(I’m aware I can test this, but I’m currently viewing an unsaved, 1,300-page document in Demo Mode. Saving it will watermark the pages (which is totally fair), but I’m using the document for work at this very moment, and if the watermarks were to render it difficult to read, I’d have to wait 8 hours for the OCR to process it again. I’ll probably purchase PDFN, but while I’m evaluating it, I’m curious about how OCR data is handled.)

• The search function needs case-sensitivity. When a given word appears hundreds of times in one document, narrowing by case-sensitivity is a must. (In my current document, section headings are in all caps. So, being able to search for “STORY” [5 results] vs. “story” [over 300 results] is critical.)

• Why is OCR processing modal? In other words, why must the entire interface lock up while processing? This seems a bit archaic. Why can’t the processing be done in the background? I can understand preventing the user from modifying the document during processing, but we should at least be able to navigate and view the document (without modifying it).

• It would be nice to have the search bar separate from the “Page List” and “Thumbnail” views, because after performing a search, there appears to be no way to view the search results as thumbnails. (Note how Preview.app handles this: search results display both text and thumbnails.) PDFN’s current list view is great for some searches, but for others it would be better to view the results as thumbnails.

• Finally, when selecting the brightness and contrast settings for OCR processing, it was unclear whether the settings are saved on a per-page basis or globally. It would be nice if we could first (a) specify default settings for the entire document, and then also (b) select individual pages for processing differently because they have different requirements.

For example, my PDF’s cover (of a scanned book) is dark red with black text. The default settings are fine for the black-and-white pages, but not for the cover. Of course I don’t need the front cover to be searchable, per se, but other PDFs could have internal pages that require settings other than the default. So, I wish we could select the “unusual” pages, change the settings, and then click a checkbox for “Apply to this page only.” Then, for similar pages, it would be nice to have a “Use last settings” checkbox.

Thanks.

Posted: 25 September 2013 11:04 AM   [ # 1 ]
Administrator
Total Posts: 475
Joined 2007-03-23

• My 1,300-page document took several hours to process, but before I was able to save it, something else caused my Mac to freeze, requiring a restart. I expected to have lost the OCR processing, but to my surprise, even after restarting, the OCR data was still there! Nicely done!

Ha. I’m glad that worked out. I’m sure it has more to do with OS X than with PDF Nomad, though.

• Can PDFN repaginate (renumber) PDFs to match visible page numbers?

It can and it can’t. Within PDF Nomad it can renumber the page labels, and do so efficiently. However, due to what I consider a bug in the Mac OS X PDF library implementation, the numbering is lost when the PDF is saved. This has always been the case in OS X, including in Preview and in other PDF editors that use the OS X system software to handle PDFs, and Apple have shown themselves unwilling to fix it. The feature in PDF Nomad is therefore mainly useful for relabeling pages visually before printing, not for adjusting the PDF document so that PDF readers show the altered page numbers. I would urge you to ask Apple directly to fix this. The more they learn that this is an issue for users (and developers), the more likely they are to come round to fixing it.

• Once OCR processing is complete, is that data saved with the PDF, such that it’s searchable by any PDF viewer? Or is PDF Nomad required?

Yes. Once you save the PDF, after having completed the OCR process, it becomes a searchable PDF regardless of viewer. Note that PDF Nomad offers various options when performing OCR, but creating a searchable PDF with the original scanned pages intact is the default option.

• The search function needs case-sensitivity. When a given word appears hundreds of times in one document, narrowing by case-sensitivity is a must. (In my current document, section headings are in all caps. So, being able to search for “STORY” [5 results] vs. “story” [over 300 results] is critical.)

Thanks for the suggestion. I’ll look into the feasibility of implementing this.

• Why is OCR processing modal? In other words, why must the entire interface lock up while processing? This seems a bit archaic. Why can’t the processing be done in the background? I can understand preventing the user from modifying the document during processing, but we should at least be able to navigate and view the document (without modifying it).

OCR processing is document modal, not application modal. So, indeed, you are prevented from working with the document during the process. Doing what you request may be feasible, but it needs time and care to implement. Currently I’m more interested in speeding up the OCR. Note also that activating multiple recognition languages carries a large speed penalty. If the documents you scan are in a single language, make sure to turn off any other languages you may have active.

• Finally, when selecting the brightness and contrast settings for OCR processing, it was unclear whether the settings are saved on a per-page basis or globally. It would be nice if we could first (a) specify default settings for the entire document, and then also (b) select individual pages for processing differently because they have different requirements.

You can apply the settings to selected pages only, if you wish. In your case, you could first select the cover and apply settings appropriate to it; then, after OCRing the cover, select the remaining pages and OCR them with settings adjusted for those pages.

António Nunes
SintraWorks

Posted: 25 September 2013 10:55 PM   [ # 2 ]
Newbie
Total Posts: 10
Joined 2013-09-22

Fantastic replies and great information. Thanks!

Posted: 30 July 2014 01:47 AM   [ # 3 ]
Newbie
Total Posts: 8
Joined 2012-07-31

Hi Antonio, it’s been a while since I had any suggestions for PDF Nomad, because it fits my workflow so perfectly. One feature I would like is an option to print a selected area. Right now, to print an area of a large PDF drawing, I have to crop it, print it, and then close it without saving. It would be really REALLY great to have a print-selection option.
