(Before I list my main questions, is there a PDF manual for PDF Nomad? The built-in OS X help mechanism is painfully slow, and some topics [such as “Auto Deskew” and “Resolution for searchable pages”] aren’t covered.)
I’ve OCRed a scanned book from the 60s, and I’m encountering various issues while attempting to edit the OCRed text. While the onboard help mentions that text can be edited, it offers no details on the procedures for doing so. The PDF is of decent, but not stellar, quality. So I do believe the software did a decent job, given the source material. (And let me add that the book I OCRed a few weeks ago was processed essentially flawlessly. So, I’m pleased with the technology in general.) However, I’m experiencing some hiccups with the editing process.
1. I figured out that I can select words in the body of the page and correct them. I also figured out that in the area to the right, words that are unrecognized are underlined in red. That’s a nice touch. However, it’s sometimes difficult to manually scan and find the words on the page that correspond to the underlined words at the right. It would be nice to be able to click on an underlined word at the right and have the corresponding word automatically highlight on the page, ready for correction.
2. Due to the way this book was typeset, and due to the scan quality, PDF Nomad had trouble with the spacing of a number of words. For example, PDF Nomad interprets the word “THIRTEENTHS” precisely like this:
TH I RTEENTHS
So PDF Nomad (PDFN) thinks these are three separate words. How does one join the segments into one word?
UPDATE: While typing this post, I was experimenting with PDFN, and I discovered that you can select and delete segments, and you can elongate segments. So I deleted the second and third segments above, then elongated the first one so I could type the full word. But this is cumbersome. I potentially have hundreds of such corrections to make, and highlighting the smaller segments is tricky. When the pointer approaches the edge of a segment, the cursor changes to the “extend segment” cursor. As a result, when a segment is only one letter long, it’s nearly impossible to select it.
Suggestion: To join segments, we should be able to drag a selection box around them (which we can already do), then issue a command to join them (which I don’t believe we can already do).
3. It would also be nice to be able to adjust PDFN’s threshold of spacing in a document like this. If we had a slider to tell it “A real space is at least this wide, and anything smaller than this is not a space,” that would fix this problem. If we could do this after the document’s been scanned, we could issue that command and have PDFN reprocess the current data with much more accurate results. (I think I’ll reprocess this document using the “Background Level” setting to make the letters thicker. Perhaps that will help.)
(By the way, the manual states: “Lowering the background level often has the effect of making the text heavier and fuller [and vice versa].” However, I found the results to be the opposite: A higher level makes text heavier, and a lower level makes text lighter.)
4. When PDFN completely misses a word, how does one select that word, then enter the missing text? In my current document, I’ll need to do quite a bit of this, but haven’t yet found a way to do so.
5. Finally, after editing text, pressing the Enter key doesn’t confirm the editing window. So one has to mouse up and click the button each time, which is time-consuming. Please make the Enter key confirm the window.
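
To make points 2 and 3 a bit more concrete, here is a rough Python sketch of the kind of “minimum space width” rule I have in mind. I have no idea how PDFN actually stores its segments, so everything here is made up for illustration: the `Segment` class, the `merge_by_gap` function, the `min_space` parameter, and the coordinates in the sample line. The idea is simply that any gap narrower than the threshold is not treated as a real space, so the pieces on either side get joined.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical OCR text segment: its text plus left/right edges on the page."""
    text: str
    x0: float  # left edge
    x1: float  # right edge


def merge_by_gap(segments, min_space):
    """Join neighbouring segments whose horizontal gap is narrower than min_space."""
    merged = []
    for seg in sorted(segments, key=lambda s: s.x0):
        if merged and seg.x0 - merged[-1].x1 < min_space:
            # Gap is too small to be a real space: extend the previous segment.
            prev = merged[-1]
            merged[-1] = Segment(prev.text + seg.text, prev.x0, max(prev.x1, seg.x1))
        else:
            # Gap is wide enough (or this is the first segment): start a new word.
            merged.append(Segment(seg.text, seg.x0, seg.x1))
    return merged


# Invented coordinates for the "TH I RTEENTHS" case, followed by a genuine word gap.
line = [
    Segment("TH", 0, 20),
    Segment("I", 24, 28),
    Segment("RTEENTHS", 32, 110),
    Segment("OF", 125, 145),
]

words = merge_by_gap(line, min_space=8)
print([w.text for w in words])  # ['THIRTEENTHS', 'OF']
```

With a slider controlling something like `min_space`, the existing OCR data could be regrouped without rescanning, and a “join selected segments” command could apply the same merge to whatever is inside the selection box.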