Legal documents are often archived by scanning. A scanned document contains only an image of each page, so it is impossible to search it for specific text.
That means you can spend a lot of time groping in the dark, trying to find a small paragraph in a long document.
It is possible to create searchable PDFs with Adobe Acrobat and many third-party applications. However, most documents are still scanned as plain images.
Don’t worry. There is a solution. In fact, you may have had it on your PC for up to 10 years without knowing it.
The solution is OneNote OCR, which has been part of Microsoft Office since 2007.
How can you check whether you have OneNote?
Press the Windows key + R. The Run dialog opens.
Type onenote and press Enter.
If it is installed, it will open.
By the way, the same method works for other Office tools and Windows applications. Here are the exact names to type: winword, excel, powerpnt, outlook, mspub, mspaint, …
Using OneNote to search a scanned document
Open the PDF file containing the scan.
Choose File > Print and select the printer named Send to OneNote, followed by your version number (for example, Send To OneNote 2016).
Choose the notebook you want. For testing purposes, just choose the current notebook.
The document is printed to a new page in OneNote, with each page of the scan inserted as a separate image.
Now press Ctrl+F to open Find on Page.
Type your search text and see what happens.
OneNote highlights every word that starts with the characters you type, which is very helpful for finding different forms of a word. The up and down arrows let you jump quickly between matches. You can also scroll manually and check visually.
Want the text?
Sooner or later, you will. Don’t worry: that is also available.
Right-click any page image and choose whether to copy the text from that page only or from all pages (a thoughtful feature, isn’t it? That is called user focus).
Now you can paste it anywhere.
Of course, recognition accuracy depends on the scan quality and on a few other factors:
- The page must be upright, so that the lines of text run horizontally.
- Slanted text is difficult to recognize, so rotate the image as required.
- OCR is available in multiple languages, depending upon which Office language packs are installed.
- For multilingual text, you can choose the OCR language by right-clicking the image.
- Handwriting can also be recognized if it is clear and legible. Like printed text, it needs to be horizontal; tilted handwriting is harder to recognize.