
How to search scanned documents with OneNote OCR

The Problem

Legal documents are archived by scanning. A scanned document contains just an image for each page. Therefore, it is impossible to search it for specific text.

That means you have to spend a lot of time groping in the dark, trying to find a small paragraph in a long document.

It is possible to make searchable PDFs using Adobe Acrobat as well as many third-party applications. However, most documents are scanned as just standard images.

Don’t worry. There is a solution available. In fact, you may have had the solution on your PC for up to 10 years without knowing it.

The solution is OneNote OCR – a part of Microsoft Office since 2007.

How to check if you have OneNote?

Press the Windows key together with the R key. The Run dialog will open.

Type onenote and press Enter.

If it is installed, it will open.

By the way, this method can be used to run other Office tools and other Windows applications as well. Here are the exact names to type: winword, excel, powerpnt, outlook, mspub, mspaint, …
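
The same names can also be used from a script. Here is a minimal Python sketch, assuming a Windows PC where the built-in start command of cmd resolves these names the same way the Run dialog does. The helper names build_start_command and launch are hypothetical and used only for illustration.

```python
import subprocess
import sys

# Run-dialog names for OneNote and the other Office tools mentioned above.
RUN_NAMES = ["onenote", "winword", "excel", "powerpnt", "outlook", "mspub"]


def build_start_command(run_name):
    """Return the argument list that asks the Windows shell to start run_name."""
    # The empty string is the window title that `start` expects before the target.
    return ["cmd", "/c", "start", "", run_name]


def launch(run_name):
    """Ask Windows to start the application; do nothing on other systems."""
    if sys.platform != "win32":
        print("This sketch only works on Windows.")
        return
    subprocess.run(build_start_command(run_name))


if __name__ == "__main__":
    # Assumption: OneNote is installed and registered under the name "onenote".
    launch("onenote")
```

If OneNote is installed, running the script should open it, just like typing onenote in the Run dialog.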

Using OneNote to search a scanned document

Open the PDF file containing the scan


Choose File – Print and select the Send to OneNote printer (the printer name includes the OneNote version number).
Choose the desired notebook. For testing purposes, just choose the current notebook.

It will print the pages to a new page in OneNote. Each page from the scanned document will be one image in the OneNote page.

Now press Ctrl+F to go to Find on Page.


Type the search text and see what happens…

OneNote highlights the words which start with the characters you are searching for. This is very helpful in finding various word forms. The up and down arrows allow you to navigate instances of search results quickly. Of course, you can also scroll manually and check visually.
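
To see why matching on the starting characters catches various word forms, here is a small Python sketch. It only imitates the behaviour described above on plain text; it is not how OneNote works internally, and the function name find_prefix_matches and the sample sentence are made up for illustration.

```python
import re


def find_prefix_matches(text, query):
    """Return (position, word) for every word that starts with query, ignoring case."""
    if not query:
        return []
    # \b anchors the match at the start of a word; \w* takes the rest of the word.
    pattern = re.compile(r"\b" + re.escape(query) + r"\w*", re.IGNORECASE)
    return [(m.start(), m.group()) for m in pattern.finditer(text)]


# Hypothetical sample text standing in for the recognized text of a scan.
sample = "The agreement was agreed upon; both parties agree. Disagreement is rare."

for position, word in find_prefix_matches(sample, "agree"):
    print(position, word)
```

Searching for "agree" finds agreement, agreed and agree, but not Disagreement, because that word does not start with the search characters.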

Want the text?

Of course you will want it sooner or later. So don’t worry. That is also available.

Right-click any page image and choose whether to copy the text from that page or from all pages (a thoughtful feature, is it not? That is called User Focus).

Now you can paste it anywhere.

Of course, the recognition accuracy depends upon the scan quality.

Practical considerations

  • The page must be in vertical layout
  • Slanted text is difficult to recognize, so rotate the image as required
  • OCR is available in multiple languages, depending upon which Office language packs are installed.
  • In case of multilingual text, you can choose the OCR language by right-clicking on the image
  • Handwriting can also be recognized if it is clear and legible. Like text, handwriting also needs to be horizontal. Tilted handwriting is more difficult to recognize

2 Responses

  1. Doctor, your blog is amazing. I am just starting out with OneNote and am really intrigued. I noticed something quite useful: if you take a PDF straight out of Explorer, press Ctrl+C and then move to OneNote and press Ctrl+V to paste, it gives you an option to “Print it” or just link the file. I think it is more efficient than opening, printing and sending to OneNote. All about efficiency!

    1. Hi GDH
      You are right. Print to OneNote gives better control. Use whatever is more convenient.

      Step 1: Learn features. There is a lot of content available in this category.
      Step 2: Apply it in YOUR context.
      There is very little content available in this category. Why? Because there could be infinite scenarios.

      Efficiency is all about discovering your benefit hidden behind every feature.
