OCR (Optical Character Recognition)
The technology that turns a picture of text — like a scanned document or a photo of a receipt — into editable, searchable digital text.
What OCR actually does
OCR takes a raster image — a JPEG of a receipt, a scanned PDF page, a screenshot of a slide — and produces a string of Unicode characters. The image is just pixels to a computer until OCR runs. After OCR you can copy, search, translate, or paste the text anywhere.
How modern OCR engines work
The pipeline has three stages. First, segmentation: the engine finds blocks of text, then lines, then individual word and character boxes. Second, character classification: a neural network (usually an LSTM these days) looks at each glyph and predicts what letter it is. Third, a language model cleans up the output: if the classifier is torn between "rn" and "m" but the language model knows the surrounding word is "morning", it can correct an obvious misread.
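The third stage can be illustrated with a toy sketch. It assumes a tiny hand-written vocabulary and a short list of look-alike glyph pairs; both are hypothetical stand-ins, and real engines rely on statistical language models rather than a lookup like this.

```python
# Look-alike glyph sequences a classifier may confuse (illustrative list).
CONFUSIONS = [("rn", "m"), ("m", "rn"), ("cl", "d"), ("vv", "w")]

# Hypothetical vocabulary standing in for a real language model.
VOCAB = {"morning", "modern", "clock", "window"}


def correct_word(word, vocab=VOCAB):
    """Return `word`, or a look-alike variant of it that is in the vocabulary."""
    if word.lower() in vocab:
        return word  # already a known word, leave it alone
    for seen, intended in CONFUSIONS:
        start = word.find(seen)
        while start != -1:
            # Swap one occurrence of the confusable sequence and re-check.
            candidate = word[:start] + intended + word[start + len(seen):]
            if candidate.lower() in vocab:
                return candidate
            start = word.find(seen, start + 1)
    return word  # no plausible fix found


print(correct_word("moming"))
```

With a vocabulary this small, a valid word that happens to be missing from it can be "corrected" wrongly, which is one reason real engines weigh the surrounding context as well.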
Tesseract and browser-side OCR
Tesseract is the open-source OCR engine originally built at HP and later developed under Google's sponsorship. It ships with trained data files for 100+ languages including Hindi, Tamil, Bengali, Gujarati, and other Indic scripts. The Toolkiya Image to Text tool runs Tesseract compiled to WebAssembly directly in your browser, so your file is never uploaded anywhere: the pixels stay on your device and you get the text back locally.
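As a rough illustration of how the language data files are selected, the sketch below builds a command line for the desktop tesseract program (not the browser build that the tool uses). The -l flag and the "+"-joined language codes are Tesseract's own conventions; the function name, the lookup table, and the file name are hypothetical.

```python
import shlex

# Tesseract language codes for a few of the languages mentioned above.
LANG_CODES = {
    "English": "eng",
    "Hindi": "hin",
    "Tamil": "tam",
    "Bengali": "ben",
    "Gujarati": "guj",
}


def tesseract_command(image_path, languages):
    """Build (but do not run) a tesseract command that prints text to stdout."""
    codes = [LANG_CODES[name] for name in languages]
    # Several languages can be combined with "+", e.g. "hin+eng".
    return ["tesseract", image_path, "stdout", "-l", "+".join(codes)]


# Hypothetical bilingual form mixing Devanagari and Latin script.
print(shlex.join(tesseract_command("form.png", ["Hindi", "English"])))
```

Returning the command as a list keeps it safe to hand to subprocess.run without shell quoting issues.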
What makes OCR accurate (or not)
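- Resolution: 300 DPI is the usual floor for printed text. An A4 page at 300 DPI is about 8.7 megapixels, so mobile camera shots at 8+ MP are in the right range when the page fills the frame, and comfortably clear it for smaller items like receipts.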
- Contrast: dark text on a plain light background reads cleanest. Coloured paper or photocopies of photocopies hurt accuracy.
- Skew and warp: a tilted or curved page (think the spine of an open book) needs deskew preprocessing before classification.
- Font: standard serif and sans-serif fonts score >98% on clean scans. Handwriting and decorative display fonts crater that number.
- Language pack: pointing Tesseract at eng when the page is Hindi will return garbage. Match the trained data to the script.
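The resolution rule of thumb in the list above can be checked with simple arithmetic. This is a minimal sketch that assumes the document fills the whole frame; the function names and the example dimensions are hypothetical.

```python
def effective_dpi(px_w, px_h, page_w_in, page_h_in):
    """Lowest pixels-per-inch along either side, assuming the page fills the frame."""
    # Match long side to long side so orientation does not matter.
    long_px, short_px = max(px_w, px_h), min(px_w, px_h)
    long_in, short_in = max(page_w_in, page_h_in), min(page_w_in, page_h_in)
    return min(long_px / long_in, short_px / short_in)


def clears_floor(dpi, floor=300):
    """True if the image meets the resolution floor for printed text."""
    return dpi >= floor


# An 8 MP photo (3264 x 2448 pixels) of a 3 x 8 inch receipt.
receipt_dpi = effective_dpi(3264, 2448, 3, 8)
print(receipt_dpi, clears_floor(receipt_dpi))
```

If the document covers only part of the photo, the real figure is lower, so cropping to the page before measuring gives a more honest number.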
Common use cases
People use OCR to digitise paper receipts and bills before filing them, to make scanned PDFs searchable (so Ctrl+F actually works), to pull a phone number off a screenshot, to extract text from a slide deck PDF for translation, or to convert old printed notes into something they can edit. For Indian users, OCR also handles printed Aadhaar/PAN copies, ration cards, and bilingual government forms where the same page mixes Devanagari and Latin script.
Privacy footnote
Cloud OCR services upload your document to a server, run the engine there, and return text. That works, but the scan — possibly with your name, salary, or address on it — now lives in someone else's logs. Browser-side OCR avoids that entirely. The file is read into memory, processed by WebAssembly, and discarded when you close the tab.