Wednesday, December 18, 2024

Show HN: Adventures in OCR https://ift.tt/6M8P5KF

Hello HN! In a recent "Ask HN: What are you working on?" thread, I mentioned that I was working on OCRing a large book: https://ift.tt/rhIHo8Q The post generated some interest, so I thought I would keep HN posted.

The book is Saint-Simon’s Memoirs, an invaluable historical account of the French court under Louis XIV, full of wit and sharp observations, and of incredible literary value. I'm OCRing the reference edition, published between 1879 and 1930, which contains many comments and footnotes: 45 volumes, ~27,000 pages.

Here's a link to a blog post that describes the techniques used so far (the project is still ongoing): https://ift.tt/WdGS9gN

But you may also directly access the result here: https://ift.tt/et6FZuP

This web app (not optimized for mobile, sorry) solves a tricky problem: preloading images efficiently. In short, preloading the next image isn't enough, since browsers will repaint if an image is moved or scaled. Browsers also won't paint an image at all while its visibility is hidden or its opacity is zero; they paint only when those values change. On an average, slow machine, this takes visible time. But if an image is simply behind another element, it will be painted, and removing the covering element or changing the z-index will not trigger a repaint. (Preloading is important because it lets one review results fast; if one has to wait 150-200 ms between images, it's simply discouraging.)

Would love to hear feedback; happy to answer any questions! https://ift.tt/WdGS9gN

December 17, 2024 at 10:30PM
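For readers who want to picture the preloading trick described in the post, here is a minimal TypeScript sketch. It is not the web app's actual code: the class name StackedPreloader, the Layer interface, and the two-layer layout are all assumptions made for illustration. It assumes two opaque images of identical size stacked at the same position (for example with position: absolute), so that the next page is loaded into the covered layer and only the z-index changes when the reader advances.

```typescript
// Minimal shape the sketch needs from an image element.
// A real HTMLImageElement satisfies it structurally (src, style.zIndex).
interface Layer {
  src: string;
  style: { zIndex: string };
}

// Hypothetical helper: the front layer shows the current page, the back
// layer sits behind it holding the next page, already loaded.
class StackedPreloader {
  private front: Layer;
  private back: Layer;
  private index = 0;

  constructor(a: Layer, b: Layer, private urls: string[]) {
    this.front = a;
    this.back = b;
    this.front.src = urls[0] ?? "";
    this.back.src = urls[1] ?? "";
    this.front.style.zIndex = "2";
    this.back.style.zIndex = "1";
  }

  // Show the next page by swapping stacking order only; the layers are
  // never moved, scaled, hidden, or made transparent.
  next(): string {
    if (this.index + 1 >= this.urls.length) {
      return this.front.src; // already on the last page
    }
    this.index += 1;
    [this.front, this.back] = [this.back, this.front];
    this.front.style.zIndex = "2";
    this.back.style.zIndex = "1";
    // Start loading the following page in the layer that is now covered.
    const upcoming = this.urls[this.index + 1];
    if (upcoming !== undefined) {
      this.back.src = upcoming;
    }
    return this.front.src;
  }
}
```

In a page, the two layers would be two img elements passed to the constructor together with the list of page image URLs, and next() would be called from a key or click handler. Whether the swap really avoids a visible repaint depends on the browser, as the post notes, so this should be measured rather than taken for granted.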

