Wednesday, December 18, 2024

Show HN: Adventures in OCR https://ift.tt/6M8P5KF

Hello HN! In a recent "Ask HN: What are you working on?" thread, I mentioned I was working on OCRing a large book: https://ift.tt/rhIHo8Q The post generated some interest, so I thought I would keep HN posted.

The book is Saint-Simon’s Memoirs, an invaluable historical account of the French court under Louis XIV, full of wit and sharp observations, and of incredible literary value. I'm OCRing the reference edition, published between 1879 and 1930, which contains a lot of comments and footnotes: 45 volumes, ~27,000 pages.

Here's a link to a blog post that describes the techniques used so far (the project is still ongoing): https://ift.tt/WdGS9gN

But you may also directly access the result here: https://ift.tt/et6FZuP

This web app (not optimized for mobile, sorry) solves a tricky problem of preloading images efficiently. In short: preloading the next image isn't enough, since browsers will repaint if an image is moved or scaled. Browsers also won't paint at all if visibility is hidden or opacity is zero, and will paint only when those values change. On an average, slow machine, this takes visible time. But if an image is simply behind another element, it will be painted, and removing the covering element or changing the z-index will not trigger a repaint. (Preloading is important because it lets one review results fast; if one has to wait 150-200 ms between images, it's simply discouraging.)

Would love to hear feedback; happy to answer any question!

https://ift.tt/WdGS9gN December 17, 2024 at 10:30PM

Show HN: I built an open-source data pipeline tool in Go https://ift.tt/VIvb7yM

Every data pipeline job I had to tackle required quite a few components to set up:
- One tool to ingest data
- Another one to transform it
- If you wanted to run Python, set up an orchestrator
- If you need to check the data, a data quality tool

Besides being hard and time-consuming to set up, this is also pretty high-maintenance. I had to do a lot of infra work, and while these were billable hours for me, I didn’t enjoy the work at all. For some parts of it, there were nice solutions like dbt, but in the end they didn’t cover an end-to-end workflow. That’s why I decided to build an end-to-end solution that could take care of data ingestion, transformation, and Python stuff. Initially, it was just for our own usage, but in the end, we thought this could be a useful tool for everyone.

At its core, Bruin is a data framework that consists of a CLI application written in Golang, and a VS Code extension that supports it with a local UI.

Bruin supports quite a few things:
- Data ingestion using ingestr ( https://ift.tt/6anTcAZ )
- Data transformation in SQL & Python, similar to dbt
- Python env management using uv
- Built-in data quality checks
- Secrets management
- Query validation & SQL parsing
- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc.

This means that you can write end-to-end pipelines within the same framework and get them running with a single command. You can run it on your own computer, on GitHub Actions, or in an EC2 instance somewhere. Using the templates, you can also have ready-to-go pipelines with modeled data for your data warehouse in seconds.

It includes an open-source VS Code extension as well, which allows working with the data pipelines locally, in a more visual way. The resulting changes are all in code, which means everything is version-controlled regardless; the extension just adds a nice layer on top.

Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python stuff, we use the awesome (and it really is awesome!) uv under the hood to install dependencies in an isolated environment and to install and manage the Python versions locally, all in a cross-platform way. Then, to manage data uploads to the data warehouse, it uses dlt under the hood to upload the data to the destination. It also uses Arrow’s memory-mapped files to easily share the data between the processes before uploading it to the destination.

We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.

We had a small pool of beta testers for quite some time, and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know data tooling is not often built in Go, but I believe we found ourselves in a nice spot in terms of features, speed, and stability.

https://ift.tt/FNXuYVM

I’d love to hear your feedback and learn more about how we can make data pipelines easier and better to work with. Looking forward to your thoughts!

Best,
Burak

https://ift.tt/FNXuYVM December 17, 2024 at 10:10PM

Tuesday, December 17, 2024

Show HN: I made a multiplayer crossword game https://ift.tt/fHvr8s9

Hey HN, I’ve been working on this multiplayer crossword for a while now. There’s still so much more on my todo list for it, but I think it’s time to launch and get some feedback on what I have.

Every hour, a new crossword (13×13 or 15×15) is generated at https://ift.tt/w3gRHUL

If you prefer smaller/faster, every ten minutes, a new mini crossword (7×7 or 11×11) is generated at https://ift.tt/GmHKxo0

You’re playing each crossword at the same time as everyone else, racing to complete it first. You can’t see what you’ve got right until you correctly complete the entire grid. But you get some fun feedback on what other players are doing: a cell turns green if one other player has correctly solved it, orange if two other players have, and red if three or more other players have entered the correct letter in that cell.

Chat is emoji-only for the first half of the game (i.e. 30 minutes for the front-page, 5 minutes for the mini). After that, it unlocks and you can chat freely.

If you’re not done when the next crossword is generated, you can just stay on the current page for as long as you like and keep working to solve it.

I didn’t manage to get user accounts done before my arbitrarily-imposed launch date, so everyone is anonymous for now. I definitely want to build accounts, streaks, trophies, etc.

The other big thing I’m excited to build is a “team mode”. You should be able to play on a team, where you can chat freely with your team-mates, and the cell colors indicate what the other team has (collaboratively) got correct. I think that would be a lot of fun.

Thanks for reading, checking it out, and for any feedback. Feel free to ask me anything, of course.

https://ift.tt/aVn3N7w December 17, 2024 at 12:35AM
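
The cell feedback and chat rules described in the post can be sketched in a few lines of Python. This is only an illustration of the rules as stated, not the game's actual code: the function names `cell_color` and `chat_unlocked` are hypothetical, and treating the exact halfway point as "unlocked" is an assumption.

```python
from typing import Optional


def cell_color(other_correct: int) -> Optional[str]:
    """Map the number of OTHER players who solved a cell to a color.

    Rules from the post: 1 -> green, 2 -> orange, 3 or more -> red.
    Returning None for zero players (no highlight) is an assumption.
    """
    if other_correct < 0:
        raise ValueError("other_correct must be non-negative")
    if other_correct == 0:
        return None
    if other_correct == 1:
        return "green"
    if other_correct == 2:
        return "orange"
    return "red"


def chat_unlocked(elapsed_minutes: float, round_minutes: float) -> bool:
    """Chat is emoji-only for the first half of the round, then free.

    Assumption: chat counts as unlocked exactly at the halfway mark.
    """
    return elapsed_minutes >= round_minutes / 2
```

For example, `cell_color(2)` gives "orange", and `chat_unlocked(5, 10)` is true for a mini crossword five minutes into its ten-minute round.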

Show HN: NCompass Technologies – yet another AI Inference API, but hear us out https://ift.tt/dhvFyUH

Hello HackerNews! I’m excited to share what we’ve been working on at nCompass Technologies: an AI inference platform that gives you a scalable and reliable API to access any open-source AI model, with no rate limits. We don't have rate limits because optimizations we made to our AI model serving software enable us to support a high number of concurrent requests without degrading quality of service for you as a user.

If you’re thinking, well, aren’t there a bunch of these already? So were we when we started nCompass. When using other APIs, we found that they weren’t reliable enough to use open source models in production environments. To resolve this, we're building an AI inference engine that enables you, as an end user, to reliably use open source models in production.

Underlying this API, we’re building optimizations at the hosting, scheduling and kernel levels with the single goal of minimizing the number of GPUs required to maximize the number of concurrent requests you can serve, without degrading quality of service.

We’re still building a lot of our optimizations, but we’ve released what we have so far via our API. We currently keep time-to-first-token (TTFT) 2-4x lower than vLLM at the equivalent concurrent request rate. You can check out a demo of our API here: https://ift.tt/CJv6SVG

As a result of the optimizations we’ve rolled out so far, we’re releasing a few unique features on our API:

1. Rate limits: we don’t have any
Most other APIs out there have strict rate limits and can be rather unreliable. We don’t want APIs for open source models to remain a solution for prototypes only. We want people to use these APIs like they do OpenAI’s or Anthropic’s and actually make production-grade products on top of open source models.

2. Underserved models: we have them
There are a ton of models out there, but not all of them are readily available for people to use if they don’t have access to GPUs. We envision our API becoming a system where anyone can launch any custom model of their choice with minimal cold starts and run the model as a simple API call. Our cold starts for any 8B or 70B model are only 40s and we’ll keep improving this. Towards this goal, we already have models like `ai4bharat/hercule-hi` hosted on our API to support non-English language use cases and models like `Qwen/QwQ-32B-Preview` to support reasoning-based use cases. You can find the other models that we host here: https://ift.tt/yRJF2Cv.

We’d love for you to try out our API by following the steps here: https://ift.tt/6eD0FVJ . We provide $100 of free credit on sign up to run models, and like we said, go crazy with your requests, we’d love to see if you can break our system :)

We’re still actively building out features and optimizations and your input can help shape the future of nCompass. If you have thoughts on our platform or want us to host a specific model, let us know at hello@ncompass.tech.

Happy Hacking!

https://ift.tt/REGAHID December 16, 2024 at 05:37PM

Monday, December 16, 2024

Show HN: GitHub Stars Semantic Search - Find Your Starred Projects https://ift.tt/7JH6sdz

https://ift.tt/IvknyAK December 16, 2024 at 09:06AM

Show HN: Dbine – Auxiliary tools related to databases https://ift.tt/SojlUI5

https://ift.tt/BhAtpdD December 15, 2024 at 11:02PM

Show HN: SmartHome – An Adventure Game https://ift.tt/ONzrBij

SmartHome is a free, browser-based game written in vanilla JavaScript with no libraries. I don't want to spoil anything about the gameplay, but if you like text adventures, point-and-click adventure games, puzzle games, escape room games, art games, incremental games, cozy games, and/or RPGs, then this might be your speed.

If you find it too hard and don't mind some mild spoilers, then check out the hints page: https://smarthome.steviep.xyz/help

Enjoy!

https://smarthome.steviep.xyz December 15, 2024 at 10:35PM

Show HN: Do You Know RGB? https://ift.tt/t8kUpbO

https://ift.tt/OWhvmMT June 24, 2025 at 01:49PM