Thursday, December 19, 2024

Show HN: Musoq – Query Anything with SQL Syntax (Git, C#, CSV, CAN DBC) https://ift.tt/LAGVZIW

Hey,

For those of you who don't know my little tool Musoq: it lets you query things with SQL-like syntax, without any database. You can query niche sources like CAN DBC files, weird ones like C# code, interesting ones like Git repositories, and regular stuff like CSV, TSV and various others. I'm experimenting quite a bit, so I'm hybridizing the engine with LLMs or doing other weird stuff that is more or less practical :-) I also wanted to share some recent developments in this little project, as I hope they might interest some of you.

New Experimental Plugins:
* Git Plugin (Beta): I've been working on Git repository querying. I managed to test it on the EF Core repo (16k commits) and it seems to work okay.
* Roslyn Plugin (Beta): Added basic C# code analysis capabilities.

For the very first time: I've extended CROSS APPLY to use computed results as arguments! Now the operator can use values from the current row as inputs. Here's an example:

    SELECT f.DirectoryName, f.FileName
    FROM #os.directories('/some/path', false) d
    CROSS APPLY #os.files(d.FullName, true) f
    WHERE d.Name IN ('Folder1', 'Folder2')

After another pack of fixes I'm finally able to query multiple git repositories AT ONCE:
    with ProjectsToAnalyze as (
        select dir2.FullName as FullName
        from #os.directories('D:\repos', false) dir1
        cross apply #os.directories(dir1.FullName, false) dir2
        where dir2.Name = '.git'
    )
    select c.Message, c.Author, c.CommittedWhen
    from ProjectsToAnalyze p
    cross apply #git.repository(p.FullName) r
    cross apply r.Commits c
    where c.AuthorEmail = 'my-email@email.ok'
    order by c.CommittedWhen desc

Under the Hood:
- Added a Buckets feature for memory management (currently only being tested with the Roslyn plugin)
- Moved to .NET 8
- Added CROSS/OUTER APPLY operators
- Made some improvements to error messages and runtime behavior

New piping features. I've been experimenting with piping capabilities:
* Image Analysis with LLMs: ./Musoq.exe image encode "image.jpg" | ./Musoq.exe run query "select s.Shop, s.ProductName, s.Price from ..."
* Text Data Extraction: Get-Content "ticket.txt" | ./Musoq.exe run query "select t.TicketNumber, t.CustomerName ... from #stdin.text('Ollama', 'llama3.1') t"
* Data Source Combination: { docker image ls; ./Musoq.exe separator; docker container ls } | ./Musoq.exe run query "..."

I'm working on comprehensive documentation. I especially encourage you to look at the sections "Practical Examples and Applications" and "Data Sources", where you can see all the tables the tool currently provides.
<https://puchaczov.github.io/Musoq/>

Other Changes:
- Made some improvements to OS and Archive data sources (OS can now query metadata like EXIF)
- Added a few fields to the CAN DBC plugin
- Command outputs can now be used as inputs for queries

I'm hoping to:
- Improve stability and add more tests
- Flesh out the documentation
- Work on package distribution (Scoop, Ubuntu packages)
- Share some examples of source code querying with Roslyn

Ideas for later:
- Robust WHERE analysis and optimizations
- DISTINCT operator implementation
- PROTOBUF schema support
- Performance improvements
- Query parallelization
- Recursive CTEs
- Subqueries

I'd really appreciate any thoughts or feedback! Here is the documentation section where I write a short analysis of EF Core with the git plugin: <https://puchaczov.github.io/Musoq/practical-examples-and-app...>

https://ift.tt/bLdIGv2 December 19, 2024 at 12:32AM
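The Musoq post says CROSS APPLY can now pass values from the current left row into the right-hand source, and it mentions that OUTER APPLY was added too. The sketch below models that row-by-row behavior in plain Python. It is not Musoq's engine. The `DIRECTORIES` and `FILES_BY_DIR` data and the `files` helper are hypothetical in-memory stand-ins for `#os.directories` and `#os.files`, and the second argument of `#os.files` is ignored.

```python
from typing import Callable, Iterable

Row = dict


def cross_apply(left: Iterable[Row], fn: Callable[[Row], Iterable[Row]]) -> list:
    """CROSS APPLY: call the table-valued fn once per left row, passing that row in.
    Left rows for which fn yields nothing are dropped (inner-join semantics)."""
    out = []
    for l_row in left:
        for r_row in fn(l_row):
            out.append((l_row, r_row))
    return out


def outer_apply(left: Iterable[Row], fn: Callable[[Row], Iterable[Row]]) -> list:
    """OUTER APPLY: like cross_apply, but a left row with no results is kept,
    paired with None (the equivalent of NULL right-hand columns)."""
    out = []
    for l_row in left:
        produced = list(fn(l_row))
        if not produced:
            out.append((l_row, None))
        out.extend((l_row, r_row) for r_row in produced)
    return out


# Hypothetical stand-ins for #os.directories('/some/path', false) and #os.files(...).
DIRECTORIES = [
    {"Name": "Folder1", "FullName": "/some/path/Folder1"},
    {"Name": "Folder2", "FullName": "/some/path/Folder2"},
    {"Name": "Other", "FullName": "/some/path/Other"},
]
FILES_BY_DIR = {
    "/some/path/Folder1": ["a.txt", "b.txt"],
    "/some/path/Folder2": [],
    "/some/path/Other": ["c.txt"],
}


def files(full_name: str) -> list:
    """Hypothetical stand-in for #os.files(d.FullName, true); the flag is ignored."""
    return [{"DirectoryName": full_name, "FileName": n} for n in FILES_BY_DIR.get(full_name, [])]


wanted = {"Folder1", "Folder2"}  # WHERE d.Name IN ('Folder1', 'Folder2')

# The right-hand side receives d["FullName"] from the *current* left row.
result = [
    (f["DirectoryName"], f["FileName"])
    for d, f in cross_apply(DIRECTORIES, lambda d: files(d["FullName"]))
    if d["Name"] in wanted
]
outer_result = [
    (d["Name"], f["FileName"] if f is not None else None)
    for d, f in outer_apply(DIRECTORIES, lambda d: files(d["FullName"]))
    if d["Name"] in wanted
]
print(result)
print(outer_result)
```

The difference between the two operators shows up with `Folder2`, which has no files. CROSS APPLY drops it. OUTER APPLY keeps it with an empty right side. The multi-repository query above chains the same pattern: each `.git` directory row feeds `#git.repository`, and each repository row feeds its `Commits`.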

Show HN: Bodo – high-performance compute engine for Python data processing https://ift.tt/69tya5J

Hello HN,

I'm excited to share Bodo, an open-source compute engine designed for large-scale data processing in native Python. Bodo is powered by an auto-parallelizing JIT compiler and an HPC backend, which lets it generate highly optimized, parallel binaries (MPI) for Pandas and NumPy code, all without requiring any code rewrites. Our latest benchmark shows a 20x to 240x speedup over traditional distributed computing frameworks like Spark, Ray, and Dask (code and details in the repo).

The inspiration for Bodo came from my background in HPC, when I saw how extremely slow and hard to use Spark was (it has gotten better over the years, but it's still not great). Of course, a compiler has its own limitations (e.g. not all Python is compilable), but I think it's leaps and bounds better.

Let me know what you think.

https://ift.tt/mYFluZL December 18, 2024 at 11:10PM
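To make "parallel binaries (MPI) for Pandas and NumPy code" a bit more concrete, here is a plain-Python sketch of the SPMD pattern that MPI-based compilers generally target. Each rank owns a contiguous block of the data and computes a local partial result. The partial results are then combined with a reduction, which is what `MPI_Allreduce` does across processes. The ranks are simulated sequentially, and the function names are hypothetical. This illustrates the general technique. It is not Bodo's generated code or its API.

```python
def block_bounds(n: int, nranks: int, rank: int) -> tuple:
    """Contiguous 1-D block distribution of n elements over nranks ranks.
    The first n % nranks ranks get one extra element each."""
    base, extra = divmod(n, nranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_partial(values: list, nranks: int, rank: int) -> tuple:
    """What a single rank computes for a mean: (sum, count) over its own block only."""
    start, stop = block_bounds(len(values), nranks, rank)
    chunk = values[start:stop]
    return sum(chunk), len(chunk)


def allreduce_sum(partials: list) -> tuple:
    """Sequential stand-in for an MPI sum-allreduce over the (sum, count) pairs."""
    total, count = 0, 0
    for s, c in partials:
        total += s
        count += c
    return total, count


def parallel_mean(values: list, nranks: int) -> float:
    """Roughly how something like df["x"].mean() can be split: local partials, then a reduction."""
    partials = [local_partial(values, nranks, r) for r in range(nranks)]
    total, count = allreduce_sum(partials)
    return total / count


print([block_bounds(10, 4, r) for r in range(4)])
print(parallel_mean(list(range(10)), 4))
```

The mean is reduced as a (sum, count) pair rather than by averaging per-rank means. That keeps the result correct when blocks have different sizes.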

Show HN: I spent 4 years bootstrapping a financial planning tool to 30k MAUs https://ift.tt/PvZxK3k

Hey everyone! I'm back with an update on this post [0].

Last year, I quit my corporate job and went full-time on ProjectionLab, the long-term financial planning app I've been building for the past 4 years, which some of you may recognize. Going all-in felt like a huge leap, but it was the right call, and it's been a good year. Without the HN community, it would not have happened.

As I mentioned last time [0], the feedback on my original Show HN is THE reason I'm still here working on this, and I'm really grateful for that. I hope the way I've grown PL (staying bootstrapped and focused on users) resonates with the early supporters who helped shape it.

For now I'm still the only engineer, burning the candle at both ends, but luckily I'm not feeling burnt out myself!

It's been a fun and memorable year:
- 6,139 commits, 221,484 insertions, 116,255 deletions
- Shared my story on the ChooseFI podcast [1] (one of the original sources of inspiration for this project)
- Started building a team (2 team members for customer success, 1 leading growth & marketing)
- Doubled our customer base
- Took no external funding, keeping our interests as aligned with users as possible

Okay, but what did I actually do since last time? [2] Here's a quick cross-section:
- Compare mode upgrades to explore what-if scenarios overlaid on the same chart with visual deltas/diffs
- Launched ProjectionLab for Employers [3]: offer PL as a benefit, or get your employer to pick up the tab
- Major tech stack migrations: Vue 2 -> Vue 3, Vue CLI -> Vite, Vuetify 2 -> Vuetify 3, Vuex -> Pinia, Jest -> Vitest, Firebase Namespaced API -> Modular API, Vike + SSG for the marketing site
- Advanced visualization features (1-click-plot any metric, interactive event icons in charts, etc.)
- Improved tax estimation & tax analytics
- Simultaneous editing on multiple devices
- MFA support
- Rebuilt the help center, added more educational content and YouTube tutorial videos
- Made it possible to book a 1-on-1 session for educational/training purposes
- Converted ~65% of the codebase from JavaScript to TypeScript
- And more! [2]

I never saw myself as an entrepreneur/founder type, but apparently I've now spent 4 years turning a side project into a real business. I couldn't have done it without the initial support from this community, and I'd love to hear what you think of the updates and where you'd like to see things go from here.

--Kyle

[0] https://ift.tt/BSpfWhr
[1] https://ift.tt/D8GupEw...
[2] https://ift.tt/lqLhoQH
[3] https://ift.tt/YSjzrtR

https://ift.tt/Fs9WkL2 December 18, 2024 at 08:27PM
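The compare-mode item describes what-if scenarios overlaid on one chart with visual deltas. Here is a minimal sketch of the arithmetic behind such a delta series. It assumes a deliberately simplified model: a single balance growing at a fixed annual rate plus a fixed yearly contribution. `project` and `deltas` are hypothetical helpers, not ProjectionLab's planning engine.

```python
def project(start: float, contribution: float, rate: float, years: int) -> list:
    """Year-end balances: grow the balance by `rate`, then add the yearly contribution."""
    balances = []
    balance = start
    for _ in range(years):
        balance = balance * (1 + rate) + contribution
        balances.append(balance)
    return balances


def deltas(base: list, what_if: list) -> list:
    """Per-year difference (what-if minus base), rounded to cents for display."""
    return [round(w - b, 2) for b, w in zip(base, what_if)]


base = project(10_000, 5_000, 0.05, 3)
what_if = project(10_000, 7_000, 0.05, 3)  # same plan, bigger yearly contribution
print(deltas(base, what_if))
```

Both scenarios come from the same function and differ in one input. That is what makes the per-year delta meaningful to plot on a shared axis.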

Wednesday, December 18, 2024

Show HN: Adventures in OCR https://ift.tt/6M8P5KF

Hello HN!

In a recent "Ask HN: What are you working on?" thread, I mentioned I was working on OCRing a large book: https://ift.tt/rhIHo8Q The post generated some interest, so I thought I would keep HN posted.

The book is Saint-Simon's Memoirs, an invaluable historical account of the French court under Louis XIV, full of wit and sharp observations, and of incredible literary value. I'm OCRing the reference edition, produced between 1879 and 1930, which contains a lot of commentary and footnotes: 45 volumes, ~27,000 pages.

Here's a link to a blog post that describes the techniques used so far (the project is still ongoing): https://ift.tt/WdGS9gN

But you can also directly access the result here: https://ift.tt/et6FZuP

This web app (not optimized for mobile, sorry) solves a tricky problem: preloading images efficiently. In short, preloading the next image isn't enough, since browsers will repaint if an image is moved or scaled. Browsers also won't paint at all if visibility is hidden or opacity is zero, and will paint only when those values change. On an average, slow machine, this takes visible time. But if an image simply sits behind another element, it will be painted, and removing the covering element or changing the z-index will not trigger a repaint. (Preloading is important because it lets one review results fast; having to wait 150-200 ms between images is simply discouraging.)

Would love to hear feedback; happy to answer any questions!

https://ift.tt/WdGS9gN December 17, 2024 at 10:30PM
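The OCR post's preloading trick keeps upcoming pages fully visible and at their final size, stacked *behind* the current page, so the browser paints them ahead of time. The sketch below models only the bookkeeping for that stack: which pages occupy layers and which z-index each gets. The function names and the `lookahead` parameter are hypothetical. In the real app these would be z-index values on stacked `<img>` elements, and the post does not show its actual code.

```python
def stack_layers(current: int, total: int, lookahead: int = 2) -> list:
    """Return (page_index, z_index) pairs for the image stack.

    The current page gets the highest z-index. The next `lookahead` pages sit
    directly beneath it, unscaled and fully opaque, so they are painted but covered.
    """
    pages = list(range(current, min(current + lookahead + 1, total)))
    top = len(pages)
    return [(page, top - i) for i, page in enumerate(pages)]


def pages_to_insert(old_current: int, new_current: int, total: int, lookahead: int = 2) -> list:
    """Pages that must be newly added to the stack after navigating.

    Pages already in the stack were painted earlier; bringing one to the top only
    removes the element covering it, which does not trigger a repaint.
    """
    old = {page for page, _ in stack_layers(old_current, total, lookahead)}
    return [page for page, _ in stack_layers(new_current, total, lookahead) if page not in old]


print(stack_layers(5, 100))
print(pages_to_insert(5, 6, 100))
```

Advancing one page therefore costs one new image at the bottom of the stack. The page the reader sees next was painted while they were still looking at the previous one.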

Show HN: I built an open-source data pipeline tool in Go https://ift.tt/VIvb7yM

Every data pipeline job I had to tackle required quite a few components to set up:
- One tool to ingest data
- Another one to transform it
- If you wanted to run Python, an orchestrator
- If you needed to check the data, a data quality tool

Besides being hard to set up and time-consuming, this is also pretty high-maintenance. I had to do a lot of infra work, and while these were billable hours for me, I didn't enjoy the work at all. For some parts there were nice solutions like dbt, but for an end-to-end workflow it didn't work.

That's why I decided to build an end-to-end solution that could take care of data ingestion, transformation, and Python stuff. Initially it was just for our own usage, but in the end we thought it could be a useful tool for everyone.

At its core, Bruin is a data framework that consists of a CLI application written in Golang and a VS Code extension that supports it with a local UI. Bruin supports quite a few things:
- Data ingestion using ingestr ( https://ift.tt/6anTcAZ )
- Data transformation in SQL & Python, similar to dbt
- Python env management using uv
- Built-in data quality checks
- Secrets management
- Query validation & SQL parsing
- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc.

This means you can write end-to-end pipelines within the same framework and get them running with a single command. You can run it on your own computer, on GitHub Actions, or on an EC2 instance somewhere. Using the templates, you can also have ready-to-go pipelines with modeled data for your data warehouse in seconds.

The open-source VS Code extension lets you work with data pipelines locally in a more visual way. The resulting changes are all in code, so everything is version-controlled regardless; the extension just adds a nice layer on top.
Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python, we use the awesome (and it really is awesome!) uv under the hood to install dependencies in an isolated environment and to install and manage Python versions locally, all in a cross-platform way. To manage data uploads to the data warehouse, Bruin uses dlt under the hood. It also uses Arrow's memory-mapped files to easily share the data between processes before uploading it to the destination.

We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there's also that.

We had a small pool of beta testers for quite some time, and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is not common to build data tooling in Go, but I believe we found ourselves in a nice spot in terms of features, speed, and stability.

https://ift.tt/FNXuYVM

I'd love to hear your feedback and learn more about how we can make data pipelines easier and better to work with. Looking forward to your thoughts!

Best,
Burak

https://ift.tt/FNXuYVM December 17, 2024 at 10:10PM
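The Bruin post describes ingestion, transformation, and quality checks living in one pipeline that runs with a single command. Here is a minimal sketch of that idea using Python's standard `graphlib`. Assets declare their dependencies, run in topological order, and are validated by column checks before downstream assets see their output. The asset layout, the check names (`not_null`, `unique`), and `run_pipeline` are hypothetical and illustrative only. They are not Bruin's asset format or CLI.

```python
from graphlib import TopologicalSorter


def not_null(rows: list, column: str) -> bool:
    return all(row.get(column) is not None for row in rows)


def unique(rows: list, column: str) -> bool:
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


CHECKS = {"not_null": not_null, "unique": unique}


def run_pipeline(assets: dict) -> dict:
    """assets: name -> {"deps": [...], "run": fn(inputs) -> rows, "checks": [(check, column)]}.
    Runs assets in dependency order and fails fast on the first broken check."""
    graph = {name: set(spec["deps"]) for name, spec in assets.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        spec = assets[name]
        inputs = {dep: results[dep] for dep in spec["deps"]}
        rows = spec["run"](inputs)
        for check, column in spec.get("checks", []):
            if not CHECKS[check](rows, column):
                raise ValueError(f"{name}: {check}({column}) failed")
        results[name] = rows
    return results


def daily_totals(inputs: dict) -> list:
    totals = {}
    for row in inputs["orders_clean"]:
        totals[row["day"]] = totals.get(row["day"], 0) + row["amount"]
    return [{"day": day, "total": t} for day, t in sorted(totals.items())]


assets = {
    # "Ingestion" stub: in a real pipeline this would pull from a source system.
    "raw_orders": {"deps": [], "run": lambda _: [
        {"id": 1, "amount": 10, "day": "mon"},
        {"id": 2, "amount": None, "day": "mon"},
        {"id": 3, "amount": 5, "day": "tue"},
    ]},
    "orders_clean": {
        "deps": ["raw_orders"],
        "run": lambda inp: [r for r in inp["raw_orders"] if r["amount"] is not None],
        "checks": [("not_null", "amount"), ("unique", "id")],
    },
    "daily_totals": {"deps": ["orders_clean"], "run": daily_totals, "checks": [("unique", "day")]},
}

print(run_pipeline(assets)["daily_totals"])
```

Running checks right after each asset, rather than at the end, stops a bad intermediate table from flowing into everything downstream of it.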

Tuesday, December 17, 2024

Show HN: I made a multiplayer crossword game https://ift.tt/fHvr8s9

Hey HN,

I've been working on this multiplayer crossword for a while now. There's still so much more on my to-do list for it, but I think it's time to launch and get some feedback on what I have.

Every hour, a new crossword (13×13 or 15×15) is generated at https://ift.tt/w3gRHUL If you prefer smaller/faster, every ten minutes a new mini crossword (7×7 or 11×11) is generated at https://ift.tt/GmHKxo0

You're playing each crossword at the same time as everyone else, racing to complete it first. You can't see what you've got right until you correctly complete the entire grid. But you get some fun feedback on what other players are doing: a cell turns green if one other player has correctly solved it, orange if two other players have, and red if three or more other players have entered the correct letter in that cell.

Chat is emoji-only for the first half of the game (i.e. 30 minutes for the front-page crossword, 5 minutes for the mini). After that, it unlocks and you can chat freely. If you're not done when the next crossword is generated, you can just stay on the current page for as long as you like and keep working to solve it.

I didn't manage to get user accounts done before my arbitrarily imposed launch date, so everyone is anonymous for now. I definitely want to build accounts, streaks, trophies, etc. The other big thing I'm excited to build is a "team mode": you'd play on a team, chat freely with your teammates, and the cell colors would indicate what the other team has (collaboratively) got correct. I think that would be a lot of fun.

Thanks for reading, checking it out, and for any feedback. Feel free to ask me anything, of course.

https://ift.tt/aVn3N7w December 17, 2024 at 12:35AM
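The crossword post spells out two concrete rules. Cell color depends on how many *other* players have the correct letter: one is green, two is orange, three or more is red. Chat is emoji-only for the first half of each puzzle's period. A minimal sketch of both rules follows. It assumes grids are lists of strings with `.` for empty cells. The helper names are hypothetical, and this is not the site's code.

```python
from typing import Optional


def other_correct_counts(solution: list, others: list) -> list:
    """For each cell, count how many other players' grids hold the correct letter."""
    return [
        [sum(1 for grid in others if grid[r][c] == solution[r][c]) for c in range(len(solution[r]))]
        for r in range(len(solution))
    ]


def cell_color(other_correct: int) -> Optional[str]:
    """1 other player -> green, 2 -> orange, 3 or more -> red, nobody -> no color."""
    if other_correct >= 3:
        return "red"
    if other_correct == 2:
        return "orange"
    if other_correct == 1:
        return "green"
    return None


def chat_mode(elapsed_min: float, period_min: float) -> str:
    """Emoji-only during the first half of the period (30 of 60 min, or 5 of 10 for the mini)."""
    return "emoji-only" if elapsed_min < period_min / 2 else "free"


solution = ["AB", "CD"]
others = [["A.", ".."], ["AB", ".."], ["AX", "C."]]
counts = other_correct_counts(solution, others)
colors = [[cell_color(n) for n in row] for row in counts]
print(counts, colors)
print(chat_mode(29, 60), chat_mode(30, 60), chat_mode(4, 10))
```

Only counts of *correct* letters feed the color. That matches the post's point that players get social feedback without learning which of their own letters are right.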

Show HN: NCompass Technologies – yet another AI Inference API, but hear us out https://ift.tt/dhvFyUH

Hello HackerNews!

I'm excited to share what we've been working on at nCompass Technologies: an AI inference platform that gives you a scalable and reliable API to access any open-source AI model, with no rate limits. We don't have rate limits because optimizations we made to our AI model serving software let us support a high number of concurrent requests without degrading quality of service for you as a user.

If you're thinking "well, aren't there a bunch of these already?", so were we when we started nCompass. When using other APIs, we found that they weren't reliable enough to use open source models in production environments. To resolve this, we're building an AI inference engine that enables you, as an end user, to reliably use open source models in production.

Underlying this API, we're building optimizations at the hosting, scheduling and kernel levels with a single goal: minimizing the number of GPUs needed to serve a given number of concurrent requests, without degrading quality of service.

We're still building a lot of our optimizations, but we've released what we have so far via our API. Compared to vLLM, we currently keep time-to-first-token (TTFT) 2-4x lower at the equivalent concurrent request rate. You can check out a demo of our API here: https://ift.tt/CJv6SVG

As a result of the optimizations we've rolled out so far, we're releasing a few unique features on our API:

1. Rate limits: we don't have any
Most other APIs out there have strict rate limits and can be rather unreliable. We don't want APIs for open source models to remain a solution for prototypes only. We want people to use these APIs like they do OpenAI's or Anthropic's and actually build production-grade products on top of open source models.

2. Underserved models: we have them
There are a ton of models out there, but not all of them are readily available for people to use if they don't have access to GPUs. We envision our API becoming a system where anyone can launch any custom model of their choice with minimal cold starts and run the model as a simple API call. Our cold starts for any 8B or 70B model are only 40s, and we'll keep improving this. Towards this goal, we already host models like `ai4bharat/hercule-hi` to support non-English language use cases and models like `Qwen/QwQ-32B-Preview` to support reasoning-based use cases. You can find the other models that we host here: https://ift.tt/yRJF2Cv.

We'd love for you to try out our API by following the steps here: https://ift.tt/6eD0FVJ . We provide $100 of free credit on sign-up to run models, and like we said, go crazy with your requests; we'd love to see if you can break our system :)

We're still actively building out features and optimizations, and your input can help shape the future of nCompass. If you have thoughts on our platform or want us to host a specific model, let us know at hello@ncompass.tech.

Happy Hacking!

https://ift.tt/REGAHID December 16, 2024 at 05:37PM
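The nCompass post compares providers on time-to-first-token (TTFT) at a given concurrency. Here is a minimal, client-agnostic sketch of how TTFT and total latency can be measured for a single streamed response. It assumes only that the response arrives as an iterable of text chunks. The post does not describe its client interface, so nothing here is nCompass's API, and `measure_ttft` is a hypothetical helper. The injectable `clock` makes the function testable without real network calls.

```python
import time
from typing import Callable, Iterable


def measure_ttft(stream: Iterable[str], clock: Callable[[], float] = time.perf_counter) -> tuple:
    """Return (ttft_seconds, total_seconds, chunks) for one streamed response.

    Pass a lazy stream (e.g. a generator) so the request starts when iteration does;
    otherwise the time already spent before this call is not counted.
    """
    start = clock()
    it = iter(stream)
    first = next(it, None)  # blocks until the first chunk arrives
    ttft = clock() - start if first is not None else None
    chunks = [first] if first is not None else []
    chunks.extend(it)  # drain the rest of the stream
    total = clock() - start
    return ttft, total, chunks


# Fake clock readings: start, first chunk, end of stream.
readings = iter([0.0, 0.25, 1.0])
ttft, total, chunks = measure_ttft(["Hel", "lo"], clock=lambda: next(readings))
print(ttft, total, chunks)
```

To reproduce a comparison like "2-4x lower TTFT at the same concurrency", you would run many such measurements in parallel against each endpoint and compare percentiles (e.g. p50/p95) rather than single samples.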

Show HN: Free OSS transcription app I made and found it's faster than wispr flow https://ift.tt/jXQh9Tk

The title doesn't leave room for nuance; ofc it's not the app ...