Growing India News: world news, national news, people's news, entertainment, fashion, movies, tech, automobile and more.
Thursday, November 13, 2025
Show HN: Cancer diagnosis makes for an interesting RL environment for LLMs https://ift.tt/legimbw
Show HN: Cancer diagnosis makes for an interesting RL environment for LLMs

Hey HN, this is David from Aluna (YC S24). We work with diagnostic labs to build datasets and evals for oncology tasks. I wanted to share a simple RL environment I built that gives frontier LLMs a set of tools to zoom and pan across a digitized pathology slide and find the relevant regions to make a diagnosis.

Here are some videos of the LLM performing diagnosis on a few slides:

https://www.youtube.com/watch?v=k7ixTWswT5c : traces of an LLM choosing different regions to view before making a diagnosis on a case of small-cell carcinoma of the lung

https://youtube.com/watch?v=0cMbqLnKkGU : traces of an LLM choosing different regions to view before making a diagnosis on a case of benign fibroadenoma of the breast

Why I built this: Pathology slides are the backbone of modern cancer diagnosis. Tissue from a biopsy is sliced, stained, and mounted on glass for a pathologist to examine for abnormalities. Today, many of these slides are digitized into whole-slide images (WSIs) in TIFF or SVS format, often several gigabytes in size. While several pathology-focused AI models exist, I was curious to test whether frontier LLMs can perform well on pathology-based tasks.

The main challenge is that WSIs are too large to fit into an LLM's context window. The standard workaround, splitting them into thousands of smaller tiles, is inefficient for large frontier LLMs. Inspired by how pathologists zoom and pan under a microscope, I built a set of tools that let an LLM control magnification and coordinates, viewing small regions at a time and deciding where to look next.
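The post does not include the tool code, but the zoom/pan interface it describes can be sketched roughly as follows. The function name, the 40x base magnification, and the 1024-pixel viewport are my assumptions for illustration, not Aluna's actual implementation:

```python
def viewport_region(center_x, center_y, magnification, base_mag=40, view_px=1024):
    """Map an LLM's (center, magnification) tool call to the level-0 pixel
    region to crop from the whole-slide image.

    At lower magnification the same fixed-size viewport covers a larger
    slide area, so the region's side length scales by base_mag / magnification.
    Returns (x, y, width, height) of the crop, clamped to the slide origin.
    """
    scale = base_mag / magnification      # e.g. 40x base at 10x requested -> 4
    side = int(view_px * scale)           # level-0 pixels covered by the view
    x = max(int(center_x - side // 2), 0) # top-left corner, clamped at 0
    y = max(int(center_y - side // 2), 0)
    return x, y, side, side
```

The returned region would then be read from the WSI (e.g. with a slide-reading library), downsampled to the viewport size, and sent to the model as an image, so the model sees a constant-size frame regardless of zoom level.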
This ended up producing some interesting behaviors, and with prompt engineering it actually seemed to yield pretty good results:

- GPT-5: explored up to ~30 regions before deciding (concurred with an expert pathologist on 4 out of 6 cancer subtyping tasks and 3 out of 5 IHC scoring tasks)
- Claude 4.5: typically used 10-15 views but reached similar accuracy to GPT-5 (concurred with the pathologist on 3 out of 6 cancer subtyping tasks and 4 out of 5 IHC scoring tasks)
- Smaller models (GPT-4o, Claude 3.5 Haiku): examined ~8 frames and were less accurate overall (1 out of 6 cancer subtyping tasks and 1 out of 5 IHC scoring tasks)

Obviously, this was a small sample set, so we are working on a larger benchmark suite with more cases and types of tasks, but I thought it was cool that this worked at all and wanted to share it with HN! November 12, 2025 at 10:31PM
Wednesday, November 12, 2025
Show HN: Vexor – A semantic grep that finds files by meaning, not by text https://ift.tt/wZu8Dag
Show HN: Vexor – A semantic grep that finds files by meaning, not by text Vexor is a small CLI that lets you search files by meaning – like grep, but semantic. https://ift.tt/sUXDJ3q November 11, 2025 at 11:03PM
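The post doesn't describe Vexor's internals, but the core idea of "grep by meaning" is ranking files by embedding similarity rather than text match. A minimal sketch of that ranking step, with a toy bag-of-words stand-in for a real embedding model (Vexor presumably uses learned embeddings; everything here is illustrative, not its actual API):

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query, files):
    """Rank files (name -> content) by similarity to the query's meaning."""
    q = embed(query)
    return sorted(files, key=lambda f: cosine(q, embed(files[f])), reverse=True)
```

With real embeddings, "user authentication" would match a login module even with zero shared tokens; the toy vectors above only capture the shape of the pipeline: embed once per file, embed the query, sort by cosine.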
Tuesday, November 11, 2025
Show HN: Tracking AI Code with Git AI https://ift.tt/4wE9m2c
Show HN: Tracking AI Code with Git AI

Git AI is a side project I created to track AI-generated code in our repos from development, through PRs, and into production. It doesn't just count lines; it keeps track of them as your code evolves, gets refactored, and the git history gets rewritten. Think 'git blame', but for AI code.

There's a lot about how it works in the post, but I wanted to share how it's been impacting me and my team:

- I find I review AI code very differently than human code. Being able to see the prompts my colleagues used, what the AI wrote, and where they stepped in to override has been extraordinarily helpful. This is still very manual today, but I hope to build more UI around it soon.

- "Why is this here?" — more than once I've given my coding agent access to the past prompts that generated the code I'm looking at, which lets the agent know what my colleague was thinking when they made the change. Engineers talk to AI all day now... their prompts are sort of like a log of thoughts :)

- I pay a lot of attention to the lines-generated-per-line-accepted ratio. If it gets up over 4 or 5, it means I'm well outside the AI's distribution or prompting poorly — either way, it's good cause for reflection, and I've learned a lot about collaborating with LLMs.

This has been really fun to build, especially because some amazing contributors who were working on similar projects came together and directed their efforts toward making Git AI shine. We hope you like it. https://ift.tt/knIqNts November 10, 2025 at 10:56PM
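The generated-per-accepted ratio mentioned above is simple to compute once generation events are tracked; a minimal sketch (the event shape is my assumption, not Git AI's actual schema):

```python
def lines_per_accepted(events):
    """Ratio of AI-generated lines to lines that survived into the commit.

    Each event is assumed to record how many lines the AI produced and how
    many the engineer kept. A high ratio suggests the task is outside the
    model's distribution or the prompting needs work.
    """
    generated = sum(e["generated"] for e in events)
    accepted = sum(e["accepted"] for e in events)
    return generated / accepted if accepted else float("inf")
```

Per the post, a session-level value creeping above 4-5 is the signal worth pausing on.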
Show HN: Tiny Diffusion – A character-level text diffusion model from scratch https://ift.tt/jPCtJUI
Show HN: Tiny Diffusion – A character-level text diffusion model from scratch https://ift.tt/mjGTCQa November 10, 2025 at 08:43PM
Monday, November 10, 2025
Show HN: Trilogy Studio, open-source browser-based SQL editor and visualizer https://ift.tt/32WVvxZ
Show HN: Trilogy Studio, open-source browser-based SQL editor and visualizer

SQL-first analytic IDE, similar to Redash/Metabase. It aims to solve reuse/composability at the code layer with a modified syntax, Trilogy, that includes a semantic layer directly in the SQL-like language. Status: experiment; feedback and contributions welcome!

Built to solve 3 problems I have with SQL as my primary iterative analysis language:

1. Adjusting queries/analysis takes a lot of boilerplate. Solved with queries that operate on the semantic layer, not tables; this also eliminates the need for CTEs.
2. Sources of truth change all the time, and I hate updating reports to reference new tables. Also solved by the semantic layer, since data bindings can be updated without changing dashboards or queries.
3. Getting from SQL to visuals is too much work in many tools; make it as streamlined as possible. Surprise: solved with the semantic layer; add more expressive typing to get better defaults; also use it to wire up automatic drilldowns/cross-filtering.

Supports: BigQuery, DuckDB, Snowflake.

Links: [1] https://ift.tt/5eBqOQ4 (language info) Git links: [Frontend] https://ift.tt/1XMPJRB [Language] https://ift.tt/79Hci5y Previously: https://ift.tt/0bf29XS (significant UX/feature reworks since) https://ift.tt/LtA4uBZ https://ift.tt/3if9weF November 10, 2025 at 04:56AM
Show HN: Alignmenter – Measure brand voice and consistency across model versions https://ift.tt/HDAsFNR
Show HN: Alignmenter – Measure brand voice and consistency across model versions

I built a framework for measuring persona alignment in conversational AI systems.

*Problem:* When you ship an AI copilot, you need it to maintain a consistent brand voice across model versions. But "sounds right" is subjective. How do you make it measurable?

*Approach:* Alignmenter scores three dimensions:

1. *Authenticity*: style similarity (embeddings) + trait patterns (logistic regression) + lexicon compliance + optional LLM judge
2. *Safety*: keyword rules + offline classifier (DistilRoBERTa) + optional LLM judge
3. *Stability*: cosine variance across response distributions

The interesting part is calibration: you can train persona-specific models on labeled data. Grid search over component weights, estimate normalization bounds, and optimize for ROC-AUC.

*Validation:* We published a full case study using Wendy's Twitter voice:

- Dataset: 235 turns, 64 on-brand / 72 off-brand (balanced)
- Baseline (uncalibrated): 0.733 ROC-AUC
- Calibrated: 1.0 ROC-AUC, 1.0 F1
- Learned: style > traits > lexicon (0.5/0.4/0.1 weights)

Full methodology: https://ift.tt/ewYQsnC There's a full walkthrough so you can reproduce the results yourself.

*Practical use:*

pip install alignmenter[safety]
alignmenter run --model openai:gpt-4o --dataset my_data.jsonl

It's Apache 2.0, works offline, and designed for CI/CD integration. GitHub: https://ift.tt/3iSEyqa Interested in feedback on the calibration methodology and whether this problem resonates with others. https://ift.tt/jsBq1yw November 10, 2025 at 05:23AM
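The stability dimension described above — variance of cosine similarity across response embeddings — can be sketched in a few lines. This is an illustration of the idea under my own assumptions (plain pairwise variance, no normalization), not Alignmenter's actual implementation or API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def stability_score(embeddings):
    """Variance of pairwise cosine similarities across response embeddings.

    Lower variance means the model's responses cluster around one voice;
    higher variance means the persona drifts between responses.
    """
    sims = [cosine(a, b) for i, a in enumerate(embeddings)
            for b in embeddings[i + 1:]]
    mean = sum(sims) / len(sims)
    return sum((s - mean) ** 2 for s in sims) / len(sims)
```

A set of identical responses scores exactly 0; mixing in off-voice responses pushes the variance up, which is the signal a regression suite would watch across model versions.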
Show HN: I'm a pastor/dev and built a 200M token generative Bible ($0.67/report) https://ift.tt/KdfvoBx
Show HN: I'm a pastor/dev and built a 200M token generative Bible ($0.67/report) https://ift.tt/UT3veDA November 10, 2025 at 01:41AM