Growing India News, world news, nation news, our news, people's news, grow news, entertainment, fashion, movies, tech, automobile and many more..
Sunday, June 11, 2023
Show HN: Reddit Archiving Tool https://ift.tt/nXSLgJY
Show HN: Reddit Archiving Tool Inspired by the ongoing call-to-action by the Internet Archive team over at /r/DataHoarder [1], I've decided I want to try to preserve all cybersecurity related subreddits. [2] For people that don't know what's going on: There's a likelihood that the try to monetize the Reddit API will lead to a lot of moderators quitting the platform, and it could be that a lot of subreddits are going to be set on private and/or their threads are going to be deleted. At least that's kind of the fear from the ongoing moderator strike. In my case I learned a LOT from reddits' discussions about malware, exploits and how they work, and without those I certainly wouldn't be where I am today ... so I'm trying to preserve them. As the Archive Warrior only scrapes the HTML directly to the Web Archive, I'm trying to preserve the data itself directly as JSON files; with intent to store it later on IPFS (having been inspired a couple days ago by the-eye-team's effort to archive RARBG on IPFS). I just wanted to let people know here about the tool, and in case you want to archive your favorite subreddits, feel free to modify it. There are some limitations though, because listings (new/hot/top/search) are all limited to 1000 entries, which means that the discovery of old threads is quite limited. Keyword search increases the discovery of old threads. In my case I'm searching for a lot of keywords (like CVE, RCE, vulnerability etc) in order to discover more threads. Would love to hear feedback, currently it's just a prototypical quick n' dirty tool because the threat of my favorite subreddits going dark is quite immediate. I tried to reduce as much noise from the schema as possible, and the tool is only archiving the subreddit threads and comments, with the idea to be able to scrape the websites/blog articles at a later point in time. [1] https://ift.tt/u9wBQRg [2] https://ift.tt/nKU07Gw June 11, 2023 at 04:27AM
Subscribe to:
Post Comments (Atom)
Show HN: Pocket2Linkding – Migrate from Mozilla Pocket to Linkding https://ift.tt/IwYJfju
Show HN: Pocket2Linkding – Migrate from Mozilla Pocket to Linkding With the Mozilla Pocket shutdown coming up in about two weeks, I thought ...
-
Show HN: An AI logo generator that can also generate SVG logos Hey everyone, I've spent the past 2 weeks building an AI logo generator, ...
-
Show HN: Snap Scope – Visualize Lens Focal Length Distribution from EXIF Data https://ift.tt/yrqHZtDShow HN: Snap Scope – Visualize Lens Focal Length Distribution from EXIF Data Hey HN, I built this tool because I wanted to understand which...
-
Show HN: Federated IndieAuth Server implemented as a notebook https://ift.tt/32IC633 April 27, 2021 at 04:37PM
No comments:
Post a Comment