2025-06-27-success-software-2025-list-from-all-link-dump-post
[[TOC]]
Disclaimer: This post is LLM-generated, like everything here in AI Experiments, and was created from my description and instructions after a shallow human review.
This experiment explored how to harness an LLM agent built with OpenAI's Codex tool to process a large collection of Markdown files and generate a consolidated software catalog. The core technique was to drive the agent via a simple "to-do" list in TODO.md, instruct it through AGENTS.md, and capture results in software2025.md. Over six iterations, the agent consumed 73 unprocessed files, extracted links, and produced structured entries.
## Repository Structure
The key files in this experiment were:
- `/docs/llm-experiments/2025-06-27/TODO.md`: A checklist of files to process
- `/docs/llm-experiments/2025-06-27/AGENTS.md`: The agent's rules, responsibilities, and workflow
- `/docs/llm-experiments/2025-06-27/software2025.md`: The generated catalog of software entries
### Sample from TODO.md
```markdown
## Files to process
- [x] 2025-05-28-links-from-my-inbox.md
- [ ] 2025-06-09-links-from-my-inbox.md
```
The unchecked item marked the next file the agent should load and process.
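In code, finding the next batch of unchecked items is a one-regex job. A minimal sketch, assuming the checkbox format shown above; the function name and batch-size default are illustrative, not taken from the repo:

```python
import re

def next_unprocessed(todo_text: str, batch_size: int = 10) -> list[str]:
    """Return up to batch_size filenames whose checkbox is still unchecked."""
    # "- [ ] filename.md" is an open task; "- [x] ..." is already done.
    pattern = re.compile(r"^- \[ \] (\S+\.md)\s*$", re.MULTILINE)
    return pattern.findall(todo_text)[:batch_size]
```

Because the checklist lives in plain Markdown, the same file works for both the human reader and the agent.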
### Sample from AGENTS.md
```markdown
# LLM Agent: Software Link Extractor and Cataloger

## Responsibilities
- Parse TODO.md for unprocessed items
- Load each referenced file
- Extract all URLs
- Filter URLs by software criteria
- Query external sources for descriptions
- Append entries to software.md
- Mark TODO.md items as processed
```
This clear list of steps told the agent exactly what to do in each batch of ten files.
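The final responsibility, checking off processed items, reduces to a small text edit. A hedged sketch of the checkbox flip (`mark_processed` is a hypothetical helper, not code from the experiment; the agent performed this edit itself):

```python
def mark_processed(todo_text: str, filename: str) -> str:
    """Flip the checkbox for one file from open "[ ]" to done "[x]"."""
    # Replace only the first occurrence, so duplicate names stay untouched.
    return todo_text.replace(f"- [ ] {filename}", f"- [x] {filename}", 1)
```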
## Technique Description
The main trick was to embed agent instructions in a Markdown file so that the LLM could read its own "to-do" list and follow it. This approach has three parts:
- **Task Definition**: A simple checklist in `TODO.md` listed all files. Each unchecked box indicated work to be done.
- **Agent Instructions**: In `AGENTS.md`, the agent read the checklist, processed one batch at a time, and knew how to handle each file.
- **Output Consolidation**: The agent appended formatted entries into `software2025.md` and updated the checklist.
This method turned a static repository into a dynamic, self-driving workflow where the LLM agent could iterate without manual intervention.
## Iteration Process
Over six iterations, the agent processed all 73 files. Each iteration followed these steps:
- Read `TODO.md` and find up to ten unchecked files.
- For each file:
  - Load its content.
  - Extract all `http://` or `https://` links.
  - Filter links to include only software projects, tools, or downloads.
  - Perform web queries to gather a title, description, and usage example.
  - Generate a single Markdown list entry.
- Append new entries under the correct category in `software2025.md`.
- Mark each processed file with `[x]` in `TODO.md`.
By batching ten files, the agent maintained focus and quality while ensuring progress was tracked.
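The extraction and filtering steps above can be sketched in a few lines. The regex and the `SOFTWARE_HINTS` domain list are illustrative assumptions; the actual filtering criteria lived in AGENTS.md and relied on the agent's judgment rather than a fixed allowlist:

```python
import re

# Match http(s) URLs, stopping at whitespace and common Markdown delimiters.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")

# Assumed heuristic: domains that usually host software projects or packages.
SOFTWARE_HINTS = ("github.com", "gitlab.com", "pypi.org", "crates.io", "npmjs.com")

def extract_software_links(markdown: str) -> list[str]:
    """Pull all http(s) links from a document and keep likely software links."""
    urls = URL_RE.findall(markdown)
    return [u for u in urls if any(hint in u for hint in SOFTWARE_HINTS)]
```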
## Example Entry in software2025.md
```markdown
### 🖥️⌨️ Command-Line Applications

- [tldr pages](https://tldr.sh/) – Community-maintained cheat-sheets for over 200 Unix commands. Example: `tldr tar` shows common tar options.
```
Each entry included a link, a concise description, and a command-line example. This made the catalog immediately usable.
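Producing such an entry is a simple string template. A hypothetical sketch of the formatting step; the function name and signature are assumptions, and only the output format mirrors the catalog:

```python
def format_entry(name: str, url: str, description: str, example: str) -> str:
    """Render one catalog entry in the style used by software2025.md."""
    return f"- [{name}]({url}) – {description} Example: `{example}`"
```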
## File Coverage
The 73 files spanned from July 2021 through May 2025. Breakdown by year:
- 2021: 12 files
- 2022: 24 files
- 2023: 16 files
- 2024: 12 files
- 2025: 9 files
This wide date range demonstrated the agentโs ability to handle a large and growing archive.
## Benefits of the Technique
- **Scalability**: The to-do list mechanism scales to hundreds of files by simply updating `TODO.md`.
- **Transparency**: All instructions live in Markdown, making the workflow easy to audit.
- **Reproducibility**: Anyone can clone the repo, run the agent, and get the same results.
## Conclusion
By combining a simple Markdown to-do list with an agent description file, this experiment showed how to orchestrate an LLM to perform multi-step workflows over a large archive of Markdown files. The agent iterated, extracted links, enriched data via web search, and produced a structured catalog in software2025.md. This pattern can be extended to other tasks such as documentation generation, data extraction, or content analysis.