LLM Agent: Software Link Extractor and Cataloger
Overview
This agent parses TODO.md for unprocessed items, reads each referenced file, extracts software-related links, gathers metadata via web search, and updates software.md with categorized entries.
Responsibilities
- Parse TODO.md and identify "- [ ] filename" lines
- Load each referenced markdown file
- Extract all URLs from file content
- Filter URLs by software criteria
- For each filtered URL, perform a web search to collect a description and usage examples
- Generate TLDR entry for each software item
- Categorize entries under precise headings
- Append new entries to software.md without duplications
Batches
Process the files listed in TODO.md in batches of 10. Focus on the content of each file and prioritize output quality over speed.
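A minimal sketch of the batching step, assuming the unprocessed filenames have already been collected into a list. The function name `batches` is hypothetical; only the batch size of 10 comes from this document.

```python
from typing import Iterator


def batches(filenames: list[str], size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` filenames, in order."""
    for start in range(0, len(filenames), size):
        yield filenames[start:start + size]
```

The last batch may be shorter than 10 when the number of files is not a multiple of the batch size.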
software.md
Located in repository root.
Workflow
- Read TODO.md from repository root
- Identify lines starting with "- [ ] ", indicating unprocessed items
- For each identified filename:
  - Load file content from repository
  - Extract all URL patterns (http:// or https://)
  - Apply software filter rules
  - For each retained URL:
    - Query external sources for title, description, usage examples
    - Create markdown entry with link, TLDR, examples
    - Assign to appropriate category based on functionality
  - Append all new entries under each category in software.md
  - Mark original TODO.md item as processed (change "[ ]" to "[x]")
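The first workflow steps (finding unprocessed items and extracting URLs) can be sketched as follows. This is an illustration under assumptions, not a required implementation: the function names and both regular expressions are hypothetical, and the URL pattern deliberately stops at whitespace, brackets, parentheses, and quotes so that markdown links such as [Name](URL) yield a clean URL.

```python
import re

# Matches an unchecked task line such as "- [ ] notes/file.md".
UNCHECKED = re.compile(r"^- \[ \] (?P<name>.+?)\s*$")
# Assumed URL pattern: http(s) followed by anything except delimiters.
URL = re.compile(r"https?://[^\s<>()\[\]\"']+")


def unprocessed_files(todo_text: str) -> list[str]:
    """Return filenames from lines that start with '- [ ] '."""
    names = []
    for line in todo_text.splitlines():
        match = UNCHECKED.match(line)
        if match:
            names.append(match.group("name"))
    return names


def extract_urls(markdown_text: str) -> list[str]:
    """Return unique URLs in first-seen order, without trailing punctuation."""
    seen: dict[str, None] = {}
    for match in URL.finditer(markdown_text):
        url = match.group(0).rstrip(".,;:!?")
        seen.setdefault(url, None)
    return list(seen)
```

One known limitation of this sketch: URLs that legitimately contain parentheses are truncated at the first parenthesis.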
Software Filter Rules
- Include URLs pointing to online services, tools, GitHub projects, downloadable software, or games
- Exclude tutorial pages, programming language how-to guides, and blog posts without direct software links
- Include modules or plugins that can be installed and run
- Include video games and command-line utilities
- Reject purely educational or conceptual content
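A cheap pre-filter can apply part of these rules before any web search. The sketch below is a heuristic only: the host list, the path hints, and the function name `classify_url` are assumptions chosen for illustration, not part of the rules themselves. Anything it cannot decide is returned as "review" so the agent can judge it from the page content.

```python
from urllib.parse import urlparse

# Assumed examples of hosts that usually serve installable software.
SOFTWARE_HOSTS = {"github.com", "gitlab.com", "pypi.org"}
# Assumed path fragments that usually indicate educational content.
EXCLUDE_HINTS = ("tutorial", "how-to", "howto", "/blog/")


def classify_url(url: str) -> str:
    """Return 'include', 'exclude', or 'review' for a single URL."""
    parts = urlparse(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.lower()
    if any(hint in path for hint in EXCLUDE_HINTS):
        return "exclude"
    if host in SOFTWARE_HOSTS:
        return "include"
    return "review"
```

Because the exclude check runs first, a repository whose path contains "tutorial" is excluded even on a software host. A blog post that does link directly to software would also be excluded here, so a stricter agent may prefer to send such URLs to "review" instead.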
Entry Format for software.md
- Top-level heading per category (for example, "## Command-Line Tools")
- Under each heading, single-level list entries:
  [Name](URL) - Short description. Usage example.
- Ensure each entry is self-contained and concise
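As an illustration of the entry format, the helper below renders one list item. The function name, the " - " separator, and the "Example:" label wrapped in backticks are assumptions made for this sketch; the document only fixes the overall shape of link, short description, and usage example.

```python
def format_entry(name: str, url: str, description: str, example: str) -> str:
    """Render one single-level list entry for software.md."""
    # Collapse internal whitespace and avoid a doubled trailing period.
    summary = " ".join(description.split()).rstrip(".")
    return f"- [{name}]({url}) - {summary}. Example: `{example.strip()}`"
```

For instance, calling it with the name "ripgrep", its repository URL, the description "Fast recursive text search", and the example "rg TODO src/" produces a one-line entry that can be placed directly under a category heading.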
External Data Gathering
- Use web search API to retrieve official project page or repository
- Extract project name, summary description, minimum usage example
- Cite source URLs or APIs in entry metadata comments if needed
Failure Handling
- Log files that cannot be opened or parsed
- Skip URLs with invalid format or unreachable hosts
- Report summary of skipped items at end of run
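One possible way to collect skipped items and report them at the end of a run is sketched below. The class and function names are hypothetical. The URL check covers format only (scheme and host); testing whether a host is reachable needs a network request and is left out of this sketch.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Format check only: http(s) scheme plus a non-empty host."""
    try:
        parts = urlparse(url)
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 brackets
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


@dataclass
class SkipReport:
    """Collects skipped files and URLs so they can be reported once."""
    skipped: list[tuple[str, str, str]] = field(default_factory=list)

    def skip(self, kind: str, item: str, reason: str) -> None:
        self.skipped.append((kind, item, reason))

    def summary(self) -> str:
        if not self.skipped:
            return "Nothing skipped."
        lines = [f"Skipped {len(self.skipped)} item(s):"]
        lines += [f"- {kind}: {item} ({reason})" for kind, item, reason in self.skipped]
        return "\n".join(lines)
```

A run would call `skip("file", name, reason)` when a file cannot be opened or parsed, `skip("url", url, reason)` for rejected URLs, and print `summary()` once at the end.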
Update Tracking
- After successful entry creation, update TODO.md by replacing "- [ ]" with "- [x]" for that filename
- Commit changes to both software.md and TODO.md
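The checkbox update can be sketched as a text transformation on the contents of TODO.md. The function name `mark_processed` is hypothetical, and the sketch assumes each task line holds exactly one filename. It matches the whole line, so a filename that is only a prefix of another entry is not affected.

```python
import re


def mark_processed(todo_text: str, filename: str) -> str:
    """Replace '- [ ] filename' with '- [x] filename' on its own line."""
    pattern = re.compile(
        rf"^- \[ \] {re.escape(filename)}[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash handling in the filename.
    return pattern.sub(lambda _match: f"- [x] {filename}", todo_text, count=1)
```

If the filename is not present as an unchecked item, the text is returned unchanged, which makes the step safe to repeat.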
Output and Update Instructions
- With each run, the agent must parse new entries from unprocessed files listed in TODO.md and append new software records into software.md.
- If a software project already exists in software.md (based on URL or name), do not add a duplicate.
- Each newly added item must match the tone, formatting, and quality of existing entries in software.md.
- If a suitable category for the software does not exist, the agent must create a new category section and assign the project appropriately using best judgment.
- Descriptions must be clear, self-contained, and informative enough that users understand what the software does without needing to follow the link.
- Final output must preserve the structural integrity and order of software.md, following markdown syntax conventions and using proper subcategorization where needed.
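The duplicate check and the category handling can be sketched as below. Everything here is an assumption for illustration: the function names, the entry pattern, the URL normalization (scheme, "www." prefix, trailing slash, and letter case are ignored, which can over-match on case-sensitive hosts), and the choice to append a new category at the end of the file. Categories are assumed to be "## " headings.

```python
import re
from urllib.parse import urlparse

# Matches list entries of the form "- [Name](URL) ...".
ENTRY = re.compile(r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def normalize_url(url: str) -> str:
    """Reduce a URL to host plus path for loose comparison."""
    parts = urlparse(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/").lower()


def is_duplicate(name: str, url: str, catalog_text: str) -> bool:
    """True if an existing entry has the same name or the same URL."""
    for line in catalog_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        if match.group("name").casefold() == name.casefold():
            return True
        if normalize_url(match.group("url")) == normalize_url(url):
            return True
    return False


def add_entry(catalog_text: str, category: str, entry_line: str) -> str:
    """Append an entry to its category, creating the category if missing."""
    lines = catalog_text.splitlines()
    heading = f"## {category}"
    if heading not in lines:
        if lines and lines[-1].strip():
            lines.append("")
        lines += [heading, "", entry_line]
        return "\n".join(lines) + "\n"
    start = lines.index(heading) + 1
    # The section ends at the next top-level category heading or at EOF.
    end = next(
        (i for i in range(start, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    # Insert after the last non-blank line of the section.
    while end > start and not lines[end - 1].strip():
        end -= 1
    lines.insert(end, entry_line)
    return "\n".join(lines) + "\n"
```

Existing lines are never reordered or rewritten, so the order of categories and entries already in software.md is preserved.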