# NitroWebfetch

Extract web content, cleanly.

**NitroWebfetch – the developer-friendly web content extractor with CSS selectors.**

This project is in the alpha phase.

## Features

- Extracts content from web pages using CSS selectors
- Converts HTML to clean Markdown
- Falls back to common content selectors when the primary selector matches nothing
- Command-line interface with options for the CSS selector and output format
- Built on Playwright, which drives a real Firefox browser to load pages
- Completely free (open source, MIT license)

## Ideas for next steps

- Additional output formats (JSON, plain text)
- Batch processing for multiple URLs
- Configurable user agent and custom headers
- Integration with NitroDigest for web page summarization
- Support for authentication and cookies
- Content filtering and cleaning options

---

## Usage

### Prerequisites

To run this tool, you need to have [Python](https://www.python.org/downloads/) installed on your local machine.

### Installation

Install NitroWebfetch via pip, then download the Firefox browser that Playwright uses:

```bash
pip install nitrowebfetch-cli
playwright install firefox
```

For a development (editable) installation from a local clone of the repository:

```bash
cd Projects/Nitrowebfetch
pip install -e .
playwright install firefox
```

### Basic Usage

NitroWebfetch prints the extracted content to standard output, so redirect it to save the result to a file:

```bash
nitrowebfetch <url> > <output_file>
```

#### Examples

Extract article content from a webpage and save it to a file:

```bash
nitrowebfetch https://example.com/article > article.md
```

Extract content using a custom CSS selector:

```bash
nitrowebfetch https://example.com --selector ".main-content" > content.md
```

Get HTML output instead of Markdown:

```bash
nitrowebfetch https://example.com --format html > content.html
```

### Command Line Arguments

You can customize the extraction process using command-line arguments:

```bash
nitrowebfetch \
  --selector ".article-body" \
  --format md \
  https://example.com
```

Available arguments:

- `url`: URL to fetch content from (required, positional)
- `--selector`: CSS selector used to extract content (default: `article`)
- `--format`: output format, `md` for Markdown or `html` for raw HTML (default: `md`)

### Fallback Selectors

If the primary selector doesn't match any elements, NitroWebfetch automatically tries these alternatives:

- `article`
- `main`
- `.article`
- `.content`
- `#content`
- `.post`
- `.entry-content`

---

## Contributing

Do you want to contribute to this tool? Check the Contributing page: [Getting started](../../Contributing.md)

## Report an issue

Found an issue? You can report it here: [https://github.com/Frodigo/garage/issues/new](https://github.com/Frodigo/garage/issues/new)

## License

This project is licensed under the MIT License. See the LICENSE file for details.
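To illustrate the behavior described under Fallback Selectors, here is a minimal Python sketch of the selector lookup order. It is not taken from NitroWebfetch's source: `first_match` and `FALLBACK_SELECTORS` are hypothetical names, and the sketch assumes the primary selector is tried first and the fallbacks are then tried in the order listed. The lookup is passed in as a callable, so it can be Playwright's `page.query_selector` (which returns `None` when nothing matches) or a simple stand-in for testing.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical constant: the fallback list from the README, assumed to be tried in this order.
FALLBACK_SELECTORS = [
    "article", "main", ".article", ".content", "#content", ".post", ".entry-content",
]


def first_match(
    selector: str, query: Callable[[str], Optional[T]]
) -> Optional[Tuple[str, T]]:
    """Hypothetical helper: return (selector, match) for the first selector that matches.

    `query` is any callable that returns None when a selector matches nothing,
    e.g. Playwright's `page.query_selector`. Returns None if no selector matches.
    """
    # Try the user's selector first, then the fallbacks. dict.fromkeys removes
    # duplicates while keeping order (the default "article" is also the first fallback).
    candidates = dict.fromkeys([selector, *FALLBACK_SELECTORS])
    for candidate in candidates:
        match = query(candidate)
        if match is not None:
            return candidate, match
    return None


if __name__ == "__main__":
    # Stand-in for a loaded page: only ".content" and "#content" match.
    fake_page = {".content": "<p>Hello</p>", "#content": "<div>Other</div>"}
    print(first_match(".article-body", fake_page.get))
    # → ('.content', '<p>Hello</p>')
```

With Playwright's sync API, you could call `first_match(".article-body", page.query_selector)` on a loaded page and read the matched element's HTML with its `inner_html()` method. Passing the lookup in as a callable keeps the ordering logic separate from the browser, so it can be checked without launching Firefox.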