URL: http://github.com/HasData/ecommerce-price-scraper

HasData/ecommerce-price-scraper: Python toolkit for web scraping product prices from e-commerce sites. Features locale-aware normalization, currency detection, anti-bot techniques, price drop alerts, and LLM extraction for complex layouts.

HasData/ecommerce-price-scraper

Price Scraping Toolkit

A production-grade collection of Python scripts for extracting, normalizing, and monitoring e-commerce pricing data.

Features

  • Multi-locale price normalization (US/EU formats)
  • Marketing noise removal ("Was $X", "Save Y%")
  • Currency detection with geo-context
  • Hierarchical selector strategies (JSON-LD → microdata → CSS)
  • API interception via Playwright
  • AI-powered extraction for complex layouts
  • Price drop monitoring with SQLite
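
The JSON-LD → microdata → CSS order listed above can be illustrated with the standard library alone. This is a minimal sketch, not the code in 04_selector_hierarchy.py (which, going by the Tech Stack, likely uses BeautifulSoup4). The names `_PriceCollector` and `extract_price_sketch` are hypothetical, and the CSS fallback assumes the price sits in an element with a `price` class.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _PriceCollector(HTMLParser):
    """Collects price candidates from three sources in a single pass."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld_blocks: List[str] = []
        self.microdata_price: Optional[str] = None
        self.css_price: Optional[str] = None
        self._capture: Optional[str] = None  # "jsonld" or "css" while inside a target tag

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") == "application/ld+json":
            self._capture = "jsonld"
        elif attr.get("itemprop") == "price" and self.microdata_price is None:
            # Only the content="..." form of microdata is handled in this sketch.
            self.microdata_price = attr.get("content")
        elif "price" in (attr.get("class") or "").split() and self.css_price is None:
            self._capture = "css"

    def handle_data(self, data):
        if self._capture == "jsonld":
            self.json_ld_blocks.append(data)
        elif self._capture == "css" and data.strip():
            self.css_price = data.strip()

    def handle_endtag(self, tag):
        self._capture = None


def _price_from_json_ld(blocks: List[str]) -> Optional[str]:
    """Return offers.price from the first JSON-LD block that has one."""
    for block in blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        for item in data if isinstance(data, list) else [data]:
            offers = item.get("offers") if isinstance(item, dict) else None
            if isinstance(offers, list):
                offers = offers[0] if offers else None
            if isinstance(offers, dict) and "price" in offers:
                return str(offers["price"])
    return None


def extract_price_sketch(html: str) -> Optional[Tuple[str, str]]:
    """Return (source, raw_price) from the most structured source available."""
    collector = _PriceCollector()
    collector.feed(html)
    collector.close()
    candidates = [
        ("json-ld", _price_from_json_ld(collector.json_ld_blocks)),
        ("microdata", collector.microdata_price),
        ("css", collector.css_price),
    ]
    for source, value in candidates:
        if value:
            return source, value
    return None
```

Returning the source name alongside the raw value makes it easy to log how often each fallback level is actually used.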

Project Structure

examples/
├── 01_price_normalization.py    # Handle "1,234.56" vs "1.234,56"
├── 02_marketing_cleanup.py      # Remove "Was $X Now $Y" noise
├── 03_currency_detection.py     # Resolve $ → USD/CAD/AUD via geo-hints
├── 04_selector_hierarchy.py     # Fallback strategy for robust extraction
├── 05_api_interception.py       # Capture Nike's internal API calls
├── 06_ai_extraction.py          # LLM-based multi-variant extraction
├── 07_price_monitoring.py       # Track price drops over time
└── 08_geo_pricing_audit.py      # Compare prices across regions
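
The `$` → USD/CAD/AUD resolution noted for 03_currency_detection.py comes down to a symbol lookup plus a country hint. The sketch below shows one way this could work. The mapping tables and the function name `detect_currency_sketch` are assumptions for illustration, not the repository's implementation.

```python
from typing import Optional

# Symbols that map to a single ISO 4217 code in this sketch.
SYMBOL_TO_CURRENCY = {"€": "EUR", "£": "GBP", "₹": "INR", "R$": "BRL"}
# "$" is shared by several currencies; a country hint disambiguates it.
DOLLAR_BY_COUNTRY = {"US": "USD", "CA": "CAD", "AU": "AUD"}


def detect_currency_sketch(price_text: str, country_hint: Optional[str] = None) -> Optional[str]:
    """Return an ISO 4217 code, or None when the text is ambiguous."""
    # Check longer symbols first so "R$" is not mistaken for a bare "$".
    for symbol in sorted(SYMBOL_TO_CURRENCY, key=len, reverse=True):
        if symbol in price_text:
            return SYMBOL_TO_CURRENCY[symbol]
    if "$" in price_text:
        # Without a usable hint the dollar sign stays unresolved (None).
        return DOLLAR_BY_COUNTRY.get((country_hint or "").upper())
    return None
```

Returning None for a bare `$` with no hint keeps the ambiguity visible instead of silently defaulting to USD.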

Quick Start

Installation

pip install -r requirements.txt

Example 1: Normalize International Prices

from decimal import Decimal
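from importlib import import_module

# The file name starts with a digit, so it is loaded by string name (run from the repo root).
normalize_price = import_module("examples.01_price_normalization").normalize_price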

# US format
price_us = normalize_price("$1,234.56", locale_hint="US")
# → Decimal('1234.56')

# EU format
price_eu = normalize_price("€ 1.234,56", locale_hint="EU")
# → Decimal('1234.56')

# Auto-detection
price_auto = normalize_price("1.234,56", locale_hint="AUTO")
# → Decimal('1234.56') (detects EU from comma placement)
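
The auto-detection rule mentioned in the comment above (inferring the format from separator placement) can be sketched as follows. This is an assumed heuristic, not the actual body of `normalize_price`. The name `normalize_price_sketch` is hypothetical, and the rule "a lone separator followed by exactly three digits is a thousands separator" is a simplification that can misread some inputs.

```python
import re
from decimal import Decimal


def normalize_price_sketch(text: str, locale_hint: str = "AUTO") -> Decimal:
    """Parse a price string such as "$1,234.56" or "€ 1.234,56" into a Decimal."""
    # Keep only digits and the two candidate separators, then trim stray ones.
    digits = re.sub(r"[^\d.,]", "", text).strip(".,")
    if not digits:
        raise ValueError(f"no price found in {text!r}")

    if locale_hint == "US":
        decimal_sep = "."
    elif locale_hint == "EU":
        decimal_sep = ","
    else:
        last_dot, last_comma = digits.rfind("."), digits.rfind(",")
        if last_dot != -1 and last_comma != -1:
            # Both separators present: the rightmost one is the decimal mark.
            decimal_sep = "." if last_dot > last_comma else ","
        else:
            # One kind of separator (or none): treat it as a decimal mark only
            # if it occurs once and is followed by one or two digits.
            sep = "." if last_dot != -1 else ","
            tail = digits.rsplit(sep, 1)[-1] if sep in digits else ""
            is_decimal = digits.count(sep) == 1 and 1 <= len(tail) <= 2
            decimal_sep = sep if is_decimal else ("," if sep == "." else ".")

    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = digits.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)
```

An explicit `locale_hint` is the safer choice whenever the store's region is known, since a string like "1,234" is ambiguous by itself.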

Example 2: Clean Marketing Noise

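from importlib import import_module

# The file name starts with a digit, so it is loaded by string name (run from the repo root).
extract_clean_price = import_module("examples.02_marketing_cleanup").extract_clean_price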

html = "Was $129.99 Now $99.99 (Save $30)"
clean_price = extract_clean_price(html)
# → Decimal('99.99')
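
One plausible way to get from the noisy string to the current price is to delete the "Was" and "Save" phrases first and then read the first remaining dollar amount. The sketch below assumes US-style amounts and English phrasing. The patterns and the name `extract_clean_price_sketch` are illustrative, not taken from 02_marketing_cleanup.py.

```python
import re
from decimal import Decimal
from typing import Optional

# Phrases whose attached amount is NOT the current price (assumed patterns).
NOISE_PATTERNS = [
    r"\bwas\s+\$?\d[\d,]*(?:\.\d+)?",            # "Was $129.99"
    r"\(?\bsave\s+\$?\d[\d,]*(?:\.\d+)?%?\)?",   # "Save $30", "(Save 20%)"
]
PRICE_PATTERN = re.compile(r"\$\s*(\d[\d,]*(?:\.\d{1,2})?)")


def extract_clean_price_sketch(text: str) -> Optional[Decimal]:
    """Strip marketing phrases, then return the first remaining $ amount."""
    for pattern in NOISE_PATTERNS:
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return Decimal(match.group(1).replace(",", ""))
```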

Example 3: Monitor Price Drops

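from decimal import Decimal
from importlib import import_module

# The file name starts with a digit, so it is loaded by string name (run from the repo root).
PriceTracker = import_module("examples.07_price_monitoring").PriceTracker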

tracker = PriceTracker()
tracker.save("https://demo.nopcommerce.com/camera-photo", Decimal("249.99"))
tracker.save("https://demo.nopcommerce.com/camera-photo", Decimal("199.99"))

alert = tracker.check_drop("https://demo.nopcommerce.com/camera-photo", threshold_percent=10)
if alert:
    print(f"Price dropped {alert['discount']:.1f}%!")
    # → "Price dropped 20.0%!"
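
To make the save/check_drop flow concrete, here is a minimal SQLite-backed tracker with the same method names as the example above. The table schema, the in-memory default, and the class name `PriceTrackerSketch` are assumptions; the real `PriceTracker` in 07_price_monitoring.py may differ.

```python
import sqlite3
from decimal import Decimal
from typing import Optional


class PriceTrackerSketch:
    """Stores price observations and reports drops between the last two."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS prices ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "url TEXT NOT NULL, "
            "price TEXT NOT NULL, "  # stored as text to keep Decimal precision
            "seen_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )

    def save(self, url: str, price: Decimal) -> None:
        self.conn.execute(
            "INSERT INTO prices (url, price) VALUES (?, ?)", (url, str(price))
        )
        self.conn.commit()

    def check_drop(self, url: str, threshold_percent: float = 10) -> Optional[dict]:
        # Compare the two most recent observations for this URL.
        rows = self.conn.execute(
            "SELECT price FROM prices WHERE url = ? ORDER BY id DESC LIMIT 2", (url,)
        ).fetchall()
        if len(rows) < 2:
            return None
        current, previous = Decimal(rows[0][0]), Decimal(rows[1][0])
        if previous <= 0:
            return None
        discount = (previous - current) / previous * 100
        if discount >= Decimal(str(threshold_percent)):
            return {"previous": previous, "current": current, "discount": discount}
        return None
```

Storing the price as text sidesteps SQLite's floating-point REAL type, which would reintroduce the rounding issues that Decimal avoids.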

Configuration

For HasData API Examples

Replace YOUR_HASDATA_API_KEY in scripts with your actual key:

API_KEY = "YOUR_HASDATA_API_KEY"
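
As an alternative to editing the key into each script, it can be read from the environment so it never lands in version control. This is a common pattern rather than something the repository prescribes. The variable name `HASDATA_API_KEY` and the helper `load_api_key` are assumptions.

```python
import os


def load_api_key(env_var: str = "HASDATA_API_KEY") -> str:
    """Return the API key from the environment, rejecting the placeholder."""
    key = os.environ.get(env_var, "")
    if not key or key == "YOUR_HASDATA_API_KEY":
        raise RuntimeError(f"Set the {env_var} environment variable to your key")
    return key
```

A script would then use `API_KEY = load_api_key()` in place of the hardcoded string.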

For Geo-Pricing Audits

Specify target markets in 08_geo_pricing_audit.py:

TARGET_REGIONS = ["US", "DE", "IN", "BR"]

Use Cases

Script                    | Best For             | Key Technique
01_price_normalization.py | Multi-region stores  | Locale-aware parsing
02_marketing_cleanup.py   | Deal/coupon sites    | Regex noise removal
03_currency_detection.py  | Global marketplaces  | Symbol + geo mapping
04_selector_hierarchy.py  | Resilient scraping   | Structured data fallbacks
05_api_interception.py    | React/Vue SPAs       | Network request capture
06_ai_extraction.py       | Complex variants     | LLM schema extraction
07_price_monitoring.py    | Deal alerts          | Time-series analysis
08_geo_pricing_audit.py   | Price discrimination | Residential proxy rotation

Important Notes

Financial Precision

Always use Decimal for price calculations, never float:

# ❌ BAD
price = 19.99 * 0.85  # → 16.991499999999997

# ✅ GOOD
from decimal import Decimal
price = Decimal("19.99") * Decimal("0.85")  # → 16.9915
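
The Decimal result above still has four decimal places, so a displayed or stored price usually needs an explicit rounding step. The sketch below rounds to cents with `quantize`. The helper name `apply_discount` and the choice of ROUND_HALF_UP are assumptions, and the right rounding mode depends on the store or jurisdiction.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def apply_discount(price: Decimal, percent: Decimal) -> Decimal:
    """Apply a percentage discount and round the result to two decimal places."""
    discounted = price * (Decimal("100") - percent) / Decimal("100")
    return discounted.quantize(CENT, rounding=ROUND_HALF_UP)


print(apply_discount(Decimal("19.99"), Decimal("15")))  # 16.99
```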

Tech Stack

  • Requests - HTTP client
  • BeautifulSoup4 - HTML parsing
  • Playwright - Browser automation
  • SQLite - Price history storage
  • HasData API - Proxy & AI extraction

Disclaimer

These scripts are for educational purposes only. Check our legal guidance on web scraping.

Notes

  • Use random delays to mimic human behavior and avoid blocks.
  • Proxy support helps reduce rate limits and IP bans.
  • Scrapers export data in JSON format, ready to parse for further use.
  • Adjust max pages and URLs according to your scraping needs.
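
The random-delay advice in the notes above can be implemented with a small helper called between requests. The default bounds of 1 to 3 seconds and the name `polite_delay` are illustrative assumptions. The `sleep` parameter exists only so the pause can be replaced in tests.

```python
import random
import time


def polite_delay(min_seconds: float = 1.0, max_seconds: float = 3.0, sleep=time.sleep) -> float:
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_delay()` after each page fetch spaces requests unevenly, which is gentler on the target site than a fixed interval.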
