A Java 8 & Maven web app that crawls a user-provided URL, discovers same-domain subpages, and returns a JSON array of image URLs found across the crawl. Built for the Eulerity Hackathon Challenge.
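The response is a JSON array of image URL strings. This README does not say how the app serializes that array, and it may well use a JSON library. As a rough illustration only, here is a dependency-free Java 8 sketch. The class name JsonArrayWriter and the method toJsonArray are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: serialize image URLs as a JSON array of strings without a JSON library. */
class JsonArrayWriter {

    static String toJsonArray(List<String> urls) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < urls.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(escape(urls.get(i))).append('"');
        }
        return sb.append(']').toString();
    }

    // Escape the characters JSON does not allow raw inside a string literal.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the 4-digit hex escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> images = Arrays.asList(
                "https://example.com/logo.png",
                "https://example.com/img/banner.jpg");
        System.out.println(toJsonArray(images));
    }
}
```

Running main prints ["https://example.com/logo.png","https://example.com/img/banner.jpg"].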
Features:
- Crawl a starting URL and extract all image URLs from discovered pages
- Crawl sub-pages to find additional images
- Multi-threaded crawling to process multiple pages concurrently
- Same-domain enforcement (won't crawl outside the input URL's domain)
- No duplicate visits (tracks visited pages to avoid re-crawling)
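The crawl behaviour listed above can be sketched as a level-by-level breadth-first search. This covers sub-page discovery, concurrent workers, same-domain filtering, and no repeat visits. It is an illustrative, stdlib-only sketch, not the project's actual code. LinkSource is a hypothetical stand-in for "fetch a page and extract its links", which the real app does with Jsoup. "Same domain" is assumed to mean an exact, case-insensitive host match, so www.example.com and example.com count as different.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: concurrent, same-domain, no-revisit crawl over a pluggable link source. */
class SameDomainCrawler {

    /** Hypothetical stand-in for "fetch this page and return its links" (Jsoup in the real app). */
    interface LinkSource {
        List<String> linksOn(String url);
    }

    /** True if the URL parses and its host equals rootHost (case-insensitive, no subdomain matching). */
    static boolean sameDomain(String url, String rootHost) {
        try {
            String host = new URI(url).getHost();
            return host != null && host.equalsIgnoreCase(rootHost);
        } catch (URISyntaxException e) {
            return false; // malformed link: skip it
        }
    }

    /** Breadth-first crawl from start; returns every same-domain URL that was visited. */
    static Set<String> crawl(String start, LinkSource source, int threads) throws Exception {
        String rootHost = new URI(start).getHost();
        Set<String> visited = ConcurrentHashMap.newKeySet(); // thread-safe "seen" set
        visited.add(start);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> frontier = new ArrayList<>(Collections.singletonList(start));
            while (!frontier.isEmpty()) {
                // Fetch every page on the current level in parallel.
                List<Future<List<String>>> pending = new ArrayList<>();
                for (String url : frontier) {
                    pending.add(pool.submit(() -> {
                        List<String> claimed = new ArrayList<>();
                        for (String link : source.linksOn(url)) {
                            // add() is atomic: only the first worker to see a URL gets true.
                            if (sameDomain(link, rootHost) && visited.add(link)) {
                                claimed.add(link);
                            }
                        }
                        return claimed;
                    }));
                }
                // Links claimed on this level become the next level's frontier.
                List<String> next = new ArrayList<>();
                for (Future<List<String>> f : pending) {
                    next.addAll(f.get());
                }
                frontier = next;
            }
        } finally {
            pool.shutdown();
        }
        return visited;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical in-memory "site" in place of real HTTP fetches.
        Map<String, List<String>> site = new HashMap<>();
        site.put("https://example.com/", Arrays.asList(
                "https://example.com/about", "https://other.org/", "https://example.com/about"));
        site.put("https://example.com/about", Arrays.asList("https://example.com/"));
        LinkSource fake = url -> site.getOrDefault(url, Collections.<String>emptyList());
        System.out.println(crawl("https://example.com/", fake, 4));
    }
}
```

The shared visited set is what prevents duplicate visits. ConcurrentHashMap.newKeySet() returns a thread-safe set whose add() returns false when another worker has already claimed the URL. Processing one depth level at a time keeps termination simple, at the cost of some idle threads while a level finishes.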
Built with:
- Java 8 (required)
- Maven 3.5+
- Jetty (mvn jetty:run)
- Jsoup (HTML parsing + crawling)
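Pages often reference images and sub-pages with relative paths, so a crawler needs absolute URLs before it can apply the same-domain check or report them. Jsoup can do this resolution itself through Element.absUrl("src"), for example on the elements returned by doc.select("img[src]"). The stdlib-only sketch below shows the same idea with java.net.URI.resolve. The UrlResolver class and absolute method are hypothetical. URI.resolve follows the older RFC 2396 rules and has some edge cases with unusual base URLs, so Jsoup's absUrl is likely the safer choice inside the real crawler.

```java
import java.net.URI;

/** Hypothetical helper: turn a relative src/href found on a page into an absolute URL. */
class UrlResolver {

    // Resolve ref against the URL of the page it was found on.
    // Throws IllegalArgumentException if either string is not a valid URI.
    static String absolute(String pageUrl, String ref) {
        return URI.create(pageUrl).resolve(ref.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "https://example.com/blog/post.html";
        System.out.println(absolute(page, "images/cat.png"));               // https://example.com/blog/images/cat.png
        System.out.println(absolute(page, "/logo.svg"));                    // https://example.com/logo.svg
        System.out.println(absolute(page, "https://cdn.example.net/a.jpg")); // https://cdn.example.net/a.jpg
    }
}
```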
Make sure you have:
- Java 8 exactly (Java 9+ will fail the build)
- Maven 3.5+
Check versions:
java -version
mvn -version

Build from the project root:
mvn package

To clean build artifacts:
mvn clean

Run the server:
mvn clean test package jetty:run

Note: open the app on localhost in a browser and test it with the links in the test-links.txt file.
Disclaimer: educational/demo project, not intended as production-grade web-scraping infrastructure.
Kelvin Ihezue