Kelony11/image_finder


ImageFinder: Multithreaded Web Crawler & Image URL Extractor

A Java 8 & Maven web app that crawls a user-provided URL, discovers same-domain subpages, and returns a JSON array of image URLs found across the crawl. Built for the Eulerity Hackathon Challenge.


Features

Core requirements

  • ✅ Crawl a starting URL and extract all image URLs from discovered pages
  • ✅ Crawl sub-pages to find additional images
  • ✅ Multi-threaded crawling to process multiple pages concurrently
  • ✅ Same-domain enforcement (won't crawl outside the input URL's domain)
  • ✅ No duplicate visits (tracks visited pages to avoid re-crawling)
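
The same-domain, no-duplicate, and multi-threaded requirements above can be illustrated with a minimal Java 8 sketch. This is not the project's actual code: the class `CrawlSketch`, the helpers `sameDomain` and `crawl`, and the in-memory link map (standing in for pages that would really be fetched and parsed with Jsoup) are all hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CrawlSketch {

    // Hypothetical helper: true when both URLs have the same host (case-insensitive).
    public static boolean sameDomain(String a, String b) {
        try {
            String hostA = URI.create(a).getHost();
            String hostB = URI.create(b).getHost();
            return hostA != null && hostA.equalsIgnoreCase(hostB);
        } catch (IllegalArgumentException e) {
            return false; // malformed URLs are treated as off-domain
        }
    }

    // Hypothetical crawl over an in-memory link graph (assumption: a real
    // crawler would fetch each page and read its links instead of using a map).
    public static Set<String> crawl(String start, Map<String, List<String>> links, int threads)
            throws InterruptedException, ExecutionException {
        // Concurrent set: add() returns true for exactly one caller per URL,
        // so no page is queued twice even when workers race.
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(start);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> frontier = Collections.singletonList(start);
            while (!frontier.isEmpty()) {
                // One task per page in the current level; tasks run concurrently.
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (String page : frontier) {
                    tasks.add(() -> {
                        List<String> fresh = new ArrayList<>();
                        for (String link : links.getOrDefault(page, Collections.<String>emptyList())) {
                            if (sameDomain(start, link) && visited.add(link)) {
                                fresh.add(link);
                            }
                        }
                        return fresh;
                    });
                }
                // invokeAll blocks until every task in this level has finished.
                List<String> next = new ArrayList<>();
                for (Future<List<String>> result : pool.invokeAll(tasks)) {
                    next.addAll(result.get());
                }
                frontier = next;
            }
        } finally {
            pool.shutdown();
        }
        return visited;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> links = new HashMap<>();
        links.put("https://example.com/", java.util.Arrays.asList(
                "https://example.com/a", "https://other.org/x", "https://example.com/b"));
        links.put("https://example.com/a", java.util.Arrays.asList(
                "https://example.com/", "https://example.com/b"));
        // Sorted only to make the printed order stable.
        System.out.println(new TreeSet<>(crawl("https://example.com/", links, 4)));
    }
}
```

Processing the crawl level by level keeps termination simple, since the loop ends when a level produces no new same-domain links. A real implementation might instead submit pages to the pool as they are discovered.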

Tech Stack

  • Java 8 required
  • Maven 3.5+
  • Jetty (mvn jetty:run)
  • Jsoup (HTML parsing and crawling)

Getting Started

Requirements

Make sure you have:

  • Java 8 (exactly): Java 9+ will fail the build
  • Maven 3.5+

Check versions:

java -version
mvn -version

  • Build. From the project root:

mvn package

To clean build artifacts:

mvn clean

  • Run. Start the server:

mvn clean test package jetty:run

Note: Once the server is running, open the localhost address in a browser and test it with the links in the test-links.txt file.


Disclaimer: This is an educational/demo project, not intended as production-grade web scraping infrastructure.


Author

Kelvin Ihezue
