An automated job scraping tool that collects job listings from LinkedIn, Indeed, and Skillsire, filters them by role keywords, removes duplicates, and saves results to CSV files with optional email notifications.
- Multi-Platform Scraping: Scrapes jobs from LinkedIn, Indeed, and Skillsire
- Smart Filtering: Filter jobs by role keywords (e.g., "Backend", "Frontend", "Developer")
- Deduplication: Automatically removes duplicate job postings
- CSV Export: Saves results in organized CSV files with timestamps
- Email Notifications: Optional email delivery with CSV attachments
- Continuous Operation: Runs on a configurable schedule to catch new postings
- Rate Limiting: Built-in delays to avoid IP blocking
- Python 3.7 or higher
- pip (Python package manager)
git clone https://github.com/Samirasimha/JobScraperUltimate.git
cd JobScraperUltimate

Install all required packages using the provided requirements file:

pip install -r requirements.txt

This will install:
- python-jobspy: Job scraping library for LinkedIn and Indeed
- pandas: Data manipulation and CSV handling
- requests: HTTP library for Skillsire API calls
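To confirm the install worked before the first run, a quick import check along these lines may help. This is only a sketch: it assumes the three packages are imported as jobspy, pandas, and requests (python-jobspy installs a module named jobspy), and the helper name missing_modules is made up for this example rather than taken from the project.

```python
import importlib.util

# Import names assumed for the packages in requirements.txt;
# python-jobspy is imported as "jobspy".
REQUIRED_MODULES = ["jobspy", "pandas", "requests"]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Run: pip install -r requirements.txt")
    else:
        print("All required modules are installed.")
```

Running it from the project folder prints either the missing module names or a confirmation, without starting the scraper.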
Open JobScraperUltimate.py and configure the settings at the top of the file:
# How many hours back to search for jobs (first run only)
hours = 1
# Time in seconds to wait between scraping sessions
# Recommended: 600-3600 (10 minutes to 1 hour)
sleep_time = 10
# Job portals to scrape (remove any you don't want)
scrape_from = ["linkedin", "indeed", "skillsire"]
# Your job search query
search_term = "software engineer"
# Maximum number of results to fetch per scraping session
# Note: Higher numbers may increase risk of IP blocking
results_fetch_count = 300
# Country filter (for Indeed only)
country_to_search = 'USA'
# Optional prefix for CSV filenames
file_name_prefix = ''

Filter jobs by keywords in the job title. Jobs matching ANY of these keywords will be included:

roles_of_interest = [
    "Backend",
    "Frontend",
    "Developer",
    "Engineer",
    # Add more keywords as needed
]

To receive job listings via email, configure these settings:
email_send = True # Set to True to enable email notifications
from_email = 'your.email@gmail.com'
email_password = 'your_app_password' # See note below about app passwords
to_email = 'recipient@example.com'
email_smtp = 'smtp.gmail.com' # Gmail SMTP server

| Email Provider | SMTP Server |
|---|---|
| Gmail | smtp.gmail.com |
| Outlook/Hotmail | smtp-mail.outlook.com |
| Yahoo | smtp.mail.yahoo.com |
| iCloud | smtp.mail.me.com |
For Gmail, you'll need to create an "App Password" instead of using your regular password:
- Go to your Google Account settings
- Select Security → 2-Step Verification (enable if not already)
- Scroll to App passwords
- Generate a new app password for "Mail"
- Use this 16-character password in the email_password setting
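For readers curious how the email settings fit together, here is a minimal sketch of sending a CSV as an attachment with Python's standard smtplib and email modules. It is not the project's actual code: the function names build_message and send_csv, the subject line, and the use of port 587 with STARTTLS are all assumptions for illustration. Check your provider's documentation for the correct port.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(from_email, to_email, csv_path):
    """Build an email that carries the CSV file as an attachment."""
    path = Path(csv_path)
    msg = EmailMessage()
    msg["From"] = from_email
    msg["To"] = to_email
    msg["Subject"] = f"Job listings: {path.name}"  # illustrative subject line
    msg.set_content("New job listings are attached.")
    msg.add_attachment(
        path.read_bytes(),
        maintype="application",
        subtype="octet-stream",
        filename=path.name,
    )
    return msg


def send_csv(from_email, email_password, to_email, email_smtp, csv_path, port=587):
    """Send the CSV over SMTP. Port 587 with STARTTLS is an assumption."""
    msg = build_message(from_email, to_email, csv_path)
    with smtplib.SMTP(email_smtp, port) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(from_email, email_password)  # for Gmail, the app password
        server.send_message(msg)
```

Keeping message construction separate from sending makes it possible to inspect the message locally before any credentials are used.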
Once configured, start the scraper:
python JobScraperUltimate.py

Or on some systems:

python3 JobScraperUltimate.py

- First Run: Searches for jobs posted in the last N hours, where N is the configured hours value
- Subsequent Runs: Searches for jobs posted in the last 1 hour only
- Filtering: Applies your roles_of_interest keywords
- Deduplication: Removes jobs already saved in today's CSV
- Saves Results: Appends new jobs to a timestamped CSV file
- Email (Optional): Sends the CSV via email if enabled
- Sleeps: Waits for sleep_time seconds before the next run
- Repeats: Continues indefinitely until stopped
Press Ctrl + C in the terminal to stop the scraper gracefully.
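The run cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in JobScraperUltimate.py: fetch_jobs and handle_jobs are hypothetical callables standing in for the scraping and save/email steps, jobs are assumed to be dictionaries with a title key, and the case-insensitive keyword match is a guess at how the title filter behaves.

```python
import time


def lookback_hours(is_first_run, hours):
    """The first run uses the configured window; later runs look back 1 hour."""
    return hours if is_first_run else 1


def matches_roles(title, roles_of_interest):
    """True if the title contains ANY keyword (case-insensitive by assumption)."""
    lowered = title.lower()
    return any(role.lower() in lowered for role in roles_of_interest)


def run_forever(fetch_jobs, handle_jobs, hours, sleep_time, roles_of_interest):
    """Fetch, filter, hand off, sleep, and repeat until Ctrl + C is pressed.

    fetch_jobs(hours_back) and handle_jobs(jobs) are hypothetical callables.
    """
    is_first_run = True
    try:
        while True:
            jobs = fetch_jobs(lookback_hours(is_first_run, hours))
            wanted = [
                job for job in jobs
                if matches_roles(job["title"], roles_of_interest)
            ]
            handle_jobs(wanted)
            is_first_run = False
            time.sleep(sleep_time)
    except KeyboardInterrupt:
        # Ctrl + C raises KeyboardInterrupt; catching it allows a clean exit.
        print("Stopped by user.")
```

Passing the fetch and handle steps in as functions keeps the loop easy to try out with fake data before pointing it at real job portals.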
CSV files are automatically generated with descriptive names:
Format: [prefix_]jobs_MonthName_Day_TimeOfDay.csv

Examples:
- jobs_January_21_morning.csv
- my_jobs_December_25_afternoon.csv

Time of Day Labels:
- overnight: 12:00 AM - 8:59 AM
- morning: 9:00 AM - 11:59 AM
- afternoon: 12:00 PM - 3:59 PM
- evening: 4:00 PM - 8:59 PM
- night: 9:00 PM - 11:59 PM
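As a worked illustration of the naming scheme, the following sketch maps a timestamp to a file name using the format and time-of-day ranges listed above. The function names time_of_day and csv_file_name are hypothetical, and the day is written without zero padding, which the examples above do not settle either way.

```python
from datetime import datetime


def time_of_day(hour):
    """Map an hour (0-23) to the label ranges listed above."""
    if hour < 9:
        return "overnight"   # 12:00 AM - 8:59 AM
    if hour < 12:
        return "morning"     # 9:00 AM - 11:59 AM
    if hour < 16:
        return "afternoon"   # 12:00 PM - 3:59 PM
    if hour < 21:
        return "evening"     # 4:00 PM - 8:59 PM
    return "night"           # 9:00 PM - 11:59 PM


def csv_file_name(now, file_name_prefix=""):
    """Build [prefix_]jobs_MonthName_Day_TimeOfDay.csv for a datetime."""
    prefix = f"{file_name_prefix}_" if file_name_prefix else ""
    month = now.strftime("%B")  # full month name; English in the default locale
    return f"{prefix}jobs_{month}_{now.day}_{time_of_day(now.hour)}.csv"


if __name__ == "__main__":
    print(csv_file_name(datetime.now()))
```

With an empty file_name_prefix the name starts directly with jobs_, matching the first example above.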
Each scraping run adds a timestamp separator in the CSV, so you can see when each batch of jobs was found.
| Column | Description |
|---|---|
| job_url | Direct link to the job posting |
| title | Job title |
| company | Company name |
| location | Job location |
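To show how deduplication against an existing CSV might work with these columns, here is a small standard-library sketch. It assumes job_url is the key that identifies a duplicate, which the column list suggests but does not confirm, and it leaves out the timestamp separator rows the tool writes. The names FIELDS, load_seen_urls, and append_new_jobs are invented for this example.

```python
import csv
from pathlib import Path

# Column order taken from the table above.
FIELDS = ["job_url", "title", "company", "location"]


def load_seen_urls(csv_path):
    """Return the set of job_url values already stored in the CSV."""
    path = Path(csv_path)
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as handle:
        return {
            row["job_url"]
            for row in csv.DictReader(handle)
            if row.get("job_url")
        }


def append_new_jobs(csv_path, jobs):
    """Append jobs whose job_url is not yet in the file; return the count added."""
    path = Path(csv_path)
    seen = load_seen_urls(path)
    new_jobs = []
    for job in jobs:
        if job["job_url"] not in seen:  # assumption: job_url identifies a posting
            seen.add(job["job_url"])
            new_jobs.append(job)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(new_jobs)
    return len(new_jobs)
```

Calling append_new_jobs twice with the same list adds the rows once and returns 0 the second time.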
- Keep results_fetch_count at 300 or below
- Set sleep_time to at least 600 seconds (10 minutes) for production use
- Don't run multiple instances of the scraper simultaneously
- Use specific keywords in search_term (e.g., "Python developer" instead of "developer")
- Add specific role keywords to roles_of_interest to reduce false positives
- Review the first few runs and adjust filters as needed
- Ensure 2-factor authentication is enabled for Gmail
- Use app-specific passwords, not your regular account password
- Check your spam folder if emails aren't arriving
- Some email providers may require additional security settings
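The rate-limit guidance in the list above (results_fetch_count at 300 or below, sleep_time of at least 600 seconds) could be checked at startup with something like the sketch below. The scraper itself may not perform such a check; settings_warnings is a hypothetical helper, and the thresholds are simply the values recommended in this README.

```python
def settings_warnings(sleep_time, results_fetch_count):
    """Return human-readable warnings for settings outside the recommended ranges."""
    warnings = []
    if sleep_time < 600:
        warnings.append(
            f"sleep_time={sleep_time} is below the recommended 600 seconds"
        )
    if results_fetch_count > 300:
        warnings.append(
            f"results_fetch_count={results_fetch_count} is above the recommended 300"
        )
    return warnings


if __name__ == "__main__":
    # Values shown in the sample configuration earlier in this README.
    for warning in settings_warnings(sleep_time=10, results_fetch_count=300):
        print("Warning:", warning)
```

With the sample configuration values this reports only the short sleep_time, since results_fetch_count is already at the suggested ceiling.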
Solution: Install dependencies:

pip install -r requirements.txt

Solution:
- Verify your email and password are correct
- For Gmail, use an App Password instead of your regular password
- Check that 2-factor authentication is enabled

Solution:
- Increase sleep_time to 1800 or 3600 seconds (30-60 minutes)
- Reduce results_fetch_count to 100 or less
- Use a VPN or wait 24 hours before trying again

Solution:
- Make your search_term more general (e.g., "engineer" instead of "senior backend engineer")
- Remove or broaden your roles_of_interest filters
- Increase hours to search further back in time
- Check if the job portals are accessible from your location
Contributions are welcome! Feel free to:
- Report bugs or issues
- Suggest new features
- Submit pull requests
- Improve documentation
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or feedback, contact: samirasimha.r@gmail.com
Made with ❤️ by Samirasimha Rajasimha