Extract structured remote job listings and company profiles from Himalayas search pages and company libraries in one consistent dataset. Built for teams that need reliable Himalayas job data for aggregation, enrichment, and hiring intelligence without manual copying.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you're looking for a himalayas-job-company-scraper, you've just found your team. Let's chat!
This project collects remote job listings and company profiles from Himalayas using a single search URL input for either Jobs or Companies. It helps turn messy browsing into structured, reusable data for recruitment workflows, market research, and enrichment pipelines. It’s designed for job boards, HR tech teams, analysts, and automation builders who need repeatable extraction with clean fields.
- Accepts a single search URL for job results or company directories
- Extracts job + company details into normalized objects (easy to store and dedupe)
- Captures salary ranges, tech stacks, benefits, and social profiles when available
- Supports frequent re-runs to track updates, expirations, and newly posted roles
- Outputs both human-readable text fields and rich HTML where useful
| Feature | Description |
|---|---|
| Job search URL support | Pull job listings from any supported search results page with filters already applied. |
| Company directory URL support | Extract company profiles from company library pages or filtered directories. |
| Salary range capture | Collects min/max salary and currency when published. |
| Tech stack extraction | Reads technologies, tools, and stack categories for each company when listed. |
| Benefits cataloging | Extracts company benefits with category and description for benchmarking. |
| Social & website enrichment | Collects website and social account links for outreach and enrichment. |
| Job metadata normalization | Captures created/updated timestamps, expiration, categories, skills, and restrictions. |
| Promoted job detection | Flags stickied/promoted listings so they can be filtered or labeled in downstream tools. |
| Canonical URL + GUID fields | Stores stable identifiers to prevent duplicates and support incremental syncs. |
| Field Name | Field Description |
|---|---|
| slug | URL-safe identifier for the job or company. |
| title | Job title (for jobs) or company name (for companies). |
| employmentType | Job type such as full-time, contract, etc. |
| minSalary | Minimum salary value when available. |
| maxSalary | Maximum salary value when available. |
| currency | Salary currency code (e.g., USD). |
| applicationLink | Direct application URL when present. |
| locationRestrictions | Allowed/limited locations listed on the job. |
| timezoneRestrictions | Allowed/limited timezones (often numeric offsets). |
| createdAt | Original posting time when available. |
| updatedAt | Last update timestamp (job edits, listing updates). |
| expiryDate | Expiration date when provided. |
| isStickied | Whether the job is promoted/stickied. |
| parentCategories | Higher-level category grouping for the role. |
| categories | Job categories/tags as listed. |
| skills | Skills/technologies required or recommended. |
| guid | Canonical URL identifier for stable reference. |
| description_html | Rich job description HTML. |
| description_text | Cleaned plain-text job description. |
| company.name | Employer name associated with the job. |
| company.employeeRange | Employee count band (e.g., 1-10, 1001-5000). |
| company.summary | Short company summary/one-liner. |
| company.about | Long-form company description (HTML). |
| company.externalLink | Company website URL when available. |
| company.internalLink | Company profile URL. |
| company.logo | Company logo URL. |
| company.yearFounded | Year founded when listed. |
| company.ceo | CEO name when listed. |
| company.locations | Country/region objects where the company operates. |
| company.markets | Markets/industries tags for the company. |
| company.isVerified | Whether the company profile is verified. |
| company.liveJobsCount | Number of active job listings for the company. |
| company.liveJobSlugs | Slugs for active jobs tied to the company. |
| company.benefits | Benefits array with title, description, category. |
| company.stacks | Tech stack array with title, summary, logo, category. |
| company.twitter | Twitter/X profile URL if present. |
| company.linkedin | LinkedIn profile URL if present. |
| company.facebook | Facebook profile URL if present. |
| company.instagram | Instagram profile URL if present. |
[
{
"slug": "remote-administrative-assistant",
"title": "Remote - Administrative Assistant",
"employmentType": "Full Time",
"minSalary": 20000,
"maxSalary": 25000,
"currency": "USD",
"applicationLink": "https://himalayas.app/apply/fgpvm",
"locationRestrictions": [],
"timezoneRestrictions": [ -8, -7, -6, -5, -4 ],
"createdAt": "2024-10-16 12:53:19",
"updatedAt": "2024-10-28 07:30:07",
"expiryDate": "2024-11-15 12:50:41",
"isStickied": true,
"parentCategories": [ "Human Resources" ],
"categories": [
"Remote-Administrative-Assistant",
"Administrative-Assistant",
"Virtual-Assistant",
"Executive-Assistant"
],
"skills": [
"Administrative-Support",
"Project-Management",
"MS-Office-Suite",
"Remote-Collaboration"
],
"guid": "https://himalayas.app/companies/infrasync-technology-services/jobs/remote-administrative-assistant",
"company": {
"name": "Infrasync Technology Services",
"slug": "infrasync-technology-services",
"employeeRange": "1-10",
"isVerified": true,
"logo": "https://cdn-images.himalayas.app/l6y9d2uqx7o85917rznbk97ucczm",
"internalLink": "https://himalayas.app/companies/infrasync-technology-services",
"externalLink": "https://infrasync.com?utm_source=himalayas.app&utm_medium=himalayas.app&utm_campaign=himalayas.app&ref=himalayas.app&source=himalayas.app",
"yearFounded": 2024,
"ceo": "Andrew Swirsky",
"liveJobsCount": 1,
"liveJobSlugs": [ "remote-administrative-assistant" ],
"linkedin": "https://www.linkedin.com/company/98777116"
}
}
]
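Since each item nests the employer under `company`, exporting to CSV or a warehouse usually means flattening first. A minimal sketch, using only field names from the sample record above (the flat column names are an arbitrary choice):

```python
def flatten_job(item: dict) -> dict:
    """Flatten one scraped job item (per the schema above) into a flat row."""
    company = item.get("company") or {}
    return {
        "guid": item.get("guid"),              # stable key for dedupe/upserts
        "slug": item.get("slug"),
        "title": item.get("title"),
        "employmentType": item.get("employmentType"),
        "minSalary": item.get("minSalary"),
        "maxSalary": item.get("maxSalary"),
        "currency": item.get("currency"),
        "skills": ", ".join(item.get("skills") or []),
        "company_name": company.get("name"),
        "company_slug": company.get("slug"),
        "company_size": company.get("employeeRange"),
    }
```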
Himalayas Job & Company Scraper/
├── src/
│ ├── main.py
│ ├── runner.py
│ ├── cli.py
│ ├── config/
│ │ ├── settings.py
│ │ └── logging.yaml
│ ├── core/
│ │ ├── browser.py
│ │ ├── routes.py
│ │ ├── validators.py
│ │ └── retry.py
│ ├── extractors/
│ │ ├── jobs_extractor.py
│ │ ├── companies_extractor.py
│ │ ├── job_detail_parser.py
│ │ ├── company_detail_parser.py
│ │ └── html_to_text.py
│ ├── models/
│ │ ├── job.py
│ │ ├── company.py
│ │ └── common.py
│ ├── normalization/
│ │ ├── salary.py
│ │ ├── tags.py
│ │ ├── dates.py
│ │ └── dedupe.py
│ ├── outputs/
│ │ ├── dataset_writer.py
│ │ ├── jsonl_exporter.py
│ │ └── csv_exporter.py
│ └── utils/
│ ├── urls.py
│ ├── hashing.py
│ └── timers.py
├── tests/
│ ├── test_jobs_parser.py
│ ├── test_company_parser.py
│ └── test_normalization.py
├── data/
│ ├── input.sample.json
│ └── output.sample.json
├── .env.example
├── requirements.txt
├── pyproject.toml
├── README.md
└── LICENSE
- Job board operators use it to collect Himalayas job data at scale, so they can publish searchable listings with consistent fields and fewer duplicates.
- Recruitment agencies use it to extract company profiles and open roles, so they can enrich leads and speed up outreach.
- HR tech teams use it to feed ATS/CRM pipelines, so they can automate sourcing and keep listings fresh with scheduled re-runs.
- Market analysts use it to track salaries, skills, and hiring trends, so they can compare demand across roles, regions, and timezones.
- SaaS enrichment workflows use it to capture websites and social profiles, so they can build better firmographic datasets for sales ops.
Q1) What input should I provide to start extracting data?
Provide a single Himalayas search URL for either job search results or the company directory. Use the website filters (role, category, location, timezone) first, then paste the filtered URL so the extractor mirrors your selection.
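Before pasting a URL, it can help to sanity-check which mode it will trigger. A hedged sketch: the `/jobs` and `/companies` path prefixes are assumptions based on typical himalayas.app URLs, not a guaranteed contract, so adjust to the URLs you actually use.

```python
from urllib.parse import urlparse


def classify_input_url(url: str) -> str:
    """Guess whether a Himalayas URL targets job results or the company library.

    The path prefixes checked here are assumptions drawn from himalayas.app
    URL patterns and may need adjusting for other page types.
    """
    path = urlparse(url).path
    if path.startswith("/companies"):
        return "companies"
    if path.startswith("/jobs"):
        return "jobs"
    raise ValueError(f"Unrecognized Himalayas URL: {url}")
```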
Q2) Why are some jobs missing salary or some companies missing benefits/tech stacks?
Not every listing publishes salary ranges, and some company profiles are incomplete. The extractor returns null/empty values when fields are not available so downstream systems can handle partial enrichment safely.
Q3) How do I avoid duplicates when I run this frequently?
Use stable identifiers like guid, slug, and company.slug as primary keys. Store company IDs/slugs and perform upserts instead of inserts. For job updates, use updatedAt and expiryDate to sync changes and remove expired roles.
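The upsert pattern described above can be sketched with SQLite, keyed on `guid` and only overwriting when `updatedAt` is newer. This is a minimal illustration, not the scraper's own storage layer; the table and column names are arbitrary.

```python
import sqlite3


def upsert_jobs(conn: sqlite3.Connection, items: list[dict]) -> None:
    """Upsert scraped jobs keyed on guid, keeping only the newest updatedAt.

    Timestamps like '2024-10-28 07:30:07' sort correctly as strings, so a
    plain text comparison is enough to detect a newer revision.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               guid TEXT PRIMARY KEY,
               title TEXT,
               updated_at TEXT
           )"""
    )
    conn.executemany(
        """INSERT INTO jobs (guid, title, updated_at)
           VALUES (:guid, :title, :updatedAt)
           ON CONFLICT(guid) DO UPDATE SET
               title = excluded.title,
               updated_at = excluded.updated_at
           WHERE excluded.updated_at > jobs.updated_at""",
        items,
    )
    conn.commit()
```

Expired roles can then be pruned in a follow-up `DELETE` using the stored `expiryDate`.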
Q4) Can I extract only companies or only jobs?
Yes. Use a company directory/search URL to focus on companies, or a jobs search URL to focus on jobs. If your workflow needs both, run two inputs and join on company.slug or company.internalLink.
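If you run both inputs, the join on `company.slug` mentioned above can be sketched in memory (field names per the output schema; the `company_profile` key is an arbitrary choice for the merged result):

```python
def join_jobs_with_companies(jobs: list[dict], companies: list[dict]) -> list[dict]:
    """Attach full company profiles to jobs by matching on company.slug."""
    by_slug = {c["slug"]: c for c in companies if c.get("slug")}
    merged = []
    for job in jobs:
        slug = (job.get("company") or {}).get("slug")
        # None when the job's company was not in the companies run
        merged.append({**job, "company_profile": by_slug.get(slug)})
    return merged
```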
Primary Metric: ~35–70 listings/min on typical search pages, depending on filters and the number of detail pages required per item.
Reliability Metric: 96–99% successful item completion on stable runs, with automatic retries handling intermittent navigation and network blips.
Efficiency Metric: ~250–450 MB RAM average during active extraction, with throughput scaling mainly by concurrent page processing and the number of detail pages visited.
Quality Metric: 85–95% field completeness for core fields (slug, title, URLs, timestamps), while optional enrichment fields (salary, benefits, stacks, socials) vary based on profile completeness.
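To verify the field-completeness figure against your own runs, a small check like this works; the choice of core fields mirrors the ones named above, and the empty-value set is an assumption you may want to tune.

```python
CORE_FIELDS = ("slug", "title", "guid", "createdAt", "updatedAt")


def completeness(records: list[dict], fields=CORE_FIELDS) -> float:
    """Share of (record, field) cells that are non-empty, as a 0-100 percent."""
    if not records:
        return 0.0
    filled = sum(
        1
        for r in records
        for f in fields
        if r.get(f) not in (None, "", [], {})  # treat these as missing
    )
    return 100 * filled / (len(records) * len(fields))
```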
