The project has three main business goals:
- Maximize yearly profit for a specific Airbnb listing through optimization.
- Create a dynamic pricing tool that adjusts daily prices to remain competitive with local listings.
- Design a repeatable model that can be expanded to other locations.
To achieve these, three analytics goals were established:
- Determine demand in relation to price using k-means clustering of similar properties.
- Predict the intrinsic value of a property using kNN regression based on its attributes.
- Optimize daily pricing through a model that maximizes yearly profit by analyzing demand, property attributes, and revenue vs. costs.
Overall Model:
- Competing listings are other Airbnbs, not hotels, assuming customers have already chosen Airbnb.
- Prices vary by month, weekday, weekend, and special events but remain constant within those periods, so the data isn’t time-sensitive.
- No partial rentals; a listing can only be rented by one party at a time.
- Only one reservation per rental period, following Airbnb’s policy.
- Sellers are assumed to already own the property and have sufficient data for the model.
Variables:
- Parking capacity matches the property’s capacity, so excess parking needs aren’t considered.
- Costs increase by $X per person for parties larger than two.
- Monthly upkeep includes mortgage, utilities, cleaning, HOA fees, etc., as a single cost estimate.
Airbnb listing information: http://insideairbnb.com/get-the-data.html
An additional dataset from the San Francisco public safety database, covering all reported police department incidents, used to model safety scores: https://data.sfgov.org/Public-Safety/Police-Department-Incidents/tmnf-yvry
Our project focuses on determining the optimal daily listing price to maximize long-term profit. Initially, we debated between time series regression and optimization modeling. Time series regression captures daily price fluctuations and factors like property type and amenities but doesn’t guarantee an optimal profit-maximizing price. Optimization is ideal for finding the best price but becomes complex when accounting for 365 days and property characteristics. We combined both approaches by using regression in variable creation and simplifying the model with assumptions.
The system architecture separates data from two sources: a GeoJSON file for listing locations and historical data on facility details and bookings. We used clustering analysis to group similar listings within neighborhoods, identified competing properties, and performed regression analysis to create a demand function. Using kNN regression, we established a baseline price for each listing, which was incorporated into the optimization model to generate the best daily price based on booking probabilities.
Based on Airbnb’s report for San Francisco, we assumed an average stay of 4.2 nights per booking. Since the guest review rate wasn't available, we assumed it to be 0.5. Additionally, we capped the maximum occupancy rate at 0.95 to account for occasional unrented nights. These assumptions were used to formulate the estimated occupancy rate.
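The occupancy estimate can be sketched as follows. The three constants come from the text; the formula itself (reviews divided by the review rate gives bookings, bookings times the average stay gives booked nights) is an assumption about how the pieces fit together, and `estimated_occupancy` is a hypothetical helper name.

```python
AVG_STAY_NIGHTS = 4.2   # average nights per booking (from the text)
REVIEW_RATE = 0.5       # assumed share of stays that leave a review (from the text)
MAX_OCCUPANCY = 0.95    # cap on the occupancy rate (from the text)


def estimated_occupancy(reviews_per_month: float, days_in_month: int = 30) -> float:
    """Estimate a monthly occupancy rate from the number of reviews.

    Assumed formula (not stated in the write-up):
    bookings = reviews / review rate, nights = bookings * average stay,
    occupancy = nights / days in month, capped at MAX_OCCUPANCY.
    """
    bookings = reviews_per_month / REVIEW_RATE
    booked_nights = bookings * AVG_STAY_NIGHTS
    return min(booked_nights / days_in_month, MAX_OCCUPANCY)


# Made-up review counts, for illustration only.
low_activity = estimated_occupancy(1.5)   # roughly 0.42
high_activity = estimated_occupancy(5.0)  # hits the 0.95 cap
print(low_activity, high_activity)
```

The cap matters for heavily reviewed listings, where the uncapped estimate would exceed 100% occupancy.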
Initially, we used separate models to predict listing price and demand. One model used k-Means clustering to group nearby similar properties, estimating monthly demand from the average occupancy rate. Another kNN regression model predicted daily listing price based on property attributes like amenities and safety. Monthly profit was estimated by multiplying the average price by predicted demand.
However, since demand and price are strongly correlated, we combined them into a single model. We prioritized modeling demand first and then incorporated it into the price function, accounting for competitor factors, listing characteristics, and time fluctuations.
We model the competitor factor using the same approach that we initially planned to apply for our monthly demand estimation. Customers tend to choose an Airbnb with a specific location in mind, so listings located close together (within a 3-mile radius) are more likely to compete with each other. Beyond location, the characteristics and quality of a listing should have an almost equal, and occasionally greater, impact on its occupancy rate relative to the properties it competes with. Guided by these two considerations, we elected to use k-Means to cluster similar properties within a 3-mile radius.
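A minimal sketch of this two-step selection is shown below, using only the standard library. The listing records, the feature names, and the choice of `k` are made up for illustration; the k-means routine is a plain Lloyd's iteration with a naive initialization, not necessarily the implementation used in the project.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(target, listings, radius_miles=3.0):
    """Keep only the listings within radius_miles of the target listing."""
    return [
        l for l in listings
        if haversine_miles(target["lat"], target["lon"], l["lat"], l["lon"]) <= radius_miles
    ]


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical listings; "features" are pre-scaled (review score, amenity score).
target = {"lat": 37.7749, "lon": -122.4194}
listings = [
    {"id": "a", "lat": 37.7760, "lon": -122.4180, "features": (0.90, 0.80)},
    {"id": "b", "lat": 37.7800, "lon": -122.4100, "features": (0.85, 0.90)},
    {"id": "c", "lat": 37.7700, "lon": -122.4300, "features": (0.30, 0.20)},
    {"id": "d", "lat": 37.7650, "lon": -122.4250, "features": (0.25, 0.30)},
    {"id": "e", "lat": 37.8044, "lon": -122.2712, "features": (0.90, 0.90)},  # outside 3 miles
]
nearby = within_radius(target, listings)
labels = kmeans([l["features"] for l in nearby], k=2)
print([(l["id"], lab) for l, lab in zip(nearby, labels)])
```

Features should be scaled to comparable ranges before clustering, since k-means relies on Euclidean distance.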
For each cluster, we have a dataset in which the X-variable is the listing price and the Y-variable is the demand, represented by the occupancy rate. We then fit either a linear or a polynomial regression model to this dataset to find the best-fitting demand function.
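For the linear case, the fit can be sketched with ordinary least squares. The price and occupancy values below are made up, and `fit_linear_demand` and `demand` are hypothetical helper names; a polynomial fit would follow the same pattern with more terms.

```python
def fit_linear_demand(prices, occupancy):
    """Ordinary least squares for: occupancy = intercept + slope * price."""
    n = len(prices)
    mean_x = sum(prices) / n
    mean_y = sum(occupancy) / n
    sxx = sum((x - mean_x) ** 2 for x in prices)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, occupancy))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def demand(price, intercept, slope):
    """Predicted occupancy at a given price, clipped to the valid range [0, 1]."""
    return min(max(intercept + slope * price, 0.0), 1.0)


# Made-up (price, occupancy) observations for one cluster.
cluster_prices = [100, 120, 140, 160, 180]
cluster_occupancy = [0.80, 0.70, 0.60, 0.50, 0.40]

intercept, slope = fit_linear_demand(cluster_prices, cluster_occupancy)
print(intercept, slope, demand(150, intercept, slope))
```

A negative slope is what one would expect here: within a cluster of comparable listings, higher prices go with lower occupancy.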
We then put this demand function into the optimization model. The objective is to maximize profit over one year, so the profit for each day is calculated as the listing price on that day multiplied by the demand function on that day, represented as $$x_{ij} \cdot D_{ij}(x_{ij})$$, where $$x_{ij}$$ is the price and $$D_{ij}$$ the demand on day j of month i.
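As a worked illustration of this price-times-demand term, suppose the fitted demand is linear, $$D(x) = a - b x$$ with $$a, b > 0$$. The linear form and the symbols $$a$$ and $$b$$ are assumptions made here for illustration, not fitted values from the project. Ignoring costs, the expected daily revenue and its maximizer are:

$$R(x) = x \, D(x) = a x - b x^2, \qquad \frac{dR}{dx} = a - 2 b x = 0 \;\Rightarrow\; x^* = \frac{a}{2b}$$

For example, with made-up values $$a = 1.3$$ and $$b = 0.005$$, this gives $$x^* = 130$$, $$D(x^*) = 0.65$$, and $$R(x^*) = 84.5$$.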
We first set out to incorporate property-specific characteristics into the model to customize pricing for each listing. However, as mentioned previously, if we include too many variables in an optimization, we are less likely to find a valid solution. Instead, we used our regression model to determine a “baseline” price, which we define as a price mark that indicates the intrinsic value of an Airbnb property without considering time-series fluctuation (seasonality, weekend vs. weekday). In the optimization model, we then express the daily price $$x_{ij}$$ in terms of this baseline price and a coefficient for day j of month i.
The optimization model finds the optimal coefficient for each day j of month i, and consequently the optimal daily listing price $$x_{ij}$$.
Additionally, we used k-Nearest Neighbor (kNN) regression to determine the best-fitting baseline price for each listing, based on attributes that we decided determine the value of a listing, such as location, amenities, review scores, and safety. We selected the kNN algorithm because it predicts the response variable Y from the k neighbors that are closest in terms of their X-variables, which in our case are comparable nearby Airbnb competitors. This suits our data because, in practice, real estate, hotel, and Airbnb values are heavily influenced by surrounding competitors as a function of location.
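The idea behind the baseline price can be sketched with a small kNN regressor. The feature vectors (a safety score and an amenity score), the prices, and `k = 3` are made up for illustration; `knn_predict` is a hypothetical helper, not the project's code.

```python
import math


def knn_predict(query, train_features, train_prices, k=3):
    """Predict a price as the mean price of the k nearest training listings."""
    # Rank training listings by Euclidean distance to the query listing.
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(query, train_features[i]),
    )
    nearest = order[:k]
    return sum(train_prices[i] for i in nearest) / len(nearest)


# Made-up training data: (safety score, amenity score) -> nightly price.
train_features = [(8.0, 9.0), (7.5, 8.5), (8.5, 9.5), (3.0, 4.0), (2.5, 3.0)]
train_prices = [140, 130, 150, 80, 70]

baseline_price = knn_predict((8.0, 9.0), train_features, train_prices, k=3)
print(baseline_price)
```

Because kNN depends on distances, attributes measured on different scales (for example, latitude versus a 0 to 10 review score) would need to be normalized first.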
In attempting to find the optimal daily price, we determined that our optimization model would have up to 365 price variables $$x_1, \dots, x_{365}$$. We originally simplified the model by assuming that price is relatively consistent throughout a month. However, when further analyzing past data, we observed that Airbnb demand is generally higher on weekends than on weekdays, so we further split our model into two separate optimization models, one for weekends and one for weekdays. From this we arrived at our model’s assumption that the listing price is constant for all weekends within a month and for all weekdays within a month.
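The reduction from 365 daily prices to 24 pricing periods can be sketched as below. Treating Saturday and Sunday as the weekend and using 2023 as the calendar year are both assumptions for illustration; the write-up does not say which days count as weekend.

```python
import calendar

YEAR = 2023  # arbitrary non-leap year chosen for illustration


def day_counts(year, month):
    """Return (weekday_count, weekend_count) for a month; Sat/Sun count as weekend."""
    weekdays = weekends = 0
    days_in_month = calendar.monthrange(year, month)[1]
    for day in range(1, days_in_month + 1):
        # calendar.weekday returns 0 for Monday through 6 for Sunday.
        if calendar.weekday(year, month, day) >= 5:
            weekends += 1
        else:
            weekdays += 1
    return weekdays, weekends


# One entry per (month, day type): 12 months x 2 day types = 24 pricing periods.
periods = {}
for month in range(1, 13):
    weekdays, weekends = day_counts(YEAR, month)
    periods[(month, "weekday")] = weekdays
    periods[(month, "weekend")] = weekends

print(len(periods), sum(periods.values()))
```

Each of the 24 entries corresponds to one price to be chosen, and the day counts weight how much each price contributes to the yearly total.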
Decision variables:
- Booking indicator for day j of month i: a binary variable that equals 1 if customers decide to book the Airbnb property on day j of month i, and 0 otherwise.
- $$x_{ij}$$: price of the Airbnb property on day j of month i.
Parameters:
- $$C_V$$: variable cost, which includes the cleaning fee, utilities, guest-included fee, and deterioration fee on amenities. To keep the model simple, users input $$C_V$$ into the application themselves, allowing us to treat $$C_V$$ as a constant. Moreover, we assume $$C_V$$ is incurred only on days when bookings happen.
- $$I_O$$: initial investment the property owner supplies, which potentially includes the real estate or reimbursement costs, amenity purchases, maintenance costs, or any other upfront cost. To keep the model simple, users input the estimated initial investment in their property, allowing us to treat it as a fixed cost throughout our model.
- $$D_{ij}$$: demand on day j of month i, represented by the probability that the listing is actually booked on that day given the listing price $$x_{ij}$$. $$D_{ij}$$ is a function of $$x_{ij}$$ and is determined by performing regression analysis on the cluster to which the specific Airbnb property belongs.
At this point we had constructed the components necessary to begin formulating our final optimization model. As defined above, our demand function gives the probability that the property is booked on day j of month i; if this “booked” probability is greater than 50%, we mark the property as booked on that day in our model. This then allows the host to collect the optimal listing price, $$x_{ij}$$.
If the occupancy of a month is 45%, then the expected number of days booked in that month is 45% × 30 days = 13.5, or roughly 14 days. We can then retrieve the exact dates in that month on which the booking indicator equals 1, that is, where $$D_{ij}(x_{ij}) > 50\%$$, and sum their corresponding predicted prices to get the expected revenue for that month.
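The booking rule and the monthly sum can be sketched as follows. The daily prices and booking probabilities are made up, and `is_booked` and `monthly_revenue` are hypothetical helper names.

```python
def is_booked(booking_probability, threshold=0.5):
    """Booking rule from the text: booked (1) if the probability exceeds 50%, else 0."""
    return 1 if booking_probability > threshold else 0


def monthly_revenue(daily_prices, daily_probabilities):
    """Sum the prices of the days that the rule marks as booked."""
    return sum(
        price * is_booked(prob)
        for price, prob in zip(daily_prices, daily_probabilities)
    )


# Expected booked days for the 45% occupancy example in the text.
expected_days = round(0.45 * 30)

# Made-up four-day example: only the days above the threshold contribute.
example_prices = [150, 150, 150, 150]
example_probabilities = [0.60, 0.40, 0.55, 0.50]
example_revenue = monthly_revenue(example_prices, example_probabilities)
print(expected_days, example_revenue)
```

Note that a probability of exactly 50% does not count as booked under a strict "greater than" rule.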
The objective function maximizes yearly profit, which equals revenue minus cost. The revenue is the sum of the optimal listing prices on the days that bookings happen, in other words, the days on which the booking indicator equals 1. We sum over both weekends and weekdays (determined by index j) and over the 12 months (determined by index i).
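One way to sketch this objective is a grid search over candidate prices for each pricing period. Several things here are assumptions rather than the project's actual formulation: the linear demand functions, the candidate price grid, the cost figures, subtracting the initial investment once from the yearly total, and the helper names `best_price` and `optimize_year`. Only two periods are shown; the full model would have 24.

```python
def best_price(demand_fn, n_days, variable_cost, candidates):
    """Grid-search the price that maximizes profit for one pricing period."""
    def profit(price):
        booked = 1 if demand_fn(price) > 0.5 else 0  # booking rule from the text
        # Variable cost is only incurred on booked days.
        return n_days * booked * (price - variable_cost)

    best = max(candidates, key=profit)
    return best, profit(best)


def optimize_year(periods, variable_cost, initial_investment, candidates):
    """Optimize each period independently, then subtract the fixed investment."""
    prices, total = {}, 0.0
    for key, (demand_fn, n_days) in periods.items():
        prices[key], period_profit = best_price(demand_fn, n_days, variable_cost, candidates)
        total += period_profit
    return prices, total - initial_investment


# Made-up demand functions and day counts for two of the 24 periods.
example_periods = {
    ("Jan", "weekday"): (lambda x: 1.3 - 0.005 * x, 22),
    ("Jan", "weekend"): (lambda x: 1.4 - 0.005 * x, 9),
}
candidate_prices = range(50, 300, 15)

prices, profit = optimize_year(
    example_periods, variable_cost=40, initial_investment=1000, candidates=candidate_prices
)
print(prices, profit)
```

Because each period's price only affects that period's profit in this sketch, the periods can be optimized one at a time. Under the 50% booking rule, the best price in each period is the highest candidate that still keeps the booking probability above the threshold.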
We tested two example listings in our optimization model:
- Boutique Hotel in Bayview: Classified as a "high-end" listing based on high safety (score > 7) and luxury amenities. The kNN regression predicted an intrinsic value of $132. The model provided 24 price recommendations for weekdays and weekends over 12 months. The projected yearly profit for this listing was $25,378.
- Bed and Breakfast in Chinatown: This "medium-category" listing had lower safety and amenity scores, with lower demand despite a lower price. Following the same process, the expected yearly profit for this listing was $19,764.
- Issue 1: Users must fill in all listing attributes, including optional ones, as the regression model requires them. Missing attributes will cause the model to malfunction.
  - Monitoring: A script will check new listings for missing attributes and notify users if any fields are left empty.
- Issue 2: The model assumes uniform weekday and weekend pricing within a month, which may not reflect fluctuating demand throughout the week.
  - Monitoring: Monitor air traffic to San Francisco as an indicator of travel flow. If flight numbers fluctuate significantly (using a moving average benchmark), the system will send a notification.
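The moving-average check for Issue 2 can be sketched as below. The 7-day window, the 20% tolerance, and the flight counts are assumptions for illustration; the write-up does not specify these values, and `flight_alerts` is a hypothetical helper.

```python
def flight_alerts(daily_flights, window=7, tolerance=0.2):
    """Return the indices of days that deviate from the trailing moving average.

    A day triggers an alert when its flight count differs from the mean of the
    previous `window` days by more than `tolerance` (as a fraction of that mean).
    """
    alerts = []
    for i in range(window, len(daily_flights)):
        baseline = sum(daily_flights[i - window:i]) / window
        if baseline > 0 and abs(daily_flights[i] - baseline) / baseline > tolerance:
            alerts.append(i)
    return alerts


# Made-up daily arrival counts: a stable week followed by one spike.
flights = [100] * 7 + [105, 140, 100]
alert_days = flight_alerts(flights)
print(alert_days)
```

Using a trailing window means the benchmark adapts gradually, so a sustained shift in traffic stops triggering alerts once it becomes the new normal.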
- Issue 1: Since missing attributes are optional, we can auto-fill them using the most common values for categorical variables and median values for numerical ones from the Airbnb dataset. For example, if reviews are missing, we will autofill with the median review score from other listings.
- Issue 2: If a notification indicates significant demand fluctuations (based on travel data), the engineering team will analyze the trend. If notifications become frequent, we could switch from a clustering-based model to a time-series regression model to better predict daily demand based on listing characteristics.
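The auto-fill described for Issue 1 can be sketched with the standard library. The field names and listing records are made up, and `fit_defaults` and `fill_missing` are hypothetical helpers.

```python
from statistics import median, mode


def fit_defaults(listings, numeric_fields, categorical_fields):
    """Compute a fill value per field: median for numeric, most common for categorical."""
    defaults = {}
    for field in numeric_fields:
        defaults[field] = median([l[field] for l in listings if l.get(field) is not None])
    for field in categorical_fields:
        defaults[field] = mode([l[field] for l in listings if l.get(field) is not None])
    return defaults


def fill_missing(listing, defaults):
    """Return a copy of the listing with missing (None or absent) fields filled in."""
    fills = {f: v for f, v in defaults.items() if listing.get(f) is None}
    return {**listing, **fills}


# Made-up reference listings and a new listing with missing optional fields.
reference = [
    {"review_score": 9.0, "room_type": "Entire home"},
    {"review_score": 8.0, "room_type": "Private room"},
    {"review_score": 10.0, "room_type": "Entire home"},
]
defaults = fit_defaults(reference, ["review_score"], ["room_type"])
new_listing = {"review_score": None, "room_type": None, "price": 120}
filled = fill_missing(new_listing, defaults)
print(filled)
```

Fields the user did provide are left untouched; only missing values are replaced.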
When comparing our two examples, we found that despite price differences and cluster variations, both listings had a similar demand (around 50%). This indicates a stable customer base for each market segment: some customers seek high-end, luxury Airbnbs, while others look for affordable options. Understanding where your property fits within the market is crucial for pricing. For high-end listings, rather than lowering prices to attract more customers, it’s better to invest in amenities to appeal to the right segment and avoid reducing profit margins.
There are several future directions for this project. Adding new variables, such as seasonality, will improve accuracy, helping predict demand and pricing more precisely on a weekly basis. Expanding to new cities will require adapting variables and gathering additional data, but the methodology remains the same. Over time, our models will improve with more historical data and insights into successes and failures.
Engaging customers will also be key. Customer feedback through reviews helps identify areas for improvement and boosts property credibility. Additionally, targeted advertising can help us reach the right audience, giving us a competitive edge.