Seattle Airbnb Overview

Apr 20, 2021

Introduction

This is my first blog post. It gives a general overview of the Seattle Airbnb data from December 2020 to December 2021 and is part of a project in Udacity’s Data Science Nanodegree program. For this introductory project, I took a quick look at the Seattle Airbnb data (here) while following the CRISP-DM process (Cross-Industry Standard Process for Data Mining). The process comprises six phases that describe the data science life cycle:

Business Understanding: What is the business need to do the analysis? What questions are we trying to answer?
Data Understanding: Do we have the necessary data to help us answer the questions? If we do, how can we use it? Does it need cleaning?
Data Preparation: Clean the datasets, merge them if needed, and make them ready for analysis.
Modeling: What modeling techniques can we apply?
Evaluation: Which model will give us an accurate picture of the business outcomes?
Deployment: How can we present this to leadership? How can they access it?

Business Understanding

Before going through the datasets, I wanted to ask the following questions:

What is the distribution of rental type in Seattle?
Which are the most expensive and cheapest neighborhoods in Seattle?
What time of the year is the busiest and most expensive in Seattle?
Is there any correlation between the number of listings and the average price of the listings in a particular area?

Data Understanding

The Airbnb data for Seattle contains the following datasets:

calendar.csv: Detailed Calendar Data for listings in Seattle
listings.csv: Detailed Listings data for Seattle
reviews.csv: Detailed Review Data for listings in Seattle

While exploring these datasets, I realized that calendar.csv and listings.csv needed some cleaning before I could analyze them. The reviews.csv dataset didn’t contain anything I needed for these questions, so I left it out of this project.

Data Preparation

To prepare the data for analysis, I loaded calendar.csv and listings.csv into dataframes and cleaned them. I wrote the cleaning as a function so I can reuse it later for a deep dive.
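A cleaning function along these lines might look like the sketch below. It is not the code from the notebook. It assumes the calendar data has `date`, `available` and `price` columns, that prices are stored as strings such as "$1,250.00", and that availability is stored as "t"/"f". The helper names `clean_price` and `clean_calendar` and the sample rows are made up for illustration.

```python
import pandas as pd


def clean_price(series: pd.Series) -> pd.Series:
    """Turn price strings such as '$1,250.00' into floats (assumed format)."""
    stripped = series.astype(str).str.replace(r"[$,]", "", regex=True)
    # Anything that still isn't a number (e.g. a missing price) becomes NaN
    return pd.to_numeric(stripped, errors="coerce")


def clean_calendar(calendar: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; assumes columns date, available, price."""
    cleaned = calendar.copy()  # leave the raw dataframe untouched
    cleaned["date"] = pd.to_datetime(cleaned["date"])
    cleaned["available"] = cleaned["available"].map({"t": True, "f": False})
    cleaned["price"] = clean_price(cleaned["price"])
    return cleaned


# Tiny made-up sample standing in for calendar.csv
sample = pd.DataFrame({
    "listing_id": [1, 1, 2],
    "date": ["2021-01-01", "2021-01-02", "2021-01-01"],
    "available": ["t", "f", "t"],
    "price": ["$85.00", None, "$1,250.00"],
})
cleaned_sample = clean_calendar(sample)
print(cleaned_sample)
```

Working on a copy keeps the raw dataframe intact, so the same function can be rerun safely during a later deep dive.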

Data Analysis

Although this stage is technically called Modeling, I decided to rename it since I’m not using a model to predict anything. Instead, I’m just doing some exploratory data analysis to find the answers to my questions.

1. What is the distribution of rental type in Seattle?

Entire home/apt is the most popular room type: 3,344 of the 4,107 listings (81.4%) fall under this category.
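One way to get this kind of breakdown is with `value_counts`, as in the sketch below. The `room_type` column name is an assumption and the five rows are made up, so only the last line uses figures from this post.

```python
import pandas as pd

# Made-up miniature listings table; the real data has 4,107 rows
listings = pd.DataFrame({
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Private room",
        "Entire home/apt", "Shared room",
    ],
})

# Absolute counts and percentage share per room type
counts = listings["room_type"].value_counts()
shares = listings["room_type"].value_counts(normalize=True).mul(100).round(1)
summary = pd.DataFrame({"listings": counts, "share_pct": shares})
print(summary)

# The share quoted in the post, recomputed from its two counts
reported_share = round(3344 / 4107 * 100, 1)
print(reported_share)
```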

2. What is the most expensive and the cheapest neighborhood in Seattle?

Magnolia is the most expensive neighborhood, with an average price per night per listing of around $186. Interbay is the least expensive, at around $85.
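A ranking like this can be produced by averaging the nightly price per neighborhood. The sketch below assumes a cleaned numeric `price` column and a `neighbourhood` column (both names are assumptions) and uses placeholder areas with made-up prices rather than the real Seattle figures.

```python
import pandas as pd

# Placeholder areas and prices, for illustration only
listings = pd.DataFrame({
    "neighbourhood": ["Area A", "Area A", "Area B", "Area B", "Area C"],
    "price": [200.0, 180.0, 90.0, 80.0, 120.0],
})

# Mean nightly price per neighborhood, highest first
avg_price = (
    listings.groupby("neighbourhood")["price"]
    .mean()
    .sort_values(ascending=False)
)

most_expensive = avg_price.index[0]
cheapest = avg_price.index[-1]
print(avg_price)
```

Neighborhoods with only a handful of listings can swing the average a lot, so it may be worth checking the listing count next to each mean.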

3. What time of the year is most expensive in Seattle?

June, July and August seem to be the most expensive months to visit Seattle. This makes sense: it is summer, when the days are longest in Seattle and it doesn’t get dark so early in the afternoon like it does during the winter months (Jan to Mar).
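The seasonal pattern comes from averaging the calendar prices by month. A minimal sketch, assuming the calendar data has already been cleaned into a datetime `date` column and a numeric `price` column; the rows below are made up.

```python
import pandas as pd

# Made-up calendar rows, for illustration only
calendar = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-10", "2021-01-20", "2021-07-04", "2021-07-18", "2021-10-01",
    ]),
    "price": [100.0, 110.0, 160.0, 170.0, 120.0],
})

# Average nightly price for each calendar month (1 = January)
monthly_avg = (
    calendar.groupby(calendar["date"].dt.month)["price"]
    .mean()
    .rename_axis("month")
)

priciest_month = int(monthly_avg.idxmax())
print(monthly_avg)
```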

4. Is there any correlation between the number of listings and the average price of the listings in a particular area?

Below is the correlation matrix between the number of listings and the average listing price. As you can see, there’s a slight positive correlation between the number of listings and the average price of the listings in a particular area.
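A correlation matrix of this kind can be built by first collapsing the listings to one row per area and then calling `corr`, which computes the Pearson correlation by default. The column names and the data in this sketch are assumptions, so the printed value says nothing about Seattle.

```python
import pandas as pd

# Placeholder areas and prices, for illustration only
listings = pd.DataFrame({
    "neighbourhood": ["Area A", "Area A", "Area A", "Area B", "Area B", "Area C"],
    "price": [140.0, 150.0, 160.0, 90.0, 110.0, 120.0],
})

# One row per area: number of listings and average price
per_area = listings.groupby("neighbourhood")["price"].agg(
    n_listings="count",
    avg_price="mean",
)

# 2x2 Pearson correlation matrix between the two columns
corr_matrix = per_area.corr()
r = corr_matrix.loc["n_listings", "avg_price"]
print(corr_matrix)
```

With only a few dozen neighborhoods, a slight correlation should be read cautiously, since a couple of unusual areas can move it noticeably.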

Since we didn’t go through the modeling route, we’ll skip the evaluation and the deployment phases.

In fact, I’m using this blog post to give a quick overview of the datasets. This is just a start, and I hope to do an in-depth analysis and eventually predict the price of an Airbnb listing given some information about its features (neighborhood, occupancy, reviews, etc.).

The Jupyter notebook of the analysis is here as a reference.