Seattle Airbnb Overview


This is my first blog post, and it gives a general overview of the Seattle Airbnb data from December 2020 to December 2021. It is part of a project in Udacity’s Data Science Nanodegree program. In this introductory project, I did a quick overview of the Seattle Airbnb data (here) using the CRISP-DM process (CRoss Industry Standard Process for Data Mining). This process consists of six phases that describe the data science life cycle:

  1. Business Understanding: What questions are we trying to answer, and why do they matter to the business?
  2. Data Understanding: Do we have the necessary data to help us answer the questions? If we do, how can we use it? Does it need cleaning?
  3. Data Preparation: Clean the datasets, merge them if needed, and make them ready for analysis.
  4. Modeling: What modeling techniques can we apply?
  5. Evaluation: Which model will give us an accurate picture of the business outcomes?
  6. Deployment: How can we present this to leadership? How can they access it?

Business Understanding

Before going through the datasets, I wanted to ask the following questions:

  1. Which neighborhoods in Seattle are the most expensive, and which are the cheapest?
  2. What time of the year is the busiest and most expensive in Seattle?
  3. Is there any correlation between the number of listings and the average price of the listings in a particular area?

Data Understanding

The Airbnb data for Seattle contains the following datasets:

  1. listings.csv: Detailed listings data for Seattle
  2. calendar.csv: Detailed calendar data for listings in Seattle
  3. reviews.csv: Detailed review data for listings in Seattle
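Answering “Does it need cleaning?” usually starts with checking for missing values. Below is a minimal pandas sketch of that check, not the exact code from my notebook; `missing_value_report` is a hypothetical helper name.

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.Series:
    """Return the share of missing values per column, largest first.

    Columns with no missing values are dropped so the report only lists
    the columns that need attention during cleaning.
    """
    share_missing = df.isna().mean()  # fraction of NaN values in each column
    return share_missing[share_missing > 0].sort_values(ascending=False)
```

To use it, load a file with `pd.read_csv("listings.csv")` (assuming the CSV sits in the working directory) and pass the resulting DataFrame in.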

Data Preparation

To prepare the data for analysis, I cleaned the calendar.csv and listings.csv datasets and loaded them into dataframes. I wrote the cleaning steps as a function so I can reuse them later for a deeper dive.
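Here is a sketch of what such a reusable cleaning function might look like. The name `clean_dataset` and the raw formats it expects (prices stored as strings like `"$1,250.00"`, availability as `"t"`/`"f"`, and an ISO-formatted `date` column) are assumptions about the files, so adjust them to the actual columns.

```python
import pandas as pd


def clean_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a listings or calendar DataFrame (hypothetical formats)."""
    cleaned = df.copy()

    # Prices are assumed to look like "$1,250.00": strip "$" and "," and
    # convert to float; anything unparseable becomes NaN.
    for col in ("price", "adjusted_price"):
        if col in cleaned.columns:
            cleaned[col] = pd.to_numeric(
                cleaned[col].astype(str).str.replace(r"[$,]", "", regex=True),
                errors="coerce",
            )

    # Availability is assumed to be stored as "t"/"f"; map it to booleans.
    if "available" in cleaned.columns:
        cleaned["available"] = cleaned["available"].map({"t": True, "f": False})

    # Parse dates so they can be grouped by month later.
    if "date" in cleaned.columns:
        cleaned["date"] = pd.to_datetime(cleaned["date"])

    # Remove exact duplicate rows and renumber the index.
    return cleaned.drop_duplicates().reset_index(drop=True)
```

Working on a copy leaves the raw DataFrame untouched, which makes it easy to rerun the cleaning later with different rules.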

Data Analysis

Although this stage is technically called Modeling, I renamed it because I’m not using a model to predict anything. Instead, I’m doing exploratory data analysis to answer my questions.

Distribution of Room Type
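The room-type breakdown comes down to counting listings per category. A minimal sketch, assuming the cleaned listings DataFrame has a `room_type` column; `room_type_share` is a hypothetical name.

```python
import pandas as pd


def room_type_share(listings: pd.DataFrame) -> pd.Series:
    """Percentage of listings per room type, most common first."""
    # normalize=True returns proportions; multiply by 100 for percentages.
    return listings["room_type"].value_counts(normalize=True).mul(100).round(1)
```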
Price comparison across neighborhoods
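To compare prices across neighborhoods, one approach is to average the cleaned `price` per neighborhood and take both ends of the ranking. This sketch assumes a `neighbourhood_cleansed` column (the column name is an assumption; swap in whichever neighborhood column the data has) and a numeric `price` column from the cleaning step. `neighborhood_price_extremes` is a hypothetical name.

```python
import pandas as pd


def neighborhood_price_extremes(
    listings: pd.DataFrame,
    n: int = 5,
    neighborhood_col: str = "neighbourhood_cleansed",
) -> tuple[pd.Series, pd.Series]:
    """Return the n most expensive and n cheapest neighborhoods by mean price."""
    avg_price = (
        listings.groupby(neighborhood_col)["price"].mean().sort_values(ascending=False)
    )
    most_expensive = avg_price.head(n)
    cheapest = avg_price.tail(n).sort_values()  # cheapest first
    return most_expensive, cheapest
```

Swapping `.mean()` for `.median()` makes the ranking less sensitive to a handful of very expensive listings.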
Price comparison across months
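For the monthly view, the calendar data can be grouped by month. This sketch treats unavailable nights as a rough proxy for booked nights, which is an assumption: a night can also be unavailable because the host blocked it. It expects a cleaned calendar DataFrame with a datetime `date`, a boolean `available`, and a numeric `price`; `monthly_summary` is a hypothetical name.

```python
import pandas as pd


def monthly_summary(calendar: pd.DataFrame) -> pd.DataFrame:
    """Average nightly price and share of unavailable nights per calendar month."""
    by_month = calendar.assign(month=calendar["date"].dt.month).groupby("month")
    return pd.DataFrame(
        {
            "avg_price": by_month["price"].mean(),
            # Unavailable nights are used as a rough proxy for booked nights.
            "occupancy_rate": 1 - by_month["available"].mean(),
        }
    )
```

Because the data runs from December 2020 to December 2021, grouping by month number alone pools both Decembers; grouping by `calendar["date"].dt.to_period("M")` instead keeps them separate.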
Correlation Matrix
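For the third question, one way to build the correlation matrix is to aggregate the listings per neighborhood and correlate the listing count with the average price. A minimal sketch, again assuming a `neighbourhood_cleansed` column and a numeric `price` column; `listing_price_correlation` is a hypothetical name.

```python
import pandas as pd


def listing_price_correlation(
    listings: pd.DataFrame, neighborhood_col: str = "neighbourhood_cleansed"
) -> pd.DataFrame:
    """Correlation matrix of listing count vs. mean price across neighborhoods."""
    per_area = listings.groupby(neighborhood_col).agg(
        listing_count=("price", "count"),  # listings with a non-missing price
        avg_price=("price", "mean"),
    )
    # DataFrame.corr uses Pearson correlation by default; one row per neighborhood.
    return per_area.corr()
```

Since there is only one row per neighborhood, the sample is small and a Pearson coefficient can be swayed by a few outlying areas, so a scatter plot of the two columns is a useful companion.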