The Influence of Neighborhood Characteristics on Property Sales

TL;DR: Designed the Influence of Neighborhood Characteristics on Property Sales, a data visualization that allows a user to discover correlations between a neighborhood's characteristics (schools, retail locations, etc.) and property sales.
Dates: November 2017 - December 2017
Team: Scott Dombkowski
Work Type: academic

The Influence of Neighborhood Characteristics on Property Sales allows a user to discover correlations between a neighborhood's characteristics (schools, retail locations, etc.) and the property sales in that neighborhood.

Initially, I thought this visualization would use a Cartesian and geographic coordinate system. After considering alternative ways to visualize how features of a neighborhood affect property value, I ended up with a sizable departure from that initial idea.

Early Concept

A user enters the visualization. They see a grid of empty city blocks in the shape of Pittsburgh. (Note: This image is a very zoomed-in version of what I was thinking.)

Neighborhoods would come into view. Neighborhood boundaries would be depicted with roads colored a different hue from a regular road. (Note: Two important things to point out are that no square is part of two neighborhoods and that the number of squares a neighborhood takes up is relative to its actual size.)

Property values would come into view and be depicted by color. (Note: In the image below, red depicts higher value, while blue represents lower value.)

A user would have the ability to select from five different features of a neighborhood. These include: retail, recreation, education, transportation, and medical. (Note: Each feature has its own unique pattern and its own unique size to represent the space each of these features takes up. For example, retail takes up 1/4 of a square, while recreation takes up a whole square.)

Users would be able to see the correlation between retail and property value by the number of cars on the road. For example, lots of cars would indicate a high correlation. (Note: This is meant to represent the idea that a high number of retail locations is in fact a reason for high property values, hence the large number of cars and the foot traffic in the area. An area would be overvalued if it was colored red, had high traffic, and had very little retail. A neighborhood with high traffic, a high number of features, and low value would represent an underpriced area.)

Users would be able to select multiple features to see correlations between all of them and property value. Each of these features would have its own unique size and pattern.

Final Refined Concept

I further refined the concept into the images and prototypes below.

Data Sources

To achieve this, I pulled data from seven different datasets: property sales, neighborhoods, bus stops, schools (specifically public), hospitals, parks, and retail businesses.

Property Sales Data Source

Neighborhood Data Source

I looked solely at quantity (how many of these different types of places exist). I acknowledge that I am missing some data. For example, I only have public schools and no private schools. I also have no measure of how effective a location is (for example: how good a restaurant or high school is).

Cleaning of Sources

To work with this data, I had to do some cleaning.

As you can see from the screenshot above, the data was messy. Most of the data points did not have coordinates. What I did have was addresses. To organize the data by neighborhood, I geocoded around 10,000 addresses and wrote a script to determine which neighborhood polygon each set of coordinates fell in. This allowed me to determine the neighborhood a home, place of business, or park belongs to.
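The polygon lookup can be sketched as follows. This is a minimal illustration, not the script I actually used: it assumes each neighborhood is a simple list of (longitude, latitude) vertices and uses a standard ray-casting test. The names `point_in_polygon` and `find_neighborhood` and the toy square polygons are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count the polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_neighborhood(point: Point, neighborhoods: Dict[str, List[Point]]) -> Optional[str]:
    """Return the first neighborhood whose polygon contains the point, or None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical toy polygons, not real Pittsburgh boundaries.
neighborhoods = {
    "West": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "East": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

print(find_neighborhood((0.5, 0.5), neighborhoods))
print(find_neighborhood((1.5, 0.25), neighborhoods))
print(find_neighborhood((3.0, 3.0), neighborhoods))
```

Real neighborhood boundaries can have holes or multiple parts, so a geometry library would likely be a better fit for production use than this hand-rolled test.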

Opening the Visualization

This is an example of what you would see when the data visualization is opened. The visualization is built on a grid meant to resemble city blocks and streets. I separated the blocks into neighborhoods, based on which neighborhood the majority of each block fell in. Once the opening sequence is finished, a user sees a tutorial that they can return to at any time.

Data Visualization Startup
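One way to approximate the "majority of a block" rule is to sample points across each block and take the most common neighborhood. The sketch below assumes rectangular blocks and a lookup function that maps a point to a neighborhood name; `assign_block`, `locate`, and the toy boundary at x = 1 are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Callable, Optional, Tuple

Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def assign_block(
    bounds: Bounds,
    locate: Callable[[Tuple[float, float]], Optional[str]],
    samples_per_side: int = 5,
) -> Optional[str]:
    """Sample a regular grid of points in the block and return the majority label."""
    min_x, min_y, max_x, max_y = bounds
    step_x = (max_x - min_x) / samples_per_side
    step_y = (max_y - min_y) / samples_per_side
    votes: Counter = Counter()
    for i in range(samples_per_side):
        for j in range(samples_per_side):
            # Sample at the center of each sub-cell of the block.
            point = (min_x + (i + 0.5) * step_x, min_y + (j + 0.5) * step_y)
            label = locate(point)
            if label is not None:
                votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def locate(point: Tuple[float, float]) -> Optional[str]:
    """Toy lookup: everything left of x = 1 is "West", the rest is "East"."""
    return "West" if point[0] < 1.0 else "East"


print(assign_block((0.6, 0.0, 1.6, 1.0), locate))
print(assign_block((0.2, 0.0, 1.2, 1.0), locate))
```

Sampling trades accuracy for simplicity; computing exact overlap areas would give the same answer except for blocks split almost evenly.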

Property/House Prices

House prices are color-coded based on how many standard deviations they fall from the mean, assuming a normal distribution.

Price Color Coding
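The color coding can be sketched as a z-score that is rounded to a whole number of standard deviations and clamped to a small range. The five bands, the clamp at two standard deviations, the palette names, and the toy prices below are all assumptions for illustration; the actual palette is the one shown in the image.

```python
import math
from statistics import mean, pstdev
from typing import List

# Hypothetical diverging palette: blue for lower prices, red for higher ones.
PALETTE = {-2: "dark blue", -1: "light blue", 0: "neutral", 1: "light red", 2: "dark red"}


def price_band(price: float, mu: float, sigma: float) -> int:
    """Round the z-score to the nearest whole standard deviation, clamped to [-2, 2]."""
    z = (price - mu) / sigma
    nearest = math.floor(z + 0.5)
    return max(-2, min(2, nearest))


def color_for_prices(prices: List[float]) -> List[str]:
    """Map each price to a palette entry based on its distance from the mean."""
    mu = mean(prices)
    sigma = pstdev(prices)  # population standard deviation of the sample
    return [PALETTE[price_band(p, mu, sigma)] for p in prices]


# Toy prices, not real sales data.
prices = [100_000, 150_000, 200_000, 250_000, 300_000]
print(color_for_prices(prices))
```

If every price were identical, the standard deviation would be zero and the division would fail, so real data would need a guard for that case.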

Characteristics of a Neighborhood

Characteristics of a neighborhood are depicted through patterns. I tried to create abstract patterns that are easily discernible from each other, but at the same time complement one another.

I worked off a 3 by 3 line pattern (similar to a generative logo), so that a new pattern could be made if additional characteristics were added.

Characteristic Patterns

These characteristics were placed by neighborhood, allowing you to see the general locations of where these specific characteristics are located.

Characteristic Placement

To determine how many of these patterns to put on the map, I took the average of these characteristics, assuming that the average would represent the characteristics of an optimal neighborhood.

Characteristic Placements and Numbers

This resulted in 1 health location being equivalent to 273 retail locations. So, if a neighborhood had 1 health location and 273 retail locations, it would get 1 health location pattern and 1 retail location pattern.
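The scaling can be sketched like this. It assumes that the number of locations represented by one pattern is each characteristic's average count divided by the smallest average, and that pattern counts are rounded to the nearest whole number; both are my guesses at a reasonable rule for the example, and the counts are toy numbers chosen so that the ratio comes out to 273.

```python
from statistics import mean
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # neighborhood -> characteristic -> count


def units_per_pattern(counts: Counts) -> Dict[str, float]:
    """How many real locations one pattern stands for, per characteristic."""
    characteristics = {c for per_hood in counts.values() for c in per_hood}
    averages = {
        c: mean(per_hood.get(c, 0) for per_hood in counts.values())
        for c in characteristics
    }
    # Assumes at least one characteristic has a non-zero average.
    smallest = min(a for a in averages.values() if a > 0)
    return {c: averages[c] / smallest for c in averages}


def pattern_counts(per_hood: Dict[str, int], units: Dict[str, float]) -> Dict[str, int]:
    """Number of patterns to draw for one neighborhood."""
    return {
        c: round(count / units[c]) if units[c] > 0 else 0
        for c, count in per_hood.items()
    }


# Toy counts, not the real dataset.
counts = {
    "A": {"health": 1, "retail": 273},
    "B": {"health": 3, "retail": 819},
}
units = units_per_pattern(counts)
print(units["retail"])
print(pattern_counts(counts["A"], units))
print(pattern_counts(counts["B"], units))
```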

You can see how this all comes together below. You can also see how users have the ability to zoom in and out and utilize the available mini-map.

Data Visualization Zoom

Correlation

I measured the linear dependence between these variables and home prices and again assumed a normal distribution.

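I got a value for each characteristic, which I refer to here as a p-value, that allowed me to determine how strong the relationship is between the variables. (Strictly speaking, a value that rises with the strength of the relationship is a correlation coefficient; a statistical p-value works the other way, with smaller values indicating a more significant result.) The higher that value, the greater the correlation; if it was less than 0.1, there was virtually no correlation at all.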
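Because a higher value here means a stronger relationship, the measure behaves like a correlation coefficient. The sketch below assumes Pearson's r computed per neighborhood between a characteristic's count and the average sale price. The function name and the toy numbers are illustrative only, not my original calculation or data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy per-neighborhood values, not real data.
retail_counts = [10, 20, 30, 40, 50]
avg_prices = [110_000, 135_000, 150_000, 190_000, 205_000]

r = pearson_r(retail_counts, avg_prices)
print(round(r, 3))
```

Pearson's r ranges from -1 to 1, so a rule like "below 0.1 means virtually no correlation" would apply to its absolute value if negative relationships matter.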

I translated the p-values to what you see below. As you can see, there is a low correlation between health locations and house prices, but a high correlation between retail and house prices.

I took the average of the same p-values to determine the correlation between house price and more than one characteristic. So, the correlation between health locations, retail locations, and property value would be what I classify as mid-range.
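Combining characteristics can be sketched as a plain average followed by a bucket lookup. The thresholds of 0.3 and 0.6 and the per-characteristic values below are hypothetical; they are chosen only so that a low health value and a high retail value average out to the mid range.

```python
from typing import Dict, Iterable


def combined_correlation(values: Dict[str, float], selected: Iterable[str]) -> float:
    """Average the per-characteristic values for the selected characteristics."""
    chosen = [values[name] for name in selected]
    if not chosen:
        raise ValueError("select at least one characteristic")
    return sum(chosen) / len(chosen)


def correlation_level(value: float, mid_from: float = 0.3, high_from: float = 0.6) -> str:
    """Bucket a value into low, mid, or high using hypothetical thresholds."""
    if value >= high_from:
        return "high"
    if value >= mid_from:
        return "mid"
    return "low"


# Hypothetical values, not the ones from the real analysis.
values = {"health": 0.05, "retail": 0.7}

print(correlation_level(combined_correlation(values, ["health"])))
print(correlation_level(combined_correlation(values, ["retail"])))
print(correlation_level(combined_correlation(values, ["health", "retail"])))
```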

You can see what a high correlation would look like below.

Data Visualization High Correlation

The lines are meant to represent cars, again playing off the idea that if there is a high correlation, people recognize it and want to be involved, so they are willing to deal with traffic.

By contrast, this is a situation with a medium to low correlation (fewer cars, less traffic on the road).

Data Visualization Low Correlation

Different Combinations of Characteristics

You can also click different combinations of characteristics and see how that affects the visualization.

Clicking Characteristics On and Off

Neighborhood Depiction

I decided to remove neighborhood identification entirely. I could not find a way to visualize where one neighborhood ends and another starts without interfering with other parts of my visualization.

By doing this, I allow for deeper exploration of my visualization. If the neighborhood boundaries were visualized, a user might go directly to their own neighborhood and not explore the other parts of the chart. Without visible boundaries, jumping straight to one's own neighborhood becomes much more difficult.

Highlighting a Neighborhood

You can highlight a neighborhood to focus on that specific neighborhood. You will also see that neighborhood's name in the top left. While a user does not see the neighborhood name beforehand, highlighting allows them to clarify their exploration.

Highlighting a Neighborhood

Side-by-Side

Lastly, you can go into side-by-side mode to compare two different neighborhoods, or the same neighborhood with different characteristics turned on.

Side by Side Mode

Reflection

One area I struggled with while creating this visualization was the depiction of characteristics. I settled on a generative-logo-like pattern, but the feedback I received was that it could be improved with patterns that were more distinct and more rooted in what they represent.

I gave the patterns another try and came up with what you see below. Instead of a 3 by 3 pattern, these patterns were based on a 5 by 5 pattern.

After receiving this feedback, I had a conversation with Gray Crawford. He mentioned that I could use color to represent the different characteristics, which would make the patterns more easily discernible. I tried this out, and you can see the result below.

Visualization with Characteristics in Different Colors