This morning before church, I wrote some code to build a contour plot of schools in New Jersey. The “hot” reddish regions have the best schools and the cooler regions don’t.
Tech Stuff: How it’s made
This wasn’t that easy. Since I had all the code from my previous post, it should have been very straightforward to make another contour map. However, to finish this map, I had three tech challenges to overcome: the GreatSchools API omitted schools when searching from geocoordinates, a number of locations didn’t have any schools, and I had to mix Python and Ruby code in a way that exchanged values.
Fixing GreatSchools Omitted Schools
One of the biggest challenges came from the GreatSchools API. While the GreatSchools nearby API is supposed to return schools within a specified radius of a geographic position, I noticed that some schools inside that radius weren’t showing up. Fortunately, I was familiar with Google’s Geocoding API, whose reverse geocode feature returns a town name for a given spot, which let me look up schools by town instead. This was far from straightforward, since Google returns multiple layers of data for each coordinate. On top of that, the town name can appear under locality, postal_town, administrative_area_level_2 or administrative_area_level_1. This necessitated the following code from Geoff Boeing:
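def parse_city(geocode_data):
    if (geocode_data is not None) and ('address_components' in geocode_data):
        for component in geocode_data['address_components']:
            if 'locality' in component['types']:
                return component['long_name']
            elif 'postal_town' in component['types']:
                return component['long_name']
            elif 'administrative_area_level_2' in component['types']:
                return component['long_name']
            elif 'administrative_area_level_1' in component['types']:
                return component['long_name']
    return None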
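Because Google returns several results per coordinate, one variation is to check the four types in priority order across all of the results, so a locality always wins over a county or state name. This is only a minimal sketch of that idea, not the code used for the map: the function name town_from_results and the sample payload are made up for illustration, and the payload only mimics the address_components, types and long_name fields of a reverse geocode response.

```python
# Town-like component types, most specific first (order taken from the post).
TOWN_TYPES = (
    "locality",
    "postal_town",
    "administrative_area_level_2",
    "administrative_area_level_1",
)


def town_from_results(results):
    """Return the most specific town-like name in a list of geocode results.

    Hypothetical helper: every result is scanned for the first type before
    falling back to the next, less specific one.
    """
    for wanted in TOWN_TYPES:
        for result in results or []:
            for component in result.get("address_components", []):
                if wanted in component.get("types", []):
                    return component.get("long_name")
    return None


# Hand-written sample that imitates the shape of a reverse geocode response.
sample_results = [
    {
        "address_components": [
            {"long_name": "Mercer County",
             "types": ["administrative_area_level_2", "political"]},
            {"long_name": "Princeton", "types": ["locality", "political"]},
            {"long_name": "New Jersey",
             "types": ["administrative_area_level_1", "political"]},
        ]
    },
]

print(town_from_results(sample_results))  # prints "Princeton"
```

Here the county appears before the town in the sample, yet the locality is still returned because the types are tried in priority order.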
Fixing locations with no schools
Additionally, many points were not associated with a school. While bodies of water naturally had no school, the Great Schools API also failed to report any school for many other locations. Since these data didn’t exist, I populated the matrix with NumPy null values or “NaNs”. These showed up as blank regions on the map and wreaked havoc with the contour lines. To fix this, I interpolated in two dimensions using NumPy’s ma.masked_invalid feature followed by SciPy’s interpolate.griddata capability with cubic interpolation. (Wow, Python has a lot of math available on demand.)
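The masking and interpolation step can be sketched roughly as follows. This is a minimal sketch under assumptions rather than the original script: the function name fill_missing and the toy 6 x 6 grid are invented, and the nearest-neighbour fallback for cells that cubic interpolation cannot reach (outside the hull of the known points) is my own addition.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_missing(scores):
    """Fill NaN cells of a 2-D float grid from the surrounding valid cells."""
    masked = np.ma.masked_invalid(scores)       # mask the NaN cells
    valid = ~np.ma.getmaskarray(masked)         # True where a score exists
    rows, cols = np.mgrid[0:scores.shape[0], 0:scores.shape[1]].astype(float)
    known_points = np.column_stack((rows[valid], cols[valid]))
    known_values = scores[valid]

    # Cubic interpolation over the whole grid, as described in the post.
    cubic = griddata(known_points, known_values, (rows, cols), method="cubic")
    # Assumption: cells the cubic pass leaves as NaN get the nearest score.
    nearest = griddata(known_points, known_values, (rows, cols), method="nearest")
    filled = np.where(np.isnan(cubic), nearest, cubic)

    # Keep the original scores untouched wherever they were known.
    return np.where(valid, scores, filled)


# Toy grid with scores[r, c] = r + 2c and two missing cells.
grid = np.add.outer(np.arange(6.0), 2.0 * np.arange(6.0))
grid[2, 3] = np.nan
grid[3, 1] = np.nan

filled = fill_missing(grid)
print(filled[2, 3], filled[3, 1])
```

The filled grid can then go straight to the contour routine without the blank regions.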
Mixed Python and Ruby
The final challenge was connecting to the Great Schools API. I could connect with Python’s default tools and parse the result with lxml, but that meant building each request by hand. Fortunately, there was a Ruby wrapper ready for use, but I had to call it from Python. Python’s subprocess module with check_output did the trick, but due to the fickle nature of the API, I had to add a bunch of exception handling.
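Calling the Ruby wrapper from Python and guarding against flaky responses might look something like this. It is a hedged sketch, not the script behind the map: the helper name call_wrapper, the retry count and the assumption that the Ruby script prints JSON on stdout are all mine. In the demo, a tiny Python one-liner stands in for the Ruby script so the example runs anywhere.

```python
import json
import subprocess
import sys
import time


def call_wrapper(command, retries=3, timeout=30, pause=1.0):
    """Run an external command and return its stdout parsed as JSON.

    Returns None if every attempt fails. Assumes the command prints JSON.
    """
    for attempt in range(retries):
        try:
            raw = subprocess.check_output(command, timeout=timeout)
            return json.loads(raw.decode("utf-8"))
        except (subprocess.CalledProcessError,   # non-zero exit status
                subprocess.TimeoutExpired,       # the call hung
                ValueError,                      # output was not valid JSON
                OSError):                        # executable not found, etc.
            if attempt < retries - 1:
                time.sleep(pause)
    return None


# Stand-in for something like ["ruby", "some_wrapper_script.rb", lat, lon];
# that script name is hypothetical.
demo = [sys.executable, "-c",
        "print('[{\"name\": \"Example High\", \"rating\": 8}]')"]
print(call_wrapper(demo))
```

Returning None on failure lets the caller drop a NaN into the grid for that point and move on instead of crashing halfway through a run.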
I’m hoping to make a map like this for northern Virginia soon.
We are moving to the greater New York City area this summer where Tim joins the leadership team of Colgate-Palmolive. As tempting as it is to spend all our time on Trulia, Estately or Zillow looking at specific houses, we knew that our focus was best spent on understanding the different areas and the trade-offs they presented. I’m an analyst at heart, and always try to do the analysis at the right level of detail. At this stage, this means a map that incorporates (in order) schools, commute times, and lifestyle options. As an advisor to spatial.ai, Tim’s been inspired to create insightful mashups. Maps are pretty much the most excellent example of useful data where one can quickly do analysis without any voicetrack. The right map can serve as a common medium for discussion with friends and realtors, and for our own understanding as we try to home in on the right area. With a good contextualized spatial understanding, we can be ready to make the quick decisions that house-hunting presents.
This is why a large number of sites display helpful data geared towards house-hunters. As we started looking at different map-based real estate search engines online, we found different merits to each one, but no site gave us the commute, school information and lifestyle options we care about in one interface. Estately was the most promising. The product side was clearly driven by developers, with clean URL lookups and clever metrics like walkability. Trulia is the most full-featured, with some really cool features like price heatmaps, which would be useful if they didn’t have so many blank regions. I enjoy Trulia the most, but it doesn’t have the latest listings.
Zillow has an awesome API but legally can’t provide anything that can be called "bulk data". Redfin’s killer feature is the ability to search by school district. This is pretty much critical, since the school district often doesn’t match the town name, and we started falling in love with houses online that we had to give up once we found out they weren’t in a school district we were OK with.
In Alexandria, we love our house, elementary school, church and community. In order to find the best school system possible, we relied heavily on the rankings assigned by njmonthly.com. Their ranking was a composite of school environment, student performance and student outcomes. These scores were based entirely on data reported by the schools to the state Department of Education (DOE) and published in the School Performance Reports section of the DOE website. You can read more about their methodology at njmonthly.com. We also looked at Great Schools to crosscheck the list. Tim used Python, the Google Geocoding API and Google Sheets to get geocoordinates for each school. He was then able to pull these into Google maps builder and assign a color corresponding to each school’s rank. While there is a lot more work in the future to better understand the potential at each school, the map below was very helpful for us.
Ok, this is the fun part where Tim gets to use his ninja programming skillz. Tim is going to be traveling a lot, but when he is home he will often be in Piscataway, NJ and Manhattan. Nothing online would show the average, maximum or minimum commute times for multiple locations. Additionally, we wanted to combine different traffic patterns with the optimal route found by comparing public transit and driving. In order to build this, Tim built a Python script that used the Google Directions API and the associated Python library to provide transportation times. He then used matplotlib and basemap to put a grid across the region of interest and used the contour features to generate contour lines for regions that were 20, 30, 40, 50, 60, and 70 minutes away. This produced lots of plots that helped get a feel for the major transportation routes and how traffic varied by time of day.
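The grid-and-contour step described above can be sketched like this. It is a rough illustration under loud assumptions: the commute times are synthetic (a made-up function of straight-line distance) rather than Google Directions results, the destination coordinates are approximate, the bounding box is arbitrary, and plain matplotlib axes replace basemap to keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# 15 x 15 grid over an arbitrary box around central and northern New Jersey.
lons = np.linspace(-74.9, -73.9, 15)
lats = np.linspace(40.2, 41.0, 15)
lon_grid, lat_grid = np.meshgrid(lons, lats)


def synthetic_minutes(dest_lat, dest_lon, base, minutes_per_degree):
    """Made-up commute time: a fixed overhead plus a cost per degree."""
    distance = np.hypot(lat_grid - dest_lat, lon_grid - dest_lon)
    return base + minutes_per_degree * distance


# Approximate destination coordinates (assumed, not taken from the post).
destinations = {"Piscataway": (40.55, -74.46), "Manhattan": (40.75, -73.99)}

best_per_destination = []
for dest_lat, dest_lon in destinations.values():
    driving = synthetic_minutes(dest_lat, dest_lon, 10.0, 90.0)
    transit = synthetic_minutes(dest_lat, dest_lon, 25.0, 70.0)
    # Best commute at each grid point: the faster of driving and transit.
    best_per_destination.append(np.minimum(driving, transit))

# Average the best commute over both destinations.
average_best = np.mean(best_per_destination, axis=0)

fig, ax = plt.subplots()
contours = ax.contour(lon_grid, lat_grid, average_best,
                      levels=[20, 30, 40, 50, 60, 70])
ax.clabel(contours, fmt="%d")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
fig.savefig("commute_contours.png")
```

Swapping the synthetic function for real travel times only changes how driving and transit are filled in; the minimum, the average and the contour call stay the same.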
Of course, Tim did excursions over time of day and built maps that looked at optimistic- and worst-case scenarios. In the end, it worked best to make each excursion a map layer and to bring in different data sets as we had questions. The most helpful map presented the contour lines made from averaging the best commute from each grid point (in this case a 15 x 15 grid):
How much does commute vary?
The sparkline in each row below shows the commute time for 12 times between 4:30am and 10am each morning. Transit options weren’t possible to Colgate’s technology center, but they generally were to NYC. Commute times below are in minutes. I was expecting to see more variance in the commute time. This is either an error in my code or Google really isn’t forecasting commute times based on historical traffic.
West Windsor-Plainsboro South, Mercer
West Windsor-Plainsboro North, Mercer
Haddonfield Memorial, Camden
New Providence, Union
Ridge (Basking Ridge), Somerset
Watchung Hills Regional (Warren), Somerset
Glen Rock, Bergen
Our social structure revolves around our church, then around fitness (CrossFit and rock climbing gyms) and other town-centered options (like shopping at Whole Foods, or a charming downtown). We wanted to live as close to the city as possible, while still being able to find a nice home with the right school. The most helpful way to incorporate this information was to build several lists and then use the Google Geocoding API to get the geocoordinates. From here, it was simple to export to CSV and upload into the mashup. This produced this insanely cool composite map.
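Turning a list of places into a CSV of coordinates could be sketched as below. This is an illustration only: places_to_csv and its column names are hypothetical, the geocode argument stands for whatever lookup function is used (for example a thin wrapper around the Google Geocoding API), and the demo uses a fake lookup table with placeholder coordinates so it runs without an API key.

```python
import csv
import io


def places_to_csv(names, geocode, out):
    """Write name, latitude and longitude rows for every place that geocodes.

    `geocode` is any callable returning a (lat, lon) tuple or None.
    Returns the number of rows written, not counting the header.
    """
    writer = csv.writer(out)
    writer.writerow(["name", "latitude", "longitude"])
    written = 0
    for name in names:
        location = geocode(name)
        if location is None:
            continue  # skip places the geocoder could not resolve
        lat, lon = location
        writer.writerow([name, lat, lon])
        written += 1
    return written


# Fake lookup table with placeholder coordinates, standing in for a real API.
FAKE_LOCATIONS = {
    "Example Climbing Gym": (40.0, -74.0),
    "Example Grocery Store": (40.5, -74.5),
}

buffer = io.StringIO()
count = places_to_csv(
    ["Example Climbing Gym", "Unknown Place", "Example Grocery Store"],
    FAKE_LOCATIONS.get,
    buffer,
)
print(count)
print(buffer.getvalue())
```

Passing the geocoder in as a function keeps the export logic easy to check offline and leaves the choice of API to the caller.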
Results: Potential Locations
Princeton, Montgomery, West Windsor
We love the downtown, schools and academic atmosphere of Princeton. It is close to cool companies like SRI and major research centers. It also has nice neighborhoods and is very walkable. It has a train to NYC and has good church options. It is much farther from the city than we want to be and the house prices are much higher in Princeton proper when compared with the local area.
Westfield, Millburn, Short Hills, Livingston, Montclair
There was another cluster much closer to the city. We also like the option of attending Redeemer Church of Montclair. However, we would hate to give up the university-town and high-tech feel of Princeton.
In all, we now look forward to visiting in person and getting a feel for these neighborhoods. I feel like we have a good map that we can update as we get to know the area better. Hats off to Google for making so much accessible through APIs and for making such nice interfaces to view everything with. Open standards just plain rock.
We put this post together to collect our thoughts, share our code and methodology, but also to help the dialogue with our friends. If you have any thoughts on the above, please email us at email@example.com.
Please feel free to use, modify and enjoy the code we wrote for this. Feel free to see and edit our spreadsheet.