Congressional District GIS Exploration


I wanted to be able to do this with my IP address:

I do development and consulting in two areas I really love: fitness and political engagement. For some work I’m doing on the latter, I needed the latitude and longitude of the geographic center of each of the 437 US congressional districts so I could display the districts with the Google Maps API. "GovTrack":http://www.govtrack.us/congress/findyourreps.xpd has done the hard work of converting the Census data to KML and serving it as a WMS service, so congressional districts can be displayed on Google Maps. However, Google Maps needs a geographic center and a zoom level to position the map, and I couldn’t find any source online for the geographic centers of these regions.
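Choosing the zoom level can be sketched from a district’s bounding box. This is a minimal sketch, assuming the Web Mercator projection with 256-pixel tiles that Google Maps uses and a bounding box already computed from the district’s vertices; `zoom_for_bounds`, `TILE_SIZE`, and the `MAX_ZOOM` cap of 21 are illustrative names and values, not part of the Google Maps API.

```python
import math

TILE_SIZE = 256       # assumed: Google Maps tiles are 256 x 256 px at zoom 0
MAX_ZOOM = 21         # assumed cap; the usable maximum varies by map type
MAX_LAT = 85.05112878 # Web Mercator cuts off near +/-85.05 degrees


def _mercator_y(lat_deg):
    """Web Mercator y (in radians of the 2*pi world height): atanh(sin(lat))."""
    lat_deg = max(-MAX_LAT, min(MAX_LAT, lat_deg))
    s = math.sin(math.radians(lat_deg))
    return 0.5 * math.log((1 + s) / (1 - s))


def zoom_for_bounds(south, west, north, east, width_px, height_px):
    """Largest integer zoom at which the box fits in a width_px x height_px map.

    Requires south <= north; west > east is treated as crossing the dateline.
    """
    # Fraction of the world's width covered by the box.
    lon_span = east - west
    if lon_span < 0:
        lon_span += 360
    lon_fraction = lon_span / 360
    # Fraction of the world's Mercator height (2*pi) covered by the box.
    lat_fraction = (_mercator_y(north) - _mercator_y(south)) / (2 * math.pi)

    def zoom(px, fraction):
        if fraction <= 0:
            return MAX_ZOOM
        # At zoom z the world is TILE_SIZE * 2**z pixels across.
        return math.floor(math.log2(px / TILE_SIZE / fraction))

    z = min(zoom(width_px, lon_fraction), zoom(height_px, lat_fraction))
    return max(0, min(z, MAX_ZOOM))
```

For example, a box spanning 1/1024 of the world’s width in a 256 x 256 pixel map fits at zoom 10. The current Google Maps JavaScript API also provides `map.fitBounds()`, which computes the view on the client; a precomputed zoom is handy if you want to store it in a static table next to each center.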

The first question I had to face was what a geographic centroid really is. Matlab has a "meanm":http://www.mathworks.com/help/toolbox/map/ref/meanm.html function, which finds the mean position of a set of points on an arbitrary geodetic ellipsoid (e.g. WGS84), but I really felt the point should lie inside the district. My first thought was to write a quick genetic algorithm that found the point minimizing the average distance to the exterior of the district, but it took forever to run, and I suspected smarter folks had already trod this ground. That suspicion led me to research from the Royal Melbourne Institute of Technology, which catalogs a number of ways to compute geographic centroids, the most relevant being the @moment centroid@. This took a good bit of time to play with and implement, but I’ll leave the discussion to their paper. In the end, *meanm* was good enough and the genetic algorithm was canned. (Email me if you want the code.)
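The idea behind a geographic mean like *meanm* can be sketched on a sphere: convert each point to a 3-D unit vector, average the vectors, and project the result back to latitude and longitude. This is a sketch of the general technique on a unit sphere, not MathWorks’ implementation (which also handles ellipsoids), and `geographic_mean` is a hypothetical name.

```python
import math


def geographic_mean(lats, lons):
    """Mean position of points on a unit sphere, returned as (lat, lon) in degrees.

    Points are converted to 3-D unit vectors and summed; the direction of the
    sum is projected back to latitude/longitude. Unlike averaging longitudes
    directly, this behaves correctly across the dateline.
    """
    x = y = z = 0.0
    for lat, lon in zip(lats, lons):
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    if x == 0.0 and y == 0.0 and z == 0.0:
        # Empty input or perfectly balanced (e.g. antipodal) points.
        raise ValueError("mean position is undefined")
    # Dividing the sums by n would not change the direction, so skip it.
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon
```

Averaging points at longitudes 179 and -179 this way gives a longitude of 180 rather than the naive 0. Note that nothing here guarantees the result lies inside the district, which is exactly the concern raised above.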

From their paper:
The New Shorter Oxford English Dictionary (SOED 1993) defines the centroid as: “A point defined in relation to a given figure in a manner analogous to the centre of mass of a corresponding body.” Using this definition, and regarding the body as a plane area A of uniformly thin material, its centroid is
$$\bar{x} = \frac{M_y}{A}, \qquad \bar{y} = \frac{M_x}{A}$$
where $M_x$ and $M_y$ are (first) moments with respect to the x- and y-axes respectively.
[The moment $M_L$ of a plane area with respect to a line L is the product of the area and the perpendicular distance of its centroid from the line.] The centroid computed using this method has a physical characteristic that is intuitively reassuring. That is, if we cut out a shape from uniformly thin material (say thin cardboard) and suspend it freely on a string connected to its centroid, the shape will lie horizontal in the earth’s gravity field.
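For a polygon, the moment centroid $(\bar{x}, \bar{y}) = (M_y/A, M_x/A)$ has a closed form via the shoelace formula. The sketch below assumes planar (projected) coordinates; feeding raw latitude/longitude treats degrees as planar units, which distorts shapes, especially at high latitudes. It is a standard textbook formula, not the code from the RMIT paper, and `polygon_centroid` is a hypothetical name.

```python
def polygon_centroid(vertices):
    """Area-weighted (moment) centroid of a simple polygon via the shoelace formula.

    vertices: ordered list of (x, y) pairs, first vertex NOT repeated at the end.
    Returns (x_bar, y_bar); works for either winding direction.
    """
    area2 = 0.0  # twice the signed area, 2A
    sum_x = 0.0  # 6 * M_y (first moment about the y-axis)
    sum_y = 0.0  # 6 * M_x (first moment about the x-axis)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # x_bar = sum_x / (6A) and A = area2 / 2, so divide by 3 * area2.
    return sum_x / (3 * area2), sum_y / (3 * area2)
```

For the U-shaped polygon `[(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]` the centroid is (1.5, 19/14), which falls in the notch and therefore outside the shape: for non-convex regions the moment centroid need not lie inside.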

I’m sharing a bit of my code in the hope that other folks in the OpenGovernment movement will find these data points useful. The first challenge was to find the (very poorly documented) "source data":http://www.census.gov/geo/www/cob/cd110.html and get it into Matlab. The source code is fairly straightforward: some data conditioning, then finding the geographic centers with *meanm*. The moment centroid was interesting but took too long to compute for my timeline, and the trivial solution seems to work well. The remaining challenge was wrangling the data, something my day job gives me plenty of practice with.
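One plausible piece of that data conditioning, sketched under stated assumptions: boundaries stored Mapping Toolbox style as latitude/longitude vectors with NaN separators between polygon parts, and closed rings that repeat their first vertex at the end. Because a vertex mean such as *meanm* weights every vertex equally, a repeated closing vertex would be counted twice. `condition_vertices` is a hypothetical helper.

```python
import math


def condition_vertices(lats, lons):
    """Flatten NaN-separated polygon parts into one vertex list for averaging.

    Assumes NaN separators between parts and closed rings whose last vertex
    repeats the first; that duplicate is dropped so it is not double-counted.
    """
    out = []
    part = []

    def flush():
        # Drop the duplicated closing vertex, then move the part to the output.
        if len(part) > 1 and part[0] == part[-1]:
            part.pop()
        out.extend(part)
        part.clear()

    for lat, lon in zip(lats, lons):
        if math.isnan(lat) or math.isnan(lon):
            flush()  # a NaN marks the end of a part
        else:
            part.append((lat, lon))
    flush()  # last part may not be NaN-terminated
    return out
```

Even after this cleanup, a vertex mean is weighted toward densely digitized stretches of boundary, which is one reason it can differ from the area-weighted moment centroid.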

One of the biggest challenges was getting the US Census data keyed by the standard two-letter postal abbreviation, which forms the first part of my primary key (XXYY), where XX is the state and YY is the district number. The Census assigned each state a number without any documentation that I could find, so I resorted to running Ruby regular expressions over the file names to pull out the data and build a nice Matlab cell array; the Matlab script could then produce a good Ruby hash of the centroids. Those two scripts follow for any other data conditioners out there who might find them useful. Two steps required fancy regular expressions: mapping the Census number to a US state, then mapping the full state name to the two-letter code.
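The key-building step can be sketched as follows. This is not the author’s Ruby code. It assumes the Census state numbers are the FIPS state codes (for example 06 for California and 36 for New York), which is very likely the undocumented numbering, and it assumes a hypothetical file-name pattern `cd<state>_<district>`. The lookup tables are deliberately abbreviated, and all function names are illustrative.

```python
import re

# Assumed: Census state numbers are FIPS state codes. Abbreviated; extend as needed.
FIPS_TO_POSTAL = {
    "02": "AK", "06": "CA", "11": "DC", "36": "NY", "48": "TX", "51": "VA",
}

# Full state name (lowercase) -> postal code. Also abbreviated.
NAME_TO_POSTAL = {
    "alaska": "AK", "california": "CA", "district of columbia": "DC",
    "new york": "NY", "texas": "TX", "virginia": "VA",
}

# Hypothetical file-name pattern, e.g. "cd06_12.dat" -> state 06, district 12.
FILE_RE = re.compile(r"cd(\d{2})_(\d{1,2})", re.IGNORECASE)


def key_from_filename(name):
    """Build an XXYY key (postal code + zero-padded district) from a file name."""
    m = FILE_RE.search(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    state = FIPS_TO_POSTAL[m.group(1)]
    return f"{state}{int(m.group(2)):02d}"


def postal_from_name(full_name):
    """Map a full state name to its postal code, ignoring case and extra spaces."""
    normalized = re.sub(r"\s+", " ", full_name.strip()).lower()
    return NAME_TO_POSTAL[normalized]


def ruby_hash(centers):
    """Render {key: (lat, lon)} as a Ruby hash literal, sorted by key."""
    items = ", ".join(
        f'"{k}" => [{lat:.6f}, {lon:.6f}]'
        for k, (lat, lon) in sorted(centers.items())
    )
    return "{" + items + "}"
```

The last function emits the centers as a Ruby hash literal, mirroring the Matlab-to-Ruby hand-off described above; zero-padding the district number keeps every key four characters long.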

And, in conclusion, I provide the geographic centers of all 437 congressional districts:

So this is really pretty boring, but now that I’m working with this dataset, I hope to write future posts on interesting GIS computations that combine demographics with members’ voting patterns. Please let me know if you have any questions or interesting ideas for exploration.