Werner's Blog — Opinion, Analysis, Commentary
Area calculations with longitude and latitude

Many geographic databases come as boundary files with longitude and latitude. Is is then up to the user to project these suitable into rectangular coordinates with "northing" and "easting". But sometimes this is rather cumbersome when all one wants to do is measure the size of areas. I have encountered such a problem when I needed to measure the size of postal Forward Sortation Areas (FSAs) in Canada. Statistics Canada provides a digital boundary file free of charge on their web site in a number of different formats.

Canadian postal codes are made up of six alphanumeric letters, such as V6T 1Z2. The first three characters are the Forward Sortation Area (FSA). FSAs vary hugely in size but tend to cover comparable number of people. For Canada Post, FSAs are useful delineations of service areas.

With SAS, it is quite easy to read in the ArcGIS version of the boundary file and extract the polygon sequences, which are sometimes several segments for any given FSA. The SAS code below shows how to process the data, which is identified in the "geofsa" output data set by the FSA code (CFSAUID), the province number (PRUID) and the province name (PRNAME). I find it more convenient to work with two-letter province codes (e.g., ON for Ontario), and thus the first step processing the data was to convert province numbers to province codes, and renaming some of the variables more conveniently.

The critical processing step is the fsa_area data step. Here, we need to be careful to process one FSA segment at a time. For that, some house keeping is necessary to determine the boundaries of segments. Also, we need to keep a memory of the last set of longitudes and latitudes so that we can calculate partial areas that combine two longitudes and two latitudes, points (x0, y0) and (x1, y1). This necessitates a little bit of juggling. When we encounter the first data row in a new segment, we first check if we have to finish up a previous segment. If yes, we close the segment by using the saved location of the first point in the segment. We calculate the size of the area and output it. Then we initialize the memory-keeping variables and save the first point as (x9, y9). Next we run through all points in the segment and calculate the partial areas, and when done save the current data point (x1, y1) to (x0, y0). SAS applies trigonometric functions to values expressed in radians, so all longitudes and latitudes must be converted from degrees to radians.

The heart of the calculation happens in the macro "NEXTPOINT". It is useful to put the code in a macro statement because it is used twice in the data step. This is because we have to skip the first data row in the calculation and we have to use NEXTPOINT twice in the last data row. The area calculation are based on a spherical polygon, which is a generalization of the spherical triangle. There are several implementation of suitable algorithms available, and the one shown here make use of haversine functions.

The last data step takes the sorted output from fsa_area and tries to make sense of the data by classifying segments either as as urban or rural. Without looking at a map or using data on population densities, this is actually quite tricky and a bit of a judgment call. I treat all FSAs as rural if they have a zero in the middle, which identifies FSAs with rural delivery by Canada Post. That is easy. However, sometimes FSAs contain both urban and rural segments. For example, V6T encompasses the University of British Columbia but also rural areas that are many kilometers away. For that reason, large areas (over 50 square kilometers) are assumed to be rural in nature. Essentially, segment size corresponds inversely to population density. Some FSAs are huge, such as Y0B in the Yukon: more than 312,000 square kilometers. On the other hand, many urban FSAs are less than one square kilometre in size.

Perhaps you are not interested at all in how to calculate areas but you are just keen on seeing the output or using it for your own research. In that case I have Stata dictionary file for you: fsa_area_2011.dct. Please feel free to download and use this file. The file has columns for the two-letter province code and three-letter FSA code, and numeric columns for the total FSA area, urban area, and rural area. If you have suggestions on how to improve the urban/rural designation, I encourage you to contact me.

*--------------------------------------------------------------+ | Canadian Postal Forward Sortation Area (FSA) | | Cartographic boundary file from Canada 2011 census | | http://www12.statcan.gc.ca/census-recensement/2011/geo/ | | bound-limit/bound-limit-2011-eng.cfm | | Calculation of spatial extent | +--------------------------------------------------------------; options nocenter ls=80; libname here '.'; proc mapimport out=geofsa datafile='gfsa000b11a_e.shp'; id CFSAUID PRUID PRNAME; run; data geofsa; set geofsa; length province $2; if pruid='10' then province='NL'; else if pruid='11' then province='PE'; else if pruid='12' then province='NS'; else if pruid='13' then province='NB'; else if pruid='24' then province='QC'; else if pruid='35' then province='ON'; else if pruid='46' then province='MB'; else if pruid='47' then province='SK'; else if pruid='48' then province='AB'; else if pruid='59' then province='BC'; else if pruid='60' then province='YT'; else if pruid='61' then province='NT'; else if pruid='62' then province='NU'; keep cfsauid province segment x y; rename x=longitude y=latitude; run; %LET ra=6378.137; %LET rb=6356.752; %LET half_pi=(constant('PI')/2); %LET radians=(180/constant('PI')); %MACRO NEXTPOINT; cos0=cos(y0); cos1=cos(y1); if x0^=y0 then do; h = ((1.0-cos(y1-y0))/2)+cos0*cos1*((1.0-cos(x1-x0))/2); a = 2*arsin(sqrt(h)); b = &half_pi - y1; c = &half_pi - y0; s = (a+b+c)/2; t = tan(s/2)*tan((s-a)/2)*tan((s-b)/2)*tan((s-c)/2); u = abs(4*atan(sqrt(abs(t)))); if x1<x0 then u=-u; sum=sum+u; end; %MEND; data fsa_area; set geofsa end=verylast; where latitude^=. and longitude^=.; length fsa last_fsa $3; length prv last_province $2; retain last_fsa; retain last_segment; retain last_province; retain x0 y0 x1 y1 x9 y9 sum; if _N_=1 then do; last_fsa='Z0Z'; last_segment=(-1); sum=.; end; if verylast or cfsauid^=last_fsa or segment^=last_segment then do; if sum^=. then do; x1=x9; y1=y9; %NEXTPOINT; FSA_area=abs(sum)*(&ra)*(&rb); prv=last_province; fsa=last_fsa; fsa_part=last_segment; *put FSA $3. ' ' fsa_part 4.0 ' ' FSA_area 10.3; output; end; x0=longitude/&radians; y0=latitude/&radians; x9=x0; y9=y0; sum=0; end; else do; x1=longitude/&radians; y1=latitude/&radians; %NEXTPOINT; x0=x1; y0=y1; end; last_segment=segment; last_fsa=cfsauid; last_province=province; keep PRV FSA FSA_part FSA_area; label PRV='Province' FSA='Forward Sortation Area' FSA_part='FSA Segment/Part' FSA_area='FSA Area [sq.km.]'; proc sort data=fsa_area; by PRV FSA FSA_part; data here.fsa_area; set fsa_area; by PRV FSA FSA_part; retain FSA_urban FSA_rural FSA_total; if first.FSA then do; FSA_urban=0; FSA_rural=0; FSA_total=0; end; FSA_total=FSA_total+FSA_area; if substr(FSA,2,1)='0' or FSA_area<50 then FSA_rural=FSA_rural+FSA_area; else FSA_urban=FSA_urban+FSA_area; if last.FSA then output; keep PRV FSA FSA_total FSA_rural FSA_urban; label FSA_total='Total area of FSA [sq.km.]' FSA_urban='Urban area of FSA [sq.km.]' FSA_rural='Rural area of FSA [sq.km.]'; run;
Posted on Wednesday, September 30, 2015 at 11:30 — #Econometrics
[print]
© 2024  Prof. Werner Antweiler, University of British Columbia.
[Sauder School of Business] [The University of British Columbia]