Census Exercise
What this exercise covers:
- Classify data field types for CSV
- Import CSV into QGIS
- Import vector layer
- Join CSV to vector layer
- Save vector layer with joined CSV
- Field calculator
- Symbolize vector
Data
Data for this exercise is located at iPlant Datastore: Community Data/aegis/Spatial-bootcamp/Exercise_2/
- home-values.zip
- housing-units.zip
- tl_2014_53_tracts.zip
Procedures
- Download Exercise_2 data and unpack each zip file:
- Import Census Tracts shapefile tl_2014_53_tract.shp .
- Rename tl_2014_53_tract to census-tracts:
Right-click layer > Rename > census-tracts - Open the census-tracts Attribute table and locate a column that could possibly be unique to each Census tract used for joining (HINT: it’s GEOID):
Right-click census-tracts (layer list) > Attribute table
Close the Attribute table once you’ve located the primary key.
- Navigate to your Exercise_2 directory and open housing-units.csv and inspect the data for columns to use for our joins (primary key). HINT: it’s GEO.id2.
Reminder: We can only join data (tabular join) with exact same columns. See the tables below:
- Clean the data. For the purposes of this exercise, we do not need the removed data. Our objective is to join data and visualize (with minimal data wrangling). In LibreOffice, highlight the row/column, right-click and Delete Selected Columns. Excel should be similar.
For housing-units.csv:- Remove the second row containing the annotations (subheader).
- Delete the columns: GEO.id, GEO.display-lable, and HD02_VD01.
- Your file should look like the image below:
- Rename header HD01_VD01 to TotalUnitsHome. Save your document, be sure it’s stored as a Text CSV . Excel users may have to select Text CSV from a drop-down list. Renaming the headers reduces confusion down the road. Also, notice how there are no spaces in the header name. Removing (or not having) spaces is in the top 10 list of best naming convention practices (on all computer systems.. everywhere in the digital world.. you should never have spaces in your file names or directories). Close the document.
- Do the same for home-values.csv:
- Navigate to document
- Delete second row with annotations (subheader)
- Delete columns: GEO.id, GEO.display-label, HD02_VD01
- Rename HD01_VD01 to MedianValueHome
- Save your document in Text CSV format
- You’re not quite finished with wrangling your data yet. If you have a quick look at your home-values.csv, you’ll notice there are some invalid fields - number fields that contain non-numeric values. Lines 501 and 502 read 1,000,000+ and should be changed to 1000000.
Aslo, there are some median home value records with ‘-‘ and if you read your metadata you would know that:
An ‘-‘ entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or ratio of medians cannot be calculated because of one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
Let’s cleanse this now:
LibreOffice and Excel users: Find and Replace ‘-‘ (without single quotes) with NULL
Also, after making these changes the data type of the incorrect records have not changed - they’re still string/text fields in a numeric universe (only in LibreOffice/Excel, however). We will instruct QGIS on how to read the column data types with a CSVT. Be sure to save your home-values.csv in Text CSV format . - Just wait right there, one does not simply join a string (tract’s GEOID) to a number (housing data’s GEO.id2).
We now have to create our CSVTs (one for each home-values.csv and housing-units.csv). Think of a CSVT as a ‘comma-separated value data type’ file. As a reminder, these indicate to QGIS the data types of each column of repsective CSVs.
Our goal here is to join GEOID from tracts (shapefile) to GEO.id2 from home-values.csv and housing-units.csv.
Go back into QGIS and open the census-tracts properties and navigate to Fields. Locate GEOID in the table and notice its Type name - String (see image below). With less wrangling we can make GEO.id2 a string for joining purposes. Close the properties window.
The following steps outline how to create the CSVTs.
- Open a text editr (gedit, VIM, notepad, notepad++) and enter the following:
"String","Real(12.0)"
What this means:
There are two columns in this file separated by a common. The first column indicates string and the second is a real number with length of 12 and precision of 0. While preparing this exercise, QGIS didn’t like multiplying integers (Field Calculator results with NULL), so we’re using real number type. 0 precision removes unneccessary zeros after the decimal from real number types, also there are no fractions in the datasets.
- Save this file as home-values.csvt within the same directory as home-values.csv
- Save this file as home-values.csvt within the same directory as home-values.csv
- Luckily, we wrangled both CSVs into the same format, so copy home-values.csvt and paste it into the same directory as housing-units.csv. Rename this pasted file as housing-units.csvt. Data types, lengths, and precisions will be the same for both files.
- Import CSVs into QGIS.
Don’t move just yet, there a little quirk about importing CSVs with CSVTs into QGIS. You cannot use the Add Delimited Text Layer tool, this does not recognize the CSVT precision and length.
The best way to import with CSVTs is the drag-and-drop method.
Drag-and-drop your housing-units.csv and home-values.csv into QGIS. Don’t touch the associated CSVTs, QGIS reads these during the import process, just don’t move these files. Just as we keep all shapefile files in one directory.
- Your workspace should look like the image below. Save your workspace now.
- Now we are able to join our Census tabular data to the tracts shapefile. Join data by opening the Properties of the census-tracts layer and navigating to the Joins tab.
- Start the first join by clicking the green plus sign . We’re joining home-values and housing-units to census-tracts with:
Join field: GEO.id2
Target field: GEOID
Cache join layer in virtual memory: (checked)
Apply and OK onced both joined
- Open census-tracts Attribute Table and locate the MeanValueHome and TotalUnitsHome fields. Notice they’ve been renamed and cut-down. We can still decrypt these headings so we’ll continue on. Close the attributes table.
- You have joined the data but it’s currently being stored in the virtual memory, so we need to save census-tracts (with joined data) as a new shapefile:
Right-click shapefile > Save As
Format: ESRI Shapefile
Save as: tracts-values-units (save in your project directory)
Add saved file to map: (checked)
- We need to convert home-value to thousands of dollars to prevent integer overflow:
- Open tracts-value-units attributes table > Toggle editing mode > Field Calculator
- Calculate a new field:
Create new field: (checked)
Output field name: HmValThnds
Output field type: Whole number (integer)
Output field width: 10
Expression: “home-value” / 1000
- Open tracts-value-units attributes table > Toggle editing mode > Field Calculator
- Let’s now calculate total housing value per census tract (median home value X total housing units).
- Configure Field Calculator inputs as follows:
Create a new field: (checked)
Output field name: TotalValue
Output field type: Decimal number (real)
Output field width: 20
Precision: 0
Expression: “HmValThnds” * “housing-un”
- Configure Field Calculator inputs as follows:
- Locate TotalValue and confirm the calculation - total home value per Census tract in thoudsands of dollars.
NOTE: you should still have NULL values given that home-values.csv contains NULL values.
Be sure to Save edits and disable editing mode (Toggle editing mode) . The editing options should now be grayed-out and disabled.
- You’ve successfully joined two sets of Census tabular data to a shapefile.
- Symbolize tracts-values-units by categorizing by TotalValue and select a color ramp.
Open style properties: Right-click tract-values-units (layers list) > Properties > Style
Style by: Categorized
Column: TotalValue
Color ramp: (select ramp)
Classify - “Classification would yield 1422 entries which might not be expected. Continue?” Yes
QGIS will inform you that you have a high number of classes, click OK to confirm the categorization.
We’ll need to remove NULL values. Locate NULL within the category table (HINT: it contains no values and is blank - look towards the bottom), highlight and click ‘Delete’ (not Delete all!) - ‘Delete’ removes only selected rows.
Click Apply then OK to confirm changes.
- Deactivate census-tracts layer by unchecking its activate box in the layers list.
Your your results should resemble the image below
Save your workspace.