The following is a list of common questions that come up when discussing Emsi’s new tract-level data.
Our original plan was to map the Census Bureau’s Zip Business Patterns (ZBP) dataset onto census tracts using a 2-digit ZIP-tract mapping from LODES in combination with establishment counts data.
However, starting with the 2017 ZBP, the Census Bureau began using a new suppression method called differential privacy, and the IRS redefined the criteria for confidentiality of establishment counts. Together, these two privacy measures suppressed 90% of the ZBP dataset, rendering it useless for creating tract data.
Our next idea was to bypass ZIP codes and use LODES to disaggregate county-level data straight to census tracts. However, we ran into problems with LODES that rendered it ineffective for creating breakouts as well.
First, we found that LODES is not consistent with QCEW, despite the fact that both datasets are based on the same administrative records. The Census Bureau was unable to explain why LODES is inconsistent with QCEW, but it is presumably because their algorithms for disclosure avoidance make random mutations to the data. When combined with other factors, this lessened our trust in the accuracy of LODES data, especially at more detailed geographical levels.
Additionally, LODES is only available at the 2-digit NAICS level for each census tract. A breakout built from LODES data therefore lacks geographical precision when assigning detailed NAICS to tracts within a county. Instead of employment counts for a 6-digit industry being assigned precisely to the tract to which they belong, these counts can be smeared across multiple tracts in the county.
Imagine for example a county in which there are many specialty hospitals and other medical establishments. In this imaginary county, there are one or more establishments in the following detailed NAICS:
All of these establishments fall into the broad NAICS category 62 (Healthcare and Social Assistance). For this county, LODES should indicate that many tracts within the county contain employment for NAICS 62. However, without any additional NAICS detail, it is impossible to precisely assign employment for each 6-digit NAICS to the appropriate tracts. If there is only one Kidney Dialysis Center, LODES does not know this. It has no way of knowing what proportion of all NAICS 62 employment that single establishment represents, much less which tract it is located in. Rather, a simple LODES-based breakout will spread all employment within a 2-digit NAICS across all tracts in the county.
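The smearing effect can be illustrated with a small sketch. The tract names, job counts, and the function `lodes_style_breakout` are all hypothetical and chosen only for illustration; this is not Emsi's or the Census Bureau's actual procedure, just one simple way a breakout driven by 2-digit data might behave.

```python
def lodes_style_breakout(county_jobs_6digit, tract_jobs_2digit):
    """Spread a county's 6-digit NAICS jobs across tracts in proportion
    to each tract's 2-digit NAICS employment (hypothetical helper)."""
    total = sum(tract_jobs_2digit.values())
    if total == 0:
        raise ValueError("no 2-digit employment to apportion against")
    return {
        tract: county_jobs_6digit * jobs / total
        for tract, jobs in tract_jobs_2digit.items()
    }


# Made-up NAICS 62 employment by tract for an imaginary county.
tract_naics62 = {"tract_A": 500, "tract_B": 300, "tract_C": 200}

# Suppose the county has one dialysis center with 40 jobs, all in tract_C.
estimate = lodes_style_breakout(40, tract_naics62)
print(estimate)
```

In this made-up county, the 40 dialysis jobs are split 20 / 12 / 8 across the three tracts, even though every one of them actually sits in tract_C.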
In order to create an accurate 6-digit breakout, a more detailed source than LODES is necessary. For this reason, we used business listings data in creating a tract-level breakout.
Both Emsi and some of our users ground-truthed data for known regions.
Emsi uses business listings data from DatabaseUSA to create percentage breakouts for tract-level data.
No. Population data is by place of residence, whereas employment is by place of work. Therefore, population data is not a valid tool to use to model employment counts.
It is always ideal to use government data to create tract-level breakouts wherever possible, since government data is more representative and reliable. However, very few 6-digit NAICS have tract-level data available from a government source. The following are the few we were able to find:
Simply put, the above methodology (using business listings to create a proportional breakout) is not trustworthy enough to push it all the way down to the census block level. There are 42,000 ZIP codes, 73,000 census tracts, and 11 million census blocks in the United States. The move from ZIP codes to census tracts roughly doubles the number of geographies and, on average, roughly halves their size. The move from tracts to blocks is a 150x increase in the number of geographies and a 150x decrease in their size, making it much more difficult to ensure precise placement of jobs.
Moving from ZIPs to tracts halves the size of the bullseye, but moving from tracts all the way down to blocks would make the bullseye 1/150th as big. The smaller the bullseye, the harder it is to hit.
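The ratios behind the bullseye comparison can be checked directly from the rounded counts quoted above. The variable names are illustrative only, and the "average size" reading assumes the same national area is simply divided among more geographies.

```python
# Rounded national counts as quoted in the text.
zip_codes = 42_000
tracts = 73_000
blocks = 11_000_000

# How many times more geographies there are at each step down.
zip_to_tract = tracts / zip_codes
tract_to_block = blocks / tracts

print(f"ZIP -> tract:   {zip_to_tract:.2f}x as many geographies")
print(f"tract -> block: {tract_to_block:.0f}x as many geographies")
```

With these rounded inputs, the ZIP-to-tract step works out to a bit under 2x, while the tract-to-block step is on the order of 150x.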
Establishment-level data is required to make an informed estimate of employment or earnings at the census block level. Establishment-level data allows us to pin a business, with its address, NAICS code, employment assignment and any other data, to a point on a map. That point and its associated data can then be assigned to any geography type: a tract, a county, a block, or an arbitrary, user-defined shape. DatabaseUSA, which Emsi uses to move county-level data down to census tracts, is establishment-level data, but it is not reliable enough in NAICS or employment assignments to be usable much below the tract level. If and when a reliable establishment-level dataset exists, it could be used to create block-level data.
We do not offer confidence intervals for our estimates. We do ensure that tract-level estimates are consistent with county-level estimates and county-level disclosed figures. This means that all tracts within a county, summed up, will equal employment for the county. County-level data matches government sources, most notably the BLS’s QCEW. The only way for tract-level data to be incorrect is for it to be mis-proportioned among tracts within a county.
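The sketch below illustrates what this consistency constraint does and does not guarantee. The function `tracts_match_county`, the tract names, and the numbers are hypothetical; this is not Emsi's validation code.

```python
import math


def tracts_match_county(tract_estimates, county_total, rel_tol=1e-9):
    """Return True when the tract estimates sum to the county figure
    (hypothetical helper; the tolerance absorbs floating-point noise)."""
    return math.isclose(sum(tract_estimates.values()), county_total,
                        rel_tol=rel_tol)


county_total = 40  # made-up county employment for one industry

# Two different made-up allocations of the same county total.
spread_out = {"tract_A": 20.0, "tract_B": 12.0, "tract_C": 8.0}
concentrated = {"tract_A": 0.0, "tract_B": 0.0, "tract_C": 40.0}

print(tracts_match_county(spread_out, county_total))
print(tracts_match_county(concentrated, county_total))
```

Both allocations pass the check, although at most one of them can describe where the jobs really are. Agreement with the county total rules out errors in the overall level, but not errors in how that total is proportioned among tracts.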
Because there is very little data available at the census tract level, and because the little data that is available is not very detailed, “ground truth” with which to compare estimates does not exist.
Part of the value add of Emsi (and any other labor market provider) is the creation of estimates/imputations for non-existent data. However, the result is that we have created estimates/imputations for which no data is available to confirm or deny the truth of the estimate. This is a reality for Emsi and any labor market provider.
No, population demographics are calculated very differently. See this article for more information.
There are several more data points that have been developed at the tract level but are not available in Emsi software:
Census tracts cannot be searched by city. The parent geography of a census tract is a county, and in Emsi tools, the name of the county is embedded in the name of the census tract. Beyond this, it is not possible to name-search a region and get a list of tracts matching that region.
We advise that users who need to find a list of tracts to approximate their region use the Census’s selection map tool. This tool can be accessed by clicking into any one of the items under the “Maps and Visualizations” section:
Toggle the map to census tracts. From there, zoom in on a region to show the tracts in the region.
Emsi tracts are named similarly to Census Bureau conventions, e.g. “201.00 (in Autauga County, AL)”. Emsi adds a unique identifier number that combines the code of the county in which the tract exists with the Census Bureau’s tract code. Census tract 201.00 (in Autauga County, AL) is tract number 1001020100: county 1001, tract 201.00 (020100).
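As a rough sketch, the identifier in the example above can be reproduced by concatenating the county code with a six-digit tract code. The function `tract_identifier` is hypothetical, and the padding rules (four digits before the decimal point, two after) are an assumption inferred from the single example given here, not a documented Emsi specification.

```python
def tract_identifier(county_code, tract_name):
    """Build an identifier such as 1001020100 from a county code and a
    tract name such as "201.00" (hypothetical helper, assumed padding)."""
    whole, _, fraction = tract_name.partition(".")
    # Assumed layout: 4-digit whole part plus 2-digit suffix, zero padded.
    tract_code = whole.zfill(4) + fraction.ljust(2, "0")
    return f"{county_code}{tract_code}"


# Autauga County, AL (county code 1001), census tract 201.00.
autauga_tract = tract_identifier("1001", "201.00")
print(autauga_tract)
```

The county code is passed through as written, so the missing leading zero in 1001 is preserved exactly as it appears in the example.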
Let us know what specific questions we can help you with (we may even add your question to our knowledge base).