
ZIP-Level Employment Data


Emsi offers employment data by industry and occupation at the ZIP code level. We begin with Emsi’s final county-level industry data. ZIP-level industry data is created by disaggregating county-level industry data down to the ZIP level with the help of several outside sources. ZIP-level occupation data is created by applying staffing patterns to the ZIP-level industry data.

ZIP data should be used while keeping certain cautions in mind. We outline these after describing the process by which Emsi’s ZIP-level estimates are created.

Creation of ZIP-Level Data

The first section describes the creation of industry estimates, and the second section outlines occupation estimates.

Modeling Industry Data from County to ZIP

The backbone of ZIP-level data is Emsi county-level data, which is built using the BLS’s Quarterly Census of Employment and Wages (QCEW) dataset, the most complete and trustworthy source of employment data available in the United States. We use these numbers as the foundation for ZIP-level data, ensuring that employment summed across the ZIPs in a county exactly matches employment at the county level.

To model the county-level industry data down to the ZIP level, we use the Census Bureau’s ZIP Code Business Patterns (ZBP) dataset to calculate each ZIP’s share of a county’s employment in each industry. For instance, if Emsi county data shows that a 3-ZIP county has employment of 200 in industry x, and ZBP shows employment shares of 57%, 43%, and 0% for that industry across the ZIPs in that county, we assign 114 jobs, 86 jobs, and 0 jobs for that industry to the respective ZIPs.
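The proportional split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `allocate_to_zips`, the ZIP labels, and the largest-remainder rounding rule are hypothetical, since the source does not say how fractional jobs are handled. Largest-remainder rounding is used here simply because it keeps the ZIP totals equal to the county total.

```python
def allocate_to_zips(county_jobs, zbp_employment):
    """Split one county-industry job count across ZIPs in proportion to ZBP.

    county_jobs: integer job count for one industry in one county.
    zbp_employment: dict of ZIP code -> integer ZBP employment (the weights).
    Returns a dict of ZIP code -> integer jobs that sums to county_jobs.
    """
    total_weight = sum(zbp_employment.values())
    if total_weight == 0:
        raise ValueError("no ZBP employment to build shares from")

    allocated = {}
    remainders = {}
    for zip_code, weight in zbp_employment.items():
        # Integer arithmetic avoids floating-point drift in the shares.
        allocated[zip_code], remainders[zip_code] = divmod(
            county_jobs * weight, total_weight
        )

    # Hand out any jobs lost to rounding down, largest remainder first
    # (an assumed rule; the source does not describe the rounding step).
    leftover = county_jobs - sum(allocated.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for zip_code in by_remainder[:leftover]:
        allocated[zip_code] += 1
    return allocated


# The 3-ZIP county from the text: 200 jobs, ZBP shares of 57%, 43%, and 0%.
example = allocate_to_zips(200, {"ZIP_A": 57, "ZIP_B": 43, "ZIP_C": 0})
print(example)  # {'ZIP_A': 114, 'ZIP_B': 86, 'ZIP_C': 0}
```

Because every leftover job is reassigned, the ZIP values always sum to the county value, which is the property the county-to-ZIP model relies on.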

If Emsi’s county-level data contains employment for an industry but ZBP shows no employment for that industry, we move up to the parent 5-digit NAICS code and check ZBP again. We repeat this step, up to the 2-digit NAICS level if necessary, until we find data in ZBP.
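A minimal sketch of this roll-up, assuming ZBP employment for one county is held in a dictionary keyed by plain NAICS digit strings (a hypothetical layout, as are the names `find_zbp_weights` and `zbp_by_naics`). Real NAICS sectors such as 31-33 span several 2-digit prefixes, so an actual implementation would need a mapping for those cases; that is left out here.

```python
def find_zbp_weights(naics_code, zbp_by_naics):
    """Return (code_used, weights) for the most detailed NAICS level with ZBP data.

    naics_code: industry code as a digit string, e.g. a 6-digit NAICS code.
    zbp_by_naics: dict of NAICS code -> {ZIP code: ZBP employment} for one county.
    Returns None when no level from the given code up to 2 digits has data.
    """
    code = naics_code
    while len(code) >= 2:
        weights = zbp_by_naics.get(code)
        if weights and sum(weights.values()) > 0:
            return code, weights
        code = code[:-1]  # drop the last digit to move up to the parent code
    return None


# Hypothetical county where ZBP only has data at the 4-digit level.
county_zbp = {"3118": {"ZIP_A": 10, "ZIP_B": 30}}
print(find_zbp_weights("311811", county_zbp))
# ('3118', {'ZIP_A': 10, 'ZIP_B': 30})
```

A `None` result corresponds to the case where no ZBP data exists for the county-industry combination at any level.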

We use USPS’s DelStat dataset to create default fallback proportions for each county in case no ZBP data is available for a given county-industry combination. DelStat provides business address counts by ZIP. We create a default proportion for each county by counting the number of business addresses in each ZIP within the county. This gives each county its own business address percentage map, showing what percent of the county’s business addresses are in each ZIP. If the initial method of using ZBP to assign an industry’s employment to ZIPs doesn’t work, we fall back to the county’s default percentage map to distribute employment for that industry. The fallback method is necessary in only 0.5% of cases.
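The default percentage map and the fallback decision might look like the following sketch. The function names and the sample address counts are hypothetical; the only behavior taken from the text is that shares come from business address counts and are used when ZBP offers nothing for the county-industry combination.

```python
def default_zip_shares(business_addresses):
    """Turn business address counts by ZIP into each ZIP's share of the county."""
    total = sum(business_addresses.values())
    if total == 0:
        raise ValueError("county has no business addresses")
    return {zip_code: count / total for zip_code, count in business_addresses.items()}


def choose_shares(zbp_weights, county_default_shares):
    """Prefer ZBP-based shares; fall back to the county default map otherwise."""
    zbp_total = sum(zbp_weights.values()) if zbp_weights else 0
    if zbp_total > 0:
        return {zip_code: w / zbp_total for zip_code, w in zbp_weights.items()}
    return county_default_shares


# Hypothetical county with 40 business addresses across two ZIPs.
defaults = default_zip_shares({"ZIP_A": 30, "ZIP_B": 10})
print(defaults)                       # {'ZIP_A': 0.75, 'ZIP_B': 0.25}
print(choose_shares(None, defaults))  # falls back to the default map
```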

Modeling ZIP Occupation Data from ZIP Industry Data

Emsi ZIP occupation data is created in the same way as Emsi county occupation data: we use regionalized staffing patterns created from the BLS’s Occupational Employment Statistics (OES) dataset. OES provides a national-level staffing pattern, which we regionalize using regional industry and occupation data for each OES substate region. These staffing patterns are then applied to Emsi county-level industry data, producing county-level occupation data, and are also applied to ZIP-level industry data, producing ZIP-level occupation data.
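Applying a staffing pattern amounts to multiplying each industry's jobs by the share of that industry's workforce in each occupation, then summing by occupation. The sketch below assumes a staffing pattern stored as nested dictionaries; the function name, the industry and occupation labels, and the shares are all made up for illustration, and the regionalization step itself is not shown.

```python
from collections import defaultdict


def apply_staffing_pattern(industry_jobs, staffing_pattern):
    """Convert industry jobs in one area into occupation jobs.

    industry_jobs: dict of industry -> jobs in the area (a ZIP or a county).
    staffing_pattern: dict of industry -> {occupation: share of industry jobs}.
    """
    occupation_jobs = defaultdict(float)
    for industry, jobs in industry_jobs.items():
        # Industries missing from the pattern contribute nothing here.
        for occupation, share in staffing_pattern.get(industry, {}).items():
            occupation_jobs[occupation] += jobs * share
    return dict(occupation_jobs)


# Hypothetical ZIP with two industries and a made-up staffing pattern.
zip_industry_jobs = {"industry_x": 100, "industry_y": 40}
pattern = {
    "industry_x": {"occupation_a": 0.75, "occupation_b": 0.25},
    "industry_y": {"occupation_a": 0.5, "occupation_c": 0.5},
}
print(apply_staffing_pattern(zip_industry_jobs, pattern))
# {'occupation_a': 95.0, 'occupation_b': 25.0, 'occupation_c': 20.0}
```

The same function works for a county or a ZIP; only the industry job counts passed in differ.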

Cautions About ZIP-Level Data

Users should keep several things in mind when using Emsi ZIP-level data. First, ZIP codes are not official, geographically bounded areas, unlike states and counties. Second, Emsi uses the Post Office’s monthly-updated ZIP code definitions. Third, no source of complete ZIP-level data exists. Finally, Emsi ZIP-level data is not a time series.

ZIPs Are Not Geographies

ZIP codes are collections of addresses used by the Post Office to efficiently deliver mail. Many ZIPs in the United States are points. For instance, a Post Office building, a large apartment complex, or a business may have its own ZIP code. The U.S. even has one floating ZIP code.

Many institutions, such as the Census Bureau and the Department of Housing and Urban Development, maintain their own versions of ZIP codes, with different update schedules and therefore very different definitions of what constitutes a ZIP code at any given time. Because there is no official definition, ZIP code data rarely matches between any two given sources. Differences often come down to the underlying ZIP definitions used and, in the case of GIS or mapping software, the source used to render those ZIP codes visually.

USPS Monthly ZIP Update Files

The Post Office’s ZIP definitions change monthly as carrier routes morph. Emsi defines ZIP codes using the latest definition available from the Post Office at the time of each quarterly datarun. Many other sources of ZIP code data take their ZIP code definitions from sources other than the Post Office, or use definitions that are out of sync with the version used in Emsi data at any given time.

Complete ZIP Data Does Not Exist

Because ZIP codes are not official geographies (unlike Census Tracts and Blocks), complete data for ZIP codes does not exist in the United States. The Census Bureau’s LEHD LODES dataset is available at the ZIP level, but only provides data for 2-digit NAICS. The Census Bureau’s ZIP Code Business Patterns (ZBP) dataset is the closest thing to a complete set, but even it is fairly incomplete.

ZIP Data Not a Time Series

Finally, Emsi ZIP code data is not a time series. We take a time series of county-level data and apply breakout percentages based on the latest year of ZBP to each year in the county-level time series. For example, when 2016 ZBP was released, its breakout percentages were applied to 2018 county-level jobs, 2017 county-level jobs, 2016 county-level jobs, etc., back to 2001. The 2016 ZBP breakout was also applied to projected years of data, i.e. to 2019 through 2028 county-level data. When 2017 ZBP is released, the 2017 ZBP breakout will be applied to all historical and projected years of data, creating a new “time series” internally consistent with 2017 ZBP.

As a result, ZIP-level employment data for all years changes each year when ZBP is updated. Each new year’s ZIP-level data is therefore a snapshot rather than a time series and should be treated as such. Since ZBP is a volatile dataset, volatility in employment between dataruns in which ZBP is updated is expected.
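To make the snapshot behavior concrete, the sketch below applies a single set of ZIP shares to every year of a county series, then rebuilds the series with a newer set of shares. The function name, job counts, and shares are invented for illustration and are not Emsi or ZBP figures.

```python
def zip_series_from_breakout(county_jobs_by_year, zip_shares):
    """Apply one fixed set of ZIP shares to every year of a county series."""
    return {
        year: {zip_code: jobs * share for zip_code, share in zip_shares.items()}
        for year, jobs in county_jobs_by_year.items()
    }


# Hypothetical county series for one industry.
county_jobs = {2016: 1000, 2017: 1100, 2018: 1200}

# Breakout from one ZBP release, applied to all years at once.
shares_2016_zbp = {"ZIP_A": 0.75, "ZIP_B": 0.25}
old_series = zip_series_from_breakout(county_jobs, shares_2016_zbp)

# The next ZBP release replaces the shares, so every year is recomputed.
shares_2017_zbp = {"ZIP_A": 0.5, "ZIP_B": 0.5}
new_series = zip_series_from_breakout(county_jobs, shares_2017_zbp)

print(old_series[2016]["ZIP_A"], new_series[2016]["ZIP_A"])  # 750.0 500.0
```

Within either series, each ZIP's share of the county is identical in every year, so year-over-year movement in a ZIP only mirrors the county trend. Comparing a year across the two series shows the jump that a ZBP update can introduce between dataruns.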


