Emsi Burning Glass Deduplication Process:
Emsi Burning Glass’s database is a full reflection of job listings posted across the Internet, so robust processes are required to identify and remove duplicate listings. Emsi Burning Glass applies a two-step approach to deduplication that results in up to 80% of all jobs we collect being removed as duplicates. The initial deduplication screen is applied at the source level: the spiders themselves contain the intelligence to identify records that have previously been aggregated and to refrain from collecting them again. However, because duplicates can also occur across sources, the next phase involves a thorough and ongoing analysis of the full database of aggregated content. This analysis is possible because advanced parsing engines extract and normalize a number of data elements from each job listing (e.g. job title, job ID, source, posting date, employer name, location), each of which can function as an individual duplicate screen or in concert with other variables.
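To make the two-step idea concrete, here is a minimal Python sketch. It is not Emsi Burning Glass's actual implementation: the `Posting` fields, the `normalize` helper, and the choice of matching keys (source plus job ID for the first screen; normalized title, employer, and location for the second) are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    # Hypothetical record layout; real parsed listings carry more fields.
    source: str
    job_id: str
    title: str
    employer: str
    location: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a stand-in for real normalization)."""
    return " ".join(text.lower().split())


def source_level_screen(postings):
    """Step 1: skip records already collected from the same source."""
    seen, kept = set(), []
    for p in postings:
        key = (p.source, p.job_id)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def cross_source_screen(postings):
    """Step 2: compare normalized fields across all sources."""
    seen, kept = set(), []
    for p in postings:
        key = (normalize(p.title), normalize(p.employer), normalize(p.location))
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def deduplicate(postings):
    return cross_source_screen(source_level_screen(postings))


sample = [
    Posting("board_a", "1", "Marketing Specialist", "Google", "Mountain View, CA"),
    Posting("board_a", "1", "Marketing Specialist", "Google", "Mountain View, CA"),
    Posting("board_b", "77", "marketing  specialist", "GOOGLE", "Mountain View, CA"),
    Posting("board_b", "78", "Data Analyst", "Google", "Mountain View, CA"),
]
unique = deduplicate(sample)
```

In this sample, the second record is dropped by the source-level screen and the third by the cross-source screen, leaving two unique postings.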
For deduplication, Emsi Burning Glass uses a 60-day rule to identify duplicates. For example, if a job for a Marketing Specialist at Google is posted for the first time on March 1st, Emsi Burning Glass treats it as the ‘original posting’ and, for the next 60 days, removes all possible duplicates. If Google is still actively posting the Marketing Specialist ad after those 60 days (approximately May 1st), we count the ad as a new posting and start tracking a new 60-day window. In theory, if Google posts the same ad every day for an entire year, Emsi Burning Glass will count it roughly 6 times.
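The 60-day rule can be sketched in a few lines of Python. This is an illustrative assumption of how such a rolling window might work, not the production logic: the function name `count_postings`, the year used in the dates, and the exact boundary handling (a sighting 60 or more days after the window start opens a new window) are all hypothetical.

```python
from datetime import date

WINDOW_DAYS = 60  # length of the deduplication window described above


def count_postings(sighting_dates, window_days=WINDOW_DAYS):
    """Count how many times one ad is treated as a new posting.

    The first sighting opens a window. Later sightings inside the window
    are treated as duplicates. A sighting at or beyond the window length
    (assumed boundary) counts as a new posting and opens a new window.
    """
    count = 0
    window_start = None
    for day in sorted(sighting_dates):
        if window_start is None or (day - window_start).days >= window_days:
            window_start = day
            count += 1
    return count


# Sightings of the same Marketing Specialist ad (year chosen arbitrarily).
sightings = [
    date(2021, 3, 1),   # original posting
    date(2021, 3, 15),  # duplicate
    date(2021, 4, 20),  # duplicate
    date(2021, 5, 1),   # 61 days after March 1: new posting
    date(2021, 6, 15),  # duplicate
    date(2021, 7, 5),   # 65 days after May 1: new posting
]
total = count_postings(sightings)
```

Here the six sightings are counted as three postings: the original on March 1st and the two that fall after a 60-day window has elapsed.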
Emsi Burning Glass updates real-time job postings data daily. Jobs captured by Emsi Burning Glass are typically processed and loaded into the product database within one to two days.
Let us know what specific questions we can help you with (we may even add your question to our knowledge base).