Emsi Burning Glass (now Lightcast) Job Data

Burning Glass Technologies (now Lightcast) is an IT service and consulting firm that uses artificial intelligence to provide job market analytics. Lightcast has one of the world’s largest real-time, proprietary databases of job openings and career histories, with data collected daily from more than 40,000 sources in more than 30 countries.


Requesting and Retrieving Lightcast Data

The Lightcast license limits use of the data to specific Stevens Institute researchers working with Professor Jeff Nickerson. For potential collaboration, please reach out to him directly.

Lightcast Data Documentation (fscstor03)

Note: If you received access to the Lightcast database on the blade01 server, please scroll down to the next section for documentation.

The Hanlon Lab has two PostgreSQL servers with a Lightcast database. In the "fscstor03" server, the Lightcast database is called "burning_glass_xml," and it contains 3 tables: certs, jobs, and skills. These tables are shown in the screenshot below from pgAdmin.

Each table has a unique set of columns of data, with each row referring to a specific job posting. A brief description of each table is listed below.

  1. jobs: the base table, contains 56 columns of job descriptors such as the Lightcast job ID, date the posting was acquired, industry classifications, educational requirements, etc.
  2. certs: lists certifications required from job postings, contains five columns (more documentation below)
  3. skills: lists skills associated with job postings, contains six columns (more documentation below)

To extract data from any one of these tables, you can run SQL queries in the "Query Tool" in pgAdmin. A simple query is shown below that returns all of the columns for the first 100 rows from the jobs table (only the first 11 columns and the first 9 rows are shown in the screenshot).
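
As a rough sketch, the simple query described above could look like the following. Only the SQL string is meant to be pasted into the pgAdmin Query Tool. The rest is an assumption made for illustration: an in-memory SQLite table stands in for the real PostgreSQL server, with a hypothetical three-column subset of the jobs table and made-up placeholder rows, so that the example can run on its own.

```python
import sqlite3

# SQL you would paste into the pgAdmin Query Tool on fscstor03.
FIRST_100_JOBS = "SELECT * FROM jobs LIMIT 100;"

# Stand-in for the real server (assumption): an in-memory SQLite table
# with a small subset of the jobs columns and placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id INTEGER, job_year INTEGER, clean_job_title TEXT)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(i, 2019, f"Placeholder Title {i}") for i in range(1, 251)],
)

# LIMIT caps the result at 100 rows even though the table holds more.
rows = conn.execute(FIRST_100_JOBS).fetchall()
print(len(rows))
```

Without an ORDER BY clause, the database is free to return any 100 rows, so add one (for example, ORDER BY job_id) if you need a repeatable sample.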


All three tables contain two basic columns: job_id and job_year. Brief descriptions of these columns are listed below.

  1. job_id: unique ID generated by Lightcast which identifies the parsed posting
  2. job_year: year the job was posted

The jobs table contains the job_id and job_year columns, along with 54 other columns of job descriptors. Brief descriptions of these columns are listed below, in the order they appear in the database. Closely related columns are combined into one description.

  1. clean_job_title: contains the posted job title after removing any extraneous text and/or noise from posted job title (Ex: “Registered Nurse NJ $$$” in the job posting’s title field is provided as “Registered Nurse”)
  2. job_domain: domain from which the posting was acquired (Ex: www.usajobs.gov)
  3. canon_city/canon_state/canon_country: canonicalized city/state/country; for canon_city, if the city’s alias is specified in the job posting, canon_city provides the canonicalized city name (Ex: Anderson Acres is canonicalized into Reno)
  4. job_date: date the posting was acquired
  5. job_text: contains the text of the job posting
  6. job_url: URL from which the posting was acquired
  7. posting_html: raw content of the job posting, including HTML tags
  8. source: indicates the source of the job (Ex: Job Board, Company, Recruiter etc.)
  9. job_reference_id: company-specific job reference number listed in the job posting
  10. email: contact email specified in the job posting
  11. canon_employer: standardized version of employer names so that variants of an employer name are grouped together (Ex: Burning Glass, Burning Glass Technologies, and Burning Glass International, Inc. are standardized to Burning Glass Technologies)
  12. stock_ticker: stock ticker for the employer where available
  13. latitude: latitude of the canonicalized location
  14. longitude: longitude for the canonicalized location
  15. canon_intermediary: if the job posting was acquired from an intermediary (Ex: a recruiter), the intermediary is listed here (and not in canon_employer)
  16. telephone: contact phone number specified in the job posting
  17. canon_job_title: standardized version of the job title listed in the posting to enable improved search and categorization (Ex: Oracle Financial Analyst and Financial Analyst/Decision Support are standardized to Financial Analyst)
  18. canon_county: canonicalized county
  19. division_code: Metropolitan/Micropolitan Divisions, as defined by the Office of Management and Budget 2009 MSA lookup, listed with Area Type
  20. msa: Metropolitan Statistical Area (MSA) code (more information here)
  21. lma: Labor Market Area as defined by the 2010 LMA Directory from the Bureau of Labor Statistics (BLS)
  22. internship_flag: indicates if the job is an internship
  23. consolidated_onet: Occupational Information Network (O*NET) occupation code (see #33: bgt_occ)
  24. is_duplicate/is_duplicate_of: is_duplicate indicates whether the posting is a duplicate of another posting already received in a current/past feed file; when is_duplicate is true, is_duplicate_of provides the Job ID of the first posting received
  25. canon_maximum_degree/canon_minimum_degree: canonicalized maximum/minimum degree level, specified in the job posting
  26. canon_preferred_degrees: canonicalized preferred degree level, specified in the job posting
  27. canon_required_degrees: canonicalized required degree level, specified in the job posting
  28. canon_other_degrees: canonicalized other degree level, specified in the job posting
  29. cip_code: lists the Classification of Instructional Program (CIP) codes for the field of study associated with the job (more info here)
  30. standard_major: the standardized form of majors extracted from the job posting
  31. max_experience/min_experience: maximum/minimum required experience, in years, specified in the job posting
  32. consolidated_inferred_naics: North American Industry Classification System (NAICS) code (more digits indicate more specificity, search NAICS codes here)
  33. bgt_occ: Burning Glass Technologies (BGT) occupation job classification/name, derived from the Bureau of Labor Statistics’ (BLS) SOC and O*NET codes (see this BLS page and this page about the O*NET-SOC Taxonomy)
  34. max_annual_salary/max_hourly_salary: maximum annual/hourly salary specified in the job posting (Note: if only one of the two max rates is specified in the job posting, the other is derived from it: an hourly rate is multiplied by 2,080 to give the annual rate, and an annual rate is divided by 2,080 to give the hourly rate)
  35. min_annual_salary/min_hourly_salary: minimum annual/hourly salary specified in the job posting (Note: if only one of the two min rates is specified in the job posting, the other is derived from it: an hourly rate is multiplied by 2,080 to give the annual rate, and an annual rate is divided by 2,080 to give the hourly rate)
  36. year_of_experience: the amount of experience required for the job, as specified in the job posting
  37. canon_job_hours: a canonicalized value of the time requirements specified in the job posting (Ex: fulltime)
  38. canon_job_type: a canonicalized value of the type of employment specified in the job posting (Ex: permanent, temporary)
  39. canon_postal_code: ZIP code of the canonicalized location
  40. canon_years_of_experience_level/canon_years_of_experience_canon_level: canonicalized descriptor/value of the amount of experience required, in amount/years, as specified in the job posting (Ex: low/1-6 years experience)
  41. consolidated_title: Burning Glass’ best title for the position described in the job posting
  42. job_language: detected language in which the job posting was written
  43. bgt_sub_occ: specialized occupation for the position specified by the job posting
  44. consolidated_degree_levels: canonicalized value, in years, of the amount of education required, as specified in the job posting
  45. max_degree_level/min_degree_level: canonicalized value of the maximum/minimum amount of education required, specified in the job posting
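
The salary notes in items 34 and 35 above come down to one constant. The sketch below shows the arithmetic only; the function names are hypothetical, and the reading of 2,080 as 40 hours per week times 52 weeks is an assumption that the source does not state.

```python
# 2,080 working hours per year; presumably 40 hours/week * 52 weeks
# (assumption, the documentation gives only the number itself).
HOURS_PER_YEAR = 2080


def annual_from_hourly(hourly_rate: float) -> float:
    """Derive an annual salary from an hourly rate."""
    return hourly_rate * HOURS_PER_YEAR


def hourly_from_annual(annual_salary: float) -> float:
    """Derive an hourly rate from an annual salary."""
    return annual_salary / HOURS_PER_YEAR


print(annual_from_hourly(25))     # 25 * 2080 = 52000
print(hourly_from_annual(52000))  # 52000 / 2080 = 25.0
```

Because one rate in each pair may be derived rather than posted, a salary that is an exact multiple of 2,080 may have originated as an hourly rate.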

The certs table contains the job_id and job_year columns, and three other columns describing a certification required for each job. Brief descriptions of these columns are listed below.

Note: Job postings can list multiple certifications required for the position. Therefore, different rows in the certs table can refer to the same job (they will have matching job_id values).

  1. certs_id: unique ID assigned to each certification for each job
  2. cert_name: name of certification
  3. cert_type: type of certification: license, certification, or registered (Ex: a registered nurse is a registered certification) (Note: capitalization is inconsistent, with some rows listed as 'License' and others as 'license'; keep this in mind as you run queries)
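
One way to handle the inconsistent capitalization of cert_type is to lowercase the column inside the query. The sketch below is illustrative only: the SQL string is what you would run in pgAdmin, while the in-memory SQLite table and its placeholder rows are assumptions standing in for the real certs table.

```python
import sqlite3

# Count certifications per type, treating 'License' and 'license' as one.
CERT_TYPE_COUNTS = """
SELECT LOWER(cert_type) AS cert_type_norm, COUNT(*) AS n
FROM certs
GROUP BY LOWER(cert_type)
ORDER BY cert_type_norm;
"""

# Stand-in table with placeholder rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE certs (certs_id INTEGER, job_id INTEGER, "
    "job_year INTEGER, cert_name TEXT, cert_type TEXT)"
)
conn.executemany(
    "INSERT INTO certs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 2019, "Placeholder A", "License"),
        (2, 101, 2019, "Placeholder B", "license"),
        (3, 101, 2019, "Placeholder C", "certification"),
    ],
)

counts = conn.execute(CERT_TYPE_COUNTS).fetchall()
print(counts)  # [('certification', 1), ('license', 2)]
```

The same LOWER(cert_type) expression can be used in a WHERE clause, for example WHERE LOWER(cert_type) = 'license'.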

The skills table contains the job_id and job_year columns, and four other columns describing a skill associated with each job. Brief descriptions of these columns are listed below.

Note: Job postings will often list multiple skills desirable for the position. Therefore, different rows in the skills table can refer to the same job (they will have matching job_id values).

  1. skills_id: unique ID assigned to each skill for each job
  2. skill_cluster: the skill "group" that the skill belongs to (similar skills that are commonly trained together or are substitutable in many labor market contexts)
  3. canon_skill: canonicalized skill name, specified in the job posting
  4. salary: minimum annual salary
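
Since several skills rows can share one job_id, a join back to the jobs table returns one row per job-skill pair. The following sketch joins the two tables and then groups the skills per posting in Python. The SQL uses only column names documented above; the SQLite stand-in tables, the reduced column sets, and the sample rows are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

# One output row per (job, skill) pair, matched on job_id.
SKILLS_PER_JOB = """
SELECT j.job_id, j.clean_job_title, s.canon_skill
FROM jobs AS j
JOIN skills AS s ON s.job_id = j.job_id
ORDER BY j.job_id, s.canon_skill;
"""

# Stand-in tables with made-up rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_id INTEGER, job_year INTEGER, clean_job_title TEXT)"
)
conn.execute(
    "CREATE TABLE skills (skills_id INTEGER, job_id INTEGER, "
    "job_year INTEGER, canon_skill TEXT)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(1, 2019, "Registered Nurse"), (2, 2019, "Financial Analyst")],
)
conn.executemany(
    "INSERT INTO skills VALUES (?, ?, ?, ?)",
    [(10, 1, 2019, "Patient Care"), (11, 1, 2019, "CPR"), (12, 2, 2019, "SQL")],
)

# Collapse the one-to-many result into a list of skills per posting.
skills_by_job = defaultdict(list)
for job_id, title, skill in conn.execute(SKILLS_PER_JOB):
    skills_by_job[(job_id, title)].append(skill)

print(dict(skills_by_job))
```

On the full database, restrict the query with a WHERE clause (for example on job_year) before joining, since both tables are large.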

Lightcast Data Documentation (blade01)

The Hanlon Lab's other server, "blade01," manages the second Lightcast database, "burning_glass_csv_3." It contains seven tables: certs, cip, degree, job_text, main, major, and skill. These tables are shown in the screenshot below from pgAdmin (there may be another table named main_2007 when you access the database; please ignore this table and use the main table instead).  

Each table has a unique set of columns of data, with each row referring to a specific job posting. A brief description of each table is listed below.

  1. main: the base table, contains 53 columns of job descriptors such as the Burning Glass Technologies Job ID, date the posting was acquired, industry classifications, educational requirements, etc.
  2. certs: lists certifications associated with job postings (certification column), contains four columns (a job posting can list multiple certifications, which appear in separate rows)
  3. cip: lists the Classification of Instructional Program (CIP) codes for the field of study associated with the job (cip column), contains four columns (more info here)
  4. degree: indicates the amount of educational experience required, specified in the job posting (degreelevel column, value is canonicalized in years), contains four columns (in the documentation for the main table below, see #19: edu/maxedu/degree/maxdegree)
  5. job_text: contains the text of the job posting (jobtext column), contains four columns
  6. major: indicates field of study associated with the job (stdmajor column, which stands for standard major, the standardized form of majors extracted from the job posting), contains four columns
  7. skill: lists skills associated with job postings, contains nine columns (more documentation below)

To extract data from any one of these tables, you can run SQL queries in the "Query Tool" in pgAdmin. A simple query is shown below that returns all of the columns for the first 100 rows from the main table (only the first 15 columns and the first 9 rows are shown in the screenshot).


All tables contain three basic columns: bgtjobid, jobdate, and salary. Brief descriptions of these columns are listed below.

  1. bgtjobid: Burning Glass Technologies job identifier, unique to each posting
  2. jobdate: date the posting was acquired
  3. salary: minimum annual salary

All tables with only four columns will contain these three basic columns plus one column specific to that table, whose name is in parentheses in the table descriptions from earlier.
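
Because every table carries bgtjobid, a four-column table can be attached to main through that column. The sketch below uses a LEFT JOIN so that postings with no certification still appear, with an empty certification value. The column names come from the descriptions on this page, and treating bgtjobid as the join key is inferred from it being described as unique to each posting. The SQLite stand-in, the reduced main table, and the sample rows are assumptions.

```python
import sqlite3

# Keep every posting from main; certification is NULL when none is listed.
TITLES_WITH_CERTS = """
SELECT m.bgtjobid, m.cleantitle, c.certification
FROM main AS m
LEFT JOIN certs AS c ON c.bgtjobid = m.bgtjobid
ORDER BY m.bgtjobid, c.certification;
"""

# Stand-in tables with placeholder rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (bgtjobid INTEGER, jobdate TEXT, cleantitle TEXT)")
conn.execute(
    "CREATE TABLE certs (bgtjobid INTEGER, jobdate TEXT, "
    "salary REAL, certification TEXT)"
)
conn.executemany(
    "INSERT INTO main VALUES (?, ?, ?)",
    [(1, "2019-01-15", "Registered Nurse"), (2, "2019-01-16", "Financial Analyst")],
)
conn.executemany(
    "INSERT INTO certs VALUES (?, ?, ?, ?)",
    [
        (1, "2019-01-15", None, "Placeholder Certification A"),
        (1, "2019-01-15", None, "Placeholder Certification B"),
    ],
)

result = conn.execute(TITLES_WITH_CERTS).fetchall()
for row in result:
    print(row)
```

Replacing LEFT JOIN with JOIN would drop postings that list no certification.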


The skill table contains six additional columns besides the three basic ones listed above. A brief description for each is listed below.

Note: Job postings will often list multiple skills desired for the position. Therefore, different rows in the skill table can refer to the same job.

  1. skill: a canonicalized name of a skill listed in the posting
  2. skillcluster: the skill "group" that the skill belongs to (similar skills that are commonly trained together or are substitutable in many labor market contexts)
  3. skillclusterfamily: the skill "family" that the skill belongs to (the most general layer of the BGT skill taxonomy, each skill and skill cluster belong to exactly one family)
  4. isspecialized: indicates if the skill is job-specific (Ex: welding, software development, financial analysis)
  5. isbaseline: indicates if the skill is a general skill (Ex: communication, problem-solving, creativity)
  6. issoftware: indicates if the skill is a computer-based skill (Ex: Adobe Photoshop, SQL, AutoCAD)

The main table contains the bgtjobid and jobdate columns, and the salary column is included as minsalary (see #21 below). It contains 50 other columns of job descriptors, brief descriptions of which are listed below, in the order they appear in the database. Closely related columns (Ex: columns referring to a classification code/name) are combined into one description.

  1. jobid: unique ID generated by Burning Glass which identifies the parsed posting (not to be confused with bgtjobid)
  2. cleantitle: contains the posted job title after removing any extraneous text and/or noise from posted job title (Ex: “Registered Nurse NJ $$$” in the job posting’s title field is provided as “Registered Nurse”)
  3. canontitle: standardized version of the job title listed in the posting to enable improved search and categorization (Ex: Oracle Financial Analyst and Financial Analyst/Decision Support are standardized to Financial Analyst)
  4. occfam/occfamname: the major occupation family code/name of the job posting, least specific (see bgtocc below)
  5. soc/socname: Standard Occupational Classification (SOC) code/name, more specific than occfam/occfamname but less specific than onet/onetname (see bgtocc below)
  6. onet/onetname: Occupational Information Network (O*NET) occupation code/name, most specific (see bgtocc below)
  7. specialty: general name of the occupation, similar to socname and onetname
  8. bgtocc/bgtoccname: Burning Glass Technologies (BGT) occupation job classification/name, derived from the Bureau of Labor Statistics’ (BLS) SOC and O*NET codes (see this page by the BLS and this page about the O*NET-SOC Taxonomy)
  9. bgtoccgroupname/bgtoccgroupname2: BGT occupation group name(s)
  10. bgtcareerareaname: BGT career area name
  11. employer: standardized version of employer names so that variants of an employer name are grouped together (Ex: postings from Burning Glass, Burning Glass Technologies, and Burning Glass International, Inc. are standardized to Burning Glass Technologies)
  12. sector/sectorname: 2-digit NAICS sector code/name that the company operates in (see below)
  13. naics3/naics4/naics5/naics6: North American Industry Classification System (NAICS) code, where naics3 contains 3 digits, naics4 contains 4, etc. (more digits indicate more specificity, search NAICS codes here)
  14. city/state/county: columns related to the location of the job
  15. fipsstate/fipscounty/fips: Federal Information Processing Standard (FIPS) state/county/combined state and county code for the job (listed here)
  16. lat: latitude of the canonicalized location
  17. lon: longitude of the canonicalized location
  18. bestfitmsa/bestfitmsaname/bestfitmsatype/msa/msaname: Metropolitan Statistical Area (MSA) code/name/type (more information here)
  19. edu/maxedu/degree/maxdegree: canonicalized value/degree level of the educational requirements specified in the job posting
  20. exp/maxexp: amount of experience required for the job, as specified in the job posting
  21. minsalary/maxsalary: minimum/maximum annual salary, as specified in the job posting (Note: if a min/max hourly rate is specified in the job posting, the value is derived by multiplying the min/max hourly rate by 2,080)
  22. minhrlysalary/maxhrlysalary: minimum/maximum hourly salary, as specified in the job posting (Note: if a min/max annual rate is specified in the job posting, the value is derived by dividing the min/max annual rate by 2,080)
  23. payfrequency: how often employees are paid (Note: data may contain errors as there are many database entries with "hourly" or "daily" pay frequencies, which is not reasonable)
  24. salarytype: type of salary listed in the job posting (Ex: basepay, bonus, commission)
  25. jobhours: a canonicalized value of the time requirements specified in the job posting (Ex: fulltime, parttime)
  26. taxterm: indicates whether the employer will consider a hired worker for the job as an employee, contractor, or self-employed (and thus, how the company will withhold taxes)
  27. internship: indicates if the job is an internship
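
The NAICS columns in items 12 and 13 above follow the hierarchical structure of NAICS codes, in which each shorter code is a prefix of the longer one. The helper below is a hypothetical illustration of that relationship, not part of the database. It assumes codes are handled as strings, and how the sector column encodes sectors that span several 2-digit codes (such as Manufacturing, 31-33) is not covered here.

```python
def naics_levels(naics6: str) -> dict:
    """Split a 6-digit NAICS code into its less specific prefixes."""
    if len(naics6) != 6 or not naics6.isdigit():
        raise ValueError(f"expected a 6-digit NAICS code, got {naics6!r}")
    return {
        "sector": naics6[:2],  # 2-digit sector
        "naics3": naics6[:3],
        "naics4": naics6[:4],
        "naics5": naics6[:5],
        "naics6": naics6,      # most specific level
    }


levels = naics_levels("541511")
print(levels["naics3"], levels["naics4"], levels["naics5"])
```

In a query, the same idea lets you select a whole industry group by filtering on a shorter column, for example all rows whose naics4 matches a given 4-digit code.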

For any questions or concerns, please contact us at fscadmin@stevens.edu.