Types

Custom types used in Jamie and data schemas.

class jamie.types.Alert(value)

Alert levels for reporting

class jamie.types.Contract(value)

Contract type: Fixed Term or Permanent

class jamie.types.JobPrediction(prediction)

Represents prediction for a single job

jobid

JobID from jobs.ac.uk

Type

str

job_title

Job title

Type

str

snapshot

Model snapshot used for prediction

Type

str

closes

Close date for job

Type

datetime.date

contract

Contract type

Type

Contract

department

Department of the academic institution that the job is associated with

Type

str

employer

Job employer

Type

str

date

Date of the job. This is usually the same as the posted date, but if that is not available, defaults to the date of job applications closing, or the earliest date found in the job description. This attribute should be used for computing timeseries.

Type

datetime.date

posted

Date job was posted

Type

datetime.date

extra_location

Broad geographical location of job position

Type

str

salary_min

Minimum salary associated with the job. Sometimes jobs have a range of salaries depending on the experience of the applicant.

Type

Optional[int]

salary_max

Maximum salary associated with the job. Sometimes jobs have a range of salaries depending on the experience of the applicant.

Type

Optional[int]

salary_median

Median salary associated with the job.

Type

Optional[int]

probability

Probability that the job is classified in the positive class

Type

float

probability_lower

Lower bound of the confidence interval for the probability

Type

float

probability_upper

Upper bound of the confidence interval for the probability

Type

float

Parameters

prediction (dict) – Dictionary representing a single prediction from the JSONL file generated by Predict
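The fallback order documented for the date attribute (posted date, then closing date, then the earliest date found in the job description) can be sketched as below. This is a minimal illustration under stated assumptions, not Jamie's own implementation: the function name resolve_date and the description_dates argument are hypothetical and do not appear in the library.

```python
import datetime
from typing import Iterable, Optional


def resolve_date(
    posted: Optional[datetime.date],
    closes: Optional[datetime.date],
    description_dates: Iterable[datetime.date] = (),
) -> Optional[datetime.date]:
    """Hypothetical helper mirroring the documented fallback order."""
    # Prefer the posted date when it is available
    if posted is not None:
        return posted
    # Otherwise use the date on which applications close
    if closes is not None:
        return closes
    # Last resort: earliest date mentioned in the description, if any
    return min(description_dates, default=None)
```

A caller building a timeseries would group on the value returned here rather than on posted, since posted may be missing for some jobs.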

class jamie.types.JobType(value)

An enumeration.

class jamie.types.PrecisionRecall(value)

An enumeration.

class jamie.types.TrainingData(description: str, job_title: str, aggregate_tags: int, placed_on: datetime.date, jobid: str, job_ref: str, contract: str, department: str, duration_ad_days: int, employer: str, enhanced: str, extra_location: str, final_bool: int, funding_amount: Optional[str], funding_for: Optional[str], hours: str, in_uk: bool, invalid_code: Optional[List[str]], json: Optional[str], location: str, not_student: bool, original: int, original_proba: float, qualification_type: str, reference: str, region: str, run_tag: str, salary: str, salary_max: Optional[float], salary_min: Optional[float], salary_median: Optional[float], subject_area: List[str], tags: List[str], tags_1: str, tags_2: str, tags_3: Optional[str], tag_count: int, agg_tags: float, multi_agg_tags: str, consensus_tags: str, diff_consensus_tags: str)

Schema for the training dataset.

Required columns for model training are ‘description’, ‘job_title’, ‘aggregate_tags’. The attribute ‘placed_on’ is required for timeseries graphs of the training data.
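As a rough sketch of the column requirements stated above, the check below reports which required columns are absent from a dataset. The names REQUIRED_COLUMNS, TIMESERIES_COLUMNS and missing_columns are hypothetical, introduced only for illustration; they are not part of jamie.types.

```python
from typing import Iterable, List

# Columns the documentation lists as required for model training
REQUIRED_COLUMNS = {"description", "job_title", "aggregate_tags"}
# Additional column needed only for timeseries graphs of the training data
TIMESERIES_COLUMNS = {"placed_on"}


def missing_columns(columns: Iterable[str], timeseries: bool = False) -> List[str]:
    """Return the required column names absent from `columns`, sorted."""
    required = set(REQUIRED_COLUMNS)
    if timeseries:
        required |= TIMESERIES_COLUMNS
    return sorted(required - set(columns))
```

Passing the column names of a loaded dataset (for example, list(df.columns) for a pandas DataFrame) gives an empty list when the dataset is usable for the chosen purpose.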

Parameters
  • description (str) – Job description

  • job_title (str) – Job title

  • aggregate_tags (int) – Integer equal to 0 or 1

  • placed_on (datetime.date) – Date the job advertisement was placed

  • jobid (str) – Unique jobid given by jobs.ac.uk

  • job_ref (str) – Job reference, possibly used internally by the employer

  • contract (str) – Contract type, fixed term or permanent, full-time or part-time

  • department (str) – Department of the employer

  • duration_ad_days (int) – Duration of the job advertisement in days, from placed_on to closes

  • employer (str) – Employer name

  • enhanced (str) – HTML content can be “enhanced” or “normal”, which alters the parsing

  • extra_location (str) – Region of UK where job is from

  • final_bool (int) – Unknown boolean type

  • funding_amount (Optional[str]) – Funding amount text if for a PhD position

  • funding_for (Optional[str]) – Specifies whether funding is for UK, EU, international or self-funded students

  • hours (str) – Specifies whether job is full time or part time

  • in_uk (bool) – Specifies whether the job is actually in the UK. Some jobs are advertised by UK institutions but located overseas

  • invalid_code (Optional[List[str]]) – List of job attributes that could not be parsed

  • json (Optional[str]) – JSON representation of job

  • location (str) – City where job is located

  • not_student (bool) – Whether job is a PhD level position

  • original (int) – Unknown boolean type

  • original_proba (float) – Unknown probability

  • qualification_type (str) – Type of qualification required for the job, in terms of education level

  • reference (str) – Unknown field

  • region (str) – Unknown field, possibly country of the UK where job is

  • run_tag (str) – Whether job was classified in first or second run

  • salary (str) – Text fragment which has information on salary

  • salary_max (Optional[float]) – If a salary range is specified, higher end of the salary range, otherwise same as median salary

  • salary_min (Optional[float]) – If a salary range is specified, lower end of the salary range, otherwise same as median salary

  • salary_median (Optional[float]) – Median of salary_min, salary_max if both are present, otherwise equals the salary value

  • subject_area (List[str]) – List of academic fields for the job

  • tags (List[str]) – List of tags (labels) given by coders to the job

  • tags_1 (str) – Label given to the job by coder 1, one of {‘No’, ‘Some’, ‘Insufficient Evidence’, ‘Most’}, when answering the question “How much time would be spent in this job developing software?”

  • tags_2 (str) – Label given to the job by coder 2, one of {‘No’, ‘Some’, ‘Insufficient Evidence’, ‘Most’}, when answering the question “How much time would be spent in this job developing software?”

  • tags_3 (Optional[str]) – Label given to job by coder 3 when coder 1 and coder 2 disagreed

  • tag_count (int) – Number of coders who classified the job

  • agg_tags (float) – Aggregate score from coders

  • aggregate_tags – Classification of whether the job is in the target class or not (1 indicating it is, 0 otherwise)

  • multi_agg_tags (str) – Unknown field

  • consensus_tags (str) – [tentative] Consensus of tags_1 and tags_2

  • diff_consensus_tags (str) – Unknown field
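The relationship between salary_min, salary_max and salary_median described in the parameter list can be illustrated as follows. This is a sketch of the documented rule only; the helper name salary_median_from_range is hypothetical, and how Jamie extracts the values from the salary text is not shown.

```python
from statistics import median
from typing import Optional


def salary_median_from_range(
    salary_min: Optional[float], salary_max: Optional[float]
) -> Optional[float]:
    """Hypothetical helper: median of the range, or the single known value."""
    # Keep only the bounds that are actually present
    values = [v for v in (salary_min, salary_max) if v is not None]
    # With two bounds this is their midpoint; with one it is that value
    return median(values) if values else None
```

When no range is given and a single salary is stated, salary_min and salary_max hold the same value, so the result equals that salary.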

static reliability(data, coders=3)

Returns a DataFrame that can be used to compute reliability

validate()

Validates a single row of training set data