Models

Machine learning models

jamie.models.get_model(n)

Return the model object corresponding to the given name.

Parameters

n (str) – The name of the model

Returns

Model object

jamie.models.nested_cross_validation(models, X, y, scoring_value, snapshot, oversampling=False, nbr_folds=5, random_state=100)

Perform nested cross validation and return the best model. The set of models is defined in jamie.models. This function is generally not invoked directly; it is called through train().

Parameters
  • models (List[str]) – List of models to use; specify None to use all models

  • X (numpy.ndarray) – Feature matrix. This can be obtained by calling fit_transform() on a Features object

  • y (numpy.ndarray) – Binary labels, should have the same number of rows as X

  • scoring_value (score) – Which score type to use, same as in GridSearchCV

  • oversampling (bool) – Whether to perform oversampling to balance the dataset (default: False)

  • nbr_folds (int) – Number of folds for cross validation (default: 5)

  • random_state (int) – Seed to initialise the random state (default: 100)

  • snapshot (str) – Snapshot within which this is being run, used only for logging.

Returns

  • best_params (dict) – Best parameters for the final model

  • final_model (model) – Final model

  • score_for_outer_cv (pd.DataFrame) – Scores for outer cross validation for the various models
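As an illustration of the general technique, here is a minimal sketch of nested cross validation written directly against scikit-learn. It is not jamie's implementation: the function name nested_cv_sketch, the single LogisticRegression candidate, its parameter grid and the synthetic data are all assumptions made for the example, and oversampling is left out. An inner GridSearchCV tunes hyperparameters, while an outer loop scores the whole tuning procedure on held-out folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score


def nested_cv_sketch(X, y, scoring="f1", nbr_folds=5, random_state=100):
    """Hypothetical helper, not part of jamie: nested CV for one model."""
    # Inner folds select hyperparameters; outer folds estimate how well
    # the whole "search then fit" procedure generalises.
    inner_cv = StratifiedKFold(n_splits=nbr_folds, shuffle=True,
                               random_state=random_state)
    outer_cv = StratifiedKFold(n_splits=nbr_folds, shuffle=True,
                               random_state=random_state)

    # Assumed candidate model and grid, chosen only for illustration.
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.1, 1.0, 10.0]},
        scoring=scoring,
        cv=inner_cv,
    )

    # One score per outer fold; the grid search is re-run inside each fold.
    outer_scores = cross_val_score(search, X, y, scoring=scoring, cv=outer_cv)

    # Refit the search on all data to obtain the final model.
    search.fit(X, y)
    return search.best_params_, search.best_estimator_, outer_scores


# Synthetic binary classification data standing in for a real feature matrix.
X, y = make_classification(n_samples=200, n_features=10, random_state=100)
best_params, final_model, outer_scores = nested_cv_sketch(X, y)
```

The three returned values loosely mirror best_params, final_model and score_for_outer_cv above, except that the sketch returns a plain array of outer-fold scores for one model rather than a DataFrame covering several.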

jamie.models.parse_model_description(model_description, models=None, random_state=100)

Parse a model description. This function expands configuration values, such as hyperparameter ranges, from a string description to Python objects. The following substitutions are supported for parameter values:

  • =<start>:<stop>[:<step>] becomes range(start, stop, step)

  • =e<start>:<stop>[:<step>] becomes np.logspace(start, stop, num), where the optional third field is passed as num (the number of samples)

Parameters
  • model_description (dict) – Dictionary representing models with their configuration

  • models (Optional[List[str]]) – List of models to parse, by default parses all models

  • random_state (int) – Random state to initialise seed, default: 100

Returns

dict – Model description with parameters expanded using the above substitutions
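The two substitution forms can be sketched for a single value as follows. This is an illustrative reading of the syntax described above, not jamie's parser: the helper name expand_value is hypothetical, treating the third field of the e form as num follows the np.logspace signature, and falling back to NumPy's default of 50 samples when that field is omitted is an assumption.

```python
import numpy as np


def expand_value(value):
    """Hypothetical helper, not part of jamie: expand one config value."""
    # Anything that is not a string starting with "=" is returned unchanged.
    if not isinstance(value, str) or not value.startswith("="):
        return value

    spec = value[1:]
    log_scale = spec.startswith("e")
    if log_scale:
        spec = spec[1:]
    parts = spec.split(":")

    if log_scale:
        start, stop = float(parts[0]), float(parts[1])
        # Assumption: the third field is the number of samples; when it is
        # absent, use np.logspace's own default of 50.
        num = int(parts[2]) if len(parts) > 2 else 50
        return np.logspace(start, stop, num)

    # Linear form: integers passed straight to range().
    return range(*(int(p) for p in parts))


linear = expand_value("=1:10:2")    # same values as range(1, 10, 2)
log = expand_value("=e-3:3:7")      # same values as np.logspace(-3, 3, 7)
literal = expand_value("l2")        # non-matching values pass through
```

A real parser would also have to walk the nested model description dictionary and handle malformed strings; the sketch covers only the expansion of one value.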

jamie.models.train(config, snapshot, featureset, models, prediction_field, oversampling, scoring, random_state=100)

Train models and save model snapshots. Called when using jamie train.

Parameters
  • config (jamie.config.Config) – Configuration object

  • snapshot (jamie.snapshots.TrainingSnapshot) – Training snapshot to use

  • featureset (str) – Featureset to use

  • models (Optional[List[str]]) – List of models to train on, by default all models are selected

  • prediction_field (str) – Which column of the training set data to use for prediction.

  • oversampling (bool) – Whether to oversample to form a balanced set, passed to nested_cross_validation().

  • scoring (str) – Scoring method to use for grid search, passed to nested_cross_validation().

  • random_state (int) – Seed to initialise the random state (default: 100)