Full Sklearn pipeline as External Model #53

@jp-varela

Hello,
I am trying to register a custom sklearn pipeline in a PiML experiment, but I am getting this error:
    File "/tmp/ipykernel_35500/19077422.py", line 76, in objective
      exp.register(piml_pipeline, "pipeline")
    File "piml/api.py", line 2691, in piml.api.Experiment.register
    File "piml/workflow/model_train_api.py", line 61, in piml.workflow.model_train_api.ModelAPI.register_model
    File "piml/workflow/pipeline.py", line 123, in piml.workflow.pipeline.ModelPipeline.get_data
    ValueError: could not convert string to float: 'DUMMY STR'

It seems that get_data expects the input data to be already preprocessed, but all my preprocessing steps are included in the sklearn pipeline. I want to keep the entire pipeline as a single object because I am going to test multiple pipelines with distinct preprocessing methods. The cause seems to be that there is a categorical column; I think that is the problem.
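To illustrate what I think is going on, here is a minimal sketch that reproduces the same message outside PiML. It assumes that get_data casts the raw inputs to float at some point; I have not confirmed that from the PiML source, and `cast_to_float` and the toy array are hypothetical stand-ins, not PiML code.

```python
import numpy as np

# Hypothetical toy data: one numerical column and one categorical (string) column.
X_raw = np.array([[31.0, "DUMMY STR"], [45.0, "OTHER"]], dtype=object)


def cast_to_float(values):
    """Assumed stand-in for a numeric cast of the raw inputs (not PiML's actual code)."""
    return np.asarray(values).astype(float)


try:
    cast_to_float(X_raw)
    error_message = None
except ValueError as exc:
    # The string column cannot be converted, so the cast fails.
    error_message = str(exc)

print(error_message)

# The numerical column alone converts without any problem.
numeric_only = cast_to_float(X_raw[:, :1])
```

If this is what happens, the raw categorical values never reach the imputers or CatBoost, because the cast fails before the pipeline is called.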

Here is the code I used:

    import numpy as np
    from catboost import CatBoostClassifier
    from piml import Experiment
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline

    # Define model
    model_pipeline = Pipeline([("model", CatBoostClassifier(verbose=0, cat_features=cat_features_idxs))])

    # Impute missing values; the output columns are ordered numerical first, then categorical
    pre_processing_pipeline = Pipeline([
        ('imputers',
            ColumnTransformer(transformers=[
                ('numerical_imputer', SimpleImputer(missing_values=np.nan, strategy='mean'), NUMERICAL_COLS),
                ('categorical_imputer', SimpleImputer(missing_values=None, strategy='most_frequent'), CATEGORICAL_COLS)
            ])
        ),
    ])

    # Concat pipelines
    pipeline = Pipeline([
        ('pre_processing', pre_processing_pipeline),
        ('model', model_pipeline)
    ])

    # Fit the pipeline
    pipeline.fit(X_train_, y_train_)

    # Wrap the fitted pipeline and register it in PiML (this is the call that fails)
    exp = Experiment()
    piml_pipeline = exp.make_pipeline(pipeline, task_type="classification", train_x=X_train_, train_y=y_train_, test_x=X_val_, test_y=y_val_)
    exp.register(piml_pipeline, "pipeline")

Is there a way for me to make it work?
Thanks 😄
