
Make xgboost optional in synthetic dataset generation#872

Merged
jeongyoonlee merged 4 commits into uber:master from Si-ra-kri:fix-optional-xgboost
Feb 20, 2026

Conversation

@Si-ra-kri
Contributor

This change makes xgboost an optional dependency for synthetic dataset generation. When xgboost is not installed, LinearRegression-based
learners continue to work without import errors.
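The usual pattern for this kind of change is a guarded import with a `None` sentinel. The sketch below shows an assumed form of that pattern (the actual module layout in causalml may differ; `available_models` is a hypothetical helper for illustration):

```python
# Optional-dependency pattern: import xgboost inside try/except so its
# absence does not break importing this module.
try:
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None  # sentinel checked before use

try:
    from sklearn.linear_model import LinearRegression
except ImportError:
    class LinearRegression:  # stand-in so this sketch runs without sklearn
        pass


def available_models():
    """Return (model_class, label) pairs, skipping xgboost when absent."""
    models = [(LinearRegression, "LR")]
    if XGBRegressor is not None:
        models.append((XGBRegressor, "XGB"))
    return models
```

With this guard in place, environments without xgboost simply get the LinearRegression-only list instead of an `ImportError` at import time.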

@CLAassistant

CLAassistant commented Feb 7, 2026

CLA assistant check
All committers have signed the CLA.

@Si-ra-kri
Contributor Author

I’ve fixed the indentation issue that caused the CI failure.
The latest commit resolves the error. Please let me know if you’d like me to trigger CI again.

@Si-ra-kri
Contributor Author

Hi! Just a gentle follow-up - all CI checks are passing now.
Please let me know if any changes are needed from my side.

@jeongyoonlee
Collaborator

Code review

Found 1 issue:

  1. Indentation bug in get_synthetic_preds silently drops S, T, and X learner predictions — only R Learner results are ever computed.

models = [(LinearRegression, "LR")] is correctly inside the for base_learner, label_l loop (line 76), but if XGBRegressor is not None: (line 77) and for model, label_m in models: (line 80) are dedented one level — placing them outside the outer loop. After all four base-learner iterations complete, base_learner and label_l hold only the last values (BaseRRegressor / "R"), so only "R Learner (LR)" (and "R Learner (XGB)" if installed) are added to preds_dict. The S, T, and X learner predictions are silently skipped.

for base_learner, label_l in zip(
    [BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor],
    ["S", "T", "X", "R"],
):
    models = [(LinearRegression, "LR")]
if XGBRegressor is not None:
    models.append((XGBRegressor, "XGB"))
for model, label_m in models:
    learner = base_learner(model())
    model_name = "{} Learner ({})".format(label_l, label_m)
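The underlying behavior is standard Python scoping: a for-loop's variables outlive the loop and keep the values from the final iteration, so dedented code that was meant to run inside the loop runs once, after it, seeing only the last bindings. A minimal illustration (the names here are hypothetical):

```python
# Python for-loop variables persist after the loop ends, holding their
# final values. Code accidentally dedented out of the loop body runs
# once, seeing only the last iteration's bindings.
labels = []
for base, label in [("S", "s"), ("T", "t"), ("X", "x"), ("R", "r")]:
    pass  # loop body

# Outside the loop, `label` still exists and equals the last value:
labels.append(label)
print(labels)  # ['r'] -- only the final iteration is visible
```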

get_synthetic_preds_holdout does not have this bug — all three lines sit correctly inside the outer loop there:

for base_learner, label_l in zip(
    [BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor],
    ["S", "T", "X", "R"],
):
    models = [(LinearRegression, "LR")]
    if XGBRegressor is not None:
        models.append((XGBRegressor, "XGB"))
    for model, label_m in models:
        # RLearner will need to fit on the p_hat
        if label_l != "R":
            learner = base_learner(model())

The fix is to indent lines 77–81 of get_synthetic_preds by four additional spaces so they match the structure in get_synthetic_preds_holdout.
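To make the effect of the fix concrete, here is a self-contained sketch that substitutes plain strings for the regressor classes (hypothetical stand-ins, not the causalml code) and simulates xgboost being absent:

```python
# Self-contained sketch: with the corrected indentation, every base
# learner contributes an entry to preds_dict. Strings stand in for the
# real regressor classes.
XGBRegressor = None  # simulate xgboost not being installed

preds_dict = {}
for base_learner, label_l in zip(
    ["BaseS", "BaseT", "BaseX", "BaseR"], ["S", "T", "X", "R"]
):
    models = [("LinearRegression", "LR")]
    if XGBRegressor is not None:
        models.append(("XGBRegressor", "XGB"))
    for model, label_m in models:
        model_name = "{} Learner ({})".format(label_l, label_m)
        preds_dict[model_name] = (base_learner, model)

print(sorted(preds_dict))
# ['R Learner (LR)', 'S Learner (LR)', 'T Learner (LR)', 'X Learner (LR)']
# With the buggy dedent, only 'R Learner (LR)' would appear here.
```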

🤖 Generated with Claude Code


@Si-ra-kri
Contributor Author

Thanks for catching that, indentation has been fixed and pushed.
Please let me know if anything else needs adjustment.

@jeongyoonlee
Collaborator


LGTM

@jeongyoonlee jeongyoonlee merged commit b4c76dd into uber:master Feb 20, 2026
7 checks passed