fix(task): improve the memory-error message for execution data #730
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files

```diff
@@ Coverage Diff @@
##        master     #730   +/- ##
=========================================
  Coverage        ?   83.25%
=========================================
  Files           ?      304
  Lines           ?    17788
  Branches        ?        0
=========================================
  Hits            ?    14810
  Misses          ?     2978
  Partials        ?        0
```

☔ View full report in Codecov by Sentry.
PR #730 Review Summary

Overall assessment: this is a well-structured, high-quality PR. Adding a MemoryError fallback around bamboo-engine's ServiceActivityHandler.execute is a sound approach — a standalone module, an idempotency guard, and @wraps to preserve the function signature cover the usual monkey-patching risks. The tests exercise both the core logic and the patch wrapper, and the log format, constant naming, and license headers all follow project conventions.

No Critical / Important issues found. Two Minor suggestions below; see the inline comments for details:
- ✨ Robustness of the recovery path under extreme memory pressure (a theoretical risk, very unlikely in practice)
- ✨ Add test coverage for the idempotency guard
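The pattern the review praises (idempotency guard + @wraps wrapper) can be sketched as follows. This is an illustrative sketch only: `patch_execute`, `_PATCH_FLAG`, and the fallback return value are hypothetical names, not the PR's actual implementation.

```python
from functools import wraps

_PATCH_FLAG = "_memory_error_patched"  # hypothetical sentinel attribute name


def patch_execute(handler_cls):
    """Idempotently wrap handler_cls.execute with a MemoryError fallback."""
    original = handler_cls.execute
    if getattr(original, _PATCH_FLAG, False):
        return  # already wrapped: a second call is a no-op

    @wraps(original)  # preserve the original's name and docstring metadata
    def execute(self, *args, **kwargs):
        try:
            return original(self, *args, **kwargs)
        except MemoryError:
            # Fallback path: report failure instead of crashing the worker.
            return {"result": False, "ex_data": "execution data too large"}

    setattr(execute, _PATCH_FLAG, True)
    handler_cls.execute = execute
```

The sentinel attribute lives on the wrapper function itself, so re-running the patcher (e.g. on Django app reload) cannot stack wrappers.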
```python
)
ex_data = build_memory_error_ex_data(exc)
handler.runtime.node_execute_fail(process_info.root_pipeline_id, handler.node.id)
```
✨ These three runtime calls (node_execute_fail → set_state → set_execution_data) run right after the MemoryError is caught. In practice memory is usually available again at that point (the large allocation that failed never succeeded, so it holds no memory), but in extreme cases, if one of these calls also throws, the node is left in an inconsistent state. Consider wrapping the whole recovery block in a try/except Exception with a fallback log, so that even a failed recovery leaves a trace.
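The reviewer's suggestion can be sketched like this. The helper name, the runtime method names, and the state/outputs arguments are assumptions for illustration, not the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def fail_node_safely(runtime, root_pipeline_id, node_id, ex_data):
    """Guard the whole recovery block so a second failure during recovery
    still leaves a trace (interface names are hypothetical)."""
    try:
        runtime.node_execute_fail(root_pipeline_id, node_id)
        runtime.set_state(node_id=node_id, to_state="FAILED")
        runtime.set_execution_data_outputs(node_id=node_id, outputs={"ex_data": ex_data})
    except Exception:
        # Recovery itself failed (e.g. still under memory pressure):
        # log with traceback and return, rather than re-raise and lose
        # the original MemoryError context.
        logger.exception("memory-error recovery failed for node %s", node_id)
```

The trade-off: the node may still end up inconsistent, but the operator gets a log line pointing at exactly which node to repair.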
```python
process_info = SimpleNamespace(root_pipeline_id="root-1")
monkeypatch.setattr(ServiceActivityHandler, "execute", raise_memory_error)
patch_service_activity_handler()
```
✨ Suggest adding an idempotency assertion: after calling patch_service_activity_handler() twice in a row, ServiceActivityHandler.execute should be the same function reference (i.e. not wrapped twice). The current idempotency guard logic is correct; a test would keep a future refactor from breaking it by accident.
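A self-contained sketch of the suggested test. The stand-in class and patcher below only exist to make the snippet runnable; in the real suite, ServiceActivityHandler and patch_service_activity_handler would be imported from the engine patch module:

```python
from functools import wraps


# Minimal stand-ins so the sketch runs on its own (hypothetical bodies).
class ServiceActivityHandler:
    def execute(self):
        return "ok"


def patch_service_activity_handler():
    original = ServiceActivityHandler.execute
    if getattr(original, "_patched", False):
        return  # idempotency guard: never wrap twice

    @wraps(original)
    def execute(self, *args, **kwargs):
        try:
            return original(self, *args, **kwargs)
        except MemoryError:
            return None

    execute._patched = True
    ServiceActivityHandler.execute = execute


def test_patch_is_idempotent():
    patch_service_activity_handler()
    first = ServiceActivityHandler.execute
    patch_service_activity_handler()
    # A second call must be a no-op: same function object, no double wrapping.
    assert ServiceActivityHandler.execute is first
```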
Change description

In TaskConfig.ready(), add an idempotent MemoryError fallback wrapper around bamboo-engine's ServiceActivityHandler.execute. On MemoryError, the node is set to FAILED and a compact ex_data is written, indicating that the input/output data is too large or the worker is out of memory.

Verification

```shell
bash -lc 'export $(cat tests/engine.env | xargs); /Users/dengyh/Projects/bk-flow/.venv/bin/pytest tests/engine/task/test_engine_patches.py -q --no-cov'
```

git commit pre-commit hooks: black / isort / flake8 / pyupgrade passed

Related