fix(task): improve the execution data memory error message #730

Open
dengyh wants to merge 1 commit into TencentBlueKing:master from dengyh:fix/execution-data-memory-error-tip

@dengyh dengyh commented May 13, 2026

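Changes

  • Add an idempotent MemoryError fallback wrapper around bamboo-engine's ServiceActivityHandler.execute in TaskConfig.ready().
  • When a step in the ServiceActivity execution phase, such as saving execution data, raises a Python MemoryError, mark the node as FAILED and write a compact ex_data message indicating that the input/output data is too large or the worker is out of memory.
  • Add unit tests covering the node failure state, the compact execution data, and the handler wrapping logic.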
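
The wrapper described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeRuntime`, `FakeHandler`, `patch_handler`, the `_memory_error_patched` flag, and the simplified `set_state` / `set_execution_data` signatures are all hypothetical stand-ins for the bamboo-engine classes, and the wording of the compact message is assumed.

```python
from functools import wraps
from types import SimpleNamespace

# Hypothetical marker attribute; the real guard in the PR may work differently.
_PATCHED_FLAG = "_memory_error_patched"


class FakeRuntime:
    """Stand-in for the engine runtime; records the calls made on it."""

    def __init__(self):
        self.calls = []

    def node_execute_fail(self, root_pipeline_id, node_id):
        self.calls.append(("node_execute_fail", root_pipeline_id, node_id))

    def set_state(self, node_id, to_state):
        self.calls.append(("set_state", node_id, to_state))

    def set_execution_data(self, node_id, data):
        self.calls.append(("set_execution_data", node_id, data))


class FakeHandler:
    """Stand-in for ServiceActivityHandler (not the real bamboo-engine class)."""

    def __init__(self, runtime, node_id):
        self.runtime = runtime
        self.node = SimpleNamespace(id=node_id)

    def execute(self, process_info):
        raise MemoryError("simulated allocation failure")


def build_memory_error_ex_data(exc):
    # Keep the message compact: no traceback and no dump of the payload.
    return f"MemoryError: input/output data too large or worker out of memory ({exc})"


def patch_handler(handler_cls):
    original = handler_cls.execute
    if getattr(original, _PATCHED_FLAG, False):
        return  # already wrapped, so a second call is a no-op

    @wraps(original)
    def execute(self, process_info, *args, **kwargs):
        try:
            return original(self, process_info, *args, **kwargs)
        except MemoryError as exc:
            ex_data = build_memory_error_ex_data(exc)
            self.runtime.node_execute_fail(process_info.root_pipeline_id, self.node.id)
            self.runtime.set_state(self.node.id, "FAILED")
            self.runtime.set_execution_data(self.node.id, {"ex_data": ex_data})
            # The real handler returns an engine-specific result; omitted in this sketch.
            return None

    setattr(execute, _PATCHED_FLAG, True)
    handler_cls.execute = execute


runtime = FakeRuntime()
patch_handler(FakeHandler)
FakeHandler(runtime, "node-1").execute(SimpleNamespace(root_pipeline_id="root-1"))
```

After the final call, `runtime.calls` holds the three recovery steps in order, which is the behaviour the unit tests in this PR are described as covering.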

Verification

  • bash -lc 'export $(cat tests/engine.env | xargs); /Users/dengyh/Projects/bk-flow/.venv/bin/pytest tests/engine/task/test_engine_patches.py -q --no-cov'
  • git commit pre-commit hooks: black / isort / flake8 / pyupgrade passed

Related

@codecov-commenter

Codecov Report

❌ Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (master@3ee1043). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bkflow/task/engine_patches.py | 96.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##             master     #730   +/-   ##
=========================================
  Coverage          ?   83.25%           
=========================================
  Files             ?      304           
  Lines             ?    17788           
  Branches          ?        0           
=========================================
  Hits              ?    14810           
  Misses            ?     2978           
  Partials          ?        0           


@github-actions github-actions Bot left a comment

PR #730 Review Summary

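Overall assessment: this is a clearly structured, high-quality PR. The approach of adding a MemoryError fallback to bamboo-engine's ServiceActivityHandler.execute is sound: a standalone module, an idempotency guard, and @wraps to preserve the function signature together address the usual monkey-patching risks. The tests cover both the core logic and the patch wrapper, and the log format, constant naming, and license header all follow project conventions.

No Critical or Important issues were found. Two Minor suggestions follow; see the inline comments for details:

  1. ✨ Robustness of the recovery path under extreme memory pressure (a theoretical risk; very unlikely in practice)
  2. ✨ Consider adding test coverage for the idempotency guard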

)

ex_data = build_memory_error_ex_data(exc)
handler.runtime.node_execute_fail(process_info.root_pipeline_id, handler.node.id)

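✨ These three runtime calls (node_execute_fail, set_state, set_execution_data) run immediately after the MemoryError is caught. In practice memory is usually available again at that point (the failed large allocation never succeeded, so it holds no memory), but in extreme cases, if one of these calls also raises, the node is left in an inconsistent state. Consider wrapping the whole recovery block in a try/except Exception that logs the failure, so that a failed recovery still leaves a trace.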
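
One way to read this suggestion is sketched below. It is only an illustration under assumptions: `recover_from_memory_error`, `FlakyRuntime`, the logger name, and the simplified runtime signatures are hypothetical and are not taken from the PR or from bamboo-engine.

```python
import logging

logger = logging.getLogger("engine_patches_sketch")  # hypothetical logger name


class FlakyRuntime:
    """Stand-in runtime whose set_state fails, to exercise the guarded path."""

    def __init__(self):
        self.calls = []

    def node_execute_fail(self, root_pipeline_id, node_id):
        self.calls.append("node_execute_fail")

    def set_state(self, node_id, to_state):
        raise RuntimeError("simulated failure while saving state")

    def set_execution_data(self, node_id, data):
        self.calls.append("set_execution_data")


def recover_from_memory_error(runtime, root_pipeline_id, node_id, ex_data):
    """Run the recovery steps; log and return False if any of them raises."""
    try:
        runtime.node_execute_fail(root_pipeline_id, node_id)
        runtime.set_state(node_id, "FAILED")
        runtime.set_execution_data(node_id, ex_data)
    except Exception:
        # logger.exception records the traceback, so a failed recovery is traceable.
        logger.exception(
            "MemoryError recovery failed for node %s (root pipeline %s)",
            node_id,
            root_pipeline_id,
        )
        return False
    return True


flaky = FlakyRuntime()
recovered = recover_from_memory_error(flaky, "root-1", "node-1", "data too large")
```

Here the guard does not make the node state consistent; it only guarantees that a half-finished recovery is logged rather than surfacing as a second, unrelated exception.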

process_info = SimpleNamespace(root_pipeline_id="root-1")

monkeypatch.setattr(ServiceActivityHandler, "execute", raise_memory_error)
patch_service_activity_handler()

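✨ Consider adding an idempotency assertion: after calling patch_service_activity_handler() twice in a row, ServiceActivityHandler.execute should be the same function reference (not wrapped a second time). The current idempotency guard is correct; a test would protect it against accidental breakage in future refactors.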
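
The assertion suggested here could look something like the sketch below. To stay self-contained it uses a hypothetical `Handler` class and `patch_handler` function in place of `ServiceActivityHandler` and `patch_service_activity_handler()`; the `_patched` flag is likewise an assumption about how the guard might be implemented.

```python
from functools import wraps


class Handler:
    """Stand-in for ServiceActivityHandler."""

    def execute(self):
        return "ok"


def patch_handler():
    """Stand-in for patch_service_activity_handler() with a flag-based guard."""
    if getattr(Handler.execute, "_patched", False):
        return  # guard: do not wrap an already wrapped function

    original = Handler.execute

    @wraps(original)
    def wrapper(self, *args, **kwargs):
        return original(self, *args, **kwargs)

    wrapper._patched = True
    Handler.execute = wrapper


def test_patch_is_idempotent():
    patch_handler()
    first = Handler.execute
    patch_handler()
    # Same function object means the second call did not add another wrapper layer.
    assert Handler.execute is first
    assert Handler().execute() == "ok"


test_patch_is_idempotent()
```

In the real test suite the same two-call check would presumably sit next to the existing monkeypatch-based test and compare `ServiceActivityHandler.execute` before and after the second call.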
