What
The flex production docker job (build / Production Docker (flex)) intermittently fails with:
application not healthy after 10m0s
##[error]Process completed with exit code 1.
Container logs show mysql healthy and openemr started, but the openemr container's healthcheck never passes within the 600-second wait window.
Where it comes from
.github/actions/test-actions-core/action.yml runs:
- name: Run the containers
run: docker compose up --detach --wait --wait-timeout 600 mysql "${OPENEMR_SERVICE_NAME}"
docker compose up --wait polls the service healthcheck and exits non-zero with that message when the timeout elapses.
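When the wait times out, compose reports only the elapsed time and not the probe results. A failure-only diagnostic step would capture the healthcheck's own output the next time this happens. The sketch below is hypothetical: it assumes the step is appended to the same composite action, and the step name is illustrative. Composite-action `run` steps must declare a `shell`, so it sets one.

```yaml
# Hypothetical diagnostic step; not part of the current action.yml.
# Runs only if an earlier step in the action failed.
- name: Dump healthcheck state on failure
  if: failure()
  shell: bash
  run: |
    docker compose ps --all
    # .State.Health holds the current status plus recent probe results (exit code, output)
    docker inspect --format '{{json .State.Health}}' \
      "$(docker compose ps --quiet "${OPENEMR_SERVICE_NAME}")"
    docker compose logs --no-color --tail 200 "${OPENEMR_SERVICE_NAME}"
```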
Recent occurrence
Run https://github.com/openemr/openemr-devops/actions/runs/24732488650 on master (commit 959f246, the merge of #660). That merge touched docker/openemr/8.0.0/* and did not modify the flex container. Re-running the failed job alone passed without changes, confirming a flake.
Why it likely flakes
The flex image runs composer install at container start. Under CI runner load (cold cache, slow mirror, contention), that step alone can push total boot time past the 10-minute budget. The wait-timeout is a budget, not a correctness check.
Suggested mitigations (pick one)
- Raise the --wait-timeout. 20m (1200 s) would absorb most composer-install variance while still catching real hangs.
- Warm the image. Move composer install into the image build (a cached layer) instead of container startup, so the healthcheck measures runtime boot only.
- Retry the failed step once before failing the job. This is the cheapest fix. It doesn't address the root cause, but it silences the flake.
- Split the healthcheck into cheap (Apache up) + slow (installer done) signals, so compose's --wait succeeds quickly while a separate step verifies post-install state.
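Raising the timeout is a one-number change (`--wait-timeout 1200`). The retry option needs a little shell. The sketch below is not the current action.yml. It assumes the step stays in the composite action, which is why it sets `shell: bash`, and it keeps the existing service arguments.

```yaml
# Sketch of the "retry once" mitigation.
- name: Run the containers
  shell: bash
  run: |
    # Testing the exit status in `if !` keeps bash -e from aborting on the first failure.
    if ! docker compose up --detach --wait --wait-timeout 600 mysql "${OPENEMR_SERVICE_NAME}"; then
      echo "::warning::containers not healthy after first wait; retrying once"
      docker compose up --detach --wait --wait-timeout 600 mysql "${OPENEMR_SERVICE_NAME}"
    fi
```

On the second `up`, the containers are already running with unchanged configuration, so compose does not recreate them and simply waits again. In practice this behaves like a 20-minute budget that also leaves a warning annotation on slow runs. Running `docker compose down` before the retry would force a fresh boot instead, at the cost of repeating composer install.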
Priority
Low: the flake is rare and is resolved by a single re-run. Opening this issue to document the pattern and the mitigation options, so the next person who hits it doesn't have to re-derive the diagnosis.