CI flake: Production Docker (flex) sometimes fails with "application not healthy after 10m0s"

## What

The flex production docker job (`build / Production Docker (flex)`) intermittently fails with:

```
application not healthy after 10m0s
##[error]Process completed with exit code 1.
```

Container logs show mysql healthy and openemr started, but the openemr container's healthcheck never passes within the 600-second wait window.

## Where it comes from

`.github/actions/test-actions-core/action.yml` runs:

```yaml
- name: Run the containers
  run: docker compose up --detach --wait --wait-timeout 600 mysql "${OPENEMR_SERVICE_NAME}"
```

`docker compose up --wait` polls the services' healthchecks and exits non-zero with that message when the timeout elapses. `--wait-timeout` is given in seconds, so 600 is the 10m0s reported in the error.
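For illustration only, the polling that `--wait` performs can be approximated by hand. This is a sketch, not what compose literally runs. It assumes a bash step (`shell: bash` is required for `run` steps in a composite action), that `OPENEMR_SERVICE_NAME` is set as in the existing step, and that the openemr service defines a healthcheck. Unlike `--wait`, it only polls the openemr container, not mysql. The step name is hypothetical.

```yaml
- name: Run the containers (manual wait, sketch)
  shell: bash
  run: |
    # Start the services without --wait; polling is done below instead.
    docker compose up --detach mysql "${OPENEMR_SERVICE_NAME}"
    cid="$(docker compose ps --quiet "${OPENEMR_SERVICE_NAME}")"
    # Same 600-second budget as --wait-timeout 600.
    deadline=$(( SECONDS + 600 ))
    # Health status moves from "starting" to "healthy" (or "unhealthy").
    until [ "$(docker inspect --format '{{.State.Health.Status}}' "$cid")" = healthy ]; do
      if (( SECONDS >= deadline )); then
        echo "openemr not healthy after 600s" >&2
        exit 1
      fi
      sleep 10
    done
```

Writing the loop out makes it clear that the timeout is purely a wall-clock budget on the health status: nothing about the container is checked beyond "did the healthcheck report healthy in time".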

## Recent occurrence

Run https://github.com/openemr/openemr-devops/actions/runs/24732488650 on master (commit 959f246, the merge of #660). Re-running only the failed job passed without changes, confirming it is a flake.

- #660 only touched `docker/openemr/8.0.0/*`; it did not modify the flex container.
- The flex job passed on the PR CI for #660 and on the master push CI for the merge immediately prior (#655, commit 84ca0c3).

## Why it likely flakes

The flex image runs `composer install` at container start. Under CI runner load (cold cache, slow mirror, contention), that can push total boot past 10 min. The wait-timeout is a budget, not a correctness check.
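To confirm (or rule out) the composer-install hypothesis the next time this fails, a diagnostic step along these lines could capture the health history and timestamped logs. This is a sketch: the step name is hypothetical, it assumes `OPENEMR_SERVICE_NAME` is available in the step's environment, that `composer install` output reaches the container logs, and that `if: failure()` behaves as expected where it is placed (it may be simpler to put it in the calling workflow than in the composite action).

```yaml
- name: Dump container state on failure
  if: failure()
  shell: bash
  run: |
    # Service states and health as compose sees them, including exited containers.
    docker compose ps --all
    # Timestamped openemr logs: the gap between container start and the end of
    # composer install shows how much of the 600 s budget startup consumed.
    docker compose logs --timestamps --tail 200 "${OPENEMR_SERVICE_NAME}"
    # Recent healthcheck probe results (exit codes and output).
    docker inspect --format '{{json .State.Health}}' \
      "$(docker compose ps --quiet "${OPENEMR_SERVICE_NAME}")" || true
```

The `|| true` keeps the step from erroring out if the container is already gone, so the earlier output is still useful.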

## Suggested mitigations (pick one)

- **Raise the `--wait-timeout`.** 20m would absorb most composer-install variance while still catching real hangs.
- **Warm the image.** Move composer install into the image build (cached layer) rather than container startup, so healthcheck measures runtime boot only.
- **Retry the failed step once** before failing the job. Cheapest fix; it doesn't address the root cause, but it silences the flake.
- **Split the healthcheck into cheap (apache up) + slow (installer done) signals**, so compose's `--wait` succeeds quickly while a separate step verifies post-install state.
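As a sketch of the retry option, the existing step could wrap `up --wait` in a single retry. It assumes a bash shell and the same `OPENEMR_SERVICE_NAME` variable as the current step. Because the containers from the first attempt keep running, the second attempt waits on the same boot for another 600 s rather than starting fresh, so in practice this behaves much like doubling the timeout. For the raise-the-timeout option alone, changing `600` to `1200` in the original command is the whole change.

```yaml
- name: Run the containers
  shell: bash
  run: |
    # Give `up --wait` one extra attempt before failing the job.
    # Already-running, up-to-date containers are left in place, so the
    # second attempt keeps waiting on the same startup.
    for attempt in 1 2; do
      if docker compose up --detach --wait --wait-timeout 600 mysql "${OPENEMR_SERVICE_NAME}"; then
        exit 0
      fi
      echo "attempt ${attempt}: services not healthy within 600s" >&2
    done
    exit 1
```

If the container has already been marked unhealthy, the retry will most likely fail quickly too, which is arguably the right outcome: a real hang still fails the job.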

## Priority

Low: the flake is rare and is resolved by a single re-run. Opening this issue to document the pattern and the mitigation options, so the next person who hits it doesn't have to re-derive the diagnosis.
