Skip to content

Commit 27fbe48

Browse files
fix(agent): tolerate empty system_info in heartbeat JSON (v1.8.3)
A self-hosted agent could fail every heartbeat with HTTP 400 `Invalid JSON: Expecting property name enclosed in double quotes` when the system-info block it collects came back empty on an unusual host. The agent builds the heartbeat JSON as text, so an empty `$system_info` collapsed the `$system_info,` line to a bare comma and broke the payload. - Agent script (linux + macos, kept in sync): guard the fragment-form register_agent and send_heartbeat builders so an empty system_info falls back to a valid key and can never emit a bare comma. Uses the most portable bash glob test (no POSIX class / pattern substitution; verified on bash 3.2-5.2 and on Ubuntu/Debian/Rocky/Alpine/Amazon Linux). True no-op for healthy agents. - Backend heartbeat endpoint: parse the body as-is first and only run the malformed-JSON repair when parsing fails, so a valid heartbeat from any agent version is byte-for-byte untouched. The repair (now a testable helper) recovers a leading or doubled comma (the empty-system_info artifact) in addition to the existing empty-value / trailing-comma fixes. No agent version bump; self-upgrade and daemon mode are unaffected. Healthy agents of every version behave identically. Full backend suite green. Addresses #31.
1 parent e86e86a commit 27fbe48

9 files changed

Lines changed: 230 additions & 46 deletions

File tree

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2415,6 +2415,7 @@ Developed with ❤️ for the HAProxy community
24152415

24162416
## Release Notes
24172417

2418+
- **v1.8.3** (2026-06-25) — **Agent heartbeat JSON fix** (Issue #31): a self-hosted agent could fail every heartbeat with `HTTP 400 Invalid JSON: Expecting property name enclosed in double quotes` when the system-info block it collects came back empty on an unusual host, leaving a stray comma in the hand-built heartbeat JSON. The agent script now substitutes a valid placeholder when that block is empty so it can no longer emit a stray comma, and the backend heartbeat endpoint now parses valid payloads as-is and, only when a body fails to parse, tolerates that specific malformed pattern (a leading or doubled comma) so an already-deployed agent recovers on its next heartbeat after this build is deployed. Backend + agent-script only; healthy agents of every version are byte-for-byte unaffected.
24182419
- **v1.8.2** (2026-06-25) — **ACME nonce fix** (Issue #35 follow-up): the ACME client now scopes the anti-replay nonce **per certificate authority** so a nonce issued by one CA is never sent to another. This fixes ZeroSSL/Google account registration failing with `malformed: The Replay Nonce could not be base64url-decoded` (the client previously shared one nonce across CAs and only auto-retried on `badNonce`). Account registration now always uses a fresh nonce from the target CA, and the retry covers this case too. Backend-only; HTTP-01 and Let's Encrypt are unaffected.
24192420
- **v1.8.1** (2026-06-24) — **ACME DNS-01 fixes** (Issue #35 follow-up): Cloudflare API tokens are now sanitized so a pasted token with quotes/spaces no longer fails with "Invalid request headers"; ZeroSSL/Google **External Account Binding (EAB)** can be entered per-account in the register dialog and EAB-required failures show a clear message; and **Apply Management** now categorizes cluster ACME enable/disable changes under their own "ACME Challenge Routing" section and **Apply/Reject All** correctly process them (previously "Rejected 0 HA/VIP change(s)"), consistent with every other entity. Fully backward compatible.
24202421
- **v1.8.0** (2026-06-23) — **ACME DNS-01 challenge support** (Issue #35): Auto SSL can now validate via a **DNS TXT record** (`_acme-challenge.<domain>`) instead of HTTP-01 on port 80, enabling certificates for **internal/isolated clusters with no public ingress** and **wildcard** certificates (`*.example.com`). Pluggable **per-account DNS provider** (Manual + Cloudflare to start; credentials verified on save and **encrypted at rest**, never returned by the API or logged), the same **PENDING → APPLIED** pipeline, a **bounded automatic retry** on propagation lag, and a **DNS-01 event timeline** in the order detail. **Opt-in** via Settings → ACME (global switch, default off); **HTTP-01 is byte-for-byte unchanged**, with **zero agent or rendered-config changes**. Manual DNS-01 certificates cannot auto-renew unattended; the UI states this and disables auto-renew for them.

backend/main.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
from datetime import datetime, timedelta
1010

1111
# Build/deploy marker for the v1.8.x (Issue #35, DNS-01) rollout — ensures the pipeline ships this commit's image.
12-
_version_info = {"version": "1.8.2", "releaseName": "ACME nonce fix (ZeroSSL registration)", "releaseDate": "2026-06-25"}
12+
_version_info = {"version": "1.8.3", "releaseName": "Agent heartbeat JSON fix", "releaseDate": "2026-06-25"}
1313
for _vpath in ["/app/version.json", os.path.join(os.path.dirname(__file__), "..", "version.json")]:
1414
try:
1515
with open(_vpath) as _vf:

backend/routers/agent.py

Lines changed: 58 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,40 @@
2929
"linux": "2.0.0"
3030
}
3131

32+
33+
def _sanitize_agent_json(body_str: str):
34+
"""Repair the common malformed-JSON patterns a hand-built agent heartbeat can emit.
35+
36+
Agents assemble their heartbeat JSON as text in bash, so an empty interpolated value can leave
37+
a structurally-invalid comma (issue #31). Returns (possibly_repaired_str, was_changed). The
38+
repairs are conservative and target only structural artifacts an agent produces; they never
39+
alter this endpoint's legitimate string values (the agent emits no string containing ',,' —
40+
haproxy_stats_csv is base64/comma-free and the rest are constrained os/kernel/ip/version text).
41+
"""
42+
import re
43+
sanitized = False
44+
# Fix 1: empty value before a comma ("server_statuses": ,)
45+
if re.search(r':\s*,', body_str):
46+
body_str = re.sub(r':\s*,', ': null,', body_str); sanitized = True
47+
# Fix 2: empty value before a closing brace ("field":})
48+
if re.search(r':\s*}', body_str):
49+
body_str = re.sub(r':\s*}', ': null}', body_str); sanitized = True
50+
# Fix 3: trailing comma before } or ]
51+
if re.search(r',(\s*[}\]])', body_str):
52+
body_str = re.sub(r',(\s*[}\]])', r'\1', body_str); sanitized = True
53+
# Fix 4: leading comma run right after an opening brace/bracket (issue #31): an empty
54+
# $system_info as the first member collapses to '{ , "name": ...'. The ': ,' fix above cannot
55+
# catch this because there is no key/colon before the comma.
56+
if re.search(r'([{\[])(\s*,)+', body_str):
57+
body_str = re.sub(r'([{\[])(\s*,)+', r'\1', body_str); sanitized = True
58+
# Fix 5: a run of commas between members (issue #31): an empty $system_info between two fields
59+
# produces '"version": "x",\n ,\n "haproxy_status": ...'. Runs after Fix 1/3 so only
60+
# structural commas remain; collapse any comma run to a single comma.
61+
if re.search(r',(\s*,)+', body_str):
62+
body_str = re.sub(r',(\s*,)+', ',', body_str); sanitized = True
63+
return body_str, sanitized
64+
65+
3266
def get_platform_key(agent_platform: str) -> str:
3367
"""Convert agent platform to standardized platform key - fixed empty platform fallback"""
3468
platform = agent_platform.lower() if agent_platform else 'unknown'
@@ -1455,47 +1489,33 @@ async def agent_heartbeat_by_name(
14551489
import json
14561490
from pydantic import ValidationError
14571491

1458-
# Read raw body and sanitize common JSON errors from agents
1492+
# Read raw body. Parse VALID JSON as-is (the normal case for every agent version) and only
1493+
# fall back to the malformed-JSON repair when the body does not parse. This guarantees a healthy
1494+
# heartbeat from any agent version is byte-for-byte untouched — the repair regexes can never run
1495+
# against a well-formed payload (issue #31; strictly safer than repairing unconditionally).
14591496
try:
14601497
raw_body = await request.body()
14611498
body_str = raw_body.decode('utf-8')
1462-
1463-
# Sanitize common malformed JSON patterns from agents
1464-
original_body = body_str
1465-
sanitized = False
1466-
1467-
# Fix 1: Empty values before comma (most common: "server_statuses": ,)
1468-
if re.search(r':\s*,', body_str):
1469-
body_str = re.sub(r':\s*,', ': null,', body_str)
1470-
sanitized = True
1471-
1472-
# Fix 2: Empty values before closing brace
1473-
if re.search(r':\s*}', body_str):
1474-
body_str = re.sub(r':\s*}', ': null}', body_str)
1475-
sanitized = True
1476-
1477-
# Fix 3: Trailing commas
1478-
if re.search(r',(\s*[}\]])', body_str):
1479-
body_str = re.sub(r',(\s*[}\]])', r'\1', body_str)
1480-
sanitized = True
1481-
1482-
if sanitized:
1483-
# Extract agent name for logging
1484-
agent_name = "unknown"
1485-
try:
1486-
name_match = re.search(r'"name"\s*:\s*"([^"]+)"', body_str)
1487-
if name_match:
1488-
agent_name = name_match.group(1)
1489-
except:
1490-
pass
1491-
1492-
logger.info(f"Sanitized malformed JSON from agent '{agent_name}' - fixed empty values and trailing commas")
1493-
logger.debug(f"Original JSON (preview): {original_body[:300]}")
1494-
logger.debug(f"Sanitized JSON (preview): {body_str[:300]}")
1495-
1496-
# Parse sanitized JSON into Pydantic model
1497-
heartbeat_dict = json.loads(body_str)
1498-
1499+
1500+
try:
1501+
heartbeat_dict = json.loads(body_str)
1502+
except json.JSONDecodeError:
1503+
# Malformed body (would otherwise be a hard 400). Attempt a conservative repair of the
1504+
# comma artifacts a hand-built agent heartbeat can emit, then re-parse.
1505+
repaired, changed = _sanitize_agent_json(body_str)
1506+
if changed:
1507+
agent_name = "unknown"
1508+
try:
1509+
name_match = re.search(r'"name"\s*:\s*"([^"]+)"', repaired)
1510+
if name_match:
1511+
agent_name = name_match.group(1)
1512+
except Exception:
1513+
pass
1514+
logger.info(f"Repaired malformed JSON from agent '{agent_name}' before parsing")
1515+
logger.debug(f"Original JSON (preview): {body_str[:300]}")
1516+
logger.debug(f"Repaired JSON (preview): {repaired[:300]}")
1517+
heartbeat_dict = json.loads(repaired) # may still raise -> handled as 400 below
1518+
14991519
# DEBUG: Log cluster_id for auto-register troubleshooting
15001520
if heartbeat_dict.get('name'):
15011521
logger.info(f"HEARTBEAT DEBUG: agent={heartbeat_dict.get('name')}, cluster_id={heartbeat_dict.get('cluster_id')}, has_cluster_id={bool(heartbeat_dict.get('cluster_id'))}")
Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,97 @@
1+
"""Issue #31 — agent heartbeat JSON sanitizer.
2+
3+
A self-hosted agent builds its heartbeat JSON as text in bash. When a collected value is empty,
4+
the payload can contain a structurally-invalid comma that broke the heartbeat with
5+
`HTTP 400 Invalid JSON: Expecting property name enclosed in double quotes`. The backend now
6+
repairs that pattern in `_sanitize_agent_json` so an already-deployed agent recovers without a
7+
re-install. These tests pin that behaviour and prove the repair never corrupts a healthy payload.
8+
"""
9+
import json
10+
11+
from routers.agent import _sanitize_agent_json
12+
13+
14+
def _assert_parses(raw: str) -> dict:
15+
out, _ = _sanitize_agent_json(raw)
16+
return json.loads(out) # raises if the repair did not produce valid JSON
17+
18+
19+
def test_reporter_empty_system_info_bare_comma():
20+
# The exact shape the reporter hit: an empty $system_info collapses ' $system_info,' to a
21+
# bare comma between two members -> '"version": "x",\n ,\n "haproxy_status": ...'.
22+
raw = (
23+
'{\n'
24+
' "name": "test",\n'
25+
' "hostname": "h",\n'
26+
' "status": "online",\n'
27+
' "version": "2.0.0",\n'
28+
' ,\n'
29+
' "haproxy_status": "running",\n'
30+
' "cluster_id": 1\n'
31+
'}'
32+
)
33+
parsed = _assert_parses(raw)
34+
assert parsed["name"] == "test"
35+
assert parsed["status"] == "online"
36+
assert parsed["haproxy_status"] == "running"
37+
38+
39+
def test_empty_numeric_subfield_before_comma():
40+
# An empty unquoted numeric ("memory_total": ,) — covered by the pre-existing Fix 1.
41+
raw = '{ "name": "t", "cpu_count": , "memory_total": , "status": "online" }'
42+
parsed = _assert_parses(raw)
43+
assert parsed["cpu_count"] is None and parsed["memory_total"] is None
44+
assert parsed["status"] == "online"
45+
46+
47+
def test_empty_value_before_closing_brace():
48+
raw = '{ "name": "t", "status": "online", "applied_config_version": }'
49+
parsed = _assert_parses(raw)
50+
assert parsed["applied_config_version"] is None
51+
52+
53+
def test_leading_comma_first_member():
54+
# Empty $system_info as the FIRST member -> '{ , "name": ... }'.
55+
raw = '{\n ,\n "name": "t",\n "status": "online"\n}'
56+
parsed = _assert_parses(raw)
57+
assert parsed["name"] == "t"
58+
59+
60+
def test_comma_run_two_empty_fields():
61+
# Two empties in a row (odd-length comma run) must still collapse to valid JSON.
62+
raw = '{ "a": 1,\n ,\n ,\n "b": 2 }'
63+
parsed = _assert_parses(raw)
64+
assert parsed["a"] == 1 and parsed["b"] == 2
65+
66+
67+
def test_trailing_comma_regression():
68+
# Pre-existing Fix 3 must still hold after the new fixes were added.
69+
raw = '{ "name": "t", "status": "online", }'
70+
parsed = _assert_parses(raw)
71+
assert parsed["name"] == "t"
72+
73+
74+
def test_healthy_payload_is_untouched():
75+
# A well-formed agent payload must pass through unchanged (sanitized=False) and its values —
76+
# including the base64 stats CSV and the nested server_statuses — must be byte-identical.
77+
payload = {
78+
"name": "agent-1",
79+
"status": "online",
80+
"cluster_id": 1,
81+
"server_statuses": {"be_app": {"s1": "UP", "s2": "DOWN"}},
82+
"network_interfaces": ["eth0", "eth1"],
83+
"haproxy_stats_csv": "IyBwdmJjLGJhY2tlbmQsZnJvbnRlbmQs", # base64: contains commas only inside a quoted string is impossible (base64 has none)
84+
"applied_config_version": "cluster-1-v42",
85+
}
86+
raw = json.dumps(payload)
87+
out, changed = _sanitize_agent_json(raw)
88+
assert changed is False
89+
assert out == raw # byte-identical
90+
assert json.loads(out) == payload
91+
92+
93+
def test_idempotent_on_already_clean_minimal():
94+
raw = '{"name": "t", "status": "online"}'
95+
out, changed = _sanitize_agent_json(raw)
96+
assert changed is False
97+
assert out == raw
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
"""Issue #31 — agent-script hardening guard (static).
2+
3+
The agent install scripts hand-build the heartbeat JSON, so if `collect_system_info` ever yields
4+
nothing the `$system_info,` line collapses to a bare comma and the whole heartbeat is invalid JSON
5+
(HTTP 400). The fix adds a guard at every fragment-form call site that substitutes a single valid
6+
key when system_info is empty. This static check enforces that the guard is present AND kept in
7+
sync across BOTH platform scripts — the project requires the two agent-script copies to stay in
8+
lockstep. (Empty numeric subfields like "memory_total": , are a separate, milder case already
9+
repaired by the backend sanitizer, so they are intentionally NOT guarded in the script — guarding
10+
them with a strict integer test would wrongly reject the scientific-notation that mawk emits for
11+
multi-GB sizes on Debian/Ubuntu.)
12+
"""
13+
import os
14+
15+
_SCRIPT_DIR = os.path.join(
16+
os.path.dirname(os.path.dirname(os.path.abspath(__file__))), # backend/
17+
"utils", "agent_scripts",
18+
)
19+
20+
21+
def _read(name: str) -> str:
22+
with open(os.path.join(_SCRIPT_DIR, name), "r") as f:
23+
return f.read()
24+
25+
26+
LINUX = _read("linux_install.sh")
27+
MACOS = _read("macos_install.sh")
28+
29+
# The empty-system_info guard — present at BOTH fragment call sites (register_agent + send_heartbeat).
30+
_B2_GUARD = '[[ "$system_info" != *\'"\'* ]] && system_info=\'"operating_system": "unknown"\''
31+
32+
33+
def test_b2_guard_present_and_in_sync():
34+
# Two fragment-form call sites per script (register_agent + send_heartbeat), identical wording.
35+
assert LINUX.count(_B2_GUARD) == 2, "linux_install.sh missing/duplicated empty-system_info guard"
36+
assert MACOS.count(_B2_GUARD) == 2, "macos_install.sh missing/duplicated empty-system_info guard"
37+
38+
39+
def test_b2_guard_precedes_every_fragment_system_info_use():
40+
# Every ' $system_info,' fragment line (the one that breaks on an empty value) must be in a
41+
# function whose system_info was guarded. We assert the count of guards matches the count of
42+
# fragment-form interpolations' call sites: each script has exactly one register + one
43+
# send_heartbeat fragment builder feeding those lines, both guarded above.
44+
for name, script in (("linux", LINUX), ("macos", MACOS)):
45+
assert script.count(" $system_info,") >= 1, f"{name}: fragment heartbeat form unexpectedly gone"
46+
assert script.count(_B2_GUARD) == 2, f"{name}: each fragment call site must carry the guard"

backend/utils/agent_scripts/linux_install.sh

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1055,7 +1055,12 @@ register_agent() {
10551055
local arch=$(uname -m)
10561056
platform=$(uname -s | tr '[:upper:]' '[:lower:]') # Remove local to make it global
10571057
local system_info=$(collect_system_info)
1058-
1058+
# issue #31: if collect_system_info produced no JSON content (empty on an unusual host), the
1059+
# '$system_info,' line below would collapse to a bare comma and break the heartbeat JSON. A
1060+
# valid fragment always contains a quoted key; if none is present, fall back to one. The glob
1061+
# '*"*' is the most portable bash test (no POSIX class / pattern-substitution), safe on bash 3.x+.
1062+
[[ "$system_info" != *'"'* ]] && system_info='"operating_system": "unknown"'
1063+
10591064
local json_payload=$(cat <<SIMPLE_EOF
10601065
{
10611066
"name": "$AGENT_NAME",
@@ -1357,7 +1362,12 @@ send_heartbeat() {
13571362
local server_statuses=$(get_server_statuses)
13581363
local haproxy_stats_csv=$(get_haproxy_stats_csv)
13591364
local system_info=$(collect_system_info)
1360-
1365+
# issue #31: if collect_system_info produced no JSON content (empty on an unusual host), the
1366+
# '$system_info,' line below would collapse to a bare comma and break the heartbeat JSON. A
1367+
# valid fragment always contains a quoted key; if none is present, fall back to one. The glob
1368+
# '*"*' is the most portable bash test (no POSIX class / pattern-substitution), safe on bash 3.x+.
1369+
[[ "$system_info" != *'"'* ]] && system_info='"operating_system": "unknown"'
1370+
13611371
# Get HAProxy version for heartbeat (safe extraction, fallback to "unknown")
13621372
local haproxy_version="unknown"
13631373
if command -v haproxy &> /dev/null; then

backend/utils/agent_scripts/macos_install.sh

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -920,7 +920,12 @@ register_agent() {
920920
local arch=$(uname -m)
921921
platform=$(uname -s | tr '[:upper:]' '[:lower:]') # Remove local to make it global
922922
local system_info=$(collect_system_info)
923-
923+
# issue #31: if collect_system_info produced no JSON content (empty on an unusual host), the
924+
# '$system_info,' line below would collapse to a bare comma and break the heartbeat JSON. A
925+
# valid fragment always contains a quoted key; if none is present, fall back to one. The glob
926+
# '*"*' is the most portable bash test (no POSIX class / pattern-substitution), safe on bash 3.x+.
927+
[[ "$system_info" != *'"'* ]] && system_info='"operating_system": "unknown"'
928+
924929
local json_payload=$(cat <<SIMPLE_EOF
925930
{
926931
"name": "$AGENT_NAME",
@@ -1191,7 +1196,12 @@ send_heartbeat() {
11911196
local server_statuses=$(get_server_statuses)
11921197
local haproxy_stats_csv=$(get_haproxy_stats_csv)
11931198
local system_info=$(collect_system_info)
1194-
1199+
# issue #31: if collect_system_info produced no JSON content (empty on an unusual host), the
1200+
# '$system_info,' line below would collapse to a bare comma and break the heartbeat JSON. A
1201+
# valid fragment always contains a quoted key; if none is present, fall back to one. The glob
1202+
# '*"*' is the most portable bash test (no POSIX class / pattern-substitution), safe on bash 3.x+.
1203+
[[ "$system_info" != *'"'* ]] && system_info='"operating_system": "unknown"'
1204+
11951205
# Get HAProxy version for heartbeat (safe extraction, fallback to "unknown")
11961206
local haproxy_version="unknown"
11971207
if command -v haproxy &> /dev/null; then

frontend/package.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
{
22
"name": "haproxy-openmanager-frontend",
3-
"version": "1.8.2",
3+
"version": "1.8.3",
44
"description": "HAProxy Load Balancer Management UI",
55
"license": "AGPL-3.0-or-later",
66
"dependencies": {

version.json

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
{
2-
"version": "1.8.2",
3-
"releaseName": "ACME nonce fix (ZeroSSL registration)",
2+
"version": "1.8.3",
3+
"releaseName": "Agent heartbeat JSON fix",
44
"releaseDate": "2026-06-25"
55
}

0 commit comments

Comments
 (0)