Add so-postgres Salt states and infrastructure by TOoSmOotH · Pull Request #15749 · Security-Onion-Solutions/securityonion

TOoSmOotH · 2026-04-09T17:46:48Z

Summary

Salt states: init, enabled, disabled, config, ssl, auth, sostatus
TLS via SO CA-signed certs with postgresql.conf template
Two-tier auth: postgres superuser + so_postgres application user
Firewall restricts port 5432 to manager-only (HA-ready)
Host management scripts: so-postgres-manage, start/stop/restart
Daily pg_dumpall backup with 7-day retention
Added to container image pull list, docker defaults, top.sls, pillar/top.sls, allowed_states, firewall, CA signing policies, setup scripts
Enabled by default (only applied to manager-type nodes)

Test plan

Install on manager/standalone — so-postgres appears in so-status
Verify firewall blocks non-manager access to port 5432
Verify so-postgres-manage shell connects successfully
Verify daily backup cron: crontab -l | grep so-postgres-backup
Verify TLS: docker exec so-postgres psql -U postgres -c "SHOW ssl;"

Phase 1 of the PostgreSQL central data platform:
- Salt states: init, enabled, disabled, config, ssl, auth, sostatus
- TLS via SO CA-signed certs with postgresql.conf template
- Two-tier auth: postgres superuser + so_postgres application user
- Firewall restricts port 5432 to manager-only (HA-ready)
- Wired into top.sls, pillar/top.sls, allowed_states, firewall containers map, docker defaults, CA signing policies, and setup scripts for all manager-type roles

- so-postgres-manage: wraps docker exec for psql operations (sql, sqlfile, shell, dblist, userlist)
- so-postgres-start/stop/restart: standard container lifecycle
- Scripts installed to /usr/sbin via file.recurse in config.sls

Add to both the import and default manager container lists so the image gets downloaded during installation.

- pg_dumpall piped through gzip, stored in /nsm/backup/
- Runs daily at 00:05 (4 minutes after config backup)
- 7-day retention matching existing config backup policy
- Skips gracefully if container isn't running
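The retention rule in the backup commit can be sketched in Python. This is only an illustration: the real so-postgres-backup is a shell script whose contents are not shown in this PR text, and the function name and the `so-postgres-backup-*.sql.gz` file name pattern are assumptions.

```python
import time
from pathlib import Path

RETENTION_DAYS = 7  # the 7-day retention described in the commit message


def prune_old_backups(backup_dir, pattern="so-postgres-backup-*.sql.gz",
                      retention_days=RETENTION_DAYS, now=None):
    """Delete backups older than retention_days and return the removed names.

    The glob pattern is a hypothetical naming scheme, not taken from the PR.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    removed = []
    for path in sorted(Path(backup_dir).glob(pattern)):
        # Compare the file's modification time against the cutoff.
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Passing `now` explicitly keeps the function easy to test without waiting a week.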

Safe because postgres states are only applied to manager-type nodes via top.sls and allowed_states.map.jinja.

Matches the elasticsearch.auth pattern where auth states use the full sls path check and are explicitly listed.

- Create vars/postgres.map.jinja for postgres auth globals
- Add POSTGRES_GLOBALS to all manager-type role vars (manager, eval, standalone, managersearch, import)
- Add postgres module config to soc/defaults.yaml
- Inject so_postgres credentials from auth pillar into soc/defaults.map.jinja (conditional on auth pillar existing)

SOC connects to postgres via the host network, not the Docker bridge network, so it needs the manager's IP address rather than the container hostname.

Removed postgres from soc/defaults.yaml (shared by all nodes) and moved it entirely into defaults.map.jinja, which only injects the config when postgres auth pillar exists (manager-type nodes). Sensors and other non-manager nodes will not have a postgres module section in their sensoroni.json, so sensoroni won't try to connect.

Use format() with %L for SQL literal escaping instead of raw string interpolation. Also ALTER ROLE if user already exists to keep password in sync with pillar.
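To see why raw interpolation breaks on special characters, here is a minimal Python sketch that approximates what PostgreSQL's `%L` (quote_literal) does for a non-NULL string. The helper names and the `so_postgres` role in the usage are illustrative; the actual fix lives in init-users.sh and uses the server-side format() function, not this code.

```python
def quote_literal(value: str) -> str:
    """Approximate PostgreSQL's quote_literal() for a non-NULL string."""
    # Single quotes and backslashes are doubled; a backslash also forces
    # the E'...' escape-string prefix.
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    prefix = "E" if "\\" in value else ""
    return f"{prefix}'{escaped}'"


def naive_alter_role(user: str, password: str) -> str:
    # Raw interpolation: a quote in the password terminates the literal early.
    return f"ALTER ROLE {user} WITH PASSWORD '{password}';"


def safe_alter_role(user: str, password: str) -> str:
    # Only the password literal is escaped here; identifiers would need
    # their own quoting (format()'s %I on the server side).
    return f"ALTER ROLE {user} WITH PASSWORD {quote_literal(password)};"
```

With the password `pa'ss`, the naive form emits `'pa'ss'`, which is not a valid literal, while the escaped form emits `'pa''ss'`.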

Injects the postgres superuser password from secrets pillar so SOC can run schema migrations as admin before switching to the app user for normal operations.

Postgres module now queries Elasticsearch directly via HTTP for the chat migration (bypasses RBAC that needs user context). Pass esHostUrl, esUsername, esPassword alongside postgres creds.

Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH) so Telegraf can write metrics to Postgres alongside or instead of InfluxDB. Each minion authenticates with its own so_telegraf_<minion> role and writes to a matching schema inside a shared so_telegraf database, keeping blast radius per-credential to that minion's data.
- Per-minion credentials auto-generated and persisted in postgres/auth.sls
- postgres/telegraf_users.sls reconciles roles/schemas on every apply
- Firewall opens 5432 only to minion hostgroups when Postgres output is active
- Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new minions automatically on key accept
- soup post_to_3.1.0 backfills users for existing minions on upgrade
- so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks
- so-telegraf-trim + nightly cron prune rows older than postgres.telegraf.retention_days (default 14)

Telegraf's postgresql output stores tag values either as individual columns on <metric>_tag or as a single JSONB 'tags' column, depending on plugin version. Introspect information_schema.columns and build the right accessor per tag instead of assuming one layout.

The psql invocation flag '-v ON_ERROR_STOP=1' used by the so-postgres init script gets flagged by so-log-check because the token 'ERROR' matches its error regex. Add to the exclusion list.

Per-minion schemas cause table count to explode (N minions * M metrics) and the per-minion revocation story isn't worth it when retention is short. Move all minions to a shared 'telegraf' schema while keeping per-minion login credentials for audit.
- New so_telegraf NOLOGIN group role owns the telegraf schema; each per-minion role is a member and inherits insert/select via role inheritance
- Telegraf connection string uses options='-c role=so_telegraf' so tables auto-created on first write belong to the group role
- so-telegraf-trim walks the flat telegraf.* table set instead of per-minion schemas
- so-stats-show filters by host tag; CLI arg is now the hostname as tagged by Telegraf rather than a sanitized schema suffix
- Also renames so-show-stats -> so-stats-show

High-cardinality inputs (docker, procstat, kafka) trigger ALTER TABLE ADD COLUMN on every new field name, and with all minions writing into a shared 'telegraf' schema the metric tables hit Postgres's 1600-column per-table ceiling quickly. Setting fields_as_jsonb and tags_as_jsonb on the postgresql output keeps metric tables fixed at (time, tag_id, fields jsonb) and tag tables at (tag_id, tags jsonb).
- so-stats-show rewritten to use JSONB accessors ((fields->>'x')::numeric, tags->>'host', etc.) and cast memory/disk sizes to bigint so pg_size_pretty works
- Drop regex/regexFailureMessage from telegraf_output SOC UI entry to match the convention upstream used when removing them from mdengine/pcapengine/pipeline; options: list drives validation

Every telegraf.* metric table is now a daily time-range partitioned parent managed by pg_partman. Retention drops old partitions instead of the row-by-row DELETE that so-telegraf-trim used to run nightly, and dashboards will benefit from partition pruning at query time.
- Load pg_cron at server start via shared_preload_libraries and point cron.database_name at so_telegraf so job metadata lives alongside the metrics
- Telegraf create_templates override makes every new metric table a PARTITION BY RANGE (time) parent registered with partman.create_parent in one transaction (1 day interval, 3 premade)
- postgres_telegraf_group_role now also creates pg_partman and pg_cron extensions and schedules hourly partman.run_maintenance_proc
- New retention reconcile state updates partman.part_config.retention from postgres.telegraf.retention_days on every apply
- so_telegraf_trim cron is now unconditionally absent; script stays on disk as a manual fallback

init-users.sh only runs on a fresh data dir, so upgrades onto an existing /nsm/postgres volume never got so_telegraf. Pinning partman's schema also makes partman.part_config reliably resolvable.

- Telegraf's partman template passed p_type:='native', which pg_partman 5.x (the version shipped by postgresql-17-partman on Debian) rejects. Switched to 'range' so partman.create_parent() actually creates partitions and Telegraf's INSERTs succeed.
- Added a postgres_wait_ready gate in telegraf_users.sls so psql execs don't race the init-time restart that docker-entrypoint.sh performs.
- so-verify now ignores the literal "-v ON_ERROR_STOP=1" token in the setup log. Dropped the matching entry from so-log-check, which scans container stdout where that token never appears.

- Telegraf's outputs.postgresql plugin uses Go text/template syntax, not uppercase tokens. The {TABLE}/{COLUMNS}/{TABLELITERAL} strings were passed through to Postgres literally, producing syntax errors on every metric's first write. Switch to {{ .table }}, {{ .columns }}, and {{ .table|quoteLiteral }} so partitioned parents and the partman create_parent() call succeed.
- Replace the \gexec "CREATE DATABASE ... WHERE NOT EXISTS" idiom in both init-users.sh and telegraf_users.sls with an explicit shell conditional. The prior idiom occasionally fired CREATE DATABASE even when so_telegraf already existed, producing duplicate-key failures.

Telegraf calls partman.create_parent() on first write of each metric, which needs USAGE on the partman schema, EXECUTE on its functions and procedures, and DML on partman.part_config.

pg_partman 5.x splits p_parent_table on '.' and looks up the parts as raw identifiers, so the literal must be 'schema.name' rather than the double-quoted form quoteLiteral emits for .table.
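A small Python sketch of the splitting behavior described in this commit message, to show why the double-quoted form fails. The function is a stand-in for the described behavior, not pg_partman's code, and the `telegraf.cpu` table name is illustrative.

```python
def split_parent_table(parent_table: str):
    """Split on the first '.' and treat both parts as raw identifiers."""
    schema, _, name = parent_table.partition(".")
    return schema, name


quoted = '"telegraf"."cpu"'  # double-quoted form
plain = "telegraf.cpu"       # unquoted schema.name form

# The quotes survive the split and become part of the looked-up names.
print(split_parent_table(quoted))  # ('"telegraf"', '"cpu"')
print(split_parent_table(plain))   # ('telegraf', 'cpu')
```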

pg_partman 5.x requires the control column to be NOT NULL; Telegraf's generated columns are nullable by default.

pg_partman 5.x's create_partition() creates a per-parent template table inside the partman schema at runtime, which requires CREATE on that schema. Also extend ALTER DEFAULT PRIVILEGES so the runtime-created template tables are accessible to so_telegraf.

reyesj2 · 2026-04-20T18:26:44Z


    [[ "$POSTVERSION" =~ ^2\.4\.21[0-9]+$ ]] && post_to_3.0.0
-    [[ "$POSTVERSION" == "3.0.0" ]] && post_to_3.1.0
+    [[ "$POSTVERSION" =~ 3.0.0 ]] && post_to_3.1.0


should stay as

[[ "$POSTVERSION" == "3.0.0" ]] && post_to_3.1.0
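One plausible reading of this review comment (the reviewer does not spell out the reasoning): in bash, an unquoted right-hand side of `=~` is an unanchored regular expression in which `.` matches any character, so `=~ 3.0.0` accepts more than the literal version string. The Python analogue below uses `re.search`, which behaves the same way for this pattern; the function names are illustrative.

```python
import re


def regex_match(version: str) -> bool:
    # Analogue of [[ "$POSTVERSION" =~ 3.0.0 ]]: unanchored, '.' is a wildcard.
    return re.search(r"3.0.0", version) is not None


def exact_match(version: str) -> bool:
    # Analogue of [[ "$POSTVERSION" == "3.0.0" ]]: literal string comparison.
    return version == "3.0.0"


for candidate in ["3.0.0", "13.0.0", "3.0.01", "3a0b0"]:
    print(candidate, regex_match(candidate), exact_match(candidate))
```

Every candidate passes the regex form, while only `3.0.0` passes the exact comparison.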

feature/postgres had rewritten the 3.1.0 upgrade block, dropping the elastic upgrade work 3/dev landed for 9.0.8→9.3.3: elasticsearch_backup_index_templates, the component template state cleanup, and the /usr/sbin/so-kibana-space-defaults post-upgrade call. It also carried an older ES upgrade mapping (8.18.8→9.0.8) that was superseded on 3/dev (9.0.8→9.3.3 for 3.0.0-20260331), and a handful of latent shell-quoting regressions in verify_es_version_compatibility and the intermediate-upgrade helpers. Adopt the 3/dev soup verbatim and only add the new Telegraf Postgres provisioning to post_to_3.1.0 on top of so-kibana-space-defaults.

state.apply takes a single mods argument; space-separated names are not a list, so `state.apply postgres.auth postgres.telegraf_users` was only applying postgres.auth and silently dropping the telegraf_users state. Use comma-separated mods and add queue=True to match the rest of soup.
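A minimal sketch of building the corrected argument list, assuming the caller assembles the command in code. The helper name is hypothetical, and the surrounding command (salt or salt-call, targeting, and so on) is deliberately left out because it is not shown here.

```python
def state_apply_args(mods, queue=True):
    """Build state.apply arguments with all mods in one comma-separated value."""
    args = ["state.apply", ",".join(mods)]
    if queue:
        # queue=True matches the convention the commit says the rest of soup uses.
        args.append("queue=True")
    return args


print(state_apply_args(["postgres.auth", "postgres.telegraf_users"]))
# ['state.apply', 'postgres.auth,postgres.telegraf_users', 'queue=True']
```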

The Telegraf backend selector lived at global.telegraf_output but it is a Telegraf-scoped setting, not a cross-cutting grid global. Move both the value and the UI annotation under the telegraf pillar so it shows up alongside the other Telegraf tuning knobs in the Configuration UI.
- salt/telegraf/defaults.yaml: add telegraf.output: BOTH
- salt/telegraf/soc_telegraf.yaml: add telegraf.output annotation
- salt/global/defaults.yaml: remove global.telegraf_output
- salt/global/soc_global.yaml: remove global.telegraf_output annotation
- salt/vars/globals.map.jinja: drop telegraf_output from GLOBALS
- salt/firewall/map.jinja: read via pillar.get('telegraf:output')
- salt/postgres/telegraf_users.sls: read via pillar.get('telegraf:output')
- salt/telegraf/etc/telegraf.conf: read via TELEGRAFMERGED.output
- salt/postgres/tools/sbin/so-stats-show: update user-facing docs

No behavioral change; default stays BOTH.

- firewall/map.jinja and postgres/telegraf_users.sls now pull the telegraf output selector through TELEGRAFMERGED so the defaults.yaml value (BOTH) is the source of truth and pillar overrides merge in cleanly. pillar.get with a hardcoded fallback was brittle and would disagree with defaults.yaml if the two ever diverged.
- Rename salt/postgres/files/pg_hba.conf.jinja to pg_hba.conf and drop template: jinja from config.sls; the file has no jinja besides the comment header.

Add Configuration-UI annotations for every postgres pillar key defined in defaults.yaml, not just telegraf.retention_days:
- postgres.enabled: readonly; admin-visible but toggled via state
- postgres.telegraf.retention_days: drop advanced so user-tunable knobs surface in the default view
- postgres.config.max_connections, shared_buffers, log_min_messages: user-tunable performance/verbosity knobs, not advanced
- postgres.config.listen_addresses, port, ssl, ssl_cert_file, ssl_key_file, ssl_ca_file, hba_file, log_destination, logging_collector, shared_preload_libraries, cron.database_name: infra/Salt-managed, marked advanced so they're visible but out of the way

No defaults.yaml change; value-side stays the same.

salt/auth fires on every minion authentication — including every minion restart and every master restart — so the reactor was re-running the postgres.auth + postgres.telegraf_users + telegraf orchestration for every already-accepted minion on every reconnect. The underlying states are idempotent, so this was wasted work and log noise, not a correctness issue. Switch the subscription to salt/key, which fires only when the master actually changes a key's state (accept / reject / delete). Match the pattern used by salt/reactor/check_hypervisor.sls (registered in salt/salt/cloud/reactor_config_hypervisor.sls) and add the result==True guard so half-failed key operations don't trigger the orchestration.

New minions run highstate as part of onboarding, which already applies the telegraf state with the fresh pillar entry we just wrote. Pushing telegraf a second time from the reactor is redundant.
- Remove the MINION-scoped salt.state block from the orch; keep only the manager-side postgres.auth + postgres.telegraf_users provisioning.
- Stop passing minion_id as pillar in the reactor; the orch doesn't reference it anymore.

postgres_wait_ready requires docker_container: so-postgres, which is declared in postgres.enabled. Running postgres.telegraf_users on its own — as the reactor orch and the soup post-upgrade step both do — errored because Salt couldn't resolve the require. Include postgres.enabled from postgres.telegraf_users so the container state is always in the render. postgres.enabled already includes telegraf_users; Salt de-duplicates the circular include and the included states are all idempotent, so repeated application is a no-op.

The previous MANAGER resolution used pillar.get('setup:manager') with a fallback to grains.get('master'). Neither works from the reactor: setup:manager is only populated by the setup workflow (not by reactor runs), and grains.master returns the minion's master-hostname setting, not a targetable minion id. Match the pattern used by orch/delete_hypervisor.sls: compound-target whichever minion is the manager via role grain.

- config.sls: postgresconfdir creates /opt/so/conf/postgres, so the two subdirectories under it (postgressecretsdir, postgresinitdir) don't need their own makedirs; require the parent instead.
- soc_postgres.yaml: helpLink for every annotated key now points to 'postgres' instead of the carried-over 'influxdb' slug.

The so-postgres-backup script and its cron were living under salt/backup/config_backup.sls, which meant the backup script and cron were deployed independently of whether postgres was enabled/disabled.
- Relocate salt/backup/tools/sbin/so-postgres-backup to salt/postgres/tools/sbin/so-postgres-backup so the existing postgres_sbin file.recurse in postgres/config.sls picks it up with everything else; no separate file.managed needed.
- Remove postgres_backup_script and so_postgres_backup from salt/backup/config_backup.sls.
- Add cron.present for so_postgres_backup to salt/postgres/enabled.sls and the matching cron.absent to salt/postgres/disabled.sls so the cron follows the container's lifecycle.

pillar/top.sls only distributes postgres.auth to manager-class roles, so sensors / heavynodes / searchnodes / receivers / fleet / idh / hypervisor / desktop minions never received the postgres telegraf password they need to write metrics. Broadcasting the aggregate postgres.auth pillar to every role would leak the so_postgres admin password and every other minion's cred.

Fan out per-minion credentials into each minion's own pillar file at /opt/so/saltstack/local/pillar/minions/<id>.sls. That file is already distributed by pillar/top.sls exclusively to the matching minion via `- minions.{{ grains.id }}`, so each minion sees only its own postgres.telegraf.{user,pass} and nothing else.
- salt/postgres/auth.sls: after writing the manager-scoped aggregate pillar, fan the per-minion creds out via so-yaml.py replace for every up-minion. Creates the minion pillar file if missing. Requires postgres_auth_pillar so the manager pillar lands first.
- salt/telegraf/etc/telegraf.conf: consume postgres:telegraf:user and postgres:telegraf:pass directly from the minion's own pillar instead of walking postgres:auth:users which isn't visible off the manager.

Every postgres.auth run was rewriting every minion pillar file via two so-yaml.py replace calls, even when nothing had changed. Passwords are only generated on first encounter (see the `if key not in telegraf_users` guard) and never rotate, so re-writing the same values on every apply is wasted work and noisy state output. Add an `unless:` check that compares the already-written postgres.telegraf.user to the one we'd set. If they match, skip the fan-out entirely. On first apply for a new minion the key isn't there, so the replace runs; on subsequent applies it's a no-op.

postgres.auth was running an `unless` shell check per up-minion on every manager highstate, even when nothing had changed; N fork+python starts of so-yaml.py add up on large grids. The work is only needed when a specific minion's key is accepted.
- salt/postgres/auth.sls: fan out only when postgres_fanout_minion pillar is set (targets that single minion). Manager highstates with no pillar take a zero-N code path.
- salt/reactor/telegraf_user_sync.sls: re-pass the accepted minion id as postgres_fanout_minion to the orch.
- salt/orch/telegraf_postgres_sync.sls: forward the pillar to the salt.state invocation so the state render sees it.
- salt/manager/tools/sbin/soup: for the one-time 3.1.0 backfill, drop the per-minion state.apply and do an in-shell loop over the minion pillar files using so-yaml.py directly. Skips minions that already have postgres.telegraf.user set.

The empty-pillar case produced a telegraf.conf with `user= password=` which libpq misparses ("password=" gets consumed as the user value), yielding `password authentication failed for user "password="` on every manager without a prior fan-out (fresh install, not the salt-key path the reactor handles). Two fixes:
- salt/postgres/auth.sls: always fan for grains.id in addition to any postgres_fanout_minion from the reactor, so the manager's own pillar is populated on every postgres.auth run. The existing `unless` guard keeps re-runs idempotent.
- salt/telegraf/etc/telegraf.conf: gate the [[outputs.postgresql]] block on PG_USER and PG_PASS being non-empty. If a minion hasn't received its pillar yet the output block simply isn't rendered; the next highstate picks up the creds once the fan-out completes, and in the meantime telegraf keeps running the other outputs instead of erroring with a malformed connection string.
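The gating idea from the second fix can be sketched as a plain Python function. The real template is Jinja inside telegraf.conf; the function name, the `host` parameter, and the exact connection-string layout below are assumptions for illustration only.

```python
def render_postgres_output(pg_user, pg_pass, host="manager"):
    """Return the outputs.postgresql block, or '' when credentials are missing."""
    if not pg_user or not pg_pass:
        # No credentials yet: emit nothing rather than 'user= password='.
        return ""
    connection = f"host={host} user={pg_user} password={pg_pass}"
    return f'[[outputs.postgresql]]\n  connection = "{connection}"\n'


print(repr(render_postgres_output("", "")))
print(render_postgres_output("so_telegraf_sensor1", "example-pass"))
```

Rendering nothing in the empty case means a minion without credentials never produces the malformed connection string described in this commit message.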

replace calls removeKey before addKey, so running `so-yaml.py replace` on a new dotted key whose parent doesn't exist (e.g., postgres.auth fanning postgres.telegraf.user into a minion pillar file that has never carried any postgres.* keys) crashed with KeyError: 'postgres' from removeKey recursing into a missing parent dict. Make removeKey a no-op when an intermediate key is absent so that:
- `remove` has the natural "remove if exists" semantics, and
- `replace` works for brand-new nested keys.
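A minimal sketch of the "no-op on a missing parent" behavior, written from the description in this commit message rather than copied from so-yaml.py. The snake_case names are deliberately different from the real removeKey/addKey to signal that this is an illustration.

```python
def remove_key(content, key: str) -> None:
    """Remove a dotted key in place; do nothing if any path segment is absent."""
    head, _, rest = key.partition(".")
    if not isinstance(content, dict) or head not in content:
        return  # missing parent or leaf: "remove if exists" semantics
    if rest:
        remove_key(content[head], rest)
    else:
        del content[head]


def add_key(content: dict, key: str, value) -> None:
    """Set a dotted key, creating intermediate dicts as needed."""
    head, _, rest = key.partition(".")
    if rest:
        add_key(content.setdefault(head, {}), rest, value)
    else:
        content[head] = value


def replace_key(content: dict, key: str, value) -> None:
    # Mirrors the described order: remove first, then add.
    remove_key(content, key)
    add_key(content, key, value)


pillar = {}  # a minion pillar that has never carried any postgres.* keys
replace_key(pillar, "postgres.telegraf.user", "so_telegraf_sensor1")
print(pillar)  # {'postgres': {'telegraf': {'user': 'so_telegraf_sensor1'}}}
```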

Two fixes on the postgres telegraf fan-out path:
1. postgres.auth cmd.run leaked the password to the console because Salt always prints the Name: field and `show_changes: False` does not apply to cmd.run. Move the user and password into the `env:` attribute so the shell body still sees them via $PG_USER / $PG_PASS but Salt's state reporter never renders them.
2. so-minion's addMinion -> setupMinionFiles sequence removes the minion pillar file and rewrites it from scratch, which wipes the postgres.telegraf.* entries the reactor may have already written on salt-key accept. Add a postgres.auth fan-out step to orch.deploy_newnode (the orch so-minion kicks off after setupMinionFiles) and require it from the new minion's highstate. Idempotent via the existing unless: guard in postgres.auth.

Simpler, race-free replacement for the reactor + orch + fan-out chain.
- salt/manager/tools/sbin/so-minion: expand add_telegraf_to_minion to generate a random 72-char password, reuse any existing password from the aggregate pillar, write postgres.telegraf.{user,pass} into the minion's own pillar file, and update the aggregate pillar so postgres.telegraf_users can CREATE ROLE on the next manager apply. Every create<ROLE> function already calls this hook, so add / addVM / setup dispatches are all covered identically and synchronously.
- salt/postgres/auth.sls: strip the fanout_targets loop and the postgres_telegraf_minion_pillar_<safe> cmd.run block; it's now redundant. The state still manages the so_postgres admin user and writes the aggregate pillar for postgres.telegraf_users to consume.
- salt/reactor/telegraf_user_sync.sls: deleted.
- salt/orch/telegraf_postgres_sync.sls: deleted.
- salt/salt/master.sls: drop the reactor_config_telegraf block that registered the reactor on /etc/salt/master.d/reactor_telegraf.conf.
- salt/orch/deploy_newnode.sls: drop the manager_fanout_postgres_telegraf step and the require: it added to the newnode highstate. Back to its original 3/dev shape.

No more ephemeral postgres_fanout_minion pillar, no more async salt/key reactor, no more so-minion setupMinionFiles race: the pillar write happens inline inside setupMinionFiles itself.

Paired with the add path in add_telegraf_to_minion: when a minion is removed, drop its entry from the aggregate postgres pillar and drop the matching so_telegraf_<safe> role from the database. Without this, stale entries and DB roles accumulate over time.

Makes rotate-password and compromise-recovery both a clean delete+add:

    so-minion -o=delete -m=<id>
    so-minion -o=add -m=<id>

The first call drops the role and clears the aggregate pillar; the second generates a brand-new password.

The cleanup is best-effort: if so-postgres isn't running or the DROP ROLE fails (e.g., the role owns unexpected objects), we log a warning and continue so the minion delete itself never gets blocked by postgres state. Admins can mop up stray roles manually if that happens.

The reactor path is gone; so-minion now owns add/delete for new minions. The backfill itself is unchanged — postgres.auth's up_minions fallback fills the aggregate, postgres.telegraf_users creates the roles, and the bash loop fans to per-minion pillar files — so the pre-feature upgrade story still works end-to-end. Just refresh the comment so it isn't misleading.

The old flow had two writers for each per-minion Telegraf password (so-minion wrote the minion pillar; postgres.auth regenerated any missing aggregate entries). They drifted on first-boot and there was no trigger to create DB roles when a new minion joined. Split responsibilities:
- pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres admin cred.
- pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user, pass}} map, shadowed per-install by the local-pillar copy.
- salt/manager/tools/sbin/so-telegraf-cred is the single writer: flock, atomic YAML write, PyYAML safe_dump so passwords never round-trip through so-yaml.py's type coercion. Idempotent add, quiet remove.
- so-minion's add/remove hooks now shell out to so-telegraf-cred instead of editing pillar files directly.
- postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs roles from it; telegraf.conf reads its own entry via grains.id.
- orch.deploy_newnode runs postgres.telegraf_users on the manager and refreshes the new minion's pillar before the new node highstates, so the DB role is in place the first time telegraf tries to connect.
- soup's post_to_3.1.0 backfills the creds pillar from accepted salt keys (idempotent) and runs postgres.telegraf_users once to reconcile the DB.
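The single-writer pattern described for so-telegraf-cred (exclusive lock, idempotent add, atomic replace) can be sketched as follows. This is not the script from the PR. It uses JSON from the standard library as a stand-in for the YAML pillar so the example runs without PyYAML, the 72-character length is borrowed from the earlier so-minion commit, and the user name is built from the raw minion id without whatever sanitization the real tooling applies.

```python
import fcntl
import json
import os
import secrets
import tempfile


def add_cred(creds_path: str, minion_id: str, password_len: int = 72) -> dict:
    """Idempotently add a {user, pass} entry for minion_id under a file lock."""
    with open(creds_path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize concurrent writers
        creds = {}
        if os.path.exists(creds_path):
            with open(creds_path) as fh:
                creds = json.load(fh)
        if minion_id in creds:
            return creds[minion_id]  # existing entry is reused, not regenerated
        entry = {
            "user": f"so_telegraf_{minion_id}",  # assumed naming, unsanitized
            "pass": secrets.token_urlsafe(password_len)[:password_len],
        }
        creds[minion_id] = entry
        # Write to a temp file in the same directory (mkstemp creates it with
        # mode 0600), then rename over the target so readers never observe a
        # partially written file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(creds_path) or ".")
        with os.fdopen(fd, "w") as fh:
            json.dump(creds, fh)
        os.replace(tmp, creds_path)
        return entry
```

Calling `add_cred` twice for the same minion returns the same entry, which is the idempotent-add property the commit message describes.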

Swap the ~150-line Python implementation for a 48-line bash script that delegates YAML mutation to so-yaml.py — the same helper so-minion and soup already use. Same semantics: seed the creds pillar on first use, idempotent add, silent remove. SO minion ids are dot-free by construction (setup/so-functions:1884 strips everything after the first '.'), so using the raw id as the so-yaml.py key path is safe.

so-telegraf-cred was committed with mode 644, causing `so-telegraf-cred add "$MINION_ID"` in so-minion's add_telegraf_to_minion to fail with "Permission denied" and log "Failed to provision postgres telegraf cred for <minion>". Mark it executable. Also bail early in seed_creds_file if mkdir/printf/chmod fail, and in so-yaml.py loadYaml surface a clear stderr message with the filename instead of an unhandled FileNotFoundError traceback.

Exercises the FileNotFoundError and generic-exception branches added to loadYaml in the previous commit, restoring 100% coverage required by the build.

pillar/top.sls now references postgres.soc_postgres / postgres.adv_postgres unconditionally, but make_some_dirs only runs at install time so managers upgrading from 3.0.0 have no local/pillar/postgres/ and salt-master fails pillar render on the first post-upgrade restart. Similarly, secrets_pillar is a no-op on upgrade (secrets.sls already exists), so secrets:postgres_pass never gets seeded and the postgres container's POSTGRES_PASSWORD_FILE and SOC's PG_ADMIN_PASS would land empty after highstate. Add ensure_postgres_local_pillar and ensure_postgres_secret to up_to_3.1.0 so the stubs and secret exist before masterlock/salt-master restart. Both are idempotent and safe to re-run.

The manager's /etc/salt/minion (written by so-functions:configure_minion) has no file_roots, so salt-call --local falls back to Salt's default /srv/salt and fails with "No matching sls found for 'postgres.telegraf_users' in env 'base'". || true was silently swallowing the error, which meant the DB roles for the pillar entries just populated by the so-telegraf-cred backfill loop never actually got created. Route through salt-master instead; its file_roots already points at the default/local salt trees.

Removed helpLink for influxdb from endgamehost configuration.

TOoSmOotH and others added 30 commits April 8, 2026 10:58

Add so-postgres host management scripts

762e73f

Add so-postgres to container image pull list

358a2e6

Add daily PostgreSQL database backup

61bdfb1

Enable postgres by default

46e38d3

Add postgres.auth to allowed_states

b87af8e

Use manager IP for postgres hostUrl instead of container hostname

c1b1452

Fix init-users.sh password escaping for special characters

da1045e

Add postgres adminPassword to SOC module config

1ffdcab

Add ES credentials to postgres module config for migration

9ccd0ac

so-log-check: exclude psql ON_ERROR_STOP flag

c124186

Fix soup

a2ffb92

Fix soup

2013bf9

Fix soup

f11d315

Merge branch '3/dev' into feature/postgres

f7b80f5

Create so_telegraf DB from Salt and pin pg_partman schema

7d07f3c

Escape Go-template placeholders from Jinja in telegraf.conf

af9330a

Grant so_telegraf access to partman schema

927eba5

Pass unquoted schema.name to partman.create_parent

0fddcd8

Mark time column NOT NULL before partman.create_parent

f11e9da

reyesj2 reviewed Apr 20, 2026

TOoSmOotH and others added 29 commits April 20, 2026 14:32

Merge remote-tracking branch 'origin/3/dev' into feature/postgres

1537ba5

Change so-postgres final_octet to 47

37e9257

so-yaml_test: cover loadYaml error paths

d5c0ec4

Remove helpLink for influxdb in soc_global.yaml

a6948e8

TOoSmOotH wants to merge 65 commits into 3/dev from feature/postgres
