Draft
Changes from 35 commits
Commits
62 commits
1fb376b
Revert "SOLR-17153: CloudSolrClient should not throw "Collection not …
gerlowskija Apr 5, 2024
7f3e980
Allow embedded-ZK to run in quorum/ensemble mode
gerlowskija Apr 5, 2024
459ecf6
Revert "Revert "SOLR-17153: CloudSolrClient should not throw "Collect…
gerlowskija Sep 14, 2025
de59401
Merge branch 'main' into spike-zk-quorum
gerlowskija Sep 14, 2025
ff0a9ca
Merge remote-tracking branch 'upstream/main' into pr/2391
epugh Oct 13, 2025
cd6ecc5
Take advantage of Solr node roles to determine when to start embedded zk
epugh Oct 14, 2025
9e6ef68
Strip out old log4j workaround not needed, look at ide warnings.
epugh Oct 14, 2025
479e85f
check in some work to be removed
epugh Oct 14, 2025
0601701
Merge branch 'refs/heads/main' into spike-zk-quorum
janhoy Oct 15, 2025
27ea8e9
Properly clean up ZK server resources
janhoy Oct 15, 2025
6bd62ec
Fix precommit in ZkContainer
janhoy Oct 16, 2025
8ee628d
New test TestEmbeddedZkQuorum
janhoy Oct 16, 2025
d8bac96
Handle standalone case in ZkContainer.initZookeeper
janhoy Oct 16, 2025
c5ee205
Spent too much time on this, backing it out.
epugh Oct 16, 2025
991ba1e
Merge remote-tracking branch 'upstream/main' into spike-zk-quorum
epugh Nov 1, 2025
47e928b
add change log
epugh Nov 1, 2025
7cd745d
Redo explanation to be clearer
epugh Nov 1, 2025
42d5213
update variable name
epugh Nov 1, 2025
f8d4f8a
remove unneed variable and if statement, and add a reminder
epugh Nov 1, 2025
bacb4af
remove the /solr/initialized zk node, it appears to be a multi thread…
epugh Nov 1, 2025
5199b84
Remove intermediate test class and simplify cluster set up
epugh Nov 1, 2025
3f785da
Better nesting of zkServerEnabled check and if in quorum mode...
epugh Nov 1, 2025
367d37b
zkEnabled does actually do anything!
epugh Nov 1, 2025
898897a
Merge branch 'main' into spike-zk-quorum
janhoy Jan 28, 2026
ac9bf9e
Update code to work with latest main
janhoy Jan 28, 2026
227f8cc
Safer port allocation in MiniSolrCloudCluster
janhoy Jan 28, 2026
92b8420
Two new tests for resilience
janhoy Jan 28, 2026
74f5f58
Improve and refactor the new tests a bit
janhoy Jan 28, 2026
881c1ec
Improve resilience test by waiting for active collection
janhoy Jan 28, 2026
c835284
Precommit
janhoy Jan 29, 2026
ce27f55
ForbiddenAPI
janhoy Jan 29, 2026
d185c99
Changelog with JIRA link
janhoy Jan 29, 2026
408b879
Move ZooKeeperServerMain init into own static method on SolrZkServer
janhoy Jan 29, 2026
00ff572
Merge main into spike-zk-quorum to bring branch up to date
janhoy May 10, 2026
4d4426a
Tidy
janhoy May 10, 2026
9038b90
Fix test failure
janhoy May 10, 2026
ff8e204
ForbiddenAPI: replace Collections.singletonList with List.of
janhoy May 11, 2026
a439a7d
SOLR-18094: Fix @AwaitsFix placeholder URL in TestEmbeddedZkQuorum
janhoy May 11, 2026
c67c774
SOLR-18094: Fix NPE in newSolrClient when zkServer is null in quorum …
janhoy May 11, 2026
f39af91
SOLR-18094: Fix quorum/leader port derivation to match SolrZkServerPr…
janhoy May 11, 2026
f25e0ce
SOLR-18094: Reserve quorum/election ports (+1/+2) in reservePortPairs
janhoy May 11, 2026
26c9e85
SOLR-18094: Replace hard sleep with ZK connectivity wait for quorum f…
janhoy May 11, 2026
5dcf04b
SOLR-18094: Add TODO comments for waitForLiveNodes and myId host matc…
janhoy May 11, 2026
63dc879
SOLR-18094: Make zkServerEmbedded field volatile
janhoy May 11, 2026
8b25153
SOLR-18094: Wait for ZK quorum threads to stop after embedded ZK shut…
janhoy May 11, 2026
5adaaef
SOLR-18094: Validate zkHost entry format when starting quorum mode
janhoy May 11, 2026
442e3c9
SOLR-18094: Revert unrelated changes from libs.versions.toml and pack…
janhoy May 11, 2026
4228fa8
SOLR-18094: Add ref-guide documentation for zookeeper_quorum node rol…
janhoy May 11, 2026
0cc2d38
Add major-changes doc header
janhoy May 11, 2026
d954146
SOLR-18094: Add null check for zkThread in SolrZkServer.stop()
janhoy May 11, 2026
47251b9
SOLR-18094: zookeeper_quorum role alone is sufficient to start embedd…
janhoy May 11, 2026
c573590
SOLR-18094: Log 'Closed embedded ZK' after thread-drain wait, not before
janhoy May 11, 2026
9db9a2c
SOLR-18094: Add TODO comment explaining dead code in SolrZkServer.inj…
janhoy May 11, 2026
829ec2f
SOLR-18094: Pass Properties directly to startZooKeeperServerEmbedded,…
janhoy May 11, 2026
1e1f7f2
SOLR-18094: Add zookeeper_quorum to node-roles API example output
janhoy May 11, 2026
5d969e5
SOLR-18094: Fix port-collision in embedded ZK ensemble doc example
janhoy May 11, 2026
ccf6ec3
Avoid zoo_home collision on same host
janhoy May 11, 2026
f281185
Eliminate SolrZkServer in favour of using ZooKeeperServerEmbedded
epugh May 12, 2026
fe81c20
Much nicer to read.
epugh May 12, 2026
f51ee69
Better solution for zoo_home configuration in the example
janhoy May 13, 2026
b071bd2
Use zoo_data as default zk data dir, relative to SOLR_HOME
janhoy May 13, 2026
197e0cf
Make ZK port offsets configurable
janhoy May 14, 2026
10 changes: 10 additions & 0 deletions changelog/unreleased/SOLR-18094-zk-quorum-noderole.yml
@@ -0,0 +1,10 @@
+# See https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc
+title: Capability for Solr to run embedded ZooKeeper in a quorum/ensemble mode, allowing multiple Solr nodes to form a distributed ZooKeeper ensemble within their own processes. Controlled by a new solr node-role.
+type: added # added, changed, fixed, deprecated, removed, dependency_update, security, other
+authors:
+  - name: Eric Pugh
+  - name: Jason Gerlowski
+  - name: Jan Høydahl
+links:
+  - name: SOLR-18094
+    url: https://issues.apache.org/jira/browse/SOLR-18094
2 changes: 1 addition & 1 deletion gradle/libs.versions.toml
@@ -240,8 +240,8 @@ amazon-awssdk-s3 = { module = "software.amazon.awssdk:s3", version.ref = "amazon
amazon-awssdk-sdkcore = { module = "software.amazon.awssdk:sdk-core", version.ref = "amazon-awssdk" }
amazon-awssdk-sts = { module = "software.amazon.awssdk:sts", version.ref = "amazon-awssdk" }
androidx-lifecycle-runtimeCompose = { module = "org.jetbrains.androidx.lifecycle:lifecycle-runtime-compose", version.ref = "androidx-lifecycle" }
androidx-lifecycle-viewmodelCompose = { module = "org.jetbrains.androidx.lifecycle:lifecycle-viewmodel-compose", version.ref = "androidx-lifecycle" }
androidx-lifecycle-viewModelNav3 = { module = "org.jetbrains.androidx.lifecycle:lifecycle-viewmodel-navigation3", version.ref = "androidx-lifecycle" }
androidx-lifecycle-viewmodelCompose = { module = "org.jetbrains.androidx.lifecycle:lifecycle-viewmodel-compose", version.ref = "androidx-lifecycle" }
androidx-material3-adaptive = { module = "org.jetbrains.compose.material3.adaptive:adaptive", version.ref = "androidx-adaptive" }
androidx-material3-adaptive-nav3 = { module = "org.jetbrains.compose.material3.adaptive:adaptive-navigation3", version.ref = "androidx-adaptive" }
androidx-navigation3-ui = { module = "org.jetbrains.androidx.navigation3:navigation3-ui", version.ref = "androidx-navigation3" }
84 changes: 47 additions & 37 deletions solr/core/src/java/org/apache/solr/cloud/SolrZkServer.java
@@ -45,26 +45,58 @@ public class SolrZkServer {
   // Per ZooKeeper, "0" means no limit for max client connections.
   public static final String ZK_MAX_CNXNS_DEFAULT = "0";

-  boolean zkRun = false;
   String zkHost;

   int solrPort;
   Properties props;
   SolrZkServerProps zkProps;

-  private Thread zkThread; // the thread running a zookeeper server, only if zkRun is true
+  private Thread zkThread; // the thread running a zookeeper server, only if zkServerEnabled is true

   private Path dataHome; // o.a.zookeeper.**.QuorumPeerConfig needs a File not a Path
   private String confHome;

-  public SolrZkServer(boolean zkRun, String zkHost, Path dataHome, String confHome, int solrPort) {
-    this.zkRun = zkRun;
+  public SolrZkServer(String zkHost, Path dataHome, String confHome, int solrPort) {
     this.zkHost = zkHost;
     this.dataHome = dataHome;
     this.confHome = confHome;
     this.solrPort = solrPort;
   }

+  /**
+   * Creates and initializes a SolrZkServer instance for standalone (non-quorum) mode.
+   *
+   * @param zkHost the ZooKeeper host string (chroot will be stripped)
+   * @param solrHome the Solr home directory path
+   * @param solrHostPort the Solr host port
+   * @return initialized and started SolrZkServer instance
+   */
+  public static SolrZkServer createAndStart(String zkHost, Path solrHome, int solrHostPort) {
+    String zkDataHome =
+        EnvUtils.getProperty(
+            "solr.zookeeper.server.datadir", solrHome.resolve("zoo_data").toString());
+    String zkConfHome = EnvUtils.getProperty("solr.zookeeper.server.confdir", solrHome.toString());
+
+    String strippedZkHost = stripChroot(zkHost);
+    SolrZkServer zkServer =
+        new SolrZkServer(strippedZkHost, Path.of(zkDataHome), zkConfHome, solrHostPort);
+    zkServer.parseConfig();
+    zkServer.start();
+
+    return zkServer;
+  }
+
+  /**
+   * Strips the chroot portion from a ZooKeeper host string.
+   *
+   * @param zkRun the ZooKeeper host string (e.g., "localhost:2181/solr")
+   * @return the host string without chroot (e.g., "localhost:2181")
+   */
+  private static String stripChroot(String zkRun) {
+    if (zkRun == null || zkRun.trim().isEmpty() || zkRun.lastIndexOf('/') < 0) return zkRun;
+    return zkRun.substring(0, zkRun.lastIndexOf('/'));
+  }
+
   public String getClientString() {
     if (zkHost != null) {
       return zkHost;
@@ -74,11 +106,6 @@ public String getClientString() {
       return null;
     }

-    // if the string wasn't passed as zkHost, then use the standalone server we started
-    if (!zkRun) {
-      return null;
-    }
-
     InetSocketAddress addr = zkProps.getClientPortAddress();
     String hostName;
     // We cannot advertise 0.0.0.0, so choose the best host to advertise
@@ -97,7 +124,6 @@ public void parseConfig() {
       // set default data dir
       // TODO: use something based on IP+port??? support ensemble all from same solr home?
       zkProps.setDataDir(dataHome);
-      zkProps.zkRun = zkRun;
       zkProps.solrPort = Integer.toString(solrPort);
     }

@@ -116,7 +142,7 @@ public void parseConfig() {

     try {
       props = SolrZkServerProps.getProperties(zooCfgPath);
-      SolrZkServerProps.injectServers(props, zkRun, zkHost);
+      SolrZkServerProps.injectServers(props, zkHost);
       // This is the address that the embedded Zookeeper will bind to. Like Solr, it defaults to
       // "127.0.0.1".
       props.setProperty(
@@ -126,9 +152,8 @@ public void parseConfig() {
       }
       zkProps.parseProperties(props);
     } catch (QuorumPeerConfig.ConfigException | IOException e) {
-      if (zkRun) {
-        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
-      }
+
+      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
     }
   }

@@ -137,9 +162,6 @@ public Map<Long, QuorumPeer.QuorumServer> getServers() {
   }

   public void start() {
-    if (!zkRun) {
-      return;
-    }

     ensureZkMaxCnxnsConfigured();
     if (System.getProperty(ZK_WHITELIST_PROPERTY) == null) {
@@ -167,20 +189,11 @@ public void start() {
             },
             "embeddedZkServer");

-    if (zkProps.getServers().size() > 1) {
-      if (log.isInfoEnabled()) {
-        log.info(
-            "STARTING EMBEDDED ENSEMBLE ZOOKEEPER SERVER at port {}, listening on host {}",
-            zkProps.getClientPortAddress().getPort(),
-            zkProps.getClientPortAddress().getAddress().getHostAddress());
-      }
-    } else {
-      if (log.isInfoEnabled()) {
-        log.info(
-            "STARTING EMBEDDED ENSEMBLE ZOOKEEPER SERVER at port {}, listening on host {}",
-            zkProps.getClientPortAddress().getPort(),
-            zkProps.getClientPortAddress().getAddress().getHostAddress());
-      }
+    if (log.isInfoEnabled()) {
+      log.info(
+          "STARTING EMBEDDED ENSEMBLE ZOOKEEPER SERVER at port {}, listening on host {}",
+          zkProps.getClientPortAddress().getPort(),
+          zkProps.getClientPortAddress().getAddress().getHostAddress());
     }

     zkThread.setDaemon(true);
@@ -207,9 +220,7 @@ public void start() {
   }

   public void stop() {
-    if (!zkRun) {
-      return;
-    }
+
     zkThread.interrupt();
   }

@@ -224,7 +235,6 @@ class SolrZkServerProps extends QuorumPeerConfig {
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

   String solrPort; // port that Solr is listening on
-  boolean zkRun;

   /**
    * Parse a ZooKeeper configuration file
@@ -253,10 +263,10 @@ public static Properties getProperties(Path configPath) throws ConfigException {
   // Given zkHost=localhost:1111,localhost:2222 this will inject
   // server.0=localhost:1112:1113
   // server.1=localhost:2223:2224
-  public static void injectServers(Properties props, boolean zkRun, String zkHost) {
+  public static void injectServers(Properties props, String zkHost) {

     // if clientPort not already set, use zkRun
-    if (zkRun && props.getProperty("clientPort") == null) {
+    if (props.getProperty("clientPort") == null) {
       // int portIdx = zkRun.lastIndexOf(':');
       int portIdx = "".lastIndexOf(':');
Contributor: @epugh This line was modified by you and causes the below if() to be dead code. Remember what's going on here?

Contributor: not totally... And I think it was me hacking around a bit trying to figure out all the various sys props that existed!

Contributor: Feel like figuring out what is the correct code here?

       if (portIdx > 0) {
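For readers following the review thread on `injectServers`, the two string transformations this file relies on can be sketched in isolation. This is an illustrative sketch only and is not the code in `SolrZkServer`: the class name `ZkHostSketch` and the method `serverEntries` are hypothetical. `stripChroot` mirrors the logic added in this diff, and `serverEntries` follows the mapping described in the comment above `injectServers` (client port +1 and +2). The last line of `main` prints `"".lastIndexOf(':')`, which is always -1, so a branch guarded by `portIdx > 0` cannot run.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical helper, not part of SolrZkServer. */
final class ZkHostSketch {

  /** Mirrors stripChroot from the diff: cut the string at the last '/'. */
  static String stripChroot(String zkHost) {
    if (zkHost == null || zkHost.trim().isEmpty() || zkHost.lastIndexOf('/') < 0) {
      return zkHost;
    }
    return zkHost.substring(0, zkHost.lastIndexOf('/'));
  }

  /**
   * Builds "server.N=host:quorumPort:electionPort" entries from a chroot-free zkHost,
   * using clientPort+1 and clientPort+2 as in the injectServers comment.
   */
  static List<String> serverEntries(String zkHost) {
    List<String> entries = new ArrayList<>();
    String[] hosts = zkHost.split(",");
    for (int i = 0; i < hosts.length; i++) {
      String hostPort = hosts[i].trim();
      int colon = hostPort.lastIndexOf(':');
      if (colon <= 0 || colon == hostPort.length() - 1) {
        throw new IllegalArgumentException("Expected host:port but got '" + hostPort + "'");
      }
      String host = hostPort.substring(0, colon);
      int clientPort = Integer.parseInt(hostPort.substring(colon + 1));
      entries.add("server." + i + "=" + host + ":" + (clientPort + 1) + ":" + (clientPort + 2));
    }
    return entries;
  }

  public static void main(String[] args) {
    System.out.println(stripChroot("localhost:2181/solr"));
    serverEntries("localhost:1111,localhost:2222").forEach(System.out::println);
    // An empty string contains no ':', so this is always -1.
    System.out.println("".lastIndexOf(':'));
  }
}
```

Because `stripChroot` cuts at the last `/`, a nested chroot such as `localhost:2181/solr/prod` would come back as `localhost:2181/solr`. Cutting at the first `/` would be one way to handle that, if nested chroots need to be supported here.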
2 changes: 1 addition & 1 deletion solr/core/src/java/org/apache/solr/core/CoreContainer.java
@@ -290,7 +290,7 @@ public JerseyAppHandlerCache getJerseyAppHandlerCache() {

   private final ObjectCache objectCache = new ObjectCache();

-  public final NodeRoles nodeRoles = new NodeRoles(System.getProperty(NodeRoles.NODE_ROLES_PROP));
+  public final NodeRoles nodeRoles = new NodeRoles(EnvUtils.getProperty(NodeRoles.NODE_ROLES_PROP));

   private final ExecutorService indexSearcherExecutor;

13 changes: 9 additions & 4 deletions solr/core/src/java/org/apache/solr/core/NodeConfig.java
@@ -235,11 +235,16 @@ public static NodeConfig loadNodeConfig(Path solrHome, Properties nodeProperties
     initModules(loader, null);
     nodeProperties = SolrXmlConfig.wrapAndSetZkHostFromSysPropIfNeeded(nodeProperties);

-    // TODO: Only job of this block is to
-    // delay starting a solr core to satisfy
-    // ZkFailoverTest test case...
     String zkHost = nodeProperties.getProperty(SolrXmlConfig.ZK_HOST);
-    if (StrUtils.isNotNullOrEmpty(zkHost)) {
+    NodeRoles nodeRoles = new NodeRoles(EnvUtils.getProperty(NodeRoles.NODE_ROLES_PROP));
+    boolean zookeeperQuorumNode =
+        NodeRoles.MODE_ON.equals(nodeRoles.getRoleMode(NodeRoles.Role.ZOOKEEPER_QUORUM));
+
+    // This block demonstrates how we pause and wait for a ZooKeeper to be available before
+    // continuing.
+    // See the ZkFailoverTest to see how changing solr.cloud.wait.for.zk.seconds impacts this
+    // capability.
+    if (StrUtils.isNotNullOrEmpty(zkHost) && !zookeeperQuorumNode) {
       int startUpZkTimeOut =
           1000
               * Integer.getInteger(
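The guarded block in this hunk pauses startup until ZooKeeper is available, and the new condition skips that pause on nodes carrying the `zookeeper_quorum` role. A rough standalone sketch of the idea follows. It swaps in a plain TCP connect probe in place of whatever ZooKeeper client check Solr actually performs, so it is not the `NodeConfig` implementation. The class `WaitForZkSketch`, its methods, the `localhost:2181` address, and the 2-second default are all hypothetical; only the property name `solr.cloud.wait.for.zk.seconds` and the shape of the condition come from the diff.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch: a TCP port probe standing in for "ZooKeeper is available". */
final class WaitForZkSketch {

  /** Returns true once host:port accepts a connection, false if timeoutMs elapses first. */
  static boolean waitForPort(String host, int port, long timeoutMs) {
    long start = System.nanoTime();
    long timeoutNanos = timeoutMs * 1_000_000L;
    do {
      try (Socket socket = new Socket()) {
        socket.connect(new InetSocketAddress(host, port), 500);
        return true; // something is listening on the port
      } catch (IOException e) {
        // Not reachable yet: back off briefly, then retry.
        try {
          Thread.sleep(100);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
    } while (System.nanoTime() - start < timeoutNanos);
    return false;
  }

  /** Same shape as the condition in the diff: wait only if zkHost is set and this is not a quorum node. */
  static boolean shouldWait(String zkHost, boolean zookeeperQuorumNode) {
    return zkHost != null && !zkHost.isEmpty() && !zookeeperQuorumNode;
  }

  public static void main(String[] args) {
    // The default of 2 seconds is arbitrary for this sketch, not Solr's default.
    int seconds = Integer.getInteger("solr.cloud.wait.for.zk.seconds", 2);
    if (shouldWait("localhost:2181", false)) {
      System.out.println("reachable: " + waitForPort("localhost", 2181, 1000L * seconds));
    }
  }
}
```

One reading of the `!zookeeperQuorumNode` guard is that a quorum node would otherwise block waiting for an ensemble it is itself supposed to help form; that interpretation is an assumption, not something stated in the diff.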
12 changes: 12 additions & 0 deletions solr/core/src/java/org/apache/solr/core/NodeRoles.java
@@ -113,6 +113,18 @@ public String modeWhenRoleIsAbsent() {
       public Set<String> supportedModes() {
         return Set.of(MODE_ON, MODE_OFF);
       }
+    },
+
+    ZOOKEEPER_QUORUM("zookeeper_quorum") {
+      @Override
+      public Set<String> supportedModes() {
+        return Set.of(MODE_ON, MODE_OFF);
+      }
+
+      @Override
+      public String modeWhenRoleIsAbsent() {
+        return MODE_OFF;
+      }
     };

     public final String roleName;
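To illustrate how the new role is consulted (compare the `zookeeperQuorumNode` check in `NodeConfig`), here is a minimal standalone sketch. It is not Solr's `NodeRoles` class: `NodeRolesSketch` and its methods are hypothetical, and both the comma-separated `role:mode` property format and the literal mode strings `on` and `off` are assumptions. What it takes from the diff is the role name `zookeeper_quorum` and the rule that an absent role is treated as off.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of role lookup; not Solr's NodeRoles implementation. */
final class NodeRolesSketch {
  static final String ZOOKEEPER_QUORUM = "zookeeper_quorum"; // role name from the diff
  static final String MODE_ON = "on"; // assumed literal value
  static final String MODE_OFF = "off"; // assumed literal value

  /** Parses an assumed "role:mode,role:mode" string into a map of role to mode. */
  static Map<String, String> parse(String rolesProp) {
    Map<String, String> roles = new HashMap<>();
    if (rolesProp == null || rolesProp.isBlank()) {
      return roles;
    }
    for (String entry : rolesProp.split(",")) {
      String[] parts = entry.trim().split(":", 2);
      if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
        throw new IllegalArgumentException("Expected role:mode but got '" + entry + "'");
      }
      roles.put(parts[0], parts[1]);
    }
    return roles;
  }

  /** An absent zookeeper_quorum role counts as off, matching modeWhenRoleIsAbsent in the diff. */
  static boolean isZookeeperQuorumNode(String rolesProp) {
    return MODE_ON.equals(parse(rolesProp).getOrDefault(ZOOKEEPER_QUORUM, MODE_OFF));
  }

  public static void main(String[] args) {
    System.out.println(isZookeeperQuorumNode("data:on,zookeeper_quorum:on"));
    System.out.println(isZookeeperQuorumNode("data:on"));
  }
}
```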