Troubleshooting MongoServerError Not Primary In Canonical K8s
This article addresses a common issue encountered when deploying MongoDB on Canonical Kubernetes (K8s) using Juju: the MongoServerError: not primary error. The error typically arises when db.createUser is run against a MongoDB instance that is not in the primary state, which can happen during initial setup, replication problems, or failover. The problem was observed after integrating the mongodb-k8s charm with a charm that requires a database through the database relation.
Problem Description
The core issue manifests as a MongoServerError: not primary error when the mongodb-k8s charm attempts to create an operator user. This typically happens during the integration process with other charms, such as sdcore-nrf-k8s and sdcore-nms-k8s, which require database access. The error prevents the successful creation of database users, leaving the requiring charms in a waiting state. This article describes the problem, its causes, and potential solutions.
The error message MongoServerError: not primary indicates that the MongoDB instance you are trying to interact with is not the primary node in a replica set. In MongoDB, write operations, including user creation, can only be performed on the primary node; a write sent to any other member is rejected with this error. This restriction is central to MongoDB's replication mechanism, ensuring data consistency and preventing write conflicts across the cluster.
A replica set has a single primary, which accepts write operations, and secondary nodes that replicate data from it. If the primary fails, the remaining members elect a secondary to take over, minimizing downtime and data loss. Until an election completes there is no primary, so writes fail with the not primary error.
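To make the primary/secondary distinction concrete, the following minimal Python sketch picks the primary out of a document shaped like the output of rs.status(). The helper name find_primary, the host names, and the sample data are illustrative assumptions; only the members, name, and stateStr fields mirror what MongoDB reports.

```python
from typing import Optional


def find_primary(rs_status: dict) -> Optional[str]:
    """Return the name of the PRIMARY member, or None if there is none."""
    for member in rs_status.get("members", []):
        if member.get("stateStr") == "PRIMARY":
            return member.get("name")
    return None


# Hypothetical rs.status() excerpts, trimmed to the fields used here.
initializing = {"members": [{"name": "mongodb-k8s-0:27017", "stateStr": "STARTUP2"}]}
healthy = {
    "members": [
        {"name": "mongodb-k8s-0:27017", "stateStr": "PRIMARY"},
        {"name": "mongodb-k8s-1:27017", "stateStr": "SECONDARY"},
    ]
}

print(find_primary(initializing))  # None: a write now would fail with "not primary"
print(find_primary(healthy))       # mongodb-k8s-0:27017
```

When the function returns None, no member currently accepts writes, which is the situation in which db.createUser fails.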
Steps to Reproduce
The following steps reproduce the MongoServerError: not primary error in a Canonical Kubernetes environment using Juju:
- Deploy the necessary charms:
    juju deploy sdcore-nrf-k8s --channel=1.6/edge
    juju deploy mongodb-k8s --trust --channel=6/stable
    juju deploy sdcore-nms-k8s --channel=1.6/edge
    juju deploy self-signed-certificates
- Integrate the charms using Juju relations:
    juju integrate sdcore-nms-k8s:common_database mongodb-k8s:database
    juju integrate sdcore-nms-k8s:auth_database mongodb-k8s:database
    juju integrate sdcore-nms-k8s:certificates self-signed-certificates:certificates
    juju integrate sdcore-nrf-k8s:database mongodb-k8s
    juju integrate sdcore-nrf-k8s:sdcore_config sdcore-nms-k8s:sdcore_config
    juju integrate sdcore-nrf-k8s:certificates self-signed-certificates:certificates
These steps deploy a set of charms, including mongodb-k8s, and establish relations between them. The integration of sdcore-nms-k8s and sdcore-nrf-k8s with mongodb-k8s via the database relation triggers the user creation process, which is where the error typically occurs. This setup mimics a real-world deployment in which multiple applications rely on a central MongoDB database, making it a useful example for understanding and resolving the issue.
Expected Behavior
In a successful deployment, the database users are created on the MongoDB instance without errors, and the requiring charms, such as sdcore-nrf-k8s and sdcore-nms-k8s, transition to an active state, indicating that they have connected to the database and are functioning as expected. Creating database users is a fundamental step in setting up secure access: each application or service should have its own dedicated user with specific permissions, following the principle of least privilege, which limits the impact of compromised credentials. When user creation fails, applications cannot connect to the database, which can stall the whole deployment. Resolving the MongoServerError: not primary error is therefore crucial for a smooth and secure MongoDB deployment.
Actual Behavior
The database user creation fails with the MongoServerError: not primary error. The requiring charms remain in a waiting state, unable to proceed with their configuration and initialization, because they are blocked from accessing the database. This can lead to cascading errors and service unavailability. The error points to a problem with the MongoDB replica set configuration or the timing of operations during deployment: the charm is attempting to create a user on a node that is not currently the primary, which violates MongoDB's write constraints. This can happen if the primary node is temporarily unavailable or undergoing a failover, or if the charm writes to a secondary node because of a misconfiguration or timing issue.
Versions
- Operating system: Ubuntu 24.04
- Juju CLI: 3.6.8
- Juju agent: 3.6.8
- Charm revision: 61 (6/stable)
- LXD:
- Canonical K8s: 1.32/stable (same with 1.33/stable)
Log Output
The Juju debug log provides valuable insights into the error. The relevant snippet from the log is:
unit-mongodb-k8s-0: 2025-07-17 01:41:34 ERROR unit.mongodb-k8s/0.juju-log database:3: Failed to create the operator user: non-zero exit code 1 executing ['/usr/bin/mongosh', 'mongodb://localhost/admin', '--quiet', '--eval', '"db.createUser({ user: \'operator\', pwd: passwordPrompt(), roles:[ {\'role\': \'userAdminAnyDatabase\', \'db\': \'admin\'}, {\'role\': \'readWriteAnyDatabase\', \'db\': \'admin\'}, {\'role\': \'clusterAdmin\', \'db\': \'admin\'}, ], mechanisms: [\'SCRAM-SHA-256\'\], passwordDigestor: \'server\',})"'], stdout='Enter password\n********************************', stderr='MongoServerError: not primary\n'
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/./src/charm.py", line 1173, in _init_operator_user
    process.wait_output()
  File "/var/lib/juju/agents/unit-mongodb-k8s-0/charm/venv/ops/pebble.py", line 1771, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/usr/bin/mongosh', 'mongodb://localhost/admin', '--quiet', '--eval', '"db.createUser({ user: \'operator\', pwd: passwordPrompt(), roles:[ {\'role\': \'userAdminAnyDatabase\', \'db\': \'admin\'}, {\'role\': \'readWriteAnyDatabase\', \'db\': \'admin\'}, {\'role\': \'clusterAdmin\', \'db\': \'admin\'}, ], mechanisms: [\'SCRAM-SHA-256\'\], passwordDigestor: \'server\',})"'], stdout='Enter password\n********************************', stderr='MongoServerError: not primary\n'
This log excerpt shows that the db.createUser command failed with the MongoServerError: not primary error, and the traceback locates the failure in _init_operator_user in the charm's src/charm.py. The log message also shows that the mongodb-k8s charm attempts to create an operator user with the roles userAdminAnyDatabase, readWriteAnyDatabase, and clusterAdmin. These roles grant the user extensive privileges within the MongoDB instance, so the failure to create this user can have significant consequences: it may prevent other applications from accessing the database or limit the ability to manage the MongoDB instance.
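As a small illustration of acting on this log output, the Python sketch below classifies a failed command's stderr as the not primary case. The helper name is_not_primary_error is hypothetical, and matching on the message text is an assumption based solely on the stderr value shown in the log above; it is not a description of how the charm itself handles the error.

```python
def is_not_primary_error(stderr: str) -> bool:
    """Return True if stderr carries MongoDB's 'not primary' message."""
    return "not primary" in stderr.lower()


# stderr value copied from the log excerpt above.
logged_stderr = "MongoServerError: not primary\n"

print(is_not_primary_error(logged_stderr))            # True
print(is_not_primary_error("Authentication failed"))  # False
```

Separating this case from other failures matters because only the not primary condition is expected to clear on its own once a primary is elected.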
Root Cause Analysis
The root cause of the MongoServerError: not primary error in this context is typically related to the timing of operations during MongoDB replica set initialization. When the mongodb-k8s charm is deployed and integrated with other charms, it attempts to create database users before the replica set has fully initialized and elected a primary node. This race condition results in the db.createUser command being executed against a node that is still a secondary or is not yet part of an initialized replica set, leading to the error.
MongoDB replica set initialization involves several steps, including the election of a primary node, replication of data across the nodes, and establishment of quorum. These steps take time to complete, and until a primary is elected no member accepts write operations. If the charm attempts to create a user during this window, the MongoServerError: not primary error occurs. This kind of timing issue is a common challenge in distributed systems, where operations need to be coordinated across multiple nodes. The mongodb-k8s charm needs to handle it gracefully, ensuring that user creation is attempted only after the replica set has fully initialized.
Potential Solutions
Several strategies can be employed to address the MongoServerError: not primary error. Here are some effective approaches:
- Implement a Retry Mechanism: The most robust solution is to implement a retry mechanism within the mongodb-k8s charm. This involves detecting the MongoServerError: not primary error and retrying the db.createUser operation after a short delay. The retry logic should include a maximum number of attempts and an exponential backoff strategy to avoid overwhelming the MongoDB instance. Retrying is a common pattern in distributed systems for handling transient errors: the charm recovers automatically from temporary unavailability of the primary node, and user creation succeeds once a primary is available.
- Check Replica Set Status: Before attempting to create users, the charm should check the status of the MongoDB replica set using the rs.status() command, which reports the current primary node and the health of the other members. This proactive check prevents the charm from attempting write operations on a secondary node or during the initialization phase. Note that rs.status() requires suitable privileges when access control is enabled, and the operator user does not exist yet at this point; a check that needs no authentication, such as db.hello() and its isWritablePrimary field, may be more practical here.
- Delay User Creation: Another approach is to delay the user creation process until the MongoDB replica set has fully initialized. This can be achieved by introducing a delay in the charm's code or by using the charm's event handling to postpone user creation until a specific condition is met, such as the availability of a primary node (for example, by deferring the event with event.defer() in the ops framework that appears in the traceback). This may introduce a slight delay in the overall deployment process, but it can significantly improve the reliability of user creation.
- Monitor Juju Events: Monitor Juju events related to MongoDB, such as charm deployments, relation changes, and error messages, to identify potential issues early. Alerts on specific error messages or unusual behavior allow operators to intervene before problems escalate, which shortens resolution time and minimizes downtime. Monitoring can also reveal patterns and trends that help improve the overall reliability and performance of the deployment.
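The retry approach from the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the charm's implementation: NotPrimaryError and retry_on_not_primary are hypothetical names, and the operation passed in stands for whatever code runs db.createUser and raises when stderr reports not primary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotPrimaryError(Exception):
    """Hypothetical error raised when MongoDB reports 'not primary'."""


def retry_on_not_primary(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying with exponential backoff on NotPrimaryError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except NotPrimaryError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt (base, 2*base, 4*base, ...), capped.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise ValueError("max_attempts must be at least 1")


# Simulated user creation that fails twice before a primary is elected.
calls = {"count": 0}


def fake_create_user() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise NotPrimaryError("MongoServerError: not primary")
    return "created"


delays = []  # record the requested delays instead of actually sleeping
result = retry_on_not_primary(fake_create_user, sleep=delays.append)
print(result, delays)  # created [1.0, 2.0]
```

Passing sleep as a parameter keeps the sketch testable without real waiting; errors other than NotPrimaryError propagate immediately, so unrelated failures are not hidden by the retry loop.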
Recommended Solution
The recommended solution is to combine the retry mechanism with the replica set status check. This approach provides the most robust and reliable way to handle the MongoServerError: not primary error. The replica set status check ensures that the charm attempts user creation only when a primary node is available, while the retry mechanism handles any transient errors that may occur during the process. Together they address both the timing issue and the possibility of temporary unavailability of the primary node.
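A minimal Python sketch of this combination follows. All names are hypothetical, and the two callables are assumptions standing in for real checks: is_primary might wrap a query such as db.hello().isWritablePrimary, and create_user might wrap the db.createUser call, raising an error whose message contains not primary on failure. RuntimeError is used here only as a stand-in for the charm's actual exception type.

```python
import time
from typing import Callable


def wait_for_primary(
    is_primary: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_primary until it returns True or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        if is_primary():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def create_user_when_ready(
    is_primary: Callable[[], bool],
    create_user: Callable[[], None],
    attempts: int = 3,
    **wait_kwargs,
) -> bool:
    """Check for a primary, then create the user; retry if the primary is lost."""
    for _ in range(attempts):
        if not wait_for_primary(is_primary, **wait_kwargs):
            return False  # no primary appeared within the timeout
        try:
            create_user()
            return True
        except RuntimeError as err:
            if "not primary" not in str(err):
                raise  # unrelated failure: do not mask it
            # Primary stepped down between check and write: check again.
    return False


# Simulation: a primary is elected on the third poll.
states = iter([False, False, True])
created = []
ok = create_user_when_ready(
    is_primary=lambda: next(states),
    create_user=lambda: created.append("operator"),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(ok, created)  # True ['operator']
```

The status check avoids most failed writes, while the outer loop covers the remaining window in which the primary changes between the check and the write. A backoff between attempts could be added in the same way as in a plain retry loop.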
Additional Context
In addition to the solutions mentioned above, it is crucial to ensure that the MongoDB replica set is configured correctly and that network connectivity between the nodes is stable, since misconfigurations or network issues can make the MongoServerError: not primary error more likely. Proper configuration includes appropriate values for the replica set name, the number of members, and the priority of nodes. Network disruptions can trigger failovers, so monitoring latency and packet loss between the nodes helps identify problems before they affect the deployment.
Conclusion
The MongoServerError: not primary error when executing db.createUser in Canonical K8s is a common issue related to the timing of operations during MongoDB replica set initialization. It can be addressed by implementing a retry mechanism, checking the replica set status, or delaying user creation. A combination of a retry mechanism and replica set status checks provides the most robust solution for ensuring successful database user creation in a Juju-deployed MongoDB environment. Understanding the root cause of the error and implementing appropriate solutions are essential for the stability and reliability of MongoDB deployments on Kubernetes.
By adopting these strategies, you can ensure a smoother and more reliable deployment of MongoDB on Canonical Kubernetes, leading to a more robust and efficient infrastructure for your applications.