Troubleshooting Coolify Proxy Connectivity Issues With Custom Docker Networks
Introduction
Coolify is a robust platform for managing and deploying modern web applications. However, like any complex system, it can encounter unforeseen issues. This article delves into a specific bug report concerning coolify-proxy losing connectivity to app containers when using custom Docker networks, resulting in frustrating 504 Gateway Timeout errors. We will explore the error message and logs, the steps to reproduce the issue, example configurations, and potential workarounds. Additionally, we will address key questions and requests for help to better understand and resolve this connectivity problem.
Error Message and Logs
The primary symptom of this issue is the inaccessibility of deployed applications via their domain names, manifesting as a 504 Gateway Timeout error. Despite this, the applications remain reachable through their direct IP addresses and ports. Detailed logs from coolify-proxy, captured with --log.level=DEBUG enabled, reveal the underlying problem. The logs display a critical error message:
504 Gateway Timeout error="dial tcp 10.0.X.X:3000: i/o timeout"
This log snippet indicates that coolify-proxy is unable to establish a TCP connection with the application container at the specified IP address (10.0.X.X) and port (3000). The IP address corresponds to the internal IP of the application container within a custom Compose network, such as app_network. This error suggests a fundamental connectivity issue between the proxy and the application container. Connectivity can only be restored by either restarting the proxy or manually connecting the proxy to the application network using the following command:
docker network connect app_network coolify-proxy
This temporary fix highlights that the proxy’s isolation from the application network is a key factor in the problem. Understanding the error message and the conditions under which it occurs is the first step in diagnosing and resolving this connectivity issue. The sporadic nature of the problem, occurring after hours or days, adds complexity to the troubleshooting process, making it crucial to identify a consistent reproduction path.
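Before reaching for the manual fix, it helps to confirm whether the proxy is actually attached to the application network at the moment of failure. A minimal diagnostic sketch using standard Docker commands (the container and network names follow the bug report; adjust them for your setup):

# List the networks coolify-proxy is currently attached to
docker inspect coolify-proxy --format '{{range $name, $net := .NetworkSettings.Networks}}{{$name}} {{end}}'

# List the containers currently attached to the custom application network
docker network inspect app_network --format '{{range .Containers}}{{.Name}} {{end}}'

If app_network is missing from the first command's output while the 504 errors occur, the failure is consistent with the proxy simply no longer being a member of that network.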
Steps to Reproduce
The intermittent nature of this bug makes it challenging to reproduce consistently. It typically occurs sporadically, often after several hours or even days of operation, which complicates the debugging process. However, the issue appears to be confined to specific conditions, which are critical to understanding the root cause. These conditions are:
- Deployment via Docker Compose with a Custom Bridge Network: The application must be deployed using Docker Compose, and it must be part of a custom bridge network (e.g., app_network). This network isolates the application containers from the default Docker network, which is crucial for security and network management in complex deployments.
- coolify-proxy Running in Coolify's Default Network: The coolify-proxy must be running within Coolify's default network (coolify) and thus isolated from the application network. This isolation is a significant factor, as it prevents direct communication between the proxy and the application containers without explicit network connections.
- Proxy Loses Connection to App Container: Over time, the coolify-proxy loses its ability to connect to the application container. This loss of connectivity is the core issue, resulting in the 504 Gateway Timeout errors observed by users trying to access the application through its domain name.
- Connectivity Restored by Restart or Manual Connection: The connection can only be re-established by either restarting the coolify-proxy, restarting the application, or manually connecting the proxy to the application network using the docker network connect command. This temporary fix confirms that the underlying issue is a network connectivity problem that requires intervention to resolve.
Identifying these steps is essential for anyone encountering this issue, as it provides a framework for reproducing the bug in a controlled environment. Creating a reliable reproduction path is the next crucial step in thoroughly diagnosing and addressing the underlying cause.
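One practical way to work toward a reproduction is to poll the application through its domain and record the proxy's network membership whenever a failure occurs. A rough monitoring sketch, assuming a placeholder domain app.example.com:

#!/usr/bin/env bash
# Poll the app through its public domain; on failure, log the HTTP code and
# the networks coolify-proxy is attached to at that moment.
DOMAIN="https://app.example.com"   # placeholder: replace with your app's domain
while true; do
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$DOMAIN")
  if [ "$code" != "200" ]; then
    {
      echo "$(date -Is) got HTTP $code from $DOMAIN"
      docker inspect coolify-proxy --format '{{range $name, $net := .NetworkSettings.Networks}}{{$name}} {{end}}'
    } >> proxy-failures.log
  fi
  sleep 30
done

Correlating the timestamps in proxy-failures.log with deployments, restarts, or the output of docker events --filter 'type=network' can help reveal what triggers the disconnect.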
Example Broken Configuration
To illustrate the misconfiguration that leads to this connectivity issue, consider the following docker-compose.yaml example. This setup highlights the use of a custom bridge network and how it interacts with coolify-proxy.
services:
  backend:
    ...
    networks:
      - app_network

networks:
  app_network:
    driver: bridge
In this configuration:
- The backend service is defined as part of the application.
- The networks section specifies that the backend service should be connected to the app_network.
- The app_network is defined as a bridge network. This setup is common for isolating application components within their own network for security and management purposes.
However, the problem arises when coolify-proxy is not also connected to the app_network. By default, coolify-proxy runs in Coolify's default network, which means it cannot directly communicate with containers in the app_network. This isolation is intentional to maintain security, but it requires careful configuration to ensure connectivity.
The issue is that when the proxy attempts to forward requests to the backend service, it cannot reach the container's internal IP address on the app_network, resulting in the 504 Gateway Timeout error. This misconfiguration is a common pitfall when using custom Docker networks with reverse proxies like coolify-proxy.
In contrast, a working configuration typically involves either removing the custom network or explicitly connecting coolify-proxy to the application network; one way to express the latter in Compose is sketched below. One effective approach is to add a health check to the application container, which helps Coolify monitor the application's status and restart it if necessary. This setup can mitigate downtime and improve the overall reliability of the deployment.
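The following Compose sketch illustrates the "explicitly connect" variant: the service joins both its private network and the proxy's network, which is declared as external so Compose reuses it instead of creating it. This is an illustration rather than a Coolify-documented recipe; it assumes the proxy's network is named coolify (verify with docker network ls), and the image name is hypothetical:

services:
  backend:
    image: myapp:latest    # hypothetical image name
    networks:
      - app_network        # private network for app-internal traffic
      - coolify            # proxy's network, so coolify-proxy can reach this container

networks:
  app_network:
    driver: bridge
  coolify:
    external: true         # pre-existing network managed by Coolify; not created here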
By understanding the broken configuration, users can identify and rectify similar issues in their own setups, ensuring that the proxy and application containers can communicate effectively.
Current Working Configuration and Workarounds
To address the connectivity issues, several workarounds and configuration adjustments have been identified. These solutions aim to ensure that coolify-proxy can reliably communicate with application containers in custom Docker networks. The primary strategies include removing the custom network, manually connecting the proxy to the application network, and implementing health checks.
Removing Custom Network
One straightforward solution is to remove the custom network configuration altogether. By deploying applications in the default Docker network, coolify-proxy can automatically discover and communicate with the containers. While this approach simplifies the network setup, it may not be suitable for all deployments, particularly those that require network isolation for security or organizational reasons.
Manually Connecting the Proxy to the Application Network
Another workaround involves manually connecting coolify-proxy to the application network using the docker network connect command. This can be done as a temporary fix to restore connectivity, as demonstrated in the initial bug report:
docker network connect app_network coolify-proxy
However, this is not a permanent solution, as the connection may be lost again over time. A more robust approach would be to automate this connection as part of the deployment process (see the sketch after this paragraph) or to configure coolify-proxy to automatically join the application network.
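One way to automate the reconnection is an idempotent script that reattaches the proxy only when it has dropped off the network; it could run from cron or a post-deployment hook. This is a workaround sketch, not a built-in Coolify feature:

#!/usr/bin/env bash
# Ensure coolify-proxy is attached to the application network; safe to re-run.
NETWORK="app_network"
PROXY="coolify-proxy"

# Reconnect only if the proxy is absent from the network's container list.
if ! docker network inspect "$NETWORK" \
     --format '{{range .Containers}}{{.Name}} {{end}}' | grep -qw "$PROXY"; then
  docker network connect "$NETWORK" "$PROXY"
  echo "$(date -Is) reconnected $PROXY to $NETWORK"
fi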
Adding Health Checks
Implementing health checks is a crucial step in ensuring application availability. By defining a health check in the docker-compose.yaml file, Coolify can monitor the application's status and automatically restart it if it becomes unhealthy. This helps mitigate downtime and ensures that the application remains responsive.
services:
  backend:
    ...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
In this example, a health check periodically sends a request to the application's /health endpoint from inside the container (so the image must include curl). If the check fails repeatedly, Docker marks the container as unhealthy; Docker itself does not restart unhealthy containers, but Coolify or another supervisor can act on that status and restart the application. This can help prevent extended periods of downtime due to connectivity issues.
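To verify that the health check is actually being evaluated, the status Docker records can be queried directly (backend-container is a placeholder for the real container name):

# Show the current health status (starting, healthy, or unhealthy)
docker inspect --format '{{.State.Health.Status}}' backend-container

# List any containers currently reporting as unhealthy
docker ps --filter "health=unhealthy"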
Current Working Configuration
The most reliable configuration observed so far involves removing the custom network and adding a health check. This setup has shown no downtime, but it may not be the ideal solution for all use cases. Deployments that require network isolation may need a more sophisticated approach, such as configuring coolify-proxy to automatically join the application network.
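Put together, the reportedly stable setup looks roughly like this; the image name is hypothetical and the health check matches the earlier example:

services:
  backend:
    image: myapp:latest    # hypothetical image name
    # No custom networks block: the container joins the network Coolify manages,
    # where coolify-proxy can already reach it.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s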
By understanding these workarounds and configuration adjustments, users can better manage connectivity issues between coolify-proxy and application containers, ensuring the reliability and availability of their deployments.
Coolify Version and Operating System
Understanding the environment in which this bug occurs is crucial for effective troubleshooting. The reported issue was observed on a specific version of Coolify and a particular operating system, which may provide valuable context for developers and other users facing similar problems.
Coolify Version
The bug was reported on Coolify version v4.0.0-beta.420.5. Beta versions often contain the latest features and improvements but may also include undiscovered bugs. Identifying the version helps developers focus their efforts on potential regressions or issues introduced in that specific release. Users experiencing similar problems should check their Coolify version to determine if it matches the reported version, as this could indicate a shared root cause.
Operating System and Version
The issue was observed on a self-hosted instance of Coolify running on Ubuntu 20.04.6 LTS, a widely used and stable operating system and a common platform for deploying applications. Knowing the operating system helps narrow down potential environmental factors that could contribute to the bug, such as specific kernel versions, networking configurations, or Docker-related issues.
Significance of Environment Details
These details are essential because bugs can often be environment-specific. For instance, a network configuration issue might only manifest on certain operating systems or Docker versions. Similarly, changes in Coolify’s codebase between versions could introduce or resolve bugs. By providing the Coolify version and operating system details, the bug report offers a valuable starting point for developers to replicate the issue and identify the underlying cause.
Users encountering similar problems should include their Coolify version and operating system details when reporting issues. This information helps developers prioritize and address bugs more efficiently, ultimately leading to a more stable and reliable platform.
Additional Information and Community Reports
In addition to the technical details provided in the bug report, additional information and community reports can offer valuable insights into the nature and scope of the issue. These supplemental resources can highlight recurring patterns, potential workarounds, and the impact on other users. Specifically, reports from online communities, such as Reddit, can provide a broader perspective on the problem.
Coolify Subreddit Report
The bug report references a related issue reported on the Coolify subreddit. A user described a similar problem where their deployed applications randomly became unreachable via their subdomains. The applications remained accessible via their direct IP addresses and corresponding ports, indicating a network-related issue rather than an application crash. Restarting the proxy temporarily resolved the problem, which aligns with the findings in the primary bug report.
The Reddit report (https://www.reddit.com/r/coolify/comments/1je4gsa/coolify_deployed_apps_randomly_stop_working_until) states:
Randomly, at least once a day, one of my apps becomes unreachable via its subdomain… the app itself doesn’t crash—it remains accessible if I navigate directly to the instance IP and its corresponding port. Restarting the proxy… temporarily fixes the issue.
This corroborating evidence suggests that the connectivity issue is not isolated and affects multiple users. The shared symptoms and temporary fix further strengthen the hypothesis that the problem lies within the network configuration or proxy handling in Coolify.
Unclear Documentation and Configuration Limitations
The original bug report also raises the question of whether the issue is a bug, a configuration limitation, or a documentation gap. This is a crucial point because it highlights the need for clear documentation and best practices for configuring Coolify with custom Docker networks. If the expected behavior is that coolify-proxy cannot reach containers on custom networks without additional configuration, this should be explicitly stated in the documentation.
Workarounds and Temporary Solutions
The provided workarounds, such as removing the custom network or manually connecting the proxy, offer temporary solutions but may not be ideal for all deployments. A more robust solution would involve a mechanism for automatically integrating Compose networks with the proxy or providing clear guidelines on how to configure this integration.
By considering additional information and community reports, a more comprehensive understanding of the bug can be achieved, leading to more effective solutions and improved documentation.
Key Questions and Requests for Help
To effectively address the connectivity issue between coolify-proxy and application containers on custom Docker networks, several key questions and requests for help have been raised. These inquiries aim to clarify the expected behavior, identify recommended configurations, and establish a reliable method for reproducing the bug. Addressing these questions is crucial for developing a comprehensive solution.
Is it Expected that coolify-proxy Can't Reach Containers on Custom Networks?
One of the primary questions is whether it is the intended behavior that coolify-proxy cannot directly communicate with containers on custom networks. Understanding this is fundamental to determining whether the current situation is a bug or a configuration oversight. If isolation between the proxy and custom networks is the default behavior, clear documentation and configuration guidance are necessary to help users establish connectivity.
Is There a Recommended or Automatic Way to Integrate Compose Networks with the Proxy?
Another critical question revolves around the existence of a recommended or automatic method for integrating Compose networks with coolify-proxy. Manually connecting the proxy to each application network is not a scalable or maintainable solution. A more automated approach, such as a configuration setting or a built-in mechanism, would streamline the process and reduce the likelihood of errors. Exploring options for automatic network integration is essential for improving the user experience and ensuring the reliability of deployments.
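As a thought experiment for what such automation might look like outside of Coolify, one could watch Docker's event stream and reattach the proxy whenever activity is seen on the application network. This is purely illustrative; a real integration would belong in Coolify itself, and the event fields used here are assumptions worth verifying against your Docker version:

#!/usr/bin/env bash
# Watch network connect events and make sure the proxy joins app_network too.
docker events --filter 'type=network' --filter 'event=connect' \
  --format '{{.Actor.Attributes.name}}' |
while read -r net; do
  if [ "$net" = "app_network" ]; then
    # Ignore the error if the proxy is already connected.
    docker network connect app_network coolify-proxy 2>/dev/null || true
  fi
done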
How Can I Reliably Reproduce This to Figure Out the Root Cause?
A significant challenge in resolving this issue is the lack of a deterministic reproduction path. The bug occurs sporadically, making it difficult to diagnose the root cause. Establishing a reliable method for reproducing the issue is crucial for developers to investigate and test potential solutions. This may involve identifying specific conditions, network configurations, or traffic patterns that trigger the connectivity loss. Collaboration and experimentation are necessary to develop a consistent reproduction scenario.
Request for More Information
To facilitate further investigation, additional information is requested, including logs, configurations, system details, and testing steps. Providing comprehensive logs from both coolify-proxy and the application containers can offer insights into the communication breakdown. Detailed configurations, such as docker-compose.yaml files and Coolify settings, can reveal potential misconfigurations. System information, including kernel versions and Docker versions, can help identify environmental factors. Finally, specific testing steps that have been attempted can guide developers in replicating the issue.
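A few standard commands can collect most of these details in one pass; the log window and file name here are arbitrary choices:

# Docker engine version and host kernel, for the environment section of a report
docker version --format 'Server: {{.Server.Version}}'
uname -r

# Networks present on the host, to confirm the custom network exists
docker network ls

# Recent proxy logs (run the proxy with --log.level=DEBUG for full detail)
docker logs coolify-proxy --since 1h > coolify-proxy.log 2>&1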
By addressing these key questions and requests for help, the root cause of the connectivity problem can be more effectively identified and resolved, leading to a more robust and user-friendly Coolify platform.
Conclusion
The bug report concerning coolify-proxy losing connectivity to app containers on custom Docker networks highlights a significant issue that can disrupt application deployments. The 504 Gateway Timeout errors, the specific conditions under which the bug occurs, and the temporary workarounds all point to a network configuration challenge. By examining the error messages, steps to reproduce, example configurations, and community reports, a clearer understanding of the problem emerges.
Key questions regarding the expected behavior of coolify-proxy with custom networks, the need for automated integration methods, and the quest for a reliable reproduction path underscore the complexity of the issue. Addressing these questions and requests for help is essential for developing a comprehensive solution. The collaboration between users and developers, along with the sharing of logs, configurations, and testing steps, will be critical in resolving this bug and improving the Coolify platform.
Ultimately, this investigation not only aims to fix a specific bug but also to enhance the documentation, configuration options, and overall reliability of Coolify. By ensuring that coolify-proxy can seamlessly integrate with custom Docker networks, Coolify can better serve the diverse needs of its users and maintain its position as a robust and user-friendly deployment platform.