Troubleshooting MONAILabel With MONAI 1.5.0 And PyTorch 2.7.1 GPU Recognition On Windows
Introduction
This article addresses a common issue encountered when using MONAILabel with MONAI 1.5.0 and PyTorch 2.7.1 on Windows: MONAILabel fails to recognize the GPU during training. The problem typically shows up as the system reporting "Pytorch version: 2.6.0+cpu" even when a compatible GPU and CUDA toolkit are installed. This guide analyzes the issue, lists the steps to reproduce it, describes the observed behavior, and walks through troubleshooting steps to resolve it.
Problem Description
When MONAILabel is installed on Windows 10 or Windows 11 with PyTorch 2.7.1 and CUDA (versions 11.8 and 12.6), it may not recognize the GPU during training. The system log reports the PyTorch version as 2.6.0+cpu rather than the 2.7.1 CUDA build that was installed, indicating that the CPU is being used instead of the GPU. The issue extends beyond MONAILabel: PyTorch itself may fail to recognize the GPU once MONAILabel is installed. In contrast, MONAILabel functions correctly with PyTorch 2.7.1 and CUDA 11.8 in an Ubuntu 22.04 environment. This discrepancy suggests a Windows-specific compatibility issue.
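The suffix after the plus sign in a PyTorch version string is a local version label that names the build variant, so 2.6.0+cpu denotes a CPU-only build while a label such as cu118 denotes a CUDA 11.8 build. The following is a minimal sketch of reading that label; the function name describe_torch_build is hypothetical and is not part of PyTorch or MONAILabel.

```python
def describe_torch_build(version: str) -> str:
    """Classify a PyTorch version string by its local version label.

    Hypothetical helper for illustration; it only inspects the string.
    """
    # The local version label follows '+', e.g. '2.6.0+cpu' or '2.7.1+cu118'.
    base, _, label = version.partition("+")
    if label == "cpu":
        return f"{base}: CPU-only build"
    if label.startswith("cu"):
        return f"{base}: CUDA build ({label})"
    # Some builds carry no label; the string alone cannot settle the question.
    return f"{base}: no build label, check torch.version.cuda"


if __name__ == "__main__":
    print(describe_torch_build("2.6.0+cpu"))
    print(describe_torch_build("2.7.1+cu118"))
```

In a live environment the string to pass in would be torch.__version__. A CPU-only result means the installed wheel cannot use the GPU regardless of driver or toolkit state.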
Detailed Explanation of the Issue
When working with medical imaging, the GPU is crucial for performing computationally intensive tasks, particularly training deep learning models. MONAI and MONAILabel are designed to leverage GPU capabilities for enhanced performance. However, the described scenario highlights a problem where the expected GPU acceleration is not being utilized, leading to significantly slower processing times. The root cause often lies in environment configurations and library dependencies within the Windows operating system.
The primary issue is that the PyTorch version is detected incorrectly and CUDA is not recognized, and GPU support depends on both. This can stem from several factors, including:
- Incorrectly set environment variables: Windows relies on environment variables to locate libraries and executables. If these variables are not correctly configured for CUDA and PyTorch, the system may default to the CPU.
- Conflicting CUDA versions: Having multiple CUDA versions installed or referenced can lead to conflicts, preventing PyTorch from correctly identifying the GPU.
- Incompatible PyTorch and CUDA versions: Specific PyTorch versions are built to work with certain CUDA versions. An incompatibility can result in PyTorch failing to recognize the GPU.
- DLL loading issues: Windows searches for DLLs in several locations, including the directories listed in the PATH environment variable. If the CUDA DLLs cannot be found, or are shadowed by other versions, PyTorch might not load them correctly.
- MONAILabel-specific configurations: MONAILabel relies on its internal configuration to use the GPU. If this configuration is not set properly, it might default to the CPU even if PyTorch itself recognizes the GPU.
To effectively troubleshoot this issue, it is necessary to systematically examine these potential causes, ensuring each component is correctly configured and compatible with the others. The following sections provide a detailed walkthrough of the reproduction steps and a comprehensive guide to resolving the GPU recognition problem in MONAILabel on Windows.
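As a starting point for that examination, the sketch below gathers the facts most of the listed causes depend on: the CUDA-related entries on PATH (in search order), the CUDA_PATH variable that the CUDA toolkit installer normally sets on Windows, and what PyTorch reports about itself. The helper names cuda_path_entries and environment_report are hypothetical, and the script assumes nothing about MONAILabel's own configuration.

```python
import importlib.util
import os
import sys


def cuda_path_entries(path_value: str) -> list[str]:
    """Return PATH entries whose text mentions CUDA, in search order."""
    entries = [p for p in path_value.split(os.pathsep) if p.strip()]
    return [p for p in entries if "cuda" in p.lower()]


def environment_report() -> dict:
    """Collect environment facts relevant to GPU detection (illustrative only)."""
    report = {
        "python": sys.version.split()[0],
        "CUDA_PATH": os.environ.get("CUDA_PATH"),
        "cuda_on_PATH": cuda_path_entries(os.environ.get("PATH", "")),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        try:
            import torch
        except (ImportError, OSError) as exc:
            # A failure here (for example a DLL load error) is itself a finding.
            report["torch_import_error"] = repr(exc)
        else:
            report["torch_version"] = torch.__version__
            # torch.version.cuda is None for CPU-only builds.
            report["torch_cuda_build"] = torch.version.cuda
            report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it once before and once after installing MONAILabel in the same environment makes it easy to see whether the installation changed the PyTorch build.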
Server Logs Analysis
The server logs provide valuable insights into the issue. The logs indicate that the MONAILabel server starts successfully, but during the training phase, it attempts to use the CPU instead of the GPU. The relevant section of the log is:
[2025-07-08 14:05:45,536] [7508] [MainThread] [INFO] (monailabel.utils.async_tasks.task:41) - Train request: {'model': 'deepedit', 'name': 'train_01', 'pretrained': True, 'device': 'cpu', ...}
This log entry shows that the device is explicitly set to cpu. Further down in the logs, the traceback reveals a RuntimeError:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
This error message confirms that PyTorch cannot detect CUDA, which is why the CPU is used for training. The stack trace points to a failure in loading a checkpoint, likely because the checkpoint was saved with CUDA tensors, which cannot be deserialized without an explicit map_location when CUDA is unavailable. The key condition to address is torch.cuda.is_available() returning False.
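The error message itself names the workaround: pass map_location to torch.load. The sketch below shows one way to choose that value from torch.cuda.is_available(). The names pick_map_location and load_checkpoint are hypothetical, and this is not how MONAILabel loads its checkpoints internally. It only lets loading proceed on the CPU; it does not fix GPU detection.

```python
def pick_map_location(cuda_available: bool) -> str:
    """Choose the device string to remap checkpoint storages onto."""
    return "cuda" if cuda_available else "cpu"


def load_checkpoint(path: str):
    """Load a checkpoint, remapping CUDA tensors to the CPU when needed.

    Hypothetical helper: assumes `path` points to a file saved with torch.save.
    """
    import torch  # imported lazily so the helper above works without PyTorch

    device = torch.device(pick_map_location(torch.cuda.is_available()))
    # Recent PyTorch releases default to weights_only=True, so checkpoints that
    # hold arbitrary Python objects may need additional handling.
    return torch.load(path, map_location=device)


if __name__ == "__main__":
    print(pick_map_location(False))
```

If the loaded model then trains on the CPU, the underlying problem remains: torch.cuda.is_available() still has to return True before MONAILabel can use the GPU.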
Steps to Reproduce the Behavior
The following steps outline how to reproduce the GPU recognition issue in MONAILabel on Windows:
- Create a new Conda environment:
conda create -n monai python=3.9
conda activate monai
- Upgrade pip, setuptools, and wheel:
python -m pip install --upgrade pip setuptools wheel
- Install PyTorch with CUDA support:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
This command installs a PyTorch build for CUDA 11.8. For a different CUDA version, change the cu118 suffix in the index URL to match (for example, cu126 for CUDA 12.6).
- Verify CUDA availability in PyTorch:
python -c