Any tips for resolving an issue where a Julia/PythonCall-based training script causes the separate PyTorch DataLoader worker processes to crash on startup with "ERROR: Unexpected segmentation fault encountered in worker."? Could it be that the worker processes need juliacall?
I do not have a stack trace yet, but the crash occurs during the first epoch of training (likely when the first batch of data is fetched).
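One way I could try to get a stack trace out of the workers is Python's standard faulthandler module, enabled from a worker_init_fn. The following is only a sketch: the function name enable_worker_traceback is my own, it assumes the DataLoader is constructed directly so that worker_init_fn can be passed in, and it will not help if the crash happens before the init function runs.

```python
import faulthandler
import sys


def enable_worker_traceback(worker_id: int) -> None:
    """Hypothetical worker_init_fn: dump Python tracebacks on a fatal signal."""
    # Intended to run inside each DataLoader worker process. It installs
    # faulthandler's handlers (SIGSEGV among others) in that process and
    # writes the traceback of every thread to the original stderr stream.
    faulthandler.enable(file=sys.__stderr__, all_threads=True)


# Assumed usage, if the DataLoader is built by hand:
#   DataLoader(dataset, num_workers=2, worker_init_fn=enable_worker_traceback)
```

This only reports Python-level frames, so a crash inside the Julia runtime would presumably show up as the last Python call that was entered before the fault.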
The Julia/PythonCall-based training script creates a Python torch.utils.data.Dataset subclass (using PythonCall, with Julia-defined implementations of the Python class methods), which is then fed to the Hugging Face Accelerate-based training loop. It is a direct Python-to-Julia/PythonCall translation of https://github.com/Chris-hughes10/Yolov7-training/blob/main/examples/minimal_finetune_cars.py#L121-L201, just using a Julia-defined dataset in place of CarsDatasetAdaptor.
The original Python training script runs just fine.
The issue does not (yet) seem to be the data returned from the torch.utils.data.Dataset subclass, since the Julia-defined __getitem__ method apparently is not even being invoked (nor are __init__ or __len__)...
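To double-check which of these methods are actually reached, and in which process, something like the following pure-Python wrapper might help. It is a sketch with a made-up class name (CallLoggingDataset), and it assumes that a map-style dataset only needs __len__ and __getitem__, so the wrapper can stand in for the real dataset during debugging.

```python
import os


class CallLoggingDataset:
    """Hypothetical debugging wrapper that logs every dataset call with its PID."""

    def __init__(self, inner):
        # `inner` is the real dataset (anything supporting len() and indexing).
        self.inner = inner
        self._log("__init__")

    def _log(self, name):
        # flush=True so the line is written before a possible crash.
        print(f"[pid {os.getpid()}] {name}", flush=True)

    def __len__(self):
        self._log("__len__")
        return len(self.inner)

    def __getitem__(self, index):
        self._log(f"__getitem__({index})")
        return self.inner[index]
```

If the log lines from the main process appear but nothing is printed from a worker PID before the segmentation fault, that would suggest the workers die before touching the dataset at all.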
Last updated: Nov 22 2024 at 04:41 UTC