I'm trying to fine-tune the MAT (Masked Attention Transformer) model from the official repository: https://github.com/fenglinglwb/MAT
However, I keep getting the following error during training:
Traceback (most recent call last):
  File "train.py", line 658, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/opt/conda/envs/py37/lib/python3.7/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/py37/lib/python3.7/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/py37/lib/python3.7/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/py37/lib/python3.7/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/py37/lib/python3.7/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "train.py", line 651, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "train.py", line 481, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/workspace/data/dayeon/MAT/training/training_loop.py", line 203, in training_loop
    loss = dnnlib.util.construct_class_by_name(device=device, **ddp_modules, **loss_kwargs) # subclass of training.loss.Loss
  File "/workspace/data/dayeon/MAT/dnnlib/util.py", line 289, in construct_class_by_name
    return call_func_by_name(*args, func_name=class_name, **kwargs)
  File "/workspace/data/dayeon/MAT/dnnlib/util.py", line 282, in call_func_by_name
    func_obj = get_obj_by_name(func_name)
  File "/workspace/data/dayeon/MAT/dnnlib/util.py", line 275, in get_obj_by_name
    module, obj_name = get_module_from_obj_name(name)
  File "/workspace/data/dayeon/MAT/dnnlib/util.py", line 246, in get_module_from_obj_name
    importlib.import_module(module_name) # may raise ImportError
  File "/opt/conda/envs/py37/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 962, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'losses.loss'; 'losses' is not a package
1. My environment
I'm using VS Code Remote SSH to connect to a Dev Container on a remote server. Inside the container, I created a conda virtual environment and installed the dependencies manually.
Here are the main versions:
python version: 3.7.16 (default, Jan 17 2023, 22:20:44)
[GCC 11.2.0]
numpy version: 1.21.6
torch version: 1.13.1+cu117
CUDA available: True
CUDA version: 11.7
cuDNN version: 8500
opencv version: 4.5.5
2. What I've tried
1) Added __init__.py inside the losses/ folder. (Confirmed it exists using ls losses/, so the folder is a valid Python package.)
2) Added the MAT directory to sys.path inside train.py.
3) Checked for any other “losses” modules across the entire conda environment:
find /opt/conda/envs/py37 -type d -name "losses"
→ no other conflicting folder exists.
4) Confirmed the import path resolves correctly:
python -c "import losses; print(losses.__file__)"
5) Added the following code to the top of train.py:
import os, sys
current_dir = os.path.dirname(os.path.abspath(__file__))
if current_dir not in sys.path:
sys.path.insert(0, current_dir)
print(">>> PYTHONPATH:", sys.path[0])
6) Confirmed I’m running the script inside the MAT root directory:
cd /workspace/data/myname/MAT
python train.py
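One gap in the checks above: the find command in step 3 only matches directories (-type d), so it would not catch a plain file named losses.py. The message "'losses' is not a package" is what Python reports when the name it has already resolved has no __path__, which is the case for a single-file module. A minimal sketch for listing every candidate on sys.path is below. find_import_candidates is a hypothetical helper, not part of MAT, and it is simplified: it ignores compiled extensions, zip imports and .pth files.

```python
import os
import sys


def find_import_candidates(name, search_path=None):
    """List files and directories on the search path that could satisfy `import name`."""
    candidates = []
    for entry in (sys.path if search_path is None else search_path):
        base = entry or os.getcwd()  # an empty entry means the current directory
        module_file = os.path.join(base, name + ".py")
        package_dir = os.path.join(base, name)
        if os.path.isfile(module_file):
            candidates.append(("module", module_file))
        if os.path.isdir(package_dir):
            candidates.append(("package", package_dir))
    return candidates


if __name__ == "__main__":
    # Results are listed in sys.path order; earlier entries are searched first.
    for kind, path in find_import_candidates("losses"):
        print(kind, path)
```

Calling this from inside train.py, after the sys.path edits from step 5, would show whether anything other than MAT/losses/ can answer `import losses`.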
3. Additional info
Despite all this, the same error persists:
ModuleNotFoundError: No module named 'losses.loss'; 'losses' is not a package
And my folder structure looks like this:
MAT/
├── train.py
├── training/
│ └── training_loop.py
├── dnnlib/
├── metrics/
├── losses/
│ ├── __init__.py
│ ├── loss.py
│ └── ...
I suspect there might be:
a circular import or
an incorrect class name string passed to dnnlib.util.construct_class_by_name() in training_loop.py.
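To narrow down these two suspicions, one option is to import the module part of the class-name string one component at a time and see where it stops. The sketch below assumes the module part is losses.loss, as the traceback suggests. trace_dotted_import is a hypothetical helper, not a dnnlib function.

```python
import importlib


def trace_dotted_import(dotted_name):
    """Import each prefix of a dotted module path in turn and report where it breaks."""
    report = []
    parts = dotted_name.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        try:
            module = importlib.import_module(prefix)
        except ImportError as exc:
            report.append((prefix, f"FAILED: {exc}"))
            break  # later components cannot be imported once a parent fails
        # Only packages carry a __path__; a single-file module does not.
        kind = "package" if hasattr(module, "__path__") else "module"
        report.append((prefix, f"{kind} from {getattr(module, '__file__', None)}"))
    return report


if __name__ == "__main__":
    for prefix, outcome in trace_dotted_import("losses.loss"):
        print(prefix, "->", outcome)
```

If the first entry reports losses as a module rather than a package, the file it names is shadowing the losses/ directory. A circular import would more typically show up as an error raised from inside losses/loss.py itself, not as "is not a package".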
Has anyone successfully fine-tuned MAT or resolved this kind of import error?
Any guidance would be appreciated 🙏
sys.path before import. So you should read the full error message to see which file has this problem, and you may have to run all your checks inside that file rather than in your own files (which you probably run from a different folder).
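As a rough sketch of that suggestion, something like the following could be called just before the importlib.import_module(module_name) line in dnnlib/util.py, which is the frame shown in the traceback. describe_import_state is a hypothetical helper and the dictionary keys are made up for illustration.

```python
import sys


def describe_import_state(top_level):
    """Summarise what the running interpreter currently knows about `top_level`."""
    cached = sys.modules.get(top_level)  # set if the name was already imported
    return {
        "first_path_entry": sys.path[0] if sys.path else None,
        "already_imported": cached is not None,
        "file": getattr(cached, "__file__", None),
        # False here for an imported name means Python treats it as a plain module.
        "is_package": hasattr(cached, "__path__"),
    }


if __name__ == "__main__":
    print(describe_import_state("losses"))
```

If already_imported is True and is_package is False at that point, the file value shows which module claimed the name losses before losses.loss was requested.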