Human: I've been working on monocular depth estimation in a self-supervised manner and have referred to Monodepth2. While training on nuScenes, I found it hard to train the model for some reason. What would be the fatal factors for successful training?
ChatGPT: Training a self-supervised monodepth model like Monodepth2 on a complex dataset like nuScenes can be tricky. Here are some fatal factors that could affect your training:
🔑 1. Inappropriate Frame Selection
Monodepth2 uses consecutive frames (e.g., t-1, t, t+1).
nuScenes keyframes (samples) are only at 2 Hz, while the underlying camera sweeps run at 12 Hz, so consecutive keyframes are ~0.5 s apart and the inter-frame motion is much larger than in KITTI (10 Hz).
➤ Solution: Sample temporally closer frames, e.g., build (t-1, t, t+1) triplets from the 12 Hz camera sweeps instead of the 2 Hz keyframes; a sketch follows below.
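Below is a minimal sketch of triplet sampling from the 12 Hz CAM_FRONT sweeps, assuming the nuscenes-devkit is installed; the `dataroot` path and the helper name `collect_cam_front_triplets` are placeholders for your own pipeline.

```python
# Sketch: build (t-1, t, t+1) frame triplets from the 12 Hz CAM_FRONT sweeps
# instead of the 2 Hz keyframes.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes', verbose=False)

def collect_cam_front_triplets(scene):
    """Walk the CAM_FRONT sample_data chain of one scene and return
    consecutive (prev, curr, next) image-path triplets."""
    first_sample = nusc.get('sample', scene['first_sample_token'])
    sd_token = first_sample['data']['CAM_FRONT']
    frames = []
    while sd_token:
        sd = nusc.get('sample_data', sd_token)
        frames.append(sd['filename'])   # relative jpg path
        sd_token = sd['next']           # '' at the end of the scene
    # adjacent sweeps are ~1/12 s apart, much closer than 2 Hz keyframes
    return [(frames[i - 1], frames[i], frames[i + 1])
            for i in range(1, len(frames) - 1)]

triplets = collect_cam_front_triplets(nusc.scene[0])
```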
🔑 2. Incorrect Camera Intrinsics
Monodepth2 assumes known and accurate intrinsics.
nuScenes has six cameras, each with its own calibration.
➤ Solution: Make sure you're loading the correct intrinsics for each camera, and rescale them whenever you resize the images; see the sketch below.
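As a sketch (reusing the `nusc` handle from above), the camera matrix can be read per sample_data record and rescaled to the training resolution; `load_scaled_intrinsics` is a hypothetical helper:

```python
import numpy as np

def load_scaled_intrinsics(nusc, sd_token, out_w=640, out_h=192):
    """Read the camera matrix for one sample_data record and rescale it
    to the training resolution (nuScenes images are 1600x900)."""
    sd = nusc.get('sample_data', sd_token)
    cs = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])
    K = np.array(cs['camera_intrinsic'], dtype=np.float32)  # 3x3

    sx, sy = out_w / sd['width'], out_h / sd['height']
    K[0, 0] *= sx; K[0, 2] *= sx   # fx, cx
    K[1, 1] *= sy; K[1, 2] *= sy   # fy, cy
    return K
```

Note that Monodepth2's dataloaders build a 4x4 intrinsics matrix normalized by image width and height, so adapt the output above to whatever convention your dataloader uses.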
🔑 3. Poor Image Preprocessing
Image size, normalization, or augmentations might not fit Monodepth2's assumptions.
➤ Solution: Resize images to the resolution your model expects (Monodepth2's default is 640x192; nuScenes frames are 1600x900, so consider a crop or a resolution that roughly preserves the aspect ratio), scale pixel values to [0, 1] as Monodepth2 expects, and apply the same color augmentation to every frame in a triplet so the augmentation itself doesn't break photometric consistency; see the sketch below.
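A sketch of such a loading step, assuming PIL and torchvision; the jitter ranges mirror typical Monodepth2 settings but are otherwise illustrative:

```python
# Sketch: resize to 640x192 and apply ONE shared color jitter to all frames in
# a triplet, keeping un-augmented copies for the photometric loss.
import random
from PIL import Image
from torchvision import transforms
import torchvision.transforms.functional as TF

resize = transforms.Resize((192, 640))
to_tensor = transforms.ToTensor()   # scales pixel values to [0, 1]

def load_triplet(paths, do_aug=True):
    frames = [resize(Image.open(p).convert('RGB')) for p in paths]

    # sample ONE set of jitter factors and reuse it for every frame
    b, c, s = (random.uniform(0.8, 1.2) for _ in range(3))
    h = random.uniform(-0.05, 0.05)

    def jitter(img):
        img = TF.adjust_brightness(img, b)
        img = TF.adjust_contrast(img, c)
        img = TF.adjust_saturation(img, s)
        img = TF.adjust_hue(img, h)
        return img

    clean = [to_tensor(f) for f in frames]                              # used in the loss
    aug = [to_tensor(jitter(f)) for f in frames] if do_aug else clean  # fed to the network
    return clean, aug
```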
🔑 4. Inaccurate Ego-Motion Estimation
Pose network struggles when motion is large or non-rigid scenes dominate.
nuScenes has complex urban scenes with dynamic objects, and the ego vehicle is frequently stationary (e.g., at traffic lights), which leaves little parallax for the depth and pose networks to learn from.
➤ Solution: Consider using the provided ego poses (from IMU/GPS) as weak supervision for the pose network, or refine your pose network and loss function; a sketch of extracting relative camera poses from nuScenes follows below.
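As a sketch of how the ego poses could be turned into relative camera poses (for example, to supervise the translation scale of the pose network or to skip near-static pairs), assuming pyquaternion, which ships with the nuscenes-devkit:

```python
import numpy as np
from pyquaternion import Quaternion

def global_T_cam(nusc, sd_token):
    """4x4 transform from the camera frame to the global frame for one sample_data."""
    sd = nusc.get('sample_data', sd_token)
    ego = nusc.get('ego_pose', sd['ego_pose_token'])
    cal = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])

    T_global_ego = np.eye(4)
    T_global_ego[:3, :3] = Quaternion(ego['rotation']).rotation_matrix
    T_global_ego[:3, 3] = ego['translation']

    T_ego_cam = np.eye(4)
    T_ego_cam[:3, :3] = Quaternion(cal['rotation']).rotation_matrix
    T_ego_cam[:3, 3] = cal['translation']

    return T_global_ego @ T_ego_cam

def relative_cam_pose(nusc, sd_token_src, sd_token_tgt):
    """Transform mapping points in the source camera frame to the target camera frame."""
    return np.linalg.inv(global_T_cam(nusc, sd_token_tgt)) @ global_T_cam(nusc, sd_token_src)
```

The translation norm of `relative_cam_pose` between two sweeps gives a metric-scale motion estimate that can be compared against the PoseNet output or used to filter out frame pairs with almost no motion.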
🔑 5. Too Many Dynamic Objects
Self-supervised loss assumes static scenes for photometric consistency.
nuScenes has many moving cars, pedestrians, etc.
➤ Solution: Make sure Monodepth2's auto-masking is enabled (a sketch of the idea is below), mask dynamic objects with detection/segmentation masks (nuScenes provides 3D boxes at keyframes that you can project into the images), or add multi-view consistency on top of the single-view photometric loss.
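For reference, Monodepth2's auto-masking takes a per-pixel minimum between the reprojection loss and an identity reprojection loss (the source frame compared to the target without warping); a condensed sketch, where `ssim_fn` is assumed to be your SSIM implementation:

```python
import torch

def photometric_error(pred, target, ssim_fn, alpha=0.85):
    """Monodepth2-style mix of SSIM and L1, averaged over channels."""
    l1 = (pred - target).abs().mean(1, keepdim=True)
    ssim = ssim_fn(pred, target).mean(1, keepdim=True)
    return alpha * ssim + (1 - alpha) * l1

def automasked_loss(warped_srcs, raw_srcs, target, ssim_fn):
    """warped_srcs: source frames warped into the target view by depth + pose;
    raw_srcs: the same source frames without warping (identity reprojection)."""
    reproj = torch.cat([photometric_error(w, target, ssim_fn) for w in warped_srcs], 1)
    ident = torch.cat([photometric_error(s, target, ssim_fn) for s in raw_srcs], 1)
    ident = ident + torch.randn_like(ident) * 1e-5   # break ties, as in Monodepth2

    # per-pixel minimum over {warped, identity}: pixels where "doing nothing"
    # already wins (static camera, objects moving with the camera) are ignored
    combined, _ = torch.cat([reproj, ident], 1).min(1)
    return combined.mean()
```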
🔑 6. Loss Imbalance or Exploding Gradients
The photometric loss, smoothness loss, and pose-related losses must be balanced.
➤ Solution: Carefully tune the loss weights (Monodepth2's default smoothness weight is 1e-3), watch the loss curves and learning rate, and consider gradient clipping if the losses spike; see the sketch below.
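A minimal sketch of the weighted total loss with a gradient-norm guard; `photometric_loss`, `smooth_loss`, `model`, and `optimizer` are assumed to come from your training loop, and the clipping threshold is illustrative (vanilla Monodepth2 does not clip gradients):

```python
# 1e-3 is Monodepth2's default smoothness weight; smooth_loss is assumed to be
# the usual edge-aware disparity smoothness term.
total_loss = photometric_loss + 1e-3 * smooth_loss

optimizer.zero_grad()
total_loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)  # guard against spikes
optimizer.step()
```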
🔑 7. Small Batch Size
Batch size affects stability: with a very small batch, the BatchNorm statistics in the ResNet encoder become noisy and the model may not converge well.
➤ Solution: Use gradient accumulation if memory is limited (sketch below), or lower the input resolution so a larger batch fits.
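A minimal gradient-accumulation sketch in PyTorch; `compute_losses` and `train_loader` are placeholders for your own pipeline:

```python
# Four mini-batches of size 3 behave like an effective batch size of 12
# without extra GPU memory.
accum_steps = 4
optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    loss = compute_losses(model, batch)     # hypothetical loss helper
    (loss / accum_steps).backward()         # scale so the average matches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```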
If you share your loss curve, data pipeline, or training logs, I can help more precisely. Want to debug together?