Human: I've been working on monocular depth estimation in a self-supervised manner and have referred to Monodepth2. While training on nuScenes, I found it hard to train the model for some reason. What would be the fatal factors for successful training?
ChatGPT: Training a self-supervised monodepth model like Monodepth2 on a complex dataset like nuScenes can be tricky. Here are some fatal factors that could affect your training:
🔑 1. Inappropriate Frame Selection
Monodepth2 uses consecutive frames (e.g., t-1, t, t+1).
nuScenes keyframes (samples) are only at 2 Hz, while the underlying camera sweeps run at 12 Hz, so consecutive keyframes are ~0.5 s apart and the inter-frame motion is much larger than in KITTI (10 Hz).
➤ Solution: Sample temporally closer frames, e.g., build (t-1, t, t+1) triplets from the 12 Hz camera sweeps instead of the 2 Hz keyframes; a sketch follows below.
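Below is a minimal sketch of triplet sampling from the 12 Hz CAM_FRONT sweeps, assuming the nuscenes-devkit is installed; the `dataroot` path and the helper name `collect_cam_front_triplets` are placeholders for your own pipeline.

```python
# Sketch: build (t-1, t, t+1) frame triplets from the 12 Hz CAM_FRONT sweeps
# instead of the 2 Hz keyframes.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes', verbose=False)

def collect_cam_front_triplets(scene):
    """Walk the CAM_FRONT sample_data chain of one scene and return
    consecutive (prev, curr, next) image-path triplets."""
    first_sample = nusc.get('sample', scene['first_sample_token'])
    sd_token = first_sample['data']['CAM_FRONT']
    frames = []
    while sd_token:
        sd = nusc.get('sample_data', sd_token)
        frames.append(sd['filename'])   # relative jpg path
        sd_token = sd['next']           # '' at the end of the scene
    # adjacent sweeps are ~1/12 s apart, much closer than 2 Hz keyframes
    return [(frames[i - 1], frames[i], frames[i + 1])
            for i in range(1, len(frames) - 1)]

triplets = collect_cam_front_triplets(nusc.scene[0])
```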
🔑 2. Incorrect Camera Intrinsics
Monodepth2 assumes known and accurate intrinsics.
nuScenes has six cameras, each with its own calibration.
➤ Solution: Make sure you're loading the correct intrinsics for each camera, and rescale them whenever you resize the images; see the sketch below.
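As a sketch (reusing the `nusc` handle from above), the camera matrix can be read per sample_data record and rescaled to the training resolution; `load_scaled_intrinsics` is a hypothetical helper:

```python
import numpy as np

def load_scaled_intrinsics(nusc, sd_token, out_w=640, out_h=192):
    """Read the camera matrix for one sample_data record and rescale it
    to the training resolution (nuScenes images are 1600x900)."""
    sd = nusc.get('sample_data', sd_token)
    cs = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])
    K = np.array(cs['camera_intrinsic'], dtype=np.float32)  # 3x3

    sx, sy = out_w / sd['width'], out_h / sd['height']
    K[0, 0] *= sx; K[0, 2] *= sx   # fx, cx
    K[1, 1] *= sy; K[1, 2] *= sy   # fy, cy
    return K
```

Note that Monodepth2's dataloaders build a 4x4 intrinsics matrix normalized by image width and height, so adapt the output above to whatever convention your dataloader uses.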
🔑 3. Poor Image Preprocessing
Image size, normalization, or augmentations might not fit Monodepth2's assumptions.
➤ Solution: Resize images to the resolution your model expects (Monodepth2's default is 640x192; nuScenes frames are 1600x900, so consider a crop or a resolution that roughly preserves the aspect ratio), scale pixel values to [0, 1] as Monodepth2 expects, and apply the same color augmentation to every frame in a triplet so the augmentation itself doesn't break photometric consistency; see the sketch below.
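A sketch of such a loading step, assuming PIL and torchvision; the jitter ranges mirror typical Monodepth2 settings but are otherwise illustrative:

```python
# Sketch: resize to 640x192 and apply ONE shared color jitter to all frames in
# a triplet, keeping un-augmented copies for the photometric loss.
import random
from PIL import Image
from torchvision import transforms
import torchvision.transforms.functional as TF

resize = transforms.Resize((192, 640))
to_tensor = transforms.ToTensor()   # scales pixel values to [0, 1]

def load_triplet(paths, do_aug=True):
    frames = [resize(Image.open(p).convert('RGB')) for p in paths]

    # sample ONE set of jitter factors and reuse it for every frame
    b, c, s = (random.uniform(0.8, 1.2) for _ in range(3))
    h = random.uniform(-0.05, 0.05)

    def jitter(img):
        img = TF.adjust_brightness(img, b)
        img = TF.adjust_contrast(img, c)
        img = TF.adjust_saturation(img, s)
        img = TF.adjust_hue(img, h)
        return img

    clean = [to_tensor(f) for f in frames]                              # used in the loss
    aug = [to_tensor(jitter(f)) for f in frames] if do_aug else clean  # fed to the network
    return clean, aug
```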
🔑 4. Inaccurate Ego-Motion Estimation
Pose network struggles when motion is large or non-rigid scenes dominate.
nuScenes has complex urban scenes with dynamic objects, and the ego vehicle is frequently stationary (e.g., at traffic lights), which leaves little parallax for the depth and pose networks to learn from.
➤ Solution: Consider using the provided ego poses (from IMU/GPS) as weak supervision for the pose network, or refine your pose network and loss function; a sketch of extracting relative camera poses from nuScenes follows below.
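As a sketch of how the ego poses could be turned into relative camera poses (for example, to supervise the translation scale of the pose network or to skip near-static pairs), assuming pyquaternion, which ships with the nuscenes-devkit:

```python
import numpy as np
from pyquaternion import Quaternion

def global_T_cam(nusc, sd_token):
    """4x4 transform from the camera frame to the global frame for one sample_data."""
    sd = nusc.get('sample_data', sd_token)
    ego = nusc.get('ego_pose', sd['ego_pose_token'])
    cal = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])

    T_global_ego = np.eye(4)
    T_global_ego[:3, :3] = Quaternion(ego['rotation']).rotation_matrix
    T_global_ego[:3, 3] = ego['translation']

    T_ego_cam = np.eye(4)
    T_ego_cam[:3, :3] = Quaternion(cal['rotation']).rotation_matrix
    T_ego_cam[:3, 3] = cal['translation']

    return T_global_ego @ T_ego_cam

def relative_cam_pose(nusc, sd_token_src, sd_token_tgt):
    """Transform mapping points in the source camera frame to the target camera frame."""
    return np.linalg.inv(global_T_cam(nusc, sd_token_tgt)) @ global_T_cam(nusc, sd_token_src)
```

The translation norm of `relative_cam_pose` between two sweeps gives a metric-scale motion estimate that can be compared against the PoseNet output or used to filter out frame pairs with almost no motion.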
🔑 5. Too Many Dynamic Objects
Self-supervised loss assumes static scenes for photometric consistency.
nuScenes has many moving cars, pedestrians, etc.
➤ Solution: Make sure Monodepth2's auto-masking is enabled (a sketch of the idea is below), mask dynamic objects with detection/segmentation masks (nuScenes provides 3D boxes at keyframes that you can project into the images), or add multi-view consistency on top of the single-view photometric loss.
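For reference, Monodepth2's auto-masking takes a per-pixel minimum between the reprojection loss and an identity reprojection loss (the source frame compared to the target without warping); a condensed sketch, where `ssim_fn` is assumed to be your SSIM implementation:

```python
import torch

def photometric_error(pred, target, ssim_fn, alpha=0.85):
    """Monodepth2-style mix of SSIM and L1, averaged over channels."""
    l1 = (pred - target).abs().mean(1, keepdim=True)
    ssim = ssim_fn(pred, target).mean(1, keepdim=True)
    return alpha * ssim + (1 - alpha) * l1

def automasked_loss(warped_srcs, raw_srcs, target, ssim_fn):
    """warped_srcs: source frames warped into the target view by depth + pose;
    raw_srcs: the same source frames without warping (identity reprojection)."""
    reproj = torch.cat([photometric_error(w, target, ssim_fn) for w in warped_srcs], 1)
    ident = torch.cat([photometric_error(s, target, ssim_fn) for s in raw_srcs], 1)
    ident = ident + torch.randn_like(ident) * 1e-5   # break ties, as in Monodepth2

    # per-pixel minimum over {warped, identity}: pixels where "doing nothing"
    # already wins (static camera, objects moving with the camera) are ignored
    combined, _ = torch.cat([reproj, ident], 1).min(1)
    return combined.mean()
```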
🔑 6. Loss Imbalance or Exploding Gradients
The photometric loss, smoothness loss, and pose-related losses must be balanced.
➤ Solution: Carefully tune the loss weights (Monodepth2's default smoothness weight is 1e-3), watch the loss curves and learning rate, and consider gradient clipping if the losses spike; see the sketch below.
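A minimal sketch of the weighted total loss with a gradient-norm guard; `photometric_loss`, `smooth_loss`, `model`, and `optimizer` are assumed to come from your training loop, and the clipping threshold is illustrative (vanilla Monodepth2 does not clip gradients):

```python
# 1e-3 is Monodepth2's default smoothness weight; smooth_loss is assumed to be
# the usual edge-aware disparity smoothness term.
total_loss = photometric_loss + 1e-3 * smooth_loss

optimizer.zero_grad()
total_loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)  # guard against spikes
optimizer.step()
```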
🔑 7. Small Batch Size
Batch size affects stability: with a very small batch, the BatchNorm statistics in the ResNet encoder become noisy and the model may not converge well.
➤ Solution: Use gradient accumulation if memory is limited (sketch below), or lower the input resolution so a larger batch fits.
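A minimal gradient-accumulation sketch in PyTorch; `compute_losses` and `train_loader` are placeholders for your own pipeline:

```python
# Four mini-batches of size 3 behave like an effective batch size of 12
# without extra GPU memory.
accum_steps = 4
optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    loss = compute_losses(model, batch)     # hypothetical loss helper
    (loss / accum_steps).backward()         # scale so the average matches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```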
If you share your loss curve, data pipeline, or training logs, I can help more precisely. Want to debug together?