depth_estimation
MoGe-based monocular depth estimation for GEN3C 3D cache conditioning.
Functions
fastvideo.pipelines.basic.gen3c.depth_estimation.load_moge_model
load_moge_model(model_name: str = 'Ruicheng/moge-vitl', device: str | device = 'cuda') -> MoGeModel
Load MoGe depth estimation model from HuggingFace.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | HuggingFace model identifier. | `'Ruicheng/moge-vitl'` |
| `device` | `str \| device` | Device to load model on. | `'cuda'` |
Returns:
| Type | Description |
|---|---|
| `MoGeModel` | Loaded MoGe model. |
Source code in fastvideo/pipelines/basic/gen3c/depth_estimation.py
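Both prediction functions below take an already loaded `moge_model`, so it may be worth loading the model once and reusing it. The following is a minimal sketch of that idea, not part of the documented API: `make_cached_loader` and `get_model` are hypothetical names, and the sketch assumes `load_moge_model` accepts `model_name` and `device` positionally, as the parameter table suggests.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(loader: Callable[[str, str], Any]) -> Callable[..., Any]:
    """Wrap a loader so each (model_name, device) pair is loaded only once.

    `loader` is expected to behave like load_moge_model(model_name, device).
    This helper is hypothetical and not part of fastvideo.
    """

    @lru_cache(maxsize=None)
    def _load(model_name: str, device: str) -> Any:
        return loader(model_name, device)

    def get_model(model_name: str = "Ruicheng/moge-vitl", device: Any = "cuda") -> Any:
        # Normalize the device to a string so equivalent calls share one cache entry.
        return _load(model_name, str(device))

    return get_model


# Assumed usage (requires fastvideo to be installed):
# from fastvideo.pipelines.basic.gen3c.depth_estimation import load_moge_model
# get_moge = make_cached_loader(load_moge_model)
# moge_model = get_moge()  # loaded on first call, reused afterwards
```

Passing the loader in as an argument keeps the helper independent of fastvideo, which also makes it easy to substitute a stub when testing.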
fastvideo.pipelines.basic.gen3c.depth_estimation.predict_depth_from_path
predict_depth_from_path(image_path: str, target_h: int, target_w: int, device: device, moge_model: MoGeModel) -> tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
Predict depth, intrinsics, and mask from an image file path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `image_path` | `str` | Path to input image (RGB or BGR, any format cv2 supports). | required |
| `target_h` | `int` | Target height for output tensors. | required |
| `target_w` | `int` | Target width for output tensors. | required |
| `device` | `device` | Computation device. | required |
| `moge_model` | `MoGeModel` | Loaded MoGe model. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `image` | `Tensor` | (1, 1, 3, target_h, target_w) image tensor in [-1, 1]. |
| `depth` | `Tensor` | (1, 1, 1, target_h, target_w) depth map. |
| `mask` | `Tensor` | (1, 1, 1, target_h, target_w) confidence mask. |
| `w2c` | `Tensor` | (1, 1, 4, 4) world-to-camera matrix (identity). |
| `intrinsics` | `Tensor` | (1, 1, 3, 3) camera intrinsics. |
Source code in fastvideo/pipelines/basic/gen3c/depth_estimation.py
fastvideo.pipelines.basic.gen3c.depth_estimation.predict_depth_from_tensor
predict_depth_from_tensor(image_tensor: Tensor, moge_model: MoGeModel) -> tuple[Tensor, Tensor]
Predict depth and mask from an image tensor (for autoregressive generation).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `image_tensor` | `Tensor` | (C, H, W) image tensor in [0, 1] range. | required |
| `moge_model` | `MoGeModel` | Loaded MoGe model. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `depth` | `Tensor` | (1, 1, H, W) depth map. |
| `mask` | `Tensor` | (1, 1, H, W) confidence mask. |
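The input contract here differs from the output of `predict_depth_from_path`: this function expects a 3-D `(C, H, W)` tensor in [0, 1], not a 5-D tensor in [-1, 1]. A minimal sketch of a pre-flight check follows. `check_tensor_input` is a hypothetical helper, and it only assumes the input exposes `.shape`, `.min()`, and `.max()`, as PyTorch tensors do.

```python
from typing import Tuple


def check_tensor_input(image_tensor) -> Tuple[int, int, int, int]:
    """Validate a (C, H, W) image in [0, 1] and return the documented output shape.

    Hypothetical helper; not part of fastvideo.
    """
    shape = tuple(image_tensor.shape)
    if len(shape) != 3:
        raise ValueError(f"expected a (C, H, W) tensor, got shape {shape}")
    lo, hi = float(image_tensor.min()), float(image_tensor.max())
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"expected values in [0, 1], got range [{lo}, {hi}]")
    _, height, width = shape
    # Both depth and mask are documented as (1, 1, H, W).
    return (1, 1, height, width)


# Assumed usage (requires fastvideo and a loaded model):
# out_shape = check_tensor_input(frame)
# depth, mask = predict_depth_from_tensor(frame, moge_model)
# assert tuple(depth.shape) == out_shape
```

An image in [-1, 1], such as the `image` output documented for `predict_depth_from_path`, would fail this check until it is indexed down to three dimensions and rescaled with `(x + 1) / 2`.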