gamecraft_image_encoding
GameCraft image-to-video encoding stage.
Encodes a reference image into gt_latents and conditioning_mask for HunyuanGameCraft I2V generation. For T2V this stage is a no-op.
Classes
fastvideo.pipelines.stages.gamecraft_image_encoding.GameCraftImageVAEEncodingStage
Bases: PipelineStage
Stage for encoding a reference image into gt_latents and conditioning_mask for HunyuanGameCraft image-to-video generation.
Official GameCraft I2V flow:
1. VAE-encode the reference image -> [B, 16, 1, H_lat, W_lat]
2. Scale by VAE scaling_factor (0.476986)
3. Repeat to all temporal frames
4. Zero out non-conditioned frames (first frame only for short videos, first half for longer autoregressive generation)
5. Build a binary mask (1 = conditioned, 0 = generate)
6. Store gt_latents and conditioning_mask on the batch for the denoising stage
If no image is provided (T2V mode), this stage is a no-op; the denoising stage already falls back to zero gt_latents and zero mask.
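Steps 2 to 5 of the flow above can be illustrated with a minimal NumPy sketch. This is not the stage's actual implementation: the function name build_i2v_conditioning, the num_cond_frames parameter, and the mask shape [B, 1, T, H_lat, W_lat] are assumptions made for illustration, and a plain array stands in for the VAE encoder output from step 1.

```python
import numpy as np

SCALING_FACTOR = 0.476986  # VAE scaling_factor quoted in the flow above


def build_i2v_conditioning(image_latent, num_latent_frames, num_cond_frames=1):
    """Hypothetical helper: build (gt_latents, conditioning_mask).

    image_latent stands in for the VAE output of step 1 and has shape
    [B, 16, 1, H_lat, W_lat]. num_cond_frames is an assumed parameter
    for how many leading latent frames stay conditioned.
    """
    # Step 2: scale the encoded latent by the VAE scaling factor.
    scaled = image_latent * SCALING_FACTOR

    # Step 3: repeat the single latent frame along the temporal axis (axis 2).
    gt_latents = np.repeat(scaled, num_latent_frames, axis=2)

    # Step 4: zero out every frame that is not conditioned.
    gt_latents[:, :, num_cond_frames:] = 0.0

    # Step 5: binary mask, 1 = conditioned, 0 = generate.
    # The single-channel mask shape is an assumption of this sketch.
    b, _, t, h, w = gt_latents.shape
    mask = np.zeros((b, 1, t, h, w), dtype=gt_latents.dtype)
    mask[:, :, :num_cond_frames] = 1.0

    return gt_latents, mask


# Usage: one reference image, 9 latent frames, first frame conditioned.
latent = np.ones((1, 16, 1, 4, 6), dtype=np.float32)
gt, mask = build_i2v_conditioning(latent, num_latent_frames=9, num_cond_frames=1)
print(gt.shape, mask.shape)
```

In this sketch the T2V case corresponds to never calling the helper at all, which matches the no-op behaviour described above: the denoising stage is then left to its own zero gt_latents and zero mask fallback.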
Source code in fastvideo/pipelines/stages/gamecraft_image_encoding.py
Functions
fastvideo.pipelines.stages.gamecraft_image_encoding.GameCraftImageVAEEncodingStage.forward
forward(batch: ForwardBatch, fastvideo_args: FastVideoArgs) -> ForwardBatch
Encode reference image for I2V, or skip for T2V.
Source code in fastvideo/pipelines/stages/gamecraft_image_encoding.py