flux_2_latent_preparation
Flux2 latent preparation stage using packed 2x2 layout.
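Flux2 uses packed latents: each 2x2 spatial patch is folded into the channel dimension, so the transformer sees 128 channels (32 * 4) at half the latent spatial resolution. After denoising, the latents are unpatchified back to 32 channels at full latent resolution for VAE decoding. This stage prepares latents of shape (B, 128, T, H//2, W//2), where H and W are the latent height and width.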
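The packing and unpatchify steps described above can be sketched with NumPy as follows. This is an illustration only, not the FastVideo implementation: the function names `pack_2x2` and `unpack_2x2` are hypothetical, and the ordering of the four patch positions within the 128 channels is an assumed convention that may differ from the one the stage actually uses.

```python
import numpy as np


def pack_2x2(latents: np.ndarray) -> np.ndarray:
    """Fold each 2x2 spatial patch into channels: (B, C, T, H, W) -> (B, 4C, T, H/2, W/2)."""
    b, c, t, h, w = latents.shape
    if h % 2 or w % 2:
        raise ValueError("latent height and width must be even")
    # Split H and W into (half-resolution index, position inside the 2x2 patch).
    x = latents.reshape(b, c, t, h // 2, 2, w // 2, 2)
    # Move the two patch-position axes next to the channel axis (assumed ordering).
    x = x.transpose(0, 1, 4, 6, 2, 3, 5)
    return x.reshape(b, c * 4, t, h // 2, w // 2)


def unpack_2x2(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_2x2: (B, 4C, T, H/2, W/2) -> (B, C, T, H, W)."""
    b, c4, t, h2, w2 = packed.shape
    if c4 % 4:
        raise ValueError("packed channel count must be divisible by 4")
    c = c4 // 4
    x = packed.reshape(b, c, 2, 2, t, h2, w2)
    # Interleave the patch positions back into the spatial axes.
    x = x.transpose(0, 1, 4, 5, 2, 6, 3)
    return x.reshape(b, c, t, h2 * 2, w2 * 2)


latents = np.random.default_rng(0).standard_normal((1, 32, 1, 4, 6))
packed = pack_2x2(latents)
restored = unpack_2x2(packed)
```

Here `packed` has 128 channels and half the spatial size of `latents`, and `restored` is element-for-element identical to `latents`, which is the property the denoise-then-unpatchify flow relies on.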
Classes
fastvideo.pipelines.basic.flux_2.flux_2_latent_preparation.Flux2LatentPreparationStage
Flux2LatentPreparationStage(scheduler, transformer, use_btchw_layout: bool = False)
Bases: LatentPreparationStage
Latent preparation for Flux2: packed layout with half spatial dimensions.
Matches diffusers Flux2Pipeline.prepare_latents: the shape is (B, num_channels_latents, T, H_latent//2, W_latent//2), so the transformer sees 128 channels at half the latent spatial resolution. After denoising, the latents are unpatchified to (B, 32, H_latent, W_latent) before the VAE decode.
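As a rough worked example of the shape above, the helper below computes the packed latent shape from a pixel resolution. It is a sketch under stated assumptions: `flux2_packed_shape` is a hypothetical name, the VAE spatial scale factor of 8 is assumed rather than taken from this page, and the rounding of the latent size to an even number is an assumption about how odd sizes are handled.

```python
def flux2_packed_shape(
    batch_size: int,
    height_px: int,
    width_px: int,
    num_frames: int = 1,
    latent_channels: int = 32,   # unpacked VAE latent channels, per the description above
    vae_scale_factor: int = 8,   # assumption: 8x spatial compression in the VAE
) -> tuple[int, int, int, int, int]:
    """Return (B, 4 * latent_channels, T, H_latent // 2, W_latent // 2)."""
    # Assumption: round the latent size down to an even number so 2x2 packing is exact.
    h_latent = 2 * (height_px // (vae_scale_factor * 2))
    w_latent = 2 * (width_px // (vae_scale_factor * 2))
    return (batch_size, latent_channels * 4, num_frames, h_latent // 2, w_latent // 2)


shape = flux2_packed_shape(batch_size=1, height_px=1024, width_px=1024)
```

Under these assumptions a 1024x1024 image has a 128x128 latent grid, so `shape` is `(1, 128, 1, 64, 64)`.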
Source code in fastvideo/pipelines/stages/latent_preparation.py
Methods:
fastvideo.pipelines.basic.flux_2.flux_2_latent_preparation.Flux2LatentPreparationStage.forward
forward(batch: ForwardBatch, fastvideo_args: FastVideoArgs) -> ForwardBatch
Prepare latents with Flux2 packed half-spatial shape.
Source code in fastvideo/pipelines/basic/flux_2/flux_2_latent_preparation.py