hunyuangamecraft

Configuration for HunyuanGameCraft transformer model.

HunyuanGameCraft extends HunyuanVideo with:

1. CameraNet for camera/action conditioning
2. 33 input channels (16 latent + 16 gt_latent + 1 mask)
3. Mask-based conditioning for autoregressive generation
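The 33-channel input breaks down as the three groups listed above. A minimal sketch of that channel arithmetic (names are illustrative, not the FastVideo API):

```python
# Illustrative constants matching the config's default field values:
# in_channels = 33 comes from concatenating three channel groups.
LATENT_CHANNELS = 16     # noisy video latents being denoised
GT_LATENT_CHANNELS = 16  # clean latents of already-generated frames (gt_latent)
MASK_CHANNELS = 1        # binary mask marking which frames are conditioning frames


def total_in_channels(latent: int, gt_latent: int, mask: int) -> int:
    """Channel count of the concatenated transformer input."""
    return latent + gt_latent + mask


assert total_in_channels(LATENT_CHANNELS, GT_LATENT_CHANNELS, MASK_CHANNELS) == 33
```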

Classes

fastvideo.configs.models.dits.hunyuangamecraft.HunyuanGameCraftArchConfig dataclass

```python
HunyuanGameCraftArchConfig(
    stacked_params_mapping: list[tuple[str, str, str]] = [],
    _fsdp_shard_conditions: list = [is_double_block, is_single_block, is_refiner_block, is_camera_net],
    _compile_conditions: list = [is_double_block, is_single_block, is_txt_in],
    param_names_mapping: dict = {
        r'^(.*)\.img_mlp\.fc1\.(.*)$': r'\1.img_mlp.fc_in.\2',
        r'^(.*)\.img_mlp\.fc2\.(.*)$': r'\1.img_mlp.fc_out.\2',
        r'^(.*)\.txt_mlp\.fc1\.(.*)$': r'\1.txt_mlp.fc_in.\2',
        r'^(.*)\.txt_mlp\.fc2\.(.*)$': r'\1.txt_mlp.fc_out.\2',
        r'^single_blocks\.(\d+)\.mlp\.fc1\.(.*)$': r'single_blocks.\1.mlp.fc_in.\2',
        r'^single_blocks\.(\d+)\.mlp\.fc2\.(.*)$': r'single_blocks.\1.mlp.fc_out.\2',
        r'^txt_in\.individual_token_refiner\.blocks\.(\d+)\.(.*)$': r'txt_in.refiner_blocks.\1.\2',
        r'^vector_in\.in_layer\.(.*)$': r'vector_in.fc_in.\1',
        r'^vector_in\.out_layer\.(.*)$': r'vector_in.fc_out.\1',
        r'^time_in\.mlp\.0\.(.*)$': r'time_in.mlp.fc_in.\1',
        r'^time_in\.mlp\.2\.(.*)$': r'time_in.mlp.fc_out.\1',
        r'^guidance_in\.mlp\.0\.(.*)$': r'guidance_in.mlp.fc_in.\1',
        r'^guidance_in\.mlp\.2\.(.*)$': r'guidance_in.mlp.fc_out.\1',
        r'^final_layer\.adaLN_modulation\.1\.(.*)$': r'final_layer.adaLN_modulation.linear.\1',
        r'^txt_in\.refiner_blocks\.(\d+)\.mlp\.fc1\.(.*)$': r'txt_in.refiner_blocks.\1.mlp.fc_in.\2',
        r'^txt_in\.refiner_blocks\.(\d+)\.mlp\.fc2\.(.*)$': r'txt_in.refiner_blocks.\1.mlp.fc_out.\2',
    },
    reverse_param_names_mapping: dict = {},
    lora_param_names_mapping: dict = {},
    _supported_attention_backends: tuple[AttentionBackendEnum, ...] = (
        SAGE_ATTN, FLASH_ATTN, TORCH_SDPA, VIDEO_SPARSE_ATTN,
        VMOBA_ATTN, SAGE_ATTN_THREE, SLA_ATTN, SAGE_SLA_ATTN,
    ),
    hidden_size: int = 0,
    num_attention_heads: int = 24,
    num_channels_latents: int = 0,
    in_channels: int = 33,
    out_channels: int = 16,
    exclude_lora_layers: list[str] = ['img_in', 'txt_in', 'time_in', 'vector_in', 'camera_net'],
    boundary_ratio: float | None = None,
    _fastvideo_version: str = '0.1.0',
    camera_net: bool = True,
    patch_size: int | tuple[int, int, int] = 2,
    patch_size_t: int = 1,
    attention_head_dim: int = 128,
    mlp_ratio: float = 4.0,
    num_layers: int = 20,
    num_single_layers: int = 40,
    num_refiner_layers: int = 2,
    rope_axes_dim: tuple[int, int, int] = (16, 56, 56),
    guidance_embeds: bool = False,
    dtype: dtype | None = None,
    text_embed_dim: int = 4096,
    pooled_projection_dim: int = 768,
    rope_theta: int = 256,
    qk_norm: str = 'rms_norm',
    camera_in_channels: int = 6,
    camera_downscale_coef: int = 8,
    camera_out_channels: int = 16,
)
```

Bases: DiTArchConfig

Architecture config for HunyuanGameCraft transformer.
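The `param_names_mapping` field maps checkpoint parameter names (regex patterns) to the module paths used by the FastVideo implementation. A self-contained sketch of how such a table is typically consumed (this mirrors a few entries from the signature above; it is not FastVideo's actual loader code):

```python
import re

# Excerpt of the param_names_mapping table: checkpoint key pattern -> FastVideo name.
PARAM_NAMES_MAPPING = {
    r'^(.*)\.img_mlp\.fc1\.(.*)$': r'\1.img_mlp.fc_in.\2',
    r'^(.*)\.img_mlp\.fc2\.(.*)$': r'\1.img_mlp.fc_out.\2',
    r'^time_in\.mlp\.0\.(.*)$': r'time_in.mlp.fc_in.\1',
}


def remap_param_name(name: str) -> str:
    """Apply the first matching pattern; return the name unchanged if none match."""
    for pattern, repl in PARAM_NAMES_MAPPING.items():
        if re.match(pattern, name):
            return re.sub(pattern, repl, name)
    return name


# A HunyuanVideo checkpoint key becomes the corresponding FastVideo module path:
print(remap_param_name('double_blocks.3.img_mlp.fc1.weight'))
# -> double_blocks.3.img_mlp.fc_in.weight
```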

fastvideo.configs.models.dits.hunyuangamecraft.HunyuanGameCraftConfig dataclass

```python
HunyuanGameCraftConfig(
    arch_config: DiTArchConfig = HunyuanGameCraftArchConfig(),
    prefix: str = 'HunyuanGameCraft',
    quant_config: QuantizationConfig | None = None,
)
```

Bases: DiTConfig

Full config for HunyuanGameCraft model.
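The full config simply nests the architecture config behind an `arch_config` field. A local, stripped-down mirror of that nesting (illustrative only; the real classes live in `fastvideo.configs.models.dits.hunyuangamecraft` and carry many more fields):

```python
from dataclasses import dataclass, field


@dataclass
class HunyuanGameCraftArchConfig:
    # A handful of the defaults from the signature above.
    in_channels: int = 33
    out_channels: int = 16
    num_layers: int = 20
    num_single_layers: int = 40
    camera_net: bool = True


@dataclass
class HunyuanGameCraftConfig:
    # default_factory gives each config instance its own arch_config.
    arch_config: HunyuanGameCraftArchConfig = field(
        default_factory=HunyuanGameCraftArchConfig)
    prefix: str = 'HunyuanGameCraft'


cfg = HunyuanGameCraftConfig()
assert cfg.arch_config.camera_net  # CameraNet conditioning is on by default
```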