preprocess_hunyuan_overfit

Preprocess HunyuanVideo overfit data into parquet format.

Encodes videos with the HunyuanVideo VAE and captions with the dual text encoders (LLaMA + CLIP), writing the results into the t2v parquet schema expected by the training framework.

Usage

CUDA_VISIBLE_DEVICES=0 python scripts/preprocess_hunyuan_overfit.py

Functions

fastvideo.pipelines.preprocess.preprocess_hunyuan_overfit.load_video

load_video(path: str, num_frames: int) -> Tensor

Load video as [1, C, T, H, W] in [-1, 1].

Source code in fastvideo/pipelines/preprocess/preprocess_hunyuan_overfit.py
def load_video(path: str, num_frames: int) -> torch.Tensor:
    """Load video as [1, C, T, H, W] in [-1, 1]."""
    cap = cv2.VideoCapture(path)
    frames: list[np.ndarray] = []
    while len(frames) < num_frames:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(frame)
    cap.release()

    if not frames:
        raise ValueError(f"No frames could be read from {path}")

    # Repeat the last frame to pad short videos to num_frames
    while len(frames) < num_frames:
        frames.append(frames[-1])

    video = np.stack(frames, axis=0)  # [T, H, W, C], uint8
    video = torch.from_numpy(video).float()
    video = video / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    video = video.permute(3, 0, 1, 2).unsqueeze(0)  # [1, C, T, H, W]
    return video
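
The normalization and layout steps at the end of `load_video` can be checked in isolation with a synthetic clip. This is a minimal standalone sketch, assuming only NumPy and PyTorch (no cv2); the random frames stand in for what `cv2.VideoCapture` would decode after BGR-to-RGB conversion:

```python
import numpy as np
import torch

# Synthetic "decoded" clip: 8 RGB frames of 64x64, uint8 in [0, 255],
# in the [T, H, W, C] order that the frame-reading loop produces.
frames = np.random.randint(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)

video = torch.from_numpy(frames).float()
video = video / 127.5 - 1.0        # [0, 255] -> [-1, 1]
video = video.permute(3, 0, 1, 2)  # [T, H, W, C] -> [C, T, H, W]
video = video.unsqueeze(0)         # -> [1, C, T, H, W]

assert video.shape == (1, 3, 8, 64, 64)
assert video.min() >= -1.0 and video.max() <= 1.0
```

The channels-first `[1, C, T, H, W]` layout matches what 3D-convolutional VAEs such as HunyuanVideo's expect, with the leading batch dimension of 1 added so the tensor can be fed directly to the encoder.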