Preprocess HunyuanVideo overfit data into parquet format.
Encodes videos with HunyuanVideo VAE and captions with dual text
encoders (LLaMA + CLIP) into the t2v parquet schema expected by
the training framework.
Usage

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/preprocess_hunyuan_overfit.py
```
Functions
fastvideo.pipelines.preprocess.preprocess_hunyuan_overfit.load_video
load_video(path: str, num_frames: int) -> Tensor
Load video as [1, C, T, H, W] in [-1, 1].
Source code in fastvideo/pipelines/preprocess/preprocess_hunyuan_overfit.py
```python
def load_video(path: str, num_frames: int) -> torch.Tensor:
    """Load video as [1, C, T, H, W] in [-1, 1]."""
    cap = cv2.VideoCapture(path)
    frames: list[np.ndarray] = []
    while len(frames) < num_frames:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(frame)
    cap.release()
    if not frames:
        raise ValueError(f"Could not read any frames from {path}")
    if len(frames) < num_frames:
        # Repeat the last frame to pad up to num_frames
        while len(frames) < num_frames:
            frames.append(frames[-1])
    frames = frames[:num_frames]
    video = np.stack(frames, axis=0)                 # [T, H, W, C]
    video = torch.from_numpy(video).float()
    video = video / 127.5 - 1.0                      # [0, 255] -> [-1, 1]
    video = video.permute(3, 0, 1, 2).unsqueeze(0)   # [1, C, T, H, W]
    return video
```
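The tensor-shaping tail of `load_video` can be checked in isolation. The sketch below is a minimal illustration, not part of the script: synthetic uint8 frames stand in for decoded video, and the same stack/normalize/permute steps are applied to verify the output shape and value range.

```python
import numpy as np
import torch

# Hypothetical stand-in for decoded frames: 8 RGB frames of 32x32.
# First frame is all 0, the rest all 255, to exercise both ends of the range.
frames = [np.zeros((32, 32, 3), dtype=np.uint8)]
frames += [np.full((32, 32, 3), 255, dtype=np.uint8) for _ in range(7)]

video = np.stack(frames, axis=0)                 # [T, H, W, C] = [8, 32, 32, 3]
video = torch.from_numpy(video).float()
video = video / 127.5 - 1.0                      # [0, 255] -> [-1, 1]
video = video.permute(3, 0, 1, 2).unsqueeze(0)   # [1, C, T, H, W]

print(tuple(video.shape))  # (1, 3, 8, 32, 32)
```

With these inputs, 0 maps to -1.0 and 255 maps exactly to 1.0, matching the `[-1, 1]` range the docstring promises.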