I am trying to generate a video with Wan 2.2. My goal is to take the motion from an input video and a single reference image, and then generate a new video in which the character from the reference image performs the actions from the input video. For this I am using this [workflow][1] provided in ComfyUI-WanVideoWrapper.
However, when I run the workflow, I consistently get the following error:
```bash
Sizes of tensors must match except in dimension 2.
Expected size 887 but got size 880 for tensor number 1 in the list.
```
I replaced the following node from the workflow:
```json
{
  "id": 73,
  "type": "DownloadAndLoadDepthAnythingV2Model",
  "pos": [-1430.5018, -396.4539],
  "size": [441, 82],
  "order": 24,
  "mode": 2,
  "outputs": [
    {
      "name": "da_v2_model",
      "type": "DAMODEL",
      "links": [82]
    }
  ],
  "properties": {
    "cnr_id": "comfyui-depthanythingv2",
    "ver": "003d7b44bafd3a8a4c3693a9ca3ddcd72f4883ab"
  }
}
```

with a DWPose node:

```json
"156": {
  "inputs": {
    "detect_hand": "enable",
    "detect_body": "enable",
    "detect_face": "enable",
    "resolution": 512,
    "bbox_detector": "yolox_l.onnx",
    "pose_estimator": "dw-ll_ucoco_384_bs5.torchscript.pt",
    "scale_stick_for_xinsr_cn": "disable",
    "image": ["96", 0]
  },
  "class_type": "DWPreprocessor",
  "_meta": {
    "title": "DWPose Estimator"
  }
}
```
I have already resized both the input video and the reference image to the recommended resolution of 480x832. The fps is set to 60 with a total frame count of 81, and the model files are placed according to the workflow's requirements. The resolution in the DWPose node is currently set to 512.
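As a sanity check, I also verified my dimensions and frame count against the constraints I understand Wan to have (spatial dimensions divisible by 16, frame count of the form 4k+1, e.g. 81). This is only my understanding, and the helper names below are mine:

```python
def snap_dims(w: int, h: int, multiple: int = 16) -> tuple[int, int]:
    """Round width/height down to the nearest multiple of 16.

    Assumption: Wan's VAE/patchify expects spatial dims divisible by 16.
    """
    return (w // multiple) * multiple, (h // multiple) * multiple


def snap_frames(n: int) -> int:
    """Clamp the frame count to the 4k+1 pattern Wan models expect."""
    return ((n - 1) // 4) * 4 + 1


print(snap_dims(480, 832))  # → (480, 832): already valid
print(snap_frames(83))      # → 81
```

Both inputs pass this check, so the mismatch does not seem to come from invalid dimensions on my side.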
Could this error be caused by the resolution setting in the DWPose node, or is there another preprocessing step in the workflow that I am missing?