
Recovering a latent frame sequence from a degraded observation is highly ambiguous for both blur and RS inputs. Our Cross-Shutter strategy resolves the ambiguities of blur decomposition and RS temporal interpolation simultaneously by pairing the two modalities.
Motion degradation, manifested as blur in global shutter (GS) images or distortion in rolling shutter (RS) counterparts, remains a fundamental challenge in computational imaging, especially under fast motion or low-light conditions. While prior works have treated blur decomposition and RS temporal super-resolution as separate tasks, this separation fails to exploit their intrinsic complementarity. In this paper, we propose a unified framework that inverts motion degradation and re-enacts the imaging moments by jointly leveraging the complementary characteristics of GS blur and RS distortion. To this end, we introduce a novel dual-shutter setup that captures synchronized Blur-RS image pairs and demonstrate that this combination effectively resolves the temporal and spatial ambiguities inherent in both modalities. To allow flexible performance-cost trade-offs, we further extend this dual-shutter setup to a stereo Blur-RS configuration with a narrow baseline. In addition, we construct a triaxial imaging system to collect a real-world dataset with aligned GS-RS pairs and ground-truth high-speed frames, enabling robust training and evaluation beyond synthetic data. Our network explicitly disentangles motion into context-aware and temporally sensitive representations via a dual-stream motion interpretation module, followed by a self-prompted frame reconstruction stage. Extensive experiments validate the superiority and generalizability of our approach, establishing a new paradigm for realistic high-speed video reconstruction under complex motion degradations.
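To make the complementarity concrete, the sketch below synthesizes a synchronized Blur-RS pair from a high-speed frame stack under the standard imaging models: a GS blur image integrates all frames over the exposure window, while an RS image samples each row at a different readout time. This is a minimal simulation for intuition only; the function name and the nearest-frame readout approximation are illustrative assumptions, not part of our released code.

```python
import numpy as np

def synthesize_blur_rs_pair(frames: np.ndarray):
    """Simulate a synchronized Blur-RS pair from high-speed frames.

    frames: (T, H, W, C) float array, the latent high-speed sequence.
    Returns (blur, rs): the GS blur image averages the whole stack,
    while RS row y is sampled from the frame exposed at the
    row-dependent readout time t(y) = y / (H - 1) * (T - 1).
    """
    T, H, W, C = frames.shape
    # Global shutter: temporal integration over the exposure window.
    blur = frames.mean(axis=0)
    # Rolling shutter: row-wise sampling at linearly increasing times
    # (nearest-frame approximation of the continuous readout).
    row_times = np.round(np.linspace(0, T - 1, H)).astype(int)
    rs = frames[row_times, np.arange(H)]  # picks frame row_times[y] for row y
    return blur, rs

# Toy usage: 9 latent frames of a 64x64 scene.
frames = np.random.rand(9, 64, 64, 3).astype(np.float32)
blur, rs = synthesize_blur_rs_pair(frames)
print(blur.shape, rs.shape)  # (64, 64, 3) (64, 64, 3)
```

The blur image constrains appearance but is temporally ambiguous, whereas the RS image orders rows in time but is spatially distorted, which is why the pair disambiguates what either input alone cannot.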
In general, the framework is structured into two sequential stages: motion interpretation followed by frame reconstruction. The motion interpretation (MI) stage iteratively exploits the merits of our Blur-RS combination, employing three motion interpretation blocks under the guidance of a teacher module. Enhanced with temporal positional encoding, it explicitly extracts contextual characterization from the disentangled blur stream and temporal abstraction from the RS stream. The frame reconstruction stage is implemented as GenNet, an encoder-decoder architecture, in which a novel self-prompter based on motion residues precisely and adaptively refines the warped latent frames.
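A minimal PyTorch-style skeleton of this two-stage design is given below for orientation. Apart from the GenNet name, every module name, layer choice, and tensor shape is an assumption made for illustration; the teacher guidance, temporal positional encoding, flow-based warping, and motion-residue self-prompter are omitted for brevity.

```python
import torch
import torch.nn as nn

class MotionInterpretationBlock(nn.Module):
    """One MI block: refines a contextual (blur) and a temporal (RS) stream.

    Hypothetical structure; the full model stacks three such blocks under
    teacher guidance with temporal positional encoding.
    """
    def __init__(self, ch: int = 32):
        super().__init__()
        self.blur_branch = nn.Conv2d(ch, ch, 3, padding=1)  # contextual stream
        self.rs_branch = nn.Conv2d(ch, ch, 3, padding=1)    # temporal stream
        self.fuse = nn.Conv2d(2 * ch, ch, 1)                # cross-stream fusion

    def forward(self, f_blur, f_rs):
        f_blur = torch.relu(self.blur_branch(f_blur))
        f_rs = torch.relu(self.rs_branch(f_rs))
        shared = self.fuse(torch.cat([f_blur, f_rs], dim=1))
        return f_blur + shared, f_rs + shared

class CrossShutterNet(nn.Module):
    """Two-stage sketch: motion interpretation, then frame reconstruction."""
    def __init__(self, ch: int = 32, n_frames: int = 3):
        super().__init__()
        self.embed_blur = nn.Conv2d(3, ch, 3, padding=1)
        self.embed_rs = nn.Conv2d(3, ch, 3, padding=1)
        self.mi_blocks = nn.ModuleList(
            MotionInterpretationBlock(ch) for _ in range(3))
        # Stand-in for GenNet: an encoder-decoder mapping fused motion
        # features to the latent frame sequence (n_frames RGB images).
        self.gennet = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3 * n_frames, 3, padding=1),
        )
        self.n_frames = n_frames

    def forward(self, blur, rs):
        f_b, f_r = self.embed_blur(blur), self.embed_rs(rs)
        for blk in self.mi_blocks:  # iterative motion interpretation
            f_b, f_r = blk(f_b, f_r)
        out = self.gennet(torch.cat([f_b, f_r], dim=1))
        b, _, h, w = out.shape
        return out.view(b, self.n_frames, 3, h, w)

# Toy forward pass on a 128x128 Blur-RS pair.
net = CrossShutterNet()
latent = net(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
print(latent.shape)  # torch.Size([1, 3, 3, 128, 128])
```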
Rather than a biaxial system for image-to-image deblurring, we develop a triaxial imaging system that simultaneously captures Blur-RS pairs together with high-speed ground-truth frames, and we collect a real-world dataset named RealBR (video samples shown below).
Quantitative comparisons of reconstructed latent frame sequences with lengths of 3, 5, and 9 on RealBR.
Reconstructed latent video demos on RealBR.
Reconstructed latent video demos on GOPRO-BR.
Low-light Scenes: We further explore the effect of noisy RS observations on our method.
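For reference, noisy RS observations of this kind can be simulated with a simple Poisson-Gaussian model, as sketched below; the photon count and read-noise level are illustrative values, not the exact settings of our experiments.

```python
import numpy as np

def add_low_light_noise(rs, photons: float = 200.0, read_sigma: float = 0.01,
                        rng=np.random.default_rng(0)):
    """Apply a simple Poisson-Gaussian noise model to an RS image in [0, 1].

    `photons` controls shot noise (lower = darker scene, noisier image);
    `read_sigma` is the std of additive Gaussian read noise.
    """
    shot = rng.poisson(rs * photons) / photons  # signal-dependent shot noise
    noisy = shot + rng.normal(0.0, read_sigma, rs.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)

rs = np.random.rand(64, 64, 3).astype(np.float32)
noisy_rs = add_low_light_noise(rs)
```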
Misaligned Views: We randomly shift the RS view in image space and compare against strictly aligned views.
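A minimal way to generate such misaligned inputs is a random integer translation of the RS view, as sketched below; the shift range and edge-replication padding are illustrative assumptions.

```python
import numpy as np

def random_shift(rs: np.ndarray, max_shift: int = 8,
                 rng=np.random.default_rng(0)) -> np.ndarray:
    """Randomly translate an (H, W, C) RS image in image space,
    padding the vacated border by edge replication."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(rs, shift=(dy, dx), axis=(0, 1))
    # Replace wrapped-around rows/columns with edge values.
    if dy > 0:
        shifted[:dy] = shifted[dy]
    elif dy < 0:
        shifted[dy:] = shifted[dy - 1]
    if dx > 0:
        shifted[:, :dx] = shifted[:, dx:dx + 1]
    elif dx < 0:
        shifted[:, dx:] = shifted[:, dx - 1:dx]
    return shifted

misaligned_rs = random_shift(np.random.rand(64, 64, 3).astype(np.float32))
```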
We also provide video demos on a third-party test set, directly using our model trained on RealBR.
More visual results on real-captured stereo data.