
Recovering a latent frame sequence from a degraded observation is highly ambiguous for both blur and RS inputs. Our Cross-Shutter strategy resolves the ambiguities of blur decomposition and RS temporal interpolation simultaneously by pairing the two modalities.
Motion degradation, manifested as blur in global shutter (GS) images or distortion in rolling shutter (RS) counterparts, remains a fundamental challenge in computational imaging, especially under fast motion or low-light conditions. While prior works have treated blur decomposition and RS temporal super-resolution as separate tasks, this separation fails to exploit their intrinsic complementarity. In this paper, we propose a unified framework that inverts motion degradation and re-enacts the imaging moments by jointly leveraging the complementary characteristics of GS blur and RS distortion. To this end, we introduce a novel dual-shutter setup that captures synchronized Blur-RS image pairs and demonstrate that this combination effectively resolves the temporal and spatial ambiguities inherent in both modalities. To allow flexible performance-cost trade-offs, we further extend this dual-shutter setup to a stereo Blur-RS configuration with a narrow baseline. In addition, we construct a triaxial imaging system to collect a real-world dataset with aligned GS-RS pairs and ground-truth high-speed frames, enabling robust training and evaluation beyond synthetic data. Our network explicitly disentangles motion into context-aware and temporally sensitive representations via a dual-stream motion interpretation module, followed by a self-prompted frame reconstruction stage. Extensive experiments validate the superiority and generalizability of our approach, establishing a new paradigm for realistic high-speed video reconstruction under complex motion degradations.
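To make the complementarity concrete, the sketch below synthesizes a synchronized Blur-RS pair from a high-speed frame stack under the standard imaging models: a GS blur image integrates all frames over the exposure window, while an RS image samples each row at a different readout time. This is a minimal simulation for intuition only; the function name and the nearest-frame readout approximation are illustrative assumptions, not part of our released code.

```python
import numpy as np

def synthesize_blur_rs_pair(frames: np.ndarray):
    """Simulate a synchronized Blur-RS pair from high-speed frames.

    frames: (T, H, W, C) float array, the latent high-speed sequence.
    Returns (blur, rs): the GS blur image averages the whole stack,
    while RS row y is sampled from the frame exposed at the
    row-dependent readout time t(y) = y / (H - 1) * (T - 1).
    """
    T, H, W, C = frames.shape
    # Global shutter: temporal integration over the exposure window.
    blur = frames.mean(axis=0)
    # Rolling shutter: row-wise sampling at linearly increasing times
    # (nearest-frame approximation of the continuous readout).
    row_times = np.round(np.linspace(0, T - 1, H)).astype(int)
    rs = frames[row_times, np.arange(H)]  # picks frame row_times[y] for row y
    return blur, rs

# Toy usage: 9 latent frames of a 64x64 scene.
frames = np.random.rand(9, 64, 64, 3).astype(np.float32)
blur, rs = synthesize_blur_rs_pair(frames)
print(blur.shape, rs.shape)  # (64, 64, 3) (64, 64, 3)
```

The blur image constrains appearance but is temporally ambiguous, whereas the RS image orders rows in time but is spatially distorted, which is why the pair disambiguates what either input alone cannot.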
In general, the framework is structured into two sequential stages: motion interpretation followed by frame reconstruction. The motion interpretation (MI) stage iteratively exploits the merits of our Blur-RS combination, employing three motion interpretation blocks under the guidance of a teacher module. Enhanced with temporal positional encoding, it explicitly extracts contextual characterization from the disentangled blur stream and temporal abstraction from the RS stream. The frame reconstruction stage is implemented as GenNet, an encoder-decoder architecture, in which a novel self-prompter based on motion residues precisely and adaptively refines the warped latent frames.
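A minimal PyTorch-style skeleton of this two-stage design is given below for orientation. Apart from the GenNet name, every module name, layer choice, and tensor shape is an assumption made for illustration; the teacher guidance, temporal positional encoding, flow-based warping, and motion-residue self-prompter are omitted for brevity.

```python
import torch
import torch.nn as nn

class MotionInterpretationBlock(nn.Module):
    """One MI block: refines a contextual (blur) and a temporal (RS) stream.

    Hypothetical structure; the full model stacks three such blocks under
    teacher guidance with temporal positional encoding.
    """
    def __init__(self, ch: int = 32):
        super().__init__()
        self.blur_branch = nn.Conv2d(ch, ch, 3, padding=1)  # contextual stream
        self.rs_branch = nn.Conv2d(ch, ch, 3, padding=1)    # temporal stream
        self.fuse = nn.Conv2d(2 * ch, ch, 1)                # cross-stream fusion

    def forward(self, f_blur, f_rs):
        f_blur = torch.relu(self.blur_branch(f_blur))
        f_rs = torch.relu(self.rs_branch(f_rs))
        shared = self.fuse(torch.cat([f_blur, f_rs], dim=1))
        return f_blur + shared, f_rs + shared

class CrossShutterNet(nn.Module):
    """Two-stage sketch: motion interpretation, then frame reconstruction."""
    def __init__(self, ch: int = 32, n_frames: int = 3):
        super().__init__()
        self.embed_blur = nn.Conv2d(3, ch, 3, padding=1)
        self.embed_rs = nn.Conv2d(3, ch, 3, padding=1)
        self.mi_blocks = nn.ModuleList(
            MotionInterpretationBlock(ch) for _ in range(3))
        # Stand-in for GenNet: an encoder-decoder mapping fused motion
        # features to the latent frame sequence (n_frames RGB images).
        self.gennet = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3 * n_frames, 3, padding=1),
        )
        self.n_frames = n_frames

    def forward(self, blur, rs):
        f_b, f_r = self.embed_blur(blur), self.embed_rs(rs)
        for blk in self.mi_blocks:  # iterative motion interpretation
            f_b, f_r = blk(f_b, f_r)
        out = self.gennet(torch.cat([f_b, f_r], dim=1))
        b, _, h, w = out.shape
        return out.view(b, self.n_frames, 3, h, w)

# Toy forward pass on a 128x128 Blur-RS pair.
net = CrossShutterNet()
latent = net(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))
print(latent.shape)  # torch.Size([1, 3, 3, 128, 128])
```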
Rather than a biaxial system for image-to-image deblurring, we develop a triaxial imaging system that simultaneously captures Blur-RS pairs together with high-speed ground-truth frames, and we collect a real-world dataset named RealBR (video samples shown below).
Quantitative comparisons of reconstructed latent frame sequences with lengths of 3, 5, and 9 on RealBR.
Reconstructed latent video demos on RealBR.
Reconstructed latent video demos on GOPRO-BR.
Low-light Scenes: We further explore the effect of noisy RS observations on our method.
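For reference, noisy RS observations of this kind can be simulated with a simple Poisson-Gaussian model, as sketched below; the photon count and read-noise level are illustrative values, not the exact settings of our experiments.

```python
import numpy as np

def add_low_light_noise(rs, photons: float = 200.0, read_sigma: float = 0.01,
                        rng=np.random.default_rng(0)):
    """Apply a simple Poisson-Gaussian noise model to an RS image in [0, 1].

    `photons` controls shot noise (lower = darker scene, noisier image);
    `read_sigma` is the std of additive Gaussian read noise.
    """
    shot = rng.poisson(rs * photons) / photons  # signal-dependent shot noise
    noisy = shot + rng.normal(0.0, read_sigma, rs.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)

rs = np.random.rand(64, 64, 3).astype(np.float32)
noisy_rs = add_low_light_noise(rs)
```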
Misaligned Views: We randomly shift the RS view in image space and compare against strictly aligned views.
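A minimal way to generate such misaligned inputs is a random integer translation of the RS view, as sketched below; the shift range and edge-replication padding are illustrative assumptions.

```python
import numpy as np

def random_shift(rs: np.ndarray, max_shift: int = 8,
                 rng=np.random.default_rng(0)) -> np.ndarray:
    """Randomly translate an (H, W, C) RS image in image space,
    padding the vacated border by edge replication."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(rs, shift=(dy, dx), axis=(0, 1))
    # Replace wrapped-around rows/columns with edge values.
    if dy > 0:
        shifted[:dy] = shifted[dy]
    elif dy < 0:
        shifted[dy:] = shifted[dy - 1]
    if dx > 0:
        shifted[:, :dx] = shifted[:, dx:dx + 1]
    elif dx < 0:
        shifted[:, dx:] = shifted[:, dx - 1:dx]
    return shifted

misaligned_rs = random_shift(np.random.rand(64, 64, 3).astype(np.float32))
```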
We also provide video demos on a third-party test set, directly using our model trained on RealBR.
More visual results on real-captured stereo data.