The final research paper can be found here.

Video Super-Resolution

Video Super-Resolution (VSR) seeks to generate high-resolution (HR) video from its low-resolution (LR), degraded counterpart. This area of computer vision has seen increased attention in recent years due to the growing demand for high-quality video and image services. Research in this area is extensive, and many different approaches have been proposed. Currently, the best-performing models are score-based diffusion models.

Often, VSR is divided into two subdomains: synthetic and real-world. In synthetic VSR, LR-HR training pairs are formed using a known degradation approach. Typically, synthetic methods assume overly idealized degradations and use fixed blur and bicubic downsampling. Unfortunately, using these degradations in training leads to poor model generalizability since these convenient abstractions do not fully model real-world degradation processes. For example, viral internet videos often incur unknown amounts of lossy compression, camera shake, motion blur, and intensity artifacts. Unlike synthetic VSR, real-world VSR seeks to super-resolve videos whose degradation process is completely unknown.
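For concreteness, the snippet below is a minimal sketch of the fixed degradation typically assumed in synthetic VSR: a constant Gaussian blur followed by bicubic downsampling. The kernel size, blur strength, and scale factor are illustrative assumptions, not values taken from any specific benchmark.

```python
# Minimal sketch of a fixed synthetic-VSR degradation (illustrative parameters only):
# every HR frame is blurred with the same Gaussian kernel and bicubically downsampled.
import cv2
import numpy as np

def synthetic_degrade(hr_frame: np.ndarray, scale: int = 4,
                      ksize: int = 7, sigma: float = 1.6) -> np.ndarray:
    """Produce an LR frame from an HR frame using fixed blur + bicubic downsampling."""
    blurred = cv2.GaussianBlur(hr_frame, (ksize, ksize), sigma)
    h, w = blurred.shape[:2]
    return cv2.resize(blurred, (w // scale, h // scale),
                      interpolation=cv2.INTER_CUBIC)
```

Because these parameters never change, a model trained on such LR-HR pairs can overfit to this one degradation, which is exactly the generalization problem described above.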

Summary

Diffusion models have shown promising results for real-world VSR tasks. However, due to their novelty, little research has been done with respect to their performance on realistic real-world data. We address this issue by synthesizing our novel Augmented Video Low Quality Dataset (AVLQ) and investigating the performance of a state-of-the-art diffusion model known as the Motion Guided Latent Diffusion Model (MGLD).

In this paper, we address two objectives:

  1. Synthesize an accurate, realistic benchmark dataset for real-world VSR.
  2. Using our newly created dataset, compare the performance of MGLD with older VSR models (such as ESRGAN).

Our AVLQ dataset introduces two degradations not found in previous benchmarks. Using a stochastic degradation approach, we add both video shake and intensity scaling to the input video, along with typical degradation procedures (downsampling, compression, blur, etc.); a sketch of this kind of pipeline is given below.
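The sketch that follows illustrates, under stated assumptions, what such a stochastic pipeline might look like: per-frame random translation for camera shake, a random intensity gain, Gaussian blur, bicubic downsampling, and a lossy compression round-trip. It is not the exact AVLQ generation code; the parameter ranges, ordering, and use of JPEG as a stand-in for video compression are all illustrative choices.

```python
# Hedged sketch of a stochastic degradation pipeline of the kind described above.
# NOT the exact AVLQ pipeline: parameter ranges, ordering, and JPEG-as-compression
# are illustrative assumptions.
import random

import cv2
import numpy as np

def degrade_video(hr_frames, scale=4):
    """Degrade a list of HR frames (H x W x 3 uint8 arrays) into LR frames."""
    # Sample video-level parameters once so the degradation is consistent across frames.
    sigma = random.uniform(0.5, 3.0)        # Gaussian blur strength
    gain = random.uniform(0.7, 1.2)         # global intensity scaling factor
    jpeg_quality = random.randint(30, 80)   # lossy compression quality
    max_shake = 4                           # maximum per-frame shake, in pixels

    lr_frames = []
    for frame in hr_frames:
        h, w = frame.shape[:2]

        # Video shake: shift the frame by a small random translation each frame.
        dx = random.randint(-max_shake, max_shake)
        dy = random.randint(-max_shake, max_shake)
        shift = np.float32([[1, 0, dx], [0, 1, dy]])
        shaken = cv2.warpAffine(frame, shift, (w, h), borderMode=cv2.BORDER_REFLECT)

        # Intensity scaling, then blur.
        scaled = np.clip(shaken.astype(np.float32) * gain, 0, 255).astype(np.uint8)
        blurred = cv2.GaussianBlur(scaled, (0, 0), sigma)

        # Bicubic downsampling to the LR resolution.
        lr = cv2.resize(blurred, (w // scale, h // scale),
                        interpolation=cv2.INTER_CUBIC)

        # Lossy compression round-trip (JPEG used here as a simple stand-in).
        _, buffer = cv2.imencode(".jpg", lr, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
        lr_frames.append(cv2.imdecode(buffer, cv2.IMREAD_COLOR))
    return lr_frames
```

Sampling the blur strength, gain, and compression quality once per video keeps the degradation temporally consistent, while the shake offsets are re-sampled every frame to mimic camera motion.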

We find that MGLD continues to significantly outperform previous VSR models (GAN and CNN-based). Our quantitative and qualitative findings may be found in the paper linked above.