
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis (ECCV 2024)
"Explore the cutting-edge research on Dynamic Novel View Synthesis using Generative Camera Dolly technology, addressing the challenge of single video-based viewpoint generation. Follow the innovative approach and experiments by Columbia University, Stanford University, and Toyota Research Institute. Learn about novel scene reconstruction techniques and video conditioning methods for dynamic synthesis."
Presentation Transcript
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
Columbia University · Stanford University · Toyota Research Institute (Computer Vision, ECCV 2024)
Outline
- Introduction
- Related work
- Approach
- Datasets
- Choice of camera trajectory
- Experiments
Introduction
The authors aim to tackle the problem of Dynamic Novel View Synthesis. This task is naturally extremely ill-posed and challenging: free-viewpoint synthesis from a single video is highly under-constrained and therefore requires prior knowledge.
Introduction
GCD (Generative Camera Dolly) essentially conceives a virtual camera that can move around with up to six degrees of freedom and reveal significant portions of the scene that are otherwise unseen. It can reconstruct hidden objects behind occlusions, all within complex dynamic scenes, even when the contents are moving.
Related work
- Dynamic scene reconstruction
- Video diffusion models
- 3D and 4D generation
- Object permanence and amodal completion
Approach
- Camera viewpoint control
- Video conditioning
Approach
The input consists of RGB frames captured from a single camera perspective, together with the input camera extrinsic matrix, the target camera extrinsic matrix, and the camera intrinsics matrix. The model f is tasked with predicting the video as seen from the target viewpoint.
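To make the geometric setup concrete, the sketch below shows how a relative camera transformation can be derived from the input and target extrinsics. The function and matrix names are illustrative assumptions, not the paper's actual code; it only demonstrates the standard world-to-camera convention.

```python
import numpy as np

def relative_extrinsics(E_in: np.ndarray, E_out: np.ndarray) -> np.ndarray:
    """Relative transform taking the input camera frame to the target camera frame.

    E_in, E_out: 4x4 world-to-camera extrinsic matrices (hypothetical names;
    the model is conditioned on the input and target camera poses).
    """
    return E_out @ np.linalg.inv(E_in)

def pose(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Assemble a 4x4 extrinsic matrix from a rotation R and translation t."""
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = t
    return E

# Example: input camera at the world origin; target camera rotated 90 degrees
# about the vertical axis and dollied 2 units along its viewing direction.
R90 = np.array([[0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0],
                [-1.0, 0.0, 0.0]])
E_in = pose(np.eye(3), np.zeros(3))
E_out = pose(R90, np.array([0.0, 0.0, 2.0]))
E_rel = relative_extrinsics(E_in, E_out)
```

With the input camera at the identity pose, the relative transform reduces to the target extrinsics, which is a quick sanity check on the convention.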
Approach: video conditioning
To accurately perform dynamic synthesis, two levels of understanding of the input video are required:
- High level: infer the occluded regions, based on world knowledge as well as the other observed frames.
- Low level: analyze the visible geometry, shapes, and appearance.
Approach: video conditioning
Following SVD (Stable Video Diffusion), conditioning proceeds in two streams. The first stream calculates the CLIP embedding of the input image to condition the U-Net via cross-attention. The second stream channel-concatenates the VAE-encoded image with all frames of the noisy latent video.
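The two conditioning streams can be sketched with placeholder encoders, as below. All dimensions and the `clip_embed`/`vae_encode` stand-ins are assumptions for illustration; SVD's actual encoders and tensor sizes differ.

```python
import numpy as np

# Hypothetical dimensions for illustration only.
T, C_LAT, H, W = 14, 4, 32, 32   # frames, latent channels, latent spatial dims
D_CLIP = 1024                     # assumed CLIP image-embedding width

def clip_embed(image: np.ndarray) -> np.ndarray:
    """Stand-in for the CLIP image encoder (stream 1)."""
    return np.zeros(D_CLIP)

def vae_encode(image: np.ndarray) -> np.ndarray:
    """Stand-in for the VAE image encoder (stream 2)."""
    return np.zeros((C_LAT, H, W))

def build_conditioning(input_frame: np.ndarray, noisy_latents: np.ndarray):
    """noisy_latents: (T, C_LAT, H, W) latent video being denoised."""
    # Stream 1: a single CLIP embedding conditions the U-Net via cross-attention.
    context = clip_embed(input_frame)
    # Stream 2: the VAE latent of the conditioning frame is repeated over time
    # and channel-concatenated with every noisy latent frame.
    cond = np.broadcast_to(vae_encode(input_frame), (T, C_LAT, H, W))
    unet_input = np.concatenate([noisy_latents, cond], axis=1)  # (T, 2*C_LAT, H, W)
    return context, unet_input

frame = np.zeros((3, 256, 256))
latents = np.zeros((T, C_LAT, H, W))
context, unet_input = build_conditioning(frame, latents)
```

The key point is that the U-Net's input channel count doubles: half the channels carry the noisy video latents, half carry the repeated latent of the conditioning frame.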
Datasets
- Kubric-4D
- ParallelDomain-4D
- Task details