LoGoColor: Local-Global 3D Colorization for 360$^{\circ}$ Scenes


Abstract

Single-channel 3D reconstruction is widely used in fields such as robotics and medical imaging. While these methods reconstruct 3D geometry well, their outputs are typically uncolored 3D models, so 3D colorization is needed for visualization. Recent 3D colorization studies address this problem by distilling 2D image colorization models. However, these approaches inherit the multi-view inconsistency of 2D image models: the same surface can receive different colors in different views, and these conflicting colors are averaged during training, leading to monotonous and oversimplified results, particularly in complex 360$^{\circ}$ scenes. In contrast, we aim to preserve color diversity by generating a new set of consistently colorized training views, thereby suppressing the averaging process. Suppressing the averaging, however, introduces a new challenge: ensuring strict multi-view consistency across these colorized views. To achieve this, we propose LoGoColor, a pipeline that preserves color diversity by eliminating the guidance-averaging process with a 'Local-Global' approach: we partition the scene into subscenes and explicitly tackle both inter-subscene and intra-subscene consistency using a fine-tuned multi-view diffusion model. We demonstrate that our method achieves quantitatively and qualitatively more consistent and plausible 3D colorization on complex 360$^{\circ}$ scenes than existing methods.

TL;DR: LoGoColor eliminates the color-averaging limitations of prior methods by generating locally and globally consistent multi-view colorized training images, enabling diverse and consistent 3D colorization for complex 360$^{\circ}$ scenes.
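The color-averaging effect described above can be illustrated with a toy calculation. The sketch below is not part of LoGoColor; it assumes a single 3D point that an inconsistent 2D colorizer paints red, blue, and green in three views (hypothetical values), and uses max-minus-min over the RGB channels as a crude saturation proxy. Under a squared-error objective, the single color that best fits all views is their mean, which in this case is gray.

```python
def chroma(rgb):
    """Crude saturation proxy: largest channel minus smallest channel."""
    return max(rgb) - min(rgb)


def l2_optimal_color(colors):
    """The color minimizing the summed squared error to all inputs is their mean."""
    n = len(colors)
    return tuple(sum(c[k] for c in colors) / n for k in range(3))


# Hypothetical colors assigned to the SAME 3D point in three different views.
view_colors = [
    (0.9, 0.1, 0.1),  # view A: red
    (0.1, 0.1, 0.9),  # view B: blue
    (0.1, 0.9, 0.1),  # view C: green
]

averaged = l2_optimal_color(view_colors)
per_view_chroma = [chroma(c) for c in view_colors]
averaged_chroma = chroma(averaged)

print("per-view chroma:", per_view_chroma)
print("averaged color:", averaged)
print("averaged chroma:", averaged_chroma)
```

Each individual view is strongly saturated, yet the averaged color has essentially no chroma. This is the washed-out behavior that consistent training views are meant to avoid.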

Method Overview

Figure: Overview of the LoGoColor pipeline.
We first reconstruct single-channel 3D Gaussians from multi-view grayscale images to recover scene geometry. Using this geometry, we decompose the scene into subscenes and select their corresponding base views. In parallel, we fine-tune a multi-view diffusion model to transfer color from reference views. We then calibrate global consistency among the base views and propagate color across all training views, ultimately producing a fully colorized 3D Gaussian model.
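As a rough illustration of the "decompose into subscenes and select base views" step, the sketch below groups cameras into angular sectors around a scene center and picks, for each sector, the camera closest to the sector's middle direction. This is an assumed stand-in, not the decomposition used by LoGoColor, which relies on the reconstructed geometry; the function names `partition_by_azimuth` and `select_base_views`, the sector heuristic, and the camera layout are all hypothetical.

```python
import math


def partition_by_azimuth(camera_positions, center, num_subscenes):
    """Assign each camera to an angular sector (in the x-y plane) around `center`."""
    sector = 2 * math.pi / num_subscenes
    groups = [[] for _ in range(num_subscenes)]
    for idx, (x, y, _z) in enumerate(camera_positions):
        angle = math.atan2(y - center[1], x - center[0]) % (2 * math.pi)
        k = min(int(angle // sector), num_subscenes - 1)
        groups[k].append((idx, angle))
    return groups


def select_base_views(groups):
    """For each sector, pick the camera whose azimuth is closest to the sector middle."""
    sector = 2 * math.pi / len(groups)
    base_views = []
    for k, members in enumerate(groups):
        if not members:
            base_views.append(None)  # empty sector: no base view available
            continue
        mid = (k + 0.5) * sector
        base_views.append(min(members, key=lambda m: abs(m[1] - mid))[0])
    return base_views


# Hypothetical 360-degree capture: 12 cameras on a circle of radius 3 at height 1.
angles_deg = [10, 40, 80, 100, 130, 170, 190, 220, 260, 280, 310, 350]
cameras = [
    (3 * math.cos(math.radians(a)), 3 * math.sin(math.radians(a)), 1.0)
    for a in angles_deg
]

groups = partition_by_azimuth(cameras, center=(0.0, 0.0, 0.0), num_subscenes=4)
base_views = select_base_views(groups)
print("cameras per subscene:", [[idx for idx, _ in g] for g in groups])
print("base view per subscene:", base_views)
```

In the pipeline described above, the base views would then be made globally consistent with one another before color is propagated to the remaining views of each subscene; the sketch covers only the grouping and selection.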

Video Results

Truck

Input Grayscale Video
Our Colorized Video

Counter

Input Grayscale Video
Our Colorized Video

Garden

Input Grayscale Video
Our Colorized Video

Comparison


Counter

ChromaDistill

ColorMNet

ColorNeRF

Ours


Bonsai

ChromaDistill

ColorMNet

ColorNeRF

Ours

Garden

ChromaDistill

ColorMNet

ColorNeRF

Ours


Truck

ChromaDistill

ColorMNet

ColorNeRF

Ours


Horse

ChromaDistill

ColorMNet

ColorNeRF

Ours



54bf355ca7e08ed1bc86f5772e564ac0f92981ca25dab24d86b694e915fc4c43

ChromaDistill

ColorMNet

ColorNeRF

Ours


eb4cf52988f805e6fce11d1b239fa9de32eb157364cff06ebac0aa50e0a46567

ChromaDistill

ColorMNet

ColorNeRF

Ours