LoGoColor: Local-Global 3D Colorization for 360$^{\circ}$ Scenes


Abstract

Single-channel 3D reconstruction is widely used in fields such as robotics and medical imaging. While this line of work excels at recovering 3D geometry, its outputs are uncolored 3D models, so 3D colorization is required for visualization. Recent 3D colorization studies address this problem by distilling 2D image colorization models. However, these approaches inherit the multi-view inconsistency of 2D image models: colors are averaged during training, leading to monotonous and oversimplified results, particularly in complex 360$^{\circ}$ scenes. In contrast, we aim to preserve color diversity by generating a new set of consistently colorized training views, thereby bypassing the averaging process. Eliminating this averaging, however, introduces a new challenge: ensuring strict multi-view consistency across the colorized views. To achieve this, we propose LoGoColor, a pipeline that preserves color diversity by eliminating the guidance-averaging process with a 'Local-Global' approach: we partition the scene into subscenes and explicitly tackle both inter-subscene and intra-subscene consistency using a fine-tuned multi-view diffusion model. We demonstrate that our method achieves quantitatively and qualitatively more consistent and plausible 3D colorization of complex 360$^{\circ}$ scenes than existing methods, and validate its superior color diversity using a novel Color Diversity Index.

TL;DR: LoGoColor eliminates the color-averaging limitations of prior methods by generating locally and globally consistent multi-view colorized training images, enabling diverse and consistent 3D colorization for complex 360$^{\circ}$ scenes.
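The exact Color Diversity Index is defined in the paper; as a rough, self-contained proxy for the underlying idea (more distinct chroma values indicate a more diverse colorization), one can measure the Shannon entropy of an image's chroma histogram. The opponent-color axes and the function below are illustrative assumptions, not the paper's actual metric.

```python
import numpy as np

def color_diversity_proxy(rgb, bins=32):
    """Shannon entropy (bits) of an image's chroma distribution.

    A hypothetical stand-in for a Color Diversity Index: higher means
    the image uses a wider spread of colors. `rgb` is (H, W, 3) in [0, 1].
    """
    # Cheap opponent-color chroma axes (rough stand-ins for CIELAB a*/b*).
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    a = r - g                    # red-green axis, in [-1, 1]
    yb = 0.5 * (r + g) - b       # yellow-blue axis, in [-1, 1]
    hist, _, _ = np.histogram2d(a.ravel(), yb.ravel(),
                                bins=bins, range=[[-1, 1], [-1, 1]])
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```

A purely grayscale image collapses to a single chroma bin (entropy 0), while a colorful image spreads mass over many bins, so an averaged, desaturated colorization scores lower than a diverse one.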

Method Overview

Overview Figure
We first reconstruct single-channel 3D Gaussians from multi-view grayscale images to recover scene geometry. Using this geometry, we decompose the scene into subscenes and select their corresponding base views. In parallel, we fine-tune a multi-view diffusion model to transfer color from reference views. We then calibrate global consistency among the base views and propagate color across all training views, ultimately producing a fully colorized 3D Gaussian model.
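To illustrate the subscene-decomposition step above, one simple geometric strategy is to cluster the training cameras by position and take, per cluster, the view nearest the centroid as that subscene's base view. The clustering choice and function names below are our own illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def farthest_point_init(pts, k):
    """Deterministic seeding: start from the first camera, then repeatedly
    add the camera farthest from all centers chosen so far."""
    centers = [pts[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(pts[:, None] - np.asarray(centers)[None], axis=-1)
        centers.append(pts[d.min(axis=1).argmax()])
    return np.asarray(centers, dtype=float)

def split_into_subscenes(cam_positions, k, iters=20):
    """Cluster camera centers into k subscenes with plain Lloyd's k-means,
    then pick each subscene's base view as the camera nearest its centroid.

    cam_positions: (N, 3) array of camera centers.
    Returns (labels, base_view_indices).
    """
    centers = farthest_point_init(cam_positions, k)
    for _ in range(iters):
        d = np.linalg.norm(cam_positions[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = cam_positions[labels == j]
            if len(members):            # keep old center if cluster is empty
                centers[j] = members.mean(axis=0)
    d = np.linalg.norm(cam_positions[:, None] - centers[None], axis=-1)
    labels = d.argmin(axis=1)
    base_views = [int(np.flatnonzero(labels == j)[d[labels == j, j].argmin()])
                  for j in range(k)]
    return labels, base_views
```

In this sketch, each base view would receive a reference colorization, the base views would then be made globally consistent with one another, and color would finally be propagated to the remaining views of each subscene.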

Video Results

For each scene (Truck, counter, garden), we show the input grayscale video alongside our colorized video.

Comparison


For each scene, we compare ChromaDistill, CD-modified, ColorNeRF, and our method (Ours) on: Truck, Train, counter, bonsai, garden, flower, trex, bicycle, Horse, kitchen, M60, room, and stump.