Free viewpoint video construction typically relies on a multi-camera
setup as shown in Fig. 1. In general, the quality of the rendered views
increases with the number of available cameras. However, equipment
costs and, often, the computational complexity of processing increase
as well. We therefore consider a classical tradeoff between quality and
cost by limiting the number of cameras and compensating for this with
geometry extraction.
The first step of our algorithm derives the intrinsic and extrinsic
parameters of all cameras, which relate the 2D images to a 3D world
coordinate system; our geometry extraction and rendering algorithms
require knowledge of these parameters. They are computed from reference
points using a standard calibration algorithm. In the next step, the
object to be extracted is segmented in all camera views. For this we
use a combination of an adaptive background subtraction algorithm and
Kalman filter tracking. The results of this step are silhouette videos
that indicate the object’s contour in each camera view. The 3D volume
containing the object is then reconstructed from the silhouette images
using an octree-based shape-from-silhouette algorithm.
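To illustrate how intrinsic and extrinsic parameters relate a 3D world point to a 2D image position, the following is a minimal sketch of a standard pinhole projection. The function name `project_point` and all numeric values are hypothetical and chosen only for illustration; lens distortion is ignored, and the calibration algorithm actually used is not specified here.

```python
import numpy as np

def project_point(K, R, t, X):
    """Project a 3D world point X to pixel coordinates (u, v).

    K is the 3x3 intrinsic matrix; R (3x3) and t (3,) are the extrinsic
    rotation and translation from world to camera coordinates.
    """
    X_cam = R @ X + t        # world -> camera coordinates (extrinsics)
    x = K @ X_cam            # camera -> homogeneous image coordinates
    return x[:2] / x[2]      # perspective division

# Hypothetical camera: focal length 800 px, principal point (320, 240),
# looking along the world z-axis from 5 units away.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

uv_origin = project_point(K, R, t, np.array([0.0, 0.0, 0.0]))
uv_offset = project_point(K, R, t, np.array([1.0, 0.0, 0.0]))
```

With these assumed values the world origin lands on the principal point, and a point shifted along the x-axis moves horizontally in the image. The same projection is what allows a 3D cube to be compared against the 2D silhouettes.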
The process starts by placing a cube into the virtual 3D world that
ideally represents the 3D bounding cube of the object. The size and
position of the cube are obtained by projecting the 2D bounding boxes
of all views into 3D space and analyzing the resulting intersections.
This initial cube of the octree, which is referred to as level 0, is
subdivided into 8 octants, each of which is itself a cube. For each
octant one of the following actions is taken:
- A cube whose projection is completely inside the silhouettes of all
views is not subdivided further, i.e. it is completely inside the
object to be reconstructed.
- A cube that is completely outside of at least one silhouette is
omitted, i.e. it is outside the object.
- A cube that does not fall into either of the above two categories is
subdivided further.
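The three cases above can be sketched as a recursive routine. This is a simplified illustration, not the implementation described here: the names `Cube`, `carve` and `sphere_classifier` are hypothetical, the stopping rule is a fixed maximum depth rather than a per-view contour accuracy, boundary cubes at the deepest level are kept, and the silhouette test is replaced by a stand-in classifier that compares each cube against a sphere so that the example runs on its own.

```python
from dataclasses import dataclass

INSIDE, OUTSIDE, BOUNDARY = "inside", "outside", "boundary"

@dataclass(frozen=True)
class Cube:
    x: float      # minimum corner
    y: float
    z: float
    size: float   # edge length

    def octants(self):
        """Split the cube into its 8 child cubes."""
        h = self.size / 2
        return [Cube(self.x + dx * h, self.y + dy * h, self.z + dz * h, h)
                for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

def carve(cube, classify, max_level, level=0):
    """Return the cubes kept as part of the reconstructed volume."""
    status = classify(cube)
    if status == OUTSIDE:
        return []                      # omitted: outside the object
    if status == INSIDE or level == max_level:
        return [cube]                  # kept without further subdivision
    kept = []
    for child in cube.octants():       # undecided: subdivide further
        kept.extend(carve(child, classify, max_level, level + 1))
    return kept

def sphere_classifier(cx, cy, cz, r):
    """Stand-in for the silhouette test (assumption): classify against a sphere."""
    def classify(c):
        near2 = far2 = 0.0
        for centre, lo in ((cx, c.x), (cy, c.y), (cz, c.z)):
            hi = lo + c.size
            nearest = min(max(centre, lo), hi)
            near2 += (centre - nearest) ** 2
            far2 += max(abs(centre - lo), abs(centre - hi)) ** 2
        if near2 > r * r:
            return OUTSIDE
        if far2 <= r * r:
            return INSIDE
        return BOUNDARY
    return classify

root = Cube(-1.0, -1.0, -1.0, 2.0)     # level 0 bounding cube
voxels = carve(root, sphere_classifier(0.0, 0.0, 0.0, 1.0), max_level=3)
volume = sum(c.size ** 3 for c in voxels)
```

In a real system, `classify` would project the cube into every camera view and compare the projection with that view's silhouette. Increasing `max_level` tightens the approximation at the cost of more cubes.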
This approach is applied recursively to all octants until a contour
approximation with a given accuracy is achieved in each view.
Illustratively, shape-from-silhouette can be compared to a carving
process, or to the way a sculptor carves a figure from a block of
marble. After visual hull segmentation, the object’s surface is
extracted from the voxel model by applying a marching cubes algorithm
and is represented as a 3D mesh. The surface is then smoothed by
applying a first-order neighborhood smoothing. Finally, the number of
surface triangles is drastically reduced using a standard
edge-collapsing algorithm, which mainly analyzes the normal vectors of
adjacent faces. The transformation steps from the octree model to the
reduced wireframe surface model are shown in Fig. 2.
Fig. 2: Surface transformation: (a) voxel model, (b) marching cubes,
(c) surface smoothing and (d) reduced wireframe
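One plausible reading of the first-order neighborhood smoothing step is that each vertex is moved toward the average of its directly connected neighbors. The sketch below works under that assumption; the function name `smooth_vertices`, the `strength` parameter and the small test mesh are hypothetical, and the smoothing actually used may differ.

```python
import numpy as np

def smooth_vertices(vertices, triangles, strength=0.5):
    """Move each vertex toward the centroid of its first-order neighbors.

    vertices:  (N, 3) array-like of vertex positions
    triangles: iterable of (a, b, c) vertex index triples
    strength:  0 keeps the mesh unchanged, 1 moves vertices onto the centroid
    """
    vertices = np.asarray(vertices, dtype=float)
    # Collect the vertices that share an edge with each vertex.
    neighbors = [set() for _ in range(len(vertices))]
    for a, b, c in triangles:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))

    smoothed = vertices.copy()
    for i, nbrs in enumerate(neighbors):
        if nbrs:
            centroid = vertices[list(nbrs)].mean(axis=0)
            smoothed[i] = (1 - strength) * vertices[i] + strength * centroid
    return smoothed

# Hypothetical mesh: a square with a raised centre vertex (a "spike").
verts = [[0, 0, 0], [2, 0, 0], [2, 2, 0], [0, 2, 0], [1, 1, 1]]
tris = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
smoothed = smooth_vertices(verts, tris, strength=0.5)
```

In this example the spike at the centre is pulled halfway toward the plane of its four neighbors, which is the kind of flattening of voxel staircase artifacts such a step aims for.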
View-dependent Texture Mapping
For photo-realistic rendering, the original videos are mapped onto the
reconstructed geometry. Natural materials may appear very different
from different viewing directions depending on their reflectance
properties and the lighting conditions. Static texturing (e.g.
interpolating the available views) therefore often leads to poor
rendering results, the so-called "painted shoebox" effect. We have
therefore developed a view-dependent texture mapping that more closely
approximates natural appearance when navigating through the scene.
As illustrated in Fig. 3, the textures are projected onto the geometry
using the calibration information. For each projected texture a normal
vector n_i is defined that points in the direction of the original
camera. To generate a virtual view in a given direction v_VIEW, a
weight is calculated for each texture, which depends on the angle
between v_VIEW and n_i.
The weighted interpolation ensures that at original camera positions
the virtual view is exactly the original view. The closer the virtual
viewpoint is to an original camera position, the greater the influence
of the corresponding texture on the virtual view.
Fig. 3: Weighted virtual view interpolation
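The exact weighting function is not given here. As a minimal sketch, assume each weight is inversely proportional to the angle between v_VIEW and n_i, with both vectors following the same direction convention, and that the weights are normalized to sum to one. The function name `view_weights` and the tolerance `eps` are hypothetical. This choice reproduces the two stated properties: a texture whose camera direction coincides with the view direction receives the full weight, and smaller angles give larger weights.

```python
import numpy as np

def view_weights(v_view, normals, eps=1e-6):
    """Blend weights for each texture from its angle to the view direction.

    v_view:  (3,) view direction v_VIEW
    normals: (M, 3) per-texture camera directions n_i
    """
    v = np.asarray(v_view, dtype=float)
    n = np.asarray(normals, dtype=float)
    v = v / np.linalg.norm(v)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)

    # Angle between the view direction and each camera direction.
    angles = np.arccos(np.clip(n @ v, -1.0, 1.0))

    exact = angles < eps
    if exact.any():
        # The view coincides with an original camera: use only that texture.
        w = exact.astype(float)
    else:
        # Assumed weighting: inverse angle, so closer cameras dominate.
        w = 1.0 / angles
    return w / w.sum()

# Two hypothetical cameras, along the x-axis and the y-axis.
cams = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_exact = view_weights([1.0, 0.0, 0.0], cams)
w_middle = view_weights([1.0, 1.0, 0.0], cams)
w_near_x = view_weights([2.0, 1.0, 0.0], cams)
```

The rendered color of a surface point would then be the weighted sum of the colors from the individual projected textures.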