xmerge - a program for merging and straightening images

Copyright (C) 2003 Johan Borg 

xmerge is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
               
xmerge is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
                          
You should have received a copy of the GNU General Public License
along with xmerge; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

Included in this distribution is a modified version of the SQP
solver hqp, by Ruediger Franke. hqp in turn includes parts of the
meschach library by David E. Stewart & Zbigniew Leyk.


Using xmerge


Things you should know:
* reading the whole documentation is probably quite essential 
* xmerge only operates on ppm-files with 24bpp color in binary form (P6)
* xmerge only supports X color depths of 16, 24 and 32bpp, and not all of
  them have been heavily tested.
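
A quick way to check an input file before handing it to xmerge is to look at
its header. The following is a minimal Python sketch, not part of xmerge; the
function names are made up for illustration, and it assumes that "24bpp"
means a maximum sample value of 255 (three 8-bit channels).

```python
import io

def read_p6_header(stream):
    """Return (width, height, maxval) from a binary PPM (P6) stream."""
    def next_token():
        token = b""
        while True:
            ch = stream.read(1)
            if ch == b"":
                break                      # end of data
            if ch == b"#" and not token:
                # header comment: skip to the end of the line
                while ch not in (b"\n", b""):
                    ch = stream.read(1)
                continue
            if ch.isspace():
                if token:
                    break                  # whitespace ends the token
                continue                   # skip leading whitespace
            token += ch
        return token

    if next_token() != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = (int(next_token()) for _ in range(3))
    return width, height, maxval

def is_24bpp_p6(stream):
    """True if the stream starts with a P6 header and maxval is 255 (assumed meaning of 24bpp)."""
    try:
        _, _, maxval = read_p6_header(stream)
    except ValueError:
        return False
    return maxval == 255

header = b"P6\n# made by a scanner\n640 480\n255\n"
print(is_24bpp_p6(io.BytesIO(header)))
```

For a real file, open it in binary mode (open(name, "rb")) and pass the file
object instead of the BytesIO used here.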


Introduction to xmerge:
Once the user has provided a set of rules for how the image(s) should look 
(which points should overlap, horizontal and vertical references, etc.),
xmerge uses an SQP solver to calculate how the images should be placed 
for these rules to be fulfilled, while minimizing the more severe types of 
distortion of the images. The SQP solver used is HQP, by R. Franke. 

The original implementation is intended for optimal control problems, and is 
quite bloated for simpler applications where only the (good) SQP solver is 
needed. As a result, I decided to strip it down a bit. 
Unfortunately, I don't have the time or interest to complete this effort, 
so the contents of hqp/ are in a somewhat sorry shape (while usable, many 
options and alternative parts of the algorithm are not included). 
In any case, if you wish to use hqp for anything else, I strongly suggest 
you take a look at the original sources and their documentation.
Also, using SQP to solve this problem is probably quite a bit of
overkill; it is quite possible that some heuristic could be used to transform
the problem into a simpler type of optimization.

When the "optimal" mapping of the images has been determined, the user
is supposed to select edges, internal to the mapped images, which are used 
for calculating the weighting of the individual images when two images overlap.
(Areas of an image that lie outside these edges are not used at all.)
The working image used at this stage is an un-interpolated version of the final
image, as the interpolation takes considerable time, and even without
interpolation generating the image is quite slow.

The preview and final images are generated using interpolation by Elliptically
Weighted Averages, which seems to give superior results compared to most other 
methods, while being considerably slower. A truncated Gaussian function is 
used for filtering, but sin(x)/x may give superior results if a large 
enough window is used. This option might be implemented in future versions.
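
To illustrate what filtering with a truncated Gaussian means, here is a
minimal Python sketch. It is not taken from the xmerge sources: the function
names and the alpha and cutoff values are assumptions chosen for the example,
and r2 stands for the squared (elliptical) distance of a source pixel from the
centre of the filter footprint.

```python
import math

def truncated_gaussian(r2, alpha=2.0, cutoff=1.0):
    """Gaussian weight for squared distance r2; zero at or beyond the cutoff."""
    return math.exp(-alpha * r2) if r2 < cutoff else 0.0

def weighted_average(samples, alpha=2.0, cutoff=1.0):
    """samples: iterable of (r2, value) pairs.

    Returns the weighted average of the values, or None if every
    sample falls outside the cutoff.
    """
    total_weight = 0.0
    total = 0.0
    for r2, value in samples:
        w = truncated_gaussian(r2, alpha, cutoff)
        total_weight += w
        total += w * value
    return total / total_weight if total_weight > 0.0 else None

# The second sample lies outside the cutoff and does not contribute.
print(weighted_average([(0.0, 10.0), (2.0, 99.0)]))
```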

The mapping used for the images is of the form 
X=x*(a+b*y)+c*y+d, Y=y*(e+f*x)+g*x+h
which in retrospect may be a suboptimal choice, as the inverse coordinate 
mapping is really messy. Furthermore, other mappings should give better 
results when applied to specific uses, like merging photographs.
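
As an illustration, the forward mapping can be written down directly, and one
way to get the inverse is Newton iteration. This is a Python sketch under my
own assumptions (the parameter tuple p, the function names and the use of
Newton's method are not from the xmerge sources, which may invert the mapping
differently).

```python
def forward(p, x, y):
    """Apply X = x*(a+b*y)+c*y+d, Y = y*(e+f*x)+g*x+h."""
    a, b, c, d, e, f, g, h = p
    return x * (a + b * y) + c * y + d, y * (e + f * x) + g * x + h

def inverse(p, X, Y, iterations=20, tol=1e-12):
    """Find (x, y) with forward(p, x, y) == (X, Y) by Newton iteration.

    Starts from (X, Y), which assumes the mapping is not too far
    from the identity.
    """
    a, b, c, d, e, f, g, h = p
    x, y = X, Y
    for _ in range(iterations):
        fx, fy = forward(p, x, y)
        rx, ry = fx - X, fy - Y            # residual
        if abs(rx) < tol and abs(ry) < tol:
            break
        # Jacobian of the forward mapping at (x, y)
        j11, j12 = a + b * y, b * x + c
        j21, j22 = f * y + g, e + f * x
        det = j11 * j22 - j12 * j21
        if det == 0.0:
            raise ZeroDivisionError("singular Jacobian")
        # Newton step: (x, y) -= J^-1 * residual
        x -= (j22 * rx - j12 * ry) / det
        y -= (-j21 * rx + j11 * ry) / det
    return x, y

p = (1.0, 0.001, 0.02, 5.0, 1.0, -0.002, 0.01, -3.0)
X, Y = forward(p, 10.0, 20.0)
print(inverse(p, X, Y))
```

Since the mapping is bilinear, a given (X, Y) can have more than one
preimage; the iteration returns the one it converges to from the starting
guess.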

There are some plans for implementing more mappings at a later time. 
A problem with inverse mappings for photographs is that a surface in 3D only 
has 6 degrees of freedom, while parts of the current version assume 8 degrees.
One rather attractive option may be to implement 2 degrees of 2nd order 
distortion, which could compensate for imperfect optical components.
(Any opinions on whether this is a good idea? Any ideas about better ways of 
modeling distortion from optics? Something with circular symmetry perhaps?)

A more generic approach, where mappings can have arbitrary freedom 
and different mappings may be combined (e.g. correcting for optical errors 
before inverse Z mapping, and then mapping the result onto a sphere)
would be really nice, but far more work. 




Command line options ("xmerge -h"):

  -h            this message
  -o            set output file for final image, if written (default test.ppm)
  -O            set output file for mapping data, if written
  -l            load saved mapping data, mutually exclusive with specifying ppm files
  -m            magnification applied when solution is calculated (default 1.0)
  -M            magnification applied if mapping data are loaded
  -b            background color (default: 255,0,0  (red))
  -r            reference-color for gain calculation (gui: b) (default 255,255,255)
  -R            color gain applied if mapping data are loaded (default 0,0,0)
  -B            black level used when final image is created (default: 0,0,0)
                        (color gain or reference level must be adjusted accordingly)

 options implying non-interactive operation: (in order of execution)
  -s            solve loaded problem
  -W            write current mapping data
  -w            write final image

 Input file(s) can be either -l  OR a list of ppm(P6) files.
 When image files are specified, the images are placed on rows from left to right.
 The special file name / causes remaining images to be placed on a new row one position down.
 Empty arguments ("" in most shells) can be used to create empty locations
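
 The placement rules above can be sketched in a few lines of Python. This only
 illustrates the rules as described here; layout_grid is a made-up name and
 the code is not how xmerge itself parses its arguments.

```python
def layout_grid(args):
    """Arrange file arguments into rows, left to right.

    "/" starts a new row; an empty argument leaves an empty
    location, represented here as None.
    """
    rows = [[]]
    for arg in args:
        if arg == "/":
            rows.append([])
        else:
            rows[-1].append(arg if arg != "" else None)
    return rows

print(layout_grid(["a.ppm", "", "b.ppm", "/", "c.ppm"]))
```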


Example:

$ xmerge -o foo.ppm -O foo.txt foo00.ppm foo01.ppm foo02.ppm / foo10.ppm foo11.ppm foo12.ppm

will start xmerge with six images in two rows and three columns, write the final
image (when requested in the gui, by pressing "w") to the file foo.ppm, and write the 
current state of the variables used in the mapping to the file foo.txt (when "W" is 
used in the gui). This file can later be loaded into xmerge using "xmerge -l foo.txt".
By using the command line option "-w" together with "-l ", it is possible to write
the final image corresponding to the saved data without using the gui.


User Manual for the graphical user interface:

There are two main modes of operation, referred to as Source and Target. 
Generic navigation in both modes is performed by clicking with the 2nd 
(center/middle) button of the pointing device at the point about which the 
image is to be centered. This is the only means of moving around on an 
image zoomed to a size larger than the program window. 

keyboard commands valid in both modes:
 m	switch between source and target mode
 +   	zoom in
 -	zoom out
 w	write the final image to disk
 W	write a loadable listfile describing the current state of the problem
 s	solve the problem specified by the relations set in the Source mode
 r	reset the solution to its initial state, useful if solving for 
	impossible relations has been attempted, with a solution from which
	the solver is unable to continue, even if the relations are corrected.
 a	auto-improve the matching of the regions specifying point-relations, 
	by using the current solution when mapping pixels. This command usually 
	messes things up more than it improves them, and all point relations 
	are affected, even those manually matched.
 q	exits xmerge
 
In Source mode an array of input images is displayed. This mode is used for 
specifying the relations used for calculating the merging of the images.
There are two types of relations available in this version: 

Point relations, specifying two points of two different images, which are 
supposed to map to the same point when the final image is produced. This is 
done by selecting an area of one image, moving the resulting sub-image to 
another input image, and clicking again when the regions of both images match. 
By default, xmerge tries to improve the match, so only a rough match is 
required. This feature can be turned off and on by pressing "i". Additionally, 
when the automatic improvement is turned off, it is possible to force the 
matching regions to stay at integer pixels. This is toggled with "I". 
The center points of these rectangles are used as the actual point-relations.
This type of relation can be removed by clicking in the blue/green rectangle 
specifying one side of a point-pair, or moved, by clicking in the red/yellow 
rectangle specifying the other side.

Directional relations, forcing a line to take a certain direction in the final 
image. When the source mode is entered, creation of point relations is the 
default operation. By typing "1".."4"  horizontal, vertical, 45deg and -45deg 
directional relations, respectively, are created. These relations are removed 
by clicking near the start of the line (featuring a small line indicating the 
desired direction), or moved by clicking near the end of the line.
 is used to return to creating point relations.

Additionally, brightness adjust operations are performed by typing "b" in 
source mode. When active, areas of the source images can be selected for 
calculation of the gain of each color, required to make the average color in 
these areas equal to the reference color specified at the command line. When an 
additional area is specified, the values from all areas selected within the 
image are combined. The gains can be set to 1 (with weight 0 when new areas are 
specified) by typing "B".  is used to return to creating point relations.
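
The gain calculation described above amounts to dividing the reference color
by the average color of the selected areas, per channel. A minimal Python
sketch follows; it is not from the xmerge sources, channel_gains is a made-up
name, it assumes that combining areas simply means pooling their pixels, and
it ignores the black level.

```python
def channel_gains(pixels, reference=(255, 255, 255)):
    """Per-channel gain that maps the average of pixels to the reference color.

    pixels: list of (r, g, b) tuples pooled from all selected areas
    of one image. A channel whose average is zero gets gain 1.0.
    """
    if not pixels:
        raise ValueError("no pixels selected")
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    return tuple(ref / mean if mean > 0 else 1.0
                 for ref, mean in zip(reference, means))

print(channel_gains([(100, 200, 255), (100, 200, 255)]))
```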

It should be remembered that each image has exactly 8 degrees of freedom, that
a point constraint contributes 2 relations shared between 2 images, while a 
direction constraint contributes only one relation, possibly shared between 
two images. Specifying too many relations for an image or a set of images will 
invariably result in something useless, usually collapsing all images to cover 
one point. Even if the total number of constraints is not exceeded, it is 
still possible to get badly broken results if the relations cannot possibly 
be satisfied (example: 3 horizontal relations in one image). 

Bad results also occur when relations are badly placed. Example: it is 
possible to solve a case where 2x2 images are used, each sharing two points, 
but in order to match all 4 point relations where all images meet at the 
center, the images are (except for special cases) stretched far from their 
desired shapes. The correct thing is to use only 3 point relations where 
4 images meet.

Finally, 2 degrees of freedom of the whole problem are used for the origin of 
the target (which is later discarded) and 1 degree for the scaling of the image.
When all relations have been specified, "s" is used for solving the problem by 
using an SQP optimizer on a problem where the relations form equality 
constraints and a function describing absolute and relative scaling, skewing, 
stretching and rotation of the images is minimized. When the optimizer is 
finished (or has given up after failing to find a solution), xmerge 
automatically switches to target mode. 

Keyboard commands in Source mode (summary):
1..4	angle relation submode
b	brightness adjust
B	reset brightness (for all images)
i	toggle automatic improvement of point-relations
I	toggle integer placement of point relations in manual mode
	abort operation in progress, or return to point relation creation,
	if no operation in progress

In Source mode, pressing and holding the 3rd (usually right) pointing device 
button hides the swapped images covering point relations, useful when manually 
matching images or for verifying the performance of the automatic match 
improvement.


Target mode displays a "fast" but inaccurate rendering of the final image.
In its normal mode of operation, the outer edges of each image (yellow/black 
lines) used when the images are merged are controlled. These edges are 
modified either by clicking on a corner and moving it, or by clicking on an 
edge to insert a new corner to be placed. Corners can be removed by pressing 
"d" while a corner is being moved. When two images overlap, the ratio of the 
products of the distances to the two closest edges (of this type, not the 
actual edge of the image) of each image covering the pixel is used for 
weighting their individual contributions.
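
The weighting rule can be illustrated with a small Python sketch. It is not
taken from the xmerge sources: the function names are made up, the distances
from the pixel to each image's merge edges are assumed to be computed
elsewhere, and normalizing the scores so that the weights sum to 1 is my
assumption about how the ratio is applied.

```python
def edge_score(distances):
    """Product of the two smallest distances from a pixel to an image's merge edges."""
    d1, d2 = sorted(distances)[:2]
    return d1 * d2

def blend_weights(distances_per_image):
    """One weight per overlapping image, in the ratio of their edge scores.

    distances_per_image: one list of edge distances (at least two
    entries) for each image covering the pixel.
    """
    scores = [edge_score(d) for d in distances_per_image]
    total = sum(scores)
    if total == 0.0:
        # pixel lies on an edge of every image: fall back to equal weights
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

print(blend_weights([[1.0, 2.0, 5.0], [3.0, 2.0, 10.0]]))
```

With this rule an image's contribution fades to zero as the pixel approaches
one of that image's edges, which avoids visible seams in the overlap.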

By typing "c", the crop rectangle for the final image (red/black lines) is 
selected and can be modified (click on an edge and drag it to a new position).

By typing "p", preview images of selected regions can be generated without 
writing the final image to disk. These preview images are displayed by 
running "xv -" through execvp (so you had better make sure you have xv 
somewhere in your PATH if you wish to use this feature). 

keyboard commands in Target mode: (summary)
c	select crop rectangle (final image size) selection mode
p	select preview mode 
d	delete current corner (when moving image-edge)
	abort operation in progress, or return to image edge modifications,
	if no operation in progress

In Target mode, clicking the 3rd (usually right) pointing device button is 
used for changing the order in which overlapping images are drawn by placing
the bottom image at the top. 

xmerge prints rather large amounts of debugging information to stdout, most of
which is probably rather incomprehensible, but the occasional messages about
the progress of the current operation may still be useful.


That should be enough, hopefully...
