%!TEX root = ceres-solver.tex
\chapter{Solving}
Effective use of Ceres requires some familiarity with the basic components of a nonlinear least squares solver.

For all but the smallest problems, the solution of~\eqref{eq:normal} in each iteration of the Levenberg-Marquardt algorithm is the dominant computational cost in Ceres.
\subsection{\texttt{DENSE\_QR}}
For small problems (a couple of hundred parameters and a few thousand residuals) with relatively dense Jacobians, \texttt{DENSE\_QR} is the method of choice~\cite{bjorck1996numerical}. Let $J = QR$ be the QR decomposition of $J$, where $Q$ is an orthonormal matrix and $R$ is an upper triangular matrix~\cite{trefethen1997numerical}. Then it can be shown that the solution to~\eqref{eq:normal} is given by
\begin{align}
    \Delta x^* = -R^{-1}Q^\top f
\end{align}
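To see why, substitute $J = QR$ into the normal equations~\eqref{eq:normal}, $J^\top J \Delta x = -J^\top f$, and use $Q^\top Q = I$:
\begin{align}
R^\top Q^\top Q R \, \Delta x = -R^\top Q^\top f \quad \Rightarrow \quad R \, \Delta x = -Q^\top f,
\end{align}
which is solved by a single back substitution with the triangular factor $R$.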
Ceres uses \texttt{Eigen}'s dense QR decomposition routines.

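As a hedged illustration (this is not Ceres's internal code, and the helper name is hypothetical), the same computation can be written directly against \texttt{Eigen}'s dense QR routines:
\begin{verbatim}
#include "Eigen/Dense"

// Hypothetical helper: compute the step dx = -R^{-1} Q^T f from the
// equation above, using Eigen's Householder QR least squares solve.
Eigen::VectorXd SolveDenseQR(const Eigen::MatrixXd& J,
                             const Eigen::VectorXd& f) {
  // householderQr().solve(b) returns argmin_dx ||J dx - b||.
  return J.householderQr().solve(-f);
}
\end{verbatim}
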
\subsection{\texttt{SPARSE\_NORMAL\_CHOLESKY}}
Large non-linear least squares problems are usually sparse, and using a dense QR factorization on them is inefficient. Let $H = R^\top R$ be the Cholesky factorization of the normal equations, where $R$ is an upper triangular matrix; then the solution to~\eqref{eq:normal} is given by
\begin{equation}
    \Delta x^* = R^{-1} R^{-\top} g.
\end{equation}
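In practice neither inverse is formed explicitly; $\Delta x^*$ is computed with two triangular solves,
\begin{align}
R^\top y = g \quad \text{(forward substitution)}, \qquad R \, \Delta x^* = y \quad \text{(back substitution)}.
\end{align}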
The observant reader will note that the $R$ in the Cholesky factorization of $H$ is the same upper triangular matrix $R$ in the QR factorization of $J$. Since $Q$ is an orthonormal matrix, $J=QR$ implies that $J^\top J = R^\top Q^\top Q R = R^\top R$.

\subsection{\texttt{CGNR}}
For general sparse problems, if the problem is too large for \texttt{CHOLMOD} or a sparse linear algebra library is not linked into Ceres, another option is the \texttt{CGNR} solver. It applies the Conjugate Gradients method to the {\em normal equations}, but without forming the normal equations explicitly. It exploits the relation
\begin{align}
    H x = J^\top J x = J^\top(J x)
\end{align}
When the user chooses \texttt{CGNR} as the linear solver, Ceres automatically switches from the exact step algorithm to an inexact step algorithm.

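The practical consequence of the relation above is that each iteration needs only two sparse matrix-vector products. A minimal sketch using \texttt{Eigen} (a hypothetical helper, not Ceres's implementation):
\begin{verbatim}
#include "Eigen/Sparse"

// Hypothetical helper: evaluate H x = J^T (J x) with two sparse
// matrix-vector products, without ever forming H = J^T J.
Eigen::VectorXd HTimesX(const Eigen::SparseMatrix<double>& J,
                        const Eigen::VectorXd& x) {
  Eigen::VectorXd Jx = J * x;   // first product: J x
  return J.transpose() * Jx;    // second product: J^T (J x)
}
\end{verbatim}
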
\texttt{Solver::Options} controls the overall behavior of the solver. We list the various settings and their default values below.

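To make the discussion concrete, here is a short sketch of how these options are typically set and passed to the solver; the member names mirror the settings listed below, but treat the exact spellings as assumptions for the version of Ceres you are using.
\begin{verbatim}
#include <iostream>
#include "ceres/ceres.h"

void SolveProblem(ceres::Problem* problem) {
  ceres::Solver::Options options;
  options.max_num_iterations = 100;   // default is 50
  options.function_tolerance = 1e-8;  // default is 1e-6
  options.num_threads = 4;            // threads for Jacobian evaluation

  ceres::Solver::Summary summary;
  ceres::Solve(options, problem, &summary);
  std::cout << summary.BriefReport() << "\n";
}
\end{verbatim}
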
\begin{enumerate}

\item{\texttt{trust\_region\_strategy\_type}} (\texttt{LEVENBERG\_MARQUARDT}) The trust region step computation algorithm used by Ceres. \texttt{LEVENBERG\_MARQUARDT} is currently the only valid value.

\item{\texttt{max\_num\_iterations}} (\texttt{50}) Maximum number of iterations for Levenberg-Marquardt.

\item{\texttt{max\_solver\_time\_sec}} ($10^9$) Maximum amount of time (in seconds) for which the solver should run.

\item{\texttt{num\_threads}} (\texttt{1}) Number of threads used by Ceres to evaluate the Jacobian.

\item{\texttt{initial\_trust\_region\_radius}} ($10^4$) The size of the initial trust region. When the \texttt{LEVENBERG\_MARQUARDT} strategy is used, the reciprocal of this number is the initial regularization parameter.

\item{\texttt{max\_trust\_region\_radius}} ($10^{16}$) The trust region radius is not allowed to grow beyond this value.

\item{\texttt{min\_trust\_region\_radius}} ($10^{-32}$) The solver terminates when the trust region radius becomes smaller than this value.

\item{\texttt{min\_relative\_decrease}} ($10^{-3}$) Lower threshold for relative decrease before a Levenberg-Marquardt step is accepted.

\item{\texttt{lm\_min\_diagonal}} ($10^{-6}$) The \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to regularize the trust region step. This is the lower bound on the values of this diagonal matrix.

\item{\texttt{lm\_max\_diagonal}} ($10^{32}$) The \texttt{LEVENBERG\_MARQUARDT} strategy uses a diagonal matrix to regularize the trust region step. This is the upper bound on the values of this diagonal matrix.

\item{\texttt{max\_num\_consecutive\_invalid\_steps}} (\texttt{5}) The step returned by a trust region strategy can sometimes be numerically invalid, usually because of conditioning issues. Instead of crashing or stopping the optimization, the optimizer can go ahead and try solving with a smaller trust region, which results in a better conditioned problem. This parameter sets the number of consecutive retries before the minimizer gives up.

\item{\texttt{function\_tolerance}} ($10^{-6}$) Solver terminates if
\begin{align}
\frac{|\Delta \text{cost}|}{\text{cost}} < \texttt{function\_tolerance}
\end{align}
where $\Delta \text{cost}$ is the change in objective function value (up or down) in the current iteration of Levenberg-Marquardt.

\item{\texttt{gradient\_tolerance}} ($10^{-10}$) Solver terminates if
\begin{equation}
\frac{\|g(x)\|_\infty}{\|g(x_0)\|_\infty} < \texttt{gradient\_tolerance}
\end{equation}
where $\|\cdot\|_\infty$ refers to the max norm, and $x_0$ is the vector of initial parameter values.

\item{\texttt{parameter\_tolerance}} ($10^{-8}$) Solver terminates if
\begin{equation}
    \frac{\|\Delta x\|}{\|x\| + \texttt{parameter\_tolerance}} < \texttt{parameter\_tolerance}
\end{equation}
where $\Delta x$ is the step computed by the linear solver in the current iteration of Levenberg-Marquardt.

\item{\texttt{preconditioner\_type}} (\texttt{JACOBI}) The preconditioner used by the iterative linear solver. The default is the block Jacobi preconditioner. Valid values are (in increasing order of complexity) \texttt{IDENTITY}, \texttt{JACOBI}, \texttt{SCHUR\_JACOBI}, \texttt{CLUSTER\_JACOBI} and \texttt{CLUSTER\_TRIDIAGONAL}.

\item{\texttt{sparse\_linear\_algebra\_library}} (\texttt{SUITE\_SPARSE}) Ceres supports the use of two sparse linear algebra libraries: \texttt{SuiteSparse}, which is enabled by setting this parameter to \texttt{SUITE\_SPARSE}, and \texttt{CXSparse}, which can be selected by setting this parameter to \texttt{CX\_SPARSE}. \texttt{SuiteSparse} is a sophisticated and complex sparse linear algebra library and should be used in general. If your needs or platforms prevent you from using \texttt{SuiteSparse}, consider using \texttt{CXSparse}, which is a much smaller library that is easier to build. As can be expected, its performance on large problems is not comparable to that of \texttt{SuiteSparse}.

\item{\texttt{num\_linear\_solver\_threads}} (\texttt{1}) Number of threads used by the linear solver.

\item{\texttt{num\_eliminate\_blocks}} (\texttt{0}) For Schur reduction based methods, the first \texttt{num\_eliminate\_blocks} parameter blocks are eliminated using the Schur reduction.

\item{\texttt{ordering}} The ordering of the parameter blocks. The solver pays attention to it if the \texttt{ordering\_type} is set to \texttt{USER} and the ordering vector is non-empty.

\item{\texttt{use\_block\_amd}} (\texttt{true}) By virtue of the modeling layer in Ceres being block oriented, all the matrices used by Ceres are also block oriented. When doing sparse direct factorization of these matrices, the fill-reducing ordering algorithms can be run either on the block form or on the scalar form of these matrices. Running them on the block form exposes more of the super-nodal structure of the matrix to the Cholesky factorization routines, which leads to substantial gains in factorization performance. Setting this parameter to \texttt{true} enables the use of a block oriented Approximate Minimum Degree (AMD) ordering algorithm; setting it to \texttt{false} uses a scalar AMD algorithm. This option only makes sense when using \texttt{sparse\_linear\_algebra\_library = SUITE\_SPARSE}, as it uses the \texttt{AMD} package that is part of \texttt{SuiteSparse}.

\item{\texttt{linear\_solver\_min\_num\_iterations}} (\texttt{1}) Minimum number of iterations used by the linear solver. This only makes sense when the linear solver is an iterative solver, e.g., \texttt{ITERATIVE\_SCHUR}.

\item{\texttt{linear\_solver\_max\_num\_iterations}} (\texttt{500}) Maximum number of iterations used by the linear solver. This only makes sense when the linear solver is an iterative solver, e.g., \texttt{ITERATIVE\_SCHUR}.

\item{\texttt{eta}} ($10^{-1}$) Forcing sequence parameter. The truncated Newton solver uses this number to control the relative accuracy with which the Newton step is computed. This constant is passed to \texttt{ConjugateGradientsSolver}.

\item{\texttt{check\_gradients}} (\texttt{false}) Check the Jacobians computed by the residual blocks against finite difference Jacobians; the results are compared, and if they differ substantially, details are printed to the log.

\item{\texttt{gradient\_check\_relative\_precision}} ($10^{-8}$) Relative precision to check for in the gradient checker. If the relative difference between an element of a user-supplied Jacobian and its finite difference estimate exceeds this number, then the Jacobian for that cost term is dumped.

\item{\texttt{numeric\_derivative\_relative\_step\_size}} ($10^{-6}$) Relative shift used for taking numeric derivatives. For finite differencing, each dimension is evaluated at slightly shifted values, \eg for forward differences, the numerical derivative is
\begin{align}
\delta &= \texttt{numeric\_derivative\_relative\_step\_size} \\
\Delta f &= \frac{f\left((1 + \delta) x\right) - f(x)}{\delta x}
\end{align}
The finite differencing is done along each dimension. The reason to use a relative (rather than absolute) step size is that this way, numeric differentiation works for functions where the arguments are typically large (e.g. $10^9$) and when the values are small (e.g. $10^{-5}$). It is possible to construct ``torture cases'' which break this finite difference heuristic, but they do not come up often in practice.