A Comparison of the Iterative Regularization Technique and of the Markov Chain Monte Carlo Method: Estimation of the Source Term in a Three-dimensional Heat Conduction Problem

This work aims at the comparison between a classical regularization technique and a method within the Bayesian framework, as applied to the solution of an inverse heat conduction problem. The two solution approaches compared are Alifanov's iterative regulariza-tion technique and the Markov chain Monte Carlo method. The inverse problem examined in this work deals with the estimation of a transient source term in a heat conduction problem.


INTRODUCTION
Inverse heat transfer problems deal with the estimation of unknown quantities appearing in the mathematical formulation of physical processes in thermal sciences, by using measurements of temperature, heat flux, radiation intensities, etc. Originally, inverse heat transfer problems have been associated with the estimation of an unknown boundary heat flux, by using temperature measurements taken below the boundary surface of a heat conducting medium [1][4] [5][8] [9][11] [12][14] [16].Inverse problems are mathematically classified as illposed, whereas standard direct (forward) heat transfer problems are well-posed.The solution of a well-posed problem must satisfy the conditions of existence, uniqueness and stability with respect to the input data.The existence and uniqueness of a solution for an inverse heat transfer problem can be mathematically proved only for some special cases, and the inverse problem is very sensitive to random errors in the measured input data.Therefore, special techniques are required for the solution of inverse problems [1][4] [5][8] [9] [11] [12][14] [16].Classical regularization techniques have been developed and successfully applied to the solution of inverse problems, such as Tikhonov's regularization technique [16], Alifanov's itera-tive regularization technique [1] and Beck's sequential function specification technique [5].On the other hand, with the recent availability of fast and affordable computational resources, sampling methods have become more popular within the community dealing with the solution of inverse problems.These methods are backed up by the statistical theory within the Bayesian framework, being quite simple in terms of application and not restricted to any prior distribution for the unknowns or models for the measurement errors [6][8] [9] [11][12].This work aims at a comparison between a classical regularization technique and a method within the Bayesian framework, as applied to the solution of an inverse heat conduction problem.The two solution approaches compared are Alifanov's iterative regularization technique [1][2] [3][10] [11] [12][14] [15] [17] and the Markov chain Monte Carlo (MCMC) method [6][8] [9][11] [12].
In Alifanov's Iterative Regularization technique [1][2] [3][10] [11][12] [14][15] [17], the number of iterations is so chosen that reasonably stable solutions are obtained.Therefore, there is no need to modify the original objective function, as opposed to Tikhonov's regularization approach.Another difference between Alifanov's and Tikhonov's regularization approaches is that, in the first, the unknown function is not required to be discretized a priori.On the other hand, with Alifanov's approach all the required mathematical derivations are made in a space of functions.The discretization of the function, resulting from the fact that measurements are taken at discrete times and positions, is then only made a posteriori.Nevertheless, the iterative regularization approach is quite general and can also be applied to the estimation of functions parameterized a priori, as well as to linear and non-linear inverse problems.It is an extremely robust technique, which converges fast and is stable with respect to the measurement errors.Also, it can be systematically applied to different types of inverse problems, by following the same basic steps.
Classical regularization methods are not based on the modeling of prior information and related uncertainty about the unknown parameters.On the other hand, in the statistical inversion approach, which is based on Bayesian statistics, the probability distribution models for the measurements and for the unknowns are constructed separately and explicitly [6][8] [9][11] [12].The solution of the inverse problem within the Bayesian framework is recast in the form of statistical inference from the posterior probability density, which is the model for the conditional probability distribution of the unknown parameters given the measurements.The measurement model incorporating the related uncertainties is called the likelihood, that is, the conditional probability of the measurements given the unknown parameters.The model for the unknowns that reflects all the uncertainty of the parameters without the information conveyed by the measurements, is called the prior model.The formal mechanism to combine the new information (measurements) with the previously available information (prior) is known as the Bayes' theorem.Therefore, the term Bayesian is often used to describe the statistical inversion approach, which is based on the following principles [2]: 1.All variables included in the model are modeled as random variables; 2. The randomness describes the degree of information concerning their realizations; 3. The degree of information concerning these values is coded in probability distributions; and 4. The solution of the inverse problem is the posterior probability distribution, from which distribution point estimates and other statistics are computed.
The inverse problem examined in this work involves the estimation of the transient heat source term, located in a known region of a three-dimensional semi-infinite medium.Analytic solutions for the Direct, Sensitivity and Adjoint Problems were obtained with the Classical Integral Transform Technique (CITT) [13] [15].Simulated temperature measurements non-intrusively taken at the top boundary of the region, as in a thermal tomographic approach, were used for the solution of the inverse problem.The measurement errors were assumed to be Gaussian, additive, with zero mean and known covariance matrix.A smooth prior was used for the transient heat source estimation with MCMC.The two solution approaches were compared as applied to the estimation of different functional forms for the heat source function, as described next.

PHISICAL PROBLEM AND MATHEMATICAL FORMULATION
The physical problem examined in this paper consists of a three-dimensional domain, infinite in the x and y directions and semi-infinite in the z direction, as illustrated in figure 1.Heat transfer is neglected at the upper surface of the domain.It is assumed that the whole region under study is initially at a temperature T 0 .For times t > 0, a transient heat source term g(x,y,z,t) generates energy in the medium.It is assumed that heat transfer takes place only by conduction, and that the properties of the medium are constant.The source term is assumed to be limited to the regionx 1 < x < x 1 ,y 1 < y < y 1 and h < z < (h+z 1 ), so that it is zero outside this region (see figure 1).The mathematical formulation of this physical problem is given by:

T x y z t T T T g x y z t
Equations (1-3) are written in dimensionless form by defining the following dimensionless variables:  ( , , , )  ( , , , ) .
where h is the depth of the heat source inside the medium (see figure 1).Equations (1-3) become:

DIRECT PROBLEM
When the geometry, physical properties, heat source term, initial and boundary conditions are known, the formulation given by equations (5-7) provides a direct (forward) heat conduction problem, whose solution gives the temperature field in the time and space domains.The solution of the direct problem in this work was obtained analytically, by using the Classical Integral Transform Technique (CITT) [13] [15].The solution of the direct problem given by equations (5)(6)(7)

INVERSE PROBLEM
The inverse problem of interest in this paper is concerned with the estimation of the strength of the source term G(X,Y,Z,), known to be located inside the regionx 1 < x < x 1 , y 1 < y < y 1 and h < z < (h+z 1 ) (see figure 1).For the solution of the inverse problem, transient temperature measurements taken at the surface z = 0 are considered available.Further-more, all the quantities appearing in the mathematical formulation of the physical problem, except the source term, are considered as deterministically known.However, the temperature measurements contain experimental errors.
This paper presents two different techniques for the solution of the inverse problem of estimating the source term, namely: (i) The Conjugate Gradient Method with Adjoint Problem for function estimation, which assumes that the unknown function belongs to the Hilbert space of square integrable functions in the domain of interest [1] [17]; and (ii) The Markov Chain Monte Carlo (MCMC) method within the Bayesian framework, implemented via the Metropolis-Hastings algorithm [6][8] [9][11] [12].
These two techniques, as applied to the present inverse problem, are described next.

Conjugate Gradient Method with Adjoint Problem for function estimation.
For the solution of the inverse problem with the Conjugate Gradient Method with Adjoint Problem, no assumption is made a priori about the functional form of the unknown source term, except for the functional space that it belongs to.Despite the fact that the test cases examined in this paper involve the estimation of the transient strength of a heat source with known location, for the sake of generality, the conjugate gradient method with adjoint problem is derived below as applied for the estimation of a function that varies spatially as well as in time.The function estimation is performed by minimizing an objective functional, which is defined in terms of the difference between the experimental and estimated temperatures.It is assumed that the experimental errors are additive, uncorrelated and have a Gaussian distribution with zero mean and constant standard deviation [4] [9].For simplicity in the mathematical analysis, it is also assumed that the temperature measurements are continuous in the time domain.With such assumptions, the objective functional is defined by: where () and [ , , , ; ( , , , )] The values of the estimated temperatures are obtained through the solution of the direct problem, given by equation ( 8), by using estimates for the source term ( , , , ) G X Y Z  .In equation ( 9),  f is the duration of the experiment, and (.)  is the Dirac delta function.
The Conjugate Gradient Method with Adjoint Problem for function estimation is used to minimize the objective functional given by equation ( 9).This iterative minimization method requires the solution of two auxiliary problems, known as Sensitivity Problem and Adjoint Problem [1] The Sensitivity Problem is obtained by assuming that when the source term ( , , , ) . Thus, by replacing the temperature and the source term with their perturbed quantities in equations (5-7) that define the direct problem, and then subtracting from the resulting equations the original direct problem, we obtain the sensitivity problem defined by: ( , , , ) The solution of the Sensitivity Problem is also obtained with the CITT, as: ' ' ' '. dZ dY dX d For minimizing the objective functional given by equation ( 9), the estimated temperatures [ , , , ; ( , , , )]  must satisfy a constraint that is the solution of the Direct Problem, defined by equations (5-7), whose solution is given by equation (8).Therefore, in order to transform the constrained minimization problem into an unconstrained minimization problem, a Lagrange multiplier ( , , , ) X Y Z  is introduced in the formulation.The Lagrange multiplier is obtained with the solution of a problem adjoint to the sensitivity problem given by equations (10) to (12).The derivation of the Adjoint Problem for * ( , , , ) X Y Z  in the present inverse problem is presented in detail in [15].It is given by: where: ** , f and The solution of the Adjoint Problem is also obtained by using the CITT in the form: The Gradient Equation is obtained from the Adjoint Problem solution.By assuming that the unknown function belongs to the Hilbert space of square integrable functions in the space and time domains, the Gradient Equation is given by: As shown in [14], the mathematical development presented for the inverse problem results in three distinct problems called Direct Problem, Sensitivity Problem and Adjoint Problem defined by equations ( 5) to ( 7), ( 10) to ( 12) and ( 14) to (16) , respectively.With the solution of these problems, we obtain the functions ( , , , ) S G X Y Z  , given by equation ( 9).Thus, the estimation of the source term function is performed with an iterative procedure, by properly selecting the search direction and the search step size in each iteration.The iterative procedure of the Conjugate Gradient Method with Adjoint Problem is given by [ where k  is the search step size and ( , , , ) k d X Y Z  is the direction of descent, defined by: 1 ( , , , ) [ ( , , , )] ( , , , ).
where k  is the conjugation coefficient.
There are different versions of the Conjugate Gradient Method available, depending on the form of calculation of the direction of descent, given by equation ( 20 , , , )]} .( , , , ) ( , , , ) .
( , , , ) The iterative procedure defined by equations ( 19) to ( 22) is applied until the stopping criterion based on the discrepancy principle is satisfied.The Conjugate Gradient Method can provide stable solutions for the inverse problem, if the discrepancy principle is used for stopping the iterative procedure, thus avoiding that the high frequency oscillations be incorporated into the estimated functions.In the discrepancy principle, the iterative procedure is stopped when the following criterion is satisfied: where the value for the tolerance  is set to obtain stable solutions.In this case, we stop the iterative procedure when the residuals between measured,   m D  , and estimated,

 
, , , X Y Z  , temperatures are of the same order of magnitude of the random errors in the temperature measurements, that is: for m= 1, 2, …, M, where i  is the standard deviation of the measurement error at time i  .
For constant standard deviations, i.e., i   constant, we obtain the following value for the tolerance  , by substituting equation (24) into the objective functional given by equation ( 9): where M is the total number of sensors in the region.When the measurements do not contain experimental errors, the value for the tolerance  may be chosen as a sufficiently small num- ber, since the minimum value expected for the objective functional is zero.
The estimation of the unknown function   , , ,

G X Y Z  with the Conjugate Gradient
Method with Adjoint Problem, may be carried out by using the following basic steps: Step 1 -Suppose an initial guess G X Y Z  is available for the function to be esti- mated.Set k = 0.
Step 2 -Solve the Direct Problem ( 5) to (7) to calculate the estimated temperatures Step 3 -Check the stopping criterion given by equation (23).Continue if not satisfied.
Step 4 -Knowing the temperatures calculated in step 2,  

Markov Chain Monte Carlo (MCMC) method
Numerical sampling methods might be required when the posterior probability density function is not analytical [6][8] [9][11] [12].The numerical method mostly used to explore the state space of the posterior is the Monte Carlo simulation that approaches the expected local values of a function (vector of parameters P) by the sample mean.The Monte Carlo simulation is based on a large sample of the probability density function, in this case the posterior probability density function, ( | )  where D is the vector of measurements.Several sam- pling strategies were proposed, including the method of Markov Chain Monte Carlo (MCMC) that is the most powerful.The basic idea is to simulate a random walk the space of P that converges to a stationary distribution, which is the distribution of interest.
The MCMC method is an iterative version of the traditional Monte Carlo methods.The idea is to obtain a sample of the posterior distribution and calculate sample estimates of characteristics of this distribution using iterative simulation techniques based on Markov chains.A Markov chain is a stochastic process {P 0 , P 1 , P 2 , ...} such that the distribution of P i given all the previous values P 0 , ...,P i-1 depends only on P i-1 , that is: .
for any subset A. MCMC Methods require, in order to obtain a single distribution of equilibrium, that the chain be: -Homogeneous, that is, the possibility of transition from one state to another is invariant; -Irreducible, that is, each state can be reached from any other in a finite number of iterations; and -Aperiodic, that is, there are no absorbent states.Thus, a sufficient condition to obtain a single stationary distribution is that the process meets the following balance equation:  An important practical issue is how the initial values influence the behaviour of the chain.Thus, as the number of iterations increases, the chain gradually forgets the initial values and eventually converges to an equilibrium distribution.Thus, it is common that the early iterations are discarded.The most commonly used MCMC algorithms are the Metropolis-Hastings and Gibbs sampling [6][8] [9][11] [12].The Metropolis-Hastings algorithm is used in this work.
For the cases examined below, the transient variation of a source term with known spatial distribution is estimated, so that the vector of parameters of interest is written as   where π posterior (P) is the posterior probability density, that is, the conditional probability of the parameters P given the measurements D; ()  D | P is the likelihood function, which expresses the likelihood of different measurements outcomes D with P given; π prior (P) is the prior density distribution of the parameters, that is, the coded information about the parameters prior to the measurements; and ()  D is the marginal probability density of the measurements, which plays the role of a normalizing constant.In general, the probability ()  D is not explicit and is difficult to calculate.However, the knowledge of ()  D can be disregarded if the space of states of the posterior can be explored without knowing the normalization constant.Thus, the probability density function of the posterior can be written as: By assuming that the measurement errors are Gaussian random variables, with zero means and known covariance matrix W and that the measurement errors are additive and independent of the parameters P, the likelihood function can be expressed as [6][8][9] [11][12]: where N (=IM) is the total number of measurements and (P) is the vector of estimated temperatures, obtained from the solution of the direct problem at the specific times and sensor locations, with an estimate for the parameters P, and calculated with equation (8).The prior distribution is the codified knowledge about the parameters before D is measured.For this study, a non-informative smoothness prior is used for the estimation of the transient strengths of the source term   12 , ,..., . The smoothness prior is given in the form [9]  Equations ( 31) resemble first-order Tikhonov's regularization.However, there is a fundamental difference between the classical Tikhonov's regularization and the Bayesian approaches.Tikhonov regularization focuses in obtaining a stabilized form of the original minimum norm problem and is not designed to yield error estimates that would have a statistical interpretation.In contrast, Bayesian inference presumes that the uncertainties in the likelihood and prior models reflect the actual uncertainties.
The parameter α appearing in the smoothness prior is treated in this work as a hyperparameter, that is, it is estimated as part of the inference problem in a hierarchical model [2].The hyperprior density for this parameter is taken in the form of a Rayleigh distribution, given by [9]: where 0  is the center point of the Rayleigh distribution.
Therefore, the posterior distribution is given by: In order to implement the Markov Chain, a proposal distribution q(P*,P (n-1) ) is required, which gives the probability of moving from the current state in the chain P (n-1) to a new state P*.In this work, the proposal distribution q(P*,P (n-1) ) is taken as Gaussian, uncorrelated and with a standard deviation of 2x10 -2 P (n-1) , that is: * The new value P * is then tested and accepted with probability given by the ratio of Hastings (RH), that is: It is interesting to note that we need to known ( , ) P | D just up to a constant, since we are working with ratios between densities and such normalization constant is cancelled out in equation (35).We present next the steps for the Metropolis-Hastings algorithm [6][8][9] [11][12]: Step 1 -Set the iteration counter n = 0 and specify an initial value for the parameters to be estimated 0 P ; Step 2 -Generate a candidate value * P from the auxiliary distribution function () q * P | P ; Step 3 -Calculate the acceptance probability given by equation (35); Step 4 -Generate an auxiliary random sample from an uniform distribution u ~U[0,1]; Step 5 -If u ≤ ( 1) *

RH( )
n P , P , accept the candidate value and set ( 1)   * n  PP .Other- wise, reject the candidate value and set ( 1)  nn   PP ; Step 6 -Set the iteration counter n equal to n+1 and return to step 2.

RESULTS AND DISCUSSIONS
For all the cases examined below, the solution of the inverse problem was obtained by using simulated measurements with random errors.Temperature measurements were obtained from a known source term through the solution of the Direct Problem, given by equation ( 8), with added simulated noise, which was uncorrelated, Gaussian, with zero mean and known constant standard-deviation.Two values were examined for the standard deviation of the measurement errors: σ=0.016 that corresponds to 0.5 o C and σ=0.032 that corresponds to 1 o C.
It was assumed that the source term was inserted in a medium made of aluminum with the following thermophysical properties: ρ = 2787 kg/m 3 ; c p = 867 J/kg K; and k = 164 W/mK.A uniform time-varying heat source was applied in a cubic region of side 2 mm, centered with respect to the x and y axes, and at a depth h = 2 mm below the surface.Hence, in dimensionless terms the heat source was applied in the region -0.5 ≤ X ≤ 0.5; -0.5 ≤ Y ≤ 0.5 and 1 ≤ Z ≤ 2, as illustrated by figure 2. For the cases examined involving a constant strength of the heat source, its magnitude was assumed as 18.75 x 10 8 W/m 3 , which corresponds to a dimensionless value of 1.43.Functional forms involving discontinuities (step function) and discontinuities in the first derivative (triangular function) were also examined.For such cases, the functions were supposed to vary from 18.75 x 10 8 W/m 3 to 37.5 x 10 8 W/m 3 (2.86 in dimensionless form).For the solution of the inverse problem, the unknown transient strength of the heat source was supposed constant in order to initialize the iterative procedure of the conjugate gradient method or the Markov chains.The effects of such constant values on the final solu-tions were examined, by assuming initially the source term as 12.5 x 10 8 W/m 3 or 75 x 10 8 W/m 3 , corresponding in dimensionless terms to 0.96 and 5.72, respectively.
The duration of the experiment was taken as 60 seconds, equivalent to a dimensionless final time of τ = 1018.The temperature measurements were supposedly taken at the surface Z = 0, in a grid formed by only 9 pixels, uniformly distributed in the region -1.5 < X < 1.5 and -1.5 < Y < 1.5.The measurements were supposed available with a frequency of 1 Hz, so that 60 transient measurements were available for each pixel.
In the inverse analysis, the numbers of states used in the Markov chains were 20,000 for all functional form cases examined, with 2,000 burn-in states.Such numbers of states were selected based on numerical experiments.
To examine the accuracy of the two estimation approaches under analysis in this paper, three different functional forms for the source term were used, namely: -Functional form with constant intensity: ( , , , ) 1.43.
-Functional form with step variation: For all cases using the MCMC method the value adopted for the center point of the Rayleigh distribution, 0  , was 10.This value was selected based on numerical experiments.The computational times for each of the test cases examined are presented in the table 1.We note in this table that the computational cost is much larger when the MCMC method was used, as compared to the conjugate gradient method.CPU times were obtained using the FORTRAN platform, on a desktop computer with an AMD Athlon (tm) II X2 250 CPU and 2GB of RAM.Table 1 also presents the acceptance ratio of the Metropolis-Hastings algorithm of each test case examined for a constant standard-deviation σ=0.016.We can notice that the acceptance ratios for all cases were very low, although the acceptance ratio was of the order of 50% during the burn-in period.
The Root-mean-square (RMS) error, defined by: G  is the value estimated, while () G  is the exact value of the source term function at time i  , are presented in tables 2 and 3 for = 0.016 and in tables 4 and 5 for  = 0.032.Tables 2 and 4 correspond to an initial guess of 0.96, while tables 3 and 5 correspond to an initial guess of 5.72.For the cases of an initial guesses equal to 0.96, the accuracy of the solutions obtained with the conjugate gradient method are more accurate than those obtained with MCMC method.This behavior becomes also clear from the analysis of figures 3.a to 8.a which present the exact and estimated functions for constant, step and triangular variations, respectively, for initial guesses equal to 0.96 and for = 0.016 and for  = 0.032.On the other hand, for an initial guess equal to 5.72, the solutions obtained with the MCMC method are more accurate than those obtained with the conjugate gradient method, as presented in table 3    b.Initial guess G 0 = 5.72. Figure 8 -Estimation of the source term with a triangular variation and constant standard-deviation σ=0.032.

CONCLUSIONS
In this paper, two different approaches were examined for the solution of an inverse heat conduction problem that involves the estimation of the transient heat source term, located inside a known region in a three-dimensional semi-infinite medium, namely: The Conjugate Gradient method with Adjoint Problem and the Markov Chain Monte Carlo (MCMC) method within the Bayesian framework, by using the Metropolis-Hastings algorithm.These two methods were compared for constant and transient source terms with functional forms containing sharp corners and discontinuities.Simulated temperature measurements were used in the inverse analysis.
The Conjugate Gradient method with Adjoint Problem was capable of providing accurate and stable estimates for the imposed source term, with quite low computational costs, for all cases examined.On the other hand, the computational costs associated with the MCMC method were substantially larger than those for the conjugate gradient method.Although the accuracy of the conjugate gradient method was substantially affected for initial guesses far from the exact values of the sought function, the accuracy of the MCMC method was not affected by the initial guess.Indeed, the accuracy of the solutions obtained with the MCMC method were equivalent for all cases examined, independent of the initial guess and the simu-
)[1][2][3][10][11] [12][14][15][17].For linear inverse problems as the present one, in which the sensitivity problem does not depend on the unknown function, the different versions of the Conjugate Gradient Method are theoretically identical.Thus, it is used in this paper the expression of Fletcher
are distinct states of the distribution of interest. :

Figure 2 .
Figure 2. Investigated region with the source term.

Figure 6 -
Estimation of the source term with a constant functional form and constant standard-deviation σ=0.032.a.Initial guess G 0 = 0.96.b.Initial guess G 0 = 5.72. Figure 7 -Estimation of the source term with a step variation and constant standard-deviation σ=0.032.02 Jun 2012  a.Initial guess G 0 = 0.96.

Table 1 .
and 5 and illustrated by figures 3.b to 8.b.Figures 3.b-8.bcorrespondto the same cases examined in figures 3.a-8.a,but with an initial guess of 5.72.The larger RMS errors of the solutions obtained with the conjugate gradient method, for an initial guess of 5.72, are probably due to the effects of the null gradient at the final time (see equations 16 and 18).As a result, the initial guess at the final time is not modified and oscillations in the solution appear as the final time is approached.This fact is clear from the analysis of figures 3.b-8.b.Test cases examined for constant standard-deviation σ=0.016.

Table 3 .
Values of RMS errors obtained initial guesses equal to 5.72 and constant standarddeviation σ=0.016.

Table 5 .
Values of RMS errors obtained initial guesses equal to 5.72 and constant standarddeviation σ=0.032.