ADAPTIVE NUMERICAL SIMULATIONS OF A TURBULENT JET

This work is concerned with assessing the performance of a numerical method, which combines an adaptive mesh refinement technique, an implicit-explicit time stepping strategy, and a linear multilevel-multigrid methodology, when applied to a challenging reallife problem: a three-dimensional turbulent jet flow. Typically, whenever a moving fluid emerges from a narrow opening into an otherwise quiescent fluid, shear is created between the entering and the ambient fluids, causing fluid instabilities, turbulence, and mixing at downstream. Turbulent jets represent an important class of fluid flow phenomena which occurs in many instances both in environmental and in industrial applications such as waste water discharges into rivers, plumes from smokestacks, and flames on combustion nozzles. Mathematically, the fluid dynamics is modeled by the non-steady Navier-Stokes equations for a three-dimensional incompressible flow whose material properties vary. The turbulence modeling is given by the large eddy simulation approach for which a careful selection of the Smagorinsky constant is performed. To resolve accurately and efficiently sharp gradients, vorticity shedding, and localized small length scale flow features (e.g. the ones present in high turbulence regions), dynamic adaptive mesh refinements are employed which form a level hierarchy composed by a set of nested, Cartesian grid patches (block-structured grid). That spatial adaptation is used in conjunction with a variable time step, linearly implicit time integration scheme, based on a semi backward difference formula (SBDF), especially designed to work with the non-linear diffusive term arising from the turbulent viscosity. The NS solver is based on an increment-pressure projection method. Information on how often the mesh adapts itself, on the number of computational cells in use, on the stability and size of the integration time step, and on the behavior of the multilevel-multigrid solvers is collected, showing the performance and testing the capabilities of the overall methodology.


INTRODUCTION
Adaptivity is an important component to be considered for efficient numerical solutions of partial differential equations.Many techniques appeared through the years and, nowadays, the term "adaptive mesh refinement" (or simply AMR) embraces an entire collection of approaches which spread to a variety of different application fields [12].However, the key idea behind of all those approaches is still the same: to concentrate computational power where it is most needed by increasing the resolution in space on regions of special interest in the computational domain.As an example, in Computational Fluid Dynamics, one needs more grid resolution where there are immersed interfaces/bodies to describe accurately intricate geometry details, where there is high vorticity/turbulence, around boundary layers, and in the vicinity of other flow features of special interest (and with highly localized span).
In the present work, a combination of an AMR strategy [2,3,4,16] with an implicitexplicit (IMEX [1]) projection method [10,14,17] have been applied to solve the equations modeling an incompressible turbulent jet.Section 2 presents the mathematical model which describes the problem, Section 3 the numerical methodology in use, and Section 4 presents the numerical results for several performance tests designed to expose the strengths and weaknesses of the approach adopted.Section 5 concludes the text with remarks on the pros/cons, pointing to directions to improve even further the efficiency of the overall methodology.

MATHEMATICAL MODEL
Mathematically, the fluid dynamics is modeled by the non-steady Navier-Stokes (NS) equations for a three-dimensional incompressible flow whose material properties may vary, where ρ is the specific mass, the vector (u 1 , u 2 , u 3 ) is the fluid velocity, p is the pressure, τ ij is the viscous tensor, with µ the dynamic viscosity (molecular plus turbulent terms, detailed next).In (2), f i represents the sum of all external forces acting on the fluid.Note that the summation convention is in use.The mathematical model ( 1)-( 2) requires boundary and initial conditions which will be detailed in Section 4. From here on, ρ (but not µ, needed in the turbulence modeling) will be assumed to be a constant.The turbulence modeling is given by the large eddy simulation (LES) approach, which explicitly compute the largest structures of the flow (typically, structures larger than the finest computational mesh size), modeling the influence of the smaller scales [13].The state variables in (1)-( 2) are filtered, that is, decomposed into a sum of a resolved component plus a residual (or subgrid scale (SGS)) component.The filtered equations can be written in the standard form, with (2) containing the residual-stress tensor that arises from the residual motions.The closure of the set of equations is obtained by modeling the residual-stress tensor by Smagorinsky model [13], for which the turbulent viscosity is where ∆ = (∆x 1 ∆x 2 ∆x 3 ) 1 3 is the filter size, S is the characteristic filtered rate of strain given by and C s is known as Smagorinsky constant.Its value is chosen between 0.1 to 0.18 [13].

Discretization in time
Discretization in time is performed by a variable time step, linearly-implicit time integration scheme based on a second-order, two-step Gear's Method [10,17] which is capable of handling the non-linear diffusive term arising from the turbulent viscosity.The idea is to rewrite (2) as where fi = and λ is an ad hoc constant chosen through numerical experimentation.Based only on Gear's Method, all terms in the right hand side of (6) should be treated implicitly.However, here, fi is extrapolated in time.The resulting time integration scheme is where a 0 = ∆t 2 /(∆t 0 ∆t 1 ), a 1 = −∆t 1 /∆t 0 , and a 2 = (∆t 0 + 2∆t)/∆t 1 , b 0 = −∆t/∆t 0 and b 1 = ∆t 1 /∆t 0 , with ∆t = t n+1 − t n , ∆t 0 = t n − t n−1 , and ∆t 1 = ∆t 0 + ∆t.
The pressure-velocity coupling present in ( 7)-( 8) is treated with an increment-pressure projection method [7,10,14], such that one solves in turn the equations where u ⋆,k i is a "preliminary" velocity field obtained from the diffusion equation ( 9), and q is the pressure increment obtained from ( 10)- (11).Together, they give rise to an elliptic equation for q, being responsible for enforcing the incompressibility constraint.Once q is determined, updates for the pressure and velocity fields are given by p n+1,k = p n+1,k−1 + q, and by u n+1,k i = u ⋆,k i − (∆t/a 2 ρ) ∂q/∂x i .Note that the iteration in k is needed to split the solution into two parts, one for the velocity and one for the pressure increment.The number of iterations per time step is fixed to two (that is, k = 1, 2), which is enough to get a second-order scheme in time, and p n+1,0 = p n is the initial approximation taken for the pressure.
To complete the description of the integration in time, a word is needed on the timestep stability constraints.From numerical experimentation, λ is taken in (6) such that one succeeds in relaxing the parabolic stability constraint induced by the viscous dissipation term (for the jet considered here, λ = 2||µ|| ∞ has proven to be sufficient).In this context, the only stability constraint left comes from the explicit discretization of the advection term,

Discretization in space
Discretization of the flow domain follows closely the adaptive mesh refinement technique first proposed by Berger and Colella [2] with grid adaptation given by Berger and Rigoutsos' Algorithm [4].Such technique is based on a grid structure given by a set of nested, Cartesian grid patches which forms a level hierarchy, usually referred to as composite grid.Grid patches obey certain rules, targeting for easiness in their construction and efficiency in their use: 1. a fine grid starts and ends at the corner of a cell in the next coarser grid, and 2. all fine grid cells at level l must be surrounded either level l cells or by level l − 1 cells except when it touches the border of the physical domain.Note that, differently from [2], in the implementation being used here [10], grid patches are disjoint from each other, that is, given two grid patches belonging to the same hierarchical level, they do not share interior computational cells.Figure 1 shows an example a threedimensional composite grid and Figures 2(a)-2(b) show the grid patches that compose this three-dimensional mesh.It is important to highlight that it is an essential part of this technique the existence of ghost computational cells, which form layers around each grid patch (see a simple 2D example given by Figure 3).Ghost cells furnish boundary conditions from polynomial interpolations of values at neighboring coarse/fine levels.Another important difference regarding the original work by Berger and Colella [2] is that, here, there is no time-step refinement: the numerical solution of the problem, on all grid patches, in all levels, evolves with the same time step from the finest level.The reason is that, for incompressible flows, discontinuities have no finite speed of propagation since the incompressibility constraint couples the solution (through the pressure field) at every point of the domain at every instant of time (hence, no subdomain may evolve in time separately).On the composite grids, variables are placed in a "marker and cell" (MAC) fashion: scalars at the cell centers (e.g.specific mass and viscosity), and vectors have their components at cell faces.Standard second-order finite difference operators are employed in the discretization of gradient, divergent, and stress tensor differential operators, including for the nonlinear transport term [10,11].

Linear multilevel-multigrid method
Once the discretizations in space have been introduced, the pressure-increment projection method in use requires the solution of eight linear systems per time step (four for k = 1 and four for k = 2): six of them are of parabolic type (for the velocity components), and two are of elliptic type (for the pressure-increment).To solve these linear systems, an in house implementation of a multilevel-multigrid method is employed [6,10,14,15].Here, "multilevel" refers to the fact that refinement levels belonging to the grid structure are also considered to be multigrid levels.It is important to recall that refinement levels differ from usual multigrid levels in that they do not necessarily cover the entire domain of computation.The multilevelmultigrid used was the V-cycle with convergence criterium O(∆x 2 ) where ∆x = min{∆x i } is the mesh size, with 1 ≤ i ≤ 3. Details of implementation can be found in [10,11].

The flow domain is given by the Cartesian product
, a parallelepiped, where a 1 = a 2 = a 3 = 0, b 1 = b 3 = 0.288 m, and b 2 = 0.576 m.Initially, the fluid is at rest with null pressure, and the inflow boundary contains in its center a main jet with diameter d 0 = 7.2 mm which is surrounded by a secondary jet whose diameter is d 1 = 18.2 mm.More specifically, the inflow boundary conditions for the velocity are given by u 1,in = u 3,in = 0, and u 2,in = u d 0 + u d 1 , with where r j = (x 3 − 0.5 The computation employs a composite grid having a base level with 32 × 64 × 32 computational cells and three refinement levels (four levels in total).The finest level has ∆x 1 = ∆x 2 = ∆x 3 = 1.125 × 10 −3 m.The refinement criterium is based in the maximum norms of the viscous tensor and of the turbulent viscosity, constrained to a 85% efficiency (i.e. the ratio between the number of cells in need of refinement over the total number of cells that end up being refined to get a grid patch).A detail of the flow on the plane x 2 = 0.432 m and the refined mesh employed can be seen in Figure 4     The velocity decay along the centerline of the computational domain for turbulent jets [5,9] is given by where u c is the velocity in the flow direction along the centerline, u d 0 is the velocity of the main jet, and the parameters B u and x 0 are 5.8 and 4d 0 , respectively [5].The expression above is obtained from velocity measurements in a turbulent jet [9]. Figure 6(a) compares the decay obtained in the present work with the decay given by ( 14).Next, several performance tests aiming to expose strengths and weaknesses of the methodology adopted, applied to the turbulent jet flow described above, are reported below.Tests are performed to evaluate the linear multilevel-multigrid method, the AMR and timestep strategies, and other implementation aspects.
Regarding the multigrid methodology, by timing individual parts of the code, it is observed that 69.5% of the computational time in each time step is spent in resolving the pressure correction equation, an ill conditioned elliptic equation.Typically, 15 V-cycles are performed for elliptic equation.Relaxing the multilevel-multigrid convergence criterium to O(∆x), 12 V-cycles are performed.By the few difference it was decided to maintain the convergence criteria of O(∆x 2 ), as described in Section 3.3.
Regarding the discretization in space, to fairly compare the AMR approach to the uniform mesh approach, one must estimate how the performance on a uniform mesh would be, if that uniform mesh had spacing equal to the finest level in the AMR grid (to achieve the same accuracy as on the AMR grid).Initially, taking few integration time steps on both meshes, in the same machine, revealed that the AMR approach can be 190 times faster and in the final time steps (when the jet is well developed), 5 times faster.The advantage of the AMR approach over the uniform grid approach is directly connected to the number of computational cells used.Note that only a small percentage of processing time is spent in managing the AMR grid (e.g.switching grids, initializing).Typically, less the 2% of that time is spent.Attempts made to decrease the computational time by relaxing the CFL time-step constraint ( 12) (e.g. by switching from u i ∞ to u i 2 ) led to unstable solution.In few integration time steps the residue in the multilevel-multigrid decreases slowly reaching the maximum V-cycles number.In this case the ∆t = O(∆x 0.73 ).
Regarding compilers, the computation time of a single time step employing both the gnu (version 4.4.4) and intel (version 11.0) compilers using the flag "-O3", on an Intel based machine (Intel Core 2 Quad processor 2.33GHz and 16Gb RAM) are measured.The best performance is achieved by the gnu compiler, which has proven to be 1.2 times faster than the intel compiler, for the test case considered.Profiling the code using the flag "-pg" and executing gprof, it is possible to identify the parts of the current implementation which spend most of the processing time.The results show that the set of subroutines/functions involved in the computation of ghost cell values is the one called the most, spending about 43% of the cpu time.One reason is the large number of grid patches needed when the jet is well developed (see the Figure 2(b)).To increase efficiency, by decreasing the number of patches while keeping the jet region well resolved, a grid patch composed by a single parallelepiped along the flow direction (a "tube") is employed to replace a large set of smaller grid patches, thus decreasing the number of calls needed to set up ghost cell values.Table 1 contains the number of cells used on AMR, on AMR (with a tube), and on uniform meshes.The column "Total" counts all kinds of cells employed (that is ghost, interior, on multigrid levels below base level, and underneath finer levels), the column "Refinement levels" excludes those at multigrid levels below base level, and the column "Visible" also excludes those covered cells, underneath refinement patches.Employing the tube strategy, the time spent in a single time step is 1.27 times faster than the original strategy, spending about 63% in solving for the elliptic equation (increment pressure equation).

CONCLUSION
The performance of a numerical methodology based on an adaptive mesh refinement technique coupled with a semi-implicit time integration scheme is assessed by conducting a series of numerical experiments while solving a turbulent jet flow.Aspects regarding the multilevel-multigrid methodology employed, the use of AMR and time step scheme, and other implementation aspects are explored.One of the strongest sides is the capability of achieving high grid resolution locally, only around regions which bares special interest.If those regions grow in time, the number of grid patches increases and, beyond a certain point, that same capability becomes one of its downsides: the overload of computer time at coarse-fine grid interfaces (interpolation at ghost cells).Numerically, the multigrid methodology employed requires stop criteria on the order of O(∆x 2 ), and typically 15 V-cycles are performed for elliptic equation (pressure increment) and 4 V-cycles, for parabolic equations (velocity components).The time spent in the solution of the linear system of the elliptic equation is bottleneck of the code.At least that solution should be performed in parallel.The linear CFL time step constraint results, typically, ∆t = O(∆x 1.65 ).Both intel and gnu compilers have been tried out, with gnu performing better for the current code implementation and for the test under consideration.
9 m/s, v 1 = 11.4 m/s, and the main jet velocity v 2 = 49.6 m/s, where r 1 e r 0 are, respectively, d 1 /2 and d 0 /2.At the inflow boundary, the turbulence is modeled by 0.01 ω u 2,in , with ω being a random number in [0, 1] and homogeneous Neumann boundary condition is adopted for the pressure.At the other computational boundaries, one has homogeneous Neumann boundary conditions for the velocity and Dirichlet boundary conditions for the pressure increment[5].The Reynolds number for the simulation being considered is approximately 2.0 × 10 4 .
(a) and Figure 4(b), respectively.Figure 5(a) shows the component of the velocity along the flow direction on the plane x 1 = 0.144 m with a zoom of the refined mesh (Figure 5(b)).
(a) Velocity in the flow direction.(b)Refined mesh.

Figure 4 .
Figure 4. Detail of the velocity along the flow direction with its refined mesh in the plane x 2 = 0.432 m.
(a) Velocity in the direction flow.(b) Zoom of the box in the Figure 5(a).

Figure 5 .
Figure 5. Velocity along the flow direction with the refined mesh on the plane x 1 = 0.144 m.

Figure 6 (
b) shows the turbulent spectrum and the decay at − 5 3 slope.In the turbulence model, the Smagorinsky constant was set to be C s = 0.15.* Hussein/Boersma (14) x 2 /d 0 uc/u d 0 -o Present work (a) Velocity decay in the x 2 −centerline.

k E 11
(k) (b) The turbulent energy cascade in the point (0.144 m, 0.324 m, 0.144 m).

Figure 6 .
Figure 6.The velocity decay in the centerline and the turbulent energy cascade.