A STUDY ON MAINTENANCE TASK INTERVAL OPTIMIZATION BY METAHEURISTICS

Maintenance is considered one of the strategic factors for the high productivity of a complex system. In the aeronautical industry, the safe operation, availability, and costs of a system are directly affected by the maintenance strategy established during the product development phase. Hence, the maintenance analysis needs to be effective in assuring that the product will achieve the required operational performance. The Reliability-Centered Maintenance (RCM) analysis applicable to aircraft systems provides well-defined logic and rules to evaluate the consequences of system functional failures and to define the tasks that are applicable and effective, based on the failure cause characteristics. However, this logic alone is not sufficient for an accurate assessment of the task interval and its effectiveness. Thus, additional guidelines and a multi-objective model to support the optimization of the maintenance task intervals are worthwhile. This paper develops a model for a problem frequently encountered during the maintenance analysis and evaluates the use of different meta-heuristics to define the best allocation of the maintenance task intervals, aiming to minimize the system maintenance cost without compromising the system safety requirements. As an example, a system comprising four components, including a dual redundancy and hidden failures, is considered. The optimization algorithm considers the minimum and maximum allowable intervals for each item and other relevant factors, such as the preventive and corrective maintenance costs and the probability of total system failure. Future research will analyze the influence of aircraft fleet profiles and the use of prognostic health systems in maintenance optimization.


INTRODUCTION
Maintenance is a strategic integrated logistic support element that contributes to the success of a product, ensuring its safe operation with maximum availability and minimum cost throughout the product life cycle.
Safety, reliability, and Life Cycle Cost (LCC) requirements are directly affected by the maintenance tasks established during product development. The intervals between maintenance stoppages and the resources and time involved in each activity are key parameters that affect system availability and maintenance costs. During the development of a new system, the information necessary for the correct definition of the initial maintenance intervals is usually not all available, which leads to the selection of more conservative task intervals that need revision after long periods of operation. The efforts to evaluate the efficacy of the initial maintenance task intervals and to evolve them, which generally occur after 5 to 10 years of service, have demonstrated that implementing an optimized maintenance program from the beginning of operation should bring significant gains in system operability and costs. In the aircraft industry, maintenance costs range from approximately 10% to 16% of typical airline direct operating costs (PERIYARSELVAM et al., 2013). The challenge for maintenance analysts is to define the appropriate maintenance strategies and intervals during the product definition and design phases. The MSG-3 (AIRLINES FOR AMERICA, 2015) methodology is a useful tool used in the aviation industry to define the initial scheduled maintenance requirements for a new product. Nevertheless, it gives only a general guideline for determining the initial maintenance interval, and in most cases the results of the analysis are based on the experience of analysts with a similar system. Thus, a multi-criteria and objective method to be used in the integrated product development process is desirable to refine the scheduled maintenance task definition.
The model should consider the trade-off between product safety, reliability, and maintenance program requirements and the associated costs. The industry has made several efforts to improve the methods and to develop tools that help define maintenance tasks and intervals. Moreover, many researchers have been studying maintenance optimization and publishing papers on reliability analysis and maintenance task interval optimization. Most of the studies focus on the oil and energy industries and include methods that use a hidden Markov process (ALEBRANT MENDES et al., 2014), the Genetic Algorithm (LAPA et al., 2006) and Ant Colony Optimization (ABRAHÃO and GUALDA, 2006) metaheuristics, and the mean fractional dead time concept (AHMADI and KUMAR, 2011). Because the model is non-linear, a linear programming solution could not be adopted.
Despite bringing valuable contributions, there could be opportunities to improve those methods considering the technology growth, e-Maintenance concepts, and aeronautical industry objectives. This paper aims to evaluate the use of metaheuristics as an auxiliary tool in the maintenance optimization process to minimize the total maintenance cost of a system, considering several influence parameters including system architecture, redundancies, components reliability, task data, and safety requirements.

METHODOLOGY
The study in this paper evaluates the use of metaheuristics to define the optimum set of maintenance task intervals to be applied to a complex system that incorporates several components in its architecture. First, a review of problems cited in the academic literature was performed, and the task interval definition problem described by Deschamps (DESCHAMPS and CATTEL, 2014) was selected, since it deals with a situation that involves the interface between the maintenance and safety analysts. Additional considerations that are part of the analysis performed during the development of the maintenance program and of the certification maintenance requirements were included to complete the scenario under study. The model included the safety margins required by the system safety assessment, the system redundancies and exposure to a hidden failure, the minimum and maximum interval limits imposed by component characteristics or customer requirements, and finally the cost of preventive and corrective maintenance actions.
In this study, we assumed that the failures of all system components follow an exponential distribution and that the system comprises a dual redundant function in which the back-up component becomes operative just after the detection of a failure in the main element.
The objective function was defined to minimize the total life cycle preventive and corrective maintenance costs by setting the best interval for each component without exceeding the limits imposed by the safety assessment.
The problem was modeled in C++ using the open-source Code::Blocks environment, and the tests were run in the LOF-MH (LEV Optimization Framework - Metaheuristics) framework (SABA, 2019) using different metaheuristics and parameter settings. More than a hundred tests were performed using the Tabu Search, Simulated Annealing, Particle Swarm Optimization, and Black Hole meta-heuristics.
The results presented in section 6 demonstrate that the use of meta-heuristics, together with complementary analysis, can effectively enhance the task interval definition and the maintenance program development, and can help maintenance analysts in the conception, development, and establishment of the final maintenance plan for the operators.

MAINTENANCE
According to Kinnison (KINNISON, 2004), maintenance is the process of ensuring that a system continually performs its intended function at its designed-in level of reliability and safety. It includes all actions necessary for retaining a system or product in, or restoring it to, a desired operational state (BLANCHARD et al., 1995).
Retention and restoration lead to the two main types of maintenance (MÁRQUEZ, 2007): scheduled preventive maintenance and unscheduled corrective maintenance (Figure 1).

Preventive Maintenance
Preventive maintenance includes all scheduled maintenance actions performed to retain a system or product in a specified operational condition. Scheduled maintenance covers tasks accomplished at specified intervals to prevent deterioration of the inherent safety and reliability levels of the system (AIRLINES FOR AMERICA, 2015). Table 1 presents the primary objectives of preventive maintenance and the related MSG-3 task types that provide the preventive action necessary to meet each objective.

Corrective Maintenance
Corrective maintenance includes all unscheduled maintenance actions performed as a result of system failure to restore the system to a specified condition. It is intrinsically more expensive than preventive maintenance, since it can occur at any time and place and frequently requires failure identification and verification (based on some symptom), localization and fault isolation, disassembly to gain access to the faulty item, removal and replacement with a spare or repair in place, reassembly, checkout, and condition verification.

Maintenance requirements for systems with redundancies
The use of redundancies is one of the means to reduce the probability of failures that lead to a total loss of system function. Due to their cost, they are usually employed only for critical functions and systems (HECHT, 2004). In the aeronautical industry, the use of redundancies is an alternative design solution to comply with the certification requirements. Following the criteria established in FAR 25.1309 (FEDERAL AVIATION ADMINISTRATION, 1992), a system safety assessment is required to assure that risks are identified and appropriately managed within established limits, which are defined according to the hazard level under evaluation. A hazard is a potentially unsafe condition resulting from failures, malfunctions, external events, errors, or a combination thereof. According to MIL-STD-882 (DEPARTMENT OF DEFENSE, 2000), it is any real or potential condition that can cause injury, illness, or death to personnel; damage to or loss of a system, equipment or property; or damage to the environment. The required availability of a function depends on the hazard level identified and on the acceptable risk for the event considered in case of function loss (SOCIETY OF AUTOMOTIVE ENGINEERS, 2010).
Hidden failures regularly appear in a redundant system that includes two or more components, where k out of n components are alternative means to accomplish a required system function in case of failure or degradation of the primary means. Systems that are subject to hidden failures need to be checked by a failure-finding task at a specified interval to assure the adequate availability of the back-up function.
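As a minimal sketch of why the failure-finding interval matters, the following Python function computes the average probability that a hidden item is in a failed state over one check interval. It assumes an exponential failure distribution, a check that restores the item to an as-good-as-new state, and no repair between checks; the function name and the numerical values are hypothetical and are not taken from the study.

```python
import math

def mean_unavailability(failure_rate: float, interval: float) -> float:
    """Average of F(t) = 1 - exp(-failure_rate * t) over [0, interval]."""
    x = failure_rate * interval
    if x < 1e-6:
        # First-order limit for very small x avoids numerical cancellation.
        return x / 2.0
    return 1.0 - (1.0 - math.exp(-x)) / x

# Hypothetical back-up item: 1e-4 failures per flight hour.
for hours in (500.0, 1000.0, 2000.0):
    print(hours, mean_unavailability(1e-4, hours))
```

Under these assumptions, lengthening the interval raises the average exposure to the hidden failure, which is the quantity that the safety assessment limits.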
For aircraft system certification, the demonstration of compliance with safety objectives requires the development of several levels of analysis, including Functional Hazard Analysis (FHA), Failure Mode and Effects Analysis (FMEA) and Fault Tree Analysis (FTA). The FTA is a graphic model to represent the various parallel and sequential combinations of faults that can lead to the occurrence of the predefined undesired event (KANČEV and ČEPIN, 2011).
The FTA uses logical gates to link the primary events to the top event. Thus, it combines qualitative analysis and probabilistic quantitative analysis to demonstrate compliance with requirements. Usually, redundant items are subject to hidden failures and require a failure-finding check at periodic intervals (T) specified in the FTA. This interval depends on the architecture of the system, the component failure rate, and the average probability per flight hour required according to the hazard classification of the top event under analysis. Typically, this failure-finding inspection interval (T) is the maximum allowable for certification, which does not mean that it is the optimum interval. The tasks resulting from the safety analysis may be considered a certification maintenance requirement (CMR), which is a limitation of the type certification, depending on the evaluation performed by a dedicated certification maintenance requirement committee (FEDERAL AVIATION ADMINISTRATION, 2011).
The Maintenance Review Board process (FEDERAL AVIATION ADMINISTRATION, 2012) occurs in parallel with the safety analysis process and aims to develop the initial minimum maintenance plan to assure the continued airworthiness of the aircraft. This development has the MSG-3 methodology as a baseline and, besides the safety considerations, MSG-3 also considers the operational and economic consequences of failure. It is not only a probabilistic assessment: it includes the evaluation of the maintainability and maintenance characteristics of each significant item, as well as the field experience acquired by operators, manufacturers, and authorities. Thus, the resulting task interval can be different from the maximum allowable interval calculated in the certification process. Regarding hidden failures, the MSG-3 analysis establishes that, to be applicable, a failure-finding task must determine whether the item is fulfilling its intended purpose. These tasks are Operational (OPC) or Visual (VCK) checks and do not require quantitative tolerances. Nevertheless, other types of tasks can be defined and, in the case of safety consequences, the maintenance analysis group can choose a combination of different tasks as one viable means to ensure adequate availability of the hidden function, thus reducing the risk of multiple failures.
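The gate logic described above can be sketched in Python for the simple case of statistically independent basic events. The helper names and the example tree are hypothetical; they illustrate the quantification only and do not reproduce the fault tree of this study.

```python
from math import prod

def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)

def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical tree: the top event occurs if a single item fails
# OR both items of a redundant pair fail.
p_single = 1e-5
p_redundant_pair = and_gate([1e-3, 1e-3])
p_top = or_gate([p_single, p_redundant_pair])
print(p_top)
```

The AND gate is what makes a redundant pair attractive: two modest failure probabilities multiply into a much smaller one, provided that the hidden failure of the back-up item is kept under control by periodic checks.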

METAHEURISTICS
Metaheuristics have been used extensively to solve several optimization problems, where the evaluation of alternatives and the determination of an optimal or at least suboptimal solution is an essential but challenging task.
An optimization problem can be solved using an exact or a stochastic method, the latter being subdivided into approximate and heuristic methods. Metaheuristics are stochastic, non-problem-specific heuristic methods, defined as guided search strategies for the development of fundamental heuristics to solve specific optimization problems (PASSARO, 2019b). Metaheuristics are algorithms inspired by biological, social or ethological behaviors or by physical phenomena. The study of metaheuristics and the proposal of new ones have been advancing, as they allow many types of real-world problems with a large number of variables to be solved satisfactorily in polynomial time (PASSARO, 2019b). Single-solution based metaheuristics are memory-oriented algorithms in which each iteration improves a single solution by favoring a local search. Population-based metaheuristics explore the search space extensively using a set of individuals, refining them at each iteration to improve the search for a global optimum (PASSARO, 2019b). Despite the large number of metaheuristics in use nowadays, they differ only in their source of inspiration, their diversification and intensification strategies, the parameters to be adjusted, and their evolution mode.
The study in this paper used two single-solution and two population-based metaheuristics, available on the LOF-MH framework:

Tabu Search
Tabu Search is a single solution-based metaheuristic originally proposed by Fred Glover in 1986 to allow the Local Search method to overcome local optima. It uses a Tabu Queue restriction, managed by a short-term memory process, to prevent the reversal or repetition of certain moves (GLOVER et al., 1993; GENDREAU and POTVIN, 2018).
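A minimal Python sketch of this idea is given below. It is a generic illustration, not the LOF-MH implementation: the function signature, the fixed-length queue used as short-term memory, and the toy cost table are all assumptions made for the example.

```python
from collections import deque

def tabu_search(cost, start, neighbors, tenure=3, iterations=50):
    """Move to the best non-tabu neighbor at every step, even if it is worse."""
    current = best = start
    tabu = deque([start], maxlen=tenure)  # short-term memory of visited solutions
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break  # every neighbor is tabu: stop
        current = min(candidates, key=cost)
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best

# Toy problem: index 1 is a local minimum, index 5 is the global minimum.
values = [4, 2, 3, 5, 1, 0, 6]
result = tabu_search(
    cost=lambda i: values[i],
    start=0,
    neighbors=lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)],
)
print(result)
```

Because the move back to index 1 is forbidden while it sits in the queue, the search is forced to climb over the worse solutions at indices 2 and 3 and reaches the global minimum.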

Simulated Annealing
Simulated Annealing is also a single solution-based metaheuristic, inspired by the physical cooling process known as annealing. This search strategy was established in the 1980s by Kirkpatrick and Černý, independently of each other (ZÄPFEL; BRAUNE; BÖGL, 2010). The goal of the cooling process is to align the atoms in the most regular possible crystalline structure by slowly decreasing the temperature of a metal that has been heated to a high temperature. The formulation of Simulated Annealing as a heuristic optimization strategy is based on the Metropolis algorithm, which originated in statistical mechanics. It simulates a thermodynamic system by creating a sequence of states or configurations at a given temperature. The value of the objective function to be optimized is related to the energy of the system (PASSARO, 2019a). In high-energy systems (or in a high-temperature phase of the meta-heuristic) more diversification is present, while at lower energy more intensification is applied to the search.
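The acceptance rule and the cooling loop can be sketched in Python as follows. This is a generic textbook form under stated assumptions (Metropolis acceptance, geometric cooling, one candidate per temperature); the parameter names, default values, and toy cost function are hypothetical and do not describe the LOF-MH implementation.

```python
import math
import random

def simulated_annealing(cost, start, neighbor, t0=100.0, alpha=0.95,
                        t_min=1e-3, seed=0):
    """Return the best solution seen while the temperature decays from t0."""
    rng = random.Random(seed)
    current = best = start
    temperature = t0
    while temperature > t_min:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if cost(current) < cost(best):
            best = current
        temperature *= alpha  # geometric cooling
    return best

toy_cost = lambda x: (x - 3.0) ** 2
best_x = simulated_annealing(toy_cost, 10.0,
                             lambda x, rng: x + rng.uniform(-1.0, 1.0))
print(best_x, toy_cost(best_x))
```

At high temperature almost any move is accepted (diversification); as the temperature drops, worse moves are rejected more and more often (intensification).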

Particle Swarm Optimization (PSO)
The idea of PSO stems from biology, where a swarm coordinates itself in order to achieve a goal. This analogy was transferred to optimization heuristics by James Kennedy and Russell Eberhart, a psychologist and an engineer, respectively. The idea is that each individual searches for the best position or goal based on the knowledge acquired by the group or by itself in the previous iterations (ZÄPFEL et al., 2010).
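A compact Python sketch of this mechanism follows. It uses the common inertia-weight form of the velocity update; the coefficient values, the clamping to the bounds, and the toy sphere function are assumptions for illustration and are not the settings used in the tests of this paper.

```python
import random

def pso(cost, bounds, n_particles=30, iterations=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Return (best position, best cost) found by a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position of each particle
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # best position of the swarm
    for _ in range(iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # own memory
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # group memory
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay in bounds
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

sphere = lambda x: sum(v * v for v in x)
best_pos, best_val = pso(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_pos, best_val)
```

In a task interval problem, each coordinate of a particle would be one component interval and the bounds would be the minimum and maximum allowable intervals.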

Black Hole (BH)
This population-based metaheuristic, inspired by the black hole phenomenon, was originally proposed by Zhang in 2008, but it corresponded only to a PSO algorithm with some variation to attract the swarm. It was revisited and improved by Hatamlou in 2013 to consider the Schwarzschild radius, within which not even light can escape the gravitational pull of the black hole (PASSARO, 2019c). The algorithm starts with a population of stars (potential solutions) generated at random. Then, the objective function value of each star is calculated, and the star that presents the best value becomes the black hole. The region of attraction is also defined. Every star that comes closer to the black hole than the radius is destroyed, and a new star is randomly generated. This process repeats until a stopping criterion is met. The main difference between PSO and BH is the star destruction mechanism, whose primary objective is to prevent particles from accumulating around a local minimum, which allows a better exploration of the search space.
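The steps above can be sketched in Python as follows. The radius is taken as the black hole cost divided by the sum of all star costs, which is the form commonly attributed to Hatamlou's version and assumes positive cost values; that formula, the function signature, and the toy problem are assumptions of this sketch rather than details confirmed by the study.

```python
import math
import random

def black_hole(cost, bounds, n_stars=30, iterations=200, seed=0):
    """Return (best position, best cost) found by a basic black hole search."""
    rng = random.Random(seed)

    def new_star():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    stars = [new_star() for _ in range(n_stars)]
    costs = [cost(s) for s in stars]
    for _ in range(iterations):
        bh = min(range(n_stars), key=costs.__getitem__)  # best star is the black hole
        total = sum(costs)
        radius = costs[bh] / total if total > 0 else 0.0
        for i in range(n_stars):
            if i == bh:
                continue
            # Pull the star a random fraction of the way toward the black hole.
            stars[i] = [x + rng.random() * (b - x)
                        for x, b in zip(stars[i], stars[bh])]
            costs[i] = cost(stars[i])
            if costs[i] < costs[bh]:
                bh = i  # the star becomes the new black hole
            elif math.dist(stars[i], stars[bh]) < radius:
                # The star crossed the radius: destroy it and create a new one.
                stars[i] = new_star()
                costs[i] = cost(stars[i])
    bh = min(range(n_stars), key=costs.__getitem__)
    return stars[bh], costs[bh]

sphere_cost = lambda x: sum(v * v for v in x)
bh_pos, bh_val = black_hole(sphere_cost, [(-5.0, 5.0), (-5.0, 5.0)])
print(bh_pos, bh_val)
```

The random regeneration of absorbed stars is the only source of diversification here, which is why the algorithm has no inertia or acceleration coefficients to tune.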

PROBLEM STATEMENT AND FORMULATION
This paper evaluates a problem faced during the maintenance program development, namely how to harmonize and optimize the tasks that originate from the MRB and safety assessment processes described in section 3. The problem was partially reviewed by Deschamps (DESCHAMPS and CATTEL, 2014), where the authors point out important topics to be considered in the analysis. In addition, this paper evaluates the system configuration and component interfaces, as well as the probability of failure and the costs of corrective maintenance.
The interval optimization should weigh the cost of preventive maintenance against the cost of system failure and corrective actions. Figure 3 shows a typical change in the total cost as a function of the frequency of the preventive task. The intervals should respect the limits established by the maintenance and certification requirements. They should also comply with the maximum allowable probability of system failure, defined according to the consequences of failure.
The total cost of a single component is the sum of preventive maintenance cost (Cp), and corrective maintenance cost (Cr) predicted for all maintenance interventions during its operational life (Figure 4).
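One simple way to express this sum is sketched below in Python. The closed form is an assumption of this sketch and is not the objective function of the study: it supposes an exponential failure distribution, restoration to an as-good-as-new state at each inspection, at most one corrective action per inspection cycle, and a fractional number of cycles over the operational life. The numerical values are hypothetical.

```python
import math

def component_cost(interval, life, failure_rate, cp, cr):
    """Expected preventive plus corrective cost of one component over its life."""
    n_cycles = life / interval                                   # inspection cycles
    p_fail_per_cycle = 1.0 - math.exp(-failure_rate * interval)  # F(interval)
    return n_cycles * (cp + cr * p_fail_per_cycle)

# Hypothetical item: 1e-4 failures/hour, 10,000-hour life, Cp = 100, Cr = 1000.
for hours in (500.0, 1000.0, 2000.0):
    print(hours, component_cost(hours, 10_000.0, 1e-4, 100.0, 1000.0))
```

Under these particular assumptions the cost of a single component only decreases as its interval grows, so it is the interval limits and the system failure probability requirement that prevent the intervals from being extended indefinitely.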

Figure 4 - Inspection Cycles
The cost of preventive maintenance includes the cost of labor and material needed for the inspection, while the expenses for corrective and repair actions include labor and material costs and, in addition, the cost of unexpected failure consequences: aircraft production loss, delays, more complicated maintenance actions, and eventually the cost of an accident due to multiple failures. The predicted cost of corrective maintenance between two consecutive inspections k-1 and k is proportional to the failure probability density function f(t). In summary, the problem consists in defining the task interval Tpi for each component that minimizes the objective function, subject to the following restrictions:
- The task interval Tpi should remain within the minimum and maximum limits defined by the system requirements.
- Q(t) ≤ the maximum failure probability required.
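Metaheuristics usually handle restrictions of this kind by adding a penalty to the objective function. The Python sketch below shows one such penalized objective; the function name, the penalty weight, and the way the violations are scaled are assumptions of this sketch, and the cost and failure probability models are passed in as hypothetical callables.

```python
def penalized_objective(intervals, cost_fn, q_fn, bounds, q_max, penalty=1e9):
    """Total cost plus a large penalty for every violated restriction."""
    violation = 0.0
    for t, (t_min, t_max) in zip(intervals, bounds):
        # Distance outside the allowable interval range (zero if inside).
        violation += max(0.0, t_min - t) + max(0.0, t - t_max)
    # Relative excess of the system failure probability over its limit.
    violation += max(0.0, q_fn(intervals) - q_max) / q_max
    return cost_fn(intervals) + penalty * violation

# Hypothetical two-component example.
toy_total_cost = lambda ts: sum(1000.0 / t for t in ts)
toy_q = lambda ts: 1e-9 * max(ts)
limits = [(50.0, 300.0), (50.0, 300.0)]
feasible = penalized_objective([100.0, 200.0], toy_total_cost, toy_q, limits, 1e-6)
infeasible = penalized_objective([400.0, 200.0], toy_total_cost, toy_q, limits, 1e-6)
print(feasible, infeasible)
```

With a large enough penalty weight, any solution that breaks a restriction scores worse than every feasible one, so the search is steered back into the allowable region.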

TESTS AND RESULTS
The problem used as an example to evaluate the metaheuristics is a hypothetical fuel feeding system comprising a pump and two redundant valves that control the fuel flow in the feeding line. A monitoring and controlling computer manages the operation of the valves (Figure 5). For each component, the interval range and preventive maintenance costs adopted in the previous study (DESCHAMPS and CATTEL, 2014) were considered; they are presented in Table 2. Moreover, a corrective maintenance action cost was assumed. This study considers that the component failures follow an exponential distribution and that the failure event is logically represented by the fault tree in Figure 6. The tests were performed with the LOF-MH framework (SABA, 2019), with the problem model written in C++. This code file interfaces with the LOF-MH framework, which provides the means to test several metaheuristics and to help solve the task interval allocation problem. The tests were divided into two parts, as described in the next paragraphs.
• Part I - Single Solution Based Metaheuristics Tests
The best results in this part were found by the Simulated Annealing metaheuristic using an initial temperature parameter of 5000 and an alpha of 0.95. One test using the Tabu Search did not find an acceptable solution, since it did not comply with the failure probability restriction values.
Additionally, two more tests were done using the single solution-based metaheuristics with different parameter settings, as shown in Table 4.
• Part II - Population Based Metaheuristics Tests
The population-based metaheuristics performed better for this type of problem. The lowest objective function value of all tests was found by the Particle Swarm metaheuristic with the parameters shown in Table 5. The tests with the population-based metaheuristics rapidly converged to the minimum value of the objective function (fo), keeping the intervals at the maximum limits, except for component C. The results also show an optimization of the use of the safety margin (Figure 7).
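To give a feeling for these two parameters, the Python sketch below lists the temperatures visited by a geometric cooling schedule, in which each temperature is the previous one multiplied by alpha. Whether LOF-MH applies alpha in exactly this way is an assumption of this sketch, and the final temperature of 1.0 is a hypothetical stopping value.

```python
def cooling_schedule(t0, alpha, t_final):
    """Temperatures visited while cooling geometrically from t0 down to t_final."""
    temperatures = []
    t = t0
    while t > t_final:
        temperatures.append(t)
        t *= alpha
    return temperatures

temps = cooling_schedule(5000.0, 0.95, 1.0)
print(len(temps), temps[0], temps[-1])
```

An alpha closer to 1 lengthens the schedule and gives the search more time at each energy level, at the price of more objective function evaluations.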

CONCLUSION
The tests in this study show that metaheuristics can be used to help the maintenance engineer allocate the maintenance task intervals of complex system components. The case study considered a four-component system that was conceived only for this analysis. Nevertheless, it captured many of the details faced during the development of a maintenance plan as part of the integrated logistic support activities. Besides improving the trade-off evaluation between preventive and corrective maintenance costs, the tests demonstrated the importance of jointly assessing the influence of component task intervals on the system safety margins and on the final cost of maintenance.
Regarding the metaheuristics, practically all of them were able to find an optimum solution in a time considered acceptable for the problem, but the Particle Swarm Optimization metaheuristic presented the best results in this research.
The model should be revised to include a more effective penalization in the objective function for better usage of the safety margin. Other factors that could influence the interval determination should also be considered to complement this research:
- Systems with components subject to aging (increasing failure rate), fitting a Weibull distribution
- Fleet utilization and missions
- Maintenance task packaging
- Inclusion of availability goals
- Dynamic change of component state
- Health monitoring capability
Complementary studies and metaheuristic tests will be performed, including the proposals mentioned in this paper.