ISO 26262-Compliant Safety Analysis Methods

Yuzhu Yang 杨玉柱 & Prof. Dr. Mirko Conrad

The development of safety-related electrical and electronic (E/E) systems in the automotive industry is predominantly associated with functional safety. A crucial aspect of developing functionally safe systems is performing safety analyses in compliance with the ISO 26262 standard, which provides recommendations for the methodology of these analyses. What analytical methods are available for automotive safety-related systems? How can these methods be classified and applied effectively to support ISO 26262-compliant product development? This article addresses these questions by introducing the analytical methods used in the development of safety-related automotive systems, along with best practices.

The Purpose of Safety Analyses

Starting with the purpose of safety analyses in the automotive industry: why is it necessary to conduct safety analyses when developing safety-related automotive E/E systems?

The ISO 26262 standard defines functional safety as the "absence of unreasonable risk due to hazards caused by malfunctioning behavior of E/E systems" [ISO 26262:2018]. Malfunctions in E/E systems are typically caused by two types of failures:

  • Systematic failure: A failure that is related in a deterministic way to a certain cause and that can only be eliminated by modifying the design, the manufacturing process, operating procedures, documentation, or other relevant factors.
  • Random hardware failure: A failure that can occur unpredictably during the lifetime of a hardware element and that follows a probability distribution.

Consequently, the aim of safety analyses is to ensure that the risk of safety goal violations due to systematic or random failures is sufficiently low.

It is important to note that, according to ISO 26262, the analysis of systematic failures does not estimate their probability of occurrence. Nevertheless, measures against systematic failures help reduce the overall risk of violating a safety goal or safety requirement.

Scope of Safety Analyses

The scope of safety analyses includes:

  • Validation of safety goals and/or safety concepts
  • Verification of safety concepts and/or safety requirements
  • Identification of conditions and causes (including faults/failures) that could lead to the violation of a safety goal or safety requirement
  • Identification of additional safety requirements for detection of faults/failures
  • Determination of the required responses to detected faults/failures
  • Identification of additional measures to verify that the safety goals or safety requirements are complied with

Implementation of Safety Analyses

Depending on the specific application, safety analyses can be conducted to:

  • Identify new hazards not previously identified during the HARA
  • Identify faults or failures that can lead to violations of safety goals or safety requirements
  • Identify potential causes of faults/failures
  • Support the definition of safety measures for fault prevention/fault control
  • Provide evidence for the applicability of safety concepts
  • Support the verification of safety concepts and safety requirements
  • Support the verification of design and test requirements

For the item under analysis, safety goals are first derived from the Hazard Analysis and Risk Assessment (HARA), and safety requirements are then established based on the safety concept. Next, potential faults or failures are considered, leading to additional safety requirements for detecting them. The responses to the detected faults or failures are then determined. Finally, additional measures are defined to verify whether the implemented safety measures meet the corresponding safety requirements and/or safety goals.

Introduction to Safety Analysis Methods

Qualitative and Quantitative Methods

1. Qualitative Safety Analysis Methods

The methods of qualitative safety analysis primarily include:

  • Qualitative Failure Mode and Effect Analysis (FMEA) at the system, design, or process level
  • Qualitative Fault Tree Analysis (FTA)
  • Hazard and Operability Analysis (HAZOP)
  • Qualitative Event Tree Analysis (ETA)

Qualitative analysis methods are particularly suitable for software safety analyses where no other specific methods are appropriate.

2. Quantitative Safety Analysis Methods

Quantitative safety analysis methods complement qualitative ones. They are primarily used to verify the hardware design against defined targets for the hardware architectural metrics and for the evaluation of safety goal violations due to random hardware failures (please refer to ISO 26262:2018, 5 and 8). Quantitative safety analysis requires additional information on the failure rates of the hardware components.

The methods of quantitative safety analyses primarily include:

  • Quantitative FMEA
  • Quantitative FTA
  • Quantitative ETA
  • Markov models
  • Reliability Block Diagrams (RBDs)

3. Differences and Connections Between Quantitative and Qualitative Analyses

This section clarifies the relationship and differences between quantitative and qualitative analyses.

3.1 Differences Between Quantitative and Qualitative Analyses

The key difference is that quantitative analyses predict the frequency of failures, whereas qualitative analyses identify failures but do not predict their frequency. Qualitative safety analysis methods are general and can be applied at the system, hardware, and software levels. Conducting a quantitative safety analysis requires additional knowledge of the failure rates of the relevant hardware components. In ISO 26262, quantitative analyses are used to verify the hardware design against the targets for the hardware architectural metrics and for the evaluation of safety goal violations due to random hardware failures.

3.2 Connections Between Quantitative and Qualitative Analyses

Both methods rely on an understanding of the relevant failure types or failure modes. Quantitative safety analyses complement qualitative analyses, and the two methods should be used in combination in engineering applications.

Inductive and Deductive Analyses

In addition to qualitative and quantitative methods, safety analysis methods can also be categorized by their approach as inductive and deductive analyses.

1. Introduction to Inductive and Deductive Analyses

Inductive analyses are bottom-up methods: they start from known causes and work upward to determine the potential consequences of these causes, thereby identifying possible failures. In contrast, deductive analysis is a top-down approach that begins with known consequences and then seeks out their possible causes.

Common Safety Analysis Methods

There are various safety analysis methods used in engineering applications. For example, FMEA and FTA are two common methods for analyzing faults and failures of items and components within the ISO 26262 framework. If the system under development has specific Automotive Safety Integrity Level (ASIL) requirements, Failure Modes, Effects and Diagnostic Coverage Analysis (FMEDA) is typically performed. ETA and RBD can also be applied in safety analyses.

Figure 1. FMEA Handbook

Failure Mode and Effects Analysis (FMEA)

Failure Mode and Effects Analysis (FMEA) is one of the earliest fault analysis techniques. It was developed by reliability engineers in the late 1940s to study possible malfunctions of military systems and was later adopted by the automotive industry in the 1970s. The most widely used procedures today are documented in the handbook published by the Automotive Industry Action Group (AIAG) and the German Association of the Automotive Industry (VDA) in 2019 (see Figure 1), which supports suppliers in their development work.

The handbook was developed by subject matter experts (SMEs) from Original Equipment Manufacturers (OEMs) and tier-1 suppliers. It integrates best practices from AIAG and VDA into a structured methodology covering Design FMEA, Process FMEA, and a supplement on FMEA for Monitoring and System Response (FMEA-MSR). Primarily targeting technical risks, FMEA serves as an analytical method for preventive quality management and monitoring in product design and manufacturing processes.

Figure 2. FMEA Diagram, Bottom-up Approach - ISO 26262-10:2018(E)

The primary characteristic of FMEA is that it starts by analyzing the failure causes of each architectural component, then derives their effects on the overall system, and finally develops improvement measures for potentially unacceptable failures. In typical automotive applications, FMEA can be performed qualitatively or quantitatively to analyze failures and malfunctions in safety system designs. Generally implemented as an inductive (bottom-up) approach, FMEA focuses on how failures occur within system components and how these failures affect the overall system.
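The bottom-up reasoning can be sketched in a few lines of Python: each component failure mode is traced upward through its chain of effects until a system-level effect is reached, and failure modes whose effect would violate a safety goal are flagged. This is a minimal illustration only; the components, failure modes, effect chain, and hazardous effects are hypothetical, and a real FMEA worksheet records much more (e.g., severity, existing controls, and actions).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureMode:
    component: str  # architectural element under analysis
    mode: str       # how the element fails

# Hypothetical local effect of each component failure mode.
LOCAL_EFFECT = {
    FailureMode("wheel speed sensor", "stuck-at"): "stuck speed signal",
    FailureMode("brake ECU output stage", "open circuit"): "no brake request",
    FailureMode("status LED", "degraded"): "dim status LED",
}

# Hypothetical next-higher effect of each effect; None marks a system-level effect.
NEXT_EFFECT = {
    "stuck speed signal": "wrong wheel speed estimate",
    "wrong wheel speed estimate": "unintended braking",
    "no brake request": "loss of braking",
    "unintended braking": None,
    "loss of braking": None,
    "dim status LED": None,
}

# Hypothetical system-level effects that would violate a safety goal.
HAZARDOUS = {"unintended braking", "loss of braking"}

def system_effect(fm: FailureMode) -> str:
    """Inductive step: follow the effect chain upward to the system level."""
    effect = LOCAL_EFFECT[fm]
    while NEXT_EFFECT[effect] is not None:
        effect = NEXT_EFFECT[effect]
    return effect

def needs_safety_measure(fm: FailureMode) -> bool:
    """Flag failure modes whose system-level effect is hazardous."""
    return system_effect(fm) in HAZARDOUS

for fm in LOCAL_EFFECT:
    note = "(safety measure needed)" if needs_safety_measure(fm) else ""
    print(f"{fm.component} / {fm.mode} -> {system_effect(fm)} {note}")
```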

Figure 3. FTA Diagram, Top-down Approach - ISO 26262 - 10:2018(E)

Failure Modes, Effects and Diagnostic Coverage Analysis (FMEDA)

The Failure Modes, Effects and Diagnostic Coverage Analysis (FMEDA) method was initially developed by exida in the 1990s, and was adopted as a recommended analysis method in the ISO 26262 functional safety standard in 2011. FMEDA can be seen as a quantitative extension of the FMEA, as it considers quantitative failure rates of hardware components. This includes the failure rates and the distributions of failure modes for these components, while also taking into account the safety mechanisms for the corresponding failure modes and their diagnostic coverage to detect critical failure modes.

FMEDA is mainly used during the hardware architectural design and hardware detailed design phases. At the hardware design level, it is used to calculate the hardware architectural metrics, namely the Single-Point Fault Metric (SPFM) and the Latent Fault Metric (LFM). Applying FMEDA iteratively helps improve the hardware design.

Fault Tree Analysis (FTA)

Fault Tree Analysis (FTA) was developed by Bell Labs in the early 1960s to evaluate the launch systems of ballistic missiles. The method was later standardized by the International Electrotechnical Commission (IEC) in 2006 and is referenced in automotive industry standards such as ISO 26262 as a potential or recommended analysis method.

FTA can be performed qualitatively or quantitatively. For example, a qualitative fault analysis can be extended with quantitative data (e.g., failure rates) to strengthen the analysis, resulting in a quantitative variant of FTA.

In contrast to FMEA, FTA is a deductive (top-down) method (see Figure 3) that identifies the base events, or combinations of base events, that can lead to a defined top event. The top event is typically an undesired system event that violates a safety goal or a safety requirement derived from a safety goal.

FTA starts with the undesired top event and progressively builds a graphical tree structure beneath it. The interaction of the potential causes of the undesired event is represented by Boolean logic gates such as AND, OR, and NOT. The quantitative variant of FTA can be used to calculate the third hardware metric, the Probabilistic Metric for random Hardware Failures (PMHF), and is also a method recommended in ISO 26262.
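The gate logic described above can be illustrated with a small evaluator. The sketch evaluates a tree both qualitatively (does the top event occur for a given combination of base events?) and quantitatively (top-event probability). The quantitative part assumes that all base events are statistically independent and that each base event appears only once in the tree; real FTA tools handle repeated events through minimal cut sets, and a PMHF calculation additionally needs time-dependent failure probabilities, neither of which this sketch covers. The event names and probabilities are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Union

@dataclass
class Gate:
    kind: str             # "AND", "OR" or "NOT"
    inputs: List["Node"]  # child gates or base-event names

Node = Union[str, Gate]   # a plain string names a base event

def occurs(node: Node, state: Dict[str, bool]) -> bool:
    """Qualitative evaluation of the Boolean gate logic."""
    if isinstance(node, str):
        return state[node]
    values = [occurs(child, state) for child in node.inputs]
    if node.kind == "AND":
        return all(values)
    if node.kind == "OR":
        return any(values)
    if node.kind == "NOT":
        return not values[0]
    raise ValueError(f"unknown gate {node.kind}")

def probability(node: Node, p: Dict[str, float]) -> float:
    """Quantitative evaluation, assuming independent, non-repeated base events."""
    if isinstance(node, str):
        return p[node]
    probs = [probability(child, p) for child in node.inputs]
    if node.kind == "AND":
        return prod(probs)
    if node.kind == "OR":
        return 1.0 - prod(1.0 - q for q in probs)
    if node.kind == "NOT":
        return 1.0 - probs[0]
    raise ValueError(f"unknown gate {node.kind}")

# Hypothetical tree: top event = E1 OR (E2 AND E3)
top = Gate("OR", ["E1", Gate("AND", ["E2", "E3"])])
print(occurs(top, {"E1": False, "E2": True, "E3": True}))  # True
print(probability(top, {"E1": 1e-4, "E2": 1e-2, "E3": 1e-3}))
```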

Figure 4. Classification and Integration of Analytical Methods

Comprehensive Application of Safety Analysis Methods

In practical engineering applications, inductive and deductive methods can be combined, as shown in the classification scheme in Figure 4. In the development of safety-critical E/E systems, combining top-down methods (such as FTA) with bottom-up methods (such as FMEA) makes it possible to identify detailed failure modes of semiconductor components, which can then be used at the component level. Starting from a lower level of abstraction, the failure distribution of semiconductor components can be assessed quantitatively and precisely, with the distribution itself based on qualitative distribution assumptions.

Figure 5. FTA and FMEA Combined Analyses Diagram - ISO 26262 - 10:2018(E)

E/E systems consist of numerous components and sub-components. FTA and FMEA can be combined into a complementary safety analysis that balances top-down and bottom-up approaches. Figure 5 illustrates a possible combination of FTA and FMEA. The base events in the figure originate from different FMEAs (labeled FMEA A-E in this example), which are conducted at lower levels of abstraction (such as sub-components, components, or modules). In this case, base events 1 and 2 are derived from failures identified in FMEA D, while failures from FMEA B are not used in the FTA.
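Keeping an explicit link between each base event and the FMEA it comes from makes the combined analysis traceable in both directions. In the minimal sketch below, only two facts are taken from Figure 5 (base events 1 and 2 come from FMEA D, and FMEA B is not used); the other assignments are hypothetical placeholders.

```python
from typing import Dict, List, Set

# Base event -> source FMEA. Events 3-5 and their sources are hypothetical.
BASE_EVENT_SOURCE: Dict[str, str] = {
    "base event 1": "FMEA D",
    "base event 2": "FMEA D",
    "base event 3": "FMEA A",
    "base event 4": "FMEA C",
    "base event 5": "FMEA E",
}
ALL_FMEAS: Set[str] = {"FMEA A", "FMEA B", "FMEA C", "FMEA D", "FMEA E"}

def base_events_from(fmea: str) -> List[str]:
    """Which base events of the fault tree are fed by this FMEA."""
    return sorted(e for e, src in BASE_EVENT_SOURCE.items() if src == fmea)

def fmeas_not_used_in_fta() -> Set[str]:
    """FMEAs whose results feed no base event (candidates for a relevance review)."""
    return ALL_FMEAS - set(BASE_EVENT_SOURCE.values())

print(base_events_from("FMEA D"))  # ['base event 1', 'base event 2']
print(fmeas_not_used_in_fta())     # {'FMEA B'}
```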

Figure 6. Safety Analyses in the Safety Life Cycle

Safety Analysis Methods in the Safety Life Cycle

Mapping of Safety Analyses and Safety Life Cycle

The ISO 26262 safety life cycle covers the key safety activities during the concept phase, product development, production, operation, service, and decommissioning. As a crucial part of the product development process, safety analyses must be performed at the system, hardware, and software levels. The level of detail of the failure model used in a safety analysis depends on the level of detail available in the corresponding development sub-phase and is consistent within that sub-phase (see Figure 6). For example, during the concept phase, safety analyses are performed on the initial architecture at the appropriate level of abstraction. In the product development phase, the necessary level of detail may depend on the specific analysis phase and on the safety mechanisms applied.

Figure 7. Safety Analyses in the Hardware Design Phase of the Safety Life Cycle

Safety analyses are typically associated with design activities in the concept, system development, and hardware development phases, such as architectural design and the integration and verification of system hardware (see Figure 7). Similarly, in the software development phase, safety analyses are linked to software development activities such as software architectural design, unit design, and verification (see Figure 8).

Figure 8. Safety Analyses in the Software Design Phase of the Safety Life Cycle

Safety Analyses in the Concept Phase

In the concept phase, ISO 26262 recommends qualitative safety analyses to support the derivation of valid functional safety requirements, explicitly mentioning FMEA, FTA, and HAZOP as suitable methods. In the technical safety concept of system development, a qualitative safety analysis of the system architectural design should first be conducted to provide evidence for the suitability of the system design in providing the specified safety-related functions and properties. This includes, for example, analyzing requirements for independence or freedom from interference (FFI) within or between system components, and identifying the causes of failures and the effects of faults. Moreover, safety analyses can identify new safety-related system elements and interfaces or, if these have already been defined, confirm them. Finally, safety analyses support the design specification and verify the effectiveness of safety mechanisms based on the identified causes and effects of failures.

Considering the potential adverse effects of safety of the intended functionality (SOTIF) and cybersecurity issues on functional safety contributes to the holistic development of safe E/E systems. Similar considerations apply to the subsequent development phases; however, SOTIF and cybersecurity are beyond the scope of this article.

Figure 9. Failure Classification of Safety-Related Hardware Components for Relevant Items - ISO 26262 - 5:2018(E)

Safety Analyses in Hardware Design Phase

1. Qualitative Safety Analyses in Hardware Design Phase

In the hardware design phase, various safety analysis techniques are applied in combination. One aspect is the qualitative safety analysis of the hardware design. For example, qualitative FTA helps identify the causes of failures and the effects of faults. For safety-related hardware components/parts, qualitative FMEA helps identify the different types of faults, in particular safe faults, single-point or residual faults, and multiple-point faults. In line with the recommendations of ISO 26262, a combination of deductive and inductive analysis methods should be used.

2. Quantitative Safety Analyses in Hardware Design Phase

On the other hand, the assessment of random hardware failures requires quantitative safety analyses of the hardware design. These analyses help evaluate or calculate the metrics related to the hardware architectural design: the Single-Point Fault Metric (SPFM), the Latent Fault Metric (LFM), and the Probabilistic Metric for Random Hardware Failures (PMHF). Typically, FMEDA is used for these quantitative analyses to evaluate whether the hardware architectural design is suitable for detecting and controlling safety-related random hardware failures. This is done by analyzing scenarios in which safety goals are violated due to random hardware failures and then calculating the corresponding metrics for the hardware architectural design.

2.1. Failure classification of safety-related hardware components

Failures occurring in safety-related hardware components should be categorized as:

a) Single point fault

b) Residual fault

c) Multiple point fault

d) Safe fault

Multiple-point faults are further differentiated into latent, detected, and perceived faults. The resulting classification of failures of safety-related hardware components is shown in Figure 9.

Figure 10. Classification of Failure Modes for Hardware Elements

In this classification:

- Distance n indicates the number of independent faults that must be present simultaneously to violate a safety goal (single-point or residual faults: n = 1; dual-point faults: n = 2; etc.).

- Faults at distance n are located between ring n and ring n-1.

- Unless they are explicitly relevant to the technical safety concept, multiple-point faults with a distance strictly greater than 2 are considered safe faults.
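The distance rule can be expressed as a small decision function. This is only a sketch: the parameter `relevant_in_tsc` is a hypothetical flag standing for an explicit relevance statement in the technical safety concept, and distinguishing single-point from residual faults, or latent from detected/perceived dual-point faults, needs the additional coverage information discussed in the following subsections.

```python
def classify_by_distance(n: int, relevant_in_tsc: bool = False) -> str:
    """Classify a fault by its distance n, i.e., the number of independent faults
    that must be present simultaneously to violate a safety goal."""
    if n < 1:
        raise ValueError("distance n must be at least 1")
    if n == 1:
        return "single-point or residual fault"
    if n == 2:
        return "dual-point fault"
    # n > 2: safe unless the technical safety concept explicitly says otherwise
    return f"{n}-point fault (relevant)" if relevant_in_tsc else "safe fault"

print(classify_by_distance(1))  # single-point or residual fault
print(classify_by_distance(3))  # safe fault
```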

Note that transient faults for which a safety mechanism restores the affected item to a fault-free state are considered detected multiple-point faults, even if the driver is never informed of their presence. For instance, when error-correcting codes (ECC) protect memory against transient faults, the safety mechanism not only provides the CPU with the corrected value but also repairs the flipped bits in the memory array (e.g., by writing back the corrected value), thereby returning the affected item to a fault-free state.

2.1.1. Single Point Fault

A fault in a hardware element that is not covered by any safety mechanism and that directly leads to a violation of a safety goal. An example is an unmonitored resistor with at least one failure mode (e.g., open circuit) that can violate a safety goal.

2.1.2. Residual Fault

The portion of a fault in a hardware element that is not covered by the element's safety mechanism(s) and that can by itself lead to a violation of a safety goal. For example, if a random access memory (RAM) block is checked only by a checkerboard RAM test, certain kinds of bridging faults are not detected, and the resulting safety goal violations are not covered by the safety mechanism. Such faults, which remain because the diagnostic coverage of the safety mechanism is less than 100%, are called residual faults.

2.1.3. Detected Two-Point Faults

A fault that is detected by a safety mechanism, which prevents it from remaining latent, and that can only lead to a violation of a safety goal in combination with another independent hardware fault (i.e., a two-point fault). For example, in a flash memory protected by a parity check, a single-bit fault is detected and, according to the technical safety concept, triggers a response such as shutting down the system and informing the driver through a warning light.
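A minimal sketch of the parity check in this example, assuming one even-parity bit stored per data word (real flash controllers may use other codes). It shows how a single-bit fault is detected, and also why such a mechanism has limited diagnostic coverage: an even number of bit flips in the same word goes unnoticed. The reaction itself (shutdown, warning light) is not modeled.

```python
def even_parity_bit(word: int) -> int:
    """Parity bit that makes the total number of 1-bits (data + parity) even."""
    return bin(word).count("1") % 2

def parity_ok(word: int, stored_parity: int) -> bool:
    """Safety mechanism: recompute the parity on read and compare it with the stored bit."""
    return even_parity_bit(word) == stored_parity

data = 0b1011_0010
parity = even_parity_bit(data)    # stored alongside the data when it is written

corrupted = data ^ (1 << 4)       # single-bit fault flips bit 4
double_flip = data ^ 0b11         # two bit flips in the same word

print(parity_ok(data, parity))         # True  (no fault)
print(parity_ok(corrupted, parity))    # False (single-bit fault detected)
print(parity_ok(double_flip, parity))  # True  (two flips go undetected)
```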

2.1.4. Perceived Two-Point Faults

A fault that is perceived by the driver within a specified time, whether or not it is detected by a safety mechanism, and that can only lead to a violation of a safety goal in combination with another independent hardware fault (i.e., a two-point fault). An example is a two-point fault whose consequences clearly and distinctly affect the function, so that the driver can perceive it.

2.1.5. Latent Two-Point Faults

A fault that is neither detected by a safety mechanism nor perceived by the driver. The system remains operational, without the driver being informed, until a second independent hardware fault occurs.

For example, consider a flash memory protected by ECC: the ECC corrects the value of a single-bit permanent fault during reading, but the correction is not written back to the flash memory and no signal is raised. In this case, the fault cannot lead to a violation of a safety goal (because the faulty bit is corrected), but it is neither detected (no signal indicates the single-bit fault) nor perceived (it does not affect the functionality of the application). If an additional fault occurs in the ECC logic, control over the single-bit fault can be lost, potentially leading to a safety goal violation.

2.1.6. Safe Faults

Safe faults include the following two categories:

a) All n-point faults with n > 2, unless the technical safety concept shows that they are relevant contributors to a safety goal violation; or

b) Faults that do not lead to a violation of a safety goal.

An example is a single-bit fault in a flash memory protected by ECC and a cyclic redundancy check (CRC): the fault is corrected by the ECC but not signaled. The ECC prevents the fault from violating the safety goal, although it does not signal it. If the ECC logic fails, the CRC detects the fault and the system shuts down. The safety goal is violated only if a single-bit fault exists in the flash memory, the ECC logic fails, and the CRC checksum and its monitoring also fail (n = 3).

2.2. Failure Modes and Failure Rates of Hardware Elements

2.2.1. Failure Modes of Hardware Elements

According to the fault classification model, the failure modes of hardware elements are categorized as shown in Figure 10.

Figure 11. Flowchart for Failure Mode Classification

2.2.2. Hardware Element Failure Mode Classification Process

The failure mode classification process is shown in Figure 11.

The following failure rates are used in this classification:

λSPF is the failure rate associated with single-point faults in hardware elements

λRF is the failure rate associated with residual faults in hardware elements

λMPF is the failure rate associated with multi-point faults in hardware elements

λS is the failure rate associated with safe faults in hardware elements

The failure rate associated with multi-point faults in hardware elements, λMPF, can be expressed by Equation (1-1):

$$\lambda_{MPF} = \lambda_{MPF,DP} + \lambda_{MPF,L} \qquad (1\text{-}1)$$

where:

λMPF,DP is the failure rate associated with detected or perceived multi-point faults in hardware elements

λMPF,L is the failure rate associated with latent multi-point faults in hardware elements
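The classification flow of Figure 11 can be approximated per failure mode as shown below. This is a simplified sketch, not the normative procedure: the row fields (`safe_fraction`, `dc_rf`, `dc_lf`) and the way the coverage values are applied are assumptions modeled on common FMEDA practice. The split does, however, respect λ = λSPF + λRF + λMPF + λS and Equation (1-1).

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class FailureModeRow:
    """One FMEDA row of a safety-related hardware element (hypothetical fields).
    Failure rates are in FIT (failures per 1e9 hours)."""
    lam: float              # failure rate of this failure mode
    safe_fraction: float    # share that cannot violate a safety goal, even combined with other faults
    direct: bool            # can violate a safety goal on its own if no safety mechanism acts
    dc_rf: Optional[float]  # coverage of the mechanism preventing direct violation; None = no mechanism
    dc_lf: float            # coverage against latency (detected by a mechanism or perceived by the driver)

def classify(row: FailureModeRow) -> Dict[str, float]:
    """Split the failure rate into the fault classes (simplified)."""
    lam_s = row.lam * row.safe_fraction
    rest = row.lam - lam_s
    lam_spf = lam_rf = lam_mpf = 0.0
    if row.direct and row.dc_rf is None:
        lam_spf = rest                       # no mechanism: single-point fault
    elif row.direct:
        lam_rf = rest * (1.0 - row.dc_rf)    # portion not covered: residual fault
        lam_mpf = rest * row.dc_rf           # covered portion: multiple-point fault
    else:
        lam_mpf = rest                       # can only violate together with another fault
    lam_mpf_l = lam_mpf * (1.0 - row.dc_lf)  # latent multiple-point faults
    lam_mpf_dp = lam_mpf - lam_mpf_l         # detected/perceived, so that Equation (1-1) holds
    return {"SPF": lam_spf, "RF": lam_rf, "MPF,DP": lam_mpf_dp, "MPF,L": lam_mpf_l, "S": lam_s}

row = FailureModeRow(lam=100.0, safe_fraction=0.3, direct=True, dc_rf=0.99, dc_lf=0.9)
print(classify(row))
```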

2.3. Hardware Architecture Metrics

Hardware architecture metrics are used to assess the effectiveness of the item's architecture in coping with random hardware failures.

The goals of hardware architecture metrics are:

  • Objectively assessable: the metrics are verifiable and precise enough to distinguish between different architectures
  • Support the evaluation of the final design (with precise calculations based on the detailed hardware design)
  • Provide ASIL-dependent pass/fail criteria for the hardware architecture
  • Indicate whether the coverage provided by safety mechanisms is adequate to prevent the risk of single-point or residual faults in the hardware architecture (single-point fault metric)
  • Indicate whether the coverage provided by safety mechanisms is adequate to protect against the risk of latent faults in the hardware architecture (latent fault metric)
  • Address single-point faults, residual faults, and latent faults
  • Ensure the robustness of the hardware architecture
  • Be limited to safety-related elements only
  • Support application at different element levels, e.g., assigning target values to supplier hardware elements such as microcontrollers or ECUs to facilitate distributed development

2.3.1. Single-Point Fault Metric

The single-point fault metric reflects the robustness of the item to single-point and residual faults, achieved either through coverage by safety mechanisms or through design (primarily safe faults). A high single-point fault metric indicates that the proportion of single-point and residual faults in the hardware of the item is low.

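For hardware designs with safety goals rated ASIL (B), C, or D, Equation (1-2) is used to determine the single-point fault metric:

$$M_{SPFM} = 1 - \frac{\sum_{SR,HW}(\lambda_{SPF} + \lambda_{RF})}{\sum_{SR,HW}\lambda} = \frac{\sum_{SR,HW}(\lambda_{MPF} + \lambda_{S})}{\sum_{SR,HW}\lambda} \qquad (1\text{-}2)$$

where the sums run over the safety-related hardware elements of the item, and λ = λSPF + λRF + λMPF + λS is the total failure rate of an element.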

Figure 12. Graphical Representation of Single-Point Fault Metrics - ISO 26262-5:2018(E)

Only the safety-related hardware elements of the item are considered. Safe faults and multiple-point faults of order n > 2 may be omitted from the calculation unless they are explicitly relevant in the technical safety concept. A graphical representation of the single-point fault metric is shown in Figure 12.
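A minimal sketch of Equation (1-2), assuming the per-element failure rates have already been classified (for instance in an FMEDA) and that only safety-related elements are passed in. The dictionary keys and the example values are hypothetical.

```python
from typing import Iterable, Mapping

def spfm(rows: Iterable[Mapping[str, float]]) -> float:
    """Single-point fault metric, Equation (1-2).

    Each entry holds the classified failure rates (same unit, e.g. FIT) of one
    safety-related hardware element under the keys "SPF", "RF", "MPF" and "S".
    """
    rows = list(rows)
    total = sum(r["SPF"] + r["RF"] + r["MPF"] + r["S"] for r in rows)
    if total == 0:
        raise ValueError("total failure rate must be positive")
    return 1.0 - sum(r["SPF"] + r["RF"] for r in rows) / total

# Hypothetical FMEDA results for two safety-related elements (FIT)
elements = [
    {"SPF": 0.0, "RF": 0.5, "MPF": 60.0, "S": 39.5},
    {"SPF": 1.0, "RF": 0.0, "MPF": 20.0, "S": 29.0},
]
print(f"SPFM = {spfm(elements):.2%}")  # SPFM = 99.00%
```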

2.3.2. Latent Fault Metric

The latent fault metric reflects the robustness of the item to latent faults, achieved either through coverage of faults in safety mechanisms, through the driver recognizing the presence of a fault before a safety goal is violated, or through design (primarily safe faults). A high latent fault metric implies that the proportion of latent faults in the hardware is low.

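For hardware designs with safety goals rated ASIL (B), C, or D, Equation (1-3) is used to determine the latent fault metric:

$$M_{LFM} = 1 - \frac{\sum_{SR,HW}\lambda_{MPF,L}}{\sum_{SR,HW}(\lambda - \lambda_{SPF} - \lambda_{RF})} \qquad (1\text{-}3)$$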

Figure 13. Graphical Representation of Latent Fault Metrics - ISO 26262-5:2018(E)

Only the safety-related hardware elements of the item are considered. Safe faults and multiple-point faults of order n > 2 may be omitted from the calculation unless they are explicitly relevant in the technical safety concept. A graphical representation of the latent fault metric is shown in Figure 13.
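Equation (1-3) can be sketched the same way. Because λ = λSPF + λRF + λMPF,DP + λMPF,L + λS, the denominator λ - λSPF - λRF equals the sum of the multiple-point and safe fault rates, which is what the function adds up. The keys and example values are hypothetical.

```python
from typing import Iterable, Mapping

def lfm(rows: Iterable[Mapping[str, float]]) -> float:
    """Latent fault metric, Equation (1-3).

    Each entry holds the failure rates (same unit, e.g. FIT) of one safety-related
    hardware element under the keys "MPF,DP", "MPF,L" and "S"; their sum equals
    lambda - lambda_SPF - lambda_RF for that element.
    """
    rows = list(rows)
    denominator = sum(r["MPF,DP"] + r["MPF,L"] + r["S"] for r in rows)
    if denominator == 0:
        raise ValueError("denominator must be positive")
    return 1.0 - sum(r["MPF,L"] for r in rows) / denominator

# Hypothetical FMEDA results for two safety-related elements (FIT)
rows = [
    {"MPF,DP": 54.0, "MPF,L": 6.0, "S": 39.5},
    {"MPF,DP": 18.0, "MPF,L": 2.0, "S": 29.0},
]
print(f"LFM = {lfm(rows):.2%}")  # LFM = 94.61%
```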

Table 1. Hardware Architecture Design Metrics and Standard Requirements

2.3.3 Probabilistic Metric for Random Hardware Failures (PMHF)

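An estimate of the PMHF value can be calculated with Equation (1-4):

$$\mathrm{PMHF}_{est} = \lambda_{SPF} + \lambda_{RF} + \lambda_{DPF,det} \times \lambda_{DPF,latent} \times T_{lifetime} \qquad (1\text{-}4)$$

where λDPF,det and λDPF,latent are the failure rates of the detected and the latent dual-point faults (DPF), respectively, and Tlifetime is the vehicle lifetime.

The contribution of each failure mode to the total PMHF value is then calculated as a percentage.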
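Equation (1-4) mixes failure rates and a time, so units need care. The sketch below takes all failure rates in FIT (1 FIT = 10^-9 failures per hour) and the lifetime in hours, returns the estimate in FIT, and computes the percentage contribution of each term. The failure rates and the 10,000 h lifetime are hypothetical example values, not figures from the standard.

```python
from typing import Dict

FIT = 1e-9  # 1 FIT = 1e-9 failures per hour

def dual_point_term_fit(lam_dpf_det: float, lam_dpf_latent: float, t_lifetime_h: float) -> float:
    """Dual-point term of Equation (1-4): (1/h) * (1/h) * h = 1/h, returned in FIT."""
    return (lam_dpf_det * FIT) * (lam_dpf_latent * FIT) * t_lifetime_h / FIT

def pmhf_est_fit(lam_spf: float, lam_rf: float, lam_dpf_det: float,
                 lam_dpf_latent: float, t_lifetime_h: float) -> float:
    """Equation (1-4) with rates in FIT and the lifetime in hours; result in FIT."""
    return lam_spf + lam_rf + dual_point_term_fit(lam_dpf_det, lam_dpf_latent, t_lifetime_h)

def contributions_percent(terms_fit: Dict[str, float]) -> Dict[str, float]:
    """Percentage contribution of each term (e.g., each failure mode) to the total."""
    total = sum(terms_fit.values())
    return {name: 100.0 * value / total for name, value in terms_fit.items()}

# Hypothetical values: rates in FIT, lifetime in hours
pmhf = pmhf_est_fit(2.0, 3.0, lam_dpf_det=100.0, lam_dpf_latent=100.0, t_lifetime_h=10_000)
print(round(pmhf, 3))  # 5.1
terms = {"single-point": 2.0, "residual": 3.0,
         "dual-point": dual_point_term_fit(100.0, 100.0, 10_000)}
print(contributions_percent(terms))
```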

2.4 Hardware Architecture Metrics Target Values

For each hardware architecture metric, the standard provides target values (as shown in Table 1), which typically depend on the highest ASIL that the hardware design must meet. For ASIL A, the standard does not recommend target values; for ASIL D, it recommends the most stringent target values; and in some cases for ASIL B and ASIL C, the target values are recommendations rather than mandatory requirements in the sense of the standard.
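Checking computed metrics against the targets can be automated as sketched below. The target values used are those given in ISO 26262-5:2018 for ASIL B, C, and D (SPFM at least 90%, 97%, 99%; LFM at least 60%, 80%, 90%; PMHF below 100 FIT, 100 FIT, 10 FIT). The function and dictionary names are hypothetical, and the distinction between recommended and mandatory targets is not modeled.

```python
from typing import Dict

# Targets per ASIL from ISO 26262-5:2018 (PMHF in FIT, i.e. 1e-9 per hour).
TARGETS: Dict[str, Dict[str, float]] = {
    "B": {"spfm": 0.90, "lfm": 0.60, "pmhf_fit": 100.0},
    "C": {"spfm": 0.97, "lfm": 0.80, "pmhf_fit": 100.0},
    "D": {"spfm": 0.99, "lfm": 0.90, "pmhf_fit": 10.0},
}

def check_targets(asil: str, spfm: float, lfm: float, pmhf_fit: float) -> Dict[str, bool]:
    """Pass/fail per metric; an empty result means no targets are given (e.g., ASIL A)."""
    target = TARGETS.get(asil)
    if target is None:
        return {}
    return {
        "spfm": spfm >= target["spfm"],
        "lfm": lfm >= target["lfm"],
        "pmhf": pmhf_fit < target["pmhf_fit"],
    }

print(check_targets("D", spfm=0.99, lfm=0.946, pmhf_fit=5.1))
# {'spfm': True, 'lfm': True, 'pmhf': True}
```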

Figure 14. Temporal Interference Leading to Cascading Failures - ISO 26262-6:2018(E)

Safety Analysis in the Software Design Phase

1. Safety-Related Functions and Attributes

The derivation of software safety requirements should consider the safety-related functionality and safety-related attributes required by the software. Failures in safety-related functions or safety-related attributes may lead to violations of the technical safety requirements assigned to the software.

Safety-related functions of software typically include:

  • Functions that enable safe execution of the nominal function
  • Functions that enable the system to achieve or maintain a safe or degraded state
  • Functions related to detecting, indicating, and mitigating failures of safety-related hardware elements
  • Self-testing or monitoring functions related to detecting, indicating, and mitigating failures in the operating system, underlying software, or the application software itself
  • Functions related to onboard or offboard testing during production, operation, servicing, and end-of-life stages
  • Functions that allow modification of software during production and service
  • Functions related to performance or time-critical operations

Safety-related attributes typically include:

  • Robustness to erroneous inputs
  • Independence or non-interference between different functions
  • Fault tolerance of the software, etc.

2. Safety Analysis in the Design Phase of Software Architecture

During the software architectural design phase, corresponding software safety analysis activities should be conducted, which the standard refers to as safety-oriented analysis. Safety-oriented analysis is essentially a qualitative analysis. First, it helps provide evidence that the software is suitable for providing the specified safety-related functions and attributes with the integrity required by the assigned ASIL. Second, it helps identify or confirm safety-related software content. Finally, it supports the specification of safety mechanisms and the verification of their effectiveness.

Where freedom from interference or sufficient independence between safety-related elements is required, the standard recommends performing a Dependent Failure Analysis (DFA). DFA identifies dependent failures or potential dependent failures and their effects. The objective and scope of the DFA depend on the sub-phase and the level of abstraction at which the analysis is performed, and the elements and failures of interest are defined before the analysis is performed (e.g., in the safety plan). DFA is also a qualitative analysis.

3. Dependent Failure Analysis at the Software Architecture Level

Embedded software must provide the specified functions, behaviors, and attributes with the integrity required by the assigned ASIL. Applying dependent failure analysis at the software architecture level helps check or confirm that these safety-related functions and attributes achieve the integrity required by the assigned ASIL.

3.1 Purpose of Safety Analysis at the Software Architecture Level

During the software architecture design phase, dependent failure analysis should be conducted to identify:

  • single events, faults, or failures that could cause failure behavior of multiple software elements that require independence (e.g., cascading and/or common cause failures, including common mode failures)
  • single events, faults, or failures that could initiate a causal chain propagating from one software element to another and leading to the violation of a safety requirement (e.g., cascading failures)

Based on this analysis, the degree of independence or non-interference achieved between the relevant software architecture elements is then examined.
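
One part of this analysis, screening for common cause coupling factors, can be sketched in a few lines. The sketch is a minimal illustration, not a method prescribed by ISO 26262: it assumes a hypothetical architecture model in which each software element lists the shared resources it uses (processor core, memory block, communication channel), and the analyst supplies the pairs of elements that require independence. Every resource shared by such a pair is reported as a candidate coupling factor that still has to be assessed manually. The class, function, element, and resource names are invented, and Python is used only for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwElement:
    """Hypothetical model of a software architecture element."""
    name: str
    asil: str              # "QM", "A", "B", "C" or "D"
    resources: frozenset   # shared resources used by the element


def shared_coupling_factors(elements, independence_pairs):
    """Report shared resources of element pairs that must be independent.

    A finding is only a candidate coupling factor for a common cause
    failure; whether it can actually cause a dependent failure, and
    whether it is sufficiently mitigated, remains an engineering judgment.
    """
    by_name = {element.name: element for element in elements}
    findings = {}
    for a, b in independence_pairs:
        common = by_name[a].resources & by_name[b].resources
        if common:
            findings[(a, b)] = sorted(common)
    return findings


# Usage with invented element and resource names.
elements = [
    SwElement("TorqueMonitor", "D", frozenset({"CPU0", "NvM_Block7"})),
    SwElement("TorqueCalc", "QM", frozenset({"CPU0", "CAN_Rx"})),
    SwElement("Diagnostics", "QM", frozenset({"CAN_Tx"})),
]
pairs = [("TorqueMonitor", "TorqueCalc"), ("TorqueMonitor", "Diagnostics")]
print(shared_coupling_factors(elements, pairs))
# {('TorqueMonitor', 'TorqueCalc'): ['CPU0']}
```

A real DFA also considers coupling factors that such a model cannot capture, for example shared development tools, shared specifications, or the same programming errors in copied code.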

3.2 Software Architecture Level Safety Analysis

Dependent failure analysis at the software architecture level shall consider the following aspects:

  • Identifying potential design weaknesses, conditions, errors, or failures that could trigger a causal chain leading to violation of safety requirements (e.g., using inductive or deductive methods)
  • Analyzing the consequences of possible faults, failures, or causal chains on the functions and attributes required of software architecture elements
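
The search for causal chains in the first aspect can be supported by tracing fault propagation along the data flow between software elements. The sketch below assumes a hypothetical data-flow graph that maps each element to the elements consuming its outputs, and uses breadth-first search to find the shortest chain from a faulty element to every element it can reach. As a simplified screening rule, which is an assumption of this sketch rather than a requirement of the standard, it flags chains that end in an element with a higher ASIL than the faulty one, i.e., candidate cascading failures that threaten freedom from interference. All names are illustrative.

```python
from collections import deque

ASIL_RANK = {"QM": 0, "A": 1, "B": 2, "C": 3, "D": 4}


def propagation_paths(data_flow, faulty):
    """Breadth-first search over a data-flow graph.

    data_flow maps each element to the elements that consume its outputs.
    Returns {reachable element: shortest causal chain starting at faulty}.
    """
    chains = {faulty: [faulty]}
    queue = deque([faulty])
    while queue:
        current = queue.popleft()
        for consumer in data_flow.get(current, ()):
            if consumer not in chains:          # visit each element once
                chains[consumer] = chains[current] + [consumer]
                queue.append(consumer)
    del chains[faulty]                          # report only other elements
    return chains


def cascading_violations(data_flow, asil, faulty):
    """Chains in which a fault reaches an element with a higher ASIL."""
    return {
        target: chain
        for target, chain in propagation_paths(data_flow, faulty).items()
        if ASIL_RANK[asil[target]] > ASIL_RANK[asil[faulty]]
    }


# Usage with an invented architecture.
flow = {"CanRx": ["SignalConv"], "SignalConv": ["TorqueCalc", "Logger"], "TorqueCalc": ["PwmOut"]}
asil = {"CanRx": "QM", "SignalConv": "QM", "TorqueCalc": "C", "Logger": "QM", "PwmOut": "C"}
for target, chain in cascading_violations(flow, asil, "CanRx").items():
    print(target, ":", " -> ".join(chain))
```

Each flagged chain is only a starting point for the second aspect: the analyst still has to judge whether the propagated error actually affects the functions and attributes required of the target element.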

3.3 Application Scenarios for Dependent Failure Analysis at the Software Architecture Level

The following scenarios may require failure analysis at the software architecture level:

  • Applying ASIL decomposition at the software level
  • Implementing software safety requirements that rely on independence, e.g., providing evidence for the effectiveness of software safety mechanisms, where independence between the monitored and the monitoring elements must be ensured
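
For the first scenario, a quick consistency check of the chosen ASIL decomposition can precede the independence argument. The sketch below encodes the decomposition schemes given in ISO 26262-9 (for example, ASIL D into C(D) + A(D), B(D) + B(D), or D(D) + QM(D)) and tracks only the letter of each decomposed ASIL, dropping the original ASIL shown in parentheses. The function name is hypothetical. A matching scheme alone is not sufficient: the decomposition is only admissible if the dependent failure analysis shows sufficient independence between the decomposed elements.

```python
# ASIL decomposition schemes per ISO 26262-9: original ASIL -> allowed pairs,
# each pair written from higher to lower ASIL (parenthesized ASIL omitted).
ALLOWED_DECOMPOSITIONS = {
    "D": {("C", "A"), ("B", "B"), ("D", "QM")},
    "C": {("B", "A"), ("C", "QM")},
    "B": {("A", "A"), ("B", "QM")},
    "A": {("A", "QM")},
}
ASIL_ORDER = ["QM", "A", "B", "C", "D"]


def is_valid_decomposition(original, part1, part2):
    """Return True if original -> part1 + part2 matches a listed scheme."""
    # Order the pair from higher to lower ASIL so ("A", "C") matches ("C", "A").
    pair = tuple(sorted((part1, part2), key=ASIL_ORDER.index, reverse=True))
    return pair in ALLOWED_DECOMPOSITIONS.get(original, set())


print(is_valid_decomposition("D", "B", "B"))  # True
print(is_valid_decomposition("D", "A", "C"))  # True
print(is_valid_decomposition("C", "A", "A"))  # False
```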

4. Deriving Safety Mechanisms from Software Architecture Level Safety Analysis

The safety measures include safety mechanisms derived from safety-oriented analyses; these address both random hardware failures and software failures. Depending on the results of the safety analyses performed at the software architecture level, safety mechanisms for error detection and for error handling are implemented.

4.1 Safety Mechanisms for Error Detection

Error detection safety mechanisms include:

  • Range checks on input and output data
  • Plausibility checks (e.g., using reference models of the desired behavior, assertion checks, or comparing signals from different sources)
  • Data error detection (e.g., error detection codes and redundant data storage)
  • Monitoring by an external element (e.g., an ASIC or another software element) that performs a watchdog function; the monitoring can be logical, temporal, or both
  • Temporal monitoring of program execution
  • Diverse redundancy design
  • Access violation control mechanisms implemented in software or hardware, used to allow or deny access to safety-related shared resources
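
The first two mechanisms in this list, range checks and plausibility checks that compare signals from different sources, can be sketched as follows. The sketch assumes a hypothetical accelerator-pedal position in percent measured by two redundant sensors; the limits and the tolerated deviation are invented example values, not values from the standard. Each check returns a result object instead of raising an exception, so that the caller can pass the detected error on to an error handling mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    ok: bool
    reason: str = ""


def range_check(value, lower, upper):
    """Range check: reject values outside the specified range."""
    if lower <= value <= upper:
        return CheckResult(True)
    return CheckResult(False, f"{value} outside [{lower}, {upper}]")


def plausibility_check(primary, secondary, max_deviation):
    """Plausibility check: compare two independent sources of the same signal."""
    deviation = abs(primary - secondary)
    if deviation <= max_deviation:
        return CheckResult(True)
    return CheckResult(False, f"deviation {deviation} > {max_deviation}")


# Hypothetical pedal position from two sensors, in percent.
print(range_check(104.0, 0.0, 100.0))
print(plausibility_check(42.0, 47.5, 3.0))
```

Both checks only detect an error; deciding what to do with it belongs to the error handling mechanisms.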

The example in Figure 14 illustrates interference caused by conflicting use of a shared resource (e.g., a shared processing element). The QM software element interferes with the ASIL software element and prevents its timely execution; such interference can also occur between software elements with different ASILs. The upper half of the figure shows the software execution without interference. By introducing "checkpoints" into the software and monitoring them with timeouts, the timing interference can be detected and appropriate countermeasures can be taken.
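
The checkpoint mechanism from this example can be sketched as a small monitor. This is a minimal illustration under stated assumptions: the monitored task reports hypothetical checkpoint IDs in a fixed expected order (logical monitoring), the first checkpoint opens a cycle with a time budget (temporal monitoring), and `timeout_check` is called periodically from a context the interfering element cannot block, such as a timer interrupt. That last assumption is what matters for Figure 14: a task starved by a QM element never reaches its next checkpoint, so only the independent timeout check can detect the interference. The class and method names are invented, and the clock is injectable so the behavior can be shown deterministically.

```python
import time


class CheckpointMonitor:
    """Hypothetical checkpoint-based logical and temporal monitor."""

    def __init__(self, expected_sequence, budget_s, clock=time.monotonic):
        self.expected = list(expected_sequence)
        self.budget_s = budget_s
        self.clock = clock          # injectable so tests can use a fake clock
        self.index = 0              # position of the next expected checkpoint
        self.cycle_start = None     # None while no cycle is in progress

    def _reset(self):
        self.index = 0
        self.cycle_start = None

    def checkpoint(self, cp_id):
        """Report a checkpoint from the monitored task; returns an error or None."""
        now = self.clock()
        expected = self.expected[self.index]
        if cp_id != expected:
            self._reset()
            return f"logical error: expected {expected!r}, got {cp_id!r}"
        if self.index == 0:
            self.cycle_start = now                  # first checkpoint opens the cycle
        elif now - self.cycle_start > self.budget_s:
            self._reset()
            return f"temporal error: {cp_id!r} reached too late"
        self.index += 1
        if self.index == len(self.expected):
            self._reset()                           # cycle completed in time
        return None

    def timeout_check(self):
        """Call periodically from an independent context (e.g. a timer)."""
        if self.cycle_start is not None and self.clock() - self.cycle_start > self.budget_s:
            self._reset()
            return "temporal error: cycle not completed within budget"
        return None


class FakeClock:
    """Manually advanced clock for the demonstration."""
    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
monitor = CheckpointMonitor(["CP1", "CP2", "CP3"], budget_s=0.005, clock=clock)
monitor.checkpoint("CP1")
clock.t = 0.002
monitor.checkpoint("CP2")
clock.t = 0.009   # e.g. a QM task keeps the processor busy; CP3 is never reached
print(monitor.timeout_check())
```

In an ECU, the monitoring element would itself need to be sufficiently independent of the monitored element, which is exactly the second scenario listed in Section 3.3.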

4.2 Safety Mechanisms for Error Handling

Safety mechanisms for error handling include:

  • Deactivation to achieve and maintain a safe state
  • Static recovery mechanisms (e.g., recovery blocks, backward recovery, forward recovery, and recovery through repetition)
  • Graceful degradation by prioritizing functions to minimize the negative impact of potential failures on functional safety
  • Homogeneous redundancy in design, which focuses primarily on controlling the effects of transient or random faults in the hardware on which similar software is executed (e.g., temporally redundant execution of software)
  • Diverse redundancy in design, which involves designing different software in each parallel path and focuses mainly on preventing or controlling systematic failures in the software
  • Correction codes for data
  • Access rights management implemented in software or hardware to grant or deny access to safety-related shared resources
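
Several of these mechanisms are often combined into an escalation sequence. The sketch below chains recovery through repetition, graceful degradation, and deactivation to a safe state. The ordering, the retry count, the function and enum names, and the use of `RuntimeError` as the failure signal are assumptions made for illustration, not requirements of ISO 26262.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    SAFE_STATE = "safe_state"


def handle_error(operation, retries=2, degraded=None):
    """Hypothetical error-handling escalation; returns (mode, result).

    1. Recovery through repetition: retry the operation a bounded number of times.
    2. Graceful degradation: use a reduced-function alternative if one exists.
    3. Deactivation: otherwise signal that the safe state must be entered.
    """
    for _ in range(1 + retries):
        try:
            return Mode.NORMAL, operation()
        except RuntimeError:
            continue                    # assumed transient: try again
    if degraded is not None:
        try:
            return Mode.DEGRADED, degraded()
        except RuntimeError:
            pass                        # degraded path failed as well
    return Mode.SAFE_STATE, None


# Usage: an operation that fails twice with a transient fault, then succeeds.
attempts = {"n": 0}


def flaky_read():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient fault")
    return 17.0


print(handle_error(flaky_read))  # (<Mode.NORMAL: 'normal'>, 17.0)
```

Recovery through repetition only helps against transient faults; a permanent fault exhausts the retries and escalates to degradation or to the safe state.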

It is important to note that a review of the software safety mechanisms at system level (including robustness mechanisms) can be performed to analyze their potential impact on system behavior and their alignment with the technical safety requirements.

References

  1. International Organization for Standardization. (2018). Road vehicles — Functional safety (ISO 26262:2018). https://www.iso.org/standard

Get in Touch with Us

Prof. Dr. Mirko Conrad and Björn Kunze
tudoor academy
