  • Original research article
  • Open access

Large-scale energy storage system: safety and risk assessment

Abstract

The International Renewable Energy Agency (IRENA) predicts that, with current national policies, targets and energy plans, the global renewable energy share will reach 36% and stationary energy storage will reach 3400 GWh by 2050. However, the IRENA Energy Transformation Scenario forecasts that these figures should be 61% and 9000 GWh to achieve net zero carbon emissions by 2050 and limit the global temperature rise within the twenty-first century to under 2 °C. Despite the widely known hazards and safety designs of grid-scale battery energy storage systems, there is a lack of established risk management schemes and models compared to the chemical, aviation, nuclear and petroleum industries. Incidents of battery storage facility fires and explosions have been reported every year since 2018, resulting in human injuries and millions of US dollars in asset and operational losses. Traditional risk assessment practices such as ETA, FTA, FMEA, HAZOP and STPA are becoming inadequate for accident prevention and mitigation in complex energy power systems. This work describes an improved risk assessment approach for analyzing safety designs in battery energy storage systems incorporated in large-scale solar plants, combining probabilistic event tree analysis and systems-theoretic analysis to improve accident prevention and mitigation. The causal factors and mitigation measures are presented. The risk assessment framework is expected to benefit the Energy Commission, the Sustainable Energy Development Authority and the Department of Standards in determining safety engineering guidelines and protocols for future large-scale renewable energy projects. Stakeholders and utility companies will benefit from improved safety and reliability by avoiding costly asset damage and downtime due to accidents.

Introduction

The International Renewable Energy Agency (IRENA) forecasts that, with current policies and targets, the global renewable energy share will reach 36% in 2050, with 3400 GWh of installed stationary energy storage capacity. However, to achieve the targets of IRENA's 2050 Energy Transformation Scenario, namely net zero carbon emissions by 2050 and a global temperature rise within the century of under 2 °C, these figures should be 61% and 9000 GWh, respectively (International Renewable Energy Agency, 2050). Malaysia's solar PV capacity grew from 279 MW in 2016 to 1787 MW in 2021, largely owing to the large-scale solar (LSS) bidding program of the Energy Commission and the domestic and commercial solar PV schemes of the Sustainable Energy Development Authority (SEDA) (IRENA, 2021). The most recent cycle of LSS bidding is expected to add 823 MW of solar PV capacity, beginning operations between 2022 and 2023 (Commission, 2022). To date, no stationary energy storage system has been implemented in Malaysian LSS plants. At the same time, there are no guidelines or standards on the operation and safety scheme of an energy storage system with LSS. Despite the widely researched hazards of grid-scale battery energy storage systems (BESS), there is a lack of established risk management schemes and damage models compared to the chemical, aviation, nuclear and petroleum industries. BESS fire and explosion accidents have been reported every year since 2017, resulting in human injuries, deaths and asset losses in the millions of US dollars. As power system technologies advance to integrate variable renewable energy, energy storage systems and smart grid technologies, improved risk assessment schemes are required to identify solutions for accident prevention and mitigation. Traditional risk assessment methods such as Event Tree Analysis, Fault Tree Analysis, Failure Modes and Effects Analysis, Hazard and Operability Study, and Systems Theoretic Process Analysis are becoming inadequate for designing accident prevention and mitigation measures in complex power systems.

This paper proposes an improved risk assessment approach for analysing safety designs in a BESS incorporated in a large-scale solar plant, as shown in Fig. 1, to overcome the weaknesses of individual traditional risk assessment methods. The "Literature review" section covers battery energy storage technologies, known BESS hazards and safety designs based on current industry standards, risk assessment methods and applications, proposed risk assessments for BESS, and BESS accident reports. The "Methodology" section explains the proposed risk assessment methodology, which incorporates quantitative analysis elements with the Event Tree Analysis method and the Systems Theoretic Process Analysis method for assessing required mitigation strategies. The risk assessment is applied as a case study to large-scale solar PV projects in Malaysia with varying battery sizes. The findings are presented in the "Results and discussion" section, with quantitative and qualitative analyses.

Fig. 1

Schematic of large‐scale solar plant with BESS

Problem statement

The intermittency of Variable Renewable Energy (solar and wind) causes power supply stability issues for the grid. For example, voltage stability can be disturbed by the varying power supply from large-scale solar PV, which then requires reactive power compensation. A mismatch between the PV-generated power supply frequency and the load frequency can cause frequency instability. These requirements are governed by the Malaysian Grid Code. Battery Energy Storage Systems, along with more complex controller designs, are required to ensure reliable operation of the power system network, incurring additional expenditure to operate a large-scale solar farm (Hajeforosh et al., 2020). Smart grid infrastructure requires real-time two-way communication and interoperability between components of the power system to optimize grid efficiency by matching loads and distributed generation sources, typically solar PV with energy storage systems. Such data and communications requirements call for increasingly sophisticated equipment and software, introducing new hazards and risks to the overall power distribution network (Voima & Kauhaniemi, 2012).

There is no established framework for the installation and operation of Battery Energy Storage Systems in Malaysia. By contrast, the official guidelines and standards for solar PV installation cover installation size limits, feed-in tariff rates, grid connection guidelines, safety requirements and incentives. For example, connection guidelines, system component sizing, and basic safety requirements are covered in Malaysian Standard MS1837, while the tariffs, installation limits and total quotas are set by the Sustainable Energy Development Authority and the Energy Commission. This clear framework allowed Malaysia to increase its installed PV capacity by up to fivefold from 2015 to 2021, across all residential, commercial, industrial and LSS plant types (Commission, 2022; SEDA FiT Rates, 2021). The lack of comparable standards and guidelines for BESS makes it uncertain for stakeholders and investors to conduct lifecycle cost analysis on BESS adoption and decide whether such systems are economically viable investments.

Battery Energy Storage System accidents often incur severe losses in the form of harm to human health and safety, property damage and energy production losses. The Jimei Dahongmen Shopping Centre 25 MWh lithium iron phosphate battery explosion caused the deaths of 2 firefighters (Accident analysis of Beijing Jimei Dahongmen 25 MWh DC solarstorage-charging integrated station project, 2021). In an ESS facility fire in Gimhae, South Korea, a power conversion system fault caused accidental overcharging of a 1.0 MW Li-ion BESS, leading to thermal runaway (Yoon-seung, 2020). In 2018, South Korea's electric utility KEPCO reported 23 large-scale battery ESS fires resulting in over US$20 million in equipment damage losses (Colthorpe, 2019; Pierce, 2019). In 2019, four firefighters were severely injured in the Arizona Public Service 2.16 MWh Li-ion battery explosion incident, in which the fire captain was propelled over a distance of 20 m, through the surrounding wire fence (McKinnon et al., 2020). Figures 2 and 3 show the live fire and aftermath of the Jimei Dahongmen and Arizona battery incidents, respectively. Accident reports cited a variety of possible safety system failures but could not pinpoint the exact accident escalation paths, and therefore could not target improvements to mitigation measures. Evidently, there is a need for improvement in the safety and risk assessment and management of these grid-scale, renewable energy-integrated Battery Energy Storage Systems.

Fig. 2

Jimei Dahongmen Li‐ion battery fire (Accident analysis of Beijing Jimei Dahongmen 25 MWh DC solarstorage-charging integrated station project, 2021)

Fig. 3

Arizona Public Service Li‐ion battery explosion aftermath, showing the explosion deflagration event (McKinnon et al., 2020)

The aim of this work is to develop an innovative risk assessment methodology that incorporates the strengths of a Chain of Events model, a systemic view assessment and probabilistic risk assessment to evaluate large-scale solar PV safety, with emphasis on essential safety systems. The first objective is to identify the hazards of a BESS integrated in an LSS plant and the safety barriers for accident prevention and mitigation, including the failure frequencies and probabilities of failure on demand of the safety barriers. This is followed by development of the event tree and consequences for a hazardous event, by applying safety barrier success or failure states as event tree branches. The next objective is to evaluate outcome probabilities and frequencies of severe damage from the event tree-based analysis for the case study sites. The final objective is the analysis of safety barrier failure modes, causes and mitigation measures using the STPA-based analysis method.

  • Identify hazards and safety barriers of an LSS + BESS system.

  • Develop Event tree by analysing safety barrier performance in hazard event and its consequences.

  • Evaluate probabilities and frequencies of severe damage outcomes on case study sites.

  • Analyse safety barrier failure modes, causes and mitigation measures via STPA-based analysis.

Literature review

Battery energy storage technologies

Battery Energy Storage Systems are electrochemical storage systems that discharge chemical energy stored in active materials through oxidation–reduction reactions to produce electrical energy. Typically, a battery is constructed from a cathode, an anode, and an electrolyte. The oxidation and reduction reactions at the electrodes generate an aggregate potential difference and, subsequently, electron flow in the external circuit (Hossain et al., 2020).

Lithium-based battery

Lithium-ion batteries are known for their low self-discharge rate. The anode is made of graphite in a layered structure and the electrolyte is made up of lithium salt. The cathode is made of a lithiated metal oxide. There are several types of Li-ion batteries based on the metallic element in the cathode, such as lithium nickel manganese cobalt (NMC) oxide, lithium cobalt oxide, lithium nickel cobalt aluminium (NCA) oxide and lithium iron phosphate (LFP) (Behabtu, 2020; Hossain et al., 2020; Kebede et al., 2022). During the discharge phase, the Li atoms at the anode ionize and are carried to the cathode through the electrolyte due to the difference in electrolytic concentration between the anode and cathode sides, as shown in Fig. 4. Lithium-ion batteries have high power densities of 500–2000 W/l, high energy densities of 200–500 Wh/l and high round trip efficiencies of 85–95%. However, they have high power and energy costs of up to 4000 $/kW and 3000 $/kWh, the highest among the battery technologies (Behabtu, 2020; Hossain et al., 2020).

Fig. 4

Schematic construction of li‐ion battery (Hossain et al., 2020)

Lithium metal batteries use metallic lithium as the anode instead of graphite, and titanium disulfide as the cathode. Because the anode is vulnerable to the formation of dendrites, which can damage the separator and cause an internal short circuit, Li metal battery technology is not mature enough for large-scale manufacture (Hossain et al., 2020).

Lead-acid battery

Lead-acid batteries consist of a sponge lead anode and a lead dioxide cathode submerged in sulphuric acid, as shown in Fig. 5. They are the most mature battery technology, being fully commercialized, with low power and energy costs, and high power and energy densities at 10–400 W/l and 50–880 Wh/l. They have a moderate lifetime of 5–15 years and 70–90% efficiency (Behabtu, 2020; Hossain et al., 2020).

Fig. 5

Lead‐acid battery working principle (Hossain et al., 2020)

Fig. 6

Schematic diagram of flow batteries (Antweiler, 2014)

Vented lead-acid batteries, also known as flooded lead-acid batteries, contain sulphuric acid electrolyte that is free to move around the battery casing. Internal gases such as hydrogen are released directly to the environment through vents during the charging phase. They are known for having low energy cost but also have weak internal construction and high internal resistance (Behabtu, 2020). Valve-regulated lead-acid batteries are also known as sealed lead-acid batteries. The electrolyte is a coagulated form of sulphuric acid contained in a sealed compartment that does not leak, making it safer to use. These batteries also contain vents to release gases (Behabtu, 2020).

Flow battery

In flow batteries, as shown in Fig. 6, the energy is stored in chemical form in the active electrolyte, which is held in two external reservoirs and fed into the reactor via pumps. The positive electrolyte is called the catholyte and the negative electrolyte is called the anolyte. A membrane in the reactor separates the catholytic and anolytic sides of the electrolyte and allows only a limited number of ions to migrate through. The ions participate in reduction–oxidation reactions in the reactor to supply electrical energy to the external circuit (Hossain et al., 2020).

Vanadium Redox Flow Batteries (VRFB) store ions in an electrolytic solution of vanadium sulphate dissolved in sulphuric acid. The membrane allows H+ ions to migrate across and impedes HSO4− ion migration (Hossain et al., 2020). Vanadium redox couples are used at the electrodes, V2+/V3+ at the anode and V4+/V5+ at the cathode. VRFBs are known to be the most mature flow battery technology, with cycle lives above 10,000 cycles, and operate at up to 90% efficiency at light loads (Hossain et al., 2020). They have moderate power and energy densities compared to other technologies at 0.2–2 W/l and 20–70 Wh/l, as well as moderate power and energy costs (Behabtu, 2020; Kebede et al., 2022).

The Zinc Bromine (ZnBr) battery is a hybrid flow battery containing a battery electrode and a fuel cell electrode. Zinc is used as the solid negative electrode, while bromine dissolved in an aqueous solution and stored in an external reservoir is used as the positive electrode. An aqueous solution of zinc bromide salt serves as the electrolyte. ZnBr batteries have energy densities of 35–75 Wh/l, a long lifetime of 10–20 years and a round trip efficiency of 65–80%, which is considered low among battery energy storage technologies. The ZnBr battery is prone to zinc electrode corrosion, and bromine is considered a toxic material (Behabtu, 2020).

High temperature battery

Sodium Sulphur (NaS) batteries are constructed using molten sodium as the anode, molten sulphur as the cathode and solid beta alumina ceramic as the electrolyte. During the discharge phase, the setup only allows sodium ions from the anode to travel to the cathode via the electrolyte, where they form sodium polysulfides with the sulphur cathode, as shown in Fig. 7. The ideal operating temperature of the battery is 300–360 °C, where both the sodium and sulphur electrodes are in a molten state (Hossain et al., 2020; Kebede et al., 2022). NaS batteries have high power and energy densities at 150–250 W/l and 150–250 Wh/l, respectively. They also have a high power cost at 300 $/kW, an efficiency of 80–90% and a good service life of 10–15 years (Behabtu, 2020; Hossain et al., 2020).

Fig. 7

Schematic of sodium sulphur battery (Hossain et al., 2020)

Fig. 8

Components and structure of NaNiCl2 battery (Mexis & Todeschini, 2020)

The Sodium Nickel Chloride (NaNiCl2) battery, shown in Fig. 8, is a type of sodium metal halide battery. Liquid sodium is used as the anode and a solid metal halide, in this case nickel chloride, is used as the cathode. The electrodes are separated by a sodium chloroaluminate (NaAlCl4) electrolyte. The operating temperature of this battery is 300 °C, at which the electrolyte is molten. Compared to NaS batteries, NaNiCl2 batteries have lower energy densities of up to 180 Wh/l, a similar service life, a lower power cost of 100–300 $/kW, a power density of 200–300 W/l and an energy cost of 100–300 $/kWh (Behabtu, 2020; Hossain et al., 2020).

Recent BESS technologies

Esser et al. reviewed the potential for commercialization of new-generation batteries using fully organic compound electrodes or metal–organic hybrid electrodes, in which carbon, hydrogen, oxygen, nitrogen and sulphur atoms form the organic polymers. Organic full cells are batteries in which both electrodes use organic compounds, while half-cells comprise an organic polymer cathode and an inorganic metal anode, typically lithium, sodium or potassium. New electrode polymer materials are commonly tested in the half-cell configuration. Prototypes of metal-ion organic electrodes such as LiC8H2O6 and K2C6O6 produced specific energies of up to 130 Wh/kg and 35 Wh/kg, respectively. A main motivation for the use of organic material is reducing the demand for metallic resources obtained through destructive mining processes. Organic polymer electrodes would also offer high structural designability and a less energy-intensive recycling process compared to the recycling of metallic compounds. Organic compounds commonly have low density and low volumetric energy density compared to inorganic materials, so a much larger size is needed to achieve the required battery capacity (Esser et al., 2020).

Elia et al. reviewed the potential for wide application of aluminium batteries. The aluminium-air battery uses an aluminium metal anode, an air cathode and an aqueous electrolyte, which can be either an alkaline or a neutral salt solution. The low redox potential of Al to Al3+, exchanging three electrons per reaction, contributes to a high theoretical voltage and capacity. Alkaline aluminium-air batteries have a theoretical specific energy of up to 400 Wh/kg. The abundance of aluminium in the earth's crust makes it a suitable choice for mass production, and its low toxicity contributes to relative ease of recycling or disposal. The redox reaction of Al-air batteries can theoretically reach 8600 Wh/kg at an electrical efficiency of 25–45%. The main drawback of this technology is that it is non-rechargeable, i.e., the aluminium electrode must be replaced upon complete oxidation. A molten salt deposition process can be used to regenerate the aluminium electrode, but this requires high energy consumption. Recent studies suggested using oil to replace the aqueous electrolyte to minimize corrosion (Elia, 2021).

Summary

The characteristics of the battery energy storage technologies discussed in ''Battery Energy Storage Technologies'' section are summarized in Table 1. A comparison of power density and energy density as a measure of required battery size to achieve a certain discharge power or storage capacity is carried out for different types of energy storage technology. Power and energy costs compare per unit costs for discharge power and storage capacity, respectively, to assess the economic viability of the battery technology for large-scale projects. Round trip efficiencies of the discussed battery technologies range from 65% to 95% with lifetimes of 5 years to 20 years.
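The use of energy density as a measure of required battery size can be sketched with a short calculation. The following is a minimal illustration that divides a target storage capacity by the volumetric energy density ranges quoted in the text above; the 1 MWh target, the function name `volume_range_m3` and the selection of technologies are assumptions made for this example and are not values from the case study.

```python
# Energy-density ranges (Wh/l) as quoted in the text above.
ENERGY_DENSITY_WH_PER_L = {
    "Li-ion": (200, 500),
    "NaS": (150, 250),
    "ZnBr": (35, 75),
    "VRFB": (20, 70),
}


def volume_range_m3(capacity_wh, density_range_wh_per_l):
    """Return the (smallest, largest) storage volume in cubic metres."""
    low, high = density_range_wh_per_l
    # The highest density gives the smallest volume; 1 m3 = 1000 l.
    return capacity_wh / high / 1000.0, capacity_wh / low / 1000.0


if __name__ == "__main__":
    capacity_wh = 1_000_000  # hypothetical 1 MWh target capacity
    for name, density_range in ENERGY_DENSITY_WH_PER_L.items():
        v_min, v_max = volume_range_m3(capacity_wh, density_range)
        print(f"{name}: {v_min:.1f} to {v_max:.1f} m3")
```

The result covers only the volume implied by the quoted energy densities; enclosures, spacing, cooling and power conversion equipment are not included.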

Table 1 Characteristics of BESS Technologies (Hossain et al., 2020, Behabtu, 2020, Kebede et al., 2022)

Safety hazards

NFPA 855 and IEC TS 62933-5 are widely recognized safety standards pertaining to known hazards and safety design requirements of battery energy storage systems. The inherent hazards of BESS are categorized as fire hazards, chemical release, physical impacts, and electrical hazards.

Thermal runaway refers to a situation in which the temperature of a material increases uncontrollably and rapidly due to a self-reinforcing process. It characteristically occurs when the heat generated by a system surpasses its ability to remove or dissipate heat, leading to a positive feedback loop that further accelerates the temperature rise. The self-heating comes from internal exothermic decomposition reactions of the anode, cathode or electrolyte material. This condition can be induced by external heating, overcharging, over-discharging or an internal short circuit due to mechanical impact (Agency, 2020). The continued heating and temperature increase of the cell can cause an internal battery fire to self-ignite in the absence of oxygen. Once in thermal runaway, the decomposition of the active material of the cell releases gas vapour into the BESS containment structure, forming a combustible mixture in the presence of oxygen over time, where a delayed ignition can cause an explosion. Once thermal runaway in a cell has started, the heat release spreads to adjacent cells and can induce a thermal runaway propagation event known as cascading thermal runaway (Chen et al., 2022).

Zou et al. concluded that batteries at a higher state of charge (SOC) had greater heat release, higher flame temperature and a shorter time to self-ignition (Zou et al., 2022). Liu et al. concluded that higher ambient temperature and higher state of charge accelerated the self-ignition process (Liu et al., 2020). Yang Jin et al. induced thermal runaway in a lithium iron phosphate (LFP) battery by overcharging it. Ethylene carbonate and ethyl methyl carbonate were identified as the main combustible gases in the vented gas mixture (Jin et al., 2021). Gas mixtures vented from lithium-ion batteries in thermal runaway at 100% SOC contained H2, CO2, methane (CH4), ethylene (C2H4), ethane (C2H6), ethylene carbonate ((CH2O)2CO) and ethyl methyl carbonate (C4H8O3), most of which are known combustible compounds. Accumulated concentrations of such a gas vapour mixture can lead to a catastrophic explosion upon ignition (Wang et al., 2019). NFPA 855 specifies that the concentration of the flammable gas mixture in the BESS enclosure must be kept below 25% of the Lower Flammability Limit (LFL) via exhaust ventilation.
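The 25% LFL criterion can be illustrated with a short check. The sketch below estimates the LFL of a vented fuel mixture using Le Chatelier's mixing rule, a standard approximation that is not named in the sources cited above, and compares a measured enclosure concentration against the 25% limit. The single-gas LFL values are typical approximate handbook figures assumed for illustration, and the function names are hypothetical.

```python
# Approximate single-gas LFL values in air (vol %); illustrative assumptions only.
LFL_VOL_PCT = {"H2": 4.0, "CH4": 5.0, "C2H4": 2.7, "C2H6": 3.0}


def mixture_lfl(fuel_fractions):
    """Le Chatelier's rule: LFL_mix = 1 / sum(y_i / LFL_i).

    fuel_fractions maps each flammable gas to its share of the fuel
    (inert gases such as CO2 are left out); shares are normalized here.
    """
    total = sum(fuel_fractions.values())
    if total <= 0:
        raise ValueError("fuel fractions must sum to a positive value")
    return 1.0 / sum(
        (share / total) / LFL_VOL_PCT[gas] for gas, share in fuel_fractions.items()
    )


def exceeds_ventilation_limit(gas_conc_vol_pct, lfl_vol_pct, limit_fraction=0.25):
    """True when the enclosure concentration has reached 25% of the LFL."""
    return gas_conc_vol_pct >= limit_fraction * lfl_vol_pct


if __name__ == "__main__":
    # Hypothetical vent-gas fuel composition, not measured data.
    lfl = mixture_lfl({"H2": 0.5, "CH4": 0.3, "C2H4": 0.2})
    print(f"Estimated mixture LFL: {lfl:.2f} vol %")
    print("Limit reached:", exceeds_ventilation_limit(0.8, lfl))
```

A real gas detection design would rely on measured vent-gas compositions and the sensor calibration gas rather than on these assumed values.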

The NFPA 855 classified chemical hazards as corrosive electrolytes, toxic gas releases, reactive or toxic metals and oxidizers. Battery electrolytes with pH levels below 2 or over 11.5 are considered corrosive. As such corrosive electrolytes are usually sealed, workers are not exposed to the corrosive electrolytes under normal operating conditions. A possible electrolyte leak due to battery shell damage or spillage situation could directly expose workers and emergency responders to the corrosive chemical and cause serious to permanent injury to the eyes or skin (International Electrotechnical Commission, 2020).

The toxicity of materials is expressed by LC50, a parameter of acute inhalation toxicity. Lethal Concentration (LC) 50 is defined as the inhaled concentration that will lead to death in 50% of the dosed population, expressed in parts per million (ppm) (Wood, 2014). The lower the LC50, the lower the concentration required to cause the same harm, and thus the more harmful the material. Toxicity is divided into 5 levels according to another NFPA guideline, NFPA 704, Standard System for the Identification of the Hazards of Materials for Emergency Response, summarized in Tables 2 and 3.

Table 2 Toxicity level classification by effects by LC50 (Agency, 2020)
Table 3 Summary of hazard type and its standardized thresholds

Reactive metals can cause violent chemical reactions with moisture in the air. By design, reactive metals are protected under normal operating conditions but may become exposed during abnormal situations. Exposure to oxidizer material can increase the flammability potential of other materials present and lead to increased intensity of fires (Agency, 2020).

Physical hazards for batteries include hot parts and moving parts, often discussed in the context of direct harm to human beings exposed to the hazard. Hot surfaces on the battery components can cause burns if they come into contact with human skin (Agency, 2020). If a mechanical impact affects the battery cells and compromises their internal structural integrity, an internal short circuit may be induced, leading to thermal runaway.

Electrical hazards such as electrical shock and arc flashes can cause serious harm to maintenance workers. Energy storage systems with voltages above 50 V can cause serious harm to workers who may be exposed to live parts. The presence of conductive fluids such as water can worsen the extent of the damage. Electrical arc flashes can occur at high-current contactors and generate high pressure and thermal loads inside the electrical enclosure (Zalosh et al., 2021). Arc flashes with incident energy above 5 J/cm2 are capable of serious harm and the use of personal protective equipment and hazard labelling and markings are required by regulation (International Electrotechnical Commission, 2020). During abnormal conditions, the battery holding a significant amount of stored charge can pose risks of electrical shock and arc flash to the on-site technicians or emergency responders.

The inherent hazards of battery types are determined by the chemical composition and stability of the active materials, potentially causing release of flammable or toxic gases. High operating temperatures pose high risks for human injuries and fires. Electrical hazards are present in each BESS type due to the power control systems for grid integration.

Lithium-ion battery cells vent combustible gases under abnormal conditions. Hydrogen fluoride (HF) and hydrogen cyanide (HCN) are toxic gases vented from BESS batteries in thermal runaway events (Gully, 2019). Lithium metal batteries contain lithium metal electrodes, which can undergo aggressive chemical reactions when exposed to water or air. Lead-acid batteries and vanadium redox batteries may vent hydrogen gas from the sulphuric acid electrolyte. The acid electrolyte is extremely corrosive and can cause serious human injuries. Sodium-based batteries operate at high temperatures (270–350 °C) and contain the reactive metal sodium in a molten state. Damage to the air-tight seal may expose sodium to air and moisture and initiate violent chemical reactions. The zinc bromide flow battery contains zinc bromide electrolyte, a corrosive acid with LC50 Level 3 inhalation toxicity. Flow batteries require manual replenishing of electrolytes, where mishandling may cause spills of toxic and corrosive material. Table 4 summarizes the inherent hazards present with each battery storage technology.

Table 4 Summary of batteries and associated hazards (Agency, 2020; International Electrotechnical Commission, 2020)

Natech events are cascading events involving the release of hazards of a technological system, triggered by the effect of natural events such as floods, earthquakes, and hurricanes. Natech risks are often considered in risk management plans in the chemical sector and in oil and gas facilities, where natural disaster events can damage the containment vessels of flammable or toxic substances, leading to the atmospheric release of these chemicals (Misuri et al., 2021).

In the context of the Malaysian LSSPV scheme, the major natural hazard events of concern are floods, flash floods, and landslides. Flash floods are characterized by excessive rainfall within hours causing heavy flow in riverbeds and urban waterways, whereas floods are characterized by overflow of waterways over time spans of days to weeks. Over the past 20 years, Peninsular Malaysia has experienced flood and landslide events of varying severity, most notably the floods of 2014 and 2021, which were caused by heavy monsoon season rainfall affecting multiple states simultaneously and resulted in extensive property damage, loss of lives, mass evacuations and billions of Malaysian Ringgit spent on rebuilding, victim support and rescue efforts. Such Natech events would cause extensive damage to the power system components of LSSPV and BESS and the subsequent release of hazards, such as active chemicals from the battery cells into floodwater or unmitigated battery fires.

Safety and risk assessment

A variety of commonly practiced risk assessment methods are discussed, with applications in the aeronautic, automotive, chemical, manufacturing, nuclear and petroleum industries. A hazard is defined as a dangerous substance or state that may lead to a loss in the form of damage to equipment, loss of output, injury, death, or environmental damage. A risk is an expression of the likelihood of an event and the severity of its consequences (Rovins, 2015).

Event tree analysis

The Event Tree Analysis (ETA) evaluates sequences of events leading to different outcomes from an initiating event, usually the release of a hazard. This method is a bottom-up approach. First, the initiating event is identified, followed by identification of the event tree branches; the final outcomes and consequences are then evaluated based on the escalation of each event tree path. If Event 2, E2, succeeds Event 1, E1, on an event tree path, the probability P(E2) of Event 2 occurring can be expressed as in Eq. 1, where P(E2|E1) is the conditional probability of E2 occurring given that E1 has already occurred:

$$P(E_{2} ) = P(E_{1} )P(E_{2} |E_{1} ).$$
(1)

The probability of one outcome j of an event tree with n branches is the product of the probabilities of each branch of its event tree path leading to the outcome:

$$P(E_{j} ) = P(E_{j1} )P(E_{j2} |E_{j1} )P(E_{j3} |E_{j2} ) \times \cdots \times P(E_{jn} |E_{j(n - 1)} ).$$
(2)
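The path product in Eq. 2 can be sketched in a few lines of code. The example below is a minimal illustration, not the method used in this work: it enumerates every success or failure path of a binary event tree, multiplies the branch probabilities along each path, and scales the result by the initiating event frequency to obtain an outcome frequency. The barrier names, probabilities of failure on demand and initiating frequency are hypothetical values chosen for the example.

```python
from itertools import product


def event_tree_outcomes(initiating_frequency, barriers):
    """Enumerate every success/failure path of a binary event tree.

    initiating_frequency: frequency of the initiating event (e.g. per year).
    barriers: ordered list of (name, probability_of_failure_on_demand).
    Returns a list of (path, probability, frequency) tuples.
    """
    outcomes = []
    for states in product((True, False), repeat=len(barriers)):
        probability = 1.0
        path = []
        for (name, pfd), works in zip(barriers, states):
            # Multiply the conditional branch probabilities along the path (Eq. 2).
            probability *= (1.0 - pfd) if works else pfd
            path.append(f"{name}={'ok' if works else 'fail'}")
        outcomes.append((tuple(path), probability, initiating_frequency * probability))
    return outcomes


if __name__ == "__main__":
    # Hypothetical barriers and values, chosen only for illustration.
    barriers = [("detection", 0.10), ("suppression", 0.20), ("ventilation", 0.05)]
    for path, p, f in event_tree_outcomes(1e-3, barriers):
        print(" / ".join(path), f"P={p:.4f}", f"f={f:.2e} per year")
```

This sketch applies the same branch probability on every path and expands all 2^n combinations. A practical event tree usually prunes branches that cannot follow an earlier outcome and may assign different conditional probabilities depending on the preceding events.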

Hermansyah’s demonstration of ETA for the escalation of gas leakage in buildings identified the gas leak as the initiating event and four escalation events as the branches, namely ignition, delayed ignition, fire escalation, and evacuation, leading to nine possible outcomes, as shown in Fig. 9. Two possible paths were evaluated for each of the four event tree branches, e.g. whether ignition occurred or did not occur.

Fig. 9

Example of event tree analysis for gas leakage to fire escalation (Hermansyah et al., 2018)

The probabilities of each sequence in the Event Tree cannot be calculated with absolute certainty, as each failure event in each system is unique to its own conditions, thus probabilities used are often based on failure models with assumptions or statistics with limited sample sizes. The probabilities here serve to guide the safety reviewer to pinpoint areas for mitigation improvement rather than as an absolute reference. Often, only binary states are considered at mitigation stages e.g. detection success or failure, thereby ignoring the case of late detection possibly leading to a different sequence of events (Aitugan & Li, 2020).

Fault tree analysis

The Fault Tree Analysis (FTA) method is a top-down approach to assessing the contributing factors leading to a failure event. This method is typically used in the risk assessment of complex systems in high-severity risk fields such as the aerospace industry and nuclear power plants. In this method, a single undesired outcome (top event) is first identified, then traced back to the lower-level causal factors or failures that led to this outcome via Boolean AND and OR logic operators (Aitugan & Li, 2020). Basic events are often human failures or hardware or software subsystem failures. Barrere et al. used FTA to trace the failure of a fire protection system to sensor failures, communication failures, and external cyber-attacks, as shown in Fig. 10 (Barrere & Hankin, 2020).

Fig. 10

Example of fault tree diagram of fire protection system failure. (Barrere & Hankin, 2020)

The FTA is systematic in its use of logic operators and flexible in allowing the safety engineer to define the number of levels and the detail of each individual branch as needed. The undesired outcome identified as the top event of the FTA is a failure of a complex system. Thus, the analysis does not extend to the resulting accident or to the extent of the consequential damage caused by this system failure. Like the ETA, this method also allows probabilistic estimation of the failure event using the probabilities of individual component failures, assuming that the failure probabilities of the subsystems are mutually exclusive. By considering the equations for AND and OR gates, the minimal cut sets (MCS) of basic-level events or failures are identified and their probabilities are calculated. The quality of an FTA is highly dependent on the analyst’s knowledge of possible failure modes, which may otherwise be overlooked (Choo & Go, 2022).
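The gate equations mentioned above can be illustrated with a short sketch. It assumes statistically independent basic events, which is an assumption of this example; for mutually exclusive inputs the OR gate is simply the sum of the input probabilities. The event names and probabilities are hypothetical and are not taken from Barrere & Hankin (2020).

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if every input event occurs (independent inputs)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities (illustrative only).
p_sensor_a = 0.01
p_sensor_b = 0.01
p_comms = 0.001

# Top event: both redundant sensors fail, OR the communication link fails.
p_sensors = and_gate([p_sensor_a, p_sensor_b])
p_top = or_gate([p_sensors, p_comms])
```

In this hypothetical tree the minimal cut sets are {sensor A, sensor B} and {communication link}, and the single-event cut set dominates the top-event probability.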

Failure modes and effects analysis

The Failure Modes and Effects Analysis (FMEA) method is an analysis tool that assesses the failure of components or processes in a system and identifies failure causes and consequences. This analysis is commonly practiced in the aeronautical, automotive, and chemical and process industries. A semi-quantitative analysis is performed by assigning ratings to the likelihood of failure occurrence (OCC), detectability of the failure mode (DET) and severity of consequence (SEV), on a scale of 0–10. Depending on the industry, the scales of OCC, DET and SEV can be defined in terms of the corresponding processes. For example, for a production line in a manufacturing plant, an OCC score of 0 can be defined as one stoppage of under 30 min in a month, and a score of 10 can correspond to one stoppage of over 2 h in 1 week, depending on the reviewer’s knowledge of the processes or system being assessed (Aitugan & Li, 2020).

A Risk Priority Number (RPN) is a risk score calculated for each failure mode, as described by Eq. 3. The RPN is calculated on the FMEA form, and failure modes with RPN scores exceeding an acceptable range are evaluated to determine how the contributing OCC, DET or SEV score can be reduced by modifying detection and prevention measures. The new RPN score is then calculated (American Society for Quality, 2022):

$$RPN = (OCC)(DET)(SEV).$$
(3)
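A minimal sketch of Eq. 3 follows, ranking failure modes by RPN so that the highest scores can be reviewed first. The failure modes and their ratings are hypothetical examples, not entries from an actual FMEA form.

```python
def risk_priority_number(occ, det, sev):
    """RPN = OCC x DET x SEV (Eq. 3), each rated on the 0-10 scale."""
    for name, score in (("OCC", occ), ("DET", det), ("SEV", sev)):
        if not 0 <= score <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    return occ * det * sev


# Hypothetical failure modes: (name, OCC, DET, SEV).
failure_modes = [
    ("cooling fan stoppage", 4, 3, 6),
    ("smoke detector fault", 2, 7, 9),
    ("loose busbar connection", 5, 5, 4),
]

# Highest RPN first, so the reviewer sees the priority items at the top.
ranked = sorted(
    ((name, risk_priority_number(o, d, s)) for name, o, d, s in failure_modes),
    key=lambda item: item[1],
    reverse=True,
)
```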

The standard FMEA table form is easy for the safety reviewer to use, and the quantitative aspect is easy to understand (0–10 ratings are more intuitive than probabilities or frequencies such as \(10^{-6}\) per year). However, this method is not suited to investigating the low-level causes of failures in depth in order to develop prevention measures. FMEA is suitable for briefly examining possible failure points of a large system and identifying areas for improvement. More detailed failure analysis can then be carried out using Fault Tree Analysis.

Hazards and operability

The Hazards and Operability (HAZOP) Analysis, originally developed for the chemical industry, is an efficient way to quickly identify possible hazards by analysing each piece of equipment across a facility. HAZOP analysis is done via brainstorming by a team. The process first draws out the overall design of the system, i.e. the machinery and its designated functions. Then, possible process deviations or abnormal conditions for each machine are brainstormed and the resulting hazards are identified. Suitable preventive and mitigation measures are then considered (Aitugan & Li, 2020). It is an effective risk assessment option for identifying unforeseen hazards that may arise due to abnormal conditions in operation. In its basic form of application, it is a purely qualitative method. However, HAZOP analyses are often supplemented with quantitative methods, such as the Fault Tree Analysis method or simple risk ratings. The purpose of having a quantitative element is to help the risk assessment team prioritize mitigation actions (Fuentes-Bargues et al., 2017).

Systems theoretic process analysis

System-theoretic accident model and process (STAMP) is a method that views complex socio-technical systems as a multi-level structure of physical components, engineering activities, organizational hierarchies, and operational instructions. The interactions between the components are modelled as control loops, with multiple control loops within a large STAMP model. In a typical control loop around a controlled process, a signal detected by a sensor is sent to a controller. The controller processes the signal and sends a command to an actuator to perform a control action. The control action effects a corrective change to the controlled process. Figure 11 shows a basic control loop. Within a STAMP system, a large combination of control loops forms a safety network, where the controlled processes are maintained in a safe state via control actions (Rosewater & Williams, 2015).

Fig. 11

Basic STAMP control loop structure (Rosewater & Williams, 2015)

Systems Theoretic Process Analysis views hazardous states as a result of unsafe control actions (UCAs). It is a top-down approach, beginning with the identification of hazards or system losses. The STAMP model of the system is generated, the safety constraints are identified, then the causes and effects of UCAs within certain control loops are evaluated. Unsafe control actions are actions that put the safety of a process or system at risk. These actions violate established standard operating procedures and safety protocols which can eventually result in serious dangers or threats to equipment, individuals, or the surroundings.

STPA is a purely qualitative method, with no probabilistic assessment or risk rating with which to compare risk likelihoods or severities and help the assessor prioritize points of improvement in the system. This method allows failures in very complex systems to be analyzed from the viewpoint of system functions before tracing them down to lower-level components. The STAMP model also considers human factors and the hierarchical organizational structures of complex systems, thereby identifying related causal factors which may be overlooked by the other risk assessment methods discussed previously (Leveson et al., 2018). To cover as many risks as possible, the level of detail and accuracy of the STAMP model is critical. Hence, extensive expert knowledge is required to generate a substantial STAMP model.

Layers of protection analysis

The Layers of Protection Analysis (LOPA) is a semi-quantitative technique, often used in the chemical process industry, which allows safety reviewers to assess the safeguards between hazardous events and their consequences. In LOPA, these safeguards are termed independent protection layers (IPLs), which are expected to perform or fail independently of the conditions of the initial event or of other IPLs. The LOPA method has been referenced in documents from the Centre for Chemical Process Safety (CCPS), International Electrotechnical Commission (IEC), International Society of Automation (ISA) and Institute of Electrical and Electronics Engineers (IEEE), with suggested failure rates for various types of components and subsystems (Willey, 2014).

The safeguards can be classified into different layers such as inherently safe designs, critical alarms, automatic system responses, physical protection barriers and emergency response. An initiating event is identified that leads to severe outcomes upon the failure of its IPLs. The performance of an IPL is defined by its probability of failure on demand (PFD), that is, the probability that the safety system will fail to operate when required. The frequency of a consequence, fi, for scenario i with initial event frequency fi0 [per year] and n IPLs is described by Eq. 4 (Willey, 2014). Tolerable risk values for fi are often set at around \(10^{-4}\) to \(10^{-6}\) occurrences per year:

$$f_{i} = f_{i0} \times PFD_{i1} \times PFD_{i2} \times PFD_{i3} \times \cdots \times PFD_{in}.$$
(4)
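Eq. 4 can be illustrated with a short sketch that multiplies the initiating event frequency by the PFD of each IPL and compares the result with a tolerable frequency. The initiating frequency, the PFD values and the choice of threshold are hypothetical placeholders, not values recommended by CCPS or IEEE.

```python
from math import prod


def consequence_frequency(initiating_frequency, pfds):
    """Eq. 4: consequence frequency (per year) after n independent IPLs."""
    return initiating_frequency * prod(pfds)


# Hypothetical scenario: initiating event once in 10 years, three IPLs.
f_i0 = 0.1
ipl_pfds = [0.1, 0.01, 0.1]
f_i = consequence_frequency(f_i0, ipl_pfds)

# Upper end of the tolerable range quoted in the text (assumed threshold).
TOLERABLE_FREQUENCY = 1e-4
is_tolerable = f_i <= TOLERABLE_FREQUENCY
```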

Landucci et al. quantified the risk reduction effect of safety barriers on accident consequences in industrial facilities for vessel leak events and fire escalation, introducing another parameter for the performance of IPLs: effectiveness. Effectiveness describes the probability of success of an IPL in mitigating the escalation scenario, given that it has been successfully activated (Landucci et al., 2017). Misuri et al. assessed the probability of accident outcomes of an industrial facility in Natech events using LOPA-based event tree and fault tree analysis. Worst-case outcome frequencies, where all safety barriers failed to activate, were calculated to be on the order of \(10^{-8}\) to \(10^{-11}\) per year. Both studies quantified safety risk as the individual probability of fatality and the risk of multiple fatalities, mapping out safety distance ranges that depend largely on the layout of the facility (Misuri et al., 2021).

Research gaps and reviewed work

A range of literature topics was examined as background for this work. In the ''Battery energy storage technologies'' and ''Safety Hazards'' sections, battery storage technologies, battery safety hazards, design requirements and failure behaviours were reviewed from academic journal articles, official safety standards and industrial reports. In the ''Safety and Risk Assessment'' section, risk assessment methods and publications were reviewed from risk assessment handbooks and academic journal articles. Web sources from the ASEAN disaster information network, EU Emergency Response Coordination Center, Malaysia National News Agency and UN Office for Coordination of Humanitarian Affairs were used for the history of major floods and landslides in the ''Safety Hazards'' section. The literature review topics are summarized in Tables 5 and 6.

Table 5 History of floods and landslides in Malaysia
Table 6 Summary of reviewed literature topics and assessment parameters

Safety risk assessments in Li-ion battery safety studies insufficiently analyse failure mechanisms in which correct actions, performed as designed while the system is under unforeseen conditions, lead to hazardous states. Due to the complexity of systems such as an LSS plant with BESS, it can be difficult to predict all possible hazardous system states. Accident reports often reveal system actions performed under the wrong conditions leading to accidents with severe consequences. For example, the Arizona Public Service BESS explosion of 2019 was caused by the HAZMAT team opening the BESS door, which introduced fresh air to the combustible gas mixture inside the BESS space. The action was intended to vent the gas mixture that had built up in the BESS room, but because the gas concentrations had built up over three hours, opening the BESS door escalated the hazardous event (McKinnon et al., 2020). The discrepancy between the safety risk assessment case studies and the accident reports highlights that the risk assessment methods failed to facilitate the identification of such system states and of the potential risk of response actions that would otherwise be safe.

There is a lack of quantitative risk analysis models for the safety risk assessment of energy storage systems. For example, vulnerability and fragility models for petroleum facilities describe escalation thresholds of hazardous states or safety distances based on thresholds in pressure, heat release rate and radiation intensity (Alileche & Cozzani, 2015). Various studies on BESS fires, thermal runaway performance and explosion pressures present quality data on BESS failure performance. However, the available data have not been consolidated into a fragility model for analysing BESS safety risk that is scalable to most BESS of the same technologies.

Failure modes and causes identified in FMEA and STPA case studies often highlight failures of individual components and ignore failures caused by interactions between subsystems. The analysis of component failures is tied to the safety risk assessment, and solutions for improvement focus on system components (Baschel & Roy, 2018; Choo & Go, 2022; Wang et al., 2019). While these improvements reduce the likelihood of hazard release, the reliability of essential safety subsystems such as detection systems, fire suppression and emergency ventilation is often not considered. These subsystems are often suggested as the ‘solutions’ and not assessed in further detail, despite being essential in mitigating the severe consequences of hazardous events.

Quantitative risk assessment frameworks (probabilistic ETA and FTA, FMEA) and qualitative ones (systemic analysis, HAZOP) do not complement each other in identifying effective prevention and mitigation measures. ETA and FTA methods can highlight weak points in a system but do not provide a framework for evaluating improvements. STPA and HAZOP methods can produce long lists of failure causes and safeguards but can be redundant and unfocused in their exhaustiveness. FMEA provides a good balance between quantitative risk rating and the analysis of failure modes, causes and effects, but can overlook certain failure causes that require deeper analysis.

Methodology

This section explains the steps of the proposed risk assessment methodology in relation to the Event Tree and STPA methods discussed in the ''Literature Review'' section. A case study of Malaysian LSS plant site selection for incorporating BESS is performed to validate the quantification of severe damage frequency. For further reference in this work, the proposed methodology is called the event-centric systemic analysis (EcS) method. The EcS risk assessment method assesses safety barrier failures in both accident analysis (ETA-based) and systemic assessment (STPA-based) to identify more causal scenarios and mitigation measures against severe damage accidents that are overlooked by the conventional ETA, STPA and STPA-H methods. Safety barrier failure rates and consequences in the event tree-based analysis are used to compute the frequencies of severe damage scenarios of BESS in an LSS plant. By including safety barriers as part of the overall STPA control structure, the STPA-based analysis can be applied to investigate the failure of pre-existing mitigation measures by viewing them as unsafe control actions.

Development of stages of EcS assessment model

Figure 12 shows the flow diagram of the proposed risk assessment method. Steps are labelled 1–12 for reference in the following sections. Steps 7–9 and 10–12 can be performed simultaneously.

Fig. 12

Methodology flow diagram of EcS Method

Steps 1–3 Hazards and safety barriers are identified. These details are available from the battery energy storage safety literature, or from the NFPA855 and IEC62933 safety standards, for the varieties of battery energy storage technologies listed in the ''Literature Review'' section. The STPA control structure of the grid-connected PV system with BESS is adapted from Rosewater et al., IEC62933 and SANDIA National Laboratories, and modified on a project-by-project basis.

Steps 4–9 The primary event of the Event Tree is identified. This is usually the release of a certain hazard whose unmitigated outcomes lead to severe consequences, for example, the start of an external fire in the BESS room or an uncontrolled toxic gas release. Probabilities of safety barrier failure on demand are listed and used to compute the event tree outcomes and the frequency (per year) of the ETA primary event, as explained and demonstrated in the following section. The final outcomes of the event tree are calculated, and their frequencies (per year) are evaluated.

Steps 10–12 The STPA control actions are identified based on the control diagram produced earlier in Step 2. This is followed by an assessment of unsafe control actions and corresponding mitigation measures. Mitigation measures can be in the form of additional safety constraints or improved safety design.

Probabilistic event tree analysis

In this approach, the initiating event is described as an event of release of hazard i.e. release of toxic gas, thermal runaway, or an external fire not initiated by a battery unit. The frequency of occurrence of an initiating event can be obtained via historical data and failure rates of failure modes of the battery systems leading to the initial event. The Institute of Electrical and Electronics Engineers (IEEE) and Centre for Chemical Process Safety (CCPS) have specified estimated frequencies of component failures covering frequencies of hazard release events from electrical component failures, mechanical impacts, internal short circuits, overcharging, etc. for electrical power systems.

The release of hazard of the ETA initial event is conceptualized as occurrence of the initial failure and the subsequent failure of prevention barriers e.g. BMS voltage–current control, cooling, shutdown and circuit breakers. Failure modes are considered on 3 levels, as described in Table 7, where a single failure affects one battery rack, one BESS unit or all BESS units. The frequency of flood occurrence as an initiating failure mode is calculated using Monte Carlo simulation of the reported history of natural hazard events of the specified region. In the case study of Malaysia, natural hazard events concerned are flood and landslide events.

Table 7 System level of failure

The frequency of ETA initial event from one failure mode can be described by the following equation (Willey, 2014):

$$f_{m} = f_{0,m} \times PFD_{m1} \times PFD_{m2} \times \cdots \times PFD_{mk}$$
(5)

where fm is the frequency with which the mth base failure mode, with k prevention barriers, leads to the ETA initiating event, and f0,m is the frequency of the base failure mode. The frequency of an ETA initiating event, fa, is then the sum over the minimal cut set of n initiating failure modes and the subsequent failures of the prevention measures leading to it, described by the following equation:

$$f_{a} = \sum\nolimits_{m = 1}^{n} {[f_{m} ]} .$$
(6)

The frequency of the initial event on the ETA, f(E0), is then obtained as the sum of the minimal cut sets at battery rack level, BESS unit level and global level, described by Eq. 7, where Na and fa are the number of units of system-level components and the frequencies of failures, respectively, as described in Table 7:

$$f(E_{0} ) = N_{2} [(N_{1}\times f_{1} ) + f_{2} ] + f_{3} .$$
(7)
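Eqs. 5–7 can be chained as in the sketch below. It assumes that N1 is the number of racks per BESS unit, N2 is the number of BESS units, and f1, f2 and f3 are the rack-level, unit-level and global-level frequencies; this mapping is inferred from the form of Eq. 7 and should be checked against Table 7. All numerical values are hypothetical placeholders.

```python
from math import prod


def failure_mode_frequency(base_frequency, prevention_pfds):
    """Eq. 5: base failure frequency times the PFD of each prevention barrier."""
    return base_frequency * prod(prevention_pfds)


def level_frequency(failure_modes):
    """Eq. 6: sum of f_m over the failure modes at one system level."""
    return sum(failure_mode_frequency(f0, pfds) for f0, pfds in failure_modes)


def initiating_event_frequency(n1, n2, f1, f2, f3):
    """Eq. 7: f(E_0) = N_2 * (N_1 * f_1 + f_2) + f_3."""
    return n2 * (n1 * f1 + f2) + f3


# Hypothetical failure modes per level: (base frequency per year, [PFDs]).
rack_modes = [(1e-3, [0.1, 0.1]), (2e-3, [0.01])]
unit_modes = [(1e-2, [0.01, 0.1])]
global_modes = [(1e-1, [0.001])]

f1 = level_frequency(rack_modes)
f2 = level_frequency(unit_modes)
f3 = level_frequency(global_modes)

# Assumed layout: 12 racks per BESS unit, 2 BESS units.
f_e0 = initiating_event_frequency(n1=12, n2=2, f1=f1, f2=f2, f3=f3)
```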

The safety barriers identified for the BESS safety analysis are listed in Tables 8, 9, and 10, using failure rates from IEEE and CCPS based on system components and software. These PFD values are used either to compute the initiating event frequencies of the event trees or, as safety barriers within the event tree, to compute its outcomes. The safety barriers are classified as detection types, passive barriers, active barriers and emergency response barriers. Detection types cover the BMS temperature, voltage and current monitoring functions and the smoke and gas detectors in the BESS room, where conditions outside acceptable operational limits produce alerts to operators in the control room. Passive barriers are safety designs that do not require activation or triggering by a detection system, e.g. thermal insulation designed to prevent thermal spread among battery modules. Active barriers such as the cooling system, fire suppression and ventilation are safety functions that depend on the alert of a detection system. They can be activated manually by operators in the control room or triggered automatically. In the context of fire mitigation, cooling and ventilation are expected to be running while the BESS is in operation, and their rates are increased when a sudden rise in temperature or vented-gas concentration is detected by the detection barriers.

Table 8 Probability of failure on demand of safety systems (Gully, 2019)
Table 9 Conditional probability formula of each safety barrier (Landucci et al., 2017)
Table 10 Hazards for STPA (Leveson et al., 2018)

The probabilistic event tree is used to evaluate the probability of consequences for thermal runaway starting in one cell, its subsequent propagation to adjacent cells and modules (called a cascading thermal runaway event), and its escalation to a fire or explosion event. The branches of the event tree are constructed based on Misuri and Landucci’s domino effect model of safety barrier performance in escalation scenarios (International Electrotechnical Commission, 2020; NFPA, 2022). Safety barriers are viewed as layers of protection against hazard escalation. For example, early smoke detection and active fire suppression are safety barriers against an internal battery fire spreading to multiple racks.

Applying the Layers of Protection Analysis (LOPA) approach, safety barrier performance can be described by probability of failure on demand, PFD and effectiveness, η. The PFD of a safety barrier describes the conditional probability of failure to activate when it is required. The effectiveness describes its effect on mitigation given the safety barrier is successfully activated (Misuri et al., 2021; Willey, 2014).

For safety barriers described only by the probability of failure on demand, two possible event paths are considered: success and failure to activate on demand. Once activated, the safety barrier is assumed to be fully effective in mitigating the escalation of the hazard scenario. For safety barriers described by both the probability of failure on demand and the effectiveness parameter, three outcomes are considered: the safety barrier failed to activate on demand, activated but was not effective in mitigating the escalation of the hazard scenario, or activated and was effective in mitigation.
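The two-path and three-path gates described above can be sketched as follows. The branch formulas follow directly from the definitions of PFD and effectiveness given in the text; the function name and the PFD value are hypothetical, while the effectiveness of 0.953 is the value this article quotes for active fire suppression.

```python
def barrier_branches(pfd, effectiveness=None):
    """Branch probabilities for one safety barrier gate.

    Without an effectiveness value there are two branches (activated,
    failed on demand). With one there are three (activated and effective,
    activated but ineffective, failed on demand).
    """
    if effectiveness is None:
        return {"activated": 1.0 - pfd, "failed_on_demand": pfd}
    return {
        "activated_effective": (1.0 - pfd) * effectiveness,
        "activated_ineffective": (1.0 - pfd) * (1.0 - effectiveness),
        "failed_on_demand": pfd,
    }


# Hypothetical PFD of 0.05 for both gates (placeholder value).
detection = barrier_branches(pfd=0.05)
suppression = barrier_branches(pfd=0.05, effectiveness=0.953)
```

In both cases the branch probabilities of a gate sum to one, which is a useful check when building the event tree.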

The conditional probability of each final ETA outcome P(Ei) given the initiating event, with n levels of safety barriers considered, where \(P\left({E}_{i}^{n}\right)\) is the probability of the outcome at each ith safety barrier is described by the following equation (Misuri et al., 2021):

$$P(E_{i}) = \prod_{i = 1}^{n} P(E_{i}^{n}).$$
(8)

The frequency of occurrence is described by Eq. 9, where fij is the frequency (per year) of a specified ETA outcome Ei given an initiating event j with occurrence frequency f(Ej) per year. fij is computed for all outcomes for each LSSPV site according to the number of battery units present (Misuri et al., 2021):

$$f_{ij} = f(E_{j})\,P(E_{i}).$$
(9)
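As an illustration of Eqs. 8 and 9, the sketch below enumerates every path of a small event tree, multiplies the branch probabilities along each path and scales the result by the initiating event frequency. The barrier names, branch probabilities and initiating frequency are hypothetical placeholders, not Table 8 values, and the sketch treats every gate as reached on every path, whereas a real tree may prune branches.

```python
from itertools import product
from math import prod


def enumerate_outcomes(barriers):
    """Return {path: probability} for every combination of barrier states.

    barriers maps a barrier name to {state: conditional probability}.
    Each path probability is the product of its branch probabilities (Eq. 8).
    """
    names = list(barriers)
    outcomes = {}
    for states in product(*(barriers[n].items() for n in names)):
        path = tuple(f"{n}:{s}" for n, (s, _) in zip(names, states))
        outcomes[path] = prod(p for _, p in states)
    return outcomes


def outcome_frequencies(initiating_frequency, outcomes):
    """Eq. 9 applied to every outcome: frequency per year of each path."""
    return {path: initiating_frequency * p for path, p in outcomes.items()}


# Hypothetical barrier data (placeholders for illustration).
barriers = {
    "detection": {"ok": 0.9, "fail": 0.1},
    "suppression": {"ok": 0.8, "fail": 0.2},
}
outcomes = enumerate_outcomes(barriers)
frequencies = outcome_frequencies(1.0e-3, outcomes)
```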

A demonstration of the event tree is considered for the initiating event of a thermal runaway-induced fire in one battery rack, based on Cozzani’s model event tree sequences of industrial accident events and Misuri’s demonstration of safety barrier performance assessment using an event tree (Cozzani et al., 2010; Misuri et al., 2021). The event tree can also be used to analyse events such as an external battery fire (a fire in the BESS space not directly caused by battery cells), toxic chemical release or exposure of reactive chemicals to air, and their consequences. For example, an initiating event of toxic chemical release can lead to consequences of water contamination, soil contamination and toxic gas dispersion, all analysable with the event tree. The initiating event of thermal runaway-induced fire is chosen as it is most commonly cited as the scenario leading to prolonged battery fires and explosion events in high-profile BESS accidents with severe outcomes.

The safety barriers between the event of fire and catastrophic event identified are the detection system (FD), the automated fire suppression (F1) and emergency fire response (F2), as shown in Fig. 13. Success of each stage of mitigation leads to reduced severity of final consequence i.e. damage to BESS and fire hazard level. Here, fire hazard level represents the risk to the firefighters on site. The effectiveness, η of the Active Fire Suppression considered is 0.953 (Landucci et al., 2017). Therefore, three outcomes are considered for active fire suppression gate. The outcomes considered are labelled 1.1 to 1.7 and their probabilities and frequencies are evaluated in the Results section.

Fig. 13

Battery rack fire event tree

STPA-based analysis

The benefit of applying STPA in this methodology is the identification of causal factors of UCAs by considering the LSSPV system from its main components (PV modules, inverters, battery units) up to its organizational structures (on-site operators, Fire Department, LSSPV owner). First, the system-level hazards are defined. As a validity check, Leveson noted that, to keep the hazard analysis at system level, the identification of any specific system components should be avoided and the hazard count usually kept under 10 (Leveson et al., 2018).

STPA control structure (step 3)

The control structure is constructed based on the Malaysian organizational structure of LSSPV and BESS management, where the Energy Commission governs the scheme for LSSPV and BESS grid operation, whereas the Department of Standards is responsible for the safety standards that protect the equipment and the workers in its vicinity. Arrows between system elements represent communication of information or commands (control actions) between component elements. The control diagram used for this STPA analysis is shown in Fig. 14, adapted from Choo and Rosewater’s STPA analyses of grid-connected Li-ion batteries (Choo & Go, 2022; Rosewater et al., 2020).

Fig. 14

STPA control diagram of grid-connected LSSPV with BESS (Choo & Go, 2022; Rosewater et al., 2020)

Control actions between component elements are identified. Typically, control actions are characterized as commands from a controller-type element of higher authority in the control structure to a lower-level component or subsystem. Here, feedback (e.g. battery module temperature, voltage, current, etc.) and commands (e.g. alarm activation, increased cooling rate or physical actions) are both considered as control actions. As Leveson explains, mischaracterizing feedback and control actions will still result in the same causal factors being identified in later steps of STPA. Based on each control action, unsafe control actions (UCAs) are then identified by considering how a purposeful control action that is not provided, provided, provided too late or too early, or stopped too early or too late may lead to a system hazard state.
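The UCA identification step can be supported by a simple worksheet generator that pairs every control action with every UCA type, leaving the analyst to judge which pairs actually lead to a hazard state. This is only a bookkeeping sketch; the control action names are hypothetical examples based on the commands mentioned above.

```python
from itertools import product

# UCA guide phrases as listed in the text.
UCA_TYPES = [
    "not provided",
    "provided",
    "provided too early or too late",
    "stopped too early or too late",
]


def uca_candidates(control_actions):
    """Pair every control action with every UCA type for manual review."""
    return [
        {"control_action": action, "uca_type": uca_type}
        for action, uca_type in product(control_actions, UCA_TYPES)
    ]


# Hypothetical control actions (illustrative only).
control_actions = ["alarm activation", "increase cooling rate"]
candidates = uca_candidates(control_actions)
```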

Results and discussion

A case study on two LSS sites in Malaysia was used to validate the EcS quantification of the frequencies of severe damage per year via event tree-based analysis. The Energy Commission of Malaysia promotes the development of large-scale solar PV plants through its competitive bidding programme. Projects in the current bidding cycle, Cycle 4, are expected to be commissioned between 2022 and 2023. The EC offers two packages based on LSS PV capacity range, each with its own Power Purchase Agreement pricing (Commission, 2022).

For the case study of this work, one site from the LSSPV P1 Package and one site from the LSSPV P2 Package have been chosen for quantitative risk assessment. Referring to Table 11, Site 5 of 13.0 MW capacity in the state of Selangor and Site 9 of 50.0 MW capacity in the state of Perak are considered, labelled site A and site B in Table 12. Based on research carried out by Laajimi et al. (Mahmoud Laajimi, 2021), the total battery storage capacity for each site configuration was calculated using the annually averaged ratio of storage energy output to the energy output from the solar farm. PV sizing is based on 550 W monocrystalline PV modules. For the 13.0 MW capacity site A, 2.0 MVA central inverter units and 2.510 MWh Li-ion NMC BESS units are deployed, and for site B, 4.2 MVA central inverters and 4.18 MWh Li-ion NMC BESS units are deployed (Electric, 2018; Siemens & Flyer, 2020; Solar & “Hi-MO5”, 1011, 2021). The configurations are verified in PVSyst to ensure no oversizing or undersizing of the PV array and inverters.

Table 11 List of approved bidders for LSS cycle 4 (Commission and “LSSPV Bidding Cycle 4 (LSS@MEnTARI)”, 2022)
Table 12 Component sizing and quantities for LSSPV sites

Event tree analysis and probabilistic assessment

BESS sizing, units and racks quantity

Two configurations for site A, A1–A2, and five configurations for site B, B1–B5, are assessed in the probabilistic event tree analysis, as shown in Table 12. When the A value is varied from 20% to 60%, the required installed BESS capacity at the Kuala Selangor site corresponds to 5–10 MWh. For A values of 20–60% at the Batang Padang site, the installed battery storage capacity corresponds to 16–48 MWh. Therefore, site A will have 2–4 units of the 2510 kWh BESS, housing 12 racks per BESS unit. Site B will have 4–11 units of the 4184 kWh BESS, with 20 racks per unit (Electric, 2018). In total, site A houses 24–48 battery racks, and site B houses 80–220 racks. The number of battery storage units and the total number of battery racks are used in the evaluation of event tree outcomes.
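The unit and rack counts above can be reproduced with the short sketch below, using the capacities, unit sizes and racks per unit quoted in the text. Rounding to the nearest whole unit is an assumption of this sketch; it reproduces the quoted unit counts, but the article does not state the rounding rule, and rounding up would give 12 units at the upper end for site B.

```python
def bess_units(required_capacity_mwh, unit_capacity_mwh):
    """Number of BESS units, rounded to the nearest whole unit (assumption)."""
    return max(1, round(required_capacity_mwh / unit_capacity_mwh))


def total_racks(units, racks_per_unit):
    """Total battery racks on a site."""
    return units * racks_per_unit


# Site A: 5-10 MWh with 2.510 MWh units, 12 racks per unit.
site_a_units = [bess_units(c, 2.510) for c in (5, 10)]
site_a_racks = [total_racks(u, 12) for u in site_a_units]

# Site B: 16-48 MWh with 4.184 MWh units, 20 racks per unit.
site_b_units = [bess_units(c, 4.184) for c in (16, 48)]
site_b_racks = [total_racks(u, 20) for u in site_b_units]
```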

Event tree outcome evaluation

The probabilities of the outcomes of the battery rack fire event tree in Fig. 13 are presented in Table 13. Outcomes of safety barriers FD, F1 and F2 are labelled based on the Table 9 outcome values 0, 1, 2 or “X”, denoting success or failure in mitigation. Outcome probabilities, P(E), are the conditional probabilities of each event tree outcome or path given the initiating event, i.e. battery rack fire. Using this analysis, the probability of successful early fire suppression, expressed by Outcome 1.1, is 0.8491. This is the ideal situation, where the fire detection and active fire suppression systems are successful and no emergency response is required. The worst-case scenario in consideration is Outcome 1.7, where the fire detection system fails to produce an alert, the fire suppression system is not activated, and the emergency responders (Fire Team) fail to contain the fire. This worst-case scenario is expected to occur with a probability of 0.001 in the event of a battery rack fire. Another severe outcome scenario is Outcome 1.5, where fire detection and alert are successful but fire suppression fails to activate and emergency responder actions fail, with a probability of 0.0099.

Table 13 Table of event tree outcome probabilities

Given the frequency of the initiating event for site configurations A1–B5, calculated based on Eq. 7, the frequencies of each outcome of the event tree are tabulated in Table 14. For the worst outcome, which is Outcome 1.7 in Table 13, site A1, with the fewest battery units and racks, has a frequency of occurrence of \(2.173 \times 10^{-7}\) per year. Site B1, which has the largest number of battery units and racks, has a frequency of occurrence of \(2.5753 \times 10^{-6}\) per year. The frequencies of damage levels and BESS damage levels are consolidated in Table 15. The frequency of multiple battery rack damage for sites A1 to A2 ranges from \(2.222 \times 10^{-5}\) to \(4.028 \times 10^{-5}\), whereas for sites B1 to B5 it ranges from \(9.802 \times 10^{-5}\) to \(2.398 \times 10^{-4}\), due to the higher number of total battery racks and BESS units. For the same A values, site A has a lower risk of severe damage to BESS from thermal runaway-induced fire than site B, by a factor of 3.3 to 4.5. Evaluating the event tree paths, failure of either the fire detection system or the active fire suppression system leads to unmitigated fire spread inside the BESS room. Targeted mitigation measures should be assessed to reduce the failure rates of these two systems and thereby the risk of severe damage in the event of a battery rack fire, as demonstrated in the ''STPA Results'' section.

Table 14 Table of outcome frequencies (per year) by site
Table 15 Frequencies of damage

STPA results

A list of unsafe control actions is given in Table 16, presenting UCAs focused on the failure modes of the active safety systems, i.e. active cooling, active fire suppression and active ventilation, and on emergency responder actions. UCA types are categorized as “not provided”, “provided”, “provided too early or late” and “stopped too early or late”. All UCA types lead to hazardous states or escalation. The full list is available in the supplementary material.

Table 16 Summarized unsafe control actions list
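The structure of such a UCA list can be sketched as a cross product of control actions and UCA types. This is a minimal illustration: the class name, field names and the wording of the control actions are assumptions made for the example and are not taken from Table 16 or the supplementary material.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# The four UCA types named in the text.
UCA_TYPES = (
    "not provided",
    "provided",
    "provided too early or late",
    "stopped too early or late",
)

# Control actions paraphrased from the text; the wording is an assumption.
CONTROL_ACTIONS = (
    "activate active cooling",
    "activate active fire suppression",
    "activate active ventilation",
    "emergency responder action",
)


@dataclass(frozen=True)
class UcaCandidate:
    """One cell of a UCA table, to be reviewed by the analyst."""
    control_action: str
    uca_type: str
    hazard_context: Optional[str] = None  # filled in during the review


# Every control action is examined against every UCA type.
candidates = [
    UcaCandidate(action, uca_type)
    for action, uca_type in product(CONTROL_ACTIONS, UCA_TYPES)
]

for c in candidates[:4]:
    print(f"{c.control_action}: {c.uca_type}")
```

Enumerating the full grid first, and only then recording the context in which each cell becomes hazardous, helps keep cases such as "provided as designed but still hazardous" from being skipped.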

The UCAs identified cover failures of the hazard detection systems, i.e. the BMS temperature, voltage and current monitoring systems and the smoke and gas detection systems of the BESS. Active safety systems are hazard prevention or mitigation systems that require a detection trigger. For the ventilation system, for example, the ventilation rate is increased once the BESS gas detection sensors detect a rapid increase in the concentration of flammable gases. UCAs on the failure of active safety systems, e.g. fire suppression activation not provided, provided incorrectly, provided late or stopped early, lead to similar outcomes, i.e. unmitigated fire spread. The effects and causes of these UCAs are generally foreseeable with basic knowledge of safety. However, UCAs where control actions are provided as designed yet lead to hazard escalation reveal abnormal conditions of the system which require different action plans. For example, in the event of a combustible gas mixture building up in the BESS enclosure, opening the door of the BESS room in an attempt to vent the gas would introduce fresh air, increasing the flammability of the gas mixture and thereby the risk of instantaneous explosion (McKinnon et al., 2020). UCAs of regulation actions pertaining to safety training and BESS site acceptance test requirements are also considered. Following this step, causal factors and corresponding mitigation measures are suggested.

Causes and mitigation measures

Based on the full list of unsafe control actions, the causal scenarios are assessed and mitigation measures are identified accordingly. Multiple causal scenarios are found to recur across different categories of control actions, e.g. gas detection system and smoke detection system UCAs have overlapping causal factors. These two systems are therefore grouped together in Table 17, as are the BMS monitoring sensor systems (voltage, temperature and current monitoring circuits).

Table 17 List of STPA causes and suggested mitigation measures

For safety sensors and alerts, mitigation measures include strategic placement of gas concentration sensors at different height levels in the BESS room, to ensure detectability of gases that are lighter than air, heavier than air or stratified by coolant compound. Faults in the sensor circuits should also produce alerts to operators in the control room. Failure causes of the active safety systems can be mechanical failure of fans or pumps. One of the mitigation measures identified is for the HVAC coolant material to be detectable by gas sensors in case of leakage. For the emergency ventilation system, a positive pressure system is suggested, which pumps chemically inert gas into the BESS space to displace a toxic or combustible gas mixture safely. Tables 17 and 18 cover causal scenarios and mitigation measures suggested for the safety sensors, BMS monitoring systems, active safety systems, designs to assist the fire and rescue team, and institutional-level measures such as having clear numerical design requirements for the active safety systems for BESS set by the local authority.

Table 18 Comparison of proposed EcS assessment parameters against reviewed risk assessment methods and recently developed STPA‐H method (Choo & Go, 2022)

Risk assessment evaluation

The risk assessment methods reviewed in ''Safety and Risk Assessment'' section adopt different assessment parameters to fit different purposes of assessment e.g. to analyse minimal cut sets of conditions for failure, to evaluate hazard escalation sequences and consequences of failures or to improve detectability and preventability of failures. The combinations of parameters facilitate the intended focus and purpose of the assessment and its results. The parameters of the proposed EcS method are compared against the methods reviewed in ''Safety and Risk Assessment'' section and Choo’s Holistic-STPA method (Choo & Go, 2022). The parameter types are categorized by system definition parameters such as system constraints and control actions. Accident analysis parameters cover contributing causes and consequences of undesired system failure events. Corrective action parameters describe preventive measures, mitigation measures incorporated into system design and actions taken at an organizational level or emergency response level. Parameters for quantitative risk analysis include risk ranking, component or safety barrier failure probabilities, and damage severity.

Traditional applications of the chain-of-events model, namely ETA, adopt a direct, linear and exclusive view of the causality and progression of events. This approach may overlook failures caused by the interactions between system components under abnormal system states. Based on this understanding, the proposed EcS method further analyses the indirect interactions of the LSSPV + BESS systems and components that lead to hazardous states. The qualitative findings of the EcS method include causal scenarios and mitigation measures derived from possible failures contributed by these indirect interactions. For example, the failure and effectiveness of the fire suppression system and the consequences of its failure are evaluated in the ''STPA-based Analysis'' section, and its indirect failure causes and mitigations are assessed in the ''STPA Results'' section. The initiating event analysis also incorporates various contributory failure mechanisms, scalable to the component sizing of the LSS + BESS system.

Quantitative assessment of severe BESS damage due to thermal runaway-induced fire found that the likelihood of total BESS unit damage for 5–46 MWh Li-NMC storage systems ranged from 2.489 × 10⁻⁶ to 2.807 × 10⁻⁵ occurrences per year. This translates to a risk of one worst-case outcome per 35,000–400,000 years. The worst-case unmitigated fire risk to humans ranges from 2.489 × 10⁻⁵ to 2.807 × 10⁻⁴ per year. Higher capacity LSS systems incorporating more BESS units and battery racks require increased monitoring and safety barrier safeguards to lower the risk of hazardous events causing damage to the equipment. The incorporation of LOPA and event tree analysis provides a quantitative framework to compare the risks of severe outcomes from an undesired initiating event. Mitigation measures can then be considered. For example, Outcome 1.5 of Table 13, where the fire detection system succeeds but fire suppression and emergency responder actions fail, leads to the same severe consequences as Outcome 1.7 but has a higher probability. Mitigation measures can therefore be targeted at reducing the likelihood of Outcome 1.5.
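The conversion from an annual frequency to the "one event per N years" figure quoted above is a simple reciprocal. The short sketch below applies it to the two bounding frequencies from the text; the function name is hypothetical.

```python
def return_period_years(freq_per_year):
    """Mean number of years between events for a given annual frequency."""
    if freq_per_year <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / freq_per_year


# Bounding frequencies of total BESS unit damage quoted in the text.
FREQ_LOW = 2.489e-6   # per year
FREQ_HIGH = 2.807e-5  # per year

shortest = return_period_years(FREQ_HIGH)  # most frequent case
longest = return_period_years(FREQ_LOW)    # least frequent case

print(f"One worst-case outcome per {shortest:,.0f} to {longest:,.0f} years")
```

The results are roughly 35,600 and 402,000 years, consistent with the rounded range of 35,000 to 400,000 years given above.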

Conclusions

Various studies of large-scale solar (Isaac & Ii, 2023; Mohanan, 2020; Rehan Khan & Yun Ii Go, 2019) have been carried out, covering grid integration, power management and system sizing, among other topics. A literature review covering various energy storage technologies and hazards (Citalingam, 2022; Faruhaan & Ii, 2021; Mahmoud, 2019; Mohammed & Go, 2021; Teo & Go, 2021) was presented in the ''Literature Review'' section, followed by a review of risk assessment methods and case studies outlining the advantages and limitations of each method. Industrial safety standards NFPA855 and IEC62933, BESS safety review articles, and BESS accident reports provided crucial information for identifying safety failures that were previously overlooked. The proposed risk assessment methodology was presented and demonstrated in the ''Methodology'' section. The formulation of the event tree and the quantitative method to evaluate the frequencies of outcomes for each site, based on the failure probabilities of the LSS + BESS subsystems, were presented. The STAMP control structure used in the STPA was introduced, with modifications from references placing more importance on safety systems, along with principles for identifying hazards and unsafe control actions.

For large-scale solar plants with total capacities of 13.0 MW and 50.0 MW and an A value of 20–60%, it is recommended to adopt BESS capacities ranging from 5.0 to 10.0 MWh and from 16.0 to 48.0 MWh, respectively. The worst-case outcome frequencies ranged from 2.368 × 10⁻⁵ to 2.807 × 10⁻⁴ per year for fire hazard leading to human injury, and from 2.368 × 10⁻⁶ to 2.807 × 10⁻⁵ per year for BESS damage. Further improvement measures were assessed qualitatively in the STPA analysis, with emphasis on failures of safety barriers due to indirect causes or abnormal system states. The causal factors identified covered component failures, loss of data to guide emergency response actions, and inadequate information or organizational framework pertaining to BESS safety. The mitigation measures identified covered improvements to sensor coverage, emergency responder contingencies for data on BESS state, and redundancy measures for safety system components (pumps and fans).

  • For a 13.0 MW LSS site with a 5–10 MWh Li-NMC BESS, the frequency of worst-case total BESS unit damage due to thermal runaway fire is observed to be 2.368 × 10⁻⁶ to 4.363 × 10⁻⁶ per year.

  • For a 50.0 MW LSS site with a 16–46 MWh Li-NMC BESS, the frequency of worst-case total BESS unit damage due to thermal runaway fire is observed to be 1.037 × 10⁻⁵ to 2.800 × 10⁻⁵ per year.

  • Analysis of safety barrier failure modes via the STPA-based method identified causal factors such as component failures, system failures and failures in organizational protocols.

  • Analysis of mitigation measures identified required improvements to safety design, contingencies for emergency responders and redundancy measures for safety system components.

Incorporating both component and systemic views, assessing safety barrier failures and assessing indirect causal factors in abnormal system states are necessary principles for developing an adequate safety framework for complex energy systems such as an LSS with BESS. Stakeholders and LSS owners are expected to benefit from reduced risk of severe equipment damage and asset loss from accident events. Emergency responders benefit from improved safety protocols and safety requirements, leading to reduced risk of severe injuries or fatalities in accident events. The EcS risk assessment framework presented would benefit the Malaysian Energy Commission and Sustainable Energy Development Authority by supporting increased adoption of battery storage systems with large-scale solar plants, contributing to the IRENA 2050 energy transformation scenario targets for global temperature control and net zero carbon emissions.

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

LSS: Large-scale solar
BMS: Battery management system
MCS: Minimal cut set
CCPS: Center for Chemical Process Safety
NMC: Nickel manganese cobalt
DET: Detectability
NaNiCl2: Sodium nickel chloride
EC: Energy Commission
NaS: Sodium sulphur
EcS: Event-centric system analysis
NFPA: National Fire Protection Association
ETA: Event tree analysis
OCC: Occurrence
FTA: Fault tree analysis
SEDA: Sustainable Energy Development Authority
HAZMAT: Hazardous materials
STAMP: Systems-theoretic accident model and process
HAZOP: Hazard and operability
SEV: Severity
HVAC: Heating, ventilation and air conditioning
SOC: State of charge
IEC: International Electrotechnical Commission
IRENA: International Renewable Energy Agency
STPA: System-theoretic process analysis
IPL: Independent protection layers
UCA: Unsafe control actions
LC: Lethal concentration
VRLA: Valve regulated lead acid
LFL: Lower flammability limit
VRFB: Vanadium redox flow battery
LIP: Lithium iron phosphate
ZnB: Zinc bromine

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Contributions

EA was involved in data analysis, interpretation of data and manuscript writing. GYI was involved in supervision, design of the work and manuscript revision.

Corresponding author

Correspondence to Yun Ii Go.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Moa, E.H.Y., Go, Y.I. Large-scale energy storage system: safety and risk assessment. Sustainable Energy res. 10, 13 (2023). https://doi.org/10.1186/s40807-023-00082-z

  • DOI: https://doi.org/10.1186/s40807-023-00082-z

Keywords