Incident Date: October 3, 2025
On October 3, 2025, Unichain Mainnet and Unichain Sepolia both experienced a safe head stall. We will focus primarily on Mainnet, as the root cause of both networks' incidents is the same. Mainnet experienced a safe head stall of 4 hours and 10 minutes, from 1:29pm ET to 5:39pm ET, and a brief 3-minute unsafe head stall when the safe head recovered. The state root was also delayed by 3 hours and 15 minutes, with an updated state root posted at 5:27pm ET.
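For context on what a safe head stall looks like operationally: the op-node reports its view of the unsafe and safe L2 heads over RPC, and a stall shows up as the safe head number ceasing to advance while the unsafe head keeps growing. The sketch below is a minimal monitor built on the optimism_syncStatus method; the endpoint, polling interval, alert threshold, and JSON field names are assumptions to verify against the op-node version in use, not Unichain's production monitoring.

```go
// Minimal safe-head-stall monitor (sketch, not production monitoring).
// Assumes an op-node RPC endpoint exposing optimism_syncStatus.
package main

import (
	"context"
	"log"
	"time"

	"github.com/ethereum/go-ethereum/rpc"
)

type l2BlockRef struct {
	Number uint64 `json:"number"`
}

type syncStatus struct {
	UnsafeL2 l2BlockRef `json:"unsafe_l2"`
	SafeL2   l2BlockRef `json:"safe_l2"`
}

func main() {
	client, err := rpc.Dial("http://localhost:9545") // op-node RPC; endpoint is illustrative
	if err != nil {
		log.Fatal(err)
	}
	const stallThreshold = 10 * time.Minute // alert threshold; tune to taste

	var lastSafe uint64
	lastAdvance := time.Now()
	for range time.Tick(30 * time.Second) {
		var ss syncStatus
		if err := client.CallContext(context.Background(), &ss, "optimism_syncStatus"); err != nil {
			log.Printf("syncStatus error: %v", err)
			continue
		}
		if ss.SafeL2.Number > lastSafe {
			// Safe head advanced: batches are landing on L1 as expected.
			lastSafe = ss.SafeL2.Number
			lastAdvance = time.Now()
		} else if time.Since(lastAdvance) > stallThreshold {
			// Safe head is stuck while the unsafe head keeps growing: a stall.
			log.Printf("ALERT: safe head stalled at block %d (unsafe head at %d)",
				ss.SafeL2.Number, ss.UnsafeL2.Number)
		}
	}
}
```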
During the safe head stall, users experienced reduced throughput of ~5-7 TPS, because our batcher throttled incoming transactions as the queue of unsafe (not-yet-batched) transactions built up. Withdrawals to L1 were also delayed until the proposer posted an updated state root. No reorg occurred, and users' funds were safe at all times.
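As a purely illustrative sketch of that kind of throttling (not op-batcher's or Unichain's actual logic), a batcher can map its backlog of data awaiting L1 submission to a per-block budget for new transactions; the thresholds and numbers below are assumptions:

```go
// Illustrative throttling policy: the larger the backlog of L2 data the
// batcher has not yet posted to L1, the fewer transaction bytes the
// sequencer is asked to admit per block. All values are assumptions.
package throttle

// maxTxBytesPerBlock maps the batcher's unposted-data backlog to a
// per-block byte budget for new transactions.
func maxTxBytesPerBlock(pendingBytes uint64) uint64 {
	switch {
	case pendingBytes < 1_000_000: // small backlog: no throttling
		return 130_000
	case pendingBytes < 10_000_000: // backlog growing: tighten the budget
		return 20_000
	default: // L1 posting stalled: admit only a trickle of transactions
		return 2_000
	}
}
```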
The batcher and proposer could not post L2 batches and the state root to L1, respectively, because their transactions could not be signed. Unichain uses a third-party service to verify and sign these transactions, and that service was misconfigured, preventing transactions from being signed successfully.
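To make the failure mode concrete, here is a minimal sketch of a remote-signing flow, assuming a signing service that exposes an eth_signTransaction-style JSON-RPC endpoint (in the spirit of the OP Stack's op-signer); the function name, argument shape, and return format are assumptions rather than Unichain's production code. When the signer is misconfigured, this call is the step that fails, so neither batches nor state roots can land on L1.

```go
// Sketch of a remote-signing flow for batcher/proposer transactions.
// Assumes the external signer returns the RLP-encoded signed transaction;
// verify the exact method and result format against the signer in use.
package signing

import (
	"context"
	"fmt"

	"github.com/ethereum/go-ethereum/common/hexutil"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/rpc"
)

// SignViaRemote sends an unsigned batch/output transaction to the external
// signing service and returns the signed transaction ready to broadcast.
func SignViaRemote(ctx context.Context, signer *rpc.Client, args map[string]any) (*types.Transaction, error) {
	var raw hexutil.Bytes
	// If the signing service is misconfigured or unreachable, this is the
	// step that fails and nothing can be posted to L1.
	if err := signer.CallContext(ctx, &raw, "eth_signTransaction", args); err != nil {
		return nil, fmt.Errorf("remote signer rejected request: %w", err)
	}
	tx := new(types.Transaction)
	if err := tx.UnmarshalBinary(raw); err != nil {
		return nil, fmt.Errorf("invalid signed tx from signer: %w", err)
	}
	return tx, nil
}
```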
The incident was first detected by an engineer who realized that their misconfiguration would cause issues. Shortly afterward, we received several alerts about errors from our batcher. Three minutes after initial detection, a war room was created, and several key stakeholders joined, along with engineers from our third-party service provider, to assess the issue and drive toward mitigation.
Our third-party service provider restored the service to its state prior to the misconfiguration. This was a complicated process that took multiple hours. In the meantime, Unichain engineers prepared a backup plan of rotating the batcher/proposer addresses in case the safe head stall approached 12 hours, so that pending batches could still land on L1 before unsafe blocks aged out of the sequencing window and a reorg occurred. Once the service was recovered after ~4 hours, the safe head quickly caught up and Unichain Mainnet and Sepolia resumed normal operations.
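To show why the ~12-hour mark mattered, the back-of-the-envelope sketch below estimates how long a batcher outage can last before the oldest unbatched L2 block falls outside its sequencing window. The window size (3600 L1 blocks) and 12-second L1 block time are typical OP Stack values used here as assumptions, not confirmed Unichain parameters.

```go
// Back-of-the-envelope estimate of the deadline that drove the backup plan:
// batches for an L2 block must land on L1 within the sequencing window of
// that block's L1 origin, or derivation moves on without them and the
// unsafe chain reorgs. Window size and L1 block time are assumed values.
package main

import (
	"fmt"
	"time"
)

const (
	seqWindowBlocks = 3600             // sequencing window in L1 blocks (assumed config)
	l1BlockTime     = 12 * time.Second // Ethereum mainnet slot time
)

// timeUntilReorgRisk estimates how long the batcher can stay down before the
// oldest unbatched L2 block falls outside its sequencing window.
func timeUntilReorgRisk(oldestUnbatchedL1Origin, currentL1Block uint64) time.Duration {
	deadline := oldestUnbatchedL1Origin + seqWindowBlocks
	if currentL1Block >= deadline {
		return 0 // already past the window: a reorg of unsafe blocks is expected
	}
	return time.Duration(deadline-currentL1Block) * l1BlockTime
}

func main() {
	// Hypothetical example: the stalled safe head's L1 origin was block
	// 21_000_000 and L1 is now at 21_001_250, i.e. ~4h10m into the stall.
	fmt.Println(timeUntilReorgRisk(21_000_000, 21_001_250)) // ~7h50m of margin left
}
```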
We are implementing many guardrails to prevent this misconfiguration from happening again, and to reduce downtime in worst-case scenarios: