Storage Collision on Upgrade: The Bug That Bricks Your Proxy
Reorder one variable in your upgradeable contract and you can silently rewrite the owner slot, nuke balances, or freeze the proxy entirely. Storage collisions are the most boring-looking bug in Solidity and one of the most catastrophic. Here's exactly how the layout breaks, how `__gap` saves you…
The Bug in One Sentence
In a proxy pattern, storage lives in the proxy but the *layout* is dictated by the implementation. Change the layout, change what every slot means — including the one holding `owner`.
This is why `delegatecall`-based upgradeability (UUPS, Transparent Proxy, Beacon) is a minefield for teams that treat a v2 contract like a regular refactor. You don't get to reorder fields. You don't get to remove them. You don't get to change their types. And if you inherit from a base contract that adds a variable, every child slot shifts by one — silently.
A Concrete Collision
Here's v1 of a vault:
```solidity
contract VaultV1 {
address public owner; // slot 0
uint256 public totalDeposits; // slot 1
mapping(address => uint256) public balances; // slot 2
}
```
A developer writes v2, decides `totalDeposits` should be tracked off-chain, and removes it:
```solidity
contract VaultV2 {
address public owner; // slot 0
mapping(address => uint256) public balances; // slot 1 ← was slot 2
}
```
After the upgrade, every `balances[user]` lookup hashes against slot 1 instead of slot 2, so it lands on a different, empty storage location. The old balances are still in storage, but nothing in `VaultV2` can reach them. The old `totalDeposits` value also stays in slot 1, where it is simply ignored: a mapping's own slot holds no data and only seeds the hash. Users see zero balances and withdrawals revert. Any deposit made after the upgrade is written under the new locations, which makes recovery harder.
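To make the slot arithmetic concrete, here is a minimal sketch of how a mapping entry's storage location is derived. `SlotMath` and `SlotDemo` are hypothetical names invented for illustration; the formula itself, `keccak256(abi.encode(key, baseSlot))`, is the standard layout rule for mappings with value-type keys.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical helper, for illustration only.
library SlotMath {
    /// Storage location of `mapping[user]` for a mapping declared at `baseSlot`.
    function balanceSlot(address user, uint256 baseSlot) internal pure returns (bytes32) {
        return keccak256(abi.encode(user, baseSlot));
    }
}

contract SlotDemo {
    /// Where VaultV1 stored the balance (mapping declared at slot 2).
    function v1Location(address user) external pure returns (bytes32) {
        return SlotMath.balanceSlot(user, 2);
    }

    /// Where VaultV2 looks for it (mapping now declared at slot 1).
    function v2Location(address user) external pure returns (bytes32) {
        return SlotMath.balanceSlot(user, 1);
    }
}
```

For any given user the two functions return unrelated hashes, which is why the V1 balances are still in storage but unreachable from V2.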
Now imagine the dev added a new variable *before* `owner`:
```solidity
contract VaultV2Bad {
uint256 public version; // slot 0 ← reads the old owner value
address public owner; // slot 1 ← reads the old totalDeposits
// ...
}
```
`owner` now reads slot 1, so it returns the low 160 bits of whatever `totalDeposits` held, while `version` reads the old owner address as an integer. `onlyOwner` checks compare against an address nobody controls. Goodbye admin access. Goodbye contract.
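The reinterpretation can be sketched without a proxy at all. The contract below is a hypothetical illustration (the contract and function names are not from any library): it takes the raw values `VaultV1` left in slots 0 and 1 and shows how a layout with `version` in slot 0 and `owner` in slot 1 would decode them.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical illustration of how the shifted layout decodes V1's slots.
contract Reinterpret {
    /// An address variable in slot 1 decodes the low 160 bits of the word
    /// that VaultV1 stored there, which was `totalDeposits`.
    function ownerAsSeenByV2Bad(uint256 oldTotalDeposits) external pure returns (address) {
        return address(uint160(oldTotalDeposits));
    }

    /// A uint256 variable in slot 0 decodes the word VaultV1 stored there,
    /// which was the owner address.
    function versionAsSeenByV2Bad(address oldOwner) external pure returns (uint256) {
        return uint256(uint160(oldOwner));
    }
}
```

Unless `totalDeposits` happened to equal someone's address, no key exists for the resulting "owner".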
The Fix: Append-Only Layouts + `__gap`
The two rules:
1. **Append-only.** New variables go at the *end* of the storage layout. Never reorder, never delete, never change types.
2. **Reserve gaps in base contracts.** Any contract intended to be inherited by an upgradeable contract must reserve slots for future variables.
```solidity
contract VaultBase {
address public owner;
uint256 public totalDeposits;
mapping(address => uint256) public balances;
// Reserve 50 slots for future variables in this base contract.
uint256[50] private __gap;
}
contract VaultV2 is VaultBase {
// Safe to add here — base layout is frozen by __gap.
uint256 public withdrawalFee;
}
```
When `VaultBase` later needs a new variable, you shrink `__gap` by one:
```solidity
contract VaultBase {
address public owner;
uint256 public totalDeposits;
mapping(address => uint256) public balances;
uint256 public pausedUntil; // new
uint256[49] private __gap; // was 50
}
```
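The total slot count of `VaultBase` stays constant, so child contracts keep their offsets. Shrink the gap by exactly the number of slots the new variables consume; a full-width `uint256` takes one. This is the pattern OpenZeppelin's upgradeable contracts used through v4.x (v5 moved to ERC-7201 namespaced storage, which avoids the problem a different way), and it is the reason those upgrades don't blow up.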
Inheritance Order Is Layout
Linearization order determines slot order. Swapping the order of base contracts is a storage collision even if no variable changed:
```solidity
// v1
contract Token is ERC20Upgradeable, OwnableUpgradeable { }
// v2 — SAME variables, but layout is now different
contract Token is OwnableUpgradeable, ERC20Upgradeable { }
```
The linter won't yell. The compiler won't yell. The proxy will gladly accept the upgrade. In a gap-based layout such as OpenZeppelin v4.x, `_balances` then resolves to the slot that used to hold `_owner`.
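A stripped-down sketch shows the same effect without any library. The contracts `A`, `B`, `OrderV1`, and `OrderV2` are hypothetical; the `slots()` helper uses the inline-assembly `.slot` accessor to report where each variable ended up.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical base contracts with one storage variable each.
contract A { uint256 public a; }
contract B { uint256 public b; }

// Bases are laid out in the order listed: a -> slot 0, b -> slot 1.
contract OrderV1 is A, B {
    function slots() external view returns (uint256 slotA, uint256 slotB) {
        assembly {
            slotA := a.slot
            slotB := b.slot
        }
    }
}

// Same variables, swapped base order: b -> slot 0, a -> slot 1.
contract OrderV2 is B, A {
    function slots() external view returns (uint256 slotA, uint256 slotB) {
        assembly {
            slotA := a.slot
            slotB := b.slot
        }
    }
}
```

The compiler may suggest tightening `slots()` to `pure`; `view` is used here only to stay on the conservative side.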
Real-World Carnage
Proxy upgrade incidents requiring rollback are unfortunately routine — most don't make headlines because teams catch them in staging or revert quickly. A non-exhaustive list of public examples:
- **Audius (2022)**: A storage collision between the proxy's admin address and the implementation's initializer flags let an attacker re-initialize governance contracts and drain ~$6M worth of AUDIO.
- **Compound (2021)**: While not a layout collision per se, the COMP distribution bug (~$80M misallocated) was a vivid reminder that upgrades touching storage semantics need rehearsal, not optimism.
- **Multiple OpenZeppelin Defender post-mortems** describe teams pushing an implementation, watching `owner` go to `address(0)`, and scrambling to deploy a recovery implementation that re-maps the slots manually.
The pattern is always the same: someone reordered, removed, or re-inherited. The implementation deployed cleanly. The proxy pointed at it. State went sideways.
Detection Before Deployment
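**Use the tools.** OpenZeppelin's Hardhat and Foundry upgrades plugins compare the new implementation's storage layout against the deployed one and refuse to upgrade on an incompatible change. Disabling that check (the `unsafeSkipStorageCheck` option in the Hardhat plugin) removes the last automated guard between a bad layout and a bricked proxy. `unsafeAllow` silences other validation errors, such as constructors or `delegatecall`, and deserves the same suspicion.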
```bash
npx hardhat run scripts/upgrade.js # uses upgrades.upgradeProxy — validates layout
```
For Foundry, `forge inspect <Contract> storage-layout` dumps the layout. Diff v1 and v2. Any line that isn't an *append* is a red flag.
For a quick static pass on a candidate v2 before you bother with rehearsal, run it through the [free AI audit](https://www.cryptohawking.com/audit) — it'll flag layout drift, missing `__gap`, and inheritance reorders. For protocols where an upgrade error means a treasury rollback, the [manual audit](https://www.cryptohawking.com/audit/manual) ($5,000, paid in ETH/SOL/USDT, 3 business days) includes a slot-by-slot diff and a written upgrade plan.
A Pre-Upgrade Checklist
1. Diff storage layouts (`forge inspect` or OZ plugin).
2. Confirm every base contract reserves `__gap`.
3. Confirm the inheritance list and its order are unchanged.
4. Confirm no variable changed type (e.g., `uint128` → `uint256`).
5. Run the full test suite against a *fork* of mainnet with the new implementation behind the existing proxy.
6. Dry-run the upgrade on a forked mainnet through your real Gnosis Safe / timelock.
7. Have a recovery implementation ready that can re-map slots if step 6 ever surprises you.
The Boring Truth
Storage collisions aren't clever attacks. They're self-inflicted wounds. The fix is a 50-slot array and the discipline to append. Teams that lose funds to this didn't get out-engineered — they got over-confident with a refactor. Treat every upgrade as if it's editing raw storage, because that's exactly what it is.
FAQ
Why can't I just remove an unused storage variable?
Because storage slots in a proxy aren't named — they're numbered. Removing a variable shifts every subsequent variable up by one slot. The new implementation will read and write the wrong data for everything that came after the removed field. If you truly never want to use a variable again, leave it declared (or replace it with a same-sized placeholder like `uint256 private __deprecated_totalDeposits`). The slot must remain occupied with the same type width. Renaming is fine; removing is not.
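As a sketch of the placeholder approach, here is what a layout-preserving `VaultV2` could look like for the vault used earlier in this article. The `withdrawalFee` variable is borrowed from the `__gap` example and stands in for any newly appended field.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

contract VaultV2 {
    address public owner;                         // slot 0, unchanged
    uint256 private __deprecated_totalDeposits;   // slot 1, retired but still occupying the slot
    mapping(address => uint256) public balances;  // slot 2, unchanged
    uint256 public withdrawalFee;                 // slot 3, appended at the end
}
```

Making the placeholder `private` removes its public getter, so the variable disappears from the ABI while its slot stays reserved.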
How big should `__gap` be?
OpenZeppelin's convention is 50 slots per base contract. There's nothing magic about 50 — it's just enough headroom for several future versions without bloating deployment cost meaningfully (gaps are cheap because they're never written). For a leaf contract that nothing inherits from, you don't need a gap at all. For a base contract you expect to evolve heavily, 100 is reasonable. The cost is one-time and tiny; under-reserving has bitten more teams than over-reserving ever will.
Does the Diamond pattern (EIP-2535) solve this?
It changes the failure mode but doesn't eliminate it. Diamond Storage uses struct-based namespaced storage at deterministic slots (keccak hashes of unique strings), which avoids the inheritance-linearization problem entirely. But within each diamond storage struct, you still can't reorder or remove fields — only append. You also gain new footguns: two facets defining storage at the same namespace will collide silently. The discipline shifts from inheritance order to namespace hygiene, not to nothing.
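A minimal sketch of the namespaced-struct idea follows. The library name, the struct fields, and the namespace string `"example.vault.storage.v1"` are all hypothetical; the mechanism (hash a unique string, point a storage struct at that slot) is the commonly used Diamond Storage pattern.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical namespaced storage for a vault facet.
library VaultStorage {
    struct Layout {
        address owner;
        mapping(address => uint256) balances;
        // Append new fields below this line only; never reorder or remove.
    }

    // Assumed namespace string. Two facets using the same string share, and can clobber, this struct.
    bytes32 internal constant SLOT = keccak256("example.vault.storage.v1");

    function layout() internal pure returns (Layout storage l) {
        bytes32 slot = SLOT;
        assembly {
            l.slot := slot
        }
    }
}

contract VaultFacet {
    function balanceOf(address user) external view returns (uint256) {
        return VaultStorage.layout().balances[user];
    }
}
```

Because the struct's base slot depends only on the string, inheritance order no longer affects where the data lives.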
What if I already shipped a bad upgrade?
If admin keys still work, deploy a recovery implementation that explicitly reads from the old slot positions using inline assembly (`sload`), reconstructs the correct state, and writes it back to the intended slots. Then upgrade to a properly-laid-out v3. If `owner` itself got clobbered, you may be locked out — at which point your only options are social recovery (if you have a multisig/timelock that still points correctly), a coordinated migration to a new proxy, or a chain-level fork (don't count on this). Prevention is dramatically cheaper.
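As a rough sketch of such a recovery implementation, consider the earlier case where removing `totalDeposits` moved `balances` from slot 2 to slot 1. Several things here are assumptions rather than a prescribed procedure: the `owner` slot is intact, balances written while the bad layout was live should be added to the pre-upgrade ones, and the caller supplies the affected addresses (for example from event logs), since mappings cannot be enumerated on-chain. `VaultRecovery` and `remap` are hypothetical names.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical recovery implementation that restores the V1 layout.
contract VaultRecovery {
    address public owner;                         // slot 0
    uint256 private __deprecated_totalDeposits;   // slot 1, restored as a placeholder
    mapping(address => uint256) public balances;  // slot 2, original location

    /// Moves balances written under the bad layout (mapping at slot 1)
    /// back into the correctly placed mapping at slot 2.
    function remap(address[] calldata users) external {
        require(msg.sender == owner, "not owner");
        for (uint256 i = 0; i < users.length; i++) {
            // Location the bad implementation used for balances[user].
            bytes32 badLoc = keccak256(abi.encode(users[i], uint256(1)));
            uint256 stray;
            assembly {
                stray := sload(badLoc)
            }
            if (stray != 0) {
                balances[users[i]] += stray; // assumption: merge additively
                assembly {
                    sstore(badLoc, 0)        // clear the stray entry
                }
            }
        }
    }
}
```

Rehearse anything like this on a fork first; a recovery contract that writes to raw slots can do as much damage as the upgrade it is repairing.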
Are immutable and constant variables affected?
No. `immutable` and `constant` variables are baked into the contract bytecode at deploy time, not stored in storage slots. You can add, remove, or reorder them freely across upgrades without breaking layout. This is actually a useful trick: values that don't truly need to change (protocol fee recipients during migrations, version numbers, config flags) can live as immutables in the implementation and be updated by deploying a new implementation, without consuming any proxy storage at all.
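A short sketch of the idea, using the vault layout from this article. `VaultV3`, `VERSION`, and `feeRecipient` are hypothetical names; the point is only that the first two declarations occupy no storage slot.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

contract VaultV3 {
    // Lives in bytecode, not storage: free to add, remove, or reorder.
    uint256 public constant VERSION = 3;
    address public immutable feeRecipient;

    // Storage layout is unchanged from VaultV1.
    address public owner;                         // slot 0
    uint256 public totalDeposits;                 // slot 1
    mapping(address => uint256) public balances;  // slot 2

    // Runs once when the implementation is deployed; it writes nothing
    // to the proxy's storage, only into the implementation's code.
    constructor(address feeRecipient_) {
        feeRecipient = feeRecipient_;
    }
}
```

Every proxy pointing at this implementation sees the same `feeRecipient`, so the trick only suits values that are genuinely global to the implementation.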