#### **Microarchitectural Side Channels**

BJÖRN RUYTENBERG EINDHOVEN UNIVERSITY OF TECHNOLOGY

SPECIAL THANKS TO YUVAL YAROM, THE UNIVERSITY OF ADELAIDE AND DATA61 FOR PROVIDING SUPPORT AND SELECTED CONTENT



#### Roadmap

- Introduction to Side Channels
- Microarchitectural Basics
- What is Meltdown?
- What is Spectre?
- Exploitation Scenarios
- Mitigations
- Attack Variants







































- A side channel is an unintentional, usually covert communication channel that leaks potentially sensitive information

#### Various types

- Power analysis
- EM radiation
- Sound/light
- Timing
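
To make the timing entry concrete, here is a minimal sketch of a classic timing leak: a comparison that returns at the first mismatching byte. The function name `leaky_compare` and the step counter, which stands in for the running time an attacker would measure, are illustrative assumptions and not part of the original slides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example: compares a guess against a secret and returns early
 * on the first mismatch. *steps counts the byte comparisons performed and
 * stands in for the elapsed time an attacker could observe. */
bool leaky_compare(const char *secret, const char *guess, size_t len, size_t *steps)
{
    *steps = 0;
    for (size_t i = 0; i < len; i++) {
        (*steps)++;
        if (secret[i] != guess[i])
            return false;   /* early exit: duration depends on the secret */
    }
    return true;
}
```

A guess that shares a longer prefix with the secret takes more steps, so an attacker who can measure the duration learns how many leading bytes are correct.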





#### Microarchitectural Side Channels Part 1: Meltdown





#### Meltdown – Basic Outline

- Design flaw that affects most modern Intel CPUs (and some ARM cores)
- Uses out-of-order execution to leak data through a cache timing attack
- From an unprivileged process, an attacker can:
  - Bypass language-based security
  - Bypass sandboxes, containers, and paravirtualized hypervisors
  - Read arbitrary memory, including kernel memory







- The processor is fast, but memory is slower
- The cache exploits locality to bridge the gap
  - Divides memory into *lines*
  - Stores recently used lines
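
The division of memory into lines can be sketched as follows. The 64-byte line size is an assumption (common on x86, not universal), and the helper names `line_of` and `same_line` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* assumed line size in bytes; common on x86, not universal */

/* Index of the cache line that contains a given address. */
uint64_t line_of(uint64_t addr)
{
    return addr / LINE_SIZE;
}

/* Two addresses in the same line are cached together: touching one also
 * brings the other into the cache. */
bool same_line(uint64_t a, uint64_t b)
{
    return line_of(a) == line_of(b);
}
```

Under this assumption, addresses 0 and 63 share a line while 63 and 64 do not, which is why cache attacks observe memory at line granularity rather than byte granularity.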









#### **Instruction Pipelining**

- Nominally, the processor executes instructions one after the other
- Instruction execution consists of multiple steps
  - Each uses a different unit
- Pipelining increases utilization by executing steps of multiple instructions in parallel

|               | Cycle 1           | Cycle 2            | Cycle 3            | Cycle 4            | Cycle 5            | Cycle 6        | Cycle 7        | Cycle 8    | Cycle 9    |
|---------------|-------------------|--------------------|--------------------|--------------------|--------------------|----------------|----------------|------------|------------|
| Instruction 1 | Instruction Fetch | Instruction Decode | Argument Fetch     | Execute            | Write Back         |                |                |            |            |
| Instruction 2 |                   | Instruction Fetch  | Instruction Decode | Argument Fetch     | Execute            | Write Back     |                |            |            |
| Instruction 3 |                   |                    | Instruction Fetch  | Instruction Decode | Argument Fetch     | Execute        | Write Back     |            |            |
| Instruction 4 |                   |                    |                    | Instruction Fetch  | Instruction Decode | Argument Fetch | Execute        | Write Back |            |
| Instruction 5 |                   |                    |                    |                    | Instruction Fetch  | Instruction Decode | Argument Fetch | Execute | Write Back |

| Mnemonic | Operands             |
|----------|----------------------|
| `mulq`   | `$m0`                |
| `add`    | `%rax,$A[0]`         |
| `mov`    | `8*2($np),%rax`      |
| `lea`    | `32($tp),$tp`        |
| `adc`    | `\$0,%rdx`           |
| `mov`    | `%rdx,$A[1]`         |
| `mulq`   | `$m1`                |
| `add`    | `%rax,$N[0]`         |
| `mov`    | `8($a,$j),%rax`      |
| `adc`    | `\$0,%rdx`           |
| `add`    | `$A[0],$N[0]`        |
| `adc`    | `\$0,%rdx`           |
| `mov`    | `$N[0],-24($tp)`     |
| `mov`    | `%rdx,$N[1]`         |
| `mulq`   | `$m0`                |
| `add`    | `%rax,$A[1]`         |
| `mov`    | `8*1($np),%rax`      |
| `adc`    | `\$0,%rdx`           |
| `mov`    | `%rdx,$A[0]`         |
| `mulq`   | `$m1`                |
| `add`    | `%rax,$N[1]`         |

#### How to deal with dependencies?

`c = a / b;`

`d = c + 5;`



#### **Out-of-Order Execution (1)**

Execute instructions when their data is available rather than in program order:

`c = a / b; d = c + 5; e = f + q;`

- Completed instructions wait in the reorder buffer until all previous instructions are retired
- Why not retire immediately?





#### **Out-of-Order Execution (2)**



- Completed instructions wait in the reorder buffer until all previous instructions are retired
- Why not retire immediately?
- Out-of-order execution is speculative!
- Instructions in the reorder buffer must be abandoned if it turns out they should never have executed
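
The interplay between out-of-order completion and in-order retirement can be sketched with a toy timing model of the three statements `c = a / b; d = c + 5; e = f + q;`. The latencies (20 cycles for the division, 1 for each addition), the unlimited number of execution units, and all names below are assumptions made for illustration; real processors are far more complex.

```c
#include <stddef.h>

/* Hypothetical model of one instruction. */
typedef struct {
    int latency;   /* cycles needed once the operands are ready (assumed value) */
    int dep;       /* index of an earlier instruction it depends on, or -1 */
} Instr;

/* Cycle in which each instruction finishes executing when it may start as
 * soon as its operand is available (unlimited execution units assumed). */
void finish_times(const Instr *prog, size_t n, int *done)
{
    for (size_t i = 0; i < n; i++) {
        int ready = (prog[i].dep >= 0) ? done[prog[i].dep] : 0;
        done[i] = ready + prog[i].latency;
    }
}

/* In-order retirement: an instruction retires no earlier than its
 * predecessor, even if it finished executing long before. */
void retire_times(const int *done, size_t n, int *retired)
{
    int last = 0;
    for (size_t i = 0; i < n; i++) {
        if (done[i] > last)
            last = done[i];
        retired[i] = last;
    }
}
```

With the assumed latencies, `e = f + q` finishes in cycle 1 but retires only in cycle 21, after the division and the dependent addition. During that window its result sits in the reorder buffer and can still be discarded.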





#### **Program Flow – Legitimate Behavior**







#### Attack Flow (1)

- **Step 1:** Set pointer to kernel space
- **Step 2:** Due to out-of-order processing, CPU fetches secret value from kernel space
- **Step 3:** Secret value is used to index user space array
- **Exception triggered:** Results of out-of-order instructions discarded (i takes previous value)
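
Why the discarded results still leak can be illustrated with a purely simulated cache. This is a toy model, not attack code: the `ToyCache` type and all function names are hypothetical, and a boolean per slot replaces the access-time measurement a real cache timing attack would perform.

```c
#include <stdbool.h>
#include <string.h>

#define NUM_VALUES 256   /* one probe slot (cache line) per possible byte value */

/* Toy model: cached[v] is true if the probe array line for value v is cached. */
typedef struct { bool cached[NUM_VALUES]; } ToyCache;

/* Attacker preparation: evict every probe line. */
void flush_all(ToyCache *c)
{
    memset(c->cached, 0, sizeof c->cached);
}

/* Models step 3: the transient access indexed by the secret brings one
 * probe line into the cache. The architectural result is discarded later,
 * but this footprint is not. */
void transient_access(ToyCache *c, unsigned char secret)
{
    c->cached[secret] = true;
}

/* Attacker readout: the one "fast" (cached) slot reveals the secret byte.
 * Returns -1 if no slot is cached. */
int recover_byte(const ToyCache *c)
{
    for (int v = 0; v < NUM_VALUES; v++)
        if (c->cached[v])
            return v;
    return -1;
}
```

In this model the attacker never reads the secret directly; it only observes which probe line became cached.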







#### Attack Flow (2)





#### DEMO Spying in Realtime on Password Input

[Source: <u>Schwarz (2018)</u>]



[Screenshot: a password manager unlock dialog (`pwd` window with an Unlock button) beside a terminal window at the prompt `mschwarz@lab06:~/Documents$`]



#### **Meltdown – Mitigation**

- Kernel Page Table Isolation (KPTI)
  - Linux kernel memory no longer mapped into user space processes
  - User space can no longer access kernel memory
- Approach seemingly solid, but...
  - Ongoing discussion about soundness
    - SMI handlers: parts of kernel memory must always be mapped into user space processes
    - Protects the kernel, but user space programs remain vulnerable
  - Introduces overhead when jumping from user mode to kernel mode
    - New capability proposed (CAP\_DISABLE\_PTI) that disables KPTI for "safe" processes<sup>1</sup>



<sup>1</sup> <u>Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when pti\_disable is set [LWN.net]</u>
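
On Linux kernels that expose `/sys/devices/system/cpu/vulnerabilities/meltdown`, the mitigation status can be read from user space. The sketch below assumes that file and its documented line prefixes (`Not affected`, `Vulnerable`, `Mitigation:`); the enum and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
    STATUS_UNKNOWN,
    STATUS_NOT_AFFECTED,
    STATUS_VULNERABLE,
    STATUS_MITIGATED
} VulnStatus;

/* Classifies one line of a sysfs vulnerabilities file by its prefix. */
VulnStatus classify_status(const char *line)
{
    if (strncmp(line, "Not affected", 12) == 0) return STATUS_NOT_AFFECTED;
    if (strncmp(line, "Vulnerable", 10) == 0)   return STATUS_VULNERABLE;
    if (strncmp(line, "Mitigation:", 11) == 0)  return STATUS_MITIGATED;
    return STATUS_UNKNOWN;
}

/* Reads the Meltdown entry. Returns STATUS_UNKNOWN if the file is absent,
 * for example on a non-Linux system or an older kernel. */
VulnStatus meltdown_status(void)
{
    char buf[256];
    FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
    if (!f)
        return STATUS_UNKNOWN;
    VulnStatus s = fgets(buf, sizeof buf, f) ? classify_status(buf) : STATUS_UNKNOWN;
    fclose(f);
    return s;
}
```

On a system with KPTI active the file typically reads `Mitigation: PTI`, which this sketch reports as `STATUS_MITIGATED`.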



#### Meltdown – Intel-only? (1)

- Meltdown was initially thought to be linked with Transactional Synchronization Extensions (TSX-NI)
  - Intel-only hardware transactional memory on Haswell and later
  - Enables the Meltdown attack without triggering software exception handling
- TSX is not a requirement for Meltdown
  - It does, however, make the attack virtually impossible to detect<sup>2</sup>



<sup>2</sup> <u>Detecting Attacks that Exploit Meltdown and Spectre with Performance Counters – Trend Micro, 2018</u>



#### Meltdown – Intel-only? (2)

- Meltdown initially thought not to affect AMD processors<sup>3</sup>

| From    | Tom Lendacky &lt;thomas.lendacky@amd.com&gt;                  |
|---------|---------------------------------------------------------------|
| Subject | [PATCH] x86/cpu, x86/pti: Do not enable PTI on AMD processors |
| Date    | Tue, 26 Dec 2017 23:43:54 -0600                               |

AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.

- Meltdown paper release: AMD *is* likely vulnerable
- PoC confirms that OoO execution occurs across security domains; practical exploitation therefore seems feasible



<sup>3</sup> <u>LKML: Tom Lendacky: [PATCH] x86/cpu, x86/pti: Do not enable PTI on AMD processors</u>







#### **Spectre – Basic Outline**

- Design flaw that affects all modern CPUs: Intel, AMD, ARM, PowerPC
- Branch prediction and speculative execution leave traces in cache
- Cache timing attack reveals data from different security domains
- Two variants:
  - **Spectre-v1**: Read from the current user space process
  - **Spectre-v2**: Read from other processes





#### **Speculative Execution and Branches (1)**

- When execution reaches a branch, the processor predicts the outcome of the branch
- Execution proceeds (speculatively) along the predicted branch
- Correct prediction $\rightarrow$ all is well
- Misprediction $\rightarrow$ abandon the speculative work and resume on the correct path
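
How a predictor comes to favor one outcome can be sketched with a 2-bit saturating counter, a textbook scheme. It is used here only as an assumption for illustration: the predictors in real CPUs are more elaborate and largely undocumented, and the names below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical 2-bit saturating counter:
 * states 0 and 1 predict "not taken", states 2 and 3 predict "taken". */
typedef struct { unsigned state; } Predictor;

bool predict_taken(const Predictor *p)
{
    return p->state >= 2;
}

/* Update with the actual outcome once the branch has resolved. */
void train(Predictor *p, bool taken)
{
    if (taken && p->state < 3)
        p->state++;
    if (!taken && p->state > 0)
        p->state--;
}
```

After a few taken outcomes the counter keeps predicting "taken" even when a single execution goes the other way. That inertia is what makes a predictor trainable by repeated, well-behaved inputs.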







#### **Speculative Execution and Branches (2)**

- Branch History Buffer (BHB)
  - Outcome of conditional branches, e.g. **JGE 4006c9**
- Branch Target Buffer (BTB)
  - Target of indirect branches, e.g. **JMP eax**
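
A toy BTB shows why stored targets matter for security. The sketch assumes that an entry is selected by the low bits of the branch address only, so two branches at different addresses can share an entry. The table size, indexing function, and names are hypothetical, since the real BTB organization is not public.

```c
#include <stdint.h>

#define BTB_ENTRIES 256u   /* assumed size; real BTB geometry is not public */

typedef struct { uint64_t target[BTB_ENTRIES]; } ToyBtb;

/* Assumption: only the low bits of the branch address select the entry. */
static unsigned btb_index(uint64_t branch_addr)
{
    return (unsigned)(branch_addr % BTB_ENTRIES);
}

/* Record the target that an indirect branch actually jumped to. */
void btb_train(ToyBtb *b, uint64_t branch_addr, uint64_t target)
{
    b->target[btb_index(branch_addr)] = target;
}

/* Predicted target for an indirect branch at branch_addr (0 if untrained). */
uint64_t btb_predict(const ToyBtb *b, uint64_t branch_addr)
{
    return b->target[btb_index(branch_addr)];
}
```

In this model, training the branch at `0x401234` also sets the prediction for a branch at `0x7f5634`, because both map to entry `0x34`.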







#### Spectre Variant 1 – Bounds Check Bypass
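
The code pattern usually associated with a bounds check bypass can be sketched as below. The array names, sizes, the 64-byte stride, and `victim_function` are assumptions chosen for illustration. The code is architecturally correct; the concern is only what a CPU may do transiently when the branch is mispredicted.

```c
#include <stddef.h>
#include <stdint.h>

#define STRIDE 64   /* assumed spacing so that each value maps to its own cache line */

uint8_t array1[16];
uint8_t array2[256 * STRIDE];
size_t  array1_size = 16;

/* Architecturally safe: an out-of-bounds x never reaches the loads.
 * If the branch is mispredicted, however, a CPU may execute both loads
 * transiently, and the second load's cache footprint depends on the byte
 * found at array1[x]. */
uint8_t victim_function(size_t x)
{
    if (x < array1_size)
        return array2[array1[x] * STRIDE];
    return 0;
}
```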















#### Spectre Variant 2 – Branch Target Injection







#### **Spectre – Mitigations**

- Basic idea: prevent speculative execution across branches
- Three approaches:
  - Spectre-v1: Explicitly prevent speculative execution across conditional branches by inserting blocking operation
  - Spectre-v2: Avoid training branch predictor by replacing branch instructions with semantic equivalents
  - Spectre-v2: Disable branch prediction across security domains





#### **Spectre-v1 – Insert Blocking Operation**

- Approach: Prevent speculative execution by inserting blocking operation
- LFENCE (serialize load operations), PAUSE (spin loop hint)

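```
#include <stdio.h>
unsigned char array[16];          /* victim data, indexed by user input */
unsigned char array2[256 * 64];   /* one 64-byte slot per possible byte value */
int arrayLength = 16;

int main(void) {
    int untrusted, value, value2 = 0;
    if (scanf("%d", &untrusted) == 1 && untrusted >= 0 && untrusted < arrayLength) {
        value = array[untrusted];
        __asm__ volatile("lfence" ::: "memory"); /* x86 only: later loads wait until the bounds check has completed */
        value2 = array2[value * 64];
    }
    return value2;
}
```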

- Effective, but
  - Need to recompile code or patch binary
  - Significantly degrades performance
  - Need static analysis to identify vulnerable code





#### **Spectre-v2 – Retpoline (1)**

- **Approach:** Avoid training branch predictor by replacing branch instructions with semantic equivalents
- Return trampoline (retpoline)
  - Indirect branch normally takes its target from a register or memory ("jump to this address")
  - Replace with PUSH/RET
    - Push target address onto stack
    - Return to target address
  - BTB does not learn about branch due to pattern mismatch





#### **Spectre-v2 – Retpoline (2)**

- Need to recompile code or patch binary
- Degrades performance
  - Somewhat mitigated by Return Stack Buffer (RSB)
- Not a perfect solution: *ineffective* on Skylake and later
  - RSB behavior different: when empty, falls back to BTB prediction
  - Addressed with RSB stuffing<sup>4</sup>; implemented by e.g. Linux kernel<sup>5</sup> and LLVM compiler<sup>6</sup>

<sup>4</sup> <u>Retpoline: A Branch Target Injection Mitigation - Intel, 2018</u>

<sup>5</sup> <u>x86/retpoline: Avoid return buffer underflows on context switch - Patchwork</u>

<sup>6</sup> [llvm-dev] LLVM Release Schedules: 5.0.2, 6.0.1
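
The RSB underflow problem and the idea behind RSB stuffing can be sketched with a toy return stack. The depth of 16 entries and all names are assumptions for illustration; in real mitigations the stuffing is done with a sequence of call instructions rather than by writing entries directly.

```c
#include <stdbool.h>
#include <stdint.h>

#define RSB_DEPTH 16   /* assumed depth, for illustration only */

typedef struct {
    uint64_t entry[RSB_DEPTH];
    int count;
} ToyRsb;

/* CALL pushes the return address; when full, the oldest entry is dropped. */
void rsb_push(ToyRsb *r, uint64_t ret_addr)
{
    if (r->count == RSB_DEPTH) {
        for (int i = 1; i < RSB_DEPTH; i++)
            r->entry[i - 1] = r->entry[i];
        r->count--;
    }
    r->entry[r->count++] = ret_addr;
}

/* RET pops a prediction. Returns false on underflow, the case in which
 * Skylake-class parts fall back to BTB prediction. */
bool rsb_pop(ToyRsb *r, uint64_t *predicted)
{
    if (r->count == 0)
        return false;
    *predicted = r->entry[--r->count];
    return true;
}

/* RSB stuffing: fill every slot with a harmless target so that the next
 * returns are predicted from the RSB instead of underflowing. */
void rsb_stuff(ToyRsb *r, uint64_t benign_target)
{
    r->count = 0;
    for (int i = 0; i < RSB_DEPTH; i++)
        rsb_push(r, benign_target);
}
```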



#### **Spectre-v2 – Disable BTB Prediction**

- Approach: Disable BTB prediction across security domains
- Intel microcode update<sup>7,8</sup>
  - Introduces new Model Specific Registers (MSRs) to control BTB
  - No learning across hyperthreads
  - Higher security levels do not learn from lower level activity
  - BTB clobbering, wiped on each context switch
  - Major performance impact



<sup>7</sup> <u>Microcode Revision Guidance - Intel, 2018</u>

<sup>8</sup> <u>Controlling the Performance Impact of Microcode and Security Patches - RedHat, 2018</u>
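
For orientation, the controls above correspond to the following x86 MSRs and bits: STIBP (no learning across hyperthreads), IBRS (higher privilege levels do not learn from lower-level activity), and IBPB (flushing the predictor state). The constants follow Intel's published definitions; the macro names and the helper `spec_ctrl_value` are hypothetical, and the actual MSR write requires WRMSR in ring 0, which is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_SPEC_CTRL 0x48u      /* speculation control MSR */
#define MSR_IA32_PRED_CMD  0x49u      /* prediction command MSR (write-only) */

#define SPEC_CTRL_IBRS     (1u << 0)  /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP    (1u << 1)  /* Single Thread Indirect Branch Predictors */
#define PRED_CMD_IBPB      (1u << 0)  /* Indirect Branch Prediction Barrier */

/* Builds the value a kernel would write to IA32_SPEC_CTRL. */
uint64_t spec_ctrl_value(bool ibrs, bool stibp)
{
    uint64_t v = 0;
    if (ibrs)
        v |= SPEC_CTRL_IBRS;
    if (stibp)
        v |= SPEC_CTRL_STIBP;
    return v;
}
```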

#### **Beyond Spectre and Meltdown**

- Various follow-up publications on variants and other CPU vulnerabilities (non-exhaustive):
  - <u>SGXpectre</u> (2018)
  - Foreshadow, Foreshadow-NG (2018)
  - Microarchitectural Data Sampling: <u>Rogue In-Flight Data Load</u>, <u>Fallout</u>, <u>ZombieLoad</u> (2019)
  - <u>BlindSide</u> (2020)
  - Load Value Injection (2020)
  - <u>CacheOut</u>, <u>SGAxe</u> (2020)
  - <u>CrossTalk</u> (2021)
  - Rage Against The Machine Clear (2021)
- Where available, mitigations usually comprise
  - Compiler- and kernel-based protections
  - Microcode updates for (then) in-market CPUs; partial silicon redesign for newer generations



#### References

#### **Background reading**

- <u>"Meltdown: Reading Kernel Memory from User Space" (2018)</u>
- "Spectre Attacks: Exploiting Speculative Execution" (2018)



#### Admin

- Quiz
  - Verify your understanding of material
- Questions?
  - Reach out via email

