ESPHome API Protocol Details
This document describes the low-level protocol formats used by the ESPHome API, including message framing, buffer layouts, and wire formats for both Noise and Plaintext protocols.
Overview
The ESPHome API supports two protocol variants:
- Noise Protocol: Encrypted communication using the Noise protocol framework
- Plaintext Protocol: Unencrypted communication with variable-length encoding
Both protocols use protocol buffers for message serialization but differ in their framing and transport layer.
Common Elements
Message Types
- Message types are encoded as 16-bit unsigned values (two bytes)
- Split into type_high (upper byte) and type_low (lower byte)
- Allows for 65,536 different message types (0-65535)
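As a small illustration of the split described above, the two bytes can be derived with a shift and a mask. The function names are hypothetical, chosen for this sketch rather than taken from the ESPHome source.

```python
def split_message_type(message_type: int) -> tuple[int, int]:
    """Split a 16-bit message type into (type_high, type_low)."""
    if not 0 <= message_type <= 0xFFFF:
        raise ValueError("message type must fit in 16 bits")
    return (message_type >> 8) & 0xFF, message_type & 0xFF


def join_message_type(type_high: int, type_low: int) -> int:
    """Recombine the two bytes into the original 16-bit value."""
    return (type_high << 8) | type_low
```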
VarInt Encoding
Variable-length integer encoding follows the Protocol Buffers VarInt specification:
- Each byte has a continuation bit (MSB)
- 7 bits of data per byte
- Least significant bits first
- Always encodes unsigned values
Size ranges:
- 1 byte: 0-127
- 2 bytes: 128-16,383
- 3 bytes: 16,384-2,097,151
- 4 bytes: 2,097,152-268,435,455
- 5 bytes: 268,435,456-34,359,738,367
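The rules above can be sketched as a pair of helpers. This is a minimal illustration of Protocol Buffers style VarInt encoding; `encode_varint` and `decode_varint` are names chosen for this example, not ESPHome API functions.

```python
def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("varints encode unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)  # last byte: continuation bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

For example, 300 encodes to the two bytes AC 02: the low 7 bits (0x2C) with the continuation bit set, followed by the remaining bits (0x02).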
Noise Protocol
Overview
The Noise protocol provides encrypted, authenticated communication using the Noise XX handshake pattern.
Frame Structure
[Indicator][Encrypted Size][Encrypted Payload][MAC]
  1 byte       2 bytes          Variable     16 bytes
Message Format
Unencrypted Header (3 bytes)
- Indicator: 0x01
- Encrypted payload size: 16-bit unsigned, big-endian
Encrypted Payload
- Message type: 16-bit unsigned, big-endian (encrypted)
- Data length: 16-bit unsigned, big-endian (encrypted)
- Protocol buffer data
MAC (16 bytes)
Data Type Summary
Field | Type | Size | Encoding | Notes |
---|---|---|---|---|
Indicator | uint8 | 1 byte | - | Always 0x01 |
Encrypted Size | uint16 | 2 bytes | Big-endian | Not encrypted |
Message Type | uint16 | 2 bytes | Big-endian | Encrypted |
Data Length | uint16 | 2 bytes | Big-endian | Encrypted |
Data | bytes | Variable | - | Protocol buffer payload, encrypted |
Buffer Layout
Position: [0] [1] [2] [3] [4] [5] [6] [7] ...        [N-15] ... [N]
Content:  Ind ES₁ ES₂ MT₁ MT₂ DL₁ DL₂ [Payload Data] [MAC-16-bytes]
          └─── Header (7 bytes) ────┘                └── Footer ──┘
Where:
- Ind: Indicator byte (0x01 for Noise)
- ES₁/ES₂: Encrypted payload size (16-bit unsigned, big-endian)
- MT₁/MT₂: Message type (16-bit unsigned, big-endian, encrypted)
- DL₁/DL₂: Data length (16-bit unsigned, big-endian, encrypted)
- Payload: Actual message data (encrypted)
- MAC: 16-byte authentication tag
Encryption Details
- Uses ChaCha20-Poly1305 AEAD cipher
- Encrypts: message type + data length + payload
- The encrypted size field in the header is NOT encrypted
- MAC provides authentication for the entire encrypted payload
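The layout above can be sketched with Python's `struct` module. This is an illustration only: the function names are hypothetical, the AEAD step itself is left out, and whether the size field counts the 16-byte MAC is not stated here, so `frame_ciphertext` simply frames whatever encrypted bytes the caller passes in.

```python
import struct

NOISE_INDICATOR = 0x01


def build_inner_payload(message_type: int, data: bytes) -> bytes:
    """Plaintext that gets encrypted: type + data length + protobuf data."""
    return struct.pack(">HH", message_type, len(data)) + data


def parse_inner_payload(plaintext: bytes) -> tuple[int, bytes]:
    """Inverse of build_inner_payload, applied after decryption."""
    message_type, data_len = struct.unpack_from(">HH", plaintext)
    data = plaintext[4:4 + data_len]
    if len(data) != data_len:
        raise ValueError("data length field does not match payload")
    return message_type, data


def frame_ciphertext(ciphertext: bytes) -> bytes:
    """Prepend the unencrypted 3-byte header (indicator + big-endian size)."""
    return struct.pack(">BH", NOISE_INDICATOR, len(ciphertext)) + ciphertext
```

The `>` prefix selects big-endian packing, matching the byte order used by every multi-byte field in the Noise framing.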
State Machine
The Noise protocol uses the following states:
stateDiagram-v2
[*] --> INITIALIZE
INITIALIZE --> CLIENT_HELLO: init()
CLIENT_HELLO --> SERVER_HELLO: receive client hello
CLIENT_HELLO --> EXPLICIT_REJECT: bad indicator
SERVER_HELLO --> HANDSHAKE: send server hello
HANDSHAKE --> DATA: handshake complete
HANDSHAKE --> EXPLICIT_REJECT: auth failed
EXPLICIT_REJECT --> FAILED: after sending reject
INITIALIZE --> FAILED: tcp setup
CLIENT_HELLO --> FAILED: socket error
SERVER_HELLO --> FAILED: socket error
HANDSHAKE --> FAILED: handshake failed
DATA --> FAILED: decrypt failed
DATA --> CLOSED: close()
FAILED --> CLOSED: close()
note right of EXPLICIT_REJECT
Temporary state for
sending error message
end note
note right of DATA
Encrypted data
exchange
end note
States:
- INITIALIZE: Initial state, waiting for init() to be called
- CLIENT_HELLO: Waiting for client hello message
- SERVER_HELLO: Sending server hello with device info
- HANDSHAKE: Performing Noise XX handshake exchange
- DATA: Handshake complete, ready for encrypted data exchange
- CLOSED: Connection closed normally
- FAILED: Error occurred, connection failed
- EXPLICIT_REJECT: Temporary state for sending handshake rejection
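One way to make the diagram checkable in code is a transition table. The sketch below mirrors the arrows in the state diagram; the `State` enum and `can_transition` helper are hypothetical names, not part of the ESPHome code base.

```python
from enum import Enum, auto


class State(Enum):
    INITIALIZE = auto()
    CLIENT_HELLO = auto()
    SERVER_HELLO = auto()
    HANDSHAKE = auto()
    DATA = auto()
    CLOSED = auto()
    FAILED = auto()
    EXPLICIT_REJECT = auto()


# Allowed transitions, copied from the arrows in the state diagram.
TRANSITIONS: dict[State, set[State]] = {
    State.INITIALIZE: {State.CLIENT_HELLO, State.FAILED},
    State.CLIENT_HELLO: {State.SERVER_HELLO, State.EXPLICIT_REJECT, State.FAILED},
    State.SERVER_HELLO: {State.HANDSHAKE, State.FAILED},
    State.HANDSHAKE: {State.DATA, State.EXPLICIT_REJECT, State.FAILED},
    State.EXPLICIT_REJECT: {State.FAILED},
    State.DATA: {State.FAILED, State.CLOSED},
    State.FAILED: {State.CLOSED},
    State.CLOSED: set(),
}


def can_transition(src: State, dst: State) -> bool:
    """Return True if the diagram has an arrow from src to dst."""
    return dst in TRANSITIONS[src]
```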
Handshake Process
During the Noise handshake, the server sends a SERVER_HELLO message:
SERVER_HELLO format:
[Indicator] [Size] [Protocol] [Node-Name] [MAC-Address]
   0x01       2B      0x01     null-term    null-term
Example SERVER_HELLO:
Hex: 01 00 1B 01 65 73 70 68 6F 6D 65 00 31 32 3A 33 34 3A 35 36 3A 37 38 3A 39 41 3A 42 43 00
     ^  ^^^^^ ^  ^-------node-------^ ^  ^------------------MAC address-------------------^ ^
     |    |   |                       |                                                     |
     |    |   Protocol (0x01)         null                                                  null
     |    Size (27 bytes, big-endian)
     Indicator (0x01)
This decodes to:
- Frame indicator: 0x01
- Frame size: 0x001B (27 bytes: 1 protocol byte + 8 bytes for the node name + 18 bytes for the MAC address, each string including its null terminator)
- Protocol: 0x01 (always)
- Node name: "esphome" (null-terminated)
- MAC: "12:34:56:78:9A:BC" (null-terminated)
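A receiver could decode this frame along the following lines. This is a sketch under stated assumptions: `parse_server_hello` is a hypothetical helper, and the strings are assumed to be UTF-8 (the example values are plain ASCII). The sample frame computes its size field from the body so the two cannot disagree.

```python
import struct


def parse_server_hello(frame: bytes) -> tuple[int, str, str]:
    """Return (protocol, node_name, mac_address) from a SERVER_HELLO frame."""
    indicator, size = struct.unpack_from(">BH", frame)
    if indicator != 0x01:
        raise ValueError("bad indicator byte")
    body = frame[3:3 + size]
    if len(body) != size or not body:
        raise ValueError("truncated SERVER_HELLO")
    protocol = body[0]
    # Two null-terminated strings follow the protocol byte.
    name, name_sep, rest = body[1:].partition(b"\x00")
    mac, mac_sep, _ = rest.partition(b"\x00")
    if not name_sep or not mac_sep:
        raise ValueError("missing null terminator")
    return protocol, name.decode("utf-8"), mac.decode("utf-8")


# Build the example frame: protocol byte, node name, MAC address.
body = b"\x01" + b"esphome\x00" + b"12:34:56:78:9A:BC\x00"
frame = struct.pack(">BH", 0x01, len(body)) + body
print(parse_server_hello(frame))
```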
During the actual Noise handshake phase, if errors occur, the server sends an explicit handshake rejection:
Handshake rejection format:
[Indicator] [Size] [Error-Flag] [Error-Message]
   0x01       2B       0x01        Variable
Example handshake rejection:
Hex: 01 00 16 01 48 61 6E 64 73 68 61 6B 65 20 4D 41 43 20 66 61 69 6C 75 72 65
     ^  ^^^^^ ^  ^-----------------------Error message------------------------^
     |    |   |
     |    |   Error flag (0x01)
     |    Size (22 bytes, big-endian)
     Indicator (0x01)
This decodes to:
- Frame indicator: 0x01
- Frame size: 0x0016 (22 bytes: 1 error flag byte + 21 message bytes)
- Error flag: 0x01 (failure)
- Error message: "Handshake MAC failure" (NOT null-terminated)
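The rejection format can be exercised with a small round trip. Both helpers below are hypothetical names for this sketch, and ASCII is assumed for the message text because the example message is plain ASCII.

```python
import struct


def build_handshake_reject(reason: str) -> bytes:
    """Frame an error flag (0x01) plus message; no null terminator is added."""
    body = b"\x01" + reason.encode("ascii")
    return struct.pack(">BH", 0x01, len(body)) + body


def parse_handshake_reject(frame: bytes) -> str:
    """Return the error message carried by a rejection frame."""
    indicator, size = struct.unpack_from(">BH", frame)
    body = frame[3:3 + size]
    if indicator != 0x01 or len(body) != size or body[:1] != b"\x01":
        raise ValueError("not a handshake rejection frame")
    return body[1:].decode("ascii")


reject = build_handshake_reject("Handshake MAC failure")
print(reject.hex(" "))
print(parse_handshake_reject(reject))
```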
Error Handling
Handshake errors can occur during different phases:
- CLIENT_HELLO phase: Bad indicator or packet length
- HANDSHAKE phase: Authentication failures, MAC failures, protocol errors
When these errors occur, the server sends an explicit rejection message using the format above, then transitions to the FAILED state.
Actual handshake error messages sent:
- "Bad indicator byte" - Invalid frame indicator
- "Bad handshake packet len" - Packet too large for handshake phase
- "Empty handshake message" - Received empty handshake frame
- "Bad handshake error byte" - Invalid error byte in handshake message
- "Handshake MAC failure" - MAC verification failed during handshake
- "Handshake error" - Generic handshake failure
For post-handshake errors (during encrypted data exchange):
- MAC verification failures close the connection immediately
- Invalid frame structure closes the connection immediately
- No error messages are sent for these failures to prevent information leakage
Wire Format Example
Sending a temperature reading (value: 23.5°C):
Hex: 01 00 0E 00 08 00 06 12 04 08 96 42 10 B4 46
[C H A C H A 2 0 - P O L Y M A C - 1 6 bytes]
- 01: Noise indicator
- 00 0E: Encrypted size (14 bytes, big-endian unsigned)
- Encrypted payload (before encryption):
  - 00 08: Message type 8 (big-endian unsigned)
  - 00 06: Data length 6 (big-endian unsigned)
  - 12 04 08 96 42 10: Protocol buffer data
- 16-byte MAC appended
Plaintext Protocol
Overview
The plaintext protocol uses variable-length encoding to minimize overhead for unencrypted communication.
State Machine
The plaintext protocol uses a simpler state machine:
stateDiagram-v2
[*] --> INITIALIZE
INITIALIZE --> DATA: init()
DATA --> CLOSED: close
INITIALIZE --> FAILED: tcp setup
DATA --> FAILED: read/write error
FAILED --> CLOSED: close
note right of DATA
No handshake required
Direct data exchange
end note
States:
- INITIALIZE: Initial state, waiting for init() to be called
- DATA: Ready for data exchange (no handshake required)
- CLOSED: Connection closed normally
- FAILED: Error occurred, connection failed
No handshake is required for the plaintext protocol - it transitions directly to the DATA state after initialization.
Frame Structure
[Indicator][Payload Size VarInt][Message Type VarInt][Payload]
  1 byte         1-3 bytes            1-2 bytes      Variable
Data Type Summary
Field | Type | Size | Encoding | Notes |
---|---|---|---|---|
Indicator | uint8 | 1 byte | - | Always 0x00 |
Payload Size | varint | 1-3 bytes | VarInt | Unsigned |
Message Type | varint | 1-2 bytes | VarInt | Unsigned 16-bit type; a 2-byte varint covers values up to 16,383 |
Data | bytes | Variable | - | Protocol buffer payload |
Message Format
Indicator
0x00 (1 byte)
Payload Size
VarInt encoding of payload size (unsigned)
Message Type
VarInt encoding of the 16-bit message type (unsigned)
Payload
Protocol buffer data
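Putting the four fields together, a plaintext frame can be assembled as in the sketch below. The helper names are illustrative, not ESPHome functions; the varint encoder follows the Protocol Buffers scheme described under Common Elements.

```python
def encode_varint(value: int) -> bytes:
    """Protocol Buffers style unsigned varint."""
    if value < 0:
        raise ValueError("varints encode unsigned values only")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_plaintext_frame(message_type: int, payload: bytes) -> bytes:
    """Indicator (0x00) + payload size varint + message type varint + payload."""
    return (
        b"\x00"
        + encode_varint(len(payload))
        + encode_varint(message_type)
        + payload
    )


frame = encode_plaintext_frame(8, bytes.fromhex("120408964210"))
print(frame.hex(" "))
```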
Buffer Layout
The plaintext protocol dynamically adjusts the header position to minimize padding:
Small Messages (header length = 3 bytes)
Position: [0] [1] [2] [3] [4] [5] [6] ...
Content:  XX  XX  XX  Ind FSz MTp [Payload]
          └ Padding ┘ └─ Header ┘
                      (offset=3)
Medium Messages (header length = 4 bytes)
Position: [0] [1] [2] [3] [4] [5] [6] ...
Content:  XX  XX  Ind FS₁ FS₂ MTp [Payload]
          └ Pad ┘ └── Header ───┘
                  (offset=2)
Large Messages (header length = 6 bytes)
Position: [0] [1] [2] [3] [4] [5] [6] ...
Content:  Ind FS₁ FS₂ FS₃ MT₁ MT₂ [Payload]
          └────── Header ───────┘
          (offset=0, no padding)
Where:
- Ind: Indicator byte (0x00 for plaintext)
- FSz/FS₁/FS₂/FS₃: Frame size varint (unsigned)
- MTp/MT₁/MT₂: Message type varint (unsigned)
- Padding bytes are unused (can be any value)
VarInt Size Calculation
The plaintext protocol dynamically calculates the optimal header position to minimize unused padding:
Payload Size Computation
The payload size includes only the actual protobuf data bytes, not the header components. This differs from the Noise protocol, whose encrypted size field also covers the message type and data length fields.
VarInt Length Determination
Based on the value being encoded:
- Values 0-127: 1 byte
- Values 128-16,383: 2 bytes
- Values 16,384-2,097,151: 3 bytes
- And so on (each additional byte adds 7 bits of capacity)
Header Offset Calculation
The header is positioned as late as possible in the 6-byte padding area:
- Total header length = 1 (indicator) + payload size varint + message type varint
- Offset = 6 - total header length
- This ensures minimal unused bytes at the beginning of the buffer
Dynamic Positioning
As message sizes vary, the header position adjusts accordingly:
- Small messages use offset 3 (leaving 3 unused bytes, since the header is never shorter than 3 bytes)
- Large messages can use offset 0 (utilizing all padding bytes)
This dynamic positioning maximizes buffer efficiency while maintaining a fixed pre-allocation size.
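The offset formula above (Offset = 6 - total header length) can be checked with a short sketch. `HEADER_PADDING`, `varint_len`, and `header_offset` are names invented for this example.

```python
HEADER_PADDING = 6  # bytes reserved in front of the payload


def varint_len(value: int) -> int:
    """Number of bytes needed to encode value as an unsigned varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def header_offset(payload_size: int, message_type: int) -> int:
    """Position of the indicator byte inside the 6-byte padding area."""
    header_len = 1 + varint_len(payload_size) + varint_len(message_type)
    if header_len > HEADER_PADDING:
        raise ValueError("header does not fit in the reserved padding")
    return HEADER_PADDING - header_len


for size, msg_type in [(6, 8), (200, 8), (20000, 300)]:
    print(size, msg_type, header_offset(size, msg_type))
```

The three sample inputs correspond to the small, medium, and large layouts shown in the Buffer Layout section.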
Wire Format Example
Same temperature reading:
Hex: 00 06 08 12 04 08 96 42 10
- 00: Plaintext indicator
- 06: Payload size (6 bytes, varint unsigned)
- 08: Message type 8 (varint unsigned)
- 12 04 08 96 42 10: Protocol buffer data
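Reading the same bytes back is the reverse process. The decoder below is a minimal sketch with hypothetical function names; it assumes the whole frame is already in memory rather than arriving in pieces from a socket.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_plaintext_frame(frame: bytes) -> tuple[int, bytes]:
    """Return (message_type, payload) from one complete plaintext frame."""
    if not frame or frame[0] != 0x00:
        raise ValueError("bad indicator byte")
    size, pos = decode_varint(frame, 1)
    message_type, pos = decode_varint(frame, pos)
    payload = frame[pos:pos + size]
    if len(payload) != size:
        raise ValueError("truncated payload")
    return message_type, payload


message_type, payload = decode_plaintext_frame(bytes.fromhex("000608120408964210"))
print(message_type, payload.hex(" "))
```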
Performance Considerations
Noise Protocol
- Fixed 23-byte overhead (7 header + 16 MAC)
- Constant-time header construction
- Higher CPU usage for encryption
- Better for security-sensitive applications
Plaintext Protocol
- Variable 3-6 byte overhead (depending on payload size and message type)
- Dynamic header positioning
- Lower CPU usage
- Better for high-frequency, non-sensitive data
Implementation Notes
Integer Types
All size and type fields are unsigned integers
Endianness
- Noise protocol: All multi-byte values use big-endian encoding
- Plaintext protocol: Uses VarInt encoding (Protocol Buffers standard)
Buffer Alignment
Both protocols ensure payload data starts at predictable offsets for efficient processing
Error Handling
- Invalid frame indicators or sizes should immediately close the connection
- For Noise protocol specific errors (handshake failures, decryption errors, etc.), see the Noise Protocol section above
Maximum Sizes
- Message types: 0-65,535 (16-bit unsigned)
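- Frame/data sizes: Noise size fields are 16-bit (up to 65,535 bytes); the plaintext payload size varint occupies 1-3 bytes in the layouts above (up to 2,097,151 bytes), although the varint format itself can encode values up to 64 bits and is practically limited by memory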