Last modified: January 24, 2026

Protocol Buffers Overview

Protocol Buffers (often referred to as protobuf) is a language-neutral, platform-independent mechanism for serializing structured data. Originally created at Google, it is well suited to efficient data interchange between services, compact binary storage, and maintaining backward and forward compatibility as the data schema evolves.

ASCII DIAGRAM: Flow of Working with Protobuf

 +-----------+         +---------+         +---------------------+
 |  .proto   |   Use   |  Protoc |   Gen   |   Language Classes  |
 | (Schema)  +-------->+ Compiler+-------->+ (Java, Python, etc.)|
 +-----------+         +----+----+         +----------+----------+
                            |                         |
                            |   (Serialize/           |   (Deserialize/
                            |    Deserialize)         |    Manipulate)
                            v                         v
                 +---------------------+   +---------------------+
                 |  In-memory Objects  |   |  In-memory Objects  |
                 +---------------------+   +---------------------+
                            ^                         ^
                            |      (Binary Data)      |
                            +-------------------------+

Basic Concepts

  1. .proto Files
     - Written in a syntax resembling IDLs (Interface Definition Languages).
     - Contain message declarations that describe data structures as typed, numbered fields.
     - Each field’s number identifies it in the binary encoding, so it must not be changed once the message type is in use.

  2. Generated Code
     - protoc compiles .proto definitions into classes in various languages (Java, Python, C++, Go, etc.).
     - These classes provide accessors for field values (in Java, getters plus builders with setters).

  3. Serialization
     - Protobuf messages are encoded in a compact binary format.
     - Serialization is efficient in terms of both space and time.
     - Deserialization uses the same schema to reconstruct the original objects.
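
To make the role of field numbers concrete, the following is a minimal sketch of how two fields could be laid out on the wire by hand. It assumes a hypothetical message with string name = 1 and int32 id = 2; the class WireSketch and its helper methods are illustrative names, not part of the protobuf library, and real applications would use the generated classes instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: hand-rolled encoding of two fields, not the protobuf runtime.
class WireSketch {
    // Wire types defined by the protobuf encoding.
    static final int VARINT = 0;
    static final int LENGTH_DELIMITED = 2;

    // Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A field key is (field number << 3) | wire type, itself written as a varint.
    static void writeKey(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Encodes the assumed message: string name = 1; int32 id = 2;
    static byte[] encodePerson(String name, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!name.isEmpty()) {                     // proto3 omits default values
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            writeKey(out, 1, LENGTH_DELIMITED);
            writeVarint(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        if (id != 0) {                             // proto3 omits default values
            writeKey(out, 2, VARINT);
            writeVarint(out, id);                  // int widens to long, sign-extended
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: 0a 05 41 6c 69 63 65 10 7b
        System.out.println(hex(encodePerson("Alice", 123)));
    }
}
```

Only the field numbers (1 and 2) appear in the output, never the field names, which is why renumbering a field changes the meaning of previously written data.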

Example: Person and AddressBook

A simple .proto file might define a Person message with nested fields and an AddressBook that holds multiple Person messages:

syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

Compilation and Generated Classes

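  1. Compile the .proto file:
protoc --java_out=. addressbook.proto
  2. The generated Java code will include Person, Person.PhoneNumber, Person.PhoneType, and AddressBook. By default these are nested inside a single outer class named after the .proto file; setting option java_multiple_files = true; in the .proto file emits them as top-level classes instead.

  3. Usage in Java (example):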

Person person = Person.newBuilder()
    .setName("Alice")
    .setId(123)
    .setEmail("alice@example.com")
    .addPhones(
      Person.PhoneNumber.newBuilder()
        .setNumber("555-1234")
        .setType(Person.PhoneType.HOME)
    )
    .build();

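// Serialization: encode the message into protobuf's binary wire format
byte[] data = person.toByteArray();

// Deserialization: parseFrom throws InvalidProtocolBufferException (a checked
// exception) on malformed input, so the enclosing method must catch or declare it
Person parsedPerson = Person.parseFrom(data);
System.out.println(parsedPerson.getName()); // "Alice"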

Advantages of Protocol Buffers

  1. Efficiency
     - Compact binary representation saves bandwidth and can reduce latency.
     - Typically faster to parse than JSON or XML, because the binary encoding avoids text parsing.

  2. Language-Neutral
     - Protobuf supports many languages and platforms, making it flexible for cross-language communication.

  3. Backward/Forward Compatibility
     - Fields can be added to or removed from message types over time without breaking existing code: parsers skip field numbers they do not recognize, and missing fields take their default values.
     - Each field’s unique number makes this evolution possible, provided the number of a removed field is never reused for a different purpose.

  4. Schema-Driven
     - The .proto file defines a clear contract for data exchange, promoting strong typing and consistent usage across services.
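
The compatibility point can be illustrated with a small hand-written reader. This is a sketch under stated assumptions: the reader only knows string name = 1 and int32 id = 2, while the incoming bytes also carry a field 3 and a hypothetical field 5 added by a newer schema. SkipUnknownSketch and its methods are illustrative names, not protobuf library APIs, and the generated parsers handle all of this for you (recent versions also preserve unknown fields rather than discarding them).

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: an "old" reader that tolerates fields it does not know.
class SkipUnknownSketch {
    private final byte[] buf;
    private int pos = 0;
    String name = "";   // field 1
    int id = 0;         // field 2

    SkipUnknownSketch(byte[] buf) { this.buf = buf; }

    // Reads a base-128 varint starting at the current position.
    long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // The wire type alone says how many bytes an unknown field occupies.
    void skip(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;                            // varint
            case 1: pos += 8; break;                                // fixed 64-bit
            case 2: int len = (int) readVarint(); pos += len; break; // length-delimited
            case 5: pos += 4; break;                                // fixed 32-bit
            default: throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
    }

    static SkipUnknownSketch parse(byte[] data) {
        SkipUnknownSketch p = new SkipUnknownSketch(data);
        while (p.pos < data.length) {
            long key = p.readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (fieldNumber == 1 && wireType == 2) {
                int len = (int) p.readVarint();
                p.name = new String(data, p.pos, len, StandardCharsets.UTF_8);
                p.pos += len;
            } else if (fieldNumber == 2 && wireType == 0) {
                p.id = (int) p.readVarint();
            } else {
                p.skip(wireType);   // unknown to this reader: step over it
            }
        }
        return p;
    }

    // Bytes as a newer writer might produce them (fields 3 and 5 are unknown here).
    static byte[] sampleNewerMessage() {
        return new byte[] {
            0x0a, 0x05, 'A', 'l', 'i', 'c', 'e',   // field 1: name = "Alice"
            0x1a, 0x05, 'a', '@', 'b', '.', 'c',   // field 3: length-delimited, unknown
            0x28, 0x01,                            // field 5: varint, unknown
            0x10, 0x7b                             // field 2: id = 123
        };
    }

    public static void main(String[] args) {
        SkipUnknownSketch p = parse(sampleNewerMessage());
        System.out.println(p.name + " " + p.id);   // prints: Alice 123
    }
}
```

Because every field is prefixed with its number and wire type, the older reader can step over data it does not understand and still recover the fields it knows.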

Common Use Cases

Protocol Buffers vs JSON

Aspect             | Protobuf                                        | JSON
-------------------+-------------------------------------------------+----------------------------------------------------
Encoding           | Binary                                          | Text (typically UTF-8)
Readability        | Not human-readable                              | Human-readable (plain text)
Size & Performance | Typically smaller, faster to parse              | Typically larger, slower to parse
Schema Definition  | Required (.proto files)                         | Not required (schemaless)
Evolution          | Facilitated by numeric tags (forward/backward)  | Relies on optional fields or manual versioning
Tooling            | Protobuf compiler needed, specialized libraries | Widespread support, easy debugging with text format

Choose JSON if easy debugging, simplicity, or direct human editing is a priority.
Choose Protobuf if efficiency, strict schema, or large-scale message passing is crucial.
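
As a rough illustration of the size difference, the sketch below compares one tiny record in both encodings. The protobuf bytes are written out by hand for an assumed message with string name = 1 and int32 id = 2; SizeComparisonSketch is an illustrative name, and a single small record is not a benchmark, so actual savings depend on the data.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: one record, two encodings, byte counts side by side.
class SizeComparisonSketch {
    // Wire bytes for name = "Alice", id = 123: each field is a key byte
    // ((field number << 3) | wire type) followed by its payload.
    static final byte[] PROTOBUF_BYTES = {
        0x0a, 0x05, 'A', 'l', 'i', 'c', 'e',   // field 1, length 5, UTF-8 text
        0x10, 0x7b                             // field 2, varint 123
    };

    // The same record as compact JSON; field names travel with every message.
    static final String JSON_TEXT = "{\"name\":\"Alice\",\"id\":123}";

    static int jsonSize() {
        return JSON_TEXT.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("protobuf bytes: " + PROTOBUF_BYTES.length); // 9
        System.out.println("JSON bytes:     " + jsonSize());            // 25
    }
}
```

Most of the difference here comes from JSON repeating the field names and punctuation in every message, whereas protobuf sends only field numbers.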

Best Practices