Last modified: June 06, 2026
This article is written in: 🇺🇸
JSON stands for JavaScript Object Notation. It is a small, text-based format for representing values so that programs can exchange them easily.
A JSON document can describe an employee, a list of orders, API settings, event logs, or almost any other structured information:
{
"employeeId": "E1001",
"name": "Amina Rahman",
"active": true,
"skills": ["Python", "SQL"],
"manager": null
}
A helpful mental model is:
JSON represents values using objects, arrays, strings, numbers, booleans, and
null. It gives data a portable structure, but it does not explain what fields mean or which fields are required unless a separate contract, such as JSON Schema, is provided.
JSON began as a notation related to JavaScript, but it is language-independent in everyday use. Python programs usually turn JSON objects into dictionaries, Java programs into mapped objects or tree nodes, JavaScript programs into native objects, and Go programs into structs or maps.
JSON supports four primitive kinds of value and two structured kinds of value.
flowchart TD
J["JSON value"] --> P["Primitive values"]
J --> S["Structured values"]
P --> STR["string<br/>'Amina'"]
P --> NUM["number<br/>42 or 19.95"]
P --> BOOL["boolean<br/>true / false"]
P --> NULL["null"]
S --> OBJ["object<br/>{ name/value members }"]
S --> ARR["array<br/>[ ordered values ]"]
| Value kind | Example | What it represents |
| String | "Berlin" |
Text, enclosed in double quotation marks |
| Number | 42, -3.5, 6.02e23 |
A decimal number representation |
| Boolean | true, false |
A truth value |
| Null | null |
An explicitly empty or absent value |
| Object | {"name": "Amina"} |
Named members, where each name is a string |
| Array | ["Python", "SQL"] |
An ordered sequence of values |
A complete JSON text may contain any JSON value at the top level:
"ready"
[1, 2, 3]
{"status": "ready"}
Many APIs choose an object as the top-level value because named fields are easier to extend later, but the JSON format itself does not require this.
An object uses curly braces and contains name/value members:
{
"name": "Amina Rahman",
"department": "Engineering",
"active": true
}
The name is always a JSON string. A colon separates the name from its value, and commas separate members.
JSON object names should be unique. Avoid this:
{
"status": "pending",
"status": "approved"
}
Different parsers may handle duplicate names differently: one may keep the final value, another may report an error, and another may preserve both internally. A reliable producer never sends duplicate object names.
Objects are conceptually collections of named values rather than ordered records. Many modern libraries preserve the input order when reading or writing objects, but interoperable application logic should not depend on the order of JSON object members unless an external protocol explicitly defines additional ordering rules.
An array uses square brackets and contains zero or more values:
["Python", "SQL", "Docker"]
Arrays are ordered. Position matters:
["first", "second"]
is not the same sequence as:
["second", "first"]
Array elements can themselves be objects or arrays:
{
"employees": [
{
"employeeId": "E1001",
"name": "Amina Rahman"
},
{
"employeeId": "E1002",
"name": "Jonas Weber"
}
]
}
flowchart TB
R["object"] --> K["employees"]
K --> A["array"]
A --> O1["object at index 0"]
A --> O2["object at index 1"]
O1 --> I1["employeeId: E1001"]
O1 --> N1["name: Amina Rahman"]
O2 --> I2["employeeId: E1002"]
O2 --> N2["name: Jonas Weber"]
JSON is intentionally small. Most JSON mistakes come from borrowing JavaScript or Python syntax that looks similar but is not JSON.
{
"name": "Amina",
"age": 30,
"active": true,
"projects": ["Atlas", "Orchid"],
"manager": null
}
| Rule | Correct JSON | Incorrect JSON |
| Names must be double-quoted strings | {"name": "Amina"} |
{name: "Amina"} |
| Strings use double quotes | {"city": "Berlin"} |
{"city": 'Berlin'} |
| No trailing comma | {"a": 1, "b": 2} |
{"a": 1, "b": 2,} |
| Literal names are lowercase | {"active": true} |
{"active": True} |
null is the JSON null value |
{"manager": null} |
{"manager": None} |
| Comments are not part of JSON | {"port": 8080} |
{"port": 8080 /* dev */} |
Numbers cannot be NaN or infinity in standard JSON |
{"score": 1.5} |
{"score": NaN} |
| Leading zeroes are not allowed on integers | {"month": 7} |
{"month": 07} |
This JavaScript-like text is not valid JSON:
{
employeeId: 'E1001',
active: true,
score: NaN,
}
The valid JSON equivalent is:
{
"employeeId": "E1001",
"active": true,
"score": null
}
Whether null is the correct replacement for a missing score is an application decision; the key point is that NaN is not a standard JSON number.
A JSON string is wrapped in double quotation marks:
"Hello, world"
Characters with a special role can be escaped with a backslash:
| Desired character | JSON form | Example |
| Double quote | \" |
"She said \"hello\"." |
| Backslash | \\ |
"C:\\Users\\Amina" |
| New line | \n |
"line one\nline two" |
| Tab | \t |
"name\tvalue" |
| Unicode escape | \uXXXX |
"\u20AC" for € |
JSON text exchanged between independent systems should be UTF-8 encoded. Unicode characters can usually be written directly in UTF-8:
{
"city": "München",
"currency": "€",
"greeting": "こんにちは",
"status": "✅"
}
They can also be escaped:
{
"currency": "\u20AC"
}
For characters outside the Basic Multilingual Plane, an escaped form uses a surrogate pair, but direct UTF-8 is often clearer:
{
"emojiDirect": "😀",
"emojiEscaped": "\uD83D\uDE00"
}
Both are intended to represent the same Unicode character.
Create a file named employees.json:
{
"generatedAt": "2026-05-24T09:30:00Z",
"employees": [
{
"employeeId": "E1001",
"name": "Amina Rahman",
"active": true,
"department": "Engineering",
"skills": ["Python", "SQL"],
"office": {
"city": "Berlin",
"countryCode": "DE"
},
"managerId": null
},
{
"employeeId": "E1002",
"name": "Jonas Weber",
"active": false,
"department": "Finance",
"skills": ["Excel"],
"office": {
"city": "Hamburg",
"countryCode": "DE"
},
"managerId": "E1001"
}
]
}
Reading it in ordinary language:
employees value is an array containing two employee objects.managerId: null means Amina has no manager value in this document.E1001.flowchart TB
D["Root object"] --> G["generatedAt<br/>timestamp string"]
D --> ES["employees<br/>array"]
ES --> E1["employee object<br/>E1001"]
ES --> E2["employee object<br/>E1002"]
E1 --> SK1["skills array<br/>Python, SQL"]
E1 --> OF1["office object<br/>Berlin / DE"]
E1 --> M1["managerId<br/>null"]
E2 --> SK2["skills array<br/>Excel"]
E2 --> OF2["office object<br/>Hamburg / DE"]
E2 --> M2["managerId<br/>E1001"]
Parsing converts JSON text into values a program can work with. Serializing converts program values back into JSON text.
sequenceDiagram
participant File as employees.json
participant Parser as json.load(...)
participant App as Python dictionaries/lists
participant Writer as json.dump(...)
File->>Parser: UTF-8 JSON text
Parser->>App: dict, list, str, bool, None
App->>App: Read and update values
App->>Writer: Serialize values
Writer-->>File: JSON text output
Create read_employees.py in the same directory as employees.json:
import json
from pathlib import Path
from typing import Any
def load_json_object(filename: str) -> dict[str, Any]:
with Path(filename).open("r", encoding="utf-8") as file:
value = json.load(file)
if not isinstance(value, dict):
raise ValueError("Expected the top-level JSON value to be an object.")
return value
def print_active_employees(document: dict[str, Any]) -> None:
employees = document.get("employees", [])
active_employees = [
employee for employee in employees if employee.get("active") is True
]
print(f"Employees total: {len(employees)}")
print(f"Active employees: {len(active_employees)}")
for employee in active_employees:
skills = ", ".join(employee.get("skills", []))
city = employee["office"]["city"]
print(f"- {employee['employeeId']}: {employee['name']} ({city}) — {skills}")
def write_pretty_copy(document: dict[str, Any], filename: str) -> None:
with Path(filename).open("w", encoding="utf-8") as file:
json.dump(document, file, indent=2, ensure_ascii=False)
file.write("\n")
if __name__ == "__main__":
employees_document = load_json_object("employees.json")
print_active_employees(employees_document)
write_pretty_copy(employees_document, "employees.pretty.json")
print("Wrote employees.pretty.json")
Run it:
python read_employees.py
Expected output:
Employees total: 2
Active employees: 1
- E1001: Amina Rahman (Berlin) — Python, SQL
Wrote employees.pretty.json
Important details in this example:
encoding="utf-8" makes text encoding explicit.json.load(file) parses one JSON value from a file.true, false, and null become Python True, False, and None.ensure_ascii=False lets readable Unicode characters such as ü remain visible when writing UTF-8 JSON.indent=2 creates human-readable output rather than the most compact output.A parser checks whether input is syntactically valid JSON. It does not automatically know whether your application permits that structure.
This is syntactically valid JSON:
{
"employees": "not really a list"
}
It is not valid for an application that requires employees to be an array of employee objects.
flowchart TD
A["Incoming JSON text"] --> B{"Parses successfully?"}
B -- "No" --> C["Syntax error<br/>reject malformed JSON"]
B -- "Yes" --> D{"Matches contract?<br/>JSON Schema or code rules"}
D -- "No" --> E["Validation error<br/>reject wrong shape/value"]
D -- "Yes" --> F["Use in business logic"]
A robust API typically separates these concerns:
JSON Schema is a vocabulary for describing JSON documents and validating instances against defined rules.
For the employee data, create employees.schema.json:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/schemas/employees.schema.json",
"title": "Employee collection",
"type": "object",
"properties": {
"generatedAt": {
"type": "string",
"format": "date-time"
},
"employees": {
"type": "array",
"items": {
"$ref": "#/$defs/employee"
}
}
},
"required": ["generatedAt", "employees"],
"additionalProperties": false,
"$defs": {
"employee": {
"type": "object",
"properties": {
"employeeId": {
"type": "string",
"pattern": "^E[0-9]{4}$"
},
"name": {
"type": "string",
"minLength": 1
},
"active": {
"type": "boolean"
},
"department": {
"type": "string",
"minLength": 1
},
"skills": {
"type": "array",
"items": {
"type": "string",
"minLength": 1
},
"uniqueItems": true
},
"office": {
"$ref": "#/$defs/office"
},
"managerId": {
"type": ["string", "null"],
"pattern": "^E[0-9]{4}$"
}
},
"required": [
"employeeId",
"name",
"active",
"department",
"skills",
"office",
"managerId"
],
"additionalProperties": false
},
"office": {
"type": "object",
"properties": {
"city": {
"type": "string",
"minLength": 1
},
"countryCode": {
"type": "string",
"pattern": "^[A-Z]{2}$"
}
},
"required": ["city", "countryCode"],
"additionalProperties": false
}
}
}
| Schema keyword | Meaning in this example |
$schema |
Says the schema uses the Draft 2020-12 vocabulary |
type |
Specifies the allowed JSON kind, such as object or array |
properties |
Defines known members of an object |
required |
Lists members that must be present |
items |
Specifies what values an array may contain |
$defs |
Holds reusable schema definitions |
$ref |
Reuses a definition instead of copying it |
pattern |
Requires a string to match a regular expression |
additionalProperties: false |
Rejects undeclared members at that object level |
uniqueItems: true |
Disallows duplicate array values according to schema equality |
The format: "date-time" keyword expresses that a string is intended to be a timestamp. Validation libraries may require explicit configuration before they enforce format checking, so do not assume every validator checks formats by default.
Install a JSON Schema validation library:
python -m pip install jsonschema
Create an invalid file named employees-invalid.json:
{
"generatedAt": "2026-05-24T09:30:00Z",
"employees": [
{
"employeeId": 1003,
"name": "",
"active": "yes",
"department": "Engineering",
"skills": ["Python", "Python"],
"office": {
"city": "Berlin",
"countryCode": "Germany"
},
"managerId": null
}
]
}
Now create validate_employees.py:
import json
from pathlib import Path
from typing import Any
from jsonschema import Draft202012Validator, FormatChecker
def load_json(filename: str) -> Any:
with Path(filename).open("r", encoding="utf-8") as file:
return json.load(file)
def validate_file(instance_filename: str, schema: dict[str, Any]) -> bool:
instance = load_json(instance_filename)
validator = Draft202012Validator(schema, format_checker=FormatChecker())
errors = sorted(validator.iter_errors(instance), key=lambda error: list(error.path))
if not errors:
print(f"{instance_filename}: valid")
return True
print(f"{instance_filename}: invalid")
for error in errors:
path = "$"
for item in error.path:
path += f"[{item}]" if isinstance(item, int) else f".{item}"
print(f" {path}: {error.message}")
return False
if __name__ == "__main__":
employee_schema = load_json("employees.schema.json")
validate_file("employees.json", employee_schema)
validate_file("employees-invalid.json", employee_schema)
Run it:
python validate_employees.py
Expected output may be phrased slightly differently between library versions, but it will identify problems similar to these:
employees.json: valid
employees-invalid.json: invalid
$.employees[0].active: 'yes' is not of type 'boolean'
$.employees[0].employeeId: 1003 is not of type 'string'
$.employees[0].name: '' should be non-empty
$.employees[0].office.countryCode: 'Germany' does not match '^[A-Z]{2}$'
$.employees[0].skills: ['Python', 'Python'] has non-unique elements
The valuable idea is not the exact punctuation of a validator's message. It is that malformed or incorrectly shaped input is rejected before application logic relies on it.
These are different states:
{
"managerId": null,
"middleName": "",
"skills": []
}
| Situation | Example | Possible meaning |
| Missing member | No managerId member at all |
The producer did not provide the field, or the field is not applicable |
| Explicit null | "managerId": null |
The field is known and has no current value |
| Empty string | "middleName": "" |
A text field exists but has no characters |
| Empty array | "skills": [] |
The collection is present and currently has no items |
There is no universal rule that an API should always prefer null or always omit missing values. What matters is that the contract specifies the meaning, and producers and consumers follow it consistently.
A JSON Schema can express the difference:
{
"type": "object",
"properties": {
"managerId": {
"type": ["string", "null"]
}
},
"required": ["managerId"]
}
Here, managerId must be present, but its value may be a string or null.
A JSON number is written in decimal form:
{
"count": 42,
"ratio": 0.875,
"distance": 1.2e6
}
The JSON grammar does not label values as int32, int64, float, or decimal. The receiving language or library chooses its own numeric representation.
This can matter when values are exchanged across languages:
{
"orderId": 9007199254740993
}
Some environments commonly represent numbers using IEEE 754 double-precision floating-point values. Such values cannot exactly represent every integer larger than 9007199254740991 (2^53 - 1). If a large identifier must survive across diverse systems exactly, representing it as a string is often safer:
{
"orderId": "9007199254740993"
}
flowchart LR
A["Large identifier<br/>9007199254740993"] --> B{"Send as JSON number?"}
B --> C["Some consumers may<br/>round or lose precision"]
A --> D["Send as JSON string"]
D --> E["Exact digits preserved;<br/>application validates format"]
For prices or financial values, binary floating-point arithmetic can also introduce surprising rounding behavior. Depending on the application contract, common designs include:
{
"amountMinorUnits": 1299,
"currency": "EUR"
}
or:
{
"amount": "12.99",
"currency": "EUR"
}
The first uses integer minor units, such as cents. The second preserves a decimal string for explicit decimal processing. The right design depends on currency rules and the system's numeric libraries.
NaN, Infinity, and -Infinity are not valid standard JSON values. Some programming libraries may allow them by default when generating or accepting JSON-like text, so strict interoperability code should configure its serializer and parser accordingly.
For example, Python can be told to refuse non-standard floating-point output:
import json
json.dumps({"score": float("nan")}, allow_nan=False)
This raises ValueError instead of writing a non-standard NaN token.
JSON is a text format. It does not provide a native byte-array literal for arbitrary binary bytes such as a PDF fragment, an image, or an encryption nonce.
A common approach is to encode the bytes as Base64 text and clearly document that field:
{
"filename": "icon.png",
"contentType": "image/png",
"contentBase64": "iVBORw0KGgoAAAANSUhEUg..."
}
For large files, sending the file separately and putting only metadata or a link in JSON is frequently more efficient than embedding an expanded Base64 value inside a large JSON payload.
A normal JSON array is a perfectly valid way to carry many records:
[
{"event": "login", "employeeId": "E1001"},
{"event": "logout", "employeeId": "E1001"}
]
For logs, imports, and data pipelines, it is often useful to process one record at a time without surrounding the whole collection in one array. JSON Lines, also called newline-delimited JSON or NDJSON, writes one complete JSON value on each line:
{"event":"login","employeeId":"E1001","at":"2026-05-24T09:30:00Z"}
{"event":"view_report","employeeId":"E1001","at":"2026-05-24T09:32:01Z"}
{"event":"logout","employeeId":"E1001","at":"2026-05-24T10:10:12Z"}
The typical file extension is .jsonl.
flowchart LR
F["events.jsonl"] --> L1["Read line 1<br/>parse event"]
F --> L2["Read line 2<br/>parse event"]
F --> L3["Read line 3<br/>parse event"]
L1 --> P["Process or store<br/>without retaining every line"]
L2 --> P
L3 --> P
Three practical rules for JSON Lines:
JSON Lines is not required for all large-data use cases. Streaming parsers can process certain ordinary JSON structures incrementally as well. JSON Lines is popular because record boundaries are simple and appending another event is straightforward.
Given events.jsonl, a program can process line by line:
import json
from pathlib import Path
with Path("events.jsonl").open("r", encoding="utf-8") as file:
for line_number, line in enumerate(file, start=1):
if not line.strip():
raise ValueError(f"Blank line at {line_number} is not a JSON record.")
event = json.loads(line)
print(event["event"], event["employeeId"])
Expected output:
login E1001
view_report E1001
logout E1001
JSON is common in HTTP APIs because it is straightforward to send, inspect, parse, and map into application data.
sequenceDiagram
participant Client
participant API as Web API
participant Parser as JSON parser
participant Schema as Validator
participant Service as Business logic
participant DB as Data store
Client->>API: POST /employees<br/>Content-Type: application/json
API->>API: Apply request-size and access controls
API->>Parser: Decode UTF-8 and parse JSON
Parser-->>API: Structured value
API->>Schema: Validate expected fields and values
Schema-->>API: Valid
API->>Service: Apply business rules
Service->>DB: Store approved data
Service-->>Client: JSON response
For input coming from users, browsers, partners, or mobile clients, consider JSON an untrusted boundary:
Parsing JSON does not make its content safe to use in another language or system.
JSON is simple, but standard JSON deliberately has no comments:
{
"port": 8080
}
This is invalid standard JSON:
{
// Local development port
"port": 8080
}
For machine-to-machine interchange, not having comments avoids uncertainty about whether comments must be retained or discarded. For human-maintained configuration, this limitation can be inconvenient.
Common choices are:
Do not silently send JSON5 features such as comments or trailing commas to a service expecting standard application/json.
| Question | JSON | XML | Protobuf binary |
| Is it human-readable text? | Yes | Yes | Usually no |
| Does it have objects and arrays as direct value types? | Yes | Structure is represented with elements/attributes | Repeated and message fields through schema |
| Does it have a built-in namespace mechanism? | No | Yes | Names are organized through schema/package design |
| Is a schema required to parse the format? | No | No for well-formed parsing | Normally yes |
| Common validation approach | JSON Schema or application validation | XSD, DTD, or application validation | .proto schema plus application rules |
| Convenient for typical web APIs | Very common | Possible, less common for new simple APIs | Common when clients use generated code or gRPC |
| Compact for network transfer | Moderate; compressible text | Often verbose text | Usually compact binary |
| Strong fit | Readable API payloads and configuration-like data | Documents, namespaces, validation-heavy interchange | Typed compact service messages |
Choose JSON when straightforward, readable structured exchange matters. Choose XML when document markup, namespaces, or established XML ecosystems are central. Choose Protobuf when compact binary messages and schema-based generated code are valuable.
| Mistake | Why it causes trouble | Better approach |
| Sending duplicate object names | Parsers may disagree about the retained value | Ensure each object name is unique |
| Depending on object member order | JSON object order is not a portable semantic contract | Use arrays when order carries meaning |
| Sending large identifiers as numbers | Some consumers may lose integer precision | Use validated strings for exact large IDs |
Using NaN or Infinity |
They are not standard JSON values | Use a documented alternative such as null or an explicit state |
| Treating parse success as business validation | Wrong types and unexpected values may still parse | Validate against a contract and business rules |
Assuming null, missing, and empty mean the same thing |
Consumers may behave differently | Define their semantics in the API contract |
| Allowing unlimited request bodies | Large payloads can consume resources | Enforce appropriate size limits |
| Putting binary content directly into strings without encoding rules | Consumers cannot reliably decode it | Use Base64 with explicit documentation or separate binary transfer |
| Hashing pretty-printed JSON directly for signatures | Whitespace and member serialization may differ | Use an agreed canonicalization/signing procedure |
"createdAt" to ambiguous abbreviations such as "ca".camelCase or snake_case for a given API.mindmap
root((JSON))
Values
string
number
boolean
null
object
array
Syntax
Double quotes
No comments
No trailing commas
UTF-8 exchange
Contracts
JSON Schema
Required members
Constraints
Practical concerns
Number precision
Null vs missing
Duplicate names
Request limits
Uses
APIs
Config-like data
JSON Lines logs
Data exchange
The essential ideas are:
null semantics, Unicode, and untrusted-input limits all deserve deliberate design.These notes are aligned with the following primary and security-oriented references: