Posted by Daniel Lemire
https://lemire.me/blog/2025/07/04/just-say-no-to-broken-json/
https://lemire.me/blog/?p=22097

JSON, or JavaScript Object Notation, is a lightweight data-interchange format. It is widely used for transmitting data between a server and a web application, due to its simplicity and compatibility with many programming languages.
The JSON format has a simple syntax with a small, fixed set of data types: strings, numbers, Booleans, null, objects, and arrays. Strings must not contain unescaped control characters (e.g., no unescaped newlines or tabs); instead, special characters must be escaped with a backslash (e.g., \n for newline). Numbers must follow valid formats, such as integers (e.g., 42), floating-point numbers (e.g., 3.14), or scientific notation (e.g., 1e-10). The format is specified formally in RFC 8259.
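As a quick illustration of these data types, the following sketch parses a small document with Python's standard json module. The document and its key names are made up for this example; they are not from any real system.

```python
import json

# A hypothetical document that uses every JSON data type once.
# The raw string keeps \n as two characters (backslash, n), which is
# the escaped form that JSON requires inside a string.
document = r'''
{
  "string": "line one\nline two",
  "integer": 42,
  "float": 3.14,
  "scientific": 1e-10,
  "boolean": true,
  "nothing": null,
  "array": [1, 2, 3],
  "object": {"nested": "value"}
}
'''

parsed = json.loads(document)

# The parser turns the escape sequence into a real linefeed,
# JSON true into Python True, and JSON null into Python None.
for key, value in parsed.items():
    print(key, type(value).__name__, repr(value))
```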
Irrespective of your programming language, there are readily available libraries to parse and generate valid JSON. Unfortunately, people who have not paid attention to the specification often write buggy code that leads to malformed JSON. Let us consider the strings, for example. The specification states the following:
All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
The rest of the specification explains how characters must be escaped. For example, a linefeed character can be replaced by the two characters ‘\n’ (a backslash followed by the letter n).
Simple enough, right? Producing valid JSON is definitely not hard. Programming a function to properly escape the characters in a string can be done by ChatGPT and it only spans four or five lines of code, at most.
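To make the point concrete, here is a minimal sketch of such an escaping function in Python. The name escape_json_string is hypothetical, and the code is written out more verbosely than strictly necessary so that each rule from the specification is visible.

```python
import json

# Short escapes defined by RFC 8259 for common characters.
SHORT_ESCAPES = {
    '"': '\\"', '\\': '\\\\', '\n': '\\n', '\r': '\\r',
    '\t': '\\t', '\b': '\\b', '\f': '\\f',
}

def escape_json_string(text: str) -> str:
    """Return text as a quoted JSON string (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters use the \u00XX form.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'

# The linefeed is emitted as backslash + n, so the result parses cleanly.
escaped = escape_json_string("value\nda")
assert json.loads(escaped) == "value\nda"
```

In practice, you would rarely write this yourself: in Python, json.dumps already produces correctly escaped strings.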
Sadly, some people insist on using broken JSON generators. It is a recurring problem as they later expect parsers to accept their ill-formed JSON. By breaking interoperability you lose the core benefit of JSON.
Let me consider a broken JSON document:
{"key": "value\nda"}
My convention here is that \n denotes the one-byte ASCII linefeed control character itself, not the two-character escape sequence. This JSON is therefore not valid. What happens when you try to parse it?
Let us try Python:
import json
json_string = '{"key": "value\nda"}'
data = json.loads(json_string)
This program fails with the following error:
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 15 (char 14)
So the malformed JSON cannot be easily processed by Python. Not good.
What about JavaScript?
const jsonString = '{"key": "value\nda"}';
let data = JSON.parse(jsonString);
This fails with
SyntaxError: Bad control character in string literal in JSON at position 14 (line 1 column 15)
Not great.
What about Java? The closest thing to a default JSON parser in Java is Jackson. Let us try.
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;
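void main() {
    // "\n" in a Java string literal is an actual linefeed character.
    String jsonString = "{\"key\": \"value\nda\"}";
    // parseJson is a helper that is not shown here; presumably it wraps
    // ObjectMapper.readValue and reports any parsing error.
    Map<String, Object> data = parseJson(jsonString);
}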
I get
JSON parsing error: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
What about C#?
using System.Text.Json;
string jsonString = "{\"key\": \"value\nda\"}";
using JsonDocument doc = JsonDocument.Parse(jsonString);
And you get, once again, an error.
In a very real sense, the malformed JSON document I started with is not JSON. By accommodating buggy systems instead of fixing them, we create workarounds that degrade our ability to work productively.
We have a specific name for this effect: technical debt. Technical debt refers to the accumulation of compromises or suboptimal solutions in software development that prioritize short-term progress but complicate long-term maintenance or evolution of the system. It often arises from choosing quick fixes, such as coding around broken systems instead of fixing them.
To avoid technical debt, systems should simply reject invalid JSON. Malformed documents pollute our data ecosystem. Producing correct JSON is easy. Bug reports should be filed with the people who push broken JSON. It is ok to have bugs; it is not ok to expect the world to accommodate them.
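As a rough sketch of what rejecting invalid JSON can look like in Python, the hypothetical function parse_or_reject below refuses anything the standard json module cannot parse. It also refuses NaN and Infinity, which Python's parser accepts by default even though they are not part of JSON. The producing side simply relies on json.dumps.

```python
import json

def _reject_constant(name: str):
    # Called by json.loads for NaN, Infinity, and -Infinity.
    raise ValueError(f"{name} is not valid JSON")

def parse_or_reject(payload: str):
    """Parse payload or raise ValueError (hypothetical helper).

    json.JSONDecodeError is a subclass of ValueError, so callers
    only need to catch one exception type.
    """
    return json.loads(payload, parse_constant=_reject_constant)

# Producing valid JSON: let the library do the escaping.
valid = json.dumps({"key": "value\nda"})
print(valid)

# The valid document round-trips; the broken one is rejected.
print(parse_or_reject(valid))
try:
    parse_or_reject('{"key": "value\nda"}')
except ValueError as err:
    print("rejected:", err)
```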