Why Do Binary Files Look Broken in Text Editors?

Binary files often look broken in text editors because the editor is applying a text interpretation rule to bytes that were structured for a different interpreter.

Learning Question

Why can an image, .class file, archive, or executable look like nonsense when opened as text even when the file is valid?

The usual mistake is to assume that visible nonsense means the file itself is corrupted.

Sometimes it is corrupted.

But often the file is fine.

The problem is that the wrong interpretation rule is being used.

What A Text Editor Tries To Do

A text editor is designed to display characters.

To do that, it usually tries to:

read the file’s bytes
choose or assume a text encoding
decode byte sequences into characters
render those characters on screen

That works well for files whose primary structure is text.

It works poorly for files whose primary structure is not text.

The text editor has not discovered the file’s true meaning.

It has only tried a text-decoding view.

Why Non-Text Files Can Look Broken

Images, class files, executable files, archives, compressed files, and many database files are not primarily text documents.

They may contain:

headers
magic numbers
numeric fields
offsets
compressed blocks
checksums
image data
bytecode
machine code
metadata tables

Those byte regions can be valid and meaningful under the correct format rules.

But a text editor may treat them as character bytes.

When the byte sequence does not form valid text under the selected encoding, the editor may show:

replacement characters such as �
dots or boxes
strange symbols
partial readable fragments surrounded by unreadable regions
long lines with no sensible word boundaries

That output means:

The bytes are not being successfully interpreted as ordinary text.

It does not automatically mean:

The file is broken.

Replacement Characters

The replacement character � is commonly used when a decoder encounters bytes that are invalid for the selected text encoding.

For example, UTF-8 has rules for which byte sequences are valid.

If a byte sequence violates those rules, a decoder may replace the invalid sequence with � rather than failing completely.

The replacement character is a display symptom.

It says that the text-decoding attempt could not map some bytes to valid characters.

It does not prove that the original file is invalid under its real file format.

Example: Image File Opened As Text

A PNG file is a structured binary file.

It may start with recognizable signature bytes, including bytes that can display as PNG.

But the file is not a plain text document.

A PNG decoder reads the bytes according to PNG format rules:

signature
chunks
lengths
chunk types
compressed image data
checksums

A text editor ignores most of that structure and attempts character decoding.

So the same valid image can display as broken-looking text in an editor while displaying correctly in an image viewer.

The difference is the interpreter.

Example: Java `.class` File Opened As Text

A Java .class file is a JVM-defined binary format.

It is not Java source code.

It begins with class-file magic bytes commonly written in hex as:

CA FE BA BE

Then it contains version information, a constant pool, field metadata, method metadata, attributes, and bytecode.

A text editor may show fragments of class names or string literals, but that does not mean the whole file is text.

The JVM, javap, and class-file tools understand the file because they use class-file format rules.

For the deeper JVM-owned explanation, see What a Class File Contains.

Example: Executable File Opened As Text

An executable file is also a structured binary file.

Depending on the platform, it may use a format such as ELF, PE, or Mach-O.

It can contain:

machine-code instruction bytes
read-only data
initialized data
section or segment metadata
dynamic-linking information
symbol or debug information

Some readable strings may appear inside the file.

But the file is not primarily a text document.

The operating-system loader, dynamic loader, CPU, and diagnostic tools interpret it through executable-format and machine rules.

For deeper C-oriented boundaries, see From Source Code to Executable File and From Executable File to Running Process.

Small Experiment

These commands assume a Unix-like shell such as WSL Ubuntu with Python 3 available.

Create a file containing bytes that are not valid UTF-8 text and inspect both the bytes and a replacement-character decode:

printf '\xff\xfeA' > not-utf8.bin
xxd -g 1 not-utf8.bin
python3 - <<'PY'
from pathlib import Path
 
data = Path("not-utf8.bin").read_bytes()
print(data.decode("utf-8", errors="replace"))
PY

The exact terminal rendering may vary, but the decoded output may show replacement characters near A.

What To Observe

The hex dump shows exact byte values such as:

ff fe 41

The text-decoding attempt cannot decode ff fe as valid UTF-8.

With replacement enabled, those invalid byte sequences may become visible as �.

The byte 41 can still display as A.

What This Proves

The broken-looking text is produced by a failed or partial text interpretation.

The bytes still exist exactly as bytes.

A different interpreter could give the same byte sequence a different meaning if it had a format rule where those byte values were valid.

This is the same reason an image viewer, JVM, archive tool, executable loader, or disassembler can understand files that look unreadable in a text editor.

What Broken-Looking Text Means

This chapter does not teach full PNG, class-file, executable, archive, compression, or loader formats.

Its purpose is to preserve one diagnostic distinction:

A file being unreadable as text is different from a file being corrupted.

Wrong interpretation is often enough to explain the broken-looking display.

How To Diagnose The View

A text editor is one interpreter.

It tries to decode bytes as characters.

Many files are valid under different rules:

image decoder rules
class-file rules
archive rules
executable-format rules
loader rules
CPU instruction rules

When a file looks broken in a text editor, ask:

Is the file invalid, or am I using a text tool on bytes whose primary meaning belongs to a different format?

Wrong Reader Before Bad File

Binary files look broken in text editors because text editors apply text decoding to bytes that may be structured for another interpreter.

The correct tool can understand the same bytes because it applies the correct format, runtime, or machine rule.

The broken-looking display is evidence of a mismatched interpretation, not automatic evidence of corruption.

Insight Vault

Browse

Why Do Binary Files Look Broken in Text Editors?

Learning Question

What A Text Editor Tries To Do

Why Non-Text Files Can Look Broken

Replacement Characters

Example: Image File Opened As Text

Example: Java `.class` File Opened As Text

Example: Executable File Opened As Text

Small Experiment

What To Observe

What This Proves

What Broken-Looking Text Means

How To Diagnose The View

Wrong Reader Before Bad File

Backlinks

Graph View

Table of Contents

Insight Vault

Browse

Why Do Binary Files Look Broken in Text Editors?

Learning Question

What A Text Editor Tries To Do

Why Non-Text Files Can Look Broken

Replacement Characters

Example: Image File Opened As Text

Example: Java .class File Opened As Text

Example: Executable File Opened As Text

Small Experiment

What To Observe

What This Proves

What Broken-Looking Text Means

How To Diagnose The View

Wrong Reader Before Bad File

Backlinks

Graph View

Table of Contents

Example: Java `.class` File Opened As Text