Binary files often look broken in text editors because the editor is applying a text interpretation rule to bytes that were structured for a different interpreter.
Learning Question
Why can an image, .class file, archive, or executable look like nonsense when opened as text even when the file is valid?
The usual mistake is to assume that visible nonsense means the file itself is corrupted.
Sometimes it is corrupted.
But often the file is fine.
The problem is that the wrong interpretation rule is being used.
What A Text Editor Tries To Do
A text editor is designed to display characters.
To do that, it usually tries to:
- read the file’s bytes
- choose or assume a text encoding
- decode byte sequences into characters
- render those characters on screen
That works well for files whose primary structure is text.
It works poorly for files whose primary structure is not text.
The text editor has not discovered the file’s true meaning.
It has only tried a text-decoding view.
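The editor steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular editor is implemented: the helper name `text_editor_view` is hypothetical, and UTF-8 with replacement is assumed as the editor's fallback behavior.

```python
def text_editor_view(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes the way a lenient text editor might."""
    # Assume an encoding and decode; bytes that cannot be decoded become U+FFFD.
    return data.decode(encoding, errors="replace")


# Bytes whose primary structure is text.
text_bytes = "héllo\n".encode("utf-8")

# The eight-byte PNG signature: bytes whose primary structure is not text.
binary_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

print(repr(text_editor_view(text_bytes)))
print(repr(text_editor_view(binary_bytes)))
```

The first value decodes cleanly. The second contains a replacement character where byte 89 could not be decoded, even though the bytes are a valid PNG signature.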
Why Non-Text Files Can Look Broken
Images, class files, executable files, archives, compressed files, and many database files are not primarily text documents.
They may contain:
- headers
- magic numbers
- numeric fields
- offsets
- compressed blocks
- checksums
- image data
- bytecode
- machine code
- metadata tables
Those byte regions can be valid and meaningful under the correct format rules.
But a text editor may treat them as character bytes.
When the byte sequence does not form valid text under the selected encoding, the editor may show:
- replacement characters such as �
- dots or boxes
- strange symbols
- partial readable fragments surrounded by unreadable regions
- long lines with no sensible word boundaries
That output means:
The bytes are not being successfully interpreted as ordinary text.
It does not automatically mean:
The file is broken.
Replacement Characters
The replacement character � is commonly used when a decoder encounters bytes that are invalid for the selected text encoding.
For example, UTF-8 has rules for which byte sequences are valid.
If a byte sequence violates those rules, a decoder may replace the invalid sequence with � rather than failing completely.
The replacement character is a display symptom.
It says that the text-decoding attempt could not map some bytes to valid characters.
It does not prove that the original file is invalid under its real file format.
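The difference between failing completely and substituting a replacement character can be seen with Python's built-in decoder. This is a small sketch; the sample bytes are made up for illustration.

```python
# "é" encoded in UTF-8 (C3 A9), then FF (never valid in UTF-8), then "A" (41).
data = bytes([0xC3, 0xA9, 0xFF, 0x41])

try:
    data.decode("utf-8")  # strict mode raises on the first invalid sequence
    strict_result = "decoded"
except UnicodeDecodeError as exc:
    strict_result = f"failed at byte offset {exc.start}"

# Lenient mode substitutes U+FFFD for the invalid byte and keeps going.
lenient_result = data.decode("utf-8", errors="replace")

print(strict_result)
print(repr(lenient_result))
```

In both modes the bytes in `data` are unchanged. Only the decoding policy differs.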
Example: Image File Opened As Text
A PNG file is a structured binary file.
It starts with an eight-byte signature, 89 50 4E 47 0D 0A 1A 0A, in which the bytes 50 4E 47 can display as the ASCII letters PNG.
But the file is not a plain text document.
A PNG decoder reads the bytes according to PNG format rules:
- signature
- chunks
- lengths
- chunk types
- compressed image data
- checksums
A text editor ignores most of that structure and attempts character decoding.
So the same valid image can display as broken-looking text in an editor while displaying correctly in an image viewer.
The difference is the interpreter.
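The two views can be compared side by side. The sketch below builds a tiny 1x1 grayscale PNG in memory (so no file on disk is assumed), walks its chunks using the signature, length, type, and checksum layout listed above, and then decodes the same bytes as UTF-8. The helper names are hypothetical, and this is nowhere near a full PNG decoder.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one chunk: 4-byte length, 4-byte type, payload, 4-byte CRC."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def build_tiny_png() -> bytes:
    """Build a 1x1, 8-bit grayscale PNG entirely in memory."""
    # IHDR: width, height, bit depth, color type, compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0 plus one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


def read_chunks(data: bytes) -> list:
    """Return (type, length, crc_ok) for each chunk, using PNG format rules."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    offset, chunks = len(PNG_SIGNATURE), []
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        chunk_type = data[offset + 4:offset + 8]
        payload = data[offset + 8:offset + 8 + length]
        (stored_crc,) = struct.unpack_from(">I", data, offset + 8 + length)
        crc_ok = stored_crc == (zlib.crc32(chunk_type + payload) & 0xFFFFFFFF)
        chunks.append((chunk_type.decode("ascii"), length, crc_ok))
        offset += 12 + length  # length field + type + payload + CRC
    return chunks


png = build_tiny_png()
chunks = read_chunks(png)                           # format-aware view
as_text = png.decode("utf-8", errors="replace")     # text-editor view

print(chunks)
print(repr(as_text))
```

The chunk view reports well-formed chunks with matching checksums, while the text view of the same bytes contains replacement characters.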
Example: Java .class File Opened As Text
A Java .class file is a JVM-defined binary format.
It is not Java source code.
It begins with class-file magic bytes commonly written in hex as:
CA FE BA BE
Then it contains version information, a constant pool, field metadata, method metadata, attributes, and bytecode.
A text editor may show fragments of class names or string literals, but that does not mean the whole file is text.
The JVM, javap, and class-file tools understand the file because they use class-file format rules.
For the deeper JVM-owned explanation, see What a Class File Contains.
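As a small illustration of reading by class-file rules, the first eight bytes can be unpacked as a big-endian magic number followed by minor and major version numbers. The header bytes below are hand-written for the example (major version 52 corresponds to Java SE 8); they are not taken from a real compiled class.

```python
import struct

# Hypothetical header: magic CA FE BA BE, minor version 0, major version 52.
header = bytes.fromhex("CAFEBABE 0000 0034")

# Class-file view: u4 magic, u2 minor_version, u2 major_version, all big-endian.
magic, minor, major = struct.unpack(">IHH", header)

# Text-editor view of the same eight bytes.
as_text = header.decode("utf-8", errors="replace")

print(hex(magic), minor, major)
print(repr(as_text))
```

The structured view yields three meaningful numbers. The text view yields replacement characters, NUL characters, and a stray "4", because byte 34 happens to be the ASCII digit 4.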
Example: Executable File Opened As Text
An executable file is also a structured binary file.
Depending on the platform, it may use a format such as ELF, PE, or Mach-O.
It can contain:
- machine-code instruction bytes
- read-only data
- initialized data
- section or segment metadata
- dynamic-linking information
- symbol or debug information
Some readable strings may appear inside the file.
But the file is not primarily a text document.
The operating-system loader, dynamic loader, CPU, and diagnostic tools interpret it through executable-format and machine rules.
For deeper C-oriented boundaries, see From Source Code to Executable File and From Executable File to Running Process.
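A format-aware tool usually starts by checking the leading bytes rather than decoding characters. The sketch below recognizes only a few well-known prefixes; the function name `identify_executable` and the sample prefixes are hypothetical, and real loaders validate far more than this.

```python
def identify_executable(prefix: bytes) -> str:
    """Hypothetical helper: guess an executable container from its first bytes."""
    if prefix.startswith(b"\x7fELF"):
        # Byte 4 of the ELF identification block is the class: 1 = 32-bit, 2 = 64-bit.
        class_byte = prefix[4] if len(prefix) > 4 else 0
        elf_class = {1: "32-bit", 2: "64-bit"}.get(class_byte, "unknown class")
        return f"ELF ({elf_class})"
    if prefix.startswith(b"MZ"):
        # PE files begin with a DOS header whose first two bytes are "MZ".
        return "MZ header (used by PE files)"
    if prefix[:4] in (b"\xcf\xfa\xed\xfe", b"\xce\xfa\xed\xfe"):
        # Mach-O magic FEEDFACF / FEEDFACE as stored on little-endian platforms.
        return "Mach-O (little-endian)"
    return "unknown"


# Hand-written sample prefixes, not real executables.
elf_sample = b"\x7fELF\x02\x01\x01" + bytes(9)
pe_sample = b"MZ\x90\x00"
macho_sample = b"\xcf\xfa\xed\xfe" + bytes(4)

for sample in (elf_sample, pe_sample, macho_sample, b"plain text"):
    print(identify_executable(sample))
```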
Small Experiment
These commands assume a Unix-like environment such as WSL Ubuntu, with bash, xxd, and Python 3 available.
Create a file containing bytes that are not valid UTF-8 text and inspect both the bytes and a replacement-character decode:
printf '\xff\xfeA' > not-utf8.bin
xxd -g 1 not-utf8.bin
python3 - <<'PY'
from pathlib import Path
data = Path("not-utf8.bin").read_bytes()
print(data.decode("utf-8", errors="replace"))
PY
The exact terminal rendering may vary, but the decoded output should show two replacement characters followed by A.
What To Observe
The hex dump shows exact byte values such as:
ff fe 41
The text-decoding attempt cannot decode ff fe as valid UTF-8.
With replacement enabled, those invalid byte sequences may become visible as �.
The byte 41 can still display as A.
What This Proves
The broken-looking text is produced by a failed or partial text interpretation.
The bytes still exist exactly as bytes.
A different interpreter could give the same byte sequence a different meaning if it had a format rule where those byte values were valid.
This is the same reason an image viewer, JVM, archive tool, executable loader, or disassembler can understand files that look unreadable in a text editor.
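To make that concrete, the same three bytes from the experiment can be read under several rules. This is only a sketch: the alternative interpretations (Latin-1, raw integers, a little-endian 16-bit number) are chosen for illustration and are not claimed to be the "real" format of these bytes.

```python
data = b"\xff\xfeA"  # the same bytes written by the printf command

as_utf8 = data.decode("utf-8", errors="replace")  # two replacement characters, then "A"
as_latin1 = data.decode("latin-1")                # every byte maps to a character
as_ints = list(data)                              # plain unsigned byte values
as_u16_le = int.from_bytes(data[:2], "little")    # first two bytes as one number

print(repr(as_utf8))
print(repr(as_latin1))
print(as_ints)
print(as_u16_le)
```

None of these views changes the bytes. Each one applies a different rule to them.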
What Broken-Looking Text Means
This chapter does not teach full PNG, class-file, executable, archive, compression, or loader formats.
Its purpose is to preserve one diagnostic distinction:
A file being unreadable as text is different from a file being corrupted.
Wrong interpretation is often enough to explain the broken-looking display.
How To Diagnose The View
A text editor is one interpreter.
It tries to decode bytes as characters.
Many files are valid under different rules:
- image decoder rules
- class-file rules
- archive rules
- executable-format rules
- loader rules
- CPU instruction rules
When a file looks broken in a text editor, ask:
Is the file invalid, or am I using a text tool on bytes whose primary meaning belongs to a different format?
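One way to put that question into practice is to look for a known signature before trusting the text view. The sketch below is a rough first check under stated assumptions: the `diagnose` helper and its short signature table are hypothetical, the table is far from complete, and a matching signature does not prove the rest of the file is well formed.

```python
# A few well-known leading byte sequences (illustrative, not exhaustive).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xca\xfe\xba\xbe", "Java class file (also used by Mach-O universal binaries)"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x1f\x8b", "gzip stream"),
    (b"\x7fELF", "ELF executable or object"),
]


def diagnose(data: bytes) -> str:
    """Hypothetical helper: say which interpretation looks plausible for the bytes."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return f"known binary signature: {label}"
    try:
        data.decode("utf-8")  # strict decode: succeeds only for valid UTF-8
    except UnicodeDecodeError:
        return "not valid UTF-8 and no known signature: unknown format, not proven corrupt"
    return "decodes as UTF-8 text"


print(diagnose(b"hello\n"))
print(diagnose(b"PK\x03\x04..."))
print(diagnose(b"\xff\xfeA"))
```

The last case is the important one: failing to decode as text is reported as an unknown format, not as corruption.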
Wrong Reader Before Bad File
Binary files look broken in text editors because text editors apply text decoding to bytes that may be structured for another interpreter.
The correct tool can understand the same bytes because it applies the correct format, runtime, or machine rule.
The broken-looking display is evidence of a mismatched interpretation, not automatic evidence of corruption.