Historically, the notion of a “file” evolved very differently in the PC/Unix world than in the classic mainframe world.
The difference is not just technical; it reflects different philosophies of computing:
- PC/Unix systems evolved around streams of bytes
- Mainframe systems evolved around business records
That is why on mainframes the natural abstraction is often a dataset of records, not an unstructured byte stream.
In Unix, Linux, Windows, DOS, and similar systems, a file is fundamentally:
a sequence of bytes with no inherent structure
The operating system does not know:
- where lines begin
- where records begin
- what fields exist
- whether the content is text, database pages, images, or executable code
For example:
48 65 6C 6C 6F 0A 41 42 43
The OS sees only bytes.
Applications impose structure themselves:
- text editors interpret `\n`
- databases interpret pages
- CSV readers interpret commas
- compilers interpret syntax
This is called a byte-stream model.
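To make this concrete, here is a minimal Python sketch that takes the nine example bytes shown earlier and interprets them in two ways. The variable names are illustrative only; the point is that the meaning comes from the application, not from the bytes.

```python
# The nine bytes from the example, exactly as they would sit in a file.
raw = bytes.fromhex("48 65 6C 6C 6F 0A 41 42 43")

# A text-oriented application decides that 0x0A means "end of line".
text_lines = [line.decode("ascii") for line in raw.split(b"\n")]

# Another application could treat the very same bytes as small integers.
as_numbers = list(raw)

print(text_lines)      # ['Hello', 'ABC']
print(as_numbers[:3])  # [72, 101, 108]
```

Neither reading is more "correct" as far as the OS is concerned; both are conventions layered on top of the same stream.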
Unix strongly standardized this model:
“everything is a stream of bytes”
Even devices, sockets, and pipes follow this philosophy.
Classic mainframe operating systems were designed primarily for:
- banking
- payroll
- insurance
- census systems
- airline reservations
- batch processing
These workloads naturally operate on:
- fixed forms
- account records
- transaction records
- indexed business data
So the OS itself became record-oriented.
A mainframe “file” (often called a dataset) is commonly:
a collection of records with structure known by the OS
The operating system may know:
- record length
- whether records are fixed or variable
- block size
- indexing method
- access method
This is radically different from Unix.
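For variable-length records, the record boundary has to be stored somewhere. The sketch below uses a 4-byte length prefix in front of each record, loosely modelled on the record descriptor word used by z/OS variable-length formats. The exact layout here (2-byte big-endian length that counts the prefix itself, then 2 reserved zero bytes) is a simplifying assumption, and `pack_record` / `unpack_records` are hypothetical helper names.

```python
import struct

PREFIX_LENGTH = 4  # assumed: 2-byte length + 2 reserved bytes

def pack_record(payload: bytes) -> bytes:
    """Prefix a payload with its total length (prefix included)."""
    return struct.pack(">HH", len(payload) + PREFIX_LENGTH, 0) + payload

def unpack_records(data: bytes) -> list:
    """Walk the length prefixes and return the payloads in order."""
    records = []
    pos = 0
    while pos < len(data):
        length, _reserved = struct.unpack_from(">HH", data, pos)
        if length < PREFIX_LENGTH:
            raise ValueError("corrupt length prefix")
        records.append(data[pos + PREFIX_LENGTH : pos + length])
        pos += length
    return records

payloads = [b"SMITH", b"A MUCH LONGER CUSTOMER NAME", b"LI"]
stored = b"".join(pack_record(p) for p in payloads)

print(unpack_records(stored) == payloads)  # True
```

Note that no delimiter byte is reserved, so a record may contain any byte value, including `0x0A`.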
Suppose customer records are exactly 80 bytes.
Mainframe OS:
Record 1 = 80 bytes
Record 2 = 80 bytes
Record 3 = 80 bytes
The OS understands record boundaries.
Applications issue `READ CUSTOMER-FILE` and receive one logical record.
No parsing of delimiters is necessary.
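On a byte-stream system, an application would have to reproduce this behaviour itself. A minimal Python sketch, assuming the 80-byte record length from the example and some made-up customer names padded with spaces:

```python
import io

RECORD_LENGTH = 80  # fixed record length, as in the example

def read_records(stream, record_length=RECORD_LENGTH):
    """Yield one fixed-length record at a time from a byte stream."""
    while True:
        record = stream.read(record_length)
        if not record:
            break  # clean end of data
        if len(record) != record_length:
            raise ValueError("truncated record")
        yield record

# Three hypothetical customer records, each padded to exactly 80 bytes.
names = [b"ALICE", b"BOB", b"CAROL"]
dataset = io.BytesIO(b"".join(name.ljust(RECORD_LENGTH) for name in names))

records = list(read_records(dataset))
print(len(records))         # 3
print(records[1].rstrip())  # b'BOB'
```

The record boundary is implied purely by position: record *n* starts at byte offset *n* × 80. On a mainframe, the access method does this bookkeeping on the application's behalf.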
Many mainframe systems support indexed datasets natively.
Example:
KEY = customer number
VALUE = customer record
The OS or access method handles:
- indexes
- record lookup
- sequential scans
- keyed access
This existed decades before modern relational databases became dominant.
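The idea of keyed access can be sketched in a few lines of Python. This is an illustration of the concept only, not how any real access method is implemented: it assumes fixed 80-byte records whose first 6 bytes hold the customer number, and keeps the index in an in-memory dictionary. `build_index` and `read_by_key` are hypothetical names.

```python
import io

RECORD_LENGTH = 80
KEY_LENGTH = 6  # assumed: customer number occupies the first 6 bytes

def build_index(stream):
    """Map each record's key to the byte offset where the record starts."""
    index = {}
    offset = 0
    stream.seek(0)
    while True:
        record = stream.read(RECORD_LENGTH)
        if len(record) < RECORD_LENGTH:
            break
        index[record[:KEY_LENGTH]] = offset
        offset += RECORD_LENGTH
    return index

def read_by_key(stream, index, key):
    """Keyed access: jump straight to the record for this key."""
    stream.seek(index[key])
    return stream.read(RECORD_LENGTH)

customers = [(b"000107", b"BOB"), (b"000042", b"ALICE"), (b"000311", b"CAROL")]
dataset = io.BytesIO(
    b"".join((key + name).ljust(RECORD_LENGTH) for key, name in customers)
)

index = build_index(dataset)
record = read_by_key(dataset, index, b"000042")
print(record[KEY_LENGTH:].rstrip())  # b'ALICE'

# Sequential scan in key order, regardless of physical order.
print(sorted(index))  # [b'000042', b'000107', b'000311']
```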
Classic IBM mainframes introduced sophisticated storage abstractions.
Important IBM access methods include:
| Access Method | Meaning |
|---|---|
| BSAM | Basic Sequential Access Method |
| QSAM | Queued Sequential Access Method |
| VSAM | Virtual Storage Access Method |
| ISAM | Indexed Sequential Access Method |
VSAM is especially important.
On IBM z/OS, VSAM datasets may be:
- sequential
- indexed
- relative-record
VSAM is not “just files”. It is closer to a lightweight storage engine built into the OS.
On Unix, `read(fd, buffer, 4096);` reads arbitrary bytes.
The application decides their meaning.
On a mainframe, `READ EMPLOYEE-FILE` reads exactly one logical record.
The OS/access method handles:
- blocking
- buffering
- locating records
- decoding record structure
Mainframes historically optimized heavily for tape and expensive disk I/O.
Records were grouped into physical blocks.
Example:
Block
├── Record
├── Record
└── Record
The OS knew:
- logical record size
- physical block size
Unix generally hides this from applications.
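The relationship between logical records and physical blocks can be sketched as follows. The sizes (80-byte records, 800-byte blocks) are assumptions chosen for the example, and `block_records` / `deblock` are hypothetical helpers, not real system calls.

```python
LRECL = 80     # assumed logical record length
BLKSIZE = 800  # assumed physical block size (10 records per block)

def block_records(records, lrecl=LRECL, blksize=BLKSIZE):
    """Group logical records into physical blocks for I/O."""
    per_block = blksize // lrecl
    return [
        b"".join(records[i : i + per_block])
        for i in range(0, len(records), per_block)
    ]

def deblock(block, lrecl=LRECL):
    """Split one physical block back into its logical records."""
    return [block[i : i + lrecl] for i in range(0, len(block), lrecl)]

records = [f"REC{n:05d}".encode("ascii").ljust(LRECL) for n in range(25)]
blocks = block_records(records)

print(len(blocks))               # 3
print([len(b) for b in blocks])  # [800, 800, 400]

# Deblocking recovers the original records in order.
restored = [rec for block in blocks for rec in deblock(block)]
print(restored == records)       # True
```

Writing 3 blocks instead of 25 individual records means far fewer physical I/O operations, which is the whole point of blocking on tape and early disks.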
- z/OS
- OS/360
- MVS
- OS/390
These are the canonical record-oriented systems.
Datasets are fundamental there.
OpenVMS also has rich record-oriented files.
Supports:
- fixed records
- variable records
- indexed files
Very different from Unix.
IBM i is especially interesting.
Historically:
- “files” were closer to database tables
- the system deeply integrated DB2
Traditional filesystem semantics were secondary.
- Linux
- Windows
- macOS
- BSD
- Solaris
- AIX
- HP-UX
All use byte-stream files.
Text lines are conventions, not OS concepts.
The byte-stream model proved more flexible.
Advantages:
- simpler abstraction
- language independence
- easier portability
- works for arbitrary binary data
- composable tools and pipes
The Unix philosophy, “keep the OS simple, push structure to applications”, became dominant.
Business workloads strongly benefit from record semantics:
- COBOL integration
- fixed forms
- indexed retrieval
- batch processing
- guaranteed structure
Mainframes optimized for:
- reliability
- transaction throughput
- massive batch jobs
not developer flexibility.
Even on IBM mainframes today:
- Unix environments exist
- POSIX APIs exist
- byte-stream files exist
For example, z/OS contains:
- traditional datasets
- Unix System Services (USS)
So modern mainframes support both worlds simultaneously.
In the PC/Unix model:
File = bytes
Structure belongs to applications.
In the mainframe model:
File = collection of records
Structure partially belongs to the OS and access methods.
Because mainframe files have structure, many properties become part of metadata:
- record size
- key fields
- blocking
- access mode
Copying a file incorrectly may lose semantic meaning.
On Unix, `cp file1 file2` is usually enough.
On mainframes, copying datasets historically required preserving dataset attributes as well.
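A small Python sketch shows why the attributes matter. The `DatasetAttributes` class and the 8-byte record length are made up for the illustration; the point is that identical bytes yield different records once the metadata is lost or wrong.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetAttributes:
    """Hypothetical stand-in for the attributes stored with a dataset."""
    record_format: str  # "F" = fixed-length in this sketch
    record_length: int

def split_records(data: bytes, attrs: DatasetAttributes) -> list:
    """Interpret raw bytes as fixed-length records using the attributes."""
    n = attrs.record_length
    return [data[i : i + n] for i in range(0, len(data), n)]

original_attrs = DatasetAttributes(record_format="F", record_length=8)
data = b"".join(name.ljust(8) for name in [b"ALICE", b"BOB", b"CAROL"])

# Copy that preserved the attributes: records come back intact.
correct = split_records(data, original_attrs)
print([r.rstrip() for r in correct])  # [b'ALICE', b'BOB', b'CAROL']

# Same bytes, but the copy ended up with a different record length.
wrong = split_records(data, DatasetAttributes("F", 6))
print(wrong)  # [b'ALICE ', b'  BOB ', b'    CA', b'ROL   ']
```

Every byte survived the copy, yet the records did not.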