1 Introduction
2 Ground Rules

Building a File System
3 File Systems
4 File Content Data Structure
5 Allocation Cluster Manager
6 Exceptions and Emancipation
7 Base Classes, Testing, and More
8 File Meta Data
9 Native File Class
10 Our File System
11 Allocation Table
12 File System Support Code
13 Initializing the File System
14 Contiguous Files
15 Rebuilding the File System
16 Native File System Support Methods
17 Lookups, Wildcards, and Unicode, Oh My
18 Finishing the File System Class

The Init Program
19 Hardware Abstraction and UOS Architecture
20 Init Command Mode
21 Using Our File System
22 Hardware and Device Lists
23 Fun with Stores: Partitions
24 Fun with Stores: RAID
25 Fun with Stores: RAM Disks
26 Init wrap-up

The Executive
27 Overview of The Executive
28 Starting the Kernel
29 The Kernel
30 Making a Store Bootable
31 The MMC
32 The HMC
33 Loading the components
34 Using the File Processor
35 Symbols and the SSC
36 The File Processor and Device Management
37 The File Processor and File System Management
38 Finishing Executive Startup

Users and Security
39 Introduction to Users and Security
40 More Fun With Stores: File Heaps
41 File Heaps, part 2
42 SysUAF
43 TUser
44 SysUAF API

UCL
45 Shells and UCL

Glossary/Index


File Systems

It might seem strange to start with the UOS file system, but it turns out that file system code has some utility outside of UOS. For instance, we can use it to create container files for our applications running on Microsoft Windows. In any case, file support is a central feature for any modern Operating System.

Before we go any further, we should recognize that there is a difference between the file system software and the data structure that is managed by that software. That is, the on-disk structure of a file system simply defines how the file data and folder relationships are stored. The software accesses that data structure to create/read/write/delete files, and usually adds some rules and features (such as locking, which we will discuss later). For now, let us set aside the software side of this and take a brief survey of the data structures used by existing file systems in the real world.

There are multiple file systems in use, depending upon the media that the files reside on. For instance, on a magnetic tape, the media is inherently sequential. There is, therefore, no hierarchy: each file simply occurs, one after another, as the tape is read or written. Note that the file system software can present a hierarchical view of the files, but that is not how the files are stored - remember, we are only discussing the data structure on the media for now. Likewise, CDs contain data in a sequential format. But since the read head can move over the CD much as a hard disk's heads do, a CD can be accessed in a semi-random manner. However, like tapes, CDs can only be written in a sequential manner. USB storage devices have their own data structures as well. We will eventually discuss all of these, but for now we will focus on standard hard disk file system structures.

The first thing we note is that files can be of varying sizes, from 0 bytes to as large as the disk they reside on (or even larger, if the file can span multiple disks). Perhaps there are some specialized cases where fixed-size files would be useful, but we want the native UOS file system structure to be general-purpose. When files can be any size on the disk, it presents a challenge in how to store and retrieve data from the files - especially when multiple files may be written to alternately, thus extending several of them over the course of running one program. And the problem only becomes worse if there are multiple programs running at the same time. Let's say you set aside 4K of disk space for a file. As soon as you write the 4097th byte to that file, it has to be extended to hold the data beyond the 4K that you set aside for it. It may be possible, sometimes, to simply allocate the next 4K on the disk for the file. But it is more likely that the first file area cannot be expanded because immediately after the 4K area, the next 4K area was allocated for another file. This is an example of how it could look:
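
To make the collision concrete, here is a minimal sketch in Python. The cluster numbers, the first-free allocator, and the file names are all made up for illustration - this is not the actual UOS structure:

```python
# Hypothetical sketch: a tiny disk of 4K clusters, allocated first-free.
disk = [None] * 8          # 8 clusters, all free

def allocate(owner):
    """Grab the first free cluster and return its index."""
    index = disk.index(None)
    disk[index] = owner
    return index

a1 = allocate("A")         # file A's first 4K
b1 = allocate("B")         # another file takes the very next cluster
a2 = allocate("A")         # A writes its 4097th byte: must extend elsewhere

print(disk[:4])            # ['A', 'B', 'A', None] -- A is no longer contiguous
print(a2 == a1 + 1)        # False: the adjacent cluster was unavailable
```

As soon as another file claims the cluster adjacent to A, extending A in place is impossible, no matter how much free space remains elsewhere on the disk.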

How can we solve this? Well, one approach could be to move the data following the first 4K buffer so that the file can be expanded in place. In fact, this approach could simplify our data structure and the file system code since each file's data would be contiguous. But there would be a performance issue with having to move the next file's data to a new location on the disk in order to make room for the file each time it is expanded. In fact, a simple write to a file could result in a long pause as the data is moved around on the disk, especially if the next file was quite large. Further, this approach would tend to fragment the free space on the disk such that there may not be a large-enough contiguous free area to move a file into. You could have half of your disk space free, but not have enough free space in one place to be able to extend even a small file. On the other hand, reading the file's data takes very little overhead compared to other means of access. There are certain applications, such as when files are rarely extended, where this might be the right approach. However, in a general case, this is not an ideal solution for the reasons already stated.
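A rough sketch of that move-to-extend idea, using a toy byte-array "disk" (the layout and sizes are hypothetical): note that the work done is proportional to the size of the neighboring file, not to the amount of data being appended:

```python
# Toy disk: file A occupies 4 bytes, file B the next 8, then free space.
disk = bytearray(b"AAAA" + b"BBBBBBBB" + b"\x00" * 4)

def extend_in_place(start, old_len, grow, next_len):
    """Shift the neighboring file right by `grow` bytes so A can grow in place.
    Returns the number of bytes that had to be copied."""
    next_start = start + old_len
    moved = disk[next_start:next_start + next_len]          # copy B out...
    disk[next_start + grow:next_start + grow + next_len] = moved
    disk[next_start:next_start + grow] = b"A" * grow        # ...then grow A
    return next_len

cost = extend_in_place(0, 4, 2, 8)
print(bytes(disk))   # b'AAAAAABBBBBBBB\x00\x00' -- both files still contiguous
print(cost)          # 8 bytes moved just to append 2 bytes to A
```

Copying 8 bytes is trivial; copying a multi-gigabyte neighbor on every extension is the "long pause" described above.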

How do most file systems handle efficient extending of files on disk? They allocate the files in pieces, and keep track of those pieces. When the file is accessed, the file system traipses down this list of file "extents" or "fragments" to get to the desired data. Since the next extent of the file can be placed wherever there is free space, the disk space is used efficiently. However, when files get too fragmented, system performance can suffer due to the overhead of moving the disk read/write heads all over to reach those extents. That is why disk defragmenters are so useful on most computers - they gather all the fragments together to reduce the overhead.
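
That traversal can be sketched as follows. The extent list layout and the locate helper here are hypothetical illustrations of the general technique, not the UOS on-disk format:

```python
# Sketch: each extent maps a run of the file's logical bytes to a start
# cluster on disk; the runs appear in logical-offset order.
CLUSTER = 4096
extents = [(10, 2), (57, 1), (23, 4)]   # (start_cluster, cluster_count) pairs

def locate(offset):
    """Walk the extent chain to translate a file offset to a disk address."""
    remaining = offset
    for start_cluster, count in extents:
        run_bytes = count * CLUSTER
        if remaining < run_bytes:       # offset falls inside this extent
            cluster = start_cluster + remaining // CLUSTER
            return cluster, remaining % CLUSTER
        remaining -= run_bytes          # skip this extent and keep walking
    raise ValueError("offset past end of file")

print(locate(0))         # (10, 0)   -- first byte, first extent
print(locate(8192))      # (57, 0)   -- byte 8192 begins the second extent
print(locate(13000))     # (23, 712) -- falls in the third extent
```

The linear walk is what makes heavy fragmentation costly: the further into the file you seek, the more extents must be skipped before the right one is found.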

In any programming, there are three things that you can optimize: speed of execution, amount of storage required, and complexity of the algorithm. Essentially, you can trade one against the other two. That is, you can choose to make it run fast and use little storage, but only at the cost of extra code/algorithm complexity. Or you can make the code and data structures simple, and get decent speed, but only at the cost of additional storage (memory and/or disk). All programming ends up being a compromise between these three factors.

A chain of file extents seems like a good compromise, and it is a widely used (and therefore well-understood) implementation. I can't think of a better overall approach, and so UOS will use this approach to store file data. That is what we will tackle in the next article.