1 Introduction
2 Ground Rules

Building a File System
3 File Systems
4 File Content Data Structure
5 Allocation Cluster Manager
6 Exceptions and Emancipation
7 Base Classes, Testing, and More
8 File Meta Data
9 Native File Class
10 Our File System
11 Allocation Table
12 File System Support Code
13 Initializing the File System
14 Contiguous Files
15 Rebuilding the File System
16 Native File System Support Methods
17 Lookups, Wildcards, and Unicode, Oh My
18 Finishing the File System Class

The Init Program
19 Hardware Abstraction and UOS Architecture
20 Init Command Mode
21 Using Our File System
22 Hardware and Device Lists
23 Fun with Stores: Partitions
24 Fun with Stores: RAID
25 Fun with Stores: RAM Disks
26 Init wrap-up

The Executive
27 Overview of The Executive
28 Starting the Kernel
29 The Kernel
30 Making a Store Bootable
31 The MMC
32 The HMC
33 Loading the components
34 Using the File Processor
35 Symbols and the SSC
36 The File Processor and Device Management
37 The File Processor and File System Management
38 Finishing Executive Startup

Users and Security
39 Introduction to Users and Security
40 More Fun With Stores: File Heaps
41 File Heaps, part 2
42 SysUAF
43 TUser
44 SysUAF API

UCL
45 Shells and UCL

Glossary/Index


Download sources
Download binaries

File System Support Code

We're getting close to writing our file system class, but we still need some support classes before we can start.

File Names
As we know, the file header contains a 4-byte integer for the file's name. Our file system needs to translate between an actual name and that integer.
Older operating systems only supported 7-bit (and visible) ASCII characters, or a subset thereof, in file names (ASCII includes the characters that most of us are familiar with on our PC keyboard). ASCII values are single-byte integer values, where each integer value corresponds to a visible character. For instance, the value 65 is represented with the letter "A". 33 represents the exclamation point ("!"). In ASCII, a byte value of 0 is a NUL and is often used as a string terminator. NULs do not show up as a visible glyph since they were originally intended as special informational codes (NUL was used as a form of empty "fill" space back in the days of teletypes, for instance). In fact, all ASCII values less than 32 were control codes that had no glyph associated with them (a "glyph" is the actual symbol you see - whether a letter, a number, punctuation, or some other visible symbol). To make things more complicated, various operating systems implement 8-bit ASCII, where they extend the 7-bit ASCII character set. Windows TrueType, for instance, has a set of special characters for values 128 to 255. MSDOS defined a different set of symbols for values 128 to 255. And certain fonts, such as Wingdings, associate a completely different set of glyphs for the 255 possible character values.
It was this growing confusion that led to newer operating systems, such as Windows, supporting Unicode characters. There are several types of Unicode: 32-bit, 16-bit, and UTF-8. 16-bit Unicode is a subset of 32-bit Unicode. Like ASCII, each value is associated with a glyph. The difference being that ASCII provides 7-bit numbers (up to 127 glyphs) while 32-bit Unicode supports up to 4 billion glyphs, and each character is essentially an unsigned 32-bit integer value. When represented as data, that means that any of the bytes could be any value, including 0. UTF-8 characters are variable in length. That is, a given UTF-8 character could be anywhere from 1 to 5 bytes in length. For backwards compatibility, the lowest 127 characters in Unicode match the 7-bit ASCII characters. In terms of UTF-8, this means that a string consisting solely of 7-bit ASCII characters will only require one byte per character. Any Unicode value less than 268435455 will require 4 bytes or less, which is as good as 32-bit Unicode. Most characters will take less than that. So, in most cases, using the UTF-8 encoding for names will save us space over either of the other Unicode formats. For "normal" characters, it takes the same space as ASCII codes. Because of the way UTF-8 is encoded, a byte value of 0 also indicates a NUL. It would be very unfriendly to the user if file names contained "characters" that were invisible. How would they tell files apart if the names only differed by invisible codes?
UOS will use UTF-8 as the encoding for native file names. This provides us with the greatest flexibility for file naming. Certain characters are not allowed, including any invisible codes, wildcard characters "*" and "?", and parsing-related characters "\", "/", and ":". We will discuss the use of these characters later, which will make clear why they will not be allowed in file names. But we still haven't described how we translate from a UTF-8 string to a 4-byte integer value.

String table
In my research I discovered that a great number of file names on a disk were duplicates. For instance, there may be a folder named "backup" in several different places. For a mostly-full 1 Terabyte disk, if we ignore the duplicates, we end up with around 200,000 unique names, comprising about 10 Mb of storage space (including overhead that we will discuss). Obviously, this will vary from user to user, and from disk to disk. If we used a fixed 127-byte field for file names in our file headers, the space represented by all the names would be around 30 Mb. The scheme we use will save us space and also allow names of arbitrary length. Granted, since we are using a 4-byte value for the file name, we will be wasting a few bytes on files with really short names, but far less than what we would waste with more traditional file header layouts. But we aren't doing this for space savings, although that is a nice benefit. As you may recall, the point was to keep the file headers as small as possible.
The way we will handle the value/name mapping is by using a file on the store that will contain all of the names used by files on that store. The layout of each entry in the string table is as follows:
Byte OffsetDescription
0-3Reference count
4String length
5-nString contents

Since some of our strings are going to be used in multiple places, we need to keep track of how many references there are, so that when the string is no longer used anywhere, we can reuse the space for something else. We use a single byte for the length, which allows for a value from 0 to 255. However, since we won't allow zero-length strings in our table, we repurpose the value of 0 to mean 256. This means that each string can be anywhere from 1 to 256 bytes long.
A reference to a string is simply the offset in the table where the string entry exists. Since we use 32-bit integers for our string references, that means that our maximum string table size is 4 Gb (plus a few bytes if a string entry begins right at the end of the 4 Gb range). Thus, translating from a string reference value to the actual string is simple and quick - we simply look up the string entry by reading the string table starting at that offset.
When a reference to a string is removed (for instance, when a file is deleted), we decrement the reference count for that string. If the reference count reaches 0, we know that no one is using that string anymore. It is essential that we be able to reuse that space or else we will eventually run out of our 4 Gb maximum string table space.
How do we handle more than 4 billion references to a string? This seems extremely unlikely, but we can handle it by simply creating another entry with the same string value. We can do this indefinitely, or until we run out of our 4 Gb string table space.
We've handled getting an existing string, and deleting a string. How do we add a new string? First, we need to see if the string is already in the table. We could read through the table, looking at each entry until we found the string or hit the end. This would be very slow, especially for a big disk with a large string table - not to mention the complexity of handling deleted entries as we read through the table. The way that we are going to handle this is by keeping a separate index into the string table. It will be kept in sorted order so that we can look up a string location by doing a simple binary search. If the string is found, we use the existing offset and increment the entry's reference count. If the string is not found, we add the string to the table, and insert the offset of the new entry into the sorted list at the proper location.

My first attempt at a string table class used a single-linked list of deleted entries (the entry at offset 0 was the head of the list), but the performance wasn't very good, although the space utilization was good. My second attempt was to use an allocation table to keep track of used/available space. While the allocation table uses additional space, the performance was double or triple that of the first attempt. That is a tradeoff I'm willing to make. Given that the minimum size of our string entries is 6 bytes, we will use an allocation table resolution of 8 (6 rounded up to the next power of 2). That means for every 64 bytes in the table, there is one byte of allocation table.

Here is our string table class:

type TUOS_String_Table = class( TCommon_COM_Interface )
                             public // Constructors and destructors...
                                constructor Create ;
                                destructor Destroy ; override ;

                             private // Instance data...
                                 _AT : TAT64 ;
                                 _Buffer : PChar ;
                                 _Buffer_Length : longint ;
                                 _Index : TInteger_List ;
                                 _Position : longint ;
                                 _Dirty_Data : boolean ;
                                 _Dirty_Index : boolean ;
                                 _Data_File : TUOS_Native_File ;
                                 _Index_File : TUOS_Native_File ;
                                 _AT_File : TUOS_Native_File ;

                             public // Property handlers...
                                 function Get_AT_FIle : TUOS_Native_File ;
                                 procedure Set_AT_File( Value : TUOS_Native_File ) ;
                                 function Get_Data_File : TUOS_Native_File ;
                                 procedure Set_Data_File( Value : TUOS_Native_File ) ;
                                 function Get_Index_File : TUOS_Native_File ;
                                 procedure Set_Index_File( Value : TUOS_Native_File ) ;

                             private // Internal utility routines...
                                 function Indexof( const Value : string ) : longint ;
                                 procedure Read( var Data ; Size : longint ) ;
                                 function Read_Longint : longint ;
                                 procedure Seek( Value : longint ) ;
                                 procedure Update ;
                                 procedure Write( const Value : string ) ;
                                 procedure Write_Longint( Value : longint ) ;

                             public // API...
                                 function Add_String( Value : string ) : longint ;
                                 procedure Delete_String( Index : longint ) ;
                                 function Get_String( Value : longint ) : string ;

                                 property AT_File : TUOS_Native_File
                                      read Get_AT_FIle
                                      write Set_AT_File ;
                                 property Data_File : TUOS_Native_File
                                     read Get_Data_File
                                     write Set_Data_File ;
                                 property Index_File : TUOS_Native_File
                                     read Get_Index_File
                                     write Set_Index_File ;
                         end ; // TUOS_String_Table

The interface to this class is simple: besides the getters and setters, there are only three methods in the API. The use of Delphi properties is to make the code look cleaner. Note the files. The class can operate in two modes: memory-based and file-based. If the files are not assigned to the class, it will operate in a memory buffer which will run faster. However, that increases the memory footprint of each instance of a UOS file system. We use the file-based approach, because it reduces the footprint. At some point in the future, we might allow the user to specify how they want to manage the string table for a given store. We store the data in three different files: the string table itself is in the Data file, the sorted index is in the Index file, and the allocation table is in the AT file.
_Buffer is our memory buffer (the class defaults to memory-mode until we assign files to it). _Index is a list of integers, which is essentially a dynamic array. _AT is our allocation table, and it is always memory-resident. When it is updated, it is written out to the AT file (if one is assigned).

Here are the constructor and destructor:

constructor TUOS_String_Table.Create ;

begin
    inherited Create ;

    _Index := TInteger_List.Create ;
    _Buffer := allocmem( 64 ) ;
    _Buffer_Length := 64 ;
    _Dirty_Data := True ;
    _AT := TAT64.Create( 8 ) ;
    _AT.Size := 1 ;
end ;


destructor TUOS_String_Table.Destroy ;

begin
    Update ;
    Data_File := nil ;
    Index_File := nil ;
    _Index.Free ;
    _Index := nil ;
    freemem( _Buffer ) ;
    _Buffer := nil ;

    inherited Destroy ;
end ;

Really, the only thing needing explanation here is the call to Update, which simply makes sure that any unsaved data is written out, if applicable.

Here are the getters and setters.

function TUOS_String_Table.Get_AT_File : TUOS_Native_File ;

begin
    Result := _AT_File ;
end ;


procedure TUOS_String_Table.Set_AT_File( Value : TUOS_Native_File ) ;

begin
    if( Value <> nil ) then
    begin
        Value.Attach ;
        if( Value.Get_Stream_Size( 0 ) = 0 ) then // File is empty
        begin
            Value.Write( 0, 0, _AT.Size, _AT.Get_Data^ ) ;
        end ;
    end ;
    if( _AT_File <> nil ) then
    begin
        _AT_File.Detach ;
    end ;
    _AT_File := Value ;
end ;


function TUOS_String_Table.Get_Data_File : TUOS_Native_File ;

begin
    Result := _Data_File ;
end ;


procedure TUOS_String_Table.Set_Data_File( Value : TUOS_Native_File ) ;

begin
    if( Value <> nil ) then
    begin
        Value.Attach ;
        if( Value.Get_Stream_Size( 0 ) = 0 ) then // File is empty
        begin
            Value.Write( 0, 0, _Buffer_Length, _Buffer^ ) ;
        end ;
        freemem( _Buffer ) ;
        _Buffer := nil ;
        _Dirty_Data := False ;
    end else
    begin
        _Buffer := allocmem( 64 ) ;
        _Buffer_Length := 64 ;
        _Dirty_Data := True ;
    end ;
    if( _Data_File <> nil ) then
    begin
        _Data_File.Detach ;
    end ;
    _Data_File := Value ;
end ;


function TUOS_String_Table.Get_Index_File : TUOS_Native_File ;

begin
    Result := _Index_File ;
end ;


procedure TUOS_String_Table.Set_Index_File( Value : TUOS_Native_File ) ;

begin
    if( Value <> nil ) then
    begin
        Value.Attach ;
        if( Value.Get_Stream_Size( 0 ) = 0 ) then // New file
        begin
            Value.Write( 0, 0, _Index.Count * 4, _Index.List^ ) ;
        end else
        begin
            _Index.Count := Value.Get_Stream_Size( 0 ) div 4 ;
            Value.Read( 0, 0, _Index.Count * 4, _Index.List^ ) ;
        end ;
        _Dirty_Index := False ;
    end ;
    if( _Index_File <> nil ) then
    begin
        _Index_File.Detach ;
    end ;
    _Index_File := Value ;
end ;

The getters are simple enough. The setters are typical of the setters we've used in previous classes. However, one addition here is that when a file is assigned, we need to either save our current data to them (if they are empty files) or load internal data from the file (if the file is not empty). We only load data from the index and allocation table files, because if a file is assigned, it means we are not using the memory buffer for the string data - so we never load from the data file. And we release the buffer space if a data file is assigned.

Our internal methods are basically file I/O operation wrappers that direct the request either to the file or to memory buffer, as appropriate.

procedure TUOS_String_Table.Read( var Data ; Size : longint ) ;

begin
    if( Data_File = nil ) then
    begin
        move( _Buffer[ _Position ], Data, Size ) ;
    end else
    begin
        Data_File.Read( 0, _Position, Size, Data ) ;
    end ;
    _Position := _Position + Size ;
end ;


function TUOS_String_Table.Read_Longint : longint ;

begin
    Read( Result, sizeof( Result ) ) ;
end ;


procedure TUOS_String_Table.Seek( Value : longint ) ;

var P : cardinal ;

begin
    _Position := Value ;
    if( Data_File = nil ) then
    begin
        if( _Position > _Buffer_Length ) then
        begin
            _Position := _Buffer_Length ;
        end ;
    end else
    begin
        P := Data_File.Get_Stream_Size( 0 ) ;
        if( _Position > P ) then
        begin
            _Position := P ;
        end ;
    end ;
end ;


procedure TUOS_String_Table.Write( const Value : string ) ;

var L : cardinal ;

begin
    if( Data_File = nil ) then
    begin
        if( _Position + length( Value ) > _Buffer_Length ) then
        begin
            L := ( _Buffer_Length + length( Value ) + 5 + 63 ) div 64 ;
            _AT.Size := L ;
            _Buffer_Length := L * 64 ;
            reallocmem( _Buffer, _Buffer_Length ) ;
        end ;
        move( PChar( Value )[ 0 ], _Buffer[ _Position ], length( Value ) ) ;
    end else
    begin
        Data_File.Write( 0, _Position, length( Value ), PChar( Value )[ 0 ] ) ;
    end ;
    _Position := _Position + length( Value ) ;
    _Dirty_Data := True ;
end ;


procedure TUOS_String_Table.Write_Longint( Value : longint ) ;

var S : string ;

begin
    setlength( S, 4 ) ;
    move( Value, PChar( S )[ 0 ], 4 ) ;
    Write( S ) ;
end ;

The code is fairly self-explanatory. One additional feature is the addition of a pointer into the data. This is set with Seek and updated with Reads and Writes. The idea is to allow data to be written a piece at a time without having to build a buffer and write once, and to make the code cleaner elsewhere because it doesn't have to do as much with the buffer offset as it would otherwise. We also set the "Dirty" flags to indicate that the data has been updated in-memory, but not written to the file.

Here are the other internal methods.

procedure TUOS_String_Table.Add_Space( Value : cardinal ) ;

var Buff : array[ 0..63 ] of byte ;
    L : int64 ;

begin
    if( Data_File <> nil ) then
    begin
        L := ( Value + 63 ) div 64 ; // Round up to number of 64-byte chunks
        fillchar( Buff, sizeof( Buff ), 0 ) ;
        while( L > 0 ) do
        begin
            Data_File.Write( 0, Data_File.Header.Size, sizeof( Buff ), Buff ) ;
            dec( L ) ;
        end ;
        exit ;
    end ;
    L := ( _Buffer_Length + Value + 63 ) div 64 ; // New size in 64-byte chunks
    _AT.Size := L ;
    _Buffer_Length := L * 64 ;
    reallocmem( _Buffer, _Buffer_Length ) ;
end ;


function TUOS_String_Table.Indexof( const Value : string ) : longint ;

var Low, High : longint ;
    This_Value : string ;

begin
    Result := 0 ;
    if( _Index.Count < 2 ) then
    begin
        exit ;
    end ;
    Low := 0 ;
    High := _Index.Count - 1 ;
    while( True ) do
    begin
        if( High <= Low ) then
        begin
            Result := Low ;
            exit ;
        end ;
        Result := ( High - Low ) div 2 + Low ; // Start in middle
        This_Value :=  Get_String( _Index[ Result ] ) ;
        if( This_Value = Value ) then // Found our target
        begin
            // Return first instance of a value, in case there are duplicates
            while(
                   ( Result >= 0 )
                   and
                   ( Get_String( _Index[ Result - 1 ] ) = Value )
                 ) do
            begin
                dec( Result ) ;
            end ;
            exit ;
        end else
        if( This_Value < Value ) then
        begin
            Low := Result + 1 ;
        end else
        begin
            if( High = Result ) then
            begin
                exit ;
            end ;
            High := Result ;
        end ;
    end ; // while( True )
end ; // TUOS_String_Table.Indexof


procedure TUOS_String_Table.Update ;

begin
    if( _Dirty_Data ) then
    begin
        if( Data_File <> nil ) then
        begin
            _Dirty_Data := False ;
        end ;
    end ;
    if( _Dirty_Index ) then
    begin
        if( _Index_File <> nil ) then
        begin
            _Dirty_Index := False ;
            _Index_File.Write( 0, 0, _Index.Count * 4, _Index.List^ ) ;
        end ;
    end ;
    if( _AT.Modified ) then
    begin
        if( _AT_File <> nil ) then
        begin
            _AT.Modified := False ;
            _AT_File.Write( 0, 0, _AT.Size, _AT.Get_Data^ ) ;
        end ;
    end ;
end ;

Add_Space extends the string table by the requested number of bytes. For the in-memory buffer, it just resizes it with reallocmem. For a file, it writes at the end of the file until it is the requested size. Note that the increase in size, for either mode, is 64-byte chunks, since that is what is covered by a single AT byte.
Indexof is what does a binary search of the index to find a given string value. It handles the case of duplicate values (when one or more strings have the maximum reference count value).
Update checks to see if the data is dirty, and writes changes out to the corresponding files and clears the Dirty flag - if the files are assigned.

Finally, we address the three main API methods. Add_String is used to add a string to the table. The offset in the table of the string is returned.

function TUOS_String_Table.Add_String( Value : string ) : longint ;

var C : byte ;
    P : int64 ;
    L, Ref : cardinal ;
    S : string ;

begin
    // Setup...
    Set_Last_Error( nil ) ;
    Result := 0 ;
    if( length( Value ) = 0 ) then
    begin
        exit ;
    end ;
    if( length( Value ) > 256 ) then
    begin
        Setlength( Value, 256 ) ;
    end ;
    C := length( Value ) ;

    if( _Index.Count > 0 ) then
    begin
        // Find existing string...
        Result := Indexof( Value ) ;
        S := Get_String( _Index[ Result ] ) ;
        if( Value = S ) then
        begin
            Seek( _Index[ Result ] ) ;
            Ref := Read_Longint ;
            if( Ref < $FFFFFFFF ) then // Less than 4 billion references
            begin
                // Increment the reference count...
                Result := _Index[ Result ] ;
                Seek( Result ) ;
                inc( Ref ) ;
                Write_Longint( Ref ) ;
                Update ;
                exit ;
            end ;
        end ;
    end else
    begin
        Result := -1 ;
    end ;

    // A new string...
    P := _AT.Allocate( 5 + length( Value ) ) ;
    if( P = 0 ) then // Not enough room
    begin
        Add_Space( length( Value ) + 5 ) ;
        P := _AT.Allocate( 5 + length( Value ) ) ;
    end ;
    if( ( P > $FFFFFFFE ) or ( P = 0 ) ) then
    begin
        Result := 0 ; // No more room
        exit ;
    end ;
    Seek( P ) ;
    if( S >= Value ) then
    begin
        dec( Result ) ; // Need to insert new string before found index
    end ;
    _Index.Insert( Result + 1, P ) ;
    _Dirty_Index := True ;
    Result := P ;
    Write( #1#0#0#0 + chr( C ) + Value ) ;
    Update ;
end ; // TUOS_String_Table.Add_String

If the string already exists, the existing string's reference count in incremented. If the reference is maxed out, we create new string entry. When creating a new entry, we try to allocate the needed space. If that fails, we extend the string table and try again. If we fail after the extend, it means we've run out of space.

The delete operation is simple.

procedure TUOS_String_Table.Delete_String( Index : longint ) ;

var L, Ref : cardinal ;

begin
    if( _Index.Indexof( Index ) = -1 ) then
    begin
        Set_Last_Error( Create_Exception( UOS_String_Table_Error_Invalid_String_Index, nil ) ) ;
        exit ;
    end ;
    Set_Last_Error( nil ) ;

    // Update string reference...
    Seek( Index ) ;
    Ref := Read_Longint ;
    L := 0 ;
    Read( L, 1 ) ;
    if( Ref < 2 ) then // No more references
    begin
        _AT.Deallocate( Index, 5 + L ) ;

        // Remove from index
        Index := _Index.Indexof( Index ) ;
        _Index.Delete( Index ) ;
        _Dirty_Index := True ;

        Update ;
        exit ;
    end ; // if( Ref < 2 )

    // Decrement the reference count...
    Seek( Index ) ;
    dec( Ref ) ;
    Write_Longint( Ref ) ;
    Update ;
end ; // TUOS_String_Table.Delete_String

We validate the string address, decrement its reference count, and deallocate the space if the reference count goes to 0.

The Get_String method is the simplest, and fastest, of the methods.

function TUOS_String_Table.Get_String( Value : longint ) : string ;

var C : integer ;

begin
    Result := '' ;
    Set_Last_Error( nil ) ;
    if( Value = 0 ) then
    begin
        exit ;
    end ;
    if( _Index.Indexof( Value ) = -1 ) then
    begin
        exit ;
    end ;
    Seek( Value + 4 ) ; // Position to length byte
    C := 0 ;
    Read( C, 1 ) ;
    if( C = 0 ) then
    begin
        C := 256 ;
    end ;
    setlength( Result, C ) ;
    Read( PChar( Result )[ 0 ], C ) ;
end ;

In both the first and second attempts at this class, the Get_String method has been the same, and is the fastest operation. It simply takes the address passed, which is the offset of the string header, and then grabs the string from the table and returns it. Note the handling of a size of 0.

UOS Interface
One of the other things that our File System is going to need for its basic operation is an ability to get the current date/time so that the timestamps in the file header can be set properly. In terms of UOS running on an Intel iAPX machine, any compiler for Windows should work fine as we will make UOS capable of loading PE files (Windows .EXEs). So, the example code that I've written in Delphi will actually work as our UOS code. There are a couple of restrictions, however. First, most compilers include built-in functions that call the Windows O/S to do some tasks, such as file I/O, heap usage, and getting the current date/time. So, if we use these built-in features in Delphi, the compiler will include code to call Windows functions. This may include referencing Windows DLLs, or making direct calls to Windows via certain CPU interrupts. Neither of these is going to work on UOS, except under a Windows emulator - which we a long ways from dealing with. So, our code has to avoid calling interrupts, or referencing Windows-specific code. But then how do we get a timestamp? Obviously, UOS will need to provide this feature, and there will be a need for us to provide access to all the UOS capabilities for a user program. This is a huge topic and we will attend to it in the future. For now, however, there are two features that we will need to provide for our file system's basic operation: heap support and get date/timestamp. These features (and others, later) will be provided via a UOS API. We will address the actual implementation of this interface later, but for our present purpose, we will need to use a UOS interface for our file system component.
Here is the implementation of the interface to the features that we need for our file system:

type TUOS_API = class
                    public
                        function Get_Timestamp : int64 ;
                        function Get_Heap : TCOM_Heap64 ;
                end ; // TUOS_API

In the next article, we will start on our File System class.