Copyright © 2016 by Alan Conroy. This article may be copied in whole or in part as long as this copyright is included.


1 Introduction
2 Ground Rules

Building a File System
3 File Systems
4 File Content Data Structure
5 Allocation Cluster Manager
6 Exceptions and Emancipation
7 Base Classes, Testing, and More
8 File Meta Data
9 Native File Class
10 Our File System
11 Allocation Table
12 File System Support Code
13 Initializing the File System
14 Contiguous Files
15 Rebuilding the File System
16 Native File System Support Methods
17 Lookups, Wildcards, and Unicode, Oh My
18 Finishing the File System Class

The Init Program
19 Hardware Abstraction and UOS Architecture
20 Init Command Mode
21 Using Our File System
22 Hardware and Device Lists
23 Fun with Stores: Partitions
24 Fun with Stores: RAID
25 Fun with Stores: RAM Disks
26 Init wrap-up

The Executive
27 Overview of The Executive
28 Starting the Kernel
29 The Kernel
30 Making a Store Bootable
31 The MMC
32 The HMC
33 Loading the components
34 Using the File Processor
35 Symbols and the SSC
36 The File Processor and Device Management
37 The File Processor and File System Management
38 Finishing Executive Startup

Users and Security
39 Introduction to Users and Security
40 More Fun With Stores: File Heaps
41 File Heaps, part 2
42 SysUAF
43 TUser
44 SysUAF API

Terminal I/O
45 Shells and UCL
46 UOS API, the Application Side
47 UOS API, the Executive Side
48 I/O Devices
49 Streams
50 Terminal Output Filters
51 The TTerminal Class
52 Handles
53 Putting it All Together
54 Getting Terminal Input
55 QIO
56 Cooking Terminal Input
57 Putting it all together, part 2
58 Quotas and I/O

UCL
59 UCL Basics
60 Symbol Substitution
61 Command execution
62 Command execution, part 2
63 Command Abbreviation
64 ASTs
65 Expressions, Part 1
66 Expressions, Part 2: Support code
67 Expressions, part 3: Parsing
68 SYS_GETJPIW and SYS_TRNLNM
69 Expressions, part 4: Evaluation

UCL Lexical Functions
70 PROCESS_SCAN
71 PROCESS_SCAN, Part 2
72 TProcess updates
73 Unicode revisted
74 Lexical functions: F$CONTEXT
75 Lexical functions: F$PID
76 Lexical Functions: F$CUNITS
77 Lexical Functions: F$CVSI and F$CVUI
78 UOS Date and Time Formatting
79 Lexical Functions: F$CVTIME
80 LIB_CVTIME
81 Date/Time Contexts
82 SYS_GETTIM, LIB_Get_Timestamp, SYS_ASCTIM, and LIB_SYS_ASCTIM
83 Lexical Functions: F$DELTA_TIME
84 Lexical functions: F$DEVICE
85 SYS_DEVICE_SCAN
86 Lexical functions: F$DIRECTORY
87 Lexical functions: F$EDIT and F$ELEMENT
88 Lexical functions: F$ENVIRONMENT
89 SYS_GETUAI
90 Lexical functions: F$EXTRACT and F$IDENTIFIER
91 LIB_FAO and LIB_FAOL
92 LIB_FAO and LIB_FAOL, part 2
93 Lexical functions: F$FAO
94 File Processing Structures
95 Lexical functions: F$FILE_ATTRIBUTES
96 SYS_DISPLAY
97 Lexical functions: F$GETDVI
98 Parse_GetDVI
99 GetDVI
100 GetDVI, part 2
101 GetDVI, part 3
102 Lexical functions: F$GETJPI
103 GETJPI
104 Lexical functions: F$GETSYI
105 GETSYI
106 Lexical functions: F$INTEGER, F$LENGTH, F$LOCATE, and F$MATCH_WILD
107 Lexical function: F$PARSE
108 FILESCAN
109 SYS_PARSE
110 Lexical Functions: F$MODE, F$PRIVILEGE, and F$PROCESS
111 File Lookup Service
112 Lexical Functions: F$SEARCH
113 SYS_SEARCH
114 F$SETPRV and SYS_SETPRV
115 Lexical Functions: F$STRING, F$TIME, and F$TYPE
116 More on symbols
117 Lexical Functions: F$TRNLNM
118 SYS_TRNLNM, Part 2
119 Lexical functions: F$UNIQUE, F$USER, and F$VERIFY
120 Lexical functions: F$MESSAGE
121 TUOS_File_Wrapper
122 OPEN, CLOSE, and READ system services

UCL Commands
123 WRITE
124 Symbol assignment
125 The @ command
126 @ and EXIT
127 CRELNT system service
128 DELLNT system service
129 IF...THEN...ELSE
130 Comments, labels, and GOTO
131 GOSUB and RETURN
132 CALL, SUBROUTINE, and ENDSUBROUTINE
133 ON, SET {NO}ON, and error handling
134 INQUIRE
135 SYS_WRITE Service
136 OPEN
137 CLOSE
138 DELLNM system service
139 READ
140 Command Recall
141 RECALL
142 RUN
143 LIB_RUN
144 The Data Stream Interface
145 Preparing for execution
146 EOJ and LOGOUT
147 SYS_DELPROC and LIB_GET_FOREIGN

CUSPs and utilities
148 The I/O Queue
149 Timers
150 Logging in, part one
151 Logging in, part 2
152 System configuration
153 SET NODE utility
154 UUI
155 SETTERM utility
156 SETTERM utility, part 2
157 SETTERM utility, part 3
158 AUTHORIZE utility
159 AUTHORIZE utility, UI
160 AUTHORIZE utility, Access Restrictions
161 AUTHORIZE utility, Part 4
162 AUTHORIZE utility, Reporting
163 AUTHORIZE utility, Part 6
164 Authentication
165 Hashlib
166 Authenticate, Part 7
167 Logging in, part 3
168 DAY_OF_WEEK, CVT_FROM_INTERNAL_TIME, and SPAWN
169 DAY_OF_WEEK and CVT_FROM_INTERNAL_TIME
170 LIB_SPAWN
171 CREPRC
172 CREPRC, Part 2
173 COPY
174 COPY, part 2
175 COPY, part 3
176 COPY, part 4
177 LIB_Get_Default_File_Protection and LIB_Substitute_Wildcards
178 CREATESTREAM, STREAMNAME, and Set_Contiguous
179 Help Files
180 LBR Services
181 LBR Services, Part 2
182 LIBRARY utility
183 LIBRARY utility, Part 2
184 FS Services
185 FS Services, Part 2
186 Implementing Help
187 HELP
188 HELP, Part 2
189 DMG_Get_Key and LIB_Put_Formatted_Output
190 LIBRARY utility, Part 3
191 Shutting Down UOS
192 SHUTDOWN
193 WAIT
194 SETIMR
195 WAITFR and Scheduling
196 REPLY, OPCOM, and Mailboxes
197 REPLY utility
198 Mailboxes
199 BRKTHRU
200 OPCOM

Glossary/Index


Downloads

File Content Data Structure

Back in article 2, we briefly defined a "Store". This concept is fundamental to the design of the UOS File System, so let's review it. A store is anything that can store data. Here are the characteristics of stores, and examples of them:

ReadWrite*SizeAccess modeExample
YesYesFixedRandomHard disk
YesYesFixedSequentialMagnetic tape
YesOnceFixedRandomCD, DVD
YesYesVariableRandomContainer file
YesYesVariableRandomRAM

* the ability to write data may be prevented if the device is set to read-only

The reason for basing the UOS File System off of stores is that we can have the file system software work regardless of what physical device it is on. So long as a store interface is provided for the underlying hardware, the file system software can work with it and we don't need to concern ourselves with the differences between different storage mediums, within reason. This approach is useful because it means that we can easily manage a file system on a hard disk, in memory (a RAM disk), or in a file (a container file). All we need to do is provide the file system with an instance of a store for the device in question. RAM disks can be used to create very fast (but small) virtual disks in RAM. This can be used in place of (or in addition to) memory caching of slower hard disks or network disks. Caching happens automatically and is the default behavior, but the ability to create a RAM disk provides the user with a certain amount of flexibility. Container files are used to hold a lot of different data (files) within a single file. For UOS, we will use container files to create library files, for instance. But, that will have to wait until later.

Managed Stores
The File System will need to allocate chunks of the available storage space as required - for instance, for data for our files. Also, as things are deleted, we will need to deallocate those chunks so they can be reused for other purposes. The management of free and allocated space is the domain of the store itself - not the file system. For this reason, we use a type of store called a "Managed Store". An example of a managed store is a heap. In fact, for RAM disks, we can give the File System an interface to the system heap (also to be discussed later). Different stores have different minimum allocation sizes. In terms of hard disks, the minimum allocatable unit is the sector. Most hard disks have a sector size of 512 bytes, which means that the smallest amount of data that can be read from or written to is 512 byte sectors. Memory heaps on Intel hardware tend to have a minimum allocation size of 16 bytes (for various reasons that we will eventually discuss). It certainly is possible that a store class could be written to deal with operations on chunks of data that are smaller than the minimum allocation size, but this would be an inefficient means of accessing something like a hard disk, because it means that the store class will have to read in a chunk, modify part of it, and write it back out rather than just writing a new value. That is, it takes three physical I/O operations, instead of one, to accomplish a single logical I/O operation. We don't want to make our disk I/O run three times slower than it needs to. What this means is that allocations, deallocations, and I/O operations must operate on some multiple of the minimum allocation size. So, on a disk, we will use chunks of 512, 1024, 1536, 2048, etc. Note that 512 is a multiple of 16, so a default size of 512 will work for our RAM disks, our hard disks, and all container files. It will even work for old floppy disks, which have sectors of 256 bytes.

A store is essentially an array of bytes. To support the largest available disks out there, we will use 64-bit pointers to access any point on the store. These pointers are simply integer offsets from the start of the store. So, a 1 Gb disk will have offsets 0 through 1,073,741,823. Thus, a 64-bit address range allows for a disk as large as 18 quintillion bytes (about 8 billion times the size of the largest hard disk today). All allocations on the store will return an offset that is on a minimum allocation size boundary, and we can verify that we have a valid pointer by making sure it is a multiple of the minimum allocation size. As long as we use valid pointers (that is, those returned from the store when an allocation is requested), and only read/write buffers that are multiples of the minimum allocation size, we will be accessing the store efficiently. On our 1 Gb hard disk, we would have 2,097,152 possible chunks which indicates the maximum number of file extents. Except that it doesn't. You see, a managed store needs to keep track of free and allocated space, which is typically done through an allocation table (a data structure that allows the store to manage the space). On this disk, the allocation table will take about 256Kb of space (512 chunks). This isn't much overhead on a small disk. But on a 1 Tb disk, there will be over 2 billion chunks and the allocation table will take up 256 Mb of space. As a percentage, that is a small amount of the total disk size, but 1/4 Gb for overhead is quite a lot (and there is other overhead we haven't talked about yet). One way to shrink the size of the allocation table is to increase the minimum allocation size. We call this the Cluster size, because we will cluster two or more chunks together on each allocation/deallocation. If we increase it to 1024, we halve the size of the allocation table to 128 Mb. We also reduce file fragments by about half. However, that also means that we waste 512 bytes per file, on average. With a clustersize of 512, we waste only an average of 256 bytes per file. The UOS File System can deal with any clustersize, although anything over 64K is likely to use up excessive amounts of RAM for I/O buffers. We will default to 512 as the clustersize for our File System stores, but allow the user to customize the cluster size for each device as they see fit.

So, here is our abstract virtual store and managed store classes, descendants of which will be used for each type of hardware device supported by the UOS File System software:

type TStore_Size64 = int64 ;
type TStore_Address64 = int64 ;

type TCOM_Store64 = class
                        public { API... }
                            function Read( var Data ; Address, _Size : TStore_Address64 ) : TStore_Address64 ;
                                virtual ; stdcall ; abstract ;
                            { Read data from specified Address in store to Data.
                              Read _Size bytes.
                              Function returns the number of bytes actually read. }

                            function Write( var Data ; Address, _Size : TStore_Address64 ) : TStore_Address64 ;
                                virtual ; stdcall ; abstract ;
                            { Write data from Data to specified Address in store.
                              Write _Size bytes.
                              Function returns the number of bytes actually written. }

                            function Min_Storage : TStore_Address64 ;
                                virtual ; stdcall ; abstract ;
                            { Returns the minimum atomic size of the store.  For
                              instance, this would be the sector size for a disk. }

                            function Get_Max_Storage : TStore_Size64 ;
                                virtual ; stdcall ; abstract ;
                            { Returns the maximum (current) size of the store, in
                              bytes.  In other words, this is the highest valid
                              address (plus 1). }

                            procedure Set_Max_Storage( Value : TStore_Size64 ) ;
                                virtual ; stdcall ; abstract ;
                            { Sets the maximum (current) size of the store, in bytes.
                              In other words, this is the new highest valid address
                              (plus 1).  This should only be used to shrink the size
                              of the store, not to extend it.  To extend, use the Extend method. }

                            function Get_Read_Only : boolean ;
                                virtual ; stdcall ; abstract ;
                            { Indicates that the store can only be read from. }

                            function Get_Write_Only : boolean ;
                                virtual ; stdcall ; abstract ;
                            { Indicates that the store can only be written to. }

                            procedure Set_Read_Only( Value : boolean ) ;
                                virtual ; stdcall ; abstract ;
                            { Set read-only status. }

                            procedure Set_Write_Only( Value : boolean ) ;
                                virtual ; stdcall ; abstract ;
                            { Set write-only status. }

                            function Contiguous_Store : boolean ;
                                virtual ; stdcall ; abstract ;
                            { Returns True if the store's addresses are contiguous
                              (non-sparse). }

                            function Extended_Size : TStore_Address64 ;
                                virtual ; stdcall ; abstract ;
                            { Returns the maximum theoretical size that the store can
                              be extended to.  If the store is not extendable, this
                              returns the same value as Get_Max_Storage. }

                          property Max_Storage : TStore_Address64
                              read Get_Max_Storage
                              write Set_Max_Storage ;

                          property Read_Only : boolean
                              read Get_Read_Only
                              write Set_Read_Only ;

                          property Write_Only : boolean
                              read Get_Write_Only
                              write Set_Write_Only ;
                    end ; { TCOM_Store64 }

First we define a couple data types, which are just synonyms for int64 (a signed 64-bit integer).
The class name reflects the fact that this is a 64-bit-address store. The COM indicates the use of stdcall conventions on the methods.
The Read and Write methods do what you'd think. Min_Storage/ returns the minimum allocatble amount of data, in bytes. Max_Storage indicates the size of the store. Extended_Size returns the maximum potential size for variable-length stores (files, for instance). A variable-length store can be resized. On a fixed-size store, Extended_Size returns the Max_Storage, and the size cannot be changed. Read_Only is true if the store cannot be written to. Write_Only is true if the store cannot be read from, but can be written to. Note that some stores are inherently one or the other and cannot be changed. Others, however, can be changed to read-only insofar as the class is concerned.

type TCOM_Managed_Store64 = class( TCOM_Store64 )
                                public { API... }
                                    function Allocate( Size : TStore_Address64 ) : TStore_Address64 ;
                                        virtual ; stdcall ; abstract ; { Allocate space}
                                    function Allocate_At( Offset, Size : TStore_Address64 ) : boolean ;
                                        virtual ; stdcall ; abstract ; { Allocate space at specific location }
                                    procedure Copy( Source, Destination : TStore_Address64 ;
                                        Count : integer ) ; virtual ; stdcall ; abstract ; { Copy data }
                                    procedure Deallocate( PTR, Size : TStore_Address64 ) ;
                                        virtual ; stdcall ; abstract ; { Deallocate space }
                                    procedure Fill( PTR : TStore_Address64 ; Count : integer ;
                                        Value : byte ) ; virtual ; stdcall ; abstract ;
                                    function Reallocate( PTR, Old, New : TStore_Address64 ) : TStore_Address64 ;
                                        virtual ; stdcall ; abstract ;
                                    function SpaceAvail : TStore_Address64 ;
                                        virtual ; stdcall ; abstract ;
                                    function XSpaceAvail : TStore_Address64 ;
                                        virtual ; stdcall ; abstract ;
                                    function MaxSpace : TStore_Address64 ;
                                        virtual ; stdcall ; abstract ;
                                    procedure Get_Root( var Buf ) ;
                                        virtual ; stdcall ; abstract ;
                                    procedure Set_Root( var Buf ) ;
                                        virtual ; stdcall ; abstract ;
                            end ; { TCOM_Managed_Store64 }

The managed store descends from the basic store, adding the allocation and deallocation features of a managed store.
Allocate and Deallocate methods do what you'd think. Allocate_At is a special case of Allocate where the caller can attempt to allocate space at a specific location on the store. In terms of our File System, this is used for placing files at specific locations. If the requested space cannot be allocated at the requested location, the method returns false. Reallocate attempts to extend an allocated chunk of memory in-place. If that cannot be done, it will allocate the requested space somewhere else, copy the data from the original location, and then deallocate the old value. It returns the new address of the data (which might be the same as before). Copy is a way to copy data from one location in the store to another. The class instance will attempt to do this in the most efficient way possible, given the actual hardware and therefore should be used rather than a read/write cycle in the calling code. MaxSpace returns the size of the largest possible contiguous chunk of the store that can be allocated. SpaceAvail returns the size of the total free space on the store, although this space may be fragmented into multiple pieces. XSpaceAvail returns the potential total free space on a variable-sized store. On fixed-size stores, XSpaceAvail will return the same alue as SpaceAvail. Fill is used to write a certain number of a given byte value to the store. We will use this to erase old data (for instance, after deleting a file) for security purposes. The class instance will use the fastest possible means to accomplish this task on the given hardware.
Of special note are the Get_Root and Set_Root methods. Generally, a pointer with a value of 0 is considered to be null, and using a null pointer is considered an error. Stores treat address 0 the same way, but then how do we get to the lowest addresses of the store? Get_Root will return the lowest chunk of the store, and Set_Root will write to it. These functions prevent inadvertant overwrites of the lowest area of the store. This is especially important because the lowest area contains some important information about the store. For instance, on a bootable disk, this area is considered the boot block. But, even in the case of non-bootable disks, all stores keep important information about the store in that area. Information such as the size of allocation units, pointers to important structures, etc. Boot blocks and this other key information will be discussed in a later article.

Now we are ready to look at the structures used to maintain the extents of our files. For a zero-length file, we actually don't need any pointers, but we'll ignore this situation for now. Since each extent of the file data can be anywhere on the store, since we don't know where the managed store will allocate the extent, we need a list of pointers to these extents - in essence, an array of pointers. But this array will need to be stored on the store as well, so that when UOS reboots, the array can be accessed and the file's data located. Since we will assume a minimum clustersize of 512 bytes, we can allocate 64 8-byte (64 bit) pointers. For a clustersize of 512, this can represent a file of up to 32Kb (64 x 512). What if we need a larger file? We will reserve the last pointer in a cluster as a pointer to another allocation cluster, with another 64 pointers. We can chain these allocation clusters together until we have enough for whatever file size we have. Unused pointers will be 0, and the last allocation cluster will end with a zero pointer since there are no more allocation clusters to link to. Here is a diagram of what a 32Kb file's allocation clusters might look like:

To access a specific location in the file, we simply take the desired file offset and divide it by the clustersize to determine the extent entry where that offset is located. We take the extent entry index and divide by 63 (the number of extent pointers in an allocation cluster) to get the allocation cluster in question. We follow the pointers at the end of each allocation cluster to get to the appropriate one. Then we take the extent entry index mod 63 to get the proper pointer offset within that allocation cluster. For instance, let's say that we have a request to access offset 25088 in the file. With a clustersize of 512, we divide 25088 by 512 to get extent index 49. We then divide that index by 63, to get allocation cluster 0. So, we need to access the first allocation cluster (cluster 0). 49 mod 63 gives us the pointer index 49 in this allocation cluster. That pointer is the location of that file position on the store.

Although the overhead of the allocation clusters, in terms of storage space, amounts to a mere 1.5%, there is a potential performance problem associated with reading multiple allocation clusters while following the allocation links. One way to reduce the number of read operations required to reach the needed allocation cluster, is to give the files a larger cluster size. In such a case, more of the file would be represented by the extent pointers in each allocation cluster, reducing the total number of allocation clusters that are required for the file, and fewer read operations would be required to find the pointer to any specific file position. But we don't want to force all files to use a large cluster size just for the sake of one or two files. Therefore, UOS will allow each file to have its own cluster size, which must be a multiple of the store's cluster size. This will allow the user to balance the needs between space and performance efficiency according to his own needs. There are some system files that, for reasons of hardware requirements on Intel architectures, should have a clustersize of at least 4Kb, which we will discuss later. Suffice it to say that supporting different cluster sizes for each file provides a useful amount of flexibility, both for the user and for UOS itself.

In the next few articles, we will write a file allocation class to manage the allocation clusters, and an extent store class to manage the file extents.