Copyright © 2016 by Alan Conroy. This article may be copied in whole or in part as long as this copyright is included.

File Meta Data

Our Allocation Cluster Manager class provides for storing file data. But there is also meta data associated with each file. Meta data is information about the data. The most well-known file meta data is the file's name. The file name is not part of the file data itself; it is information about the file, so it is "meta data". There is other meta data as well. The meta data needs to be saved on the store as well as the actual data. Typically, the meta data is stored in what is called a "file header". We will retain that name for our collection of meta data. Let's consider the meta data that we need for UOS files. It is important to note, specifically, that this is for UOS files. UOS will support other file systems, which may not have all the meta data that a UOS file will have. UOS will have to work with these files that don't have all the meta data, but for our "native UOS files", we will provide a rich set of meta data.

Name: The file name is the means by which the user tells one file from another. You might think that this would be one of the simplest of the meta data that we store. But, alas, there are several considerations. First, how is the name interpreted? Is it stored as ASCII characters, or as Unicode? If Unicode, then which kind of Unicode? In this respect, we can push the question up to the next software level and leave it up to UOS to determine how it is interpreted. We will simply store a series of bytes that represent a name. The next question is, how long can the name be? Windows, for instance, has a maximum file name length of 127. We will support at least that length for UOS. Of course, the actual file name length, in characters, depends upon the encoding that is used. It would be 127 ASCII characters, or 31 32-bit Unicode characters, or somewhere between those two for other encodings. Most operating systems restrict which characters are allowed in file names. For instance, wildcard characters are typically not allowed. Again, we will leave that up to UOS and we will store whatever characters we are given. Older operating systems, such as RSTS/E and VMS, reserved 3 characters to serve as a file extension, which had special meaning to the operating system. Windows also associates file extensions with special meanings, but the length of the extension can be longer than 3 characters. Since we allow any characters in our names, we have no inherent special interpretations of the name. Again, all of that is up to UOS.
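
To make the byte-versus-character arithmetic concrete, here is a minimal sketch. It assumes a 127-byte budget for the name (the figure used above) and fixed-width encodings only; Max_Chars is a hypothetical helper written for this illustration, not part of UOS.

```pascal
program Name_Capacity ;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const Name_Bytes = 127 ; // Byte budget for a name (figure taken from the text)

// Hypothetical helper: whole characters that fit in Byte_Count bytes when
// every character occupies exactly Bytes_Per_Char bytes.
function Max_Chars( Byte_Count, Bytes_Per_Char : longint ) : longint ;

begin
    Result := Byte_Count div Bytes_Per_Char ;
end ;

begin
    writeln( 'ASCII (1 byte each)         : ', Max_Chars( Name_Bytes, 1 ) ) ;
    writeln( '16-bit units (2 bytes each) : ', Max_Chars( Name_Bytes, 2 ) ) ;
    writeln( '32-bit Unicode (4 bytes)    : ', Max_Chars( Name_Bytes, 4 ) ) ;
end.
```

Variable-width encodings such as UTF-8 fall somewhere between the first and last results, depending on which characters the name contains.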

Sizes: Although the Allocation Cluster Manager can determine the length of the file data for us, we will save the length ourselves and pass it to the ACM so we can avoid a bunch of potential turns, as described a few articles ago. We also will store a logical size, which will be less than or equal to the physical length. It is possible that our file data will be compressed by some means. Although the compression is the responsibility of other code, we will save an uncompressed size so the user will know how big the file would be if uncompressed. Remember that the File Size we store indicates the actual space used for the file on the store, whether it is compressed or not, hence the need for saving the uncompressed size as a separate value. We will have a record size that we initialize to 0, but can be used by UOS to save a record size for the file. 0 indicates that there is no specific record size. Cluster size is also something we need in order to allow a file's individual cluster size to be larger than the store's default. These sizes will be 64-bit integer values.

Dates: Users need to know the date the file was created. But we will also store dates associated with the last revision (modification), and the last time backed up. We also store the last access time (read or open). However, since this could adversely impact performance, UOS will only use it when requested to do so. In any case, we need to have it available for potential use. Finally, we will use an expiration date to allow for files to be automatically deleted by UOS when a certain date/time is reached. Use of this is optional, as well. All of these dates will use a 64-bit integer "Sirius Timestamp", which indicates the number of nanoseconds since midnight of January 1, year 0. In terms of actual dates, it isn't accurate since the calendar has been changed more than once over the last 2,000 years. What the timestamp encodes is what would be the date and time if we projected the current calendar system back in time. This is not a problem since we won't have any files that will have been created prior to, say, 1960. Nor do we have to worry about Y2K-type problems, because the timestamp scheme works until after 8,000 AD. A value of 0 is long before there were such things as digital files, so we will use 0 as a value indicating "unassigned", "unused", or "default". Other than the creation date, which we set, all dates are initialized to 0 when the file is created.

Flags: There are various options that we will support for UOS files. Each of these is represented by a bit in a file flags integer value. These are sometimes called file "attributes". We will discuss these flags as we go on.

Security: We will not concern ourselves with security at this level of code - that is the responsibility of the next higher level of UOS code. But, we will include a few items in the file header to support that code. These items include the Creator of the file and the Owner of the file, who are usually the same. The Owner, however, can change and it is they who have complete control of the file. We include a pointer to the Access Control List (ACL), which is a flexible means of granting other people various levels of access to the file. This concept is borrowed from VMS, and we will discuss it at some future date. ACLs and ownership help protect the file while it exists, but we also need to know how to dispose of the data in a secure way when the file is deleted. We will discuss this later, and it is not our job (at the file level) to deal with this, but we will save a few bits from the flags to keep track of what to do when the file is deleted.

Versioning: Versioning has to do with keeping previous versions of files around after the file has been updated. Various systems implement versioning differently, or not at all. RSTS/E didn't support it, but it was typical of programs running in that environment to keep one copy around with the extension ".BAK". If the file was updated again, the old .BAK file was lost forever and the new one was created from the current file, before it was changed. On VMS, a version number was assigned to the file, which was indicated by a semicolon and a number after it. UOS will use a similar scheme by default, but allow each file to have a unique setting stored in the file header. The meaning of that setting will be described when we address the file processing code later; it will be implemented as a 32-bit integer value.

Other data: We can never assume that our file header will contain all the information that might be needed in some future version of UOS. So, we will keep an "extension" pointer which will point to an extension file header should we ever need that. That way, our current file headers will be forward compatible with future versions of UOS. For now, it will always be 0. And, of course, we will need to have a pointer to the file data itself (the _Root used in the Allocation Cluster Manager). But there is other data that we might need to store with the file. Some files require auditing information to be stored with them. In these highly secure applications, having that information kept in another file makes it too easy for the auditing information to be lost or deleted, so we will keep it as part of the file in the form of meta data. But since this is extensible and potentially very large, it cannot be kept in a fixed-size file header. And then there are other kinds of variable-length meta data that the user might want to keep with the file. For instance, what if the user wants to associate a comment with a file to indicate why it exists? Shells for other operating systems may want to store odd-ball meta data with files to match what that operating system stores in its own file systems. For instance, RSTS/E kept track of format attributes for its Record Management System. The point is that there can be any number of chunks of arbitrarily large meta data that need to be associated with a file. They only differ from the file's actual data by virtue of our perspective. So, we will save each piece of meta data in a separate "data stream", which is to say that we will have an allocation cluster chain for each of these meta data. In fact, we can view the actual file data as nothing more than another data stream in our file. It may be the most important, and the default, stream, but from an implementation standpoint, it is no different from any other.
How do we tell the multiple streams apart from each other? Perhaps there is a comment stream and an auditing stream in addition to the default data stream. So we will implement a naming scheme that allows each data stream to be identified. We will have some names that are reserved for UOS, but each user and application can add their own named streams to a file. All we need to do in the file header is account for these streams. Which means we need a name and a pointer for each one. We will discuss this later.
One last concern for the file header is that the store cluster size may be larger than a given file header, which means we will be wasting space if we only store one header per cluster. There are several ways of organizing file headers, which we will delve into in the near future. For now, we will simply reserve a 64-bit integer as a means of allowing the file system code to pack multiple headers into a cluster.
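
The space concern can be illustrated with a little arithmetic. The sketch below assumes a 4096-byte store cluster and a 256-byte header; both figures, and both helper names, are chosen only for this illustration. How the file system code actually organizes packed headers is left for a later article.

```pascal
program Header_Packing ;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: number of whole headers that fit in one cluster.
function Headers_Per_Cluster( Cluster_Size, Header_Size : int64 ) : int64 ;

begin
    Result := Cluster_Size div Header_Size ;
end ;

// Hypothetical helper: bytes left unused when a cluster holds one header.
// Assumes Cluster_Size >= Header_Size.
function Waste_If_Unpacked( Cluster_Size, Header_Size : int64 ) : int64 ;

begin
    Result := Cluster_Size - Header_Size ;
end ;

begin
    // Assumed sizes, for illustration only
    writeln( 'Headers per cluster when packed : ',
        Headers_Per_Cluster( 4096, 256 ) ) ;
    writeln( 'Bytes wasted with one per cluster: ',
        Waste_If_Unpacked( 4096, 256 ) ) ;
end.
```

With these assumed sizes, storing a single header per cluster would leave most of each cluster unused, which is why the header reserves a value that lets the file system code pack several headers together.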

The file header potentially contains numerous stream names as well as the file name. In order to reserve room in a fixed-size header, we would have to determine a maximum length, and allocate room for a name of up to that length. Of course, it is unlikely that most names will be the maximum length, or even half that. So, we will be wasting a huge amount of space for no reason other than the possibility of a long file name. And we'd have to do this for each named data stream as well. Not to mention that my research indicates that the space taken up by files with the same name (but in different folders) is significant. Wouldn't it be nice if we could store the name elsewhere and just include a pointer to the name in the header? If we include a 64-bit pointer, we could point to a location on the store where the name is kept. But, that is even more likely to waste space, since most names will be smaller than the sector size of any hard disk. However, as you recall, our file code and the file header don't care about the name other than reserving space for it. All we need to do is store a value and let the file system code deal with names. So, we will keep a 64-bit value for the name, and let the file system code deal with how that is interpreted. We will do the same for the data stream names. Thus, each stream is now nothing more than two 64-bit integers in the header. But we can have any number of data streams, so what if they don't fit into our fixed-size header? It sounds like we will need a pointer to a cluster chain that can hold all of our data stream information. But that means that just to get to our file's data, we will have to follow a link from the file header to the data stream chain. This presents a possible performance issue, requiring an extra store read operation each time a file is opened. 
So, perhaps we can store a few stream pointers in the file header, including one reserved for the file data, and a pointer to a chain when we need more data streams than are provided for in the header. But how many should we provide in the file header? Before we answer that question, let's take a look at what our file header looks like thus far:
Byte offset   Length   Description
0             8        File Name
8             8        Size on disk, in bytes
16            8        Logical End of File (offset, in bytes)
24            8        Uncompressed size, in bytes
32            8        Cluster size
40            8        Record size, in bytes
48            8        Creation date
56            8        Last modified date
64            8        Last backup date
72            8        Last access date
80            8        Expiration date
88            8        Attribute flags
96            8        Creator
104           8        Owner
112           8        ACL pointer
120           8        Extension pointer
128           4        Version limit
132           4        Extended flags. Used for disk sharing.

As we see, we've used 136 bytes for our file header thus far. That is reasonable, especially considering that many file systems reserve that much space just for the file name. But we aren't done yet. We still need a 64-bit value for the file system, and a 64-bit data stream overflow pointer. To maximize the use of space, we want the header size to be an exact divisor, or multiple, of the store cluster size. We know that we need at least 24 more bytes for pointers, meaning that the file header is up to 160 bytes. It can't be smaller, so we need to extend it up. 256 would be the next best value, being 1/2 of a 512-byte disk sector, or 16 times the size of a memory cluster. Of the 120 bytes that remain, 16 go to the file system value and the overflow pointer, which leaves room for 6 data stream name/pointer pairs of 16 bytes each. Here's the rest of our file header:
Byte offset   Length   Description
136           8        Data stream 0 name
144           8        Data stream 0 pointer
152           8        Data stream 1 name
160           8        Data stream 1 pointer
168           8        Data stream 2 name
176           8        Data stream 2 pointer
184           8        Data stream 3 name
192           8        Data stream 3 pointer
200           8        Data stream 4 name
208           8        Data stream 4 pointer
216           8        Data stream 5 name
224           8        Data stream 5 pointer
232           8        Data stream overflow pointer
248           8        File system pointer

We have 8 bytes left over, so we reserve that for future use. We will reserve data stream 0 for the actual file data. Which means that we could use the stream name value for something else, but for the sake of consistency and simple code, we will make stream 0 look like the other streams in the file header. A stream name of 0 indicates that stream is not used (except that stream 0 is always used). Here is the definition of our file header:

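type TData_Stream = packed record
                        Name : int64 ; // Stream name value (0 = unused)
                        Pointer : TStore_Address64 ; // Location of the stream
                    end ;

     TUOS_File_Header = packed record
                                Name : int64 ; // Interpreted by the file system
                                Size : int64 ; // Size on disk
                                EOF : int64 ; // Logical size
                                Uncompressed_Size : int64 ;
                                Clustersize : int64 ;
                                Record_Size : int64 ; // 0 = no specific size

                                // Dates are Sirius Timestamps; 0 = unassigned
                                Creation : int64 ; // Creation date
                                Last_Modified : int64 ;
                                Last_Backup : int64 ;
                                Last_Access : int64 ;
                                Expiration : int64 ;

                                Flags : int64 ; // FAF_* attribute flags
                                Creator : int64 ;
                                Owner : int64 ;
                                ACL : int64 ; // Access Control List pointer
                                Extension : TStore_Address64 ; // 0 for now
                                Version_Limit : longint ;
                                Extended_Flags : longint ; // Disk sharing

                                // Stream 0 is reserved for the file data
                                Streams : array[ 0..5 ] of TData_Stream ;

                                Data_Stream : int64 ; // Overflow pointer
                                Reserved : int64 ; // For future use
                                File_System : int64 ; // For file system code
                        end ;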
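
As a sanity check on this layout, the following sketch repeats the record so that it compiles on its own, confirms that the header comes to 256 bytes, and shows one way the in-header stream slots might be searched. Two things here are assumptions not found in the article: TStore_Address64 is taken to be a plain 64-bit integer (the real definition comes from earlier articles), and Find_Stream is a hypothetical helper that ignores the overflow chain.

```pascal
program Header_Layout ;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type TStore_Address64 = int64 ; // Assumption: a 64-bit store address

     TData_Stream = packed record
                        Name : int64 ;
                        Pointer : TStore_Address64 ;
                    end ;

     TUOS_File_Header = packed record
                            Name, Size, EOF : int64 ;
                            Uncompressed_Size, Clustersize : int64 ;
                            Record_Size : int64 ;
                            Creation, Last_Modified, Last_Backup : int64 ;
                            Last_Access, Expiration : int64 ;
                            Flags, Creator, Owner, ACL : int64 ;
                            Extension : TStore_Address64 ;
                            Version_Limit : longint ;
                            Extended_Flags : longint ;
                            Streams : array[ 0..5 ] of TData_Stream ;
                            Data_Stream : int64 ; // Overflow pointer
                            Reserved : int64 ;
                            File_System : int64 ;
                        end ;

// Hypothetical helper: returns the slot index of the in-header stream with
// the given name, or -1 if it is not in the header. A name of 0 marks an
// unused slot, so it is never treated as a match.
function Find_Stream( const Header : TUOS_File_Header ;
    Name : int64 ) : longint ;

var I : longint ;

begin
    Result := -1 ;
    if( Name = 0 ) then
    begin
        exit ;
    end ;
    for I := low( Header.Streams ) to high( Header.Streams ) do
    begin
        if( Header.Streams[ I ].Name = Name ) then
        begin
            Result := I ;
            exit ;
        end ;
    end ;
end ;

var Header : TUOS_File_Header ;

begin
    fillchar( Header, sizeof( Header ), 0 ) ;
    Header.Streams[ 2 ].Name := 42 ; // Arbitrary name value for illustration
    writeln( 'Header size : ', sizeof( TUOS_File_Header ) ) ;
    writeln( 'Name 42 slot: ', Find_Stream( Header, 42 ) ) ;
    writeln( 'Name 99 slot: ', Find_Stream( Header, 99 ) ) ;
end.
```

A result of -1 would be the cue to follow the overflow pointer, at the cost of the extra store read discussed earlier. Stream 0 is addressed by position rather than by name, since it always holds the file data.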

Here are the flag values:
Flag             Value   Description
FAF_DSM_MASK     3       Data Security Mode (mask for the low two bits).
FAF_PLACED       4       File was placed at a specific location on the store.
FAF_CONTIGUOUS   8       File data is contiguous. In this case, there is no allocation cluster chain. The stream pointer points directly to the start of the data.
FAF_DELETED      16      File is logically deleted. Awaiting physical deletion when the last handle is closed.
FAF_READONLY     32      File cannot be written to.
FAF_UNUSED       64      Unused file entry.
FAF_SYSTEM       128     System file.
FAF_HIDDEN       256     Hidden by default.
FAF_DIRECTORY    512     A directory (folder).
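
The values in the table can be written as constants and combined with bitwise operators. The sketch below is only an illustration: the constant names and values come from the table, while Has_Flag, Security_Mode, and Set_Security_Mode are hypothetical helpers. The mode value 2 is arbitrary, since the meaning of the Data Security Mode values has not been defined yet.

```pascal
program File_Flags ;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const FAF_DSM_MASK = 3 ; // Two-bit Data Security Mode field
      FAF_PLACED = 4 ;
      FAF_CONTIGUOUS = 8 ;
      FAF_DELETED = 16 ;
      FAF_READONLY = 32 ;
      FAF_UNUSED = 64 ;
      FAF_SYSTEM = 128 ;
      FAF_HIDDEN = 256 ;
      FAF_DIRECTORY = 512 ;

// Hypothetical helper: true if any bit of Flag is set in Flags.
function Has_Flag( Flags, Flag : int64 ) : boolean ;

begin
    Result := ( Flags and Flag ) <> 0 ;
end ;

// Hypothetical helper: extracts the two-bit Data Security Mode.
function Security_Mode( Flags : int64 ) : int64 ;

begin
    Result := Flags and FAF_DSM_MASK ;
end ;

// Hypothetical helper: replaces the Data Security Mode and leaves every
// other flag bit untouched.
function Set_Security_Mode( Flags, Mode : int64 ) : int64 ;

begin
    Result := ( Flags and not int64( FAF_DSM_MASK ) ) or
        ( Mode and FAF_DSM_MASK ) ;
end ;

var Flags : int64 ;

begin
    Flags := FAF_READONLY or FAF_HIDDEN ; // 32 + 256 = 288
    Flags := Set_Security_Mode( Flags, 2 ) ; // Mode 2 is an arbitrary example
    writeln( 'Flags value  : ', Flags ) ;
    writeln( 'Security mode: ', Security_Mode( Flags ) ) ;
    writeln( 'Read-only    : ', Has_Flag( Flags, FAF_READONLY ) ) ;
    writeln( 'Directory    : ', Has_Flag( Flags, FAF_DIRECTORY ) ) ;
end.
```

Because FAF_DSM_MASK covers two bits rather than one, it is read and written as a small field, whereas the other values are tested as individual on/off bits.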

We will discuss these flags in more detail as we go along. Only a few of them have any meaning to our file class, which we will begin writing in our next article.