Allocation Cluster Manager

Managing file clusters involves managing a chain of buffers containing arrays of pointers to the file data. Most of the work involved with creating, reading, and writing file data is performed by the allocation cluster manager class. So, let's write the class. First, the definition:

type TCOM_Allocation_Cluster_Manager64 = class

There are two basic functions we need to provide for the file system software:

                                             public // API...
                                                 procedure Set_Size( Value : TStore_Size64 ) ;
                                                     virtual ;
                                                 function Offset_To_Pointer( Offset : TStore_Size64 ) : TStore_Address64 ;
                                                     virtual ;

You may notice that we are not using the stdcall calling convention, which is the standard for UOS components. However, this class is not a component; it is a helper class. For performance sake, we do not use stdcall here.
The first function allows us to set the size, in bytes, of the space allocated to this chain of allocation clusters. The second is a means of obtaining the address of data on the store. That is, given a file offset, we return the location on the store where that offset is located. Both of these operations will require us to process variable-sized allocation clusters. Remember that the minimum storage allocation will vary from store to store, so we can't be guaranteed how large the buffer will be. So, we will need to dynamically allocate a buffer. But we don't want to allocate/deallocate the buffer on each function call, for performance reasons. But, we can assume that the minimum allocation size of a store is not going to change once we write to it. So, once we have a store assigned to us, we can allocate a buffer of the appropriate size and use it for all calls. Note that the buffer size will be _Max_Index times 8 (the size of a 64-bit pointer), but we will cache it to avoid constant multiply (or shift) operations. Here are the class instance data to support this buffer:

                                             private // Instance data...
                                                 _Buffer : PAllocation_Record64 ; // Work buffer
                                                 _Buffer_Size : longint ; // Size of memory pointed to by _Buffer
                                                 _Max_Index : int64 ; // Maximum index in a cluster

PAllocation_Record64 is a pointer to an array of TStore_Address64 values, which is defined as:

type TAllocation_Record64 = array[ 0..Max_Memory div sizeof( TStore_Address64 ) ] of TStore_Address64 ;
     PAllocation_Record64 = ^TAllocation_Record64 ;

This allows us to allocate a buffer of any size. The _Buffer pointer points to a potentially huge array, but we will set _Max_Index to the maximum index in the buffer, based on the cluster allocation size. When the store is assigned to our class, we will allocate the buffer and set _Max_Index.

We also need to know the cluster size of the extents that we are managing. This may be different than the cluster size of the allocation clusters. We will leave the allocation cluster size tied to the store cluster size, but with a minimum value of 128, which would provide 16 pointers per allocation cluster. On a standard hard disk cluster size of 512, it would be 64 pointers. To finish off the rest of the data our class will need to operate: the root address of the first allocation cluster on the store (0 if not allocated), the store we are using, and the file cluster size.

                                                 _Clustersize : TStore_Size64 ; // Size of managed clusters
                                                 _Root : TStore_Address64 ; // Root of allocation clusters (first allocation cluster)
                                                 _Store : TCOM_Managed_Store64 ; // Store upon which the allocation clusters are stored

To determine the size of the data that we manage, we could run down the chain of allocation clusters, adding up the number of extents times the cluster size. That could be an expensive operation, so we will cache the size in another variable. And finally, we need a heap to store our buffer in. You see, we can't use the default Pascal (or C, or C++) heap since it will try to ask the Windows OS for memory, but there is no Windows OS, there is only UOS. UOS has heap management as well, but until we have a native UOS compiler, or a Windows emulator (which won't run in the kernel), the code generated by our compiler isn't going to use the UOS heap. We will get into the UOS heap class(es) later, but for now, we will assume that the caller who creates this class also assigns a heap instance to it. So we need a pointer to that heap object. Here are the remaining instance data variables:

                                                 _Size : TStore_Size64 ; // Size of data represented by allocation clusters
                                                 _Heap : TCOM_Heap64 ;

Since heaps can operate on any store, they return integer values for pointers. Since we trust that a heap class for memory is being assigned to us, we can cast these integers to pointers. TCOM_Heap64 descends from the managed store, and so we can (for now) just define TCOM_Heap64 as a synonym for TCOM_Managed_Store64.

Obviously, we need methods to allow our caller to assign the heap, store, and clustersize. Further, for existing files, we will need to know the root address of the first allocation cluster. We will also need a way for the caller to obtain the current values. Here are the methods to handle these assignments:

                                             public // API...
                                                 function Get_Clustersize : TStore_Size64 ;
                                                     virtual ; 
                                                 procedure Set_Clustersize( Value : TStore_Size64 ) ;
                                                     virtual ; 
                                                 function Get_Heap : TCOM_Heap64 ;
                                                     virtual ; 
                                                 procedure Set_Heap( Value : TCOM_Heap64 ) ;
                                                     virtual ; 
                                                 function Get_Root : TStore_Address64 ;
                                                     virtual ;
                                                 procedure Set_Root( Value : TStore_Address64 ) ;
                                                     virtual ; 
                                                 function Get_Size : TStore_Size64 ;
                                                     virtual ;
                                                 function Get_Store : TCOM_Managed_Store64 ;
                                                     virtual ; 
                                                 procedure Set_Store( Value : TCOM_Managed_Store64 ) ;
                                                     virtual ;

We also need a destructor to clean up after ourselves:

                                             public // Constructors and destructors...
                                                 destructor Destroy ; override ;

Now that we have defined the methods, we can look at the implementation. Here is the destructor:

destructor TCOM_Allocation_Cluster_Manager64.Destroy ;

begin
    Set_Heap( nil ) ;

    inherited Destroy ;
end ;

We set our heap to nil, because the Set_Heap method will deallocate the buffer whenever the heap is changed.

Here are the Get_* and Set_* method implementations, except for Set_Size, which we will cover last:

function TCOM_Allocation_Cluster_Manager64.Get_Heap : TCOM_Heap64 ;

begin
    Result := _Heap ;
end ;


procedure TCOM_Allocation_Cluster_Manager64.Set_Heap( Value : TCOM_Heap64 ) ;

begin
    if( _Heap <> nil ) then
    begin
        _Heap.Deallocate( integer( _Buffer ), _Buffer_Size ) ;
        _Buffer := nil ;
        _Buffer_Size := 0 ;
    end ;
    _Heap := Value ;
end ;

We deallocate the buffer from the existing heap, but - to be safe - we will make sure the heap is assigned before we do. Also, to be safe, we set the buffer to nil and the buffer size to 0.

function TCOM_Allocation_Cluster_Manager64.Get_Clustersize : TStore_Size64 ;

begin
    Result := _Clustersize ;
end ;


procedure TCOM_Allocation_Cluster_Manager64.Set_Clustersize( Value : TStore_Size64 ) ;

begin
    if( _Clustersize <> Value ) then // If anything changed
    begin
        _Clustersize := Value ;
        Set_Root( _Root ) ; // Recalculate size
    end ;
end ;

For performance sake, we only process a new cluster size if it differs from the current value. If different, we set the instance data and call Set_Root with the current Root value. Set_Root recalculates the file size.

function TCOM_Allocation_Cluster_Manager64.Get_Root : TStore_Address64 ;

begin
    Result := _Root ;
end ;


procedure TCOM_Allocation_Cluster_Manager64.Set_Root( Value : TStore_Address64 ) ;

var I : integer ;

begin
    if( _Store = nil ) then // Makes no sense to set the root if there is no store
    begin
        exit ;
    end ;

    // Set root and recalculate size...
    _Root := Value ;
    _Size := 0 ;
    while( Value <> 0 ) do
    begin
        _Store.Read( Value, _Buffer_Size, _Buffer^ ) ; // Get allocation cluster
        for I := 0 to _Max_Index - 1 do
        begin
            if( _Buffer^[ I ] = 0 ) then // Hit last allocated cluster
            begin
                break ; // We're done
            end ;
            _Size := _Size + _Clustersize ; // Add in another clustersize worth of bytes
        end ; // for I := 0 to Max_Index - 1
        Value := _Buffer^[ _Max_Index ] ; // Follow link to next allocation cluster
    end ; // while( Value <> 0 )
end ; // TCOM_Allocation_Cluster_Manager64.Set_Root

Setting the root is meaningless if we have no store for the root to point to, so we exit if there is no store. Note: this means that the caller needs to make sure the store is assigned to the instance before the root is set. Otherwise, we set the root, clear the size to 0, and then process all the allocation clusters to determine the size of the data. We will loop until we hit the end of the allocation cluster chain. In the loop, we read the next (first) allocation cluster, then we loop through the pointers. Once we hit a null pointer (value of 0), we can exit the loop because it means that we have run out of extents. Otherwise we add in another cluster size worth of data to the size. We only loop to _Max_Index - 1 because that last pointer in each allocation cluster is a pointer to the next allocation cluster. So, we use that value for the next allocation cluster to read, and loop back to do it all over again.

function TCOM_Allocation_Cluster_Manager64.Get_Size : TStore_Size64 ;

begin
    Result := _Size ;
end ;


function TCOM_Allocation_Cluster_Manager64.Get_Store : TCOM_Managed_Store64 ;

begin
    Result := _Store ;
end ;


procedure TCOM_Allocation_Cluster_Manager64.Set_Store( Value : TCOM_Managed_Store64 ) ;

begin
    _Store := Value ;
    _Size := 0 ;
    _Root := 0 ;
    if( Value <> nil ) then
    begin
        // Default the allocation cluster and file cluster size to the store clustersize...
        _Clustersize := Value.Min_Storage ;
        if( _Clustersize < 128 ) then
        begin
            _Clustersize := 128 ;
        end ;
        _Max_Index := ( _Clustersize div sizeof( TStore_Address64 ) ) - 1 ;
    end ;
end ;

Setting the heap is the first thing that should be done. Next is setting the store. Last is setting the root (if pre-existing data). Once the store is set, we clear the cached size and the root pointer, and set the store pointer to the passed value. Assuming that a nil hasn't been passed, we default the file cluster size to the minimum allocation size of the store. We also set the default max_index, which will determine the buffer size, based on the same value - but we always assume at least 128 bytes per allocation cluster.
As you can see, the Get_* methods are easy; they simply return the instance data.

Note that nowhere in these setup routines do we allocate the buffer - we assume that it was already set up. But, since we can't be sure what order various methods will be called, we don't know where the buffer might be assigned. For instance, Set_Root might not be called at all, if the file starts off as zero-length. And Set_Size will only be called if we change the file's size. In general, we want to avoid allocating data until it is needed, so we will implement a function that will only allocate a buffer when it is needed. Instead of referring directly to _Buffer, we will refer to the Buffer method, which will return _Buffer, and allocate it if it hasn't already been allocated. Here's the method:

function TCOM_Allocation_Cluster_Manager64.Buffer : PAllocation_Record64 ;

begin
    if( _Buffer_Size = 0 ) then // Buffer not already created
    begin
        _Buffer_Size := ( _Max_Index + 1 ) * sizeof( TStore_Address64 ) ;
        _Buffer := pointer( _Heap.Allocate( _Buffer_Size ) ) ;
    end ;
    Result := _Buffer ;
end ;

And, so, we will modify Set_Root as follows:

procedure TCOM_Allocation_Cluster_Manager64.Set_Root( Value : TStore_Address64 ) ;

var I : integer ;

begin
    if( _Store = nil ) then // Makes no sense to set the root if there is no store
    begin
        exit ;
    end ;

    // Set root and recalculate size...
    _Root := Value ;
    _Size := 0 ;
    while( Value <> 0 ) do
    begin
        _Store.Read( Value, _Buffer_Size, Buffer^ ) ; // Get allocation cluster
        for I := 0 to _Max_Index - 1 do
        begin
            if( Buffer^[ I ] = 0 ) then // Hit last allocated cluster
            begin
                break ; // We're done
            end ;
            _Size := _Size + _Clustersize ; // Add in another clustersize worth of bytes
        end ; // for I := 0 to Max_Index - 1
        Value := Buffer^[ _Max_Index ] ; // Follow link to next allocation cluster
    end ; // while( Value <> 0 )
end ; // TCOM_Allocation_Cluster_Manager64.Set_Root

There is actually a bug in this method, which we will address in a bit... But, first, let's look at the Offset_To_Pointer method.

function TCOM_Allocation_Cluster_Manager64.Offset_To_Pointer( Offset : TStore_Size64 ) : TStore_Address64 ;

var AC, C : longint ;
    P : TStore_Address64 ;

begin
    // Sanity check...
    if( ( Offset < 0 ) or ( Offset >= _Size ) ) then
    begin
        Result := 0 ;
        exit ;
    end ;

    // Determine cluster for offset...
    C := Offset div _Clustersize ; // Determine cluster for offset
    AC := C div _Max_Index ; // Allocation cluster index
    Offset := C mod _Max_Index ; // Pointer index within allocation cluster index

    // Read the allocation cluster...
    P := _Root ;
    while( AC >= 0 ) do
    begin
        dec( AC ) ;
        _Store.Read( P, _Buffer_Size, Buffer^ ) ;
        P := Buffer^[ _Max_Index ] ;
    end ;

    // Return the disk pointer...
    Result := Buffer^[ Offset ] ;
end ; // TCOM_Allocation_Cluster_Manager64.Offset_To_Pointer

The first thing we do is validate the offset. A negative offset, or an offset larger than the cached size is outside the range of the file contents, so we return 0. Note that since the first offset is 0, the comparison is ">= Size" rather than just "> Size". Then we calculate the allocation cluster number and the index in that cluster. Offset is the byte offset in the file, so, we convert it from byte offset to the file's cluster offset. Then we divide the file cluster by the number of pointers in each allocation cluster to get the allocation cluster offset, and the mod gives us the pointer index in that allocation cluster. So, next we need to read the appropriate allocation cluster, which is what the loop does. Then we simply return the value of the appropriate pointer.

Here, we see a potential for improving performance. Most data access tends to cluster around previous data access. This is especially true for files that are being read in sequential order. In the case of a file being read sequentially (such as being read into an editor), up to 63 clusters accessed will be through the same allocation cluster. If we already have the cluster in our buffer, why read it in again from the store? In the case of a RAM disk, reading the buffer over and over from the store isn't going to slow things down too much, but doing that from your typical hard disk is going to dramatically impact performance in a negative way. Even in the case of the RAM disk, the whole point of the memory-resident disk is faster performance, so why unnecessary overhead? Instead, we'll add a function to pull in a specific allocation cluster, and if it is already in the buffer, we don't need to do anything. On a large file, this could reduce the number of disk reads by as much as 64 times on a disk with a cluster size of 512. Even when we have caching on the disk, this small change in our class will likely have a noticable effect. Here's the new method, and the updated version of Offset_To_Pointer:

procedure TCOM_Allocation_Cluster_Manager64.Pull( AC : longint ) ;

var Original, P : TStore_Address64 ;

begin
    Buffer ; // Make sure buffer is allocated
    if( _Current_AC = AC ) then
    begin
        exit ; // Already have it
    end ;
    Original := AC ;

    if( ( _Current_AC >= 0 ) and ( AC > _Current_AC ) ) then // After our current AC
    begin
        AC := AC - _Current_AC - 1 ; // How much further to go
        P := Buffer^[ _Max_Index ] ;
    end else
    begin
        P := _Root ; // Start from the beginning
    end ;
    while( AC >= 0 ) do
    begin
        dec( AC ) ;
        _Store.Read( P, _Buffer_Size, Buffer^ ) ;
        _Current_AC_Pointer := P ;
        P := Buffer^[ _Max_Index ] ;
    end ;
    _Current_AC := Original ;
end ;


function TCOM_Allocation_Cluster_Manager64.Offset_To_Pointer( Offset : TStore_Size64 ) : TStore_Address64 ;

var AC, C : longint ;

begin
    // Sanity check...
    if( ( Offset < 0 ) or ( Offset >= _Size ) ) then
    begin
        Result := 0 ;
        exit ;
    end ;

    // Determine cluster for offset...
    C := Offset div _Clustersize ; // Determine cluster for offset
    AC := C div _Max_Index ; // Allocation cluster index
    Offset := C mod _Max_Index ; // Pointer index within allocation cluster index

    // Read the allocation cluster...
    Pull( AC ) ;

    // Return the disk pointer...
    Result := Buffer^[ Offset ] ;
end ; // TCOM_Allocation_Cluster_Manager64.Offset_To_Pointer

You may notice the first line in the Pull method calls Buffer, but does nothing with the result from that method. The reason we do this is because if the buffer doesn't already exist, _Buffer_Size will still be 0. Which means that the call to _Store.Read( P, _Buffer_Size, Buffer^ ) might be passing 0 as the length to _Store.Read, depending upon what order the compiler processes the parameters. We don't want our code to fail depending upon which version of Pascal compiler is used to build it, so this guarantees that _Buffer_Size is properly set before it is referenced. Remember the bug I mentioned in Set_Root? This is the same issue. So, at the start of the Set_Root method as well, we'll add a call to Buffer, just to make sure the size is properly initialized. We save the current allocation cluster index (_Current_AC) after we get the requested allocation cluster, and we check that at the start of the routine. If the requested allocation cluster is the last one we read, then there is nothing to do and we've saved ourselves a read operation on the store. If the requested cluster is after the one requested, we can just following the links from our current position, rather than starting from the root. Depending upon the type of operations being done on the file, this may save us a great many read operations. However, if the requested cluster is prior to our current location, we start from the root.

Finally, we come to the Set_Size method, which is where all the real work in this class is done. We'll take it in pieces:

procedure TCOM_Allocation_Cluster_Manager64.Set_Size( Value : TStore_Size64 ) ;

var I : longint ;
    Last, P : TStore_Address64 ;
    New_Size : TStore_Size64 ;

begin
    // Determine max cluster for new size...
    Value := ( Value + _Clustersize - 1 ) div _Clustersize ; // Determine number of clusters required for the specified size
    New_Size := Value * _Clustersize ; // Total new size, if we succeed
    if( New_Size = _Size ) then // No size change
    begin
        exit ;
    end ;

    // Update allocation clusters for the new size...
    Buffer ; // Make sure buffer is allocated
    Last := 0 ;
    if( _Root = 0 ) then // Nothing allocated yet
    begin
        _Root := Allocate_Allocation_Cluster ;
        _Current_AC := -1 ;
    end ;

It is possible that the caller passed a size for the data which is not a multiple of the clustersize, so we do some math to guarantee that the size requested is a multiple. Note that there are two sizes associated with files. One is the amount of disk space taken by the file. By definition, this will be a multiple of the store's minimum allocation size. The other file size has to do with the logical end of the file. This is kept track of elsewhere. As an example, let's say that you have a text file on a disk with 512 byte sectors, which contains only the text "hello world". The size of the file on the disk is 512 bytes. But the logical size of the file is only 11 bytes. This is an important thing to note for us, long term, but this class is only concerned with the actual disk space allocated to the file. So, if a request comes in for the file to be set to 11 bytes, our math will make sure that we allocate one clustersize worth of bytes for it, no matter how much larger than 11 that is. When we are done, Value will be the total number of file clusters required for the requested size. We then save the total size in New_Size by multiplying the cluster count by the cluster size. If we succeed, we will update the cached _Size with this value. We check to make sure we are actually changing the size and exit if not.
Next, we make sure the buffer is allocated and the buffer_Size set. We set _Current_AC to -1 to force the Pull method to search for a requested cluster from the start. We do this because we are going to be reading and/or writing multiple allocation clusters in this routine as we adjust the file's size, and whatever was cached before most likely won't be when we finish. Then we handle the situation when _Root is 0, indicating the file has no allocation clusters (which only happens for files with size 0). In this case, we allocate the first allocate cluster for the file and assign the pointer to _Root.
The next phase of the routine needs to handle two possible situations: truncating (shortening) or extending (expanding) the file. We use an if...then...else to handle both situations separately. My initial code did a single loop which did various checks for whether or not we were deleting or adding clusters, but despite the fact that it worked just fine, it wasn't the easiest code to work with. The new code takes up the same number of lines of code, but is much simpler to understand.

    if( New_Size > _Size ) then
    begin
        // Expanding...
        Pull( ( _Size - 1 ) div _Clustersize div _Max_Index ) ; // Get last allocation cluster
        Value := Value - _Current_AC * _Max_Index ;
        P := _Current_AC_Pointer ;
        while( Value > 0 ) do
        begin
            for I := 0 to _Max_Index - 1 do // For each pointer in this cluster
            begin
                if( Buffer^[ I ] = 0 ) then // Unallocated cluster
                begin
                    Buffer^[ I ] := _Store.Allocate( _Clustersize ) ;
                end ;
                dec( Value ) ; // One less cluster to account for
                if( Value = 0 ) then
                begin
                    break ; // No need to process any more of the pointers in this allocation cluster
                end ;
            end ; // for I := 0 to Max_Index - 1
            if( ( Value > 0 ) and ( Buffer^[ _Max_Index ] = 0 ) ) then // Need to extend the allocation chain
            begin
                Buffer^[ _Max_Index ] := _Store.Allocate( _Buffer_Size ) ;
		_Store.Fill( Buffer^[ _Max_Index ], _Buffer_Size, 0 ) ;
            end ;
            _Store.Write( P, _Buffer_Size, Buffer^ ) ;
            P := Buffer^[ _Max_Index ] ;
            if( P <> 0 ) then
            begin
                _Heap.Fill( integer( Buffer ), _Buffer_Size, 0 ) ;
            end ;
        end ; // while( Value > 0 )
    end else

For expanding the file size, we need to add additional file clusters to the end of the current ones. So, we first pull in the last allocation cluster in the file. We subtract 1 from _Size because the clusters are 0-based, and we want the cluster offset (or index), not the cluster number (eg cluster number 1 is cluster index 0). Then we convert from the requested size to the requested additional clusters, including any used indexes in the current allocation cluster. Then starting in the loaded allocation cluster, we process through all pointers in it, decrementing the number of clusters remaining, and allocating space on the store and assigning the address to each null (0) pointer. When we reach the end of the allocation cluster, we see if we need more allocation clusters. If so, we allocate one on the store and assign it to the last pointer in the current buffer. Then we update the store with the new allocation buffer. If we created a new allocation cluster, we clear the buffer and run through the loop again. Otherwise, we're done.
Now to the truncation code:

    begin
        // Truncating...
        Pull( ( Value - 1 ) div _Max_Index ) ; // Get new last allocation cluster
        Value := Value - _Current_AC * _Max_Index ;
        P := _Current_AC_Pointer ;
        while( P <> 0 ) do // Until remaining allocation clusters are dealt with
        begin
            for I := 0 to _Max_Index - 1 do // For each pointer in this cluster
            begin
                if( Buffer^[ I ] = 0 ) then // Unallocated cluster
                begin
                    break ; // No need to go through the rest of the pointers once we hit a 0
                end ;
                if( Value > 0 ) then // Until we reach the
                begin
                    dec( Value ) ;
                end else
                begin
                    _Store.Deallocate( Buffer^[ I ], _Clustersize ) ;
                    // Release the cluster...
                    Buffer^[ I ] := 0 ;
                    if( Buffer^[ 0 ] <> 0 ) then
                    begin
                        _Store.Write( P, _Buffer_Size, Buffer^ ) ; // Update on-store cluster immediately for safety
                    end ;
                end ; // if( Value > 0 )
            end ; // for I := 0 to _Max_Index - 1
            Last := Buffer^[ _Max_Index ] ;
            if( Buffer^[ 0 ] = 0 ) then // This allocation cluster is now unused
            begin
                _Store.Deallocate( P, _Buffer_Size ) ; // Remove allocation cluster
            end else
            if( Value <= 0 ) then
            begin
                Buffer^[ _Max_Index ] := 0 ; // No more allocation clusters are needed for this file
                _Store.Write( P, _Buffer_Size, Buffer^ ) ; // Update on-store cluster immediately for safety
            end ;
            P := Last ;

            // Move to next allocation cluster in chain...
            if( P <> 0 ) then
            begin
                _Store.Read( P, _Buffer_Size, Buffer^ ) ;
            end ;
        end ; // while( P <> 0 )
    end ; // if( New_Size > _Size )

Truncating the file means freeing up file clusters and zeroing the pointers to them, for all clusters beyond the requested new size. We pull the last allocation cluster, figure out how many file cluster pointers we need to keep in the last allocation cluster. Then we process through the pointers in the allocation cluster, deallocating file clusters, and setting the pointer to 0. After going through all the pointers, if the first pointer is 0 then we know that all of them are zero and the allocation cluster itself is deallocated. Otherwise, we clear the last pointer (the pointer to the next allocation cluster), and save the current allocation cluster to the store. Then we read in the next cluster and go through the process all over again. Once we find a null pointer, we know we are at the end of the pointers, and we can early out of the loop.
As a general rule, code should be written so that data isn't lost or corrupted. This goes double for operating system code. The last thing we want to do is have UOS be unreliable. For this reason, each time we deallocate a file data cluster, we update the on-store copy of the allocation cluster so that, if something goes wrong, we don't have allocation cluster pointers referencing areas of the disk that have been reallocated to another file. If we wait until the end of the loop before we update the on-store copy of the allocation cluster, we run the risk that if something fails (hardware or software) before we can update the store, we will have left-over, and errant, pointers in our allocation clusters. Note that we didn't update the on-store allocation cluster each time we allocated something during a file expansion. It wasn't necessary in that case, because the on-store allocation cluster didn't have any errant pointers (all pointers were null or pointed to allocated file clusters). But, doesn't that mean that there is the possiblilty of orphaning some clusters if we don't complete the process of writing the updated allocation cluster back to the store? Yes. But we don't lose or corrupt any data as a consequence. We can always run a check on the disk to recover allocated, but unused, areas of the store. In fact, this should be automatic if the system crashes. But more about these issues in article 15. Now the rest of the method:

    _Current_AC := -1 ; // Reset buffer cache state
    _Size := New_Size ;
    if( New_Size = 0 ) then
    begin
        _Root := 0 ;
        _Current_AC := -1 ;
    end ;
end ; // TCOM_Allocation_Cluster_Manager64.Set_Size

We set _Current_AC to -1 to force Pull to search from the root. This is because the last cached allocation cluster is most likely no longer what it was before the call to Set_Size. We then update the cached size, and clear _Root if the new size is 0.
So, now we have a working class. But the multiple updates of the on-store allocation cluster during a truncate has terrible performance. It would be nice to move the dealloations to the end of the loop so we only update the allocation cluster once. But we can't trade reliability for speed. Can we have the best of both worlds? Yes, we can. We simply have to make sure that we update the allocation cluster before we deallocate the file data clusters. So, we can save up a list of data clusters to deallocate and then deallocate them after we deallocate the allocation cluster, which we can now safely do once after the loop ends. The most extreme case of a file trunction is when a file is set to size 0, which happens when it is deleted. Obviously, a file can only be deleted once and other forms of truncation tend to happen with much less frequency than expansions, therefore we can create a dynamic buffer to hold the list of pending deallocates when a truncation is required, without worrying about undue performance issues. We could create a buffer for this list when the class is created, but chances are that it would not be used and thus it would waste memory for each file UOS had open. We could also use a static buffer for all instances of this class to use, but then we would have problems with multithreaded access to that buffer, which would be a headache on several levels. So, here is our improved version of the truncation code:

        List := TInteger64_List.Create ;
        try
            Pull( ( Value - 1 ) div _Max_Index ) ; // Get new last allocation cluster
            Value := Value - _Current_AC * _Max_Index ;
            P := _Current_AC_Pointer ;
            while( P <> 0 ) do // Until remaining allocation clusters are dealt with
            begin
                for I := 0 to _Max_Index - 1 do // For each pointer in this cluster
                begin
                    if( Buffer^[ I ] = 0 ) then // Unallocated cluster
                    begin
                        break ; // No need to go through the rest of the pointers once we hit a 0
                    end ;
                    if( Value > 0 ) then // Until we reach the
                    begin
                        dec( Value ) ;
                    end else
                    begin
                        List.Add( Buffer^[ I ] ) ; // Mark the cluster for release
                        Buffer^[ I ] := 0 ;
                    end ; // if( Value > 0 )
                end ; // for I := 0 to _Max_Index - 1
                Last := Buffer^[ _Max_Index ] ;
                if( Buffer^[ 0 ] = 0 ) then // This allocation cluster is now unused
                begin
                    Deallocate_Allocation_Cluster( P ) ; // Remove allocation cluster
                end else
                if( Value <= 0 ) then
                begin
                    Buffer^[ _Max_Index ] := 0 ; // No more allocation clusters are needed for this file
                    Write( P ) ; // Update on-store cluster immediately for safety
                end ;
                for I := 0 to List.Count - 1 do
                begin
                    _Store.Deallocate( List[ I ], _Clustersize ) ;
                end ;
                List.Clear ;
                P := Last ;

                // Move to next allocation cluster in chain...
                if( P <> 0 ) then
                begin
                    Read( P ) ;
                end ;
            end ; // while( P <> 0 )
        finally
            List.Free ;
        end ;

And, of course, this requires a new variable declaration at the top:

    List : TInteger64_List ;

TInteger64_List is a class that implements a list of 64-bit integer values. But wait! We are creating and freeing this dynamically-allocated class without using the heap object pointed to by _Heap. Won't that be a problem? As it turns out, we can intercept dynamic memory allocation/deallocations in Delphi/FPC and revector them to our heap class instance. I brought up the whole issue of the _Heap merely to illustrate the point that we can't rely on any compiler feature which assumes the presence of a particular operating system. If you are using a compiler which doesn't allow you to intercept built-in heap calls, then you have no choice but to assign a heap class instance to your classes. And you will have to replace the TInteger64_List with a class that makes use of the heap class.

UOS needs to track various metrics in order to provide statistics for performance monitoring and feedback. In terms of our allocation cluster management class, the only metric we need to keep track of is the process of following a pointer from one allocation cluster to another. We call this a "cluster turn", or "turn". The number of turns that occur is important because it directly relates to the performance of file operations. A high number of turns could indicate that file cluster sizes need to be increased, for instance. So, we will add a static variable to track the number of turns. This will be used by all instances of our class. Here's the definition:

var Turns : int64 = 0 ; // Number of allocation cluster turns

Then we modify the Pull routine to increment the turns each time it moves from one allocation cluster to another via the link pointer. Simple as that. Remember that there are some turns that occur in the Set_Size and Set_Root methods as well, so we'll add an increment statement there. This is how Pull looks now:

procedure TCOM_Allocation_Cluster_Manager64.Pull( AC : longint ) ;

var Original, P : TStore_Address64 ;

begin
    Buffer ; // Make sure buffer is allocated
    if( _Current_AC = AC ) then
    begin
        exit ; // Already have it
    end ;
    Original := AC ;

    if( ( _Current_AC >= 0 ) and ( AC > _Current_AC ) ) then // After our current AC
    begin
        AC := AC - _Current_AC - 1 ; // How much further to go
        P := Buffer^[ _Max_Index ] ;
    end else
    begin
        P := _Root ; // Start from the beginning
    end ;
    while( AC >= 0 ) do
    begin
        dec( AC ) ;
        _Store.Read( P, _Buffer_Size, Buffer^ ) ;
        _Current_AC_Pointer := P ;
        P := Buffer^[ _Max_Index ] ;
        inc( Turns ) ;
    end ;
    _Current_AC := Original ;
end ;

In the next article, we'll discuss some more changes we can make to the class to improve it.