FILESCAN

The FILESCAN system service parses a file specification into its component fields. As we've seen with other system services, FILESCAN is embedded in the VMS executive even though it doesn't really belong there. In UOS, it is included in Starlet instead. This saves the overhead of a system call for a simple service that neither involves security nor shares resources. You might ask, "But doesn't the executive also need to parse file specifications? And shouldn't that parsing code be in a single place?" As a general rule, yes, we want to avoid duplicating code. In this case, however, we make an exception and put the code in two places for performance's sake. Remember: calls across rings are relatively expensive. And, as you will see, the parsing source code exists in a single place; it is simply compiled into two different places: the ring 0 executive and ring 3 Starlet.

Here is the definition for SYS_FILESCAN:

FILESCAN searches a string for a file specification and parses it into its fields.

Format
SYS_FILESCAN( filespec, itemlist, fldflgs, auxout, retlen )

Arguments
filespec

String to be searched. This is the address of an SRB structure that points to the string.

itemlist

Item list specifying which components of the file specification are to be returned. This argument is the address of the first descriptor in the list. Each descriptor is 16 bytes long and has the following layout:

Bytes Description

0-3 Item Code. Indicates the file specification field to return. See the table below for valid item codes.

4-7 Length. This is where the length of the field is written. If the corresponding field is missing, 0 is written here.

8-15 Address. The address of the start of the field is written here. If the corresponding field is missing, 0 is written here.
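As a concrete illustration of this layout, here is a minimal Free Pascal sketch of one descriptor as a packed record, plus a helper that counts the entries in front of the terminating descriptor (item code 0, as described below). The record name TScan_Descriptor and its Code, Length, and Address fields follow the code later in this article, but this declaration and the Count_Descriptors helper are assumptions for illustration; the actual PasStarlet declarations may differ.

```pascal
program Descriptor_Layout ;

{$mode objfpc}{$H+}

// Assumed declaration of one item-list descriptor, following the byte layout
// in the table above (the real PasStarlet declaration may differ).
type TScan_Descriptor = packed record
         Code : longint ;    // Bytes 0-3: item code (0 terminates the list)
         Length : longint ;  // Bytes 4-7: field length, written by FILESCAN
         Address : int64 ;   // Bytes 8-15: field address, written by FILESCAN
     end ;

// Hypothetical helper: count the descriptors that precede the terminator.
function Count_Descriptors( const List : array of TScan_Descriptor ) : integer ;

var I : integer ;

begin
    Result := 0 ;
    for I := 0 to high( List ) do
    begin
        if( List[ I ].Code = 0 ) then
        begin
            exit ;
        end ;
        inc( Result ) ;
    end ;
end ;

var List : array[ 0..2 ] of TScan_Descriptor ;

begin
    fillchar( List, sizeof( List ), 0 ) ; // Zero-fill, so the last entry is the terminator
    List[ 0 ].Code := 1 ; // Item code 1: the entire file specification
    List[ 1 ].Code := 2 ; // Item code 2: FSCN_NODE
    writeln( sizeof( TScan_Descriptor ) ) ; // 16
    writeln( Count_Descriptors( List ) ) ;  // 2
end.
```

Zero-filling the array first is the same trick the PasStarlet wrapper later in this article uses to guarantee a terminator.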

fldflgs

The address of an 8-byte integer to which a bitmask is written, indicating which fields of the file specification were present. If this value is 0, it is ignored. The fields are indicated by the following flag values:

Symbol name Description

FSCN_V_DEVICE Device name

FSCN_V_DIRECTORY Directory name

FSCN_V_NAME File name

FSCN_V_NODE Node name

FSCN_V_NODE_ACS Access control string of primary node

FSCN_V_NODE_PRIMARY Primary (first) node name

FSCN_V_NODE_SECONDARY Secondary (additional) node information

FSCN_V_ROOT Root directory name string

FSCN_V_TYPE File type

FSCN_V_VERSION Version number

auxout

Auxiliary output buffer. This argument is the address of an SRB structure that indicates where the complete file specification (as provided) is written. Any secondary node information is stripped from the output, and quotations are reduced and simplified.
If this value is 0, it is ignored. If it is provided, the addresses written to the item list point into this auxiliary buffer.

retlen

Auxiliary output buffer length. This is the address of an 8-byte integer that receives the length of the string written to the auxiliary output buffer. If this is 0, no length is written.

Description
The FILESCAN service searches a string for a file specification and parses the fields of that specification. The length and starting addresses of the fields requested are returned. If a field was requested in the item list but not found in the file specification, a length and address of 0 are written to the descriptor. The descriptor list is terminated with a descriptor that has an item code of 0.

The information returned describes the entire contiguous file specification. For example, to extract just the file name and type from the full string, you can use the address of the file name with a length equal to the sum of the name and type lengths. However, the FSCN_NODE_PRIMARY and FSCN_NODE_ACS items contain no double colon (::), so you would have to add 2 to the sum of the lengths of those two fields to obtain the entire node specification.
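The following Free Pascal sketch illustrates the name-plus-type trick. Because real FILESCAN addresses are not available in a standalone program, pointers into an ordinary Pascal string stand in for the addresses a call would return; TField and Join_Fields are hypothetical names, and the offsets are filled in by hand.

```pascal
program Name_And_Type ;

{$mode objfpc}{$H+}

// Hypothetical stand-in for the Length/Address pair of a returned descriptor.
type TField = record
         Length : longint ;
         Address : PChar ; // nil stands in for a 0 address (field missing)
     end ;

// Build one span that starts at First and covers First and Second. This works
// only because the two fields are adjacent in the original specification.
function Join_Fields( const First, Second : TField ) : string ;

begin
    Result := '' ;
    if( First.Address <> nil ) then
    begin
        SetString( Result, First.Address, First.Length + Second.Length ) ;
    end else
    if( Second.Address <> nil ) then // First field missing: use Second alone
    begin
        SetString( Result, Second.Address, Second.Length ) ;
    end ;
end ;

var Spec : string ;
    Name, FType : TField ;

begin
    Spec := 'DKA0:\USERS\SMITH\REPORT.TXT;3' ;
    Name.Address := @Spec[ 19 ] ;  // 'REPORT' starts at position 19
    Name.Length := 6 ;
    FType.Address := @Spec[ 25 ] ; // '.TXT' starts at position 25
    FType.Length := 4 ;
    writeln( Join_Fields( Name, FType ) ) ; // REPORT.TXT
end.
```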

FILESCAN does not check all aspects of validity in the specification. For instance, it does not verify that the specified node name corresponds to a valid node, nor does it validate the contents of the access control string or verify the existence of the path or file. It treats wildcard characters like any other valid character, and it doesn't validate lengths. Finally, multiple whitespace characters are not collapsed to a single space, nor trimmed from the beginning or end of the string. However, spaces, tabs, and delimiting characters must be enclosed in quotes if they are part of the file name or type; otherwise the character is treated as a terminator for the specification. Quotes used to indicate a node access control string require that the node name be enclosed in quotes and that the quotes delimiting the access control string be doubled (""). For example, the node specification:
abcd"efg"
would need to be specified as:
"abcd""efg"""

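A caller building such a specification programmatically might use a helper like the one below, which wraps a string in quotes and doubles every embedded quote. Quote_Node_Spec is a hypothetical name; it is a minimal sketch of the quoting rule just described, not part of UOS.

```pascal
program Quote_Node ;

{$mode objfpc}{$H+}

// Hypothetical helper: wrap a node specification in quotes, doubling every
// embedded quote so that FILESCAN reads it as a literal character.
function Quote_Node_Spec( const Spec : string ) : string ;

var I : integer ;

begin
    Result := '"' ;
    for I := 1 to length( Spec ) do
    begin
        if( Spec[ I ] = '"' ) then
        begin
            Result := Result + '""' ; // An embedded quote becomes ""
        end else
        begin
            Result := Result + Spec[ I ] ;
        end ;
    end ;
    Result := Result + '"' ;
end ;

begin
    writeln( Quote_Node_Spec( 'abcd"efg"' ) ) ; // "abcd""efg"""
end.
```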
FILESCAN does not assume default values for missing fields or perform logical name translations.

Here are the item codes that can be used in the passed descriptors:

Symbol name Description

FSCN_DEVICE Returns length and starting address of the device name, including the colon (:).

FSCN_DIRECTORY Returns the length and starting address of the path, including all backslashes (\).

FSCN_FILESPEC Returns the length and starting address of the full file specification.

FSCN_NAME Returns the length and starting address of the file name, without any syntactical elements.

FSCN_NODE Returns the length and starting address of the node, access control string, and double colon (::).

FSCN_NODE_ACS Returns the length and starting address of the node access control string.

FSCN_NODE_PRIMARY Returns the length and starting address of the primary node name. It doesn't include the double colon (::) or access control string.

FSCN_NODE_SECONDARY Returns the length and starting address of the secondary node string.

FSCN_ROOT Returns the length and starting address of the root directory of the path, including backslashes (\).

FSCN_TYPE Returns the length and starting address of the file type, including the leading dot (.).

FSCN_VERSION Returns the length and starting address of the version, including the leading semicolon (;).

function FILESCAN( var Name : string ) : TStringList ;

var Descriptors : array[ FSCN_NODE..FSCN_VERSION + 1 ] of TScan_Descriptor ;
    I : integer ;
    S : string ;
    SRB : TSRB ;

begin
    // Setup...
    Result := TStringList.Create ;
    Set_String( Name, SRB ) ;
    fillchar( Descriptors, sizeof( Descriptors ), 0 ) ;
    Result.Add( '' ) ; // Position 0 unused
    Result.Add( '' ) ; // Position 1 also unused
    for I := FSCN_NODE to FSCN_VERSION do
    begin
        Descriptors[ I ].Code := I ;
        Result.Add( '' ) ;
    end ;

    // Make the call
    SYS_FILESCAN( int64( @SRB ), int64( @Descriptors ), 0, 0, 0 ) ;

    // Parse into result stringlist...
    for I := FSCN_NODE to FSCN_VERSION do
    begin
        if( Descriptors[ I ].Address <> 0 ) then
        begin
            setlength( S, Descriptors[ I ].Length ) ;
            move( PChar( Descriptors[ I ].Address )[ 0 ], PChar( S )[ 0 ], length( S ) ) ;
            Result[ I ] := S ;
        end ; // if( Descriptors[ I ].Address <> 0 )
    end ; // for I := FSCN_NODE to FSCN_VERSION
end ;

This new function is added to the PasStarlet unit to provide a Pascal interface to the FILESCAN system service. It creates and returns a TStringList instance that contains the parsed file specification. Each index in this result list corresponds to an FSCN_ constant. Because FSCN_NODE is 2, we first add two null strings to the list (the list's first element is index 0).

Note: We start with FSCN_NODE because the item code 0 marks the end of a descriptor list and 1 requests the entire file specification. Since we are returning the individual fields, we set both of those indexes in the result list to null strings.

We fill the descriptor array with zeroes so that the last descriptor is a terminator. Then we loop through the constants for each field, adding a null to the result list as a placeholder, and then set the corresponding descriptor item code to the FSCN_ constant value. Then we call the SYS_FILESCAN system service to parse the specification.

The FSCN_ constants are arranged in the order, from left to right, in which the file specification fields occur. Thus, we can iterate from FSCN_NODE to FSCN_VERSION and fill the result list indexes with the appropriate fields. For each descriptor with a non-zero address, we set S to the field's length, copy that many bytes from the returned address into S, and store S in the corresponding string list element.

function SYS_FILESCAN( Name, Itemlist : int64 ; Fldflgs : int64 = 0 ;
    auxout : int64 = 0 ; retlen : int64 = 0 ) : int64 ;

begin
    Result := LIB_SYS_FILESCAN( Name, Itemlist, Fldflgs, auxout, retlen ) ;
end ;

As we discussed at the start of the article, UOS implements this service in Starlet. Thus, SYS_FILESCAN simply passes the call through to Starlet's LIB_SYS_FILESCAN. The Starlet routine can also be called directly, even though it has no VMS equivalent.

type TScan_Descriptor_Array = array[ 0..65535 ] of TScan_Descriptor ;
     PScan_Descriptor_Array = ^TScan_Descriptor_Array ;

function LIB_SYS_FILESCAN( Name, Itemlist : int64 ; Fldflgs : int64 = 0 ;
    auxout : int64 = 0 ; retlen : int64 = 0 ) : int64 ;

var I, L : integer ;
    Access, Device, Nam, Node, Node2, Path, FType, Version, Root : string ;
    Access_Offset, Device_Offset, Name_Offset, Path_Offset, Type_Offset, Version_Offset : integer ;
    Descriptors : PScan_Descriptor_Array ;
    _Offset : integer ;
    Res : int64 ;
    S : string ;
    SRB : PSRB ;

This function exists in Starlet and implements the FILESCAN system service. First we define a TScan_Descriptor_Array type (and a pointer to it) so that the passed descriptor list is easier to access in code. It allows up to 65,536 descriptors in the list, which is far, far more than will ever be needed here.

begin
    // Setup...
    if( ItemList = 0 ) then
    begin
        exit ;
    end ;
    SRB := PSRB( Name ) ;
    S := Get_String( SRB^ ) ;
    Descriptors := PScan_Descriptor_Array( pointer( Itemlist ) ) ;
    _Offset := 0 ;
    while( pos( copy( S, _Offset + 1, 1 ), ' '#9 ) > 0 ) do
    begin
        inc( _Offset ) ;
    end ;

If no item list was passed (Itemlist is 0), we return immediately. Otherwise we get the file specification string from the SRB and point our descriptor array at the passed item list. Then we skip past any leading spaces and tabs. _Offset is the offset from the start of the string at which the actual file specification begins.

    // Parse the string...
    Parse_Filename( copy( S, _Offset + 1, length( S ) ), Node, Access, Node2, Device, Path, Nam, FType, 
        Version ) ;
    if( Auxout <> 0 ) then
    begin
        SRB := PSRB( Auxout ) ;
        _Offset := 0 ; // The auxiliary buffer contains no leading whitespace
        S := Node + Device + Path + Nam + FType + Version ;
        if( length( S ) > SRB.Length ) then
        begin
            setlength( S, SRB.Length ) ;
        end ;
        move( PChar( S )[ 0 ], PChar( SRB.Buffer )[ 0 ], length( S ) ) ;
        if( Retlen <> 0 ) then
        begin
            Res := length( S ) ;
            move( Res, Pchar( Retlen )[ 0 ], sizeof( Res ) ) ;
        end ;
    end ;
    Access_Offset := length( Node ) - 2 ;
    Device_Offset := length( Node ) + length( Access ) + length( Node2 ) ;
    Path_Offset := Device_Offset + length( Device ) ;
    Name_Offset := Path_Offset + length( Path ) ;
    Type_Offset := Name_Offset + length( Nam ) ;
    Version_Offset := Type_Offset + length( FType ) ;

Next we call Parse_Filename to do the actual parsing (covered later in this article). If Auxout was provided, we build the full specification from its component fields, truncate it if it is longer than the auxiliary buffer, copy it into that buffer, and write its length to the Retlen address if one was provided. Whether or not Auxout was provided, at this point SRB points to the structure whose buffer address we will use as the base for the addresses written to the descriptors. Finally, we calculate the offset of each field from the start of the specification.
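The Auxout step can be sketched on its own as follows. This is a simplified, hypothetical Copy_Spec routine that uses an ordinary character array in place of an SRB; it concatenates the fields, truncates the result to the buffer size, copies it, and returns the number of characters written (the value that would be stored at Retlen).

```pascal
program Auxout_Copy ;

{$mode objfpc}{$H+}

// Hypothetical sketch of the Auxout step: join the fields, truncate to the
// buffer size, copy, and return the number of characters written.
function Copy_Spec( const Fields : array of string ; var Buffer : array of char ) : int64 ;

var S : string ;
    I, Capacity : integer ;

begin
    Capacity := high( Buffer ) + 1 ;
    S := '' ;
    for I := 0 to high( Fields ) do
    begin
        S := S + Fields[ I ] ;
    end ;
    if( length( S ) > Capacity ) then
    begin
        setlength( S, Capacity ) ; // Truncate to fit the buffer
    end ;
    if( length( S ) > 0 ) then
    begin
        move( S[ 1 ], Buffer[ 0 ], length( S ) ) ;
    end ;
    Result := length( S ) ; // This is what would be written to Retlen
end ;

var Buffer : array[ 0..9 ] of char ; // A deliberately small 10-character buffer
    Written : int64 ;
    Copied : string ;

begin
    Written := Copy_Spec( [ 'DKA0:', '\USERS\', 'REPORT', '.TXT', ';3' ], Buffer ) ;
    SetString( Copied, PChar( @Buffer[ 0 ] ), Written ) ;
    writeln( Written, ' ', Copied ) ; // 10 DKA0:\USER
end.
```

Note that only the device and part of the path survive the truncation; the remaining fields start beyond the buffer, which is exactly the situation Set_Descriptor guards against.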

    L := 2 ;
    while( L <= length( Path ) ) do
    begin
        if( Path[ L ] = '\' ) then
        begin
            Root := copy( Path, 1, L ) ;
            break ;
        end ;
        inc( L ) ;
    end ;

Next, we extract the root directory from the path. We start at the second character to skip the leading backslash (if present), and proceed until we find a backslash or reach the end of the path. If a backslash is found, Root is set to the path up to and including it; otherwise, Root remains empty.
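Here is the same loop pulled out into a standalone Free Pascal function so it can be tried in isolation. Extract_Root is a hypothetical name; the logic mirrors the code above.

```pascal
program Root_Of_Path ;

{$mode objfpc}{$H+}

// Return the path up to and including the first backslash found at or after
// position 2, or '' if there is none.
function Extract_Root( const Path : string ) : string ;

var L : integer ;

begin
    Result := '' ;
    L := 2 ; // Skip a leading backslash, if present
    while( L <= length( Path ) ) do
    begin
        if( Path[ L ] = '\' ) then
        begin
            Result := copy( Path, 1, L ) ;
            break ;
        end ;
        inc( L ) ;
    end ;
end ;

begin
    writeln( Extract_Root( '\USERS\SMITH\' ) ) ; // \USERS\
    writeln( Extract_Root( 'USERS\SMITH\' ) ) ;  // USERS\
    writeln( Extract_Root( '\' ) ) ;             // (empty line)
end.
```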

    // Return addresses...
    I := 0 ;
    while( I <= 65535 ) do
    begin
        case Descriptors^[ I ].Code of
            FSCN_FILESPEC : Set_Descriptor( I, 0, Node + Access + Node2 + Device + Path + Nam + FType + Version ) ;
            FSCN_NODE : Set_Descriptor( I, 0, Node + Access ) ;
            FSCN_NODE_ACS : Set_Descriptor( I, Access_Offset, Access ) ;
            FSCN_NODE_PRIMARY : Set_Descriptor( I, 0, copy( Node, 1, length( Node ) - 2 ) ) ; // Excluding ::
            FSCN_NODE_SECONDARY : Set_Descriptor( I, 0, '' ) ; //todo
            FSCN_DEVICE : Set_Descriptor( I, Device_Offset, Device ) ;
            FSCN_ROOT : Set_Descriptor( I, Path_Offset, Root ) ;
            FSCN_DIRECTORY : Set_Descriptor( I, Path_Offset, Path ) ;
            FSCN_NAME : Set_Descriptor( I, Name_Offset, Nam ) ;
            FSCN_TYPE : Set_Descriptor( I, Type_Offset, FType ) ;
            FSCN_VERSION : Set_Descriptor( I, Version_Offset, Version ) ;
            else break ; // Item code 0 (or any unknown code) ends the list
        end ;
        inc( I ) ;
    end ;

Next, we iterate through the passed descriptors until we hit a terminator (any code other than one of the FSCN_ constants) or reach the 65,536th one, calling the local Set_Descriptor procedure to write the value for each descriptor.

    if( Fldflgs <> 0 ) then
    begin
        Res := 0 ;
        if( length( Node ) <> 0 ) then
        begin
            Res := FSCN_V_NODE ;
        end ;
        if( length( Access ) <> 0 ) then
        begin
            Res := Res or FSCN_V_NODE_ACS ;
        end ;
        if( length( Node ) <> 0 ) then
        begin
            Res := Res or FSCN_V_NODE_PRIMARY ;
        end ;
        // Res := Res or FSCN_V_NODE_SECONDARY ; //todo
        if( length( Device ) <> 0 ) then
        begin
            Res := Res or FSCN_V_DEVICE ;
        end ;
        if( length( Path ) <> 0 ) then
        begin
            Res := Res or FSCN_V_ROOT or FSCN_V_DIRECTORY ;
        end ;
        if( length( Nam ) <> 0 ) then
        begin
            Res := Res or FSCN_V_NAME ;
        end ;
        if( length( FType ) <> 0 ) then
        begin
            Res := Res or FSCN_V_TYPE ;
        end ;
        if( length( Version ) <> 0 ) then
        begin
            Res := Res or FSCN_V_VERSION ;
        end ;
        move( Res, PChar( Fldflgs )[ 0 ], sizeof( Res ) ) ;
    end ; // if( Fldflgs <> 0 )
end ; // LIB_SYS_FILESCAN

Finally, if Fldflgs was specified, we construct the bitmask and then write it to the specified address. A given flag is set if the corresponding field is non-null.
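The following sketch shows the same idea in isolation, along with how a caller might test individual bits. The FSCN_V_ values here are placeholders invented for the example (the real constants are defined elsewhere in UOS and may use different bit positions), and only the device, directory, name, type, and version flags are shown; the code above additionally sets FSCN_V_ROOT for a non-null path and handles the node flags.

```pascal
program Field_Flags ;

{$mode objfpc}{$H+}

// Placeholder bit values invented for this example; the real FSCN_V_
// constants are defined by UOS and may differ.
const FSCN_V_DEVICE    = 1 shl 0 ;
      FSCN_V_DIRECTORY = 1 shl 1 ;
      FSCN_V_NAME      = 1 shl 2 ;
      FSCN_V_TYPE      = 1 shl 3 ;
      FSCN_V_VERSION   = 1 shl 4 ;

// Set one bit for each non-null field, as LIB_SYS_FILESCAN does.
function Build_Flags( const Device, Path, Name, FType, Version : string ) : int64 ;

begin
    Result := 0 ;
    if( Device <> '' ) then Result := Result or FSCN_V_DEVICE ;
    if( Path <> '' ) then Result := Result or FSCN_V_DIRECTORY ;
    if( Name <> '' ) then Result := Result or FSCN_V_NAME ;
    if( FType <> '' ) then Result := Result or FSCN_V_TYPE ;
    if( Version <> '' ) then Result := Result or FSCN_V_VERSION ;
end ;

var Flags : int64 ;

begin
    Flags := Build_Flags( '', '\USERS\', 'REPORT', '.TXT', '' ) ;
    // A caller tests individual fields with "and"
    if( ( Flags and FSCN_V_DEVICE ) = 0 ) then
    begin
        writeln( 'No device was specified' ) ;
    end ;
    if( ( Flags and FSCN_V_TYPE ) <> 0 ) then
    begin
        writeln( 'A file type was specified' ) ;
    end ;
end.
```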

    procedure Set_Descriptor( Index, Offset : integer ; const S : string ) ;

    begin
        if( ( length( S ) = 0 ) or ( Offset + _Offset >= SRB.Length ) ) then
        begin
            Descriptors^[ Index ].Address := 0 ;
            Descriptors^[ Index ].Length := 0 ;
        end else
        begin
            Descriptors^[ Index ].Address := SRB^.Buffer + Offset + _Offset ;
            Descriptors^[ Index ].Length := length( S ) ;
        end ;
    end ;

This local procedure writes descriptor number Index. Offset is the field's offset from the start of the specification, and _Offset is the number of leading whitespace characters skipped. If the passed field value is null, or if it starts at or beyond the end of the SRB buffer, we write 0 to both the address and the length. Otherwise, we write the length and an address built from the SRB's buffer address plus the two offsets.

Note: You might be wondering how the offset could ever be beyond the end of the SRB buffer, since that buffer holds the specification passed to us. But remember that if Auxout is specified, we write the file specification to that buffer and return addresses within it instead of within the buffer passed to us. The buffer specified for Auxout may be too small to hold the entire specification, in which case some fields may start beyond the portion that fit. For those fields, we write 0 rather than an address that is not part of the specification and may very well be invalid memory.
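The bounds test can be illustrated with offsets relative to the buffer instead of raw addresses. In this hypothetical Describe function, an Offset of -1 stands in for the 0 address that Set_Descriptor writes for a missing field. As in the original, only the starting offset is checked, so a field that starts inside the buffer but runs past its end is still reported with its full length.

```pascal
program Descriptor_Bounds ;

{$mode objfpc}{$H+}

// Hypothetical result type: Offset -1 stands in for the 0 address
// that Set_Descriptor writes for a missing field.
type TField = record
         Offset : longint ;
         Length : longint ;
     end ;

// Same test as Set_Descriptor: an empty field, or one that starts at or past
// the end of the buffer, is reported as missing.
function Describe( Offset : integer ; const Value : string ; Buffer_Length : integer ) : TField ;

begin
    if( ( length( Value ) = 0 ) or ( Offset >= Buffer_Length ) ) then
    begin
        Result.Offset := -1 ;
        Result.Length := 0 ;
    end else
    begin
        Result.Offset := Offset ;
        Result.Length := length( Value ) ;
    end ;
end ;

var F : TField ;

begin
    // A 10-character Auxout buffer holding the truncated 'DKA0:\USER'
    F := Describe( 0, 'DKA0:', 10 ) ;
    writeln( F.Offset, ' ', F.Length ) ; // 0 5
    F := Describe( 5, '\USERS\', 10 ) ;
    writeln( F.Offset, ' ', F.Length ) ; // 5 7 (runs past the buffer end)
    F := Describe( 12, 'REPORT', 10 ) ;
    writeln( F.Offset, ' ', F.Length ) ; // -1 0
end.
```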

procedure Parse_Filename( const S : string ;
    var Node, Access, Secondary_Node, Device, Path, Name, Extension, Version : string ) ;

var I, L, Last, P : integer ;
    In_Quotes : boolean ;

begin
    // Setup
    In_Quotes := False ;
    I := 0 ;

This procedure does the actual file specification parsing. It is placed in the UOS_Util unit, which is used by both the FIP and Starlet units. Thus, although it exists in two places in compiled form, there is only a single source for it.

    // Find node name...
    Access := '' ;
    I := 1 ;
    Node := Parse_Field_Until( '::', ':' ) ;
    if( pos( '::', Node ) = 0 ) then // No node
    begin
        Node := '' ;
        I := 1 ;
    end else
    begin
        P := pos( '"', Node ) ;
        if( P > 0 ) then
        begin
            Access := copy( Node, P, length( Node ) ) ;
            setlength( Node, P - 1 ) ;
            Node := Node + '::' ;
            setlength( Access, length( Access ) - 2 ) ;
        end ;
    end ;

First, we get the node by calling Parse_Field_Until. If the result contains no double colon, there is no node name, so we clear it and reset our string index (I) to the beginning of the string. Otherwise, we look for a quote within the node name. If one is found, it marks the start of an access control string, so we move that portion of the field (minus the trailing double colon) into Access, truncate Node at the quote, and re-append the double colon.
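The access control string split can be tried on its own with the sketch below. Split_Access is a hypothetical procedure containing the same steps as the code above; its input is the text Parse_Field_Until returns for the quoted specification "abcd""efg"""::, after the doubled quotes have been reduced.

```pascal
program Split_Node ;

{$mode objfpc}{$H+}

// Hypothetical standalone version of the node-splitting step. On entry Node
// holds the reduced node field, including its trailing '::'.
procedure Split_Access( var Node, Access : string ) ;

var P : integer ;

begin
    Access := '' ;
    P := pos( '"', Node ) ;
    if( P > 0 ) then
    begin
        Access := copy( Node, P, length( Node ) ) ; // '"efg"::'
        setlength( Node, P - 1 ) ;                  // 'abcd'
        Node := Node + '::' ;                       // 'abcd::'
        setlength( Access, length( Access ) - 2 ) ; // Drop '::', leaving '"efg"'
    end ;
end ;

var Node, Access : string ;

begin
    Node := 'abcd"efg"::' ;
    Split_Access( Node, Access ) ;
    writeln( Node, ' ', Access ) ; // abcd:: "efg"
end.
```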

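    // Device...
    L := I ;
    Device := Parse_Field_Until( ':', '\' ) ;
    if( copy( Device, length( Device ), 1 ) <> ':' ) then // No device
    begin
        Device := '' ;
        I := L ; // Rescan this part of the specification as a path
    end ;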

    // Path...
    L := I ;
    Path := Parse_Field_While( '\', ' ', True ) ;

Next we parse the device name from the specification, and then the path. Note the two different parse functions. We will cover them later in the article, but they are largely the same. The difference is that Parse_Field_Until processes the string until the specified terminator is found, while Parse_Field_While processes the entire field and returns everything through the last instance of the specified separator. Thus, a device name ends as soon as a colon is encountered, while the path ends with the last backslash in the specification.
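To make the difference between the two styles concrete, here are simplified Free Pascal sketches that ignore quoting and commas. Scan_Until and Scan_Through_Last are hypothetical names, and Scan_Through_Last treats its separator as required; the real Parse_Field_Until and Parse_Field_While are shown later in the article.

```pascal
program Until_Versus_While ;

{$mode objfpc}{$H+}

// "Until": stop at the first Terminator and include it. Returns '' and
// restores I if the terminator is not found before a space.
function Scan_Until( const S : string ; var I : integer ; Terminator : char ) : string ;

var Start : integer ;

begin
    Start := I ;
    while( ( I <= length( S ) ) and ( S[ I ] <> ' ' ) ) do
    begin
        if( S[ I ] = Terminator ) then
        begin
            inc( I ) ;
            Result := copy( S, Start, I - Start ) ;
            exit ;
        end ;
        inc( I ) ;
    end ;
    Result := '' ; // Terminator not found: the field is absent
    I := Start ;
end ;

// "While": scan the whole field and return everything through the LAST
// Separator, leaving I just after it.
function Scan_Through_Last( const S : string ; var I : integer ; Separator : char ) : string ;

var Start, Last : integer ;

begin
    Start := I ;
    Last := 0 ;
    while( ( I <= length( S ) ) and ( S[ I ] <> ' ' ) ) do
    begin
        if( S[ I ] = Separator ) then
        begin
            Last := I ;
        end ;
        inc( I ) ;
    end ;
    if( Last = 0 ) then
    begin
        Result := '' ; // Separator never seen (treated as required here)
        I := Start ;
    end else
    begin
        Result := copy( S, Start, Last - Start + 1 ) ;
        I := Last + 1 ;
    end ;
end ;

var S : string ;
    I : integer ;

begin
    S := 'DKA0:\USERS\SMITH\REPORT.TXT' ;
    I := 1 ;
    writeln( Scan_Until( S, I, ':' ) ) ;        // DKA0:
    writeln( Scan_Through_Last( S, I, '\' ) ) ; // \USERS\SMITH\
    writeln( copy( S, I, length( S ) ) ) ;      // REPORT.TXT
end.
```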

    // Name...
    L := I ;
    Name := Parse_Field_While( '.', ';') ;
    if( copy( Name, length( Name ), 1 ) = '.' ) then
    begin
        setlength( Name, length( Name ) - 1 ) ; // Trim dot
        dec( I ) ;
    end ;

Next we parse the file name, which ends with the last dot encountered. Parse_Field_While includes the dot, so if the last character of the name is a dot, we trim it and decrement the string index so that it points at the dot that begins the file type. We have to check for a trailing dot because the file specification may contain no dot at all.
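The last-dot rule can be sketched in isolation as follows. Split_Name_Type is a hypothetical procedure that assumes its input holds only the name and type (no path or version).

```pascal
program Name_Type_Split ;

{$mode objfpc}{$H+}

// Split "name.type" at the LAST dot. Everything before the last dot is the
// name; the type keeps its leading dot. With no dot, it is all name.
procedure Split_Name_Type( const S : string ; out Name, FType : string ) ;

var I, Last : integer ;

begin
    Last := 0 ;
    for I := 1 to length( S ) do
    begin
        if( S[ I ] = '.' ) then
        begin
            Last := I ;
        end ;
    end ;
    if( Last = 0 ) then // No dot: the whole string is the name
    begin
        Name := S ;
        FType := '' ;
    end else
    begin
        Name := copy( S, 1, Last - 1 ) ;
        FType := copy( S, Last, length( S ) ) ;
    end ;
end ;

var Name, FType : string ;

begin
    Split_Name_Type( 'ARCHIVE.TAR.GZ', Name, FType ) ;
    writeln( Name, ' | ', FType ) ; // ARCHIVE.TAR | .GZ
    Split_Name_Type( 'README', Name, FType ) ;
    writeln( Name, ' | ', FType ) ; // README, with an empty type
end.
```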

    // Type...
    L := I ;
    Extension := Parse_Field_While( ';' , ' ' ) ;
    if( Last > 0 ) then // Semicolon found
    begin
        if( Valid_Version( Last, I ) ) then
        begin
            Extension := copy( S, L, Last - L ) ;
            I := Last ;
        end else
        begin
            Extension := Extension + Parse_Field_Until( ' ', ' ' ) ;
        end ;
    end ;

Now we parse the file type (extension). The parsing ends at the last semicolon. But if the characters following the semicolon do not form an integer, they are not a version but part of the extension. This is checked with the Valid_Version function. If the version isn't valid, we add the rest to the extension by calling Parse_Field_Until with a space as the terminator. Since spaces already end parsing, this essentially parses to the end of the file specification, putting all of the remaining specification into the extension.

    // Version...
    Version := '' ;
    L := I ;
    if( I <= length( S ) ) then
    begin
        if( S[ I ] = ';' ) then // Found version
        begin
            Version := ';' ;
            inc( I ) ;
            if( copy( S, I, 1 ) = '-' ) then // copy avoids indexing past the end after a trailing ';'
            begin
                inc( I ) ;
                Version := Version + '-' ;
            end ;
            while( ( I <= length( S ) ) and ( pos( S[ I ], '0123456789' ) > 0 ) ) do
            begin
                Version := Version + S[ I ] ;
                inc( I ) ;
            end ;
        end ;
    end ; // if( I <= length( S ) )
end ; // Parse_Filename

Finally, if a semicolon follows the type, we collect the version by iterating through the rest of the specification, stopping at its end or at the first character that cannot be part of an integer. The integer can be negative, so the first character after the semicolon may be a dash (-).

    function Valid_Version( Starting, Ending : integer ) : boolean ;

    begin
        Result := False ;
        if( Ending > length( S ) ) then
        begin
            Ending := length( S ) ;
        end ;
        if( S[ Starting ] <> ';' ) then
        begin
            exit ;
        end ;
        inc( Starting ) ;
        if( copy( S, Starting, 1 ) = '-' ) then
        begin
            inc( Starting ) ;
        end ;
        while( Starting <= Ending ) do
        begin
            if( pos( copy( S, Starting, 1 ), '0123456789' ) = 0 ) then
            begin
                exit ;
            end ;
            inc( Starting ) ;
        end ;
        Result := True ;
    end ;

This local function checks whether the characters from Starting through Ending form a valid version field: a semicolon, an optional dash, and then only digits.
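The same rule can be expressed as a standalone check on a complete version string. Is_Valid_Version is a hypothetical name; like the original, it accepts a bare semicolon (and a semicolon followed only by a dash).

```pascal
program Version_Check ;

{$mode objfpc}{$H+}

// A semicolon, an optional dash, then nothing but digits.
function Is_Valid_Version( const V : string ) : boolean ;

var I : integer ;

begin
    Result := False ;
    if( copy( V, 1, 1 ) <> ';' ) then
    begin
        exit ;
    end ;
    I := 2 ;
    if( copy( V, I, 1 ) = '-' ) then
    begin
        inc( I ) ;
    end ;
    while( I <= length( V ) ) do
    begin
        if( pos( V[ I ], '0123456789' ) = 0 ) then
        begin
            exit ;
        end ;
        inc( I ) ;
    end ;
    Result := True ;
end ;

begin
    writeln( Is_Valid_Version( ';3' ) ) ;   // TRUE
    writeln( Is_Valid_Version( ';-1' ) ) ;  // TRUE
    writeln( Is_Valid_Version( ';BAK' ) ) ; // FALSE
end.
```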

    function Parse_Field_Until( const Terminator, Next_Terminator : string ) : string ;

    begin
        // Find field...
        Result := '' ;
        while( I <= length( S ) ) do
        begin
            if( S[ I ] = '"' ) then
            begin
                if( copy( S, I + 1, 1 ) = '"' ) then // ""
                begin
                    if( In_Quotes ) then
                    begin
                        Result := Result + '"' ;
                    end ;
                    inc( I ) ;
                end else
                begin
                    In_Quotes := not In_Quotes ;
                end ;
            end else
            if( In_Quotes ) then
            begin
                Result := Result + S[ I ] ;
            end else
            if( ( S[ I ] = ' ' ) or ( S[ I ] = ',' ) or ( S[ I ] = HT ) ) then
            begin
                break ;
            end else
            if( copy( S, I, length( Terminator ) ) = Terminator ) then
            begin
                I := I + length( Terminator ) ;
                Result := Result + Terminator ;
                break ;
            end else
            if( S[ I ] = Next_Terminator ) then // Found terminator for next field, not this one
            begin
                Result := '' ;
                break ;
            end else
            begin
                Result := Result + S[ I ] ;
            end ;
            inc( I ) ;
        end ; // while( I <= length( S ) )
    end ; // .Parse_Field_Until

This local function parses the file specification one character at a time. If a quote is encountered, we toggle the quote flag, unless the next character is also a quote, in which case the pair is treated as a single literal quote (when we are inside quotes). While inside quotes, every other character, including terminators, is taken literally. Otherwise, if a space, comma, or tab is found, we end the processing. Otherwise, if we encounter the terminator, we include it in the current field and exit. Otherwise, if we encounter the next field's terminator, we exit with a null field, because encountering the next field's terminator before this field's terminator means that this field is not present. If we make it through this if...then gauntlet, we add the character to the result and loop to the next character.
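The quote-reduction part of this logic can be isolated as shown below. Reduce_Quotes is a hypothetical function that processes the whole string and ignores terminators, which the real routine does not; it shows how the doubled-quote node specification from earlier in the article reduces to its literal form.

```pascal
program Reduce_Quotes_Demo ;

{$mode objfpc}{$H+}

// A lone '"' toggles quoting and is dropped; '""' inside quotes yields one
// literal '"' (and is dropped outside quotes, as in Parse_Field_Until).
function Reduce_Quotes( const S : string ) : string ;

var I : integer ;
    In_Quotes : boolean ;

begin
    Result := '' ;
    In_Quotes := False ;
    I := 1 ;
    while( I <= length( S ) ) do
    begin
        if( S[ I ] = '"' ) then
        begin
            if( copy( S, I + 1, 1 ) = '"' ) then // Doubled quote
            begin
                if( In_Quotes ) then
                begin
                    Result := Result + '"' ;
                end ;
                inc( I ) ; // Skip the second quote of the pair
            end else
            begin
                In_Quotes := not In_Quotes ;
            end ;
        end else
        begin
            Result := Result + S[ I ] ;
        end ;
        inc( I ) ;
    end ;
end ;

begin
    writeln( Reduce_Quotes( '"abcd""efg"""' ) ) ; // abcd"efg"
end.
```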

    function Parse_Field_While( Separator, Next_Separator : string ; Required : boolean = False ) : string ;

    begin
        Last := 0 ;
        Result := '' ;
        while( I <= length( S ) ) do
        begin
            if( S[ I ] = '"' ) then
            begin
                if( copy( S, I + 1, 1 ) = '"' ) then // ""
                begin
                    if( In_Quotes ) then
                    begin
                        Result := Result + '"' ;
                    end ;
                    inc( I ) ;
                end else
                begin
                    In_Quotes := not In_Quotes ;
                end ;
            end else
            if( In_Quotes ) then
            begin
                Result := Result + S[ I ] ;
            end else
            if( S[ I ] = Separator ) then
            begin
                Result := Result + Separator ;
                Last := I ;
            end else
            if( S[ I ] = Next_Separator ) then
            begin
                break ;
            end else
            if( ( S[ I ] = ' ' ) or ( S[ I ] = ',' ) or ( S[ I ] = HT ) ) then
            begin
                break ;
            end else
            begin
                Result := Result + S[ I ] ;
            end ;
            inc( I ) ;
        end ; // while( I <= length( S ) )
        if( pos( Separator, Result ) = 0 ) then // No separator
        begin
            if( Required ) then // Separator is required
            begin
                Result := '' ;
                I := L ;
            end ;
        end else
        begin
            Result := copy( S, L, Last - L + 1 ) ;
            I := Last + 1 ;
        end ;
        if( I < 1 ) then
        begin
            I := 1 ;
        end ;
    end ; // .Parse_Field_While

This local function works much like Parse_Field_Until. However, we process to the end of the field (the next field's separator, whitespace, a comma, or the end of the specification), keeping track of the position of the last separator encountered. Then we return everything up to, and including, that separator. There are two possibilities: either the separator is required (as with a path) or it is not. If the separator is required but not found, the field is considered to have been omitted, so a null string is returned and the string index is reset. If the separator is not required, the field simply continues to the end of the specification or to the next field's separator. An example is the file name: it ends with an extension separator (.), if one is found, but the lack of an extension simply means that the name continues to the end of the specification or until the specified next separator.

In the next article, we will look at the SYS_PARSE service.