Unicode revisited
We discussed Unicode in depth back in article 17, where we introduced the TUnicode_String class that we used for file processing. Those strings are static, with a fixed maximum length of 384 UTF-32 characters. While this works fine for our needs up to this point, and is faster than using a dynamic-length string, we need something of more general utility for future uses. So, we've renamed the TUnicode_String class to TStatic_Unicode_String, and created a new class named TUnicode_String. This new class has the same methods (plus a few extra that we'll cover later in the article). The main difference is that the Contents array is dynamic in this class - it is resized as necessary.

There are different ways we could have handled the new class - including making the new one a descendant of the previous one, or having them both descend from a common ancestor. However, virtualizing and generalizing to make this happen would result in additional overhead. For the file system, we're concerned about performance, so we will leave the old static string class the way it is and the file processing will continue to make use of it.
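To make the dynamic storage scheme concrete, here is a minimal sketch of a length-prefixed dynamic array. It assumes the layout used by the methods later in this article, where element 0 holds the length and elements 1 through the length hold the UTF-32 values. The names TCode_Points, Stored_Length, Appended, and Char_At are hypothetical and are not part of the UOS sources.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch of the storage scheme; not the UOS source.
// [ 0 ] = length, [ 1..length ] = UTF-32 values
type TCode_Points = array of cardinal ;

// Number of characters held in the buffer (0 for an unallocated buffer)
function Stored_Length( const B : TCode_Points ) : integer ;

begin
    Result := 0 ;
    if( length( B ) > 0 ) then
    begin
        Result := B[ 0 ] ;
    end ;
end ;


// Return a copy of B with Value appended, growing the array by one element
function Appended( const B : TCode_Points ; Value : cardinal ) : TCode_Points ;

var L : integer ;

begin
    L := Stored_Length( B ) ;
    Result := copy( B, 0, length( B ) ) ; // Independent copy of the source
    setlength( Result, L + 2 ) ; // Length element + L old values + 1 new
    Result[ 0 ] := L + 1 ;
    Result[ L + 1 ] := Value ;
end ;


// Bounds-checked read: 0 for any index outside 1..length
function Char_At( const B : TCode_Points ; Index : integer ) : cardinal ;

begin
    Result := 0 ;
    if( ( Index >= 1 ) and ( Index <= Stored_Length( B ) ) ) then
    begin
        Result := B[ Index ] ;
    end ;
end ;
```

A static string would instead declare the array with a fixed upper bound and refuse to grow past it, which avoids the reallocation that setlength may perform on each append.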

Here is the new TUnicode_String class (the new Compare method is described later). Apart from the new methods, we shan't describe the code in any detail, as it is almost identical to the old class with some dynamic array handling added.

type TUnicode_String = class
                           public // Constructors and destructors...
                               constructor Create ;
                               destructor Destroy ; override ;

                           private // Instance data...
                               Has_Asterisk : boolean ;
                               Contents : array of cardinal ;

                           protected // Property handlers...
                               // Return length of our contents...
                               function Get_Length : integer ;
                               procedure Set_Length( Value : integer ) ;

                           public // API...
                               procedure Append( S : TUnicode_String ) ;
                                   overload ;

                               procedure Append( S : string ) ;
                                   overload ;

                               procedure Append( S : cardinal ) ;
                                   overload ;

                               function As_String : string ;

                               // Assign our contents from a UTF8 string...
                               procedure Assign_From_String( const S : string ;
                                   Format : integer ) ;

                               // Remove Len characters starting at Index
                               procedure Delete( Index, Len : integer ) ;

                               { Compare our substring with the passed
                                 substring.  Wildcard_Start indicates the start
                                 position in our contents and _Length is the
                                 length of the substring.  Match_Start indicates
                                 the start in the compared string.  Result:
                                     -1 = Less than Match
                                     0 = Equals Match
                                     1 = Greater than Match }
                               function Compare( Wildcard_Start, _Length : integer ;
                                   Match : TUnicode_String ;
                                   Match_Start : integer ) : integer ;

                               // Return a new string holding Len characters, beginning at Start
                               function Copy( Start, Len : integer ) : TUnicode_String ;

                               // Return edited string...
                               function Edit( Options, Escape : cardinal ) : TUnicode_String ;

                               // Return true if our contents are equal to the match
                               function Equal( Match : TUnicode_String ) : boolean ;

                               // Return character at specific index
                               function Get_Char( Index : integer ) : cardinal ;

                               // Insert character at given position
                               procedure Insert( Position : integer ;
                                   Value : cardinal ) ;

                               // Convert our characters to lowercase...
                               procedure Lowercase ;

                               // Position of substring...
                               function Pos( const Value : string ;
                                   Start : integer = 1 ) : integer ; overload ;

                               function Pos( const Value : TUnicode_String ;
                                   Start : integer = 1 ) : integer ; overload ;

                               // Return rightmost instance of Value
                               function RPos( Value : char ) : integer ;

                               // Return Pos, considering wildcards
                               function Wildcard_Pos( Value : TUnicode_String ;
                                   Start : integer = 1 ) : integer ;

                           public // Properties...
                               property Length : integer
                                   read Get_Length
                                   write Set_Length ;
                       end ; // TUnicode_String

And here are the updated methods.

// TUnicode_String methods...

// Constructors and destructors...

constructor TUnicode_String.Create ;

begin
    inherited Create ;

    setlength( Contents, 1 ) ;
    Contents[ 0 ] := 0 ;
end ;


destructor TUnicode_String.Destroy ;

begin
    setlength( Contents, 0 ) ;

    inherited Destroy ;
end ;


// API...

function TUnicode_String.As_String : string ;

var Dummy, Loop : integer ;

begin
    System.setlength( Result, Length ) ;
    for Loop := 1 to Length do
    begin
        Dummy := Contents[ Loop ] ;
        if( Dummy > 127 ) then
        begin
            Dummy := Dummy or 128 ;
        end ;
        Result[ Loop ] := chr( Dummy ) ;
    end ;
end ;


procedure TUnicode_String.Assign_From_String( const S : string ;
    Format : integer ) ;

var Index, Size, Mask : integer ;
    Value : cardinal ;

begin
    Index := 1 ; // Index in S
    Contents[ 0 ] := 0 ;
    if( Format = ST_UTF8 ) then // UTF8
    begin
        while( Index <= system.length( S ) ) do
        begin
            Value := 0 ;
            if( S[ Index ] >= #$FC ) then // 1111110x: six bytes
            begin
                Size := 6 ;
                Mask := 1 ;
            end else
            if( S[ Index ] >= #$F8 ) then // 111110xx: five bytes
            begin
                Size := 5 ;
                Mask := 3 ;
            end else
            if( S[ Index ] >= #$F0 ) then // 11110xxx: four bytes
            begin
                Size := 4 ;
                Mask := 7 ;
            end else
            if( S[ Index ] >= #$E0 ) then // 1110xxxx: three bytes
            begin
                Size := 3 ;
                Mask := $F ;
            end else
            if( S[ Index ] >= #$C0 ) then // 110xxxxx: two bytes
            begin
                Size := 2 ;
                Mask := $1F ;
            end else
            begin
                Size := 1 ;
                Mask := $7F ;
            end ;
            while( Size > 0 ) do
            begin
                dec( Size ) ;
                Value := Value or ( ord( S[ Index ] ) and Mask ) ;
                if( Size > 0 ) then
                begin
                    Value := Value shl 6 ;
                end ;
                Mask := $3F ;
                inc( Index ) ;
            end ;
            inc( Contents[ 0 ] ) ;
            setlength( Contents, Contents[ 0 ] + 1 ) ;
            Contents[ Contents[ 0 ] ] := Value ;
        end ; // while( Index <= system.length( S ) )
    end else
    begin
        setlength( Contents, system.Length( S ) div Format + 1 ) ;
        Value := 0 ;
        Index := 1 ; // Index in S
        while( Index <= system.length( S ) ) do
        begin
            move( PChar( S )[ Index - 1 ], Value, Format ) ;
            Index := Index + Format ;
            inc( Contents[ 0 ] ) ;
            Contents[ Contents[ 0 ] ] := Value ;
        end ;
    end ; // if( Format = ST_UTF8 )
end ; // TUnicode_String.Assign_From_String


function TUnicode_String.Get_Length : integer ;

begin
    Result := Contents[ 0 ] ;
end ;


procedure TUnicode_String.Set_Length( Value : Integer ) ;

begin
    Contents[ 0 ] := Value ;
    setlength( Contents, Value + 1 ) ;
end ;


function TUnicode_String.Copy( Start, Len : integer ) : TUnicode_String ;

begin
    // Setup...
    Result := TUnicode_String.Create ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( Start < 1 ) then
    begin
        Start := 1 ;
    end ;
    if( Start + Len - 1 > Length ) then
    begin
        Len := Length - Start + 1 ;
    end ;

    Result.Length := Len ;
    move( Contents[ Start ], Result.Contents[ 1 ], Len * sizeof( cardinal ) ) ;
end ; // TUnicode_String.Copy


// Remove Len characters starting at Index
procedure TUnicode_String.Delete( Index, Len : integer ) ;

begin
    if( Index < 1 ) then
    begin
        Index := 1 ;
    end ;
    if( Index > Length ) then
    begin
        exit ;
    end ;
    if( Index + Len > Length ) then
    begin
        Length := Index - 1 ;
        exit ;
    end ;
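    // Shift the tail (everything after the deleted range) down over the gap
    move( Contents[ Index + Len ], Contents[ Index ],
        ( Length - Index - Len + 1 ) * sizeof( cardinal ) ) ;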
    Length := Length - Len ;
end ; // TUnicode_String.Delete


function TUnicode_String.Equal( Match : TUnicode_String ) : boolean ;

var Loop : integer ;

begin
    Result := False ;
    if( Length <> Match.Length ) then
    begin
        exit ;
    end ;
    for Loop := 1 to Length do
    begin
        if( ( Contents[ Loop ] <> Match.Contents[ Loop ] ) ) then
        begin
            if(
                ( Contents[ Loop ] <> ord( '?' ) )
                and
                ( Match.Contents[ Loop ] <> ord( '?' ) )
              ) then
            begin
                exit ;
            end ;
        end ;
    end ;
    Result := True ;
end ; // TUnicode_String.Equal


procedure TUnicode_String.Insert( Position : integer ; Value : cardinal ) ;

begin
    setlength( Contents, system.length( Contents ) + 1 ) ;
    move( Contents[ Position ], Contents[ Position + 1 ], 
        ( system.Length( Contents ) - Position - 1 ) * sizeof( cardinal ) ) ;
    Contents[ Position ] := Value ;
    inc( Contents[ 0 ] ) ;
end ;


procedure TUnicode_String.Lowercase ;

var Dummy, V : integer ;
    _Folding_Index : integer ;

begin
    Dummy := 1 ;
    while( Dummy <= Length ) do
    begin
        V := lowcase( Contents[ Dummy ], _Folding_Index ) ;
        if( ( V = 0 ) and ( _Folding_Index >= 0 ) ) then
        begin
            Contents[ Dummy ] := Foldings[ _Folding_Index, 1 ] ;
            for V := 2 to 3 do
            begin
                if( Foldings[ _Folding_Index, V ] <> 0 ) then
                begin
                    inc( Dummy ) ;
                    Insert( Dummy, Foldings[ _Folding_Index, V ] ) ;
                end ;
            end ;
        end else
        begin
            Contents[ Dummy ] := V ;
        end ;
        inc( Dummy ) ;
    end ;
end ; // TUnicode_String.Lowercase


function TUnicode_String.Pos( const Value : TUnicode_String ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( Value.Length > Length - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length - Value.Length + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to Value.Length do
        begin
            if(
                ( Value.Contents[ Dummy1 ] <> Contents[ Dummy1 + Dummy - 1 ] )
                and
                ( Contents[ Dummy1 + Dummy - 1 ] <> ord( '?' ) )
                and
                ( Value.Contents[ Dummy1 ] <> ord( '?' ) )
              ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ; // TUnicode_String.Pos


function TUnicode_String.Pos( const Value : string ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( System.Length( Value ) > Length - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length - system.length( Value ) + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to system.length( Value ) do
        begin
            if( ord( Value[ Dummy1 ] ) <> Contents[ Dummy1 + Dummy - 1 ] ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ; // TUnicode_String.Pos


function TUnicode_String.RPos( Value : char ) : integer ;

var Loop, V : cardinal ;

begin
    V := ord( Value ) ;
    for Loop := Length downto 1 do
    begin
        if( Contents[ Loop ] = V ) then
        begin
            Result := Loop ;
            exit ;
        end ;
    end ;
    Result := 0 ;
end ;


function TUnicode_String.Wildcard_Pos( Value : TUnicode_String ;
    Start : integer = 1 ) : integer ;

var Dummy, Dummy1 : integer ;
    Found : boolean ;

begin
    Result := 0 ;
    if( Start > Length ) then
    begin
        exit ;
    end ;
    if( Value.Length > Length - Start + 1 ) then
    begin
        exit ; // Substring is longer than our contents
    end ;
    for Dummy := Start to Length - Value.Length + 1 do
    begin
        Found := True ;
        for Dummy1 := 1 to Value.Length do
        begin
            if(
                ( Value.Contents[ Dummy1 ] <> Contents[ Dummy1 + Dummy - 1 ] )
                and
                ( Contents[ Dummy1 + Dummy - 1 ] <> ord( '?' ) )
                and
                ( Value.Contents[ Dummy1 ] <> ord( '?' ) )
              ) then
            begin
                Found := False ;
                break ;
            end ;
        end ; // for Dummy1
        if( Found ) then
        begin
            Result := Dummy ;
            exit ;
        end ;
    end ; // for Dummy
end ; // TUnicode_String.Wildcard_Pos
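
The UTF-8 branch of Assign_From_String is the densest part of the listing above, so here is a self-contained sketch of the same idea: the lead byte determines how many bytes the sequence occupies and how many payload bits it contributes, and each continuation byte adds six more bits. This is a simplified illustration rather than the class's own code. The names TCode_Points and Decode_UTF8 are hypothetical, the sketch stops at the four-byte forms used by current UTF-8, and it performs no validation of malformed input.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch; [ 0 ] = count, [ 1..count ] = decoded UTF-32 values
type TCode_Points = array of cardinal ;

// Decode UTF-8 bytes to UTF-32 values (no validation of malformed input)
function Decode_UTF8( const S : ansistring ) : TCode_Points ;

var Index, Size, Count : integer ;
    B, Value : cardinal ;

begin
    setlength( Result, length( S ) + 1 ) ; // Worst case: one value per byte
    Count := 0 ;
    Index := 1 ;
    while( Index <= length( S ) ) do
    begin
        B := ord( S[ Index ] ) ;
        if( B >= $F0 ) then // 11110xxx: four bytes, 3 payload bits
        begin
            Size := 4 ;
            Value := B and 7 ;
        end else
        if( B >= $E0 ) then // 1110xxxx: three bytes, 4 payload bits
        begin
            Size := 3 ;
            Value := B and $F ;
        end else
        if( B >= $C0 ) then // 110xxxxx: two bytes, 5 payload bits
        begin
            Size := 2 ;
            Value := B and $1F ;
        end else
        begin
            Size := 1 ; // Single byte
            Value := B and $7F ;
        end ;
        inc( Index ) ;

        // Fold in the continuation bytes, six bits each; stop at end of input
        while( ( Size > 1 ) and ( Index <= length( S ) ) ) do
        begin
            Value := ( Value shl 6 ) or ( ord( S[ Index ] ) and $3F ) ;
            inc( Index ) ;
            dec( Size ) ;
        end ;
        inc( Count ) ;
        Result[ Count ] := Value ;
    end ;
    Result[ 0 ] := Count ;
    setlength( Result, Count + 1 ) ; // Trim the unused tail
end ;
```

For example, the two bytes $C3 $A9 decode to the single value $E9, and the four bytes $F0 $9F $98 $80 decode to $1F600.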

Now, let's look at the new methods for this class.

procedure TUnicode_String.Append( S : TUnicode_String ) ;

var L : integer ;

begin
    if( S = nil ) then
    begin
        exit ;
    end ;
    L := Length + 1 ;
    Length := Length + S.Length ;
    move( S.Contents[ 1 ], Contents[ L ], S.Length * sizeof( cardinal ) ) ;
end ;


procedure TUnicode_String.Append( S : string ) ;

var I, L : integer ;

begin
    L := Length ;
    Length := Length + system.length( S ) ;
    for I := 1 to system.length( S ) do
    begin
        Contents[ L + I ] := ord( S[ I ] ) ;
    end ;
end ;


procedure TUnicode_String.Append( S : cardinal ) ;

begin
    Length := Length + 1 ;
    Contents[ Length ] := S ;
end ;
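These methods append to our own contents. The overloaded procedure has three versions: one takes a TUnicode_String, one takes a Pascal string, and one takes a single Unicode character value. Note that the Pascal string version copies the string one character at a time without decoding it, so each character of the string becomes one Unicode value.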
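
As a rough illustration of the string-to-string case, the sketch below joins two length-prefixed arrays using move, with the byte count computed from sizeof( cardinal ). It is written as standalone functions rather than methods so that it can be compiled on its own. TCode_Points, Count_Of, From_Text, and Joined are hypothetical names, and From_Text assumes 7-bit text.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch; element 0 of each array holds the character count.
type TCode_Points = array of cardinal ;

// Character count of a buffer (0 if it was never allocated)
function Count_Of( const B : TCode_Points ) : integer ;

begin
    Result := 0 ;
    if( length( B ) > 0 ) then
    begin
        Result := B[ 0 ] ;
    end ;
end ;


// Build a buffer from 7-bit text, one value per character
function From_Text( const S : ansistring ) : TCode_Points ;

var I : integer ;

begin
    setlength( Result, length( S ) + 1 ) ;
    Result[ 0 ] := length( S ) ;
    for I := 1 to length( S ) do
    begin
        Result[ I ] := ord( S[ I ] ) ;
    end ;
end ;


// Return the characters of A followed by the characters of B
function Joined( const A, B : TCode_Points ) : TCode_Points ;

var LA, LB : integer ;
    Temp : TCode_Points ; // Built separately so Result never aliases A or B

begin
    LA := Count_Of( A ) ;
    LB := Count_Of( B ) ;
    setlength( Temp, LA + LB + 1 ) ;
    Temp[ 0 ] := LA + LB ;
    if( LA > 0 ) then
    begin
        move( A[ 1 ], Temp[ 1 ], LA * sizeof( cardinal ) ) ;
    end ;
    if( LB > 0 ) then
    begin
        move( B[ 1 ], Temp[ LA + 1 ], LB * sizeof( cardinal ) ) ;
    end ;
    Result := Temp ;
end ;
```

The guards around move keep the sketch from indexing element 1 of an empty array, which matters if range checking is enabled.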

function TUnicode_String.Get_Char( Index : integer ) : cardinal ;

begin
    if( ( Index < 1 ) or ( Index > Length ) ) then
    begin
        Result := 0 ;
        exit ;
    end ;
    Result := Contents[ Index ] ;
end ;
This method simply provides a means of accessing the internal contents, one character at a time.

// Do a wildcard comparison...
function TUnicode_String.Compare( Wildcard_Start, _Length : integer ;
    Match : TUnicode_String ; Match_Start : integer ) : integer ;

var Loop : integer ;

begin
    // Setup...
    Result := 0 ;
    if( _Length < 1 ) then
    begin
        exit ;
    end ;
    if( Wildcard_Start < 1 ) then
    begin
        Wildcard_Start := 1 ;
    end ;
    if( Match_Start < 1 ) then
    begin
        Match_Start := 1 ;
    end ;
    if( Wildcard_Start + _Length > Length + 1 ) then
    begin
        _Length := Length - Wildcard_Start + 1 ;
    end ;
    if( Match_Start + _Length > Match.Length + 1 ) then
    begin
        _Length := Match.Length - Match_Start + 1 ;
    end ;

    // Do comparison...
    for Loop := 0 to _Length - 1 do
    begin
        if( Contents[ Wildcard_Start + Loop ] <>
            Match.Contents[ Match_Start + Loop ] ) then
        begin
            if(
                ( Contents[ Wildcard_Start + Loop ] <> ord( '?' ) )
                and
                ( Match.Contents[ Match_Start + Loop ] <> ord( '?' ) )
              ) then
            begin
                if( Contents[ Wildcard_Start + Loop ] < Match.Contents[ Match_Start + Loop ] ) then
                begin
                    Result := -1 ;
                end else
                begin
                    Result := 1 ;
                end ;
                exit ;
            end ;
        end ;
    end ;
end ; // TUnicode_String.Compare
This function is like the equivalent method in TStatic_Unicode_String. Unlike that method, however, this one doesn't return a boolean, but an integer value which indicates the following:
  • -1 = Our contents are less than the Match
  • 0 = Strings are equal
  • 1 = Our contents are greater than the Match
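After the setup, which clamps the start positions and the length to the bounds of both strings, we loop through the contents one character at a time. A "?" in either string matches any character. At the first real mismatch we return -1 or 1, and if the loop completes, the result remains 0.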
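
The same three-way range comparison can be sketched with ordinary Pascal strings, which makes it easy to try out. The function below is a hypothetical stand-in (Compare_Range is not a UOS routine). It keeps separate start offsets for the two strings and treats "?" in either one as matching any character.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch: compare Count characters of A, from A_Start, with
// B, from B_Start.  Result: -1 = A < B, 0 = equal, 1 = A > B
function Compare_Range( const A : ansistring ; A_Start : integer ;
    const B : ansistring ; B_Start, Count : integer ) : integer ;

var Loop : integer ;
    CA, CB : ansichar ;

begin
    Result := 0 ;

    // Clamp the starts and the count to the bounds of both strings...
    if( A_Start < 1 ) then
    begin
        A_Start := 1 ;
    end ;
    if( B_Start < 1 ) then
    begin
        B_Start := 1 ;
    end ;
    if( Count > length( A ) - A_Start + 1 ) then
    begin
        Count := length( A ) - A_Start + 1 ;
    end ;
    if( Count > length( B ) - B_Start + 1 ) then
    begin
        Count := length( B ) - B_Start + 1 ;
    end ;

    // Do comparison...
    for Loop := 0 to Count - 1 do
    begin
        CA := A[ A_Start + Loop ] ;
        CB := B[ B_Start + Loop ] ;
        if( ( CA <> CB ) and ( CA <> '?' ) and ( CB <> '?' ) ) then
        begin
            if( CA < CB ) then
            begin
                Result := -1 ;
            end else
            begin
                Result := 1 ;
            end ;
            exit ;
        end ;
    end ;
end ;
```

With this sketch, comparing three characters of 'a?c' from position 1 against 'xabc' from position 2 yields 0, because the "?" stands in for the 'b'.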

Now let's look at the standalone Compare function, which compares two Unicode strings in their entirety.

// Compare strings.  Result: 0 = equal, -1 = L < R, 1 = L > R
function Compare( L, R : TUnicode_String ; Wildcard : boolean ) : integer ;

var Dummy, Dummy1, I, Len : integer ;
    LC, RC : cardinal ;
    _L, _R, S, Temp : TUnicode_String ;
    _Has_Asterisk : boolean ;
    L_Start, R_Start : integer ;
    L_End, R_End : integer ;

begin
    // Setup...
    Result := 0 ; // Assume equal
    _L := L.Copy( 1, L.Length ) ;
    _R := R.Copy( 1, R.Length ) ;
    _Has_Asterisk := False ;
First, we make copies of the two strings and set up for the rest of the function.
    // Pre-normalize the specification...
    if( Wildcard ) then
    begin
        Dummy := _R.pos( '**' ) ;
        while( Dummy > 0 ) do
        begin
            _R.Delete( Dummy, 1 ) ;
            Dummy := _R.pos( '**' ) ;
        end ;
        if( _R.As_String = '*' ) then
        begin
            exit ; // Wildcard matches anything/everything
        end ;
        Dummy := _R.pos( '*?' ) ;
        while( Dummy > 0 ) do
        begin
            Temp := _R.copy( Dummy + 2, _R.length ) ;
            _R.Length := Dummy - 1 ; // Drop the "*?" itself before re-appending it as "?*"
            _R.Append( '?*' ) ;
            _R.Append( Temp ) ;
            Temp.Free ;
            Dummy := _R.pos( '*?' ) ;
        end ;
        _Has_Asterisk := _R.pos( '*' ) > 0 ;
    end ; // if( Wildcard )
The Compare function can perform wildcard or normal comparisons. If the Wildcard parameter is true, we do a wildcard comparison. In that case, we first normalize the specification (the right-hand string) to simplify the comparison code that follows. For instance, we convert double asterisks to single asterisks, and switch all "*?" to "?*". If an asterisk is present after normalization, we set the _Has_Asterisk flag.
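
To see the effect of the normalization in isolation, here is a small sketch that works on an ordinary Pascal string. Normalize_Wildcards is a hypothetical helper, not part of UOS. Note one deliberate difference from the listing above: the sketch moves the question marks first and collapses the asterisks afterwards, so that asterisks brought together by a swap are collapsed as well.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch: put every "?" ahead of a neighbouring "*" and
// collapse runs of "*" into a single "*"
function Normalize_Wildcards( const Spec : ansistring ) : ansistring ;

var P : integer ;

begin
    Result := Spec ;

    // Switch each "*?" to "?*"...
    P := pos( '*?', Result ) ;
    while( P > 0 ) do
    begin
        Result[ P ] := '?' ;
        Result[ P + 1 ] := '*' ;
        P := pos( '*?', Result ) ;
    end ;

    // Convert double asterisks to single asterisks...
    P := pos( '**', Result ) ;
    while( P > 0 ) do
    begin
        delete( Result, P, 1 ) ;
        P := pos( '**', Result ) ;
    end ;
end ;
```

For example, 'a**b' becomes 'a*b', 'a*?b' becomes 'a?*b', and '*?*' becomes '?*'.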

    Len := _L.Length ;
    if( Len > _R.Length ) then
    begin
        Len := _R.Length ;
    end ;
Next, we determine the maximum number of characters to compare by setting Len to the smaller of the two lengths. Without this, we might try to index beyond the end of one of the strings as we loop through the data.

    // Non-wildcard check...
    if( not _Has_Asterisk ) then
    begin
        for I := 1 to Len do
        begin
            LC := _L.Contents[ I ] ;
            RC := _R.Contents[ I ] ;
            if( LC <> RC ) then
            begin
                if( Wildcard ) then
                begin
                    if( LC = ord( '?' ) ) or ( RC = ord( '?' ) ) then
                    begin
                        continue ;
                    end ;
                end ;
                if( LC < RC ) then
                begin
                    Result := -1 ;
                    exit ;
                end else
                begin
                    Result := 1 ;
                    exit ;
                end ;
            end ; // if( LC <> RC )
        end ; // for I := 1 to Len

        // If we get here, they are equal up to position Len...
        if( L.Length <> R.Length ) then
        begin
            if( L.Length > R.Length ) then
            begin
                Result := 1 ;
            end else
            begin
                Result := -1 ;
            end ;
        end ;
        exit ;
    end ; // if( not _Has_Asterisk )
If there is no asterisk to deal with (either this isn't a wildcard comparison, or the specification contains no "*"), then we have a straightforward task. We loop through the contents, comparing each character. If the characters differ and this is a wildcard comparison, we still accept the position when either character is a "?". Otherwise, we set the result to -1 or 1, as appropriate, and exit.
If we get through the entire contents (up to Len) with everything being equal, we still aren't done. If the lengths of the strings differ, we set the result appropriately. On otherwise equal strings, the longer one will be "greater".
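
The ordering rules just described, including the tie-break on length, can be tried out with this sketch on ordinary Pascal strings. Compare_Text is a hypothetical name and handles only the case without asterisks.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch.  Result: 0 = equal, -1 = L < R, 1 = L > R
function Compare_Text( const L, R : ansistring ; Wildcard : boolean ) : integer ;

var I, Len : integer ;

begin
    Result := 0 ;

    // Only compare as many characters as both strings have
    Len := length( L ) ;
    if( Len > length( R ) ) then
    begin
        Len := length( R ) ;
    end ;
    for I := 1 to Len do
    begin
        if( L[ I ] <> R[ I ] ) then
        begin
            if( Wildcard and ( ( L[ I ] = '?' ) or ( R[ I ] = '?' ) ) ) then
            begin
                continue ; // "?" matches any single character
            end ;
            if( L[ I ] < R[ I ] ) then
            begin
                Result := -1 ;
            end else
            begin
                Result := 1 ;
            end ;
            exit ;
        end ;
    end ;

    // Equal up to Len: the longer string is the greater one
    if( length( L ) > length( R ) ) then
    begin
        Result := 1 ;
    end else
    if( length( L ) < length( R ) ) then
    begin
        Result := -1 ;
    end ;
end ;
```

So 'abc' sorts before 'abcd', and 'a?c' equals 'abc' only when Wildcard is true. With Wildcard false, the "?" is compared as an ordinary character.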

    // Do wildcard match...
    R_Start := 1 ;
    L_Start := 1 ;
    R_End := R.Length ;
    L_End := L.Length ;

    // Check prefix before first wildcard...
    Dummy := _R.pos( '*' ) ;
    if( Dummy > R_Start ) then // Something before the asterisk
    begin
        Result := R.Compare( R_Start, Dummy - R_Start, L, L_Start ) ;
        if( Result <> 0 ) then
        begin
            exit ;
        end ;
        R_Start := Dummy ;
        L_Start := Dummy ;
    end ; // if( Dummy > R_Start )

    // Check suffix after last wildcard...
    Dummy := _R.RPos( '*' ) ;
    if( Dummy < _R.Length ) then
    begin
        Result := R.Compare( Dummy + 1, R.Length - Dummy, L, L.Length - ( R.Length - Dummy ) + 1 ) ;
        if( Result <> 0 ) then
        begin
            exit ;
        end ;
        R_End := Dummy ;
        L_End := L.Length - ( R.Length - Dummy ) ;
    end ; // if( Dummy < R.Length )

    // Check for remaining matches, left-to-right...
    while( R_Start <= R_End ) do
    begin
        if( R_Start >= R_End ) then
        begin
            break ; // All that's left in the wildcard spec is an asterisk - we match
        end ;
        Dummy := _R.Pos( '*', R_Start + 1 ) ;
        S := R.Copy( R_Start + 1, Dummy - R_Start - 1 ) ;
        Dummy1 := _L.Wildcard_Pos( S, L_Start ) ;
        S.Free ;
        if( Dummy1 = 0 ) then
        begin
            exit ; // Not found
        end ;

        // Move past wildcard and matching characters...
        L_Start := Dummy1 + Dummy - R_Start - 1 ;
        R_Start := Dummy ; // Move past wildcard and matching characters
    end ; // while
end ; // Compare
We won't go over this code since it is almost identical to the _Compare function we discussed in article 17. The only differences are that it deals with the new TUnicode_String class and returns the -1/0/1 values rather than a boolean.
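
For readers who want to experiment with "*" and "?" matching without the rest of the class, here is a compact sketch using a different, well-known technique: a single left-to-right scan that backtracks to the most recent asterisk. It is not the prefix/suffix/middle approach used by Compare above, and it returns only a boolean rather than -1/0/1. Wildcard_Match is a hypothetical name, and the sketch relies on the default short-circuit boolean evaluation.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch: True if Name matches Spec, where "*" matches any
// run of characters (including none) and "?" matches exactly one
function Wildcard_Match( const Spec, Name : ansistring ) : boolean ;

var S, N : integer ; // Current positions in Spec and Name
    Star : integer ; // Position of the most recent "*" in Spec (0 = none)
    Mark : integer ; // Position in Name where that "*" began matching

begin
    S := 1 ;
    N := 1 ;
    Star := 0 ;
    Mark := 0 ;
    while( N <= length( Name ) ) do
    begin
        if( ( S <= length( Spec ) ) and ( Spec[ S ] = '*' ) ) then
        begin
            Star := S ; // Remember the asterisk; initially it matches nothing
            Mark := N ;
            inc( S ) ;
        end else
        if(
            ( S <= length( Spec ) )
            and
            ( ( Spec[ S ] = '?' ) or ( Spec[ S ] = Name[ N ] ) )
          ) then
        begin
            inc( S ) ;
            inc( N ) ;
        end else
        if( Star > 0 ) then
        begin
            inc( Mark ) ; // Let the last asterisk absorb one more character
            N := Mark ;
            S := Star + 1 ;
        end else
        begin
            Result := False ;
            exit ;
        end ;
    end ; // while

    // Only asterisks may remain in the specification
    while( ( S <= length( Spec ) ) and ( Spec[ S ] = '*' ) ) do
    begin
        inc( S ) ;
    end ;
    Result := S > length( Spec ) ;
end ;
```

For instance, '*.txt' matches 'notes.txt' and 'a*b*c' matches 'axxbyyc', while 'a?c' does not match 'ac'.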

Another new function is the Edit function:

function TUnicode_String.Edit( Options, Escape : cardinal ) : TUnicode_String ;

var AH, AL : cardinal ;
    Dummy : integer ;
    Escaped : boolean ;
    ESI : integer ;
    _Folding_Index : integer ;
    Quote_Type : cardinal ;
    Last : integer ;
    Leading : boolean ;
    OK : boolean ;
    Space : boolean ;
    V, V2, V3 : integer ;

begin
    Result := TUnicode_String.Create ;

    // Quick check...
    if( Length = 0 ) then // No edits on zero-length strings
    begin
        Exit ;
    end ;

    // Normalize the Options...
    if ( Options And ( 1024 or 64 ) ) = 1024 or 64 then
    begin
        Options := Options And Not 1024 ;
    end ;
    // Disallow [] to {} if [] to ()

    if ( Options And 6144 ) = 6144 then
    begin
        Options := Options And Not 4096 ;
    end ;
    // Disallow () to {} if () to []

    if ( Options And 24576 ) = 24576 then
    begin
        Options := Options And Not 16384 ;
    end ;
    // Disallow {} to [] if {} to ()

    // Setup...
    Space := False ; // No spaces
    Last := 0 ;
    Quote_Type := 0 ; // Not within significant quotes
    ESI := 0 ;
    Leading := True ;
    Escaped := False ;

    // Process string...
    while( ESI < Length ) do
    begin
        inc( ESI ) ; // Increment source string pointer
        OK := True ; // This byte is OK - so far
        AH := Contents[ ESI ] ; // Current character
        if( Quote_Type = 0 ) then // No quotes
        begin
            AL := AH ; // Save original value
            if(
                ( AH <> ord( ' ' ) )
                and
                ( AH <> _HT )
              ) then
            begin
                Leading := False ;
            end ;
            if( ( Options and 2 ) = 2 ) then // Remove all spaces/tabs
            begin
                if(
                    ( AH = ord( ' ' ) )
                    or
                    ( AH = _HT )
                  ) then
                begin
                    OK := False ;
                end ;
            end ;
            if( ( Options and 4 ) = 4 ) then // Ignore special values?
            begin
                if( ( AH = _NUL ) or ( AH = _LF ) or ( AH = _FF ) or ( AH = _CR ) or ( AH = _DEL ) ) then
                begin
                    OK := False ;
                end ;
            end ;
            if( ( Options and 8 ) = 8 ) then // Ignore leading spaces
            begin
                if( Leading ) then
                begin
                    OK := False ;
                end ;
            end ;
            if( ( Options and 16 ) = 16 ) then // Reduce tabs/spaces to single space
            begin
                if(
                    ( AH = ord( ' ' ) )
                    or
                    ( AH = _HT )
                  ) then
                begin
                    if( Space ) then
                    begin
                        OK := False ;
                    end else
                    begin
                        Space := True ;
                        if( AH = _HT ) then
                        begin
                            AH := ord( ' ' ) ;
                        end ;
                    end ;
                end ;
            end ;
            if( OK and ( ( Options and $40000 ) <> 0 ) ) then
            begin
                OK := AH >= ord( ' ' ) ;
            end ;
            if( ( Options and 32 ) = 32 ) then // Lower to upper case
            begin
                V2 := 0 ;
                V3 := 0 ;
                if( ESI < Length ) then
                begin
                    V2 := Contents[ ESI + 1 ] ;
                end ;
                if( ESI + 1 < Length ) then
                begin
                    V3 := Contents[ ESI + 2 ] ;
                end ;
                AH := Upcase( AH, V2, V3, _Folding_Index ) ;
                ESI := ESI + _Folding_Index - 1 ;
            end ;
            if( ( AH = AL ) and ( ( Options and 512 ) = 512 ) ) then
            begin // Upper to lower case (and not already converted the other way)
                V := lowcase( AH, _Folding_Index ) ;
                if( ( V = 0 ) and ( _Folding_Index >= 0 ) ) then
                begin
                    AH := Foldings[ _Folding_Index, 1 ] ;
                    for V := 2 to 3 do
                    begin
                        if( Foldings[ _Folding_Index, V ] <> 0 ) then
                        begin
                            Result.Append( AH ) ;
                            AH := Foldings[ _Folding_Index, V ] ;
                            inc( Dummy ) ;
                        end ;
                    end ;
                end else
                begin
                    AH := V ;
                end ;
            end ;

            if( ( Options and 64 ) = 64 ) then // Convert [] to ()
            begin
                if( AL = ord( '[' ) ) then
                begin
                    AH := ord( '(' ) ;
                end else
                if( AL = ord( ']' ) ) then
                begin
                    AH := ord( ')' ) ;
                end ;
            end ;
            if( ( AH = AL ) and ( ( Options and 2048 ) = 2048 ) ) then // Convert () to []
            begin
                if( AL = ord( '(' ) ) then
                begin
                    AH := ord( '[' ) ;
                end else
                if( AL = ord( ')' ) ) then
                begin
                    AH := ord( ']' ) ;
                end ;
            end ;

            if( ( Options and 4096 ) = 4096 ) then // Convert () to braces
            begin
                if( AL = ord( '(' ) ) then
                begin
                    AH := ord( '{' ) ;
                end else
                if( AL = ord( ')' ) ) then
                begin
                    AH := ord( '}' ) ;
                end ;
            end ;
            if( ( AH = AL ) and ( ( Options and 8192 ) = 8192 ) ) then // Convert braces to ()
            begin
                if( AL = ord( '{' ) ) then
                begin
                    AH := ord( '(' ) ;
                end else
                if( AL = ord( '}' ) ) then
                begin
                    AH := ord( ')' ) ;
                end ;
            end ;

            if( ( Options and 1024 ) = 1024 ) then // Convert [] to braces
            begin
                if( ( AL = ord( '[' ) ) or ( AL = ord( ']' ) ) ) then
                begin
                    AH := ord( AL ) + 32 ;
                end ;
            end ;
            if( ( AH = AL ) and ( ( Options and 16384 ) = 16384 ) ) then // Convert braces to []
            begin
                if( ( AL = ord( '{' ) ) or ( AL = ord( '}' ) ) ) then
                begin
                    AH := ord( AL ) - 32 ;
                end ;
            end ;
        end ; // if( Quote_Type = 0 )

        if( OK ) then
        begin
            Result.Append( AH ) ; // Build result
            if( ( AH = ord( ' ' ) ) or ( AH = _HT ) ) then
            begin
                Space := True ;
            end ;
        end ;
        if( ( AH <> ord( ' ' ) ) and ( AH <> _HT ) ) then
        begin
            Space := False ;
            if( ( Options and 256 ) = 256 ) then // Do not alter characters within quotes
            begin
                if(
                    ( ( AH = ord( '"' ) ) or ( AH = 39 ) )
                    and
                    ( not Escaped )
                  ) then
                begin
                    if( Quote_Type = 0 ) then // Not in quotes
                    begin
                        Quote_Type := AH ;
                    end else
                    if( Quote_Type = AH ) then
                    begin
                        Quote_Type := 0 ;
                    end ;
                end ;
            end ;
        end ; // if( ( AH <> ord( ' ' ) ) and ( AH <> _HT ) )
        if(
            ( AH <> ord( ' ' ) )
            and
            ( AH <> _HT )
          ) then
        begin
            Last := Result.Length ; // Last non-space character
        end ;
        if( Escape <> 0 ) then // Have an escape character
        begin
            if( AL = Escape ) then // This is an escape character
            begin
                if( ESI = 1 ) then // If first character, then always an escape
                begin
                    Escaped := True ;
                end else
                begin
                    Escaped := not Escaped ;
                    // If previous character was an escape, then it is escaping this character
                end ;
            end else
            begin
                Escaped := False ;
            end ; // if( AL = Escape )
        end ; // if( Escape <> 0 )
    end ; // while( ESI < Length )
    if( ( Options and 128 ) = 128 ) then // Trim following spaces
    begin
        if( Quote_Type = 0 ) then // Not within quotes
        begin
            Result.Length := Last ; // Last non-space character position
        end ;
    end ;
end ; // TUnicode_String.Edit
We won't cover this function line by line. In short, the method returns a copy of our string with certain textual transformations applied. Which transformations are performed depends on the bitmask Options passed to the method:
Bit      Meaning
2        Remove all white space (spaces and tabs).
4        Remove all nulls, linefeeds, formfeeds, carriage returns, and DELs.
8        Remove leading spaces/tabs.
16       Reduce multiple white space (spaces and tabs) to a single space.
32       Convert lower to upper case.
64       Convert square parentheses to normal parentheses: [] to ()
128      Remove all trailing white space.
256      Leave characters within quotes (" or ') unmodified.
512      Convert upper case to lower case.
1024     Convert square parentheses to braces: [] to {}
2048     Convert parentheses to square parentheses: () to []
4096     Convert parentheses to braces: () to {}
8192     Convert braces to parentheses: {} to ()
16384    Convert braces to square parentheses: {} to []
If the Escape parameter is non-zero, it is treated as an "escape" character: a quote that immediately follows it is not treated as a quote for the purposes of option 256.
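The interaction between quotes and the escape character is easier to see in isolation. The following is a minimal sketch, not the article's code: Ends_In_Quotes is a hypothetical helper that works on a plain Pascal string and only tracks the quote state, using the same toggling rule as the method above (an escape character preceded by another escape is itself escaped).

```pascal
program Escape_Sketch ;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

// Hypothetical helper (not part of the article's code): returns True if S
// ends while a quotation is still open. Escape = #0 means "no escape character".
function Ends_In_Quotes( const S : string ; Escape : char ) : boolean ;

var I : integer ;
    Quote_Type : char ; // #0 = not in quotes, otherwise the opening quote
    Escaped : boolean ; // True if the previous character escapes this one

begin
    Quote_Type := #0 ;
    Escaped := False ;
    for I := 1 to length( S ) do
    begin
        if( ( ( S[ I ] = '"' ) or ( S[ I ] = '''' ) ) and ( not Escaped ) ) then
        begin
            if( Quote_Type = #0 ) then // Not in quotes
            begin
                Quote_Type := S[ I ] ;
            end else
            if( Quote_Type = S[ I ] ) then // Matching closing quote
            begin
                Quote_Type := #0 ;
            end ;
        end ;
        if( Escape <> #0 ) then // Have an escape character
        begin
            if( S[ I ] = Escape ) then
            begin
                Escaped := not Escaped ; // An escaped escape does not escape
            end else
            begin
                Escaped := False ;
            end ;
        end ;
    end ;
    Result := ( Quote_Type <> #0 ) ;
end ;

begin
    writeln( Ends_In_Quotes( 'say "hi', '\' ) ) ;   // Quote opens and never closes
    writeln( Ends_In_Quotes( 'say \"hi', '\' ) ) ;  // Escaped quote is ignored
    writeln( Ends_In_Quotes( 'say \\"hi', '\' ) ) ; // Escaped escape, so the quote counts
end.
```

Note that a single quote inside a double-quoted run (or vice versa) does not close the quotation, because only a quote matching the opening one resets the state.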

The function first sets up for processing, which includes clearing option bits that conflict with each other (such as 64 and 1024). Then we step through each character in the string and append it (or its transformed value) to the result.

One might wonder why we don't have a different function for the different transformations. The reason is that we often want to perform multiple transformations on a given string and it is more efficient to have a single routine do all of the requested transformations in one fell swoop.
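To illustrate how several options combine in a single pass, here is a simplified sketch. It is not the Edit method itself: Mini_Edit and the Edit_* constant names are hypothetical, it works only on ASCII Pascal strings, and it handles just four of the option values from the table (8, 16, 32, and 128).

```pascal
program Edit_Options_Sketch ;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

const // Hypothetical names for four of the option values in the table
      Edit_Trim_Leading = 8 ;
      Edit_Collapse_Spaces = 16 ;
      Edit_Uppercase = 32 ;
      Edit_Trim_Trailing = 128 ;

// Simplified single-pass editor for ASCII strings (illustration only)
function Mini_Edit( const S : string ; Options : integer ) : string ;

var C : char ;
    I, Last : integer ;
    Space : boolean ; // True if the last character kept was white space

begin
    Result := '' ;
    Last := 0 ; // Length of result as of the last non-space character
    Space := False ;
    for I := 1 to length( S ) do
    begin
        C := S[ I ] ;
        if( ( C = ' ' ) or ( C = #9 ) ) then
        begin
            if( ( ( Options and Edit_Trim_Leading ) <> 0 ) and ( Last = 0 ) ) then
            begin
                continue ; // Nothing but white space so far
            end ;
            if( ( Options and Edit_Collapse_Spaces ) <> 0 ) then
            begin
                if( Space ) then
                begin
                    continue ; // Already kept white space for this run
                end ;
                C := ' ' ; // A run of spaces/tabs becomes one space
            end ;
            Space := True ;
        end else
        begin
            Space := False ;
            if( ( ( Options and Edit_Uppercase ) <> 0 ) and ( C >= 'a' ) and ( C <= 'z' ) ) then
            begin
                C := chr( ord( C ) - 32 ) ;
            end ;
        end ;
        Result := Result + C ; // Build result
        if( not Space ) then
        begin
            Last := length( Result ) ;
        end ;
    end ;
    if( ( Options and Edit_Trim_Trailing ) <> 0 ) then
    begin
        setlength( Result, Last ) ; // Drop anything after the last non-space
    end ;
end ;

begin
    // All four transformations are requested with a single call
    writeln( '[', Mini_Edit( '  hello   world  ', Edit_Trim_Leading or
        Edit_Collapse_Spaces or Edit_Uppercase or Edit_Trim_Trailing ), ']' ) ;
end.
```

Because the options are independent bits, the caller combines them with "or" and the string is walked only once, whichever combination is requested.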

function upcase( V1, V2, V3 : cardinal ; var _Count : integer ) : cardinal ; //NEW

var L : integer ;

begin
    Result := V1 ;
    _Count := 1 ; // Indicates one character was translated
    if( ( V1 < $61 ) or ( V1 > $118DF ) ) then // Not within range of our table
    begin
        exit ;
    end ;
    for L := 0 to high( Foldings ) do
    begin
        if(
            ( Foldings[ L, 1 ] = V1 )
            and
            ( ( Foldings[ L, 2 ] = V2 ) or ( Foldings[ L, 2 ] = 0 ) )
            and
            ( ( Foldings[ L, 3 ] = V3 ) or ( Foldings[ L, 3 ] = 0 ) )
          ) then
        begin
            Result := Foldings[ L, 0 ] ;
            if( Foldings[ L, 3 ] <> 0 ) then
            begin
                _Count := 3 ;
            end else
            if( Foldings[ L, 2 ] <> 0 ) then
            begin
                _Count := 2 ;
            end ;
            exit ;
        end ;
    end ; // for L := 0 to high( Foldings )
end ; // upcase
The Edit method makes use of one new function. UpCase does the opposite of the LowCase function: it converts lower case characters to upper case. The process is a little more complicated, because going from upper to lower case starts with a single input character, whereas going the other way might require up to three input characters to generate a single output character. So we have to search the Foldings array for an entry matching one to three lower case characters. Since some conversions involve only one or two lower case values, the unused positions in an entry are zero and match anything. We return, via the _Count parameter, the number of input characters that were consumed to produce the single upper case character.
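The matching rule can be tried out with a tiny table. This is a sketch under stated assumptions: Sample_Foldings is a hypothetical three-row stand-in for the real Foldings array (whose contents and ordering are not shown in this article), and Sample_Upcase mirrors the search above without the range check. The longer sequences are listed first here so that they win over the single-character entry.

```pascal
program Upcase_Sketch ;

{$IFDEF FPC}{$mode delphi}{$ENDIF}

// Hypothetical stand-in for the Foldings table. Column 0 is the single code
// point returned; columns 1 to 3 are the sequence it folds to, zero-padded.
const Sample_Foldings : array[ 0..2, 0..3 ] of cardinal =
      (
        ( $FB03, $66, $66, $69 ), // ffi ligature <- f, f, i
        ( $0130, $69, $0307, 0 ), // I with dot above <- i, combining dot above
        ( $0046, $66, 0, 0 )      // F <- f
      ) ;

function Sample_Upcase( V1, V2, V3 : cardinal ; var _Count : integer ) : cardinal ;

var L : integer ;

begin
    Result := V1 ;
    _Count := 1 ; // Indicates one character was translated
    for L := 0 to high( Sample_Foldings ) do
    begin
        if(
            ( Sample_Foldings[ L, 1 ] = V1 )
            and
            ( ( Sample_Foldings[ L, 2 ] = V2 ) or ( Sample_Foldings[ L, 2 ] = 0 ) )
            and
            ( ( Sample_Foldings[ L, 3 ] = V3 ) or ( Sample_Foldings[ L, 3 ] = 0 ) )
          ) then
        begin
            Result := Sample_Foldings[ L, 0 ] ;
            if( Sample_Foldings[ L, 3 ] <> 0 ) then
            begin
                _Count := 3 ;
            end else
            if( Sample_Foldings[ L, 2 ] <> 0 ) then
            begin
                _Count := 2 ;
            end ;
            exit ;
        end ;
    end ;
end ;

var Count : integer ;
    U : cardinal ;

begin
    U := Sample_Upcase( $66, $66, $69, Count ) ; // "ffi": three characters consumed
    writeln( U, ' ', Count ) ;
    U := Sample_Upcase( $66, $6F, $6F, Count ) ; // "foo": only the "f" is consumed
    writeln( U, ' ', Count ) ;
end.
```

The caller is expected to skip ahead by the returned count, which is what the Edit method does when it adds _Folding_Index - 1 to ESI.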

function Lowercase( const S : string ) : string ;

begin
    Result := Edit( S, 512, 0 ) ;
end ;


function Edit( S : string ; Options, Escape : integer ) : string ;

var I : integer ;
    US, US1 : TUnicode_String ;

begin
    Result := S ;
    for I := 1 to length( S ) do
    begin
        if( S[ I ] > #127 ) then // Have UTF8
        begin
            US := TUnicode_String.Create ;
            US.Assign_From_String( S, ST_UTF8 ) ;
            US1 := US.Edit( Options, Escape ) ;
            US.Free ;
            Result := US1.As_String ;
            US1.Free ;
            exit ;
        end ;
    end ;
    Result := CommonUt.Edit( S, Options ) ;
end ;
Lastly, we have two functions that operate on UTF-8 strings so that the calling code doesn't have to construct TUnicode_String instances solely for that purpose. Lowercase simply calls Edit with the lower case option flag of 512.
Edit assumes a UTF-8 string. It scans the string for any character with the top bit set. Such a character is part of a multi-byte UTF-8 sequence, so the function constructs a TUnicode_String instance, assigns the string to it, calls the instance's Edit method, returns the Pascal string version of the result, and frees both instances. If there are no such characters, we call a version of Edit that is in the CommonUt unit. We won't cover that function here. It is basically the same code as the Edit method, but operates on a normal ANSI string. It is a little more efficient in memory space and performance on these strings, not to mention that we don't need to construct a TUnicode_String instance and free it. It is possible that on a really long string, the cost of scanning the entire string to find a UTF-8 character exceeds the cost of constructing the instance and skipping the scan, but most of our use of this routine will be dealing with fairly short strings.

In the next article, now that we've laid the groundwork, we will begin our examination of UCL lexical functions.

 

Copyright © 2019 by Alan Conroy. This article may be copied in whole or in part as long as this copyright is included.