This program solves some of the exercises given
in Chapter 5.
It is a truth table program which in each line of the table
writes the values of all subformulas.
In addition it accepts multi-character operators as input,
and on completion of the table it reports whether
the input formula was contingent, a tautology or a self-contradiction.
If the input formula is terminated with a ? instead of a .,
the table is not output, but only the brief report;
in this case processing of the table is abandoned
as soon as it becomes known that the formula is contingent.
On encountering an error it places a pointer under the offending
symbol.
The program also optimises the two loops
in the truth table procedure by creating a linked list
of nodes containing the next propositional atom.
This linking is done in the main program as soon as the entire formula
has been read;
hence in the truth table procedure
the two FOR loops are eliminated.
For an input file such as
The main program again consists of a REPEAT loop that has to be
executed for each formula that is read.
Two terminators of formulas are allowed:
a period . will produce the full truth table,
and ? will only produce a minimal synopsis ---
'contingent', 'tautology' or 'contradiction'.
While the formula was being read,
an initially empty set of propositional variables is collected;
this set is now used by a FOR loop to link all
the variables to their alphabetic successors,
using an ARRAY indexed by lowercase letters
having as entries the next lowercase letter
occurring in the formula.
This single FOR loop replaces the various wasteful FOR loops
in the original program.
After this linking,
if a full table is required then the header line will be written.
For the list of variables the linked list can be traversed.
For the formula itself
the content of the input buffer is written out;
the input buffer was copied by the scanner from the input file.
It is also necessary to write a * underneath the main operator,
so the parser and code generator have to keep track of where
in the input buffer the various operators are located.
This facility is used not just here,
but also when writing values of subformulas underneath their operator.
The translator:
This again consists of nested procedures
formula, expression, term, and factor.
Again they are responsible for parsing, error reporting and
generating the postfix code.
But the code that is generated now also has to contain
an indication of where the value of a subformula is to be written,
which is underneath the operator which gave rise to the code
being generated.
So code consists of a postfix character operator
together with an integer index into the current input line
and hence into the input buffer.
Later, when a line of the table is being computed,
the values of subformulas are placed into the buffer
in the place of the operator.
But the generation of the postfix operator and the index
occurs well after the reading of the input operator.
So, when an input operator is being read by the scanner,
its index in the input buffer has to be saved in a local variable
of the parsing procedure.
Then when code is generated,
the postfix operator is generated together with the saved index.
Thus code consists of an array of records
containing a postfix character as before,
but also an integer index.
The scanner:
Procedure getsym
has to recognise single printing characters as before,
but it also has to keep track of where in the input line they occurred,
and in addition it has to recognise multi-letter operators.
An auxiliary procedure getch reads single characters and copies
them into the buffer.
Procedure getsym begins by calling getch to skip over any blanks.
The current length of the buffer is then made available
as the index to be used for code generation.
If the current character is an uppercase letter,
then the current symbol is a multi-letter symbol and has to be
read into a short string.
A simple linear search then determines which operator it is.
Only the single character equivalent is made available to the parser.
So the parser never knows whether a single character operator
was used or its multi-letter equivalent.
If the multi-letter symbol is not recognised
as a standard operator,
then an error is reported.
The error reporting procedure will be called when
a formula is terminated illegally,
when a factor starts with the wrong symbol,
when a right hand parenthesis is missing,
or when a multi-letter operator is not recognised.
Since the scanner keeps a count of how many characters
there were in the current input line at the moment the error was detected,
it is possible to write a pointer underneath the last symbol
that had been read.
This is followed by another line with the error message.
Line generation: The recursive procedure for generating lines in the truth table is just a modification of the one in Chapter 5. The procedure can now make use of the linked list of variables that was created by the main program just after the formula had been read. If the parameter of the procedure is a possible variable, then that variable is first made true and then made false, and for both cases the procedure then calls itself recursively using as the parameter the next variable in the list. Eventually all variables will have values. If a full table is not required, then the formula can be evaluated for the current assignment of truth values. But if a full table is required, then it is necessary to step through the linked list of variables to write their current values. Then the formula is evaluated by a procedure described in the next paragraph. This procedure writes the values of subformulas into what was the input buffer, and when it returns the transformed input buffer can be written out to complete the line of the truth table.
Evaluation:
In the original program in Chapter 5 the evaluation of a formula
was done by a function returning the truth value of the whole formula.
Here however the truth values of all subformulas are required, too,
and they are to be placed into what was originally
the input buffer,
but what is now becoming an output buffer.
The reason for using the same buffer
has to do with the alignment of the operators
in the input formula and their truth values in the output line.
If the input formula contained spaces, tabs or even rub-outs,
it is not easy to achieve the alignment by using column numbers.
The trick of using the same buffer solves the problem in a simple way.
The actual evaluation of the postfix code
proceeds on a stack in a manner which is familiar by now.
For each operator that was executed,
the truth value that has been computed
is then placed into the output buffer as either 0 or 1,
at the position given by the infix index
that had been associated with the postfix operator.
Note that placing the values into the buffer
occurs temporally in postfix order
but spatially in infix positions.
When the FOR loop has been executed
and the formula has been evaluated,
then it has to be recorded whether
the formula was true or was false.
If a full table is not required
and by now the formula has been both true and false,
the program can report that the formula is contingent
and can jump to the start in readiness for a new formula.
The source is as follows:
This program solves some of the exercises in Chapter 10. It is a semantic tableau program which uses the trunk before branch optimisation. It also has an inbuilt macro expansion mechanism. Instead of using a recursive descent parser or a non-recursive predictive top down parser, it uses a non-recursive LR1 bottom up parser. Finally, for variety, it uses pointers to implement the internal code.
The macro mechanism: This facility allows users to define capital letters as short for any sequence of characters. These macros are therefore text macros, and the sequence of characters is not checked syntactically at the point of definition but only at the point of use when the sequence of characters is presented to the parser. Definitions of macros can occur anywhere in the source file, independently of what the parser is currently doing. Consequently the entire macro definition mechanism is hidden inside the scanner and a few global variables that are only accessed by the scanner. The parser at no stage knows whether a single character symbol it is reading came directly from the input file or whether it came from a macro body. Hence the scanner and its macro mechanism could be used for a quite different input language.
In detail, definitions of macros are preceded by a !
to enter definition mode.
Then uppercase letters are expected, each followed by =
and then a macro body enclosed in an arbitrary quotation character
to start and end the body.
A sequence of definitions is terminated by a period ..
Macro bodies can be inspected by a '@'
which causes the names of the defined macros
and their bodies to be typed out.
Finally, macros are used by just writing their name.
Macro bodies are always read from the input file,
but one macro M1 may contain calls to another macro M2.
If this happens, then the body that will be
substituted for M2 will be whatever M2 is defined to be
at the time M1 is being expanded,
and not what it was when M1 was being defined.
However, use before definition is not permitted.
The macro mechanism of this program is the part
that is most visible to the user;
the other parts are described later in this section.
The demonstration: In the following demonstration, the first two parts exercise the macro mechanism about as much as one would normally want to use it. The second part solves the puzzle of Portia's caskets, it is described in more detail below. The next part illustrates the semantic semantic tableau with the "trunk before branch" optimisation, which of course is not really visible to the user. The last part traces the stack of the bottom up parser and translator that is used here.
Portia's caskets:
The second part of the demonstration solves the puzzle
given at the end of Chapter 10.
We need to find a formula F which describes
all aspects of the puzzle and then
use the program to find an assignment of truth values
to the atomic formulas of F which will make F true
and thereby solve the puzzle.
Since the program tests whether a formula is a tautology
and reports countermodels if it is not,
we must submit the negation of F to the program.
Any countermodel to that negation is a model of F.
To construct the required formula F, we need atomic formulas as follows:
F is best constructed in parts:
A is simply a threefold disjunction,
each disjunct says that the portrait is in one of the caskets
and not in the others.
Subformula B is a fourfold disjunction,
the first three say that one of the inscriptions is true
and the others are false, and the fourth says that all
inscriptions are false.
Subformula C has to capture the meaning of the inscriptions,
so it is a conjunction of three formulas:
the first says that the inscription on the gold casket is true
if and only if the portrait is in the gold casket,
the second says that the inscription on the silver casket is true
if and only if the portrait is not in the silver casket,
and the third says that the inscription on the lead casket is true
if and only if the portrait is not in the gold casket.
The required formula F
is just the conjunction (A & B & C),
and its negation, -F, has to be submitted to the program.
One countermodel of -F is found, so it is a model of F.
In that model we see that b is true
whereas a and c are false --- so the portrait is in the silver casket.
We also see that f is true whereas d and e are false ---
so the inscription on the lead casket is true.
The puzzle is solved, and Portia has to marry the computer.
Here is the demonstration:
The macro mechanism is implemented in the scanner,
procedure getch,
whose sole function is to produce the next character for the parser to process.
Without the macro mechanism,
this procedure would be just a REPEAT loop which reads
one ore more characters from the input file
until the last character read is a printing character.
However, when a macro is being expanded,
the next character is to come not from the input file
but from the body of the macro.
In that case the macro memory is consulted,
it consists essentially of a large array of characters
together with one integer pointer that is incremented
each time a character is taken from the body of a macro.
If the end of the current macro is reached,
then a stack of current macro calls is popped
and the next lower macro is continued.
At this point a character has been
obtained from either the input file
by reading or from a macro body by retrieval from the
array of characters.
In most cases it will have been a character that
is to be passed on to the parser for further processing.
However, it could also be a ! to signal
one or more macro definitions to follow,
or it could be an uppercase letter to signal
a macro to be called.
For the definitions,
a series of uppercase letters is expected, each followed by =
and then a macro body enclosed in an arbitrary quote.
For a macro call,
a check has to be made that the uppercase letter seen
does indeed denote a defined macro,
so for every uppercase letter there
are two integer variables denoting the start and finish
of that macro in the string space.
If the finish is less than the start, this would indicate
that the macro is not defined.
Furthermore, a check has to be made
that the call is not directly or indirectly recursive,
so a set variable is consulted to determine
whether the macro is already active.
If these two tests have been passed,
the details of the macro being called are
pushed onto the macro call stack.
These details are the letter which is the name
of the macro, and the start and finish of the body
of the macro in the string space.
The scanner now starts to extract from
the string space.
The LR1 parser:
This part of the program replaces the recursive descent parser
with a (non-recursive) bottom up parser as suggested in
an exercise in Chapter 6.
The input language presented to the parser
after it has been processed by the macro mechanism
is still extremely simple.
Hence it becomes possible to use a very simple parsing stack
containing just characters.
Some of the characters are just input characters ---
lowercase letters, the operators and parentheses.
In reduction steps some items on the stack are replaced by
uppercase letters representing non-terminals that have been recognised:
F for factor, T for term, E for expression
and B for Boolean formula.
In the demonstration,
the last formula was being parsed with tracing switched on by the ?.
The following lines then represent, on the left,
the stack growing to the right,
then after a few spaces, the current input character enclosed
in double quotes
and finally the shift or reduce action
to be executed at this configuration.
The LR1 parser works as follows:
Since it is not recursive,
the entire code can be made part of the main program,
at the position where a recursive descent parser
would contain a call to read a formula
for translation into internal code.
Essentially the parser consists of a REPEAT loop
containing a CASE statement which dispatches
on the kind of item that is on the top of the stack.
The exact action may also depend on the current input character
and on the second or even third item on the stack.
There are three possible actions:
1. to shift the input character onto the stack,
2. to reduce the stack by replacing some of the items on the top
of the stack by something else,
or 3. to report an error and abandon the processing of the current formula.
The details of the CASE statement are essentially derived
from the grammar,
but the derivation is tedious and error prone.
For example,
the productions
The internal code:
Code generation is handled entirely
during reduction steps,
and the code generating procedure can even be hidden
inside the reduction procedure.
The code that has to be generated is again
binary tree code.
But whereas previous programs used
an ARRAY of RECORDs of operators and two integer indices
into that same array,
for variety this program uses Pascal pointers.
So in the main declaration section
the type of such pointers has to be declared
together with the type of the records to which they point.
One solitary global pointer variable
is declared and always made to point to the last
formula tree that has been generated.
The actual code generation procedure
is very similar to the corresponding ones
in earlier programs;
the main difference is that the left and right
fields are not integers but pointers,
and instead of incrementing
an index in the code array
to obtain a new record,
the Pascal procedure new is called.
It is also necessary to obtain the correct linkage
of the various nodes.
In a recursive descent translator
it was possible to use local variables
on the recursion stack for this purpose.
However, since we are using an explicit stack
for the parsing,
the saving of earlier nodes has to done
on that stack, too.
Actually, the technique is similar
to the one used in the non-recursive top down predictive parser
used in Chapter 10.
We augment the explicit parsing stack
with additional fields containing the last code
generated in reduction steps.
Then, for example production 2
discussed above,
when popping T & F
will generate a node with & as the operator,
and what was saved beside the T as the left
and what was saved beside the F as the right.
Then the E is pushed as required by the production,
together with a pointer to the node just generated.
In previous programs all node records of binary trees
were automatically recycled when a new formula was being read,
because the integer index into the code array
was always set to zero at the start of the main loop.
This method is not possible when one is using pointers.
Instead it is necessary to reclaim the space used
by disposing of the tree properly.
This is done by a recursive procedure
which takes a tree as a parameter.
If the tree is not nil,
then it calls itself recursively on the left and the right
subtree and then uses Pascal's dispose
to recycle the record.
The use of pointers has one advantage: it is no longer necessary to set an arbitrary limit to the amount of internal code that can be generated. But there is also a disadvantage: it is not possible to write pointer values, hence the code that has been generated cannot be checked so easily - essentially one has to get it right without checking. Also, it is necessary to (recursively) dispose of the tree when it is no longer needed. It may be worth repeating that the use of pointers for the binary tree code has nothing to do with whether the translating mechanism uses recursive descent or a non-recursive predictive top down method, or, as here, a bottom up method.
The "trunk before branch" optimisation: When a formula has been read successfully, the internal code that has been generated is passed to the semantic tableau interpreter. While the overall structure is the same as for the one in Chapter 10, there are several differences.
The first difference is quite minor:
There are no longer two procedures,
one for making a formula true and one for making a formula false.
Instead there is only one procedure make
with an extra Boolean parameter indicating whether
the formula is to be made true or false.
Secondly, the two set parameters which contain
the formulas that have been made true or false
have been replaced by two global variables.
This requires a slightly more elaborate treatment
of atoms.
If the atomic formula that is to be made true or false
already has the opposite truth value,
then "the path closes", and the procedure does nothing.
Otherwise the path can be continued,
and there are two possibilities that arise:
1. the atom already has the required truth value,
and in that case the procedure calls its continuation, or
2. the atom does not yet have the value required,
and in that case the set representing the value
has to be augmented by the atom,
the continuation called,
and when that returns the atom has to be removed again.
Procedure show then uses these two globals
to display the model that it has constructed.
But the most difficult difference concerns the "trunk before branch" optimisation. Probably you should not even attempt to understand this unless you have done a few semantic tableaux by hand and have learnt to appreciate the sometimes quite spectacular saving in effort which the optimisation produces. The idea behind the computer implementation is quite simple: whenever the procedure which makes formulas true or false comes across a formula that could be made true or false in two ways, it delays this task and first continues with other tasks. This happens for example when a disjunction is to be made true or a conjunction is to be made false. The other tasks are of course the continuations that have accumulated so far. So, in a situation like that, the continuation procedure is called, but it now has a procedure parameter which is a further continuation --- namely the handling of the formula that was to be delayed.
The delay procedure has no parameters.
Its body is essentially a CASE statement
which looks at the formula which was the parameter
of the call to the make procedure which gave rise
to this formula being delayed and now being executed.
The individual cases are exactly what they would
have been if the formula had been processed when seen the first time.
To take as an example a disjunction,
here is how it will be processed by procedure make:
delay procedure is ever called,
then the following part of its body becomes operative:
make which originally
gave rise to this disjunction being delayed.
The source is as follows: