From Professor Daniel Leblanc’s Fall 2015 CS311 course: External link
Overview
In this project you will be writing a program to read in a grammar and a string and determine if the grammar can generate that string. Unlike with project 1 where any DFA was a possible input, you will only need to be able to deal with specific grammars. You may use any programming language you choose.
Your submission should contain your completed working code and a 1-2 page, not including sample inputs/outputs, writeup describing your work. The writeup must be submitted as a PDF.
For this part of the project you need to find a way to represent a Context-Free Grammar as a file so that your program can read that file in part 2. For this project we will be restricting ourselves to a very small subset of the possible grammars that can exist. There are two main criteria the grammars need to have.
All rules in the grammar must have a terminal as the first symbol on the right hand side of the rule.
No variable in the grammar can have two rules with the same terminal as the first symbol on
the right hand side. For full credit you must encode the following languages and at least one additional language of your choosing.
{| }
{ | *}
{ | and }
If you are having trouble creating grammars that obey the described criteria feel free to contact me or talk with your classmates. The encoding must be your own work, but I won’t be as strict on the grammars.
Once you have created the grammars you need to read them into some data structure of your choice. Read the next section to see how the grammars will be used to parse a string. The process will be similar to the pushdown automatons we discussed in class. You will also need to implement a way for the user to enter a string to be parsed. A simple command-line interface is what I recommend, but you are welcome to make another choice if you have other ideas.
To parse a string you will need to maintain a stack and an input buffer. Begin by pushing the start variable onto the stack. Then repeat the following until you empty the input buffer.
Pop an element from the stack.
If the popped element is a variable, look at the next character in the input and use that to determine which rule to apply. Then push the right hand side of that rule onto the stack. If no rule matches the next input reject the string.
If the popped element is a terminal, make sure it matches the next character in the input and remove both. If it does not match reject the string.
If the stack is empty and the input buffer is not, reject the string.
If both the stack and the input buffer are empty, accept the string.
Once you have finished the program you will need to perform some thorough tests. You may assume that the grammar files are in whatever format you specify. Be sure to test a wide variety of input strings for each grammar. At minimum you need to test each possible accepting or rejecting path, as well as any corner cases that may exist for you language. A good strategy is to manually generate a small set of test cases for each grammar and then randomly generate some additional string that can be used. You may submit the tests in their own file or at the end of your writeup.
Your writeup should walk through each of the above steps and explain why you took the approach
you did. I would also like answers to the following questions.
Why were those particular restrictions placed on the grammars?
If those restrictions weren’t there, how would your program need to change?
What would happen if the grammar were ambiguous?
What other approaches might be used to tell if a string can be generated by a given grammar?