Generate Symbol Table From File in C++

7.2k views Asked by At

So I'm trying to generate a symbol table from an input file that contains C-style nested blocks like this in C++;

A: { int a; float b;
B: { float c; int d;
C: { int b; int c;
}
}
D: { float a;
}
}

The output should look like this.

A: a -> <int, A>
b -> <float, A>
B: a -> <int, A>
b -> <float, A>
c -> <float, B>
d -> <int, B>
C: a -> <int, A>
b -> <int, C> -> <float, A>
c -> <int C> -> <float, B>
d -> <int, local to B>
D: a -> <float D> -> <int, A>
b -> <float, A>

I'v tried so many things. Using vectors, maps and now finally I have decided to use multimaps. No matter what I do I come down to the same problem so it probably has nothing to do with the data-structure I choose.

The issue is that because I'm reading line by line I end up cout-ing more than what I need to. But if I don't have it cout/iterate the multimaps in the for loop for each line than I'd iterate after they have been erased/popped. I'm not sure what to do logic wise to get the output to display as it should or if I'm even on the right track.

Here is my .cpp file so far. Ignore comments as they were past attempts that I have opted out of using for the moment. Also in this version I'm not making use of vectors so you can ignore vector related code. I"m just using multimaps now.

#include<iostream>
#include<fstream>
#include<string>
#include <sstream> 
#include <map>
#include <vector>
#include <algorithm>
using namespace std;

void ReadFromFile();
void main(){

    ReadFromFile();
    cin.get();
}

void ReadFromFile(){

    stringstream ss;
    string type = "";
    string var = "";
    string lable = "";
    string Obraket = "";
    string Cbraket = "";
    int braketCount = -1;

    ifstream myfile("input1.txt");
    multimap<string, string> symbol;
    multimap<string, multimap<string, string>> symbolL;
    if (myfile.is_open())
    {
        for (string line; getline(myfile, line);)
        {

            istringstream in(line);
            if (in.str().find("}") == string::npos && in.str().find("{") != string::npos){

                in >> lable;
                in >> Obraket;

                braketCount++;
                cout << Obraket << endl;
                in >> type;
                in >> var;
                symbol.insert(pair<string, string>(var.substr(0, 1), type));

                if (in.str().find("float") != string::npos || in.str().find("int") != string::npos){

                    var = "";
                    type = "";
                    in >> type;
                    in >> var;
                    if (type.length() > 1){
                        symbol.insert(pair<string, string>(var.substr(0, 1), type));
                    }
                }

                symbolL.insert( pair<string, multimap<string, string>>(lable,symbol));

                    for (multimap<string, multimap<string, string>>::iterator it = symbolL.begin(); it != symbolL.end(); ++it){
                    cout << it->first;
                    for (multimap<string, string>::iterator it2 = symbol.begin(); it2 != symbol.end(); ++it2){
                        cout << it2->first << "-> " << "<" << it2->second << ">, " << it->first.substr(0, 1) << endl;
                    }
                    }
            }
            else if (in.str().find("}") != string::npos){
                in >> Cbraket;
                //braketCount--;
                cout << Cbraket << endl;
                symbolL.erase(prev(symbolL.end()));

                //symbol.erase(prev(symbol.end()));
            }

        }

        myfile.close();
    }
    else cout << "Unable to open file";



}

This is the output I get.

{
A:a-> <int>, A
b-> <float>, A
{
A:a-> <int>, A
b-> <float>, A
c-> <float>, A
d-> <int>, A
B:a-> <int>, B
b-> <float>, B
c-> <float>, B
d-> <int>, B
{
A:a-> <int>, A
b-> <float>, A
b-> <int>, A
c-> <float>, A
c-> <int>, A
d-> <int>, A
B:a-> <int>, B
b-> <float>, B
b-> <int>, B
c-> <float>, B
c-> <int>, B
d-> <int>, B
C:a-> <int>, C
b-> <float>, C
b-> <int>, C
c-> <float>, C
c-> <int>, C
d-> <int>, C
}
}
{
A:a-> <int>, A
a-> <float>, A
b-> <float>, A
b-> <int>, A
c-> <float>, A
c-> <int>, A
d-> <int>, A
D:a-> <int>, D
a-> <float>, D
b-> <float>, D
b-> <int>, D
c-> <float>, D
c-> <int>, D
d-> <int>, D
}
}
2

There are 2 answers

2
ch0kee On

Here it is!

If you accept an advice, start reading the main function instead of the types on the top.

I didn't need multimap. Instead of copying the parent block's variables, I only reference the parent container with its index. Just before printing there is a traversal to the top-most block, and it collects all the variables visible in the current block.

#include <algorithm>
#include <cassert>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <map>
#include <stack>
#include <string>
#include <vector>

using namespace std;

typedef string type_t; //type of variable 
typedef string variable_t; //name of variable
typedef string class_t; //name of 'class' (container)

const int NO_PARENT = -1; //top-most

typedef vector<class ClassSymbolTable> symbols_t; //we use vector to preserve order

// main class, it stores symbols of a single class, and references its parent
class ClassSymbolTable {
  class_t _name; //class name
  map<variable_t, type_t> _types; //map of variable types
  vector<variable_t> _variables; //we use this vector to preserve order
  symbols_t& _symbols; //reference to the symbol table
  int _parent_index = NO_PARENT; //reference to parent index in symbol vector


  //!! parent class, nullptr if top-level
  ClassSymbolTable* parent() const { return _parent_index != NO_PARENT ? &_symbols[_parent_index] : nullptr; }

  // does this class directly declares var ?
  bool declares_variable(const variable_t& var) const {
    return _types.find(var) != _types.end();
  }

  // print variable info in desired format
  void print_variable(const variable_t& var) {
    if (declares_variable(var)) {
      cout << " -> <" << _types[var] << ", " << _name << ">";
    }
    if (parent()) {
      parent()->print_variable(var);
    }
  }

  // traverse classes up to top-level and collect variables in order
  void collect_variables_to_print(vector<variable_t>& vars) {
    if (ClassSymbolTable* p = parent()) {
      p->collect_variables_to_print(vars);
      // add variables defined on this level
      vector<variable_t> add_vars;
      for (size_t i = 0; i < _variables.size(); ++i) {
        if (find(vars.begin(), vars.end(), _variables[i]) == vars.end()) {
          // defined on this level
          add_vars.push_back(_variables[i]);
        }
      }
      vars.insert(vars.end(), add_vars.begin(), add_vars.end());
    }
    else {
      //top-level
      vars = _variables;
    }
  }

  // get depth for indentation
  int get_depth() const {
    int depth = 0;
    for (ClassSymbolTable* p = parent(); p; p = p->parent()) {
      ++depth;
    }
    return depth;
  }


  static size_t s_max_class_name_length; //for printing
public:
  // ctor
  ClassSymbolTable(const string& name, int parent_index, symbols_t& symbols)
    : _name(name), _parent_index(parent_index), _symbols(symbols)
  {
    s_max_class_name_length = max(s_max_class_name_length, name.length());
  }

  // add variable
  void add(const variable_t& var, const type_t& type) {
    _variables.push_back(var);
    _types[var] = type;
  }

  // print this class' vars in desired format
  void print() {
    cout.fill(' ');
    const int indent = get_depth() + s_max_class_name_length + 3 /*for ':' */;

    vector<variable_t> vars;
    collect_variables_to_print(vars);

    // print class name
    string classname = _name + ": ";

    cout.fill(' ');
    cout.width(indent);
    cout << classname;

    // print vars
    cout.width(0);
    cout << vars[0];
    print_variable(vars[0]);
    cout << endl;

    for (size_t i = 1; i < vars.size(); ++i) {
      cout.width(indent);
      cout << ' '; //pad before

      cout.width(0);
      cout << vars[i];
      print_variable(vars[i]);
      cout << endl;
    }
    cout.width(0);
  }


};

size_t ClassSymbolTable::s_max_class_name_length = 0;


int main(int argc, char* argv[])
{
  ifstream in("input1.txt");
  assert(in.is_open());

  symbols_t symbols;

  //collect symbols
  const char* delimiters = ":;{}";
  vector<string> current_tokens;
  string buffer;
  stack<int> class_stack; //to manage class hierarchy, we stack the classes' index in the symbol vector
  class_stack.push(NO_PARENT); //so we dont have to branch at first level
  while (in >> buffer) {
    size_t found = buffer.find_first_of(delimiters);
    current_tokens.push_back(buffer.substr(0, found)); //either whole or until delimiter

    if (found != string::npos) { //delimiter found
      char delimiter = buffer[found];
      switch (delimiter) {
      case ':': //class name
        assert(current_tokens.size() == 1);
        {
          // add new class symbol table and refer to parent class
          symbols.emplace_back(current_tokens[0], class_stack.top(), symbols);
          // we rather store index in '{' for symmetric code
        }
        break;
      case '{': //block open
        assert(!symbols.empty());
        {
          class_stack.push(symbols.size()-1); //stack the index for nested classes
        }
        break;
      case '}': //block close
        assert(!class_stack.empty());
        {
          class_stack.pop(); //done with this class
        }
        break;
      case ';': //variable
        assert(!symbols.empty());
        assert(current_tokens.size() == 2);
        {
          // add variable to the current class symbol table
          ClassSymbolTable& current_class = symbols.back();
          current_class.add(current_tokens[1], current_tokens[0]);
        }
        break;
      }

      //put back the remaining characters
      current_tokens.clear();
      if (found < buffer.size() - 1) {
        current_tokens.push_back(buffer.substr(found + 1));
      }

    }

  }
  assert(class_stack.size() == 1 && class_stack.top() == NO_PARENT); //just to be sure

  for (ClassSymbolTable& c : symbols) {
    c.print();
  }

  cout << "." << endl;
  return 0;
}

It can be optimized to avoid lookup during printing, and you can also avoid storing the symbols if you only want to print them. You can store local variables here and there, but the main idea will be the same. And yes, I'm using yet another container to manage nesting, a stack :)

Only by using multimap, your variables will be shuffled. Somehow, you have to keep track of the order.

I am using vectors to do that.

(If you can't compile C++11, just replace the range-based for loop at the very end of main)

0
Some programmer dude On

I would suggest a structure (i.e. a struct or a class) for the top level, have a std::map of those top-level structure. Then each structure in turn contains a std::map for the contained symbols, again with a structure that contains, among other things, the type of the symbol.

Something as simple as this:

struct LocalSymbol
{
    std::string name;
    enum
    {
        FLOAT,
        INT
    } type;
    // Possibly other information needed for local symbols
};

struct GlobalSymbol
{
    std::string name;
    // Possibly other information needed for global symbols
    std::map<std::string, LocalSymbol> locals;
}

std::map<std::string, GlobalSymbol> globals;

This will very easily give you the nested structure you seem to want, as well as keeping all keeping related data tightly together into smaller structures.

Your other big problem seems to be parsing, and I suggest you read more about compilers and parsing, and try to implement a more traditional lexer-parser kind of parser, where you split up the input handling and parsing into two components. If you want to hand-code the parser-part, I suggest a recursive descent style parser which will make it very easy to handle scoping and levels.