How can I parse ASCII Art to HTML using Java or Javascript?

3.1k views Asked by At

I saw that the Neo4j API uses ASCII Art very cleverly with its API:

http://jaxenter.com/getting-started-with-neo4j-the-java-graph-database-47955.html

I want to try something similar, but with ASCI Art to HTML. How can ASCII art be parsed, so for example, given an ASCII Art input something like:

--------------------------------
I                              I
I   -------          -------   I
I   I     I          I     I   I
I   I  A  I          I  B  I   I
I   I     I          I     I   I
I   -------          -------   I
I                              I
I                              I
--------------------------------

: could result in HTML output something like:

<div>
    <div style='display:inline;'>
             A
    </div>
    <div style='display:inline;'>
             B
    </div>
</div>

Update

The question was closed citing that I need to "demonstrate a minimal understanding of the problem being solved.". I do have an understanding of the problem to be solved. The problem is that I want to solve is to make templated HTML easier to understand in source code for the following web framework:

https://github.com/zubairq/coils

: although the solution could be applied to any web framework. I have since seen someone attempt to make an initial version in C++ here:

https://github.com/h3nr1x/asciidivs2html/blob/master/asciidivs2html.cpp

: very impressive! If you can get it to work in Java or Clojure then if we can get the question reopened I will nominate a bounty so you can get more points for the solution :)

I ran the Java solution provided by @meewok and here is the result:

$ java AsciiToDIVs.RunConverter
Created a box(ID=0,X=0,Y=0,width=33,height=10)
Created a box(ID=1,X=2,Y=4,width=8,height=5,parent=0)
Created a char(Char=A,X=4,Y=7,parent=1)
Created a box(ID=2,X=2,Y=21,width=8,height=5,parent=0)
Created a char(Char=B,X=4,Y=24,parent=2)
<div><div><div>A</div></div><div><div>B</div></div></div>
3

There are 3 answers

5
Menelaos On BEST ANSWER

Methodology

A solution to implement is the following:

  • create an in memory 2D array (array of arrays) which is similar to a chessboard.

Then i will create an algorith that when it detects "-" characters, i initialize acall to a method to detect the remaining corners ( top right, bottom left, bottom right) following the characters and where they end.

Example ( quick pseudocode ):

while(selectedCell==I) selectedCell=selectedCell.goDown();

Using such a strategy you can map out your boxes and which boxes are contained within which.

Remaining would be to print this info as html..

Quick and Dirty Implementation

Since I was in the mood I spent an hour+ to quickly cook up a toy implementation. The below is non-optimized in respect to that I do not make use of Iterators to go over Cells, and would need refactoring to become a serious framework.

Cell.java


package AsciiToDIVs;

public class Cell {
    public char Character;
    public CellGrid parentGrid;
    private int rowIndex;
    private int colIndex;

    public Cell(char Character, CellGrid parent, int rowIndex, int colIndex)
    {
        this.Character = Character;
        this.parentGrid = parent;
        this.rowIndex = rowIndex;
        this.colIndex = colIndex;
    }

    public int getRowIndex() {
        return rowIndex;
    }

    public int getColIndex() {
        return colIndex;
    }
}

CellGrid.java


package AsciiToDIVs;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Iterator;

public class CellGrid {

    private ArrayList<ArrayList<Cell>> CellGridData;

    public CellGrid(String asciiFile) throws IOException {
        readDataFile(asciiFile);
    }

    public ArrayList<FoundObject> findBoxes(FoundBoxObject parent)
    {

        int startRowIndex = 0, startColIndex = 0, 
                parentRowLimit = Integer.MAX_VALUE, 
                parentColLimit = Integer.MAX_VALUE,
                startingColIndex = 0;
        if(parent != null)
        {
            startRowIndex = parent.getRowIndex()+1;
            startColIndex = startingColIndex =  parent.getColIndex()+1;
            parentRowLimit = parent.getRowIndex() + parent.getHeight();
            parentColLimit = parent.getColIndex() + parent.getWidth();
        }

        ArrayList<FoundObject> results = new ArrayList<FoundObject>();

        Cell currentCell;

        if(startRowIndex>=CellGridData.size())
        return null;        

        for(; startRowIndex<CellGridData.size() && startRowIndex<parentRowLimit; startRowIndex++ )
        {
            startColIndex = startingColIndex;

            for(; startColIndex< CellGridData.get(startRowIndex).size() && startColIndex<parentColLimit; startColIndex++)
            {           
                FoundBoxObject withinBox = checkWithinFoundBoxObject(results, startRowIndex, startColIndex);

                if(withinBox !=null)
                startColIndex+=withinBox.getWidth();

                currentCell = getCell(startRowIndex, startColIndex);

                if(currentCell!=null)
                {
                    if(currentCell.Character == '-') // Found a TOP-CORNER
                    {
                        int boxHeight =  getConsecutiveIs(startRowIndex+1, startColIndex) + 1;
                        if(boxHeight>1)
                        {
                            int boxWidth = getConsecutiveDashes(startRowIndex, startColIndex);

                            FoundBoxObject box = new FoundBoxObject(startRowIndex, startColIndex, boxWidth, boxHeight, parent);
                            results.add(box);
                            findBoxes(box);

                            startColIndex+=boxWidth;                            
                        }                   
                    }

                    //This is a character
                    else if(currentCell.Character != '-' && currentCell.Character != 'I' && currentCell.Character != ' ' 
                            && currentCell.Character != '\n' && currentCell.Character != '\n' && currentCell.Character != '\t')
                    {
                        FoundCharObject Char = new FoundCharObject(startRowIndex, startColIndex, parent,  currentCell.Character);
                        results.add(Char);
                    }
                }
            }       
        }

        if(parent!=null)
        parent.containedObjects = results;

        return results;     
    }

    public static String printDIV(ArrayList<FoundObject> objects)
    {
        String result = "";
        Iterator<FoundObject> it = objects.iterator();
        FoundObject fo;

        while(it.hasNext())
        {
            result+="<div>";

            fo = it.next();

            if(fo instanceof FoundCharObject)
            {
                FoundCharObject fc = (FoundCharObject)fo;
                result+=fc.getChar();
            }

            if(fo instanceof FoundBoxObject)
            {
                FoundBoxObject fb = (FoundBoxObject)fo;
                result+=printDIV(fb.containedObjects);
            }

            result+="</div>";
        }

        return result;
    }

    private FoundBoxObject checkWithinFoundBoxObject(ArrayList<FoundObject> results, int rowIndex, int colIndex)
    {
        Iterator<FoundObject> it = results.iterator();
        FoundObject f;
        FoundBoxObject fbox = null;
        while(it.hasNext())
        {
            f = it.next();

            if(f instanceof FoundBoxObject)
            {
                fbox = (FoundBoxObject) f;

                if(rowIndex >= fbox.getRowIndex() && rowIndex <= fbox.getRowIndex() + fbox.getHeight())
                {
                    if(colIndex >= fbox.getColIndex() && colIndex <= fbox.getColIndex() + fbox.getWidth())
                    {
                        return fbox;
                    }
                }
            }
        }

        return null;
    }

    private int getConsecutiveDashes(int startRowIndex, int startColIndex)
    {
        int counter = 0;
        Cell cell = getCell(startRowIndex, startColIndex);

        while( cell!=null && cell.Character =='-')
        {
            counter++;
            cell = getCell(startRowIndex, startColIndex++);
        }

        return counter;

    }

    private int getConsecutiveIs(int startRowIndex, int startColIndex)
    {
        int counter = 0;
        Cell cell = getCell(startRowIndex, startColIndex);

        while( cell!=null && cell.Character =='I')
        {
            counter++;
            cell = getCell(startRowIndex++, startColIndex);
        }

        return counter;
    }

    public Cell getCell(int rowIndex, int columnIndex)
    {
        ArrayList<Cell> row;


        if(rowIndex<CellGridData.size())
        row = CellGridData.get(rowIndex);
        else return null;

        Cell cell = null;

        if(row!=null){
            if(columnIndex<row.size())
            cell = row.get(columnIndex);
        }

        return cell;
    }


    public Iterator<ArrayList<Cell>> getRowGridIterator(int StartRow) {
        Iterator<ArrayList<Cell>> itRow = CellGridData.iterator();

        int CurrentRow = 0;

        while (itRow.hasNext()) {
            // Itrate to Row
            if (CurrentRow++ < StartRow)
                itRow.next();

        }
        return itRow;
    }

    private void readDataFile(String asciiFile) throws IOException {
        CellGridData = new ArrayList<ArrayList<Cell>>();
        ArrayList<Cell> row;

        FileInputStream fstream = new FileInputStream(asciiFile);
        BufferedReader br = new BufferedReader(new InputStreamReader(fstream));

        String strLine;

        // Read File Line By Line
        int rowIndex = 0;
        while ((strLine = br.readLine()) != null) {
            CellGridData.add(row = new ArrayList<Cell>());
            // System.out.println (strLine);
            for (int colIndex = 0; colIndex < strLine.length(); colIndex++) {
                row.add(new Cell(strLine.charAt(colIndex), this, rowIndex,colIndex));
                // System.out.print(strLine.charAt(i));
            }
            rowIndex++;
            // System.out.println();
        }

        // Close the input stream
        br.close();
    }

    public String printGrid() {
        String result = "";

        Iterator<ArrayList<Cell>> itRow = CellGridData.iterator();
        Iterator<Cell> itCol;
        Cell cell;

        while (itRow.hasNext()) {
            itCol = itRow.next().iterator();

            while (itCol.hasNext()) {
                cell = itCol.next();
                result += cell.Character;
            }
            result += "\n";
        }

        return result;
    }

}

FoundBoxObject.java


package AsciiToDIVs;

import java.util.ArrayList;

public class FoundBoxObject extends FoundObject {
    public ArrayList<FoundObject> containedObjects = new ArrayList<FoundObject>();
    public static int boxCounter = 0;

    public final int ID = boxCounter++;

    public FoundBoxObject(int rowIndex, int colIndex, int width, int height, FoundBoxObject parent) {
        super(rowIndex, colIndex, width, height);

        if(parent!=null)
        System.out.println("Created a box(" +
                "ID="+ID+
                ",X="+rowIndex+
                ",Y="+colIndex+
                ",width="+width+
                ",height="+height+
                ",parent="+parent.ID+")");
        else
            System.out.println("Created a box(" +
                    "ID="+ID+
                    ",X="+rowIndex+
                    ",Y="+colIndex+
                    ",width="+width+
                    ",height="+height+
                    ")");   
    }

}

FoundCharObject.java


package AsciiToDIVs;

public class FoundCharObject extends FoundObject {
private Character Char;

public FoundCharObject(int rowIndex, int colIndex,FoundBoxObject parent, char Char) {
    super(rowIndex, colIndex, 1, 1);

    if(parent!=null)
    System.out.println("Created a char(" +
            "Char="+Char+
            ",X="+rowIndex+
            ",Y="+colIndex+
            ",parent="+parent.ID+")");
    else
        System.out.println("Created a char(" +
                ",X="+rowIndex+
                ",Y="+colIndex+")");

    this.Char = Char;
}

public Character getChar() {
    return Char;
}
}

FoundObject.java


package AsciiToDIVs;

public class FoundObject {

    private int rowIndex;
    private int colIndex;
    private int width = 0;
    private int height = 0;

    public FoundObject(int rowIndex, int colIndex, int width, int height )
    {
        this.rowIndex = rowIndex;
        this.colIndex = colIndex;
        this.width = width;
        this.height = height;
    }

    public int getRowIndex() {
        return rowIndex;
    }

    public int getColIndex() {
        return colIndex;
    }

    public int getWidth() {
        return width;
    }

    public int getHeight() {
        return height;
    }
}

Main Method


public static void main(String args[])
    {
        try {
            CellGrid grid = new CellGrid("ascii.txt");
            System.out.println(CellGrid.printDIV(grid.findBoxes(null)));
            //System.out.println(grid.printGrid());
        } catch (IOException e) {
            e.printStackTrace();
        }       
    }   

Update

The 'printDIV' should be like this (more '' were being printed than needed).

public static String printDIV(ArrayList<FoundObject> objects)
    {
        String result = "";
        Iterator<FoundObject> it = objects.iterator();
        FoundObject fo;

        while(it.hasNext())
        {
            fo = it.next();

            if(fo instanceof FoundCharObject)
            {
                FoundCharObject fc = (FoundCharObject)fo;
                result+=fc.getChar();
            }

            if(fo instanceof FoundBoxObject)
            {
                result+="<div>";
                FoundBoxObject fb = (FoundBoxObject)fo;
                result+=printDIV(fb.containedObjects);
                result+="</div>";
            }           
        }

        return result;
    }
3
Ira Baxter On

What you want is the idea of 2-dimensional parsing, which detects 2D entities and verifies they have legitimate relationships.

See http://mmi.tudelft.nl/pub/siska/TSD%202DVisLangGrammar.pdf

What will be difficult is defining the sets of possible "ASCII Art" constraints. Do only want to to recognize letters? Made only of the same-letter characters? "cursive" letters? boxes? (Your example has boxes whose sides aren't made of the same ASCII character). Boxes with arbitrary thick walls? Nested boxes? Diagrams with (thin/fat) arrows? Kilroy-was-here-nose-over-the-wall? Pictures of Mona Lisa in which character pixels provide density relations? What exactly do you mean by "ASCII art"?

The real problem is defining the range of things you intend to recognize. If you limit that range, your odds of success go way up (see the referenced paper).

The problem here has little to to do specifically with Java or Javascript. This is far more related to algorithms. Pick a limited class of art, choose the right algorithms, and then what you have is a coding problem which should be relatively easy to solve. No limits, no algorithms --> no amount of Javascript will save you.

1
Michael Dyck On

Here's a fairly simple solution in JavaScript, tested via Node. Of course, you'll need to adjust the input and output methods.

var s = "\n\
--------------------------------\n\
I                              I\n\
I   -------          -------   I\n\
I   I     I          I     I   I\n\
I   I  A  I          I  B  I   I\n\
I   I     I          I     I   I\n\
I   -------          -------   I\n\
I                              I\n\
I                              I\n\
--------------------------------\n\
";

var lines = s.split('\n');

var outer_box_top_re = /--+/g;

var i;
for (i=0; i<lines.length; i++) {
    while ((res = outer_box_top_re.exec(lines[i])) != null) {
        L = res.index
        R = outer_box_top_re.lastIndex
        process_box(i, L, R)
    }
}

function process_box(T, L, R) {
    console.log('<div top="' + T + '" left="' + L + '" right="' + R + '">')
    blank_out(T, L, R)

    var i = T;
    while (1) {
        i += 1;
        if (i >= lines.length) {
            console.log('Fell off bottom of ascii-art without finding bottom of box');
            process.exit(1);
        }

        var line = lines[i];

        if (line[L] == 'I' && line[R-1] == 'I') {
            // interior

            // Look for (the tops of) sub-boxes.
            // (between L+1 and R-2)
            var inner_box_top_re = /--+/g;
            // Inner and outer need to be separate so that
            // inner doesn't stomp on outer's lastIndex.
            inner_box_top_re.lastIndex = L+1;
            while ((res = inner_box_top_re.exec(lines[i])) != null) {
                sub_L = res.index;
                sub_R = inner_box_top_re.lastIndex;
                if (sub_L > R-1) { break; }
                process_box(i, sub_L, sub_R);
            }

            // Look for any other content (i.e., a box label)
            content = lines[i].substring(L+1, R-1);
            if (content.search(/[^ ]/) != -1) {
                console.log(content);
            }

            blank_out(i, L, R);
        }
        else if (line.substring(L,R).match(/^-+$/)) {
            // bottom
            blank_out(i, L, R);
            break;
        }
        else {
            console.log("line " + i + " doesn't contain a valid continuation of the box");
            process.exit(1)
        }
    }

    console.log('</div>')
}

function blank_out(i, L, R) {
    lines[i] = (
          lines[i].substring(0,L)
        + lines[i].substring(L,R).replace(/./g, ' ')
        + lines[i].substring(R)
    );
}