Does the complexity of IDE error detection and auto-completion depend on language syntax?


Is less rigorous code analysis (fewer checks) required to provide development environment error feedback and auto-completion for programming languages that are composed largely of human-readable phrases and words (e.g., Python, VB.NET)? This is in contrast to C-style languages, which depend more upon symbols and punctuation for code structure.

Ira Baxter (best answer)

I have experience building, and am responsible for, dozens of language front ends.

Wordy languages and punctuation-heavy languages are generally equally hard to parse and statically analyze.
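As a rough illustration of why the surface vocabulary matters so little, here is a toy sketch (not taken from any real front end) in which one recursive-descent routine parses nested blocks delimited either by words or by symbols. The names `tokenize` and `parse_block` and the tiny block grammar are assumptions made up for this example; real languages are hard for other reasons, discussed next.

```python
import re

def tokenize(src):
    # A token is an identifier-like word or any single non-space symbol.
    return re.findall(r"[A-Za-z_]\w*|\S", src)

def parse_block(tokens, pos, open_tok, close_tok):
    """Parse: open_tok item* close_tok, where an item is a word or a nested block.
    Returns (tree, position just past the closing token)."""
    if tokens[pos] != open_tok:
        raise SyntaxError(f"expected {open_tok!r} at token {pos}")
    pos += 1
    items = []
    while pos < len(tokens) and tokens[pos] != close_tok:
        if tokens[pos] == open_tok:
            child, pos = parse_block(tokens, pos, open_tok, close_tok)
            items.append(child)
        else:
            items.append(tokens[pos])
            pos += 1
    if pos >= len(tokens):
        raise SyntaxError(f"missing {close_tok!r}")
    return items, pos + 1

wordy = "begin a begin b c end d end"
symbolic = "{ a { b c } d }"
tree_w, _ = parse_block(tokenize(wordy), 0, "begin", "end")
tree_s, _ = parse_block(tokenize(symbolic), 0, "{", "}")
```

Both inputs go through exactly the same code path and yield the same tree; only the delimiter tokens differ.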

The folks that define languages of either kind have either been decorating them for decades (e.g., COBOL since 1959), or building sophisticated languages (C++, Scala, Ruby) with complex syntax, complex name resolution, and complex type inference rules. The compiler vendors then proceed to add obscure syntax to support the strange things they do or to provide customer lock-in (e.g., MS "managed C++", DLL declarations, etc.). There's a third problem of lousy definitions: the top languages may have precise rules about how they work, but many languages have sloppy definitions (e.g., PHP), which create dark corner cases that have to be ironed out by painful experimentation with the actual implementation.

C++ has been our worst, especially with the C++11 committee making a massive recent mess of things. We have full C++ parsers, but are still working on full name resolution for C++11 on top of our C++98 implementation. (The name resolution code is some 250,000 lines of code, and it's not enough!)

IBM COBOL is a close second; the language is just giant, and there are all sorts of funny name resolution rules ("an unqualified name can refer to a particular name without qualification if the reference is unambiguous". So, is this name an unambiguous reference in this context?).
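To give a flavor of that kind of rule, here is a heavily simplified sketch. It is not IBM COBOL's actual algorithm. It assumes each data item is declared as a path from its outermost group down to the item, and that a reference may list qualifiers innermost-first (as in `CITY OF CUSTOMER`) while skipping intermediate levels. The names `matches`, `resolve`, and `DECLS` are hypothetical.

```python
def matches(path, name, qualifiers):
    """True if `path` ends in `name` and its ancestors contain `qualifiers`
    in innermost-to-outermost order (intermediate levels may be skipped)."""
    if path[-1] != name:
        return False
    ancestors = iter(reversed(path[:-1]))  # innermost ancestor first
    # `q in ancestors` consumes the iterator, so order is enforced.
    return all(q in ancestors for q in qualifiers)

def resolve(declarations, name, qualifiers=()):
    """Return the single declaration the reference denotes, or raise."""
    candidates = [p for p in declarations if matches(p, name, qualifiers)]
    if not candidates:
        raise LookupError(f"{name} is not declared")
    if len(candidates) > 1:
        raise LookupError(f"{name} is ambiguous: {candidates}")
    return candidates[0]

# Hypothetical record layout: two groups that both contain NAME.
DECLS = [
    ("CUSTOMER", "NAME"),
    ("CUSTOMER", "ADDRESS", "CITY"),
    ("SUPPLIER", "NAME"),
]

city = resolve(DECLS, "CITY")                     # unique, no qualifier needed
supplier_name = resolve(DECLS, "NAME", ("SUPPLIER",))
```

Here an unqualified `NAME` raises `LookupError` because two declarations match. Even this toy version has to scan every declaration to answer "is it unambiguous?", and the real rules add scoping, copybooks, and many special cases on top.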

Once you get past parsing and name/type resolution, then you get into control flow, data flow, points-to analysis, range analysis, call graph construction, ... which are generally about the same amount of effort as the earlier phases; we get away with less by having really good libraries that support these tasks.
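As a small, hedged illustration of one of these later phases, the sketch below builds a naive call graph for Python source using the standard `ast` module. It only records calls through plain names; the function `call_graph` and the `SAMPLE` program are made up for this example and are not how a production analyzer works.

```python
import ast
from collections import defaultdict

def call_graph(source):
    """Map each top-level function name to the set of plain names it calls."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # make sure functions with no calls still appear
            for sub in ast.walk(node):
                # Only `f(...)` is recorded; `obj.method(...)` is skipped
                # because resolving it needs type or points-to information.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    graph[node.name].add(sub.func.id)
    return dict(graph)

SAMPLE = """
def parse(text):
    return tokenize(text)

def tokenize(text):
    return text.split()

def main():
    print(parse("a b"))
"""

graph = call_graph(SAMPLE)
```

Note that the call to `text.split()` is missing from the result: knowing which `split` is meant requires the name and type resolution described earlier, which is why those phases have to come first.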

With all these analyses as background, you can start to do "static analysis" of the smart kind that people want.

Another poster noted that an IDE has to recover from syntax errors and (emphasis) "continue to generate meaningful error messages". All I can say to this is "Amen, brother". See this SO answer https://stackoverflow.com/a/6657974/120163 for a discussion of what goes wrong when you have "partial programs", which is essentially what you get when syntax error repairs guess at a fix.
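A minimal sketch of such a guess, assuming Python as the edited language and a deliberately crude repair strategy (append closing parentheses until the text parses); `parse_with_repair` is a hypothetical helper, not something from the linked answer:

```python
import ast

def parse_with_repair(source, max_insertions=3):
    """Try to parse; on failure, guess by appending ')' up to max_insertions times.
    Returns (tree, number_of_guessed_parens)."""
    for guessed in range(max_insertions + 1):
        try:
            return ast.parse(source + ")" * guessed), guessed
        except SyntaxError:
            continue
    raise SyntaxError("could not repair input")

# The user is mid-edit: the closing parenthesis has not been typed yet.
tree, guessed = parse_with_repair("result = scale(a + b * 2")
```

The repair yields a tree for `scale(a + b * 2)`, but the user may have been about to type `scale(a + b) * 2`. Every later analysis and error message is then computed on a program the user never wrote.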