Accessing an "out-of-bounds" index in an interpreted versus a compiled language


What is the difference between accessing an out-of-bounds (negative, or otherwise inaccessible) index in a compiled programming language (such as C) versus an interpreted language (such as MATLAB)?

As per the recommendation of this site, I have researched a number of threads concerning out-of-bounds indices. Most of these threads, however, focus only on fixing a specific problem in someone's source code. That said, I was able to gather from this site that accessing an out-of-bounds index in C results in undefined behavior. From experimenting with MATLAB, my guess is that interpreted languages test whether an index is valid and "catch" poorly written code before an out-of-bounds index is accessed. Is this actually the case for interpreted languages in general, or do they, like C (a compiled language), also exhibit some form of undefined behavior? And does accessing an out-of-bounds index in any compiled language cause undefined behavior?



Answer by Gil

Some languages leave this as an implementation "detail", while others clearly specify the expected behavior, and this has changed over time for several programming languages.

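Regarding C, it is perfectly legitimate (and sometimes useful) to use a negative index on a pointer, as long as the resulting address still lies inside the same array. Indexing outside the array, however, is undefined behavior and may lead to crashes or code/data corruption (intended or not), because C tries not to limit your capabilities as a programmer. If you know how a particular C implementation lays out stack-based and malloc-based memory blocks, there is often not much uncertainty in practice about what a mis-addressed access will do, although the standard guarantees nothing and optimizing compilers are allowed to assume such accesses never happen. C compilers may also issue warnings during compilation to help prevent errors such as unintended negative array indexes.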
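A minimal sketch of the legitimate kind of negative index: a pointer into the middle of an array can be indexed with negative offsets as long as every access stays inside that array. The names K, fill_squares and centered_get are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

#define K 3  /* hypothetical half-width: logical indices run from -K to K */

/* Fills storage[0..2K] so that logical index i holds i * i.
   centre points at storage[K], so centre[i] with a negative i is well
   defined: centre + i is still an element of the same array. */
void fill_squares(int storage[2 * K + 1]) {
    int *centre = storage + K;
    for (int i = -K; i <= K; i++)
        centre[i] = i * i;
}

/* Reads logical index i through a pointer to the centre element.
   The assert documents the valid range in debug builds only; with
   NDEBUG defined it vanishes, and centered_get(centre, K + 1) would
   then be undefined behavior rather than a reported error. */
int centered_get(const int *centre, int i) {
    assert(-K <= i && i <= K);
    return centre[i];
}
```

For example, after `int s[2 * K + 1]; fill_squares(s);`, the call `centered_get(s + K, -3)` reads `s[0]`. By contrast, `s[-1]` would point before the array and is undefined behavior, whatever a given compiler happens to do with it.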

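Other languages decide for the programmer and try to block these accesses, either at compile time (Pascal is a good old example: its compilers reject constant out-of-range indexes and can insert range checks for the rest) or at run time (JIT compilers, virtual machines, etc.). Whether a language is compiled or interpreted does not decide this: there is no general rule unless the language specification defines a specific behavior.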
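The run-time blocking described above can be sketched in C itself. This is a minimal illustration, assuming a hypothetical checked_array type that carries its length alongside the data (loosely similar in spirit to how Java or C# arrays record their length); it is not how any particular runtime is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical "checked array": the length travels with the data. */
typedef struct {
    size_t len;
    const double *data;
} checked_array;

/* Run-time bounds check: on a bad index, report an error and refuse the
   access instead of reading whatever memory lies there. i is signed so
   that a negative index can be rejected explicitly. */
bool checked_get(checked_array a, long i, double *out) {
    assert(out != NULL);
    if (i < 0 || (size_t)i >= a.len) {
        fprintf(stderr, "index %ld out of bounds for length %zu\n", i, a.len);
        return false;
    }
    *out = a.data[i];
    return true;
}
```

A language that guarantees this behavior effectively wraps every indexing operation in such a test; MATLAB's error on an out-of-range index and Java's ArrayIndexOutOfBoundsException are the user-visible result of checks of this kind.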

Even in C, there are many ways to prevent unintended damage, such as guard memory areas surrounding the array's memory block. An access that touches a guard area then raises a fault, which can be processed by a signal handler.
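A minimal sketch of a guard area, assuming a POSIX system such as Linux or macOS (it relies on mmap, mprotect, sigaction and sigsetjmp/siglongjmp). The array is placed so that it ends exactly at an inaccessible page, and a signal handler jumps back out when the write past the end faults. guard_page_demo and on_fault are hypothetical names; jumping out of a SIGSEGV handler is a common pattern but not something the C standard guarantees.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS etc. on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

/* Hypothetical handler: leave the faulting access by jumping back. */
static void on_fault(int sig) {
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if writing one element past the end of an n-element int
   array hit the guard page, 0 if it did not, -1 on setup failure. */
int guard_page_demo(size_t n) {
    assert(n > 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || n * sizeof(int) > (size_t)page) return -1;

    /* Two pages: the first holds the data, the second is the guard. */
    unsigned char *base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;
    if (mprotect(base + page, (size_t)page, PROT_NONE) != 0) {
        munmap(base, 2 * (size_t)page);
        return -1;
    }

    /* Place the array so that its last element ends at the guard page. */
    volatile int *arr = (volatile int *)(base + page - n * sizeof(int));

    struct sigaction sa, old_segv, old_bus;
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus); /* some systems report SIGBUS instead */

    int faulted = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        for (size_t i = 0; i < n; i++) arr[i] = (int)i; /* in bounds */
        arr[n] = 42; /* one past the end: lands on the guard page */
    } else {
        faulted = 1;
    }

    /* Restore the previous handlers and release the mapping. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(base, 2 * (size_t)page);
    return faulted;
}
```

Calling guard_page_demo(16) should report the overflow as caught. Note the design limits: only overruns that reach the guard page are detected, so the array must end exactly at the page boundary, and catching underruns (negative indexes) would need a second guard page in front of the array.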

Since the compilers and runtimes of most other languages are themselves implemented in C/C++, this is also how 'more modern' programming languages handle these issues (such as negative array indexes), either as part of their specifications or as implementation details. Explicit tests for negative or too-large array indexes may also be used, at a performance cost.
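One common way to reduce the cost of such explicit tests is to check both "negative" and "too large" with a single unsigned comparison. A minimal sketch, assuming the length never exceeds PTRDIFF_MAX (true for real arrays); index_ok is a hypothetical name, and whether a given compiler or JIT emits exactly this is up to that implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One comparison instead of two: converting a negative i to size_t
   wraps it to a huge value (modulo SIZE_MAX + 1), which is always
   >= len, so negative and too-large indexes fail the same test. */
static inline bool index_ok(ptrdiff_t i, size_t len) {
    assert(len <= (size_t)PTRDIFF_MAX);
    return (size_t)i < len;
}
```

For instance, index_ok(-1, 3) converts -1 to SIZE_MAX, which is not less than 3, so the access is rejected without a separate sign test.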

C# or Java objects take more space than the equivalent C data because the runtime stores extra information with them (an object header used for locking, garbage collection and type information, and, for arrays, the length that bounds checks compare against), while the wasted space for C variables usually comes only from alignment padding, unless the programmer replaces the default behavior with something more sophisticated.
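The contrast can be sketched in C. struct padded shows the only overhead a plain C struct normally has (alignment padding), while managed_int_array_sketch is a hypothetical, simplified picture of what a managed runtime keeps next to an array; its field names are illustrative and real JVM and CLR layouts differ between versions and settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A plain C struct: the only space not holding data is alignment
   padding inserted so that i starts at a suitably aligned offset. */
struct padded {
    char c;
    int  i;
};

/* The alignment is guaranteed; the amount of padding is
   implementation-defined (commonly 3 bytes here). */
static_assert(offsetof(struct padded, i) % _Alignof(int) == 0,
              "i must be aligned");

size_t padding_bytes(void) {
    return sizeof(struct padded) - (sizeof(char) + sizeof(int));
}

/* Hypothetical, simplified managed-array layout. */
struct managed_int_array_sketch {
    void    *type_info;   /* class/type pointer */
    uint32_t header_bits; /* lock state, GC marks, ... */
    uint32_t length;      /* what every bounds check compares against */
    int      data[];      /* the elements themselves */
};

size_t managed_overhead_bytes(void) {
    return offsetof(struct managed_int_array_sketch, data);
}
```

On a typical 64-bit platform, managed_overhead_bytes() is 16 bytes per array in this sketch, paid regardless of the element count, whereas a plain C array of int carries no header at all.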