How do dynamically typed languages work?


Today I learned that dynamically typed programming languages do type checking at run time, as opposed to statically typed languages, which do so at compile time (correct me if I am wrong). What I want to know is: how does a dynamically typed language find out the type at run time, and how does that work? Also, dynamically typed languages are sometimes called "value typed" languages; what does it mean to say that in a dynamically typed language the type is associated with the value?

As I am a beginner, this may not strike some of you as a good question, but please try to see it from my perspective: I have just started, and I am not finding this answer anywhere.

Update

From the Wikipedia page on Type system:

Implementations of dynamically type-checked languages generally associate each runtime object with a "type tag" (i.e. a reference to a type) containing its type information. This runtime type information (RTTI) can also be used to implement dynamic dispatch, late binding, downcasting, reflection, and similar features.

Now, what is a type tag and how does it work? Could you show me how it is represented in memory?

There are 2 answers

Uriel On

Variables in dynamic languages are usually references to class instances, or objects (and sometimes to built-in datatypes such as int in some languages).

They can be assigned references to different objects (or class instances) during their lifetime (where garbage collection exists), and their type is therefore derived from the type of the object they currently refer to (is it an int, a string, or a user-defined Person?).
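
For concreteness, here is a minimal C sketch of that idea (the Object struct, the tags, and type_name are all invented for illustration): a "variable" is just a pointer, and asking for its type means reading a tag stored in the object it currently refers to.

    /* Minimal sketch: a dynamic "variable" is a reference, and its type
       is whatever the referenced object's tag says it is. */
    #include <stdio.h>

    typedef enum { TYPE_INT, TYPE_STRING } Type;

    typedef struct {
        Type tag;                                /* every object carries its own type */
        union { long i; const char *s; } data;   /* the payload depends on the tag */
    } Object;

    static const char *type_name(const Object *o) {
        return o->tag == TYPE_INT ? "int" : "string";   /* the tag is read at run time */
    }

    int main(void) {
        Object forty_two = { TYPE_INT,    { .i = 42 } };
        Object hello     = { TYPE_STRING, { .s = "hello" } };

        const Object *var = &forty_two;   /* like "var = 42" in a dynamic language */
        printf("%s\n", type_name(var));   /* prints: int */

        var = &hello;                     /* later: var = "hello" */
        printf("%s\n", type_name(var));   /* prints: string, the type followed the object */
        return 0;
    }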

When it comes to determining the type of an object, different languages implement this in different ways.

Python, for example, uses metaclasses to determine the type of a class, and uses that information when determining what type a given object is.

JavaScript uses objects that contain, in addition to their normal attributes, type specifiers (similar to metaclasses), alongside the normal datatypes (integer, float, character...).

Ryan Culpepper On

It's up to the implementation. Here are three rough examples of how an implementation might decide to represent values.

Note on terminology: I'm going to use the word "value" to talk about the bits that get passed as arguments to functions, put in fields, etc. In many cases, a value might either be or include a pointer to additional memory elsewhere.

1: Pointer with type tag header

In this implementation, a value is just a pointer to an object in memory, and every object has a "type tag" at the same offset that says what kind of thing it is. It could be a simple enumeration or a pointer to a "meta-class" of some sort.

An integer, for example, would be represented by a pointer to an object with two fields: the first contains the type tag INTEGER and the second contains the integer value. When talking about simple primitive data like integers, this is called a "boxed" representation.
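
As a rough illustration, a boxed representation along these lines might look like the following C sketch (the names Header, IntObject, make_integer, and add are all made up for the example):

    /* Sketch of representation 1: every value is a pointer to a heap
       object whose first field is a type tag. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { TAG_INTEGER, TAG_STRING } TypeTag;

    typedef struct {
        TypeTag tag;          /* what kind of object this is */
    } Header;                 /* the payload follows, depending on the tag */

    typedef struct {
        Header header;        /* header.tag == TAG_INTEGER */
        long   value;         /* the boxed integer itself */
    } IntObject;

    typedef Header *Value;    /* a "value" is just a pointer to some object */

    static Value make_integer(long n) {
        IntObject *obj = malloc(sizeof *obj);   /* every integer is heap-allocated */
        obj->header.tag = TAG_INTEGER;
        obj->value = n;
        return (Value)obj;
    }

    /* Adding two values means checking both tags at run time. */
    static Value add(Value a, Value b) {
        if (a->tag == TAG_INTEGER && b->tag == TAG_INTEGER) {
            long sum = ((IntObject *)a)->value + ((IntObject *)b)->value;
            return make_integer(sum);           /* allocates a fresh box for the result */
        }
        fprintf(stderr, "type error: + expects integers\n");
        exit(1);
    }

    int main(void) {
        Value x = add(make_integer(2), make_integer(3));
        printf("%ld\n", ((IntObject *)x)->value);   /* prints 5 */
        return 0;
    }

The pointer chasing and the allocation inside add are exactly the costs discussed in the next paragraph.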

The advantage is simplicity. The downside of boxing integers is that every arithmetic operation 1) goes through extra pointer indirections and 2) requires allocating an object for the result. That's generally really slow.

Java virtual machines use this representation for all class-typed variables and fields. That's why int is fast and Integer is slow (barring optimization, etc etc).

2: Two words

Another possible representation is to use two words to represent every value. The first is the type tag and the second is either the immediate value (in the case of types like integers, booleans, characters, etc) or a pointer (in the case of strings, objects, arrays, etc).

That eliminates the allocation and memory indirection problem associated with boxing integers etc, but it makes the value representation twice as big, which is wasteful.
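
As a hedged sketch, such a two-word value can be modelled in C as a tag plus a union holding either an immediate payload or a pointer (the names are invented for illustration):

    /* Sketch of representation 2: a value is a type tag plus one word
       of payload, either immediate data or a pointer. */
    #include <stdio.h>

    typedef enum { TAG_INT, TAG_BOOL, TAG_STRING } TypeTag;

    typedef struct {
        TypeTag tag;             /* first word: the type tag */
        union {
            long        i;       /* immediate payload: ints, booleans, chars */
            const char *ptr;     /* pointer payload: strings, objects, arrays */
        } as;                    /* second word: the payload */
    } Value;

    static Value make_int(long n)           { Value v = { TAG_INT,    { .i   = n } }; return v; }
    static Value make_string(const char *s) { Value v = { TAG_STRING, { .ptr = s } }; return v; }

    static void print_value(Value v) {
        switch (v.tag) {         /* the tag is inspected at run time */
        case TAG_INT:    printf("%ld\n", v.as.i);   break;
        case TAG_STRING: printf("%s\n",  v.as.ptr); break;
        default:         printf("<other>\n");       break;
        }
    }

    int main(void) {
        print_value(make_int(42));        /* no heap allocation needed for the int */
        print_value(make_string("hi"));
        return 0;
    }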

3: One word with tag bits

A third kind of representation "optimizes" the second representation by squeezing the type tag and value into a single word. If all objects are allocated 32-bit aligned, then every valid pointer ends in 2 zero bits. (Or 3 for 64-bit alignment, etc.) Those bits can be used to discriminate between a very few basic types. For example, here's one tagging system:

  1. xxxxxx00 means the integer value xxxxxx (sign-extended). Integers in this representation are called "fixnums".
  2. vvvvvv01 means vvvvvv interpreted as another kind of small value (like a boolean, "null", character, etc)
  3. pppppp10 means an object at address pppppp00 (where the object starts with a more detailed type header)

Like representation 2 above, this representation scheme also avoids boxing (most) small data, but it avoids bloat because values are still represented by a single word. It requires some degree of cooperation with the garbage collector (to recognize and follow pointers), but so does 2.
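
A rough C sketch of this tag-bit scheme follows; the particular tag assignments and helper names are invented, and decoding a fixnum assumes an arithmetic right shift, which most compilers provide but the C standard does not require:

    /* Sketch of representation 3: two tag bits in the low end of a
       single word. Assumes heap objects are at least 4-byte aligned. */
    #include <stdint.h>
    #include <stdio.h>

    typedef uintptr_t Value;

    #define TAG_MASK   0x3u
    #define TAG_FIXNUM 0x0u    /* ....00  small integer, shifted left by 2   */
    #define TAG_SMALL  0x1u    /* ....01  booleans, characters, null, ...    */
    #define TAG_OBJECT 0x2u    /* ....10  pointer to a heap-allocated object */

    static Value    make_fixnum(intptr_t n) { return ((uintptr_t)n << 2) | TAG_FIXNUM; }
    static intptr_t fixnum_value(Value v)   { return (intptr_t)v >> 2; }  /* arithmetic shift drops the tag */
    static int      is_fixnum(Value v)      { return (v & TAG_MASK) == TAG_FIXNUM; }

    /* For ....10 values, masking off the tag bits recovers the object address. */
    static void *object_pointer(Value v)    { return (void *)(v & ~(uintptr_t)TAG_MASK); }

    int main(void) {
        Value a = make_fixnum(21), b = make_fixnum(2);
        if (is_fixnum(a) && is_fixnum(b))
            printf("%ld\n", (long)(fixnum_value(a) * fixnum_value(b)));   /* prints 42 */
        (void)object_pointer;   /* not exercised in this tiny demo */
        return 0;
    }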

For a language like Scheme, which has unlimited-size integers as well as other kinds of numbers, like "flonums" (floating point numbers) and "ratnums" (exact rational numbers, or fractions), an arithmetic operation like a+b is compiled like this:

if a and b are both fixnums:
  add a and b directly
  on overflow, jump to the general addition function
  otherwise, that's the result
otherwise, jump to the general addition function

The general addition function just plods through all the cases, figures out promotions if necessary, and does the appropriate work for the kinds of numbers given. But if most of the time a and b are fixnums, then the computation happens in line and it's fast.
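
Under the same low-tag-bits scheme, that fast path might be sketched in C roughly like this; general_add is only a placeholder for the runtime's full numeric tower, and the overflow check uses the GCC/Clang builtin __builtin_add_overflow:

    /* Sketch of the fixnum fast path for a + b with 00 tag bits. */
    #include <stdint.h>
    #include <stdio.h>

    typedef uintptr_t Value;
    #define TAG_MASK   0x3u
    #define TAG_FIXNUM 0x0u

    /* Stand-in for the general addition function: a real runtime would
       handle bignums, flonums, ratnums, promotions, and type errors here. */
    static Value general_add(Value a, Value b) {
        fprintf(stderr, "slow path taken\n");
        return a;   /* placeholder only */
    }

    static Value add_values(Value a, Value b) {
        if (((a | b) & TAG_MASK) == TAG_FIXNUM) {      /* both tags are 00: two fixnums */
            intptr_t sum;
            /* With 00 tags the tagged words can be added as-is; the result is
               already a correctly tagged fixnum unless the addition overflows. */
            if (!__builtin_add_overflow((intptr_t)a, (intptr_t)b, &sum))
                return (Value)sum;
        }
        return general_add(a, b);                      /* overflow or non-fixnum operand */
    }

    int main(void) {
        Value two = 2u << 2, three = 3u << 2;          /* the fixnums 2 and 3 */
        printf("%ld\n", (long)((intptr_t)add_values(two, three) >> 2));   /* prints 5 */
        return 0;
    }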

This representation is fast and compact, although operations (primitive arithmetic, object field fetch, etc) get a little more complicated to check and compensate for tag bits. One downside of this approach is that you generally cannot use it if you're writing an interpreter/VM in a typical typed, safe language like Java, C#, ML, etc. The type system won't let you turn pieces of integers into pointers. You can do it in C, which is unsafe.

Yet another variation of this representation idea is "NaN-encoding": there are around 2^51 bit patterns that mean "not a number" as IEEE double-precision floating point numbers. Why not represent valid floating-point numbers as themselves and pack other values into NaN-space?
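
A very rough C sketch of the NaN-encoding idea (the constants and helpers are invented, and it assumes pointers fit in the low 48 bits, which is common on current 64-bit platforms but not guaranteed):

    /* Sketch of NaN-encoding: ordinary doubles represent themselves,
       while pointers are packed into a reserved slice of quiet-NaN space. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    typedef uint64_t Value;

    #define NANBOX_BASE 0x7FF8000000000000ull   /* quiet-NaN bit pattern      */
    #define POINTER_TAG 0x0001000000000000ull   /* our slice of that space    */
    #define PAYLOAD     0x0000FFFFFFFFFFFFull   /* low 48 bits carry the data */

    static Value  box_double(double d)   { Value v; memcpy(&v, &d, sizeof v); return v; }
    static double unbox_double(Value v)  { double d; memcpy(&d, &v, sizeof d); return d; }

    static int is_double(Value v) {
        /* Anything outside the reserved NaN slice is an ordinary double.
           (Real NaNs produced by arithmetic need extra care in a full design.) */
        return (v & (NANBOX_BASE | POINTER_TAG)) != (NANBOX_BASE | POINTER_TAG);
    }

    static Value box_pointer(void *p)    { return NANBOX_BASE | POINTER_TAG | (uint64_t)(uintptr_t)p; }
    static void *unbox_pointer(Value v)  { return (void *)(uintptr_t)(v & PAYLOAD); }

    int main(void) {
        int x = 42;
        Value a = box_double(3.14), b = box_pointer(&x);
        if (is_double(a))  printf("double: %f\n", unbox_double(a));
        if (!is_double(b)) printf("pointer payload: %d\n", *(int *)unbox_pointer(b));
        return 0;
    }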

