Lazy evaluation container for dynamic programming?

546 views Asked by At

I have some pattern that works great for me, but that I have some difficulty explaining to fellow programmers. I am looking for some justification or literature reference.

I personally work with PHP, but this would also be applicable to Java, Javascript, C++, and similar languages. Examples will be in PHP or Pseudocode, I hope you can live with this.

The idea is to use a lazy evaluation container for intermediate results, to avoid multiple computation of the same intermediate value.

"Dynamic programming":

http://en.wikipedia.org/wiki/Dynamic_programming

The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations: once the solution to a given subproblem has been computed, it is stored or "memo-ized": the next time the same solution is needed, it is simply looked up

Lazy evaluation container:

class LazyEvaluationContainer {

  protected $values = array();

  function get($key) {
    if (isset($this->values[$key])) {
      return $this->values[$key];
    }
    if (method_exists($this, $key)) {
      return $this->values[$key] = $this->$key();
    }
    throw new Exception("Key $key not supported.");
  }

  protected function foo() {
    // Make sure that bar() runs only once.
    return $this->get('bar') + $this->get('bar');
  }

  protected function bar() {
    .. // expensive computation.
  }
}

Similar containers are used e.g. as dependency injection containers (DIC).

Details

I usually use some variation of this.

  • It is possible to have the actual data methods in a different object than the data computation methods?
  • It is possible to have computation methods with parameters, using a cache with a nested array?
  • In PHP it is possible to use magic methods (__get() or __call()) for the main retrieval method. In combination with "@property" in the class docblock, this allows type hints for each "virtual" property.
  • I often use method names like "get_someValue()", where "someValue" is the actual key, to distinguish from regular methods.
  • It is possible to distribute the data computation to more than one object, to get some kind of separation of concerns?
  • It is possible to pre-initialize some values?

EDIT: Questions

There is already a nice answer talking about a cute mechanic in Spring @Configuration classes.

To make this more useful and interesting, I extend/clarify the question a bit:

  • Is storing intermediate values from dynamic programming a legitimate use case for this?
  • What are the best practices to implement this in PHP? Is some of the stuff in "Details" bad and ugly?
2

There are 2 answers

5
fdreger On

If I understand you correctly, this is quite a standard procedure, although, as you rightly admit, associated with DI (or bootstrapping applications).

A concrete, canonical example would be any Spring @Configuration class with lazy bean definitions; I think it displays exactly the same behavior as you describe, although the actual code that accomplishes it is hidden from view (and generated behind the scenes). Actual Java code could be like this:

@Configuration
public class Whatever {

  @Bean @Lazy
  public OneThing createOneThing() {
    return new OneThing();
  }

  @Bean @Lazy
  public SomeOtherThing createSomeOtherThing() {
    return new SomeOtherThing();
  }

  // here the magic begins:

  @Bean @Lazy
  public SomeThirdThing getSomeThirdThing() {
    return new SomeThirdThing(this.createOneThing(), this.createOneThing(), this.createOneThing(), createSomeOtherThing());
  }
}

Each method marked with @Bean @Lazy represents one "resource" that will be created once it is needed (and the method is called) and - no matter how many times it seems that the method is called - the object will only be created once (due to some magic that changes the actual code during loading). So even though it seems that in createOneThing() is called two times in createOneThing(), only one call will occur (and that's only after someone tries to call createSomeThirdThing() or calls getBean(SomeThirdThing.class) on ApplicationContext).

11
Sven On

I think you cannot have a universal lazy evaluation container for everything.

Let's first discuss what you really have there. I don't think it's lazy evaluation. Lazy evaluation is defined as delaying an evaluation to the point where the value is really needed, and sharing an already evaluated value with further requests for that value.

The typical example that comes to my mind is a database connection. You'd prepare everything to be able to use that connection when it is needed, but only when there really is a database query needed, the connection is created, and then shared with subsequent queries.

The typical implementation would be to pass the connection string to the constructor, store it internally, and when there is a call to the query method, first the method to return the connection handle is called, which will create and save that handle with the connection string if it does not exist. Later calls to that object will reuse the existing connection.

Such a database object would qualify for lazy evaluating the database connection: It is only created when really needed, and it is then shared for every other query.

When I look at your implementation, it would not qualify for "evaluate only if really needed", it will only store the value that was once created. So it really is only some sort of cache.

It also does not really solve the problem of universally only evaluating the expensive computation once globally. If you have two instances, you will run the expensive function twice. But on the other hand, NOT evaluating it twice will introduce global state - which should be considered a bad thing unless explicitly declared. Usually it would make code very hard to test properly. Personally I'd avoid that.

It is possible to have the actual data methods in a different object than the data computation methods?

If you have a look at how the Zend Framework offers the cache pattern (\Zend\Cache\Pattern\{Callback,Class,Object}Cache), you'd see that the real working class is getting a decorator wrapped around it. All the internal stuff of getting the values stored and read them back is handled internally, from the outside you'd call your methods just like before.

The downside is that you do not have an object of the type of the original class. So if you use type hinting, you cannot pass a decorated caching object instead of the original object. The solution is to implement an interface. The original class implements it with the real functions, and then you create another class that extends the cache decorator and implements the interface as well. This object will pass the type hinting checks, but you are forced to manually implement all interface methods, which do nothing more than pass the call to the internal magic function that would otherwise intercept them.

interface Foo
{
    public function foo();
}

class FooExpensive implements Foo
{
    public function foo()
    {
        sleep(100);
        return "bar";
    }
}

class FooCached extends \Zend\Cache\Pattern\ObjectPattern implements Foo
{
    public function foo()
    {
        //internally uses instance of FooExpensive to calculate once
        $args = func_get_args();
        return $this->call(__FUNCTION__, $args); 
    }
}

I have found it impossible in PHP to implement a cache without at least these two classes and one interface (but on the other hand, implementing against an interface is a good thing, it shouldn't bother you). You cannot simply use the native cache object directly.

It is possible to have computation methods with parameters, using a cache with a nested array?

Parameters are working in the above implementation, and they are used in the internal generation of a cache key. You should probably have a look at the \Zend\Cache\Pattern\CallbackCache::generateCallbackKey method.

In PHP it is possible to use magic methods (__get() or __call()) for the main retrieval method. In combination with "@property" in the class docblock, this allows type hints for each "virtual" property.

Magic methods are evil. A documentation block should be considered outdated, as it is no real working code. While I found it acceptable to use magic getter and setter in a really easy-to-understand value object code, which would allow to store any value in any property just like stdClass, I do recommend to be very careful with __call.

I often use method names like "get_someValue()", where "someValue" is the actual key, to distinguish from regular methods.

I would consider this a violation of PSR-1: "4.3. Methods: Method names MUST be declared in camelCase()." And is there a reason to mark these methods as something special? Are they special at all? The do return the value, don't they?

It is possible to distribute the data computation to more than one object, to get some kind of separation of concerns?

If you cache a complex construction of objects, this is completely possible.

It is possible to pre-initialize some values?

This should not be the concern of a cache, but of the implementation itself. What is the point in NOT executing an expensive computation, but to return a preset value? If that is a real use case (like instantly return NULL if a parameter is outside of the defined range), it must be part of the implementation itself. You should not rely on an additional layer around the object to return a value in such cases.

Is storing intermediate values from dynamic programming a legitimate use case for this?

Do you have a dynamic programming problem? There is this sentence on the Wikipedia page you linked:

There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems. If a problem can be solved by combining optimal solutions to non-overlapping subproblems, the strategy is called "divide and conquer" instead.

I think that there are already existing patterns that seem to solve the lazy evaluation part of your example: Singleton, ServiceLocator, Factory. (I'm not promoting singletons here!)

There also is the concept of "promises": Objects are returned that promise to return the real value later if asked, but as long as the value isn't needed right now, would act as the values replacement that could be passed along instead. You might want to read this blog posting: http://blog.ircmaxell.com/2013/01/promise-for-clean-code.html

What are the best practices to implement this in PHP? Is some of the stuff in "Details" bad and ugly?

You used an example that probably comes close to the Fibonacci example. The aspect I don't like about that example is that you use a single instance to collect all values. In a way, you are aggregating global state here - which probably is what this whole concept is about. But global state is evil, and I don't like that extra layer. And you haven't really solved the problem of parameters enough.

I wonder why there are really two calls to bar() inside foo()? The more obvious method would be to duplicate the result directly in foo(), and then "add" it.

All in all, I'm not too impressed until now. I cannot anticipate a real use case for such a general purpose solution on this simple level. I do like IDE auto suggest support, and I do not like duck-typing (passing an object that only simulates being compatible, but without being able to ensure the instance).