Variants fail to compile in GF


I built a program that generates different verbs from one Imperative tree.

The abstract file:

abstract Test = {
      flags startcat = Utterance;
      cat
          Utterance; Imperative; Verb; VerbPhrase;
      fun

      -- verb phrase
      Play_VP : VerbPhrase;


      -- Imp
      Play_Imp : VerbPhrase -> Imperative;


      -- Utt
      Sentence : Imperative -> Utterance;}

The concrete file:

concrete TestEng of Test = open SyntaxEng, TestEngSrc, ParadigmsEng in {

lincat
    Utterance   = Utt;
    Imperative  = Imp;
    VerbPhrase  = VP;
    Verb        = V;
lin


-- verb phrase
Play_VP = mkVP ( variants{ mkV(play_Str) ; put_on_V });


--Imp
Play_Imp verbPhrase = mkImp(verbPhrase);

--Utt
Sentence imperative = mkUtt(imperative);}

And finally the source file:

resource TestEngSrc = open ParadigmsEng, SyntaxEng in {

oper
    -- verb string
    play_Str : Str  = variants{ "broadcast" ; "play"
                    ; "replay" ; "see" ; "view" ; "watch" ; "show"};

    -- verb
    play_V : V = variants {mkV(play_Str) ; put_on_V };


        -- verb part
        put_on_V : V = partV (mkV "put") "on";}

But as soon as I run this program, it starts compiling and gets stuck in this state (image: Compiling problem).

I searched the GF issues on GitHub to check whether this problem is specific to my setup or a general one, and I found this page: https://github.com/GrammaticalFramework/GF/issues/32 It mentions that a solution would be offered in newer versions of GF. Is there any update on this issue, or is there a better solution than the one offered there? I appreciate your time and effort.


There are 3 answers

inariksit On BEST ANSWER

Addressing Yousef's answer:

Yes, the compiled grammar is smaller when the variants are at the V level. In your alternative grammar, you apply a V -> VP operation on the V that has the variants. In my grammar, I applied a V -> Imp operation on the V. In both of these grammars that compile quickly, the category that gets the variants is V and not VP.

You are right that there is no reason to avoid the VP category elsewhere in the grammar—the crucial issue here is whether the grammar has a variant-riddled VP as a 0-argument function.

Anatomy of the grammar blowup

Why is that? I returned to this question after reducing the fields in a VP, so now I can demonstrate this more easily.

We need to look at the PGF dump. You can see it by typing pg in the GF shell, where you have opened the grammar.

PGF dump for well-behaving grammar

Here are the concrete functions for the original grammar (in Yousef's first question), with the difference of adding a function MkVP : Verb -> VerbPhrase, and moving all variants into Play_V.

-- Abstract funs
  fun Play_V : Verb ;
  fun MkVP   : Verb -> VerbPhrase ;
  fun MkImp  : VerbPhrase -> Imperative ;

-- English concrete syntax compiled into the following
    F8 := (S0,S0,S0,S0,S0,S0,S0,S0,S0,S0,{-S54 x 40-},S0,S1,S7,S1,S1,S1,S1,S2,S2,S2,S1,S1,S1,S8,S5,S3,S0,S0,S0,S0,S0,S0,S0,S0,S0,S0) [MkVP]
    F9 := (S0,S0,S0,S0,S0,S0,S0,S0,S0,S0,{-S54 x 40-},S0,S1,S7,S1,S1,S1,S1,S2,S2,S2,S1,S1,S1,S8,S5,S3,S21,S23,S52,S53,S20,S19,S18,S43,S43,S43) [MkVP]
    F10 := (S4,S4,S6,S6,S14,S14,S15,S15,S16,S16,S17,S17) [MkImp]
    F11 := (S10,S13,S11,S12,S11,S0) [Play_V]
    F12 := (S24,S27,S25,S26,S25,S0) [Play_V]
    F13 := (S31,S34,S32,S33,S32,S0) [Play_V]
    F14 := (S35,S38,S36,S37,S36,S0) [Play_V]
    F15 := (S44,S47,S45,S46,S45,S0) [Play_V]
    F16 := (S48,S50,S49,S51,S49,S0) [Play_V]
    F17 := (S39,S42,S40,S41,S40,S0) [Play_V]
    F18 := (S28,S29,S28,S30,S28,S22) [Play_V]
  • We have 8 concrete functions for Play_V, because we had 8 variants: "broadcast", "play", "replay", "see", "view", "watch", "show" and "put on".
  • We have two concrete functions for MkVP, because they are inherited from the GF RGL. This particular split into two concrete functions is due to the isRefl param in the lincat of V:
    1. A function of type V -> VP either takes a V where isRefl=True, in which case it puts the sequences S21,S23,S52,S53,S20,S19,S18,S43,S43,S43 into appropriate fields. (If you follow the sequence numbers in the PGF dump, you will see they all correspond to reflexive pronouns. S21 is "myself", S43 is "themselves".)
    2. Or it takes a V where isRefl=False, in which case it puts S0 into all those fields. (S0 is the empty string.)

PGF dump for the misbehaving grammar

Now, let us look at the PGF for the grammar that has a 0-argument function VP riddled with variants.

F4 := (S0,S0,S0,S0,S0,S0,S0,S0,S0,S0,{-S15 x 40-},S0,S8,S9,S8,S8,S8,S8,S11,S11,S11,S8,S8,S8,S0,S10,S9,S0,S0,S0,S0,S0,S0,S0,S0,S0,S0) [Play_VP]

…

F16388 := (S0,S0,S0,S0,S0,S0,S0,S0,S0,S0,{-S15 x 40-},S0,S12,S12,S12,S12,S12,S12,S13,S13,S13,S12,S12,S12,S7,S14,S12,S0,S0,S0,S0,S0,S0,S0,S0,S0,S0) [Play_VP]

This time, we don't have the split into "what if the argument is reflexive or not", because the Play_VP doesn't take arguments. Instead, we split into over 16000 concrete functions, due to the variants blowing up.

To see the process on a smaller scale, see my blog post: https://inariksit.github.io/gf/2018/06/13/pmcfg.html#variants The key there is the following: we only introduce 4 variants in the linearisation of a single function. The variants don't come from the arguments, but are introduced directly into the function. Each of these variants is used multiple times in the linearisation, so that blows up into 64 new concrete functions.

Now for a function that returns a VP, its arguments are used in many more places. The lincat of V has only 6 fields, and VP has almost 100, even after my latest fix. This means that the same fields from the V argument are reused multiple times, and whenever that happens, it splits exponentially into 8 new branches of concrete functions.
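To watch this kind of blowup on a small scale, here is a minimal sketch of a grammar where a 0-argument function uses the same variant-carrying oper in three fields. The module names Blowup and BlowupCnc, the function F, and the field names are made up for illustration. The count in the comment assumes that the compiler expands each use of the variants independently, which is an assumption based on the explanation above and not something checked against a particular GF version.

```gf
-- File Blowup.gf (hypothetical toy grammar, not part of the question)
abstract Blowup = {
  flags startcat = S ;
  cat S ;
  fun F : S ;  -- 0-argument function, like Play_VP
}

-- File BlowupCnc.gf
concrete BlowupCnc of Blowup = {
  lincat
    S = {s1, s2, s3 : Str} ;  -- three fields, standing in for the many fields of VP
  lin
    -- v is used three times; if each use is expanded independently,
    -- that gives 4 * 4 * 4 = 64 combinations for this one function.
    F = {s1 = v ; s2 = v ; s3 = v} ;
  oper
    v : Str = variants {"a" ; "b" ; "c" ; "d"} ;
}
```

Importing BlowupCnc.gf in the GF shell and typing pg should show how many concrete functions F was split into. Adding a fourth field that also uses v is a quick way to check whether the number grows by another factor of 4.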

Solutions

To recap:

  • Keep the variants in a category that has a small lincat; V instead of VP in this case.
  • No need to avoid large categories elsewhere in the grammar; if a function f : SmallCat -> BigCat takes an argument that is full of variants, it will go just fine. The function f will not blow up—it doesn't care about its potential arguments on the level of variants, only on the level of inherent parameters (like MkVP is interested if its argument V is reflexive, but doesn't care if it is composed of 8 variants).
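As a sketch of the first recommendation, the well-behaving grammar from the PGF dump above could look roughly like this. The module name Test3 is made up, and the concrete syntax is a reconstruction from the abstract functions and the eight variants listed earlier, not the exact file that produced the dump.

```gf
-- File Test3.gf (hypothetical name)
abstract Test3 = {
  flags startcat = Utterance ;
  cat
    Utterance ; Imperative ; VerbPhrase ; Verb ;
  fun
    Play_V   : Verb ;                      -- all variants live here
    MkVP     : Verb -> VerbPhrase ;        -- SmallCat -> BigCat, no variants of its own
    MkImp    : VerbPhrase -> Imperative ;
    Sentence : Imperative -> Utterance ;
}

-- File Test3Eng.gf
concrete Test3Eng of Test3 = open SyntaxEng, ParadigmsEng, IrregEng, LexiconEng in {
  lincat
    Utterance  = Utt ;
    Imperative = Imp ;
    VerbPhrase = VP ;
    Verb       = V ;
  lin
    -- broadcast_V, see_V, show_V come from IrregEng; play_V from LexiconEng.
    Play_V = play_V | replay_V | broadcast_V | see_V | show_V | view_V | watch_V | put_on_V ;
    MkVP verb = mkVP verb ;
    MkImp verbPhrase = mkImp verbPhrase ;
    Sentence imperative = mkUtt imperative ;
  oper
    replay_V : V = mkV "replay" ;
    view_V   : V = mkV "view" ;
    watch_V  : V = mkV "watch" ;
    put_on_V : V = partV put_V "on" ;  -- put_V is in IrregEng
}
```

The point of this layout is that the variants only ever sit in the 6-field V category, while MkVP builds the large VP record from whichever V it is given.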

Future

The overall handling of variants is going to change in GF 4.0. So whenever it is released, this whole answer is hopefully deprecated, and we have a glorious future where nobody runs into these problems anymore.

inariksit On

No, there hasn't been an update in the handling of variants. But luckily, your code can be made much more efficient with a small fix.

VP is big and slow

The biggest bottleneck in your grammar is the category VerbPhrase, with lincat VP from the RGL. It's not visible to the end user, but a VP contains almost 3000 fields. If you want to see them, try this in the GF shell:

> i -retain TestEngSrc.gf
> cc mkVP play_V
... lots of output

I don't know the exact details of compilation, but with a VP that has 8 variants, the compiler gets stuck.

How to fix your grammar

If you know that you'll only use the verbs in the imperative, you can skip the VP stage completely and create imperatives directly from verbs. The RGL category V is much nicer: instead of ~3000 fields, it has 6. So if you change your grammar to this, it compiles instantly. I changed the name to Test2, so you can compare it against the old one.

abstract Test2 = {
      flags startcat = Utterance;
      cat
          Utterance; Imperative; Verb;
      fun

      -- Verb
      Play_V : Verb ;

      -- Imp
      Play_Imp : Verb -> Imperative;

      -- Utt
      Sentence : Imperative -> Utterance;
}

And here is the concrete syntax. I'm opening IrregEng and LexiconEng, because some of the verbs are already defined there.

concrete Test2Eng of Test2 = open SyntaxEng, ParadigmsEng, IrregEng, LexiconEng in {

lincat
    Utterance   = Utt;
    Imperative  = Imp;
    Verb        = V;
lin

  --Verb
  -- broadcast_V, see_V, show_V are in IrregEng.  play_V is in LexiconEng.
  Play_V = play_V|replay_V|broadcast_V|see_V|show_V|view_V|watch_V|put_on_V ;


  --Imp
  Play_Imp verb = mkImp verb ;

  --Utt
  Sentence imperative = mkUtt imperative ;

  oper
    replay_V : V = mkV "replay" ;
    view_V : V = mkV "view" ;
    watch_V : V = mkV "watch" ;
    put_on_V : V = partV put_V "on"; -- put_V is in IrregEng
}

Testing in the GF shell shows that it works as intended:

Test2> p "replay"
Sentence (Play_Imp Play_V)

Test2> p "watch"
Sentence (Play_Imp Play_V)

Test2> gt | l -treebank -all
Test2: Sentence (Play_Imp Play_V)
Test2Eng: play
Test2Eng: replay
Test2Eng: broadcast
Test2Eng: see
Test2Eng: show
Test2Eng: view
Test2Eng: watch
Test2Eng: put on
yousef almesbahi On

Since I need to use VerbPhrase, I can't simply skip it. So I tried to find the reason behind this problem, and I ended up with the following result.

In this case, although there are no more than three variants, the compiler still freezes every time the program runs.

Code with error

abstract:

   abstract Test = {
   flags startcat = VerbPhrase;
   cat
       VerbPhrase; Verb; 
   fun

   Play_VP : VerbPhrase;
}

concrete:

   concrete TestEng of Test = open SyntaxEng,  ParadigmsEng, IrregEng in {
   
   lincat
       VerbPhrase  = VP;
       Verb        = V ;
   lin

       Play_VP = mkVP(play_V);


   oper

   play_V : V = variants {mkV(play_Str) ; put_on_V};

   play_Str : Str = variants {"play" ; "broadcast"};

   put_on_V : V = partV put_V "on";
}

On the other hand, when the verb is defined as its own function in the abstract syntax, as follows, the program runs perfectly fine.

Code with no error

abstract:

   abstract Test = {
   flags startcat = VerbPhrase;
   cat
       VerbPhrase; Verb; 
   fun

   Play_V : Verb;

   Play_VP : Verb -> VerbPhrase;
}

concrete:

concrete TestEng of Test = open SyntaxEng,  ParadigmsEng, IrregEng in {
   
   lincat
       VerbPhrase  = VP;
       Verb        = V ;
   lin

       Play_V = variants {mkV(play_Str) ; put_on_V};

       Play_VP play_v = mkVP(play_v);


   oper

   play_Str : Str = variants {"play" ; "broadcast"};

   put_on_V : V = partV put_V "on";
}

Apparently the problem is not in the VP structure itself, but rather in the way VP behaves when it calls an operation that contains variants.

Hopefully you can look into this problem and figure out a solution.