Mini-update on flatbuffers-with-spirit

by Max Galkin

There isn’t a whole lot of news about flatbuffers-with-spirit since I last blogged about it, but I’ve followed up on some to-do items:

  • I have filed a few bugs for CLion based on my experience;
  • I have restructured the project to move the tests into a Biicode block separate from the implementation (so that now, for example, I can “publish” my implementation block and anyone else can depend on it via Biicode);
  • I have updated the block’s dependencies so that the Catch framework is also downloaded from Biicode rather than stored in the repo as a header file;
  • I have updated the instructions on how to download and build the repo;
  • I have played with Boost.Karma to see whether it would be a good fit for the task, and set up a very simple generator.

Now I’m going to cover the new structure of the project in more detail and also summarize what I’ve learned.

Here are the dependencies I currently have in my blocks:

flatbuffers-with-spirit-test (block with Catch tests) 
|                                                     
+--diego/catch                                        
|                                                     
+--flatbuffers-with-spirit (main implementation block)
   |                                                  
   +--biicode/boost                                   
   +--biicode/cmake                                   

diego/catch is a Bii block with the Catch framework header file. biicode/cmake is a Bii block with some extra utilities for CMake. For example, I use it to request C++11 compilation in a cross-platform manner, but you can share pretty much any CMake macro this way. biicode/boost is… uhm… Boost.

The nicest part is that the dependencies add very little overhead to my repository. I only have one or two extra files per block to describe the relationships, and Biicode figures out everything else automatically. When you clone the repo locally, you just need to run two or three bii commands to generate the rest of the CMake files, download the dependencies, and build the project. So this all works pretty well for me.

The second thing I wanted to mention is that I’ve added a trivial Boost.Karma generator in preparation for generating C++/C#/Java code from a flatbuffers IDL file. For the uninitiated, Boost.Karma (Spirit.Karma) is a “co-library” of Boost.Qi (Spirit.Qi), which I talked about last time. Both are part of Boost.Spirit. They use very similar syntax to describe a grammar, but with Boost.Qi you define a parsing grammar, and with Boost.Karma you define a generating grammar. The official Boost documentation illustrates the typical flow neatly, and what I need for flatbuffers is pretty close to it:

[Figure: Typical data flow with Boost.Spirit]

Boost.Qi seems to be a good fit for parsing the Flatbuffers IDL. I’ve implemented a parser for simple structs, and I’m fairly confident I can extend it to cover most of the IDL grammar without much trouble while keeping it manageable.

However, with Boost.Karma and code generation, things are not so great, for two reasons.

Firstly, there is a missing layer of abstraction. When I reviewed the flatbuffers code earlier, I pointed out that it should really be using some kind of DOM model for the code it generates, something like the .NET CodeDom interface. What plain Boost.Karma provides is lower level than that: you basically work with a text template. That can solve the problem, of course, but the result ends up close to a manually written generator, and the important parts of the generation may get lost in the “noise” of the template itself.
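To make the missing layer more concrete, here is a deliberately tiny, hypothetical sketch of a CodeDom-like model in C++. ClassDecl, Method and render_cpp are made-up names, and the sketch is nowhere near the real .NET CodeDom. The code that builds the model describes *what* to emit. A separate printer owns all text layout, so a C# or Java printer could walk the same nodes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical "code DOM" node: an accessor that returns an expression.
struct Method {
    std::string return_type;
    std::string name;
    bool is_const;
    std::string body_expr;  // expression the method returns
};

// Hypothetical node for a generated struct and its methods.
struct ClassDecl {
    std::string name;
    std::vector<Method> methods;
};

// C++ printer: every formatting decision lives here, not in the code that
// builds the model. A C# or Java printer could traverse the same ClassDecl.
std::string render_cpp(const ClassDecl& c) {
    assert(!c.name.empty() && "a generated class needs a name");
    std::string out = "struct " + c.name + " {\n";
    for (const Method& m : c.methods) {
        out += "  " + m.return_type + " " + m.name + "()" +
               (m.is_const ? " const" : "") +
               " { return " + m.body_expr + "; }\n";
    }
    out += "};\n";
    return out;
}
```

With this split, the generator code for the `id` field reduces to something like `{"int32_t", "id", true, "GetField<int32_t>(4, 0)"}`. The formatting noise of the template stays out of it.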

Secondly, I’ve discovered that Flatbuffers’ layouts aren’t fully specified in the documentation. They are specified to some extent, enough to understand the implementation, but the ultimate specification is the code. I don’t think it’s a good idea for me to try to emulate that code’s behavior, because such an approach would be quite fragile. For example, below is a simple struct (a “table”, to be precise, in Flatbuffers vocabulary) and an excerpt from the C++ header file Flatbuffers generates for it:

namespace test;


table Test 
{
  id:int;
  name:string;
}

root_type Test;

struct Test FLATBUFFERS_FINAL_CLASS : private flatbuffers::Table {
  int32_t id() const { return GetField<int32_t>(4, 0); }
  const flatbuffers::String *name() const { 
    return GetPointer<const flatbuffers::String *>(6); 
  }
  bool Verify(flatbuffers::Verifier &verifier) const {
    return VerifyTableStart(verifier) &&
           VerifyField<int32_t>(verifier, 4 /* id */) &&
           VerifyField<flatbuffers::uoffset_t>(verifier, 6 /* name */) &&
           verifier.Verify(name()) &&
           verifier.EndTable();
  }
};

To be compatible, I need to mimic the binary layout precisely. The generated accessors pass raw offsets such as “4” and “6” to the intermediate layer of the Flatbuffers API. These are positions inside the table’s vtable, which starts with two 16-bit header entries (the size of the vtable and the inline size of the table), followed by one 16-bit slot per field. So field N is found at offset 4 + 2*N. If the interface of internal methods like GetField changes at some point, e.g. to take the field index instead of the raw offset, my generated headers would not know about it and would produce incorrect results… That’s what I mean by “fragile”.
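To spell out where those literals come from, here is a small sketch of the vtable arithmetic the generated code relies on. It is based on the vtable layout described in the FlatBuffers documentation. VOffset and vtable_slot_offset are my own illustrative names, not FlatBuffers API. A code generator that re-derives these constants is effectively hard-coding the library’s internal layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A FlatBuffers vtable is a sequence of 16-bit entries:
//   [0]     size of the vtable in bytes
//   [1]     inline size of the table in bytes
//   [2 + N] offset of field N inside the table (0 if the field is absent)
typedef std::uint16_t VOffset;  // hypothetical alias for a 16-bit vtable entry

constexpr std::size_t kVTableHeaderEntries = 2;

// Byte offset, from the start of the vtable, of the slot for field N.
// Fields are numbered in declaration order (id = 0, name = 1) unless the
// schema assigns explicit ids.
constexpr VOffset vtable_slot_offset(VOffset field_index) {
    return static_cast<VOffset>((field_index + kVTableHeaderEntries) * sizeof(VOffset));
}

static_assert(vtable_slot_offset(0) == 4, "id() reads vtable slot 4");
static_assert(vtable_slot_offset(1) == 6, "name() reads vtable slot 6");
```

If GetField ever switched to taking the field index directly, every constant produced this way would silently become wrong.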

Anyway, to summarize, here is what I’ve learned:

  • Biicode worked well for me and added very little overhead in terms of extra files in the repo. Since it replaces some of the CMake files you would otherwise have to maintain, it can even reduce the total number of files;
  • there’s room for libraries that build higher-level generators on top of Boost.Karma, for example CodeDom-style generators for C++/C#/Java… though I’m not sure, maybe such libraries exist already;
  • for some libraries it could be beneficial to maintain a specification detailed enough to let others create implementations independently… Of course, such an approach makes the initial development cost higher, so there’s a trade-off.