Strangely Consistent

Musings about programming, Perl 6, and programming Perl 6

Macros: allowing two terms in a row

Perl 6's built-in syntactic constructs are privileged in that they can break the TTIAR rule.

if 2 + 2 == 4 {
    say "math!";
}

What's "TTIAR"? The acronym stands for "Two Terms In A Row", and the parsing tradition behind it all actually goes back all the way to Perl 5. When the Perl 5 parser sees a /, how does it know whether it's a division operator, or the start of a regex literal?

The first part of that answer is that the parser constantly keeps track of whether it's expecting a term or an operator. And so it can tell those two things apart just fine. The second part of the answer involves all the things that participate in that dance between expect-term and expect-op: everything from prefix and postfix ops to so-called listops. A few examples help show the complexity of this dance:

2 + 3 * 4;               Ⓣ2 Ⓞ+ Ⓣ3 Ⓞ* Ⓣ4 Ⓞ;
$x++ + ++$y;             Ⓣ$x Ⓞ++ Ⓞ+ Ⓣ++ Ⓣ$y Ⓞ;
say "OH HAI";            Ⓣsay Ⓣ"OH HAI" Ⓞ;
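The bookkeeping behind those annotations can be sketched with a toy scanner. This is only an illustration under heavy assumptions: the names toy-term, toy-op and classify are made up for this sketch, the token set is tiny, and the strict alternation between terms and operators ignores prefix/postfix ops and listops (so the second and third lines above would not get through it). It is not how STD.pm6 does it.

```raku
# Toy sketch only: toy-term, toy-op and classify are hypothetical names,
# and real parsers (STD.pm6 included) are far more involved than this.
my token toy-term { \d+ | '$' \w+ | '"' <-["]>* '"' | '/' <-[/]>+ '/' }
my token toy-op   { '==' | '+' | '*' | '/' | ';' }

sub classify(Str $src) {
    my @tokens;
    my $expect-term = True;          # a statement starts out expecting a term
    my $rest = $src.trim;
    while $rest.chars > 0 {
        if $expect-term {
            # In term position, a slash can only start a regex literal.
            $rest ~~ /^ <toy-term> /
                or die "Expected a term";
            @tokens.push('T:' ~ $/);
        }
        elsif $rest ~~ /^ <toy-op> / {
            # In operator position, the same slash means division.
            @tokens.push('O:' ~ $/);
        }
        elsif $rest ~~ /^ <toy-term> / {
            die "Two terms in a row";
        }
        else {
            die "Confused";
        }
        $rest = $rest.substr($/.to).trim-leading;
        $expect-term = !$expect-term;    # terms and operators alternate
    }
    return @tokens;
}

say classify('$x / 2 + /pie/;').join(' ');
```

In that input the first slash lands in operator position and is read as division, while the second lands in term position and is read as the start of a regex literal. Feeding the scanner two string literals back to back makes it die with "Two terms in a row".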

The concept of "TTIAR" says that there is a rule the code author cannot break: it's not allowed to put a term where the parser expected an operator.

"will " "concatenate?";  Ⓣ"will " Ⓞ<<<PARSE ERROR>>>

This is all good and well, and works to our advantage. TimToady has described it on several occasions as Perl 6's "self-clocking" mechanism: if a term shows up where an operator was expected, we give a parse error. Often we're able to give a more specific parse error than just "Confused" — in fact, many of the excellent errors we give are improved versions of the TTIAR error.

# excerpt from inside of the `panic` method in STD.pm6
if self.lineof($startpos) != self.lineof($endpos) {
    $m ~~ s|Confused|Two terms in a row (previous line missing its semicolon?)|;
}
elsif @*MEMOS[$startpos]<listop> {
    $m ~~ s|Confused|Two terms in a row (listop with args requires whitespace or parens)|;
}
elsif @*MEMOS[$startpos]<baremeth> {
    $here = $here.cursor($startpos);
    $m ~~ s|Confused|Two terms in a row (method call with args requires colon or parens without whitespace)|;
}
elsif @*MEMOS[$startpos]<arraycomp> {
    $m ~~ s|Confused|Two terms in a row (preceding is not a valid reduce operator)|;
}
else {
    $m ~~ s|Confused|Two terms in a row|;
}

Ok, great. So, back to the if statement from the top. It breaks TTIAR.

if 2 + 2 == 4 {          Ⓣif Ⓣ2 Ⓞ+ Ⓣ2 Ⓞ== Ⓣ4 Ⓞ{

That last brace there introduces a block, which is a term. (A big one.) But as you see, the parser is in op-expecting mode. Boom... no, not boom! Just a regular day at the if statement parsing assembly line. The parser, instead of throwing a hissy fit about a term that's out of line, just turns around real quick and goes "oh, right! this is an if statement, so it's fine". (In fact, this little hiccup is the secret sauce that allows Perl 6 to drop the parentheses in if statements.)

The same goes for all the big-shot block-accepting control statements out there: unless, while, until, repeat, for, given, and when. They flout the rule, practically laughing in the face of the 99%, the less fortunate grammatical constructs who have to get all their terms and operators in the accepted order. Ha ha!

I don't know about you, reader, but I think this tyranny should end. We should put the power of TTIAR breakage into the hands of the author, where it belongs. That power clearly belongs in macros. In fact, macros shouldn't be allowed to call themselves macros if they couldn't do this, and emulate an if statement. In my opinion.

People have a standard response to this, and I think it's problematic as it stands. They say "well, that's OK; for such macros we just have to use the is parsed trait. That way, the macro can take over for a while from the compiler's parser, and supply its own, up to and including breaking TTIAR to its heart's content."

And yes, maybe that is the solution. I hope it can be. But not as it stands today. Let's look at the only example in S06 that declares a macro with is parsed:

macro circumfix:«<!-- -->» ($text) is parsed / .*? / { "" }

I personally hope that this particular declaration will never work in Perl 6. If I squint, I can... sort of... see how it would work out. Ok, so the parser comes and finds a circumfix in an expression, yes. Maybe the expression is this:

2 + <!-- lol an SGML comment --> 2

Somehow that regex / .*? / gets tried again and again until it matches the whole comment, right? (Bear with me here.) The macro, full of mirth, returns the empty string for the parser to re-integrate into the source code. The parser is then apparently supposed to go "oops! what a great macro that was! but even though I am now after a circumfix call, which would normally have me hungry for an operator, it left me with nothing, which must obviously mean I'm now back to expecting a term! Yeah, that's a healthy way to modify code, oh boy!"

In other words, the one spec'd example we have of the is parsed trait runs on 80% magic and 20% wishful thinking. (And apparently as implementor, my way to cope is to get really sarcastic. I leave the mocking of the is parsed macro in E06 as an exercise to the reader.)

Here's what I think is going on. In the latter half of the naughties, we got amazing STD parser technology. We basically figured out how Perl 6 is parsed. The macro spec (and the is parsed trait) largely comes before that. The Perl 6 chorus today sings about grammars, and sometimes action methods. But the is parsed trait still mumbles about its regexes, making itself a bit of an embarrassment, to be honest. It hasn't gotten the memo that all the rest of us are doing structured language parsing, not just text munging.

What if, when I declared a macro, I got the option to play the same game as if and for and the other big cats? What if I got to effectively extend the current Perl 6 grammar being parsed? (This is also the goal of slangs.) I think a lot of the problems would be solved simply with that addition.

Implementation

At the point where a macro call has been recognized, but before its arguments are parsed, we need to give the macro a chance to do its own parsing. This parsing should be able to call into methods declared in the Perl 6 parser/grammar.

Let's look at some of the statement control rules in STD.pm6 to get a feel for whether this is realistic. Here's if, for example:

rule statement_control:if {
    <sym>
    <xblock>
    [
        [
        | 'else'\h*'if' <.sorry: "Please use 'elsif'">
        | 'elsif'<?keyspace> <elsif=.xblock>
        ]
    ]*
    [
        'else'<?keyspace> <else=.pblock>
    ]?
}

(Here, <xblock> expects an expression and a block (that's the TTIAR breakage right there), and <pblock> expects a block with a possible "pointy" -> parameter declaration on it.)

I don't know about you, but this way of specifying how Perl 6 code should be parsed feels at the same time very uncluttered, natural, and overall a good fit. There's no waste here. Some error handling, but that's all. In terms of bang-for-the-buck, we're doing very well.
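To get a feel for the shape of such rules outside of STD.pm6, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the grammar name MiniControl is invented, and its EXPR and block tokens are drastically simplified stand-ins for the real rules. Only the "keyword, then expression immediately followed by block" structure mirrors the rules quoted from STD.pm6.

```raku
# Hypothetical mini-grammar; EXPR and block are toy stand-ins for the
# real STD.pm6 rules of similar names.
grammar MiniControl {
    rule TOP { <statement_control> }

    proto rule statement_control {*}
    rule statement_control:sym<while> { <sym> <xblock> }
    rule statement_control:sym<given> { <sym> <xblock> }

    # An expression directly followed by a block: a term right after a term.
    rule xblock { <EXPR> <block> }

    token EXPR  { '$' \w+ [ \h* '==' \h* \d+ ]? }
    token block { '{' <-[{}]>* '}' }
}

my $m = MiniControl.parse('while $x == 4 { say "math!" }');
say ~$m<statement_control><sym>;
say ~$m<statement_control><xblock><EXPR>;
say ~$m<statement_control><xblock><block>;
```

The point of the sketch is that the expression and the block end up as separate, named pieces of the parse tree, which is what a macro doing its own parsing would want to get its hands on.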

while statement?

rule statement_control:while {
    <sym>
    [ <?before '(' ['my'? '$'\w+ '=']? '<' '$'?\w+ '>' ')'>   #'
        <.panic: "This appears to be Perl 5 code"> ]?
    <xblock>
}

Again, yes. Couldn't be much simpler. given?

rule statement_control:given {
    <sym>
    <xblock>
}

The paragon of simplicity. It's a symbol, and then an expression-block. What about when?

rule statement_control:when {
    <sym>
    <?dumbsmart>
    <xblock>
}

Again, that's exac... hey! Who are you calling dumbsmart? What did I ever do to you, when?

An is parsed macro could perhaps look something like this (running with the example from the last post):

macro transact is parsed / <sym> <xblock> / {
    # mumble handwave need to extract $conn and &block from <xblock>
}

Inside the macro, we'd have access to the parse tree thus generated. The regular action methods of all the rules that fired should probably also run as they usually do during a parse. Which means that we'll have a full AST built too, QAST and all. Here my crystal ball grows blurry, because we already said we don't want to be interacting with the QAST.

We'll have to attack this in a later post. But we're already in a better position here than when we started; we can now harness the power of the Perl 6 grammar from our macro.

If after that, it turns out that it's still unnecessarily cumbersome to specify something as simple as the transact macro above — if it feels like writing boilerplate every time we do that — then clearly we should find some sugared way to write it so as to simply express "keyword, expression, block — you know what to do". It goes without saying that, for dogfooding reasons, that type of sugar should be provided through macros, maybe in module-space.

Not addressed by this proposal

Identified in a previous post.

Macros: nesting macros

Some features that the program author wants to implement need to straddle more than one macro. A common relationship between macros seems to be the outside-inside relationship.

AngularJS has this feature. When you declare directives (which can then act as elements or attributes in your application HTML), one directive can declare, using the require option, that it needs another one around it in order to function. See this example from UI Bootstrap, with <tabset> being the outer directive, and <tab> the inner.

There are two important parts of this feature:

(Further down in that example, there's even a tabHeadingTransclude directive which nests inside tab. That is, a directive can be both a child and a parent. Though the tabHeadingTransclude is so simple that it only requires the inclusion for the "validity" reason above, not for "sharing".)

Meanwhile, in Perl 6 macros

I believe this feature is something that macro authors will want, and find useful. I think macros will end up working in groups like this sometimes, and by far the most common way to group things will be the parent/child relation. (Or ancestor/descendant, to be exact. Hm. Some things will want to be tightly nested with no stuff in between parent and child; other things will be more lenient.)

One example that's already in Perl 6: given and when.

given $food {
    when /pie/ { say "mm, pie" }
    default { say "waiter, could you send in some pie?" }
}

Actually, what Perl 6 requires in this case is that the when (and default) find itself lexically inside a topicalizer block, not necessarily a given. So this is fine:

for @foods {
    when /pie/ { say "mm, pie" }
    default { say "waiter, could you send in some pie?" }
}

sub review-food($_) {
    when /pie/ { say "mm, pie" }
    default { say "waiter, could you send in some pie?" }
}

Which indicates that in some cases, the child macro might want to specify that it wants to be the child of (semantically) an any junction of parent macros.

Note that this example does not extend to the following likely parent/child constructs, which are too dynamic in nature and therefore the domain of the runtime rather than the compiler.

How about a more DSL-y example?

I went hunting for a good example of this, preferably one that exists already. I guess my DSL advent post is one. It does illustrate both the "validity" and "sharing" benefits. But I feel I need another example.

Let's imagine a DSL for making database transactions.

transact $conn {
    # do some queries
    # change stuff around in several steps
    rollback
        unless $success1;
    # more changes
    rollback
        unless $success2;
    commit;  # this would probably be optional at the end, though
}

I like this example more than the one in the advent post, because the parent macro transact and the child macros rollback and commit are collaborating on a type of data very central to the language itself: control flow. In the sense that we want a commit or rollback to also exit the transact block.

That makes the example feel real to me. Likely the mechanism for this would be transact setting up a custom handler, and commit and rollback throwing (control) exceptions with different cargo.
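A rough runtime approximation of that mechanism, using plain subs instead of macros and ordinary exceptions instead of true control exceptions, might look like the following. All the names here (FakeConn, CommitSignal, RollbackSignal, and the sub-based transact, commit and rollback) are hypothetical and exist only for this sketch.

```raku
# Hypothetical sketch: FakeConn, CommitSignal, RollbackSignal and this
# sub-based transact are illustrations only, not the proposed macro design.
class FakeConn {
    has @.log;
    method begin    { @!log.push: 'begin' }
    method commit   { @!log.push: 'commit' }
    method rollback { @!log.push: 'rollback' }
}

class CommitSignal   is Exception { method message { 'commit outside of transact' } }
class RollbackSignal is Exception { method message { 'rollback outside of transact' } }

sub commit()   { CommitSignal.new.throw }
sub rollback() { RollbackSignal.new.throw }

sub transact($conn, &block) {
    $conn.begin;
    block();
    $conn.commit;    # falling off the end of the block commits implicitly

    # The handler: either signal ends the block early, with a different outcome.
    CATCH {
        when CommitSignal   { $conn.commit }
        when RollbackSignal { $conn.rollback }
    }
}

my $conn = FakeConn.new;
transact $conn, {
    my $success = False;
    rollback() unless $success;
    commit();    # never reached in this run
};
say $conn.log;
```

Note the extra comma and the parentheses on rollback() and commit(): a sub-based version cannot have the syntax of the transact example above, which is part of the motivation for doing this with macros.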

This type of example inhabits a Goldilocks zone where the macros have to be not-too-simple (because then a frothy mix of subs, dynamic variables, and exception handling would work), but also not too much like a proper language (because then slangs would rush in and soak up the use cases). I think any more complicated than this and it'd be a slang. In fact, I don't mind if there's a nice, sliding scale, so that you can essentially evolve a cluster of macros of this type into a slang if you want.

Grammars do it bottom-up

I don't know what to make of the fact that in our Perl 6 grammars we've ended up with a solution where

Parsing is different from macros, so maybe it's fine. But by current design, macros can neither reach downwards to their macro children nor upwards to their macro parents. And I just find it a bit odd that the design I find natural with macros (children declaring their need for/communication with parents) runs counter to the design we find useful in grammars (parents grabbing information from children by default).

I notice that I am confused. ☺

Implementation

Remember, "generate, analyze, and typecheck". The thing I'm suggesting here falls under "analyze", because we mainly want to introspect/read the program structure, and communicate things across it. Maybe the "validation" requirement falls under "typecheck".

Anyway, I fully expect there to be a general framework through which macros could do this the "hard way":

(Hm. Will parent macros therefore leave a detectable trace of themselves in the Q-tree? Probably. I half-expected macros to desugar into more primitive types of nodes. But maybe parent macros are an exception, since they want to be found. Or they desugar to something primitive, like a block, but it's marked up in a standard way, with a "once I was a macro" symbol.)

What I'm proposing here is basically just sugar, for people writing the macros, to set up this relationship between parent and child the "easy way":

macro transact($conn, $block) { # TTIAR, but see separate post
    # ...
}

macro commit() is inside(&transact) { #`[...] }
macro rollback() is inside(&transact) { #`[...] }

This is enough to declare the relation between the macros. There also needs to be a mechanism to get the object representing the actual callsite of the macro call. That's the one in our example that would hold $conn. In the worst case, we could fall back on asking about this through a namespace or global object somehow. There may be a cuter/saner way that I'm missing right now. Either way, it's possible.
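As a very rough runtime stand-in for that fallback, a dynamic variable can play the role of the "global object". The sketch below uses subs rather than macros, so it checks the relation at run time instead of compile time, and every name in it ($*TRANSACT-CONN, enclosing-conn, and these versions of transact, commit and rollback) is hypothetical.

```raku
# Hypothetical runtime approximation of `is inside(&transact)`: a dynamic
# variable stands in for the compile-time link between child and parent.
sub transact($conn, &block) {
    my $*TRANSACT-CONN = $conn;    # visible to everything called from the block
    block();
}

sub enclosing-conn() {
    # An unset dynamic variable is undefined, so we end up in the die.
    $*TRANSACT-CONN // die "not inside a transact block";
}

sub commit()   { "COMMIT on "   ~ enclosing-conn() }
sub rollback() { "ROLLBACK on " ~ enclosing-conn() }

say transact("db1", { commit() });

try rollback();                    # called with no transact around it
say "error: " ~ $!.message;
```

Both halves of the relationship show up here: the child refuses to work without a parent around it, and when there is one, the child gets at the parent's $conn without it being passed explicitly.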

Not addressed by this proposal

Identified in a previous post.

Feedback on "Macros: thunkish parameters"

Collecting comments here. These are not all the comments, only a selection according to some unspecified criterion. This list is just as much for my future reference as it is for people who don't read the IRC logs as if it were the morning news.

I feel like a lawyer saying this, but don't take my blogging a comment collection as an implicit promise that I'll keep up that habit for the rest of these posts. The posts themselves are the core thing.

These comments pertain to this post.