Recursion for SHACL

The W3C recommendation that defines SHACL explicitly leaves recursion undefined. In this post, I want to discuss this topic in light of existing academic work:

The goal of this post is to clarify what it means to add recursion to SHACL, and to illustrate the ideas presented in these papers. I will discuss the different semantics without defining them first. The different semantics are: Supported Model semantics, Stratified Recursion, Stable Model semantics, and Well-Founded semantics.

Setting the stage

This post will use a running example to illustrate the different semantics. Consider an access control setting, where users can be authorized to access resources or not. The RDF data will represent an overview of an access control setting where there are users and resources. Users can authorize other users to access files. There is a distinguished user, the administrator, who is authorized by default.

A SHACL shape graph describes what is a valid access control setting, i.e., if the SHACL shapes graph validates, it is a valid access control setting.

In this post, we'll use the standard SHACL prefix, sh:, and introduce our own local prefix, @prefix : <https://mjakubowski.info/example/>.

Positive recursion

A shape is said to be recursive if it references itself (directly, or indirectly). Consider the following access control shapes graph:

:Authorized a sh:NodeShape ;
  sh:or (
    [ a sh:NodeShape ;
      sh:hasValue :admin
    ]
    [ a sh:PropertyShape ;
      sh:path [ sh:inversePath :approves ] ;
      sh:qualifiedValueShape :Authorized ;
      sh:qualifiedMinCount 1
    ]
  ) .
:Authorized sh:targetSubjectsOf :accesses .

In natural language, the :admin node is :Authorized and every node that is approved by an :Authorized node is also :Authorized. We target all nodes that access something. In essence, this shapes graph validates when all nodes that access a resource are :Authorized.

There are two things to note about this shape. First, it is clearly recursive, as it mentions itself through the sh:qualifiedValueShape. Second, it mentions itself positively, that is, the recursion does not happen through some kind of negation.

When talking about recursion, negation adds an extra layer of complexity to the discussion. Almost all the differences between the proposed semantics are due to negation. We'll discuss this further later in this post.

Semantics for positive recursion

Consider the following data graph:

:admin :approves :user_a .
:user_b :approves :user_c .
:user_c :approves :user_b .

Let us focus first on deciding which users could actually be assigned the :Authorized shape, forgetting for a moment the targeting specified in our shapes graph. Which of the users could be said to be :Authorized? That is the question we need to be able to answer when we have defined recursive semantics for SHACL.

Stratified recursion, Well-Founded, and Stable Model semantics all agree that the set of :Authorized nodes needs to be only :admin and :user_a. Supported Model semantics, however, states that there is an additional possible set of :Authorized nodes: all of them!

The idea behind Supported Model semantics is that there is a possible assignment of shape names to nodes, such that all assigned nodes satisfy the shape. In this case, when both :user_b and :user_c are assigned to be :Authorized, then they also both satisfy the :Authorized shape. Indeed, :user_b is authorized by an authorized user (:user_c) and similarly for :user_b. Clearly, this does not match our intention, since this would allow every user to authorize themselves!

For the other semantics, the idea in this case is to have a minimal assignment of shape names to nodes. This is commonly referred to as the minimal model semantics in logic programming literature.

To complete our discussion on positive recursion, we will talk about targeting for a moment. Consider the following addition to the data graph:

:user_b :accesses :resource1 .

It is easy to verify that this data graph does not validate under Stratified, Stable Model, and Well-Founded semantics. We can see this without knowing the definition, because none of these semantics ever assign :Authorized to :user_b.

The Supported Model semantics has two possible assignments, but which one do we look at? There are two ways to look at this, according to the literature. First, brave validation states that we will just require there to be just one assignment that assigns :user_b to be :Authorized. The other, cautious validation states that we require all possible assignments to assign :user_b to be :Authorized. Thus, to make this concrete, if we use Supported Model semantics with cautious validation it will agree with the other semantics and the data graph will not validate. Otherwise, with brave validation, Supported Model semantics dictates that the data graph validates.

Adding negation

We will add some negation. Consider the following addition to the shapes graph:

:PotentialThreat a sh:NodeShape ;
  sh:property [
    sh:path :accesses ;
    sh:minCount 1
    ] ;
  sh:not :Authorized .
:PotentialThreat sh:targetClass :BlacklistedUser .

In natural language, this shape states that a node is a :PotentialThreat if it is not :Authorized and accesses a resource. All nodes that access a resource and are not :Authorized are a :PotentialThreat. We want to ensure that all users that are a :BlacklistedUser are a :PotentialThreat.

Considering the following data graph:

:admin :approves :user_a .
:user_b :approves :user_c .
:user_c :approves :user_b .

:user_b :accesses :resource1 .
:user_c :accesses :resource1 .

:user_b a :BlacklistedUser .

which users would be assigned to be a :PotentialThreat, and which ones would be assigned to be :Authorized? The idea of minimal assignments is not going to help us here. All users will be assigned a label. They are either :Authorized or a :PotentialThreat.

Of course, the intuition is that :user_b and :user_c do not have any bussiness being :Authorized. They are only so in a self-supporting way. So, we would likely want them to be a :PotentialThreat.

One idea is to first resolve the positive recursion part of the shapes graph. So we would apply minimal models on :Authorized and afterward try to assign :PotentialThreat. This is exactly the idea behind Stratified Recursion. This only works when we look at the shapes graph in "layers" where we first resolve the positive recursion and work our way up. That is also where the name comes from "stratified" is an archaic way of saying "layered". Again, Stable Model and Well-Founded semantics agree with this approach. Only Supported Model semantics, like before, considers two possible assignments. The one where all users are :Authorized and the one where :user_b and :user_c are a :PotentialThreat.

An interesting remark here is that with Supported Model semantics neither brave nor cautious validation will always result in a satisfying validation result. Indeed, with cautious validation :user_b will not be marked a :PotentialThreat because there is an assignment where :user_b is :Authorized. With brave validation, we can do whatever we want. If :user_b is not blacklisted, it would be fine, because there is an assignment where :user_b is :Authorized! In my personal view, this excludes Supported Model semantics as a suitable semantics for recursive SHACL.

We can not always stratify easily

Indeed, we cannot always stratify easily as discussed above. Consider a new shapes graph:

:Authorized a sh:NodeShape ;
  sh:or (
    [ a sh:NodeShape ;
      sh:hasValue :admin
    ]
    [ sh:and (
        [ a sh:PropertyShape ;
          sh:path [ sh:inversePath :approves ] ;
          sh:qualifiedValueShape :Authorized ;
          sh:qualifiedMinCount 1
        ]
        [ sh:not [ a sh:PropertyShape ;
                   sh:path [ sh:inversePath :reports ] ;
                   sh:qualifiedValueShape :Authorized ;
                   sh:qualifiedMinCount 1
                 ]
        ]
     )
    ]
  ) .
:Authorized sh:targetSubjectsOf :accesses .

this shape is very similar to the :Authorized shape from before, only now, you can only be :Authorized if you are also never reported by an :Authorized user.

Note that this shape cannot be stratified as we did with the :PotentialThreat before. It directly references itself negatively. We will next look at the following data graph scenario:

:admin :approves :user_a ;
  :approves :user_b .
:user_a :approves :user_c .
:user_c :approves :user_d .

By now, you should be thinking about the natural question: what users should be :Authorized here? Clearly, all of them. There is nothing special going on as long as we do not have anyone reporting anyone. We will add the following line:

:user_b :reports :user_c .

Now, because :user_c has been reported, they are no longer :Authorized and neither is :user_d, even though :user_d has not been explicitly reported. All this the semantics agree on. But what if we add:

:user_a :reports :user_b .
:user_b :reports :user_a .

to the data graph?

The Stable Model semantics will tell you there are two possible assignments: one where :user_a is :Authorized and :user_b is not, and one where the opposite is true. Again, to know how to choose between these two assignments, we can apply the ideas of cautions and brave validation.

The Well-Founded semantics will take a different approach. Unlike all the semantics discussed before, it will not just assign shapes to nodes. In the Well-Founded semantics, assignments are three-valued. Assignments are true, false, or unknown. In this example, it will say the assignment of :Authorized for :user_a and :user_b are unknown.

Concluding thoughts

The ideas can be summarized as follows. First, Stratified Recursion is the only approach that somehow restricts the kinds of recursion you can write. The other approaches are also nicely defined on all possible shapes graphs. Second, Stable Model and Supported Model semantics consider multiple possible assignments. In these two semantics, it is useful to then distinguish cautious and brave validation. Third, the Well-Founded semantics considers three-valued assignments, but only one assignment is defined for a shapes graph, disregarding the need for the cautious and brave distinction. Finally, all semantics, except for Supported Model semantics, agree if we only consider shape graphs with only positive recursion — even on Stratified shape graphs!

In essence, the purpose of this post is to clear up the different ideas for semantics for recursive SHACL. They are all based on research in logic programming. There, they have been studies and evaluated to really correspond to peoples' intuitions. Therefore, when we apply them to SHACL, we get well-defined recursive semantics.

If I would suggest anything, I would say the proper recursive semantics for SHACL should be the Well-Founded semantics. Shapes are definitions and the Well-Founded semantics are very well suited for this. We defined the Well-Founded semantics for SHACL (and also by extension most other semantics coming from logic programming, like the ones discussed in this post) in our work Fixpoint Semantics for Recursive SHACL.

Thanks to Bart Bogaerts for the corrections and inspiring suggestions.