In recent months I have been wondering about why Prolog or other similar logic programming implementations do not generally support (open ended) set unification. There has been a lot of studies on finite set unification, which still didn’t seem to make into standardized libraries or common idioms in logic programming, but there are only a few studies on open ended set unification. The reason I started to wonder about this is because I needed a map unification for describing type inference including records, for instance {x:int, y:bool} should be equivalent to {y:bool, x:int}. A simplified problem, ignoring the type annotations, is the set unification, where {x,y} and {y,x} should be considered equivalent.

More generally, in order to handle extensible records, we need open ended unification where the tail (or the rest) of the set is a variable. For example consider unifying a set containing at least x denoted by {x|S1} and a set containing at least y denoted by {y|S2}. We intuitively know that the most general unification for the query ?- {x|S1} = {y|S2}. is S1 = {x|S}, S2 = {y|S} where the substitution for S1 and S2 shares the same tail S. This idea of open ended set unification has already been explored and published by Frieder Stolzenburg (1996). I was able to write this down in Prolog at https://github.com/kyagrd/ExtensibleRecordsWithSetMembLP (see setunify_naive.pl and setunify.pl in the repository) and it seems to work well in Prolog implementations. Therefore, in a way, this is a solved problem. However, this kind of open ended set unification is not wired into Prolog or other logic programming implementations as a primitive built-in unification operation.

So, why haven’t open ended sets been wired in as primitive values that could be unified in logic programming paradigm? The short answer to might be that it is hard to come up with an efficient internal representation of open ended sets, especially for nested open ended sets, for instance set of sets. We can deal with unification of open ended list of open ended list (here again open ended mean that the tail could be an uninstantiated logic variable), simply by applying the recursive definition of structural unification and get the most general unifier as the first answer when executing it as a logic program. However, when we are dealing with set of sets, this becomes a bit irritating. Consider the following unification:

?- { {x|S1} | SS1 } = { {y|S2} | SS2 }.


What is the most general unifier of this unification? We can think of two candidates.

• First, try the top level open ended set unification without taking consideration that the elements are open ended sets:
SS1 = { {y|S2} | SS }, SS2 = { {x|S1} | SS }

• Second, considering that elements are open ended sets try unifying the known elements and if that unification of elements succeeds then unify its tail:
S1 = {y|S}, S2 = {x|S}, SS1 = SS2

Both are unifiers. Although the latter looks more compact, the former is actually a more “general” unifier in the traditional sense because it represents more possibilities. When we instantiate S1 and S2 in the former as the latter unifier, we effectively get the latter. That is, further instantiating the former unification by S1 = {y|S}, S2 = {x|S}, it is equivalent to the latter because duplication is ignored in sets. Still, when we are in a situation where we eventually want the latter answer, the internal representation of sets could be far from compact by following the algorithm that generates the former. So, there are may be more things to consider to implement such set unification more efficiently when we start supporting unifications of open ended sets in addition to the usual structural unification in a general purpose logic programming system. And from my experience of writing down the algorithms for set unification it is quite difficult to predict whether the algorithm will behave as the former or the latter (or even differently from either of them).

Fortunately, for the problem of type inference involving extensible records, we do not need to consider this set-of-set situation, because the field names of a record are always atomic labels (e.g., x and y in the examples above), not a set, not even a compound structure. But this problem is interesting in itself, and logic programming enthusiasts might want to solve it. Since my current priority is expressing type inference involving record types, I only need one level of set unification on the labels and then the type annotation of that label could be another record and so on. That is, there could be a pattern of alternating structural unification and set unification (which is map unification essentially) but not the set-of-set unification when handling types of records. So, I don’t think I can take an initiative to solving this set-of-set unification problem considering efficient implementation right now but happy to discuss who might have an initiative to push forward on this particular problem.

I still need a generic unification opeator for both structural values and set values for type inferenece specifications because type systems with records would have both record types and non-record types (e.g., basic types, function types, …). In the implementation (internally hacking logic programming implementation based on micorKanren), the tricky part is tying the knot to provide a generic unification operator that could apply for both structural values and set values. The implementation should starting from a structural recursion, and adding a hook that calls open ended set unification when it meets a set constructor. However, the set operations are also defined in terms of a unification operator, so carelessly tying the knot could easily run into an infinite loop. I think I am getting close to tying the knot. I was able to implement set-of-set unification by tying the knot at the membership constraint so that set unification could call set unification for its elements if needed. I havn’t gotten to the last step of tying the knot at the structural equality to invoke set unification when it meets the set constructor.