infer a type for common fields in two records

1.3k views Asked by At

Bear with me if this is a foolish question. How can I type a generic function that takes two records and returns an array of their common fields?

Let's say I have:

type A = { name :: String, color :: String }
type B = { name :: String, address :: Address, color :: String }

myImaginaryFunction :: ???
-- should return ["name", "color"] :: Array of [name color]

I want to write a function that takes ANY two types of records and return an array of common fields. A haskell solution would work as well.

4

There are 4 answers

2
K. A. Buhr On BEST ANSWER

To express two record types with common fields in Haskell, you'll need a GHC extension:

{-# LANGUAGE DuplicateRecordFields #-}

and to introspect the names of the fields, you'll need generics based on the Data class:

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data ( Data, Typeable, DataRep(AlgRep), dataTypeRep
                 , dataTypeOf, constrFields)
import Data.List (intersect)
import Data.Proxy (Proxy(..), asProxyTypeOf)

This will allow you to define two data types using the same field names:

data Address = Address String deriving (Typeable, Data)
data A = A { name :: String, color :: String }
    deriving (Typeable, Data)
data B = B { name :: String, address :: Address, color :: String}
    deriving (Typeable, Data)

and then you can retrieve the field names using:

fieldNames :: (Data t) => Proxy t -> [String]
fieldNames t = case dataTypeRep $ dataTypeOf $ asProxyTypeOf undefined t of
  AlgRep [con] -> constrFields con

and get the common fields with:

commonFields :: (Data t1, Data t2) => Proxy t1 -> Proxy t2 -> [String]
commonFields t1 t2 = intersect (fieldNames t1) (fieldNames t2)

After which the following will work:

ghci> commonFields (Proxy :: Proxy A) (Proxy :: Proxy B)
["name", "color"]
ghci>

Note that the implementation of fieldNames above assumes that only record types with a single constructor will be introspected. See the documentation for Data.Data if you want to generalize it.

Now, because you're a help vampire, I know that you will demand a type level function, even though you said nothing in your question about requiring a type-level function! In fact, I can see you've already added a comment about how you're interested in somehow returning an array of name | color though no such thing exists in Haskell and even though you explicitly said in your question that you expected the term-level answer ["name", "color"].

Still, there may be non-vampires with a similar question, and perhaps this answer will help them instead.

4
Fyodor Soikin On

Well, since your function really returns an array of string, then the return type should just be Array String.

Types of arguments would be genetic, since you don't know types in advance. If you really want to make sure that these types are actually records, you can make your generic parameters not records themselves, but type rows, and then type value parameters as Record a.

So:

myImaginaryFunction :: forall a b. Record a -> Record b -> Array String

This is how you type such function.

Or was your question really about how to implement it?

Also: have you noticed how cheating (by adding Haskell tag) didn't really bring you any help, but only some scolding? Please don't do this. Respect the community.

3
kakigoori On

For Haskell, I like K.A. Buhr's answer, but personally I would not use Typeable and instead reach for GHC Generics. I think that might be preference at this point though.

For PureScript, I wrote about this kind of problem in my blog post Making Diffs of differently-typed Records in PureScript earlier this month. The approach is completely different from what you have with languages that don't have row types (No, Elm does not have these. You really don't get a solution there other than to use a homogeneous String Map).

First off, if you're at all familiar with PureScript, you might want to use Union, but this won't work either, as you'd want to do something like:

Union r1' r r1

Where r1' would be the complement of the shared subtype r between your first record r1 and r2. The reason being that you have two unsolved variables here, and the functional dependencies of Union require that any two of three parameters of Union be solved for.

So since we can't use Union directly, we'll have to craft some kind of solution. Since I can get a RowList structure that is sorted by keys, I opted to use this to walk through the two different records' RowLists and get out the intersection:

class RowListIntersection
  (xs :: RowList)
  (ys :: RowList)
  (res :: RowList)
  | xs ys -> res

instance rliNilXS :: RowListIntersection Nil (Cons name ty tail) Nil
instance rliNilYS :: RowListIntersection (Cons name ty tail) Nil Nil
instance rliNilNil :: RowListIntersection Nil Nil Nil
instance rliConsCons ::
  ( CompareSymbol xname yname ord
  , Equals ord EQ isEq
  , Equals ord LT isLt
  , Or isEq isLt isEqOrLt
  , If isEq xty trashty yty
  , If isEq xty trashty2 zty
  , If isEq (SProxy xname) trashname (SProxy zname)
  , If isEq
      (RLProxy (Cons zname zty res'))
      (RLProxy res')
      (RLProxy res)
  , If isEqOrLt
      (RLProxy xs)
      (RLProxy (Cons xname xty xs))
      (RLProxy xs')
  , If isLt
      (RLProxy (Cons xname yty ys))
      (RLProxy ys)
      (RLProxy ys')
  , RowListIntersection xs' ys' res'
  ) => RowListIntersection (Cons xname xty xs) (Cons yname yty ys) res

Then I used a short definition for getting the keys of the resulting RowList out:

class Keys (xs :: RowList) where
  keysImpl :: RLProxy xs -> List String

instance nilKeys :: Keys Nil where
  keysImpl _ = mempty

instance consKeys ::
  ( IsSymbol name
  , Keys tail
  ) => Keys (Cons name ty tail) where
  keysImpl _ = first : rest
    where
      first = reflectSymbol (SProxy :: SProxy name)
      rest = keysImpl (RLProxy :: RLProxy tail)

So together, I can define a function like so to get the shared labels:

getSharedLabels
  :: forall r1 rl1 r2 rl2 rl
  . RowToList r1 rl1
  => RowToList r2 rl2
  => RowListIntersection rl1 rl2 rl
  => Keys rl
  => Record r1
  -> Record r2
  -> List String
getSharedLabels _ _ = keysImpl (RLProxy :: RLProxy rl)

Then we can see results we expect:

main = do
  logShow <<< Array.fromFoldable $
    getSharedLabels
      { a: 123, b: "abc" }
      { a: 123, b: "abc", c: true }
  -- logs out ["a","b"] as expected

If you're new to RowList/RowToList, you might consider reading through my RowList Fun With PureScript 2nd Edition slides.

I put the code for this answer here.

If all this seems too involved, your other solution may be to coerce the records into a String Map and get the set union of the keys. I don't know if this is an answer in Elm though, since the runtime representation of a String Map probably does not match Record's. But for PureScript, this is one option as StrMap's runtime representation is the same as a Record.

0
K. A. Buhr On

Actually, after giving this some more thought, I guess it is possible to do what you actually want to do in modern Haskell, if what you actually want to do is work with a record type with named fields at the type level, including doing things like compile-time derivation of a new record type using the common fields from two other records.

It's a little involved and a little ugly, though some bits work surprisingly well. Yes, of course it's "too much ceremony for such a simple task", but bear in mind that we're trying to implement a brand new, non-trivial, type-level feature (a sort of dependent structural typing). The only way to make this a simple task is to bake the feature into the language and its type system from the beginning; otherwise, it's going to be complicated.

Anyway, until we get the DependentTypes extension, you have to enable a small number (ha ha) of extensions explicitly:

{-# LANGUAGE AllowAmbiguousTypes       #-}
{-# LANGUAGE GADTs                     #-}
{-# LANGUAGE KindSignatures            #-}
{-# LANGUAGE ScopedTypeVariables       #-}
{-# LANGUAGE TemplateHaskell           #-}
{-# LANGUAGE TypeApplications          #-}
{-# LANGUAGE TypeFamilies              #-}
{-# LANGUAGE TypeInType                #-}
{-# LANGUAGE TypeOperators             #-}
{-# LANGUAGE UndecidableInstances      #-}
{-# OPTIONS_GHC -Wincomplete-patterns  #-}

module Records where

We'll make a fair bit of use of the singletons package and its submodules: the Prelude for basic type-level functions like Map, Fst, and Lookup; the TH module for generating our own singleton and promoted functions with Template Haskell splices; and TypeLits for working with the Symbol type (i.e., string literals at the type level).

import Data.Singletons.Prelude
import Data.Singletons.TH
import Data.Singletons.TypeLits

We'll also need some other odds and ends. Text is only needed because it's the unlifted ("demoted") version of Symbol.

import Data.Function ((&))
import Data.Kind (Type)
import Data.List (intersect)
import qualified Data.Text as Text

We won't be able to work with usual Haskell records. Instead, we'll define a Record type constructor. This type constructor will be indexed by a list of (Symbol, Type) pairs, where the Symbol gives the field name, and the Type gives the type of the value stored in that field.

data Record :: [(Symbol, Type)] -> Type where

Already, there are several major implications to this design decision:

  • The same field name in different record types can refer to different field value types.
  • Fields are ordered in the record, so record types are only the same if they have the same fields, with the same types, in the same order.
  • The same field can appear multiple times in a record, even though the accessor function we provide will only access one (the last added).

In dependently typed programs, design decisions tend to run deep. If, for example, the same field could not appear multiple times, we would need to find a way to reflect that in the type and then make sure that all our functions were able to supply appropriate proof that a duplicate field wasn't being added.

Anyway, back to our Record type constructor. There will be two data constructors, a Record constructor to create an empty record:

  Record :: Record '[]

and a With constructor to add a field to a record:

  With :: SSymbol s -> t -> Record fs -> Record ('(s, t) : fs)

Note that With requires a runtime representative for the s :: Symbol in the form of a symbol singleton SSymbol s The convenience function with_ will make this singleton implicit:

with_ :: forall s t fs . (SingI s) => t -> Record fs -> Record ('(s, t) : fs)
with_ = With sing

with the idea that by allowing ambiguous types and using type application, we expose the following reasonably succint syntax for defining records. The explicit type signatures are not necessary here but are included to make it clear what's being created:

rec1 :: Record '[ '("bar", [Char]), '("foo", Int)]
rec1 = Record & with_ @"foo" (10 :: Int)
              & with_ @"bar" "Hello, world"
-- i.e., rec1 = { foo = 10, bar = "Hello, world" } :: { foo :: Int, bar :: String }

rec2 :: Record '[ '("quux", Maybe Double), '("foo", Int)]
rec2 = Record & with_ @"foo" (20 :: Int)
              & with_ @"quux" (Just 1.0 :: Maybe Double)
-- i.e., rec2 = { foo = 20, quux = Just 1.0 } :: { foo :: Int, quux :: Maybe Double }

To prove that this record type is useful, we'll define a type-safe field accessor. Here's one that uses an explicit singleton to select the field:

field :: forall s t fs . (Lookup s fs ~ Just t) => SSymbol s -> Record fs -> t
field s (With s' t r)
  = case s %:== s' of
      STrue -> t
      SFalse -> field s r

and a helper with an implict singleton:

field_ :: forall s t fs . (Lookup s fs ~ Just t, SingI s) => Record fs -> t
field_ = field @s sing

which is intended to be used with a type application like so:

exField = field_ @"foo" rec1

Note that trying to access a nonexistent field won't type-check. The error message isn't ideal, but at least it's a compile-time error:

-- badField = field_ @"baz" rec1  -- gives: Couldn't match type Nothing with Just t

The definition of field gives a hint of the power of the singletons library. We're using the type-level Lookup function which has been automatically generated via Template Haskell from a term-level definition that looks exactly like the following (taken from the singletons source and renamed to avoid conflicts):

lookup'                  :: (Eq a) => a -> [(a,b)] -> Maybe b
lookup' _key []          =  Nothing
lookup'  key ((x,y):xys) = if key == x then Just y else lookup' key xys

Using only the context Lookup s fs ~ Just t, GHC is able to determine that:

  1. Because the context implies this field will be found in the list, the second argument of field can never be the empty record Record, so there is no warning about incomplete patterns for field, and in fact you'll get a type error if you try to handle this as a runtime error by adding the case: field s Record = error "ack, something went wrong!"

  2. The recursive call to field is type-correct if we're in the SFalse branch. That is, GHC has figured out that if we can successfully Lookup the key s in the list but it's not at the head, we must be able to look it up in the tail.

(This is amazing to me, but anyway...)

Those are the basics of our record type. To introspect field names at either runtime or compile time, we'll introduce a helper, which we'll lift to the type level (i.e., the type-level function Names) using Template Haskell:

$(singletons [d|
  names :: [(Symbol, Type)] -> [Symbol]
  names = map fst
  |])

Note that the type-level function Names can provide compile-time access to the field names of the record, for example in the hypothetical type signature:

data SomeUIType fs = SomeUIType -- a UI for the given compile-time list of fields
recordUI :: Record fs -> SomeUIType (Names fs)
recordUI _ = SomeUIType

More likely, though, we'll want to work with field names at runtime. Using Names, we can define the following function to take a record and return its list of field names as a singleton. Here, SNil and SCons are the singleton equivalents of the terms [] and (:).

sFields :: Record fs -> Sing (Names fs)
sFields Record = SNil
sFields (With s _ r) = SCons s (sFields r)

And here's a version that returns a [Text] instead of a singleton.

fields :: Record fs -> [Text.Text]
fields = fromSing . sFields

Now, if you just want to get a runtime list of common fields of two records, you can do:

rec12common = intersect (fields rec1) (fields rec2)
-- value:  ["foo"]

What about creating a type with common fields at compile time? Well, we can define the following function to get the left-biased set of fields with common names. (It's "left-biased" in the sense that if matching fields in two records have different types, it'll take the type of the first record.) Again, we use the singletons package and Template Haskell to lift it to a type-level Common function:

$(singletons [d|
  common :: [(Symbol,Type)] -> [(Symbol,Type)] -> [(Symbol,Type)]
  common [] _ = []
  common (x@(a,b):xs) ys
    = if elem a (map fst ys)
      then x:common xs ys
      else   common xs ys
  |])

This allows us to define a function that takes two records and reduces the first record to the set of fields with the same name as fields in the second record:

reduce :: Record fs1 -> Record fs2 -> Record (Common fs1 fs2)
reduce Record _ = Record
reduce (With s x r1) r2
  = case sElem s (sFields r2) of STrue  -> With s x (reduce r1 r2)
                                 SFalse -> reduce r1 r2

Again, the singletons library is really remarkable here. I'm using my automatically generated Common type-level function together with the singleton-level sElem function (which is automatically generated within the singletons package from a term-level definition of the elem function). Somehow, through all this complexity, GHC can figure out that if sElem evaluates to STrue, I must include s in the list of common fields, while if it evaluates to SFalse, I must not. Try fiddling with the case results on the right-hand side of the arrows -- you can't get them to type check if you get them wrong!

Anyway, I can apply this function to my two example records. Again, the type signature is not needed but is given to show what's being produced:

rec3 :: Record '[ '("foo", Int)]
rec3 = reduce rec1 rec2

Like any other record, I have runtime access to its field names and compile-time type-checking of field access:

-- fields rec3           gives  ["foo"], the common field names
-- field_ @"foo" rec3    gives  10, the field value for rec1

Note that, in general, reduce r1 r2 and reduce r2 r1 will return not just different values, but different types if the order and/or types of the commonly names fields differ between r1 and r2. Changing this behavior would probably require revisiting those early and far-reaching design decisions I mentioned earlier.

For convenience, here's the entire program, tested using Stack lts-10.5 (with singletons 2.3.1):

{-# LANGUAGE AllowAmbiguousTypes       #-}
{-# LANGUAGE GADTs                     #-}
{-# LANGUAGE KindSignatures            #-}
{-# LANGUAGE ScopedTypeVariables       #-}
{-# LANGUAGE TemplateHaskell           #-}
{-# LANGUAGE TypeApplications          #-}
{-# LANGUAGE TypeFamilies              #-}
{-# LANGUAGE TypeInType                #-}
{-# LANGUAGE TypeOperators             #-}
{-# LANGUAGE UndecidableInstances      #-}
{-# OPTIONS_GHC -Wincomplete-patterns  #-}

module Records where

import Data.Singletons.Prelude
import Data.Singletons.TH
import Data.Singletons.TypeLits
import Data.Function ((&))
import Data.Kind (Type)
import Data.List (intersect)
import qualified Data.Text as Text

data Record :: [(Symbol, Type)] -> Type where
  Record :: Record '[]
  With :: SSymbol s -> t -> Record fs -> Record ('(s, t) : fs)

with_ :: forall s t fs . (SingI s) => t -> Record fs -> Record ('(s, t) : fs)
with_ = With sing

rec1 :: Record '[ '("bar", [Char]), '("foo", Int)]
rec1 = Record & with_ @"foo" (10 :: Int)
              & with_ @"bar" "Hello, world"
-- i.e., rec1 = { foo = 10, bar = "Hello, world" } :: { foo :: Int, bar :: String }

rec2 :: Record '[ '("quux", Maybe Double), '("foo", Int)]
rec2 = Record & with_ @"foo" (20 :: Int)
              & with_ @"quux" (Just 1.0 :: Maybe Double)
-- i.e., rec2 = { foo = 20, quux = Just 1.0 } :: { foo :: Int, quux :: Maybe Double }

field :: forall s t fs . (Lookup s fs ~ Just t) => SSymbol s -> Record fs -> t
field s (With s' t r)
  = case s %:== s' of
      STrue -> t
      SFalse -> field s r

field_ :: forall s t fs . (Lookup s fs ~ Just t, SingI s) => Record fs -> t
field_ = field @s sing

exField = field_ @"foo" rec1
-- badField = field_ @"baz" rec1  -- gives: Couldn't match type Nothing with Just t

lookup'                  :: (Eq a) => a -> [(a,b)] -> Maybe b
lookup' _key []          =  Nothing
lookup'  key ((x,y):xys) = if key == x then Just y else lookup' key xys

$(singletons [d|
  names :: [(Symbol, Type)] -> [Symbol]
  names = map fst
  |])

data SomeUIType fs = SomeUIType -- a UI for the given compile-time list of fields
recordUI :: Record fs -> SomeUIType (Names fs)
recordUI _ = SomeUIType

sFields :: Record fs -> Sing (Names fs)
sFields Record = SNil
sFields (With s _ r) = SCons s (sFields r)

fields :: Record fs -> [Text.Text]
fields = fromSing . sFields

rec12common = intersect (fields rec1) (fields rec2)
-- value:  ["foo"]

$(singletons [d|
  common :: [(Symbol,Type)] -> [(Symbol,Type)] -> [(Symbol,Type)]
  common [] _ = []
  common (x@(a,b):xs) ys
    = if elem a (map fst ys)
      then x:common xs ys
      else   common xs ys
  |])

reduce :: Record fs1 -> Record fs2 -> Record (Common fs1 fs2)
reduce Record _ = Record
reduce (With s x r1) r2
  = case sElem s (sFields r2) of STrue  -> With s x (reduce r1 r2)
                                 SFalse -> reduce r1 r2

rec3 :: Record '[ '("foo", Int)]
rec3 = reduce rec1 rec2
-- fields rec3           gives  ["foo"], the common field names
-- field_ @"foo" rec3    gives  10, the field value for rec1