Lambdas Query with Reason

I will be visiting Simon Thompson and the PLAS seminar organized by Olaf Chitil at University of Kent on 15 Oct 2018.


Ralf Lämmel, Facebook London and University of Koblenz-Landau (on leave)


Lambdas Query with Reason


Much of the Big Data hype focuses on the size of data, on the use of ML/AI to get something out of the data, and on the programming technologies and infrastructure needed to deal with size, ML, and AI. Our research focuses on a complementary problem: the ontological semantics of data, how to use it for querying data programmatically, and how to help programmers in the tradition of static typing.

In this talk, I present two strongly connected pieces of work: i) $\lambda_{\mathit{DL}}$ -- a lambda calculus with a description logic-aware type system and means of querying semantic data ('triples'); ii) a completed language integration such that description logic and a subset of the standardized SPARQL language are embedded into Scala. The integration reuses existing components -- triple store, ontological reasoner, and SPARQL query engine -- and it extends the Scala type system appropriately.

Joint Work


The research discussed here is carried out outside my mandate at Facebook, as part of my continued research affiliation with the SoftLang Team and collaboration with the WeST Institute at University of Koblenz-Landau. 

Slides of the talk


Further Reading




Message Everyone -- Startup Announcement

Please check out the date at the bottom of the post before sending more inquiries to me and others.

I have taken unpaid leave of absence from my faculty job to work on a startup idea — public codename: “Message Everyone”. Sorry for misleading colleagues and family about my career plans for the last few months.

The idea took shape more than two years ago, when I was organizing the IFL 2015 conference on campus and had to communicate with 2 secretaries, 3 more local staff members, and 3 student volunteers; I had to keep in touch with about 8 persons shortly before, during, and shortly after the event to ensure smooth operation. Up front, I offered the following access paths: Skype, Telegram, WhatsApp, Twitter, FB Messenger, Google Hangouts, SMS (and email and phone). Even that did not cover everyone's messaging preferences, so I had to install two more apps whose names I no longer remember.

If I ran this conference today, Signal, Viber, Snapchat, LINE, and a few other messaging apps would need to be on my device. I don’t really mind using different messaging apps. As long as you have a powerful phone and don’t mind recharging during the day, this is easy enough to handle.

One thing did bother me, though (and it has bothered me many times before and after the conference on a smaller scale): group communication was not effective. Unless you are in a corporate setting, you cannot assume that everyone is on, or willing to join, one preferred network such as Slack. Thus, you constantly spend time copying content back and forth from one channel to another. As the person in charge, you act as a relay service. You also need to check all the time, in all the right places (only slightly simplified by the notification panel), that your messages arrived, were acted upon, and were replied to. Even worse, knowledge islands develop because you forward only when it seems necessary, which in turn means you have to contextualize forwards to account for unawareness of previous posts or the apparent absence of a person on a given network. What a waste of human brain power!

So our startup (Tel Aviv and London and San Francisco) is going to build the “Message Everyone” app which makes it really easy and secure to connect your existing accounts with traditional messaging services. Our app essentially serves as a messaging mediator (interceptor, forwarder). 

“Message Everyone” does not have any real users — by intention (not to be confused with the Myspace model :-)). We leverage existing accounts with classic messaging services. In particular, the “Message Everyone” app, by itself, will not feature any UI for messaging; its UI serves just the purpose of connecting your ’real’ accounts, creating groups, and a few more settings (screenshot in my next post). You will use one of your preferred apps to send and receive messages. In a group communication, not everyone has to install our app; in practice, it is enough if just one person has it installed. However, everyone has to confirm participation in an inter-messaging-service group, via a messaging-based micro DSL. Privacy is the biggest challenge, but we think we have a simple solution. :-)

To better understand how it works, consider a group communication where users U1 and U2 want to talk to each other. U1 uses service S1 whereas U2 uses service S2. 

Communication succeeds if:
  • Base case: S1 = S2 and U1 and U2 are connected on S1 = S2. In this case, of course, our app is of no use, but the base case is as useful here as it is in primitive recursion.
  • U1 has our app installed (U2 hasn’t necessarily) and U1 and U2 are connected on S2. This is the case where U1 can use S1 effectively and relaying to S2 works locally on the phone of U1 by virtue of our app. In an extreme case, U1 has all conceivable services installed and can message all users from all services for as long as U1 is connected to a user on at least one service. (I would be U1 in the conference scenario.)
  • There is another user U3 in the group on a service S3 such that U1 and U3 as well as U2 and U3 can communicate. This is the transitivity aspect of "Message Everyone". U1 would not be able to add U2 to a group communication directly; U1 would rely on U3 to kindly do that. Thus, groups have no manifestation beyond the fragments of connected users on any given phone.
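The three cases above amount to a reachability rule: a message travels along direct connections, and an intermediate user can forward it only if they have the app installed. Here is a minimal Haskell sketch of that rule; the function and type names are my own illustration, not part of any actual implementation.

```haskell
import Data.List (nub)

type User = String

-- Direct connections between users on some common messaging service.
-- Relay rule from the post: an intermediate user can forward a message
-- only if they have the "Message Everyone" app installed.
canReach :: [(User, User)]  -- direct connections (treated as symmetric)
         -> [User]          -- users with the app installed
         -> User -> User -> Bool
canReach conns appUsers from to = go [from] [from]
  where
    neighbours u = nub ([ b | (a, b) <- conns, a == u ]
                     ++ [ a | (a, b) <- conns, b == u ])
    go _ [] = False
    go visited (u:queue)
      | u == to   = True
      | otherwise =
          -- only the original sender or an app user may relay further
          let nexts = [ v | u == from || u `elem` appUsers
                          , v <- neighbours u
                          , v `notElem` visited ]
          in go (visited ++ nexts) (queue ++ nexts)
```

For instance, with U1 and U2 each connected only to U3, `canReach [("U1","U3"),("U3","U2")] ["U3"] "U1" "U2"` holds, while the same query with no app users fails.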

Thus, everyone can safely message everyone else in the most heterogeneous group and that’s our mission! All such messaging would be consensual, and existing apps could easily add an option to suppress relayed messages, if users are concerned. (In my conference scenario, though, this would correspond to some sort of riot.)

One question that I am getting a lot is this: What’s the business model of “Message Everyone”? More technical people start off with a different question: What does it take in terms of infrastructure and staff? The answer is ‘very little’. In one approach, “Message Everyone” would rely on the existing services to maintain active sessions and groups. We feel this may be too much to ask for; it could also trigger privacy concerns. Instead, we favor a peer-to-peer model at this point, where sessions are maintained by those group participants who have our app installed. Zero info about group participants is cross-posted or made available in any way to other group participants on different services or without a friend relationship on the same service.

For classic services to participate, we only rely on APIs for intercepting and sending messages and for accessing contact lists. Everything happens on the phones. We do not rely on any server infrastructure. The existing APIs of most messaging apps already provide sufficient means today, but we are getting in touch with all the players to tune things. We need few app developers; most of our effort will go into setting up collaboration with existing services to ensure their smooth integration. For instance, I am visiting Facebook after Easter.

Stay tuned. 


1st April 2018


Big Code Science

I am pitching "Big Code Science" (my take on the mashup of mining software repositories, source-code analysis, program comprehension, etc.) to an inter-faculty audience at my university. (I am about to start an extended unpaid leave of absence to join Facebook and do work possibly a bit related to big code science.) I will have just 15 minutes in a brown-bag setting, and thus I am going to use images, charts, and simple messages.

Title: Big Code Science

Abstract: Code Science is Data Science for code. Big Code Science is the scientific approach to accessing, analyzing, and understanding big data where the data is code or data related to software development. There are several reasons why Big Code Science has taken off. (i) Open Source development has exploded in the last 10 years, so that we have access to terabytes of source code, version history, developer communication, documentation, release infos, bug-tracking info, etc.; not trying to learn from the past would be crazy. (ii) Big IT corporations (Facebook, Google, IBM, Microsoft, Philips, Siemens, ...) critically depend on their super-huge code bases for their businesses to function and to develop further, which is an extraordinary challenge because robustness, performance, security, maintainability, evolvability, and other critical parameters are increasingly harder to control as code bases grow; size does matter, and science must come to the rescue. (iii) Machine learning, information retrieval, data mining, parallel programming, text analysis, traceability recovery, program analysis, reverse engineering, and other relevant techniques have matured, also in the context of industrial-scale software engineering, so that we are definitely able to deal with big code both technically and methodologically. In this talk, I am going to look at a few topics that my research team has addressed in the context of Big Code Science over the last few years. I also hint at some challenges ahead -- some of which I hope to look into during my appointment at Facebook.


Acknowledgment: This is a team effort; I am grateful to these former and current students and team members:

  • Hakan Aksu (current PhD student)
  • Johannes Härtel (current PhD student) 
  • Marcel Heinz (current PhD student)
  • Rufus Linke (former diploma student)
  • Ekaterina Pek (former PhD student)
  • Jürgen Starek (former diploma student)
  • Andrei Varanovich (former PhD student)


Hardware lovers --- it's Christmas time!

I have enjoyed this collection long enough.

It's time to pass it on to a broader audience or more committed individuals.

Constraints for passing on stuff:
  • I like Saint-Émilion (Grand Cru specifically).
  • The stuff is located in Koblenz; I live in Bonn.
  • Pick up preferred; I can deliver to "nearby" institutional collectors.
  • Let's take photos of the hardware, collector, and me -- and post it on Instagram. 
There is this stuff:
I should also note that I have endless amounts of other legacy hardware such as phones, modems, printers, cables, and what have you. So you are encouraged to visit me in Koblenz, take some stuff, and leave some Saint-Émilion Grand Cru behind.


Thoughts on a very semantic wiki


101wiki started as a boring MediaWiki installation to document software systems in the chrestomathy ‘101’. Semantic wiki extensions were quickly adopted; eventually, our team developed a full-blown proprietary semantic wiki, more or less from scratch. Now we have also rehosted it and given it a new look. (BTW, the 101companies brand name is now all gone. It's now just '101', really.)


The biggest mistake we (me!) made in said project ‘101’ is that we had only very loose specs for system implementation and system documentation; we had no proper process for checking and accepting contributions either. Thus, the 101wiki content was always a big mess, and it still is. This problem is so serious that we switched to discouraging contributions a few years ago; we rather deal with what we have and add content only when absolutely necessary. However, we depend on the 101wiki content for teaching; we also use it as a linked data hub for software language engineering-related research projects such as MetaLib, MegaLib, and YAS.

With a small group of people, we are now starting a significant content and ontology-modeling push, which will hopefully lead to some islands of sanity on 101wiki. In what follows, I am going to describe the rationale for what’s emerging.

Feedback more than welcome.

Semantic wiki basics

  • Typed links: Property names are used to qualify (to ‘type’) links. For instance, we use ‘sameAs’ to express that a 101wiki entity (page) is the same as some entity (page) elsewhere. Also, we use ‘uses’ to express that a contribution (a system implementation) uses some language or technology. We tend to relate 101wiki entities (pages) to Wikipedia resources. See here for a list of 101wiki’s properties.
  • Typed pages: We organize pages in ‘namespaces’ such as 'Language', 'Technology', or 'Contribution'. We use namespace names as prefixes/qualifiers of page names. For instance, we say ‘Language:Java’ rather than ‘Java_(Programming language)’ on Wikipedia. The fact that Java is a programming language is taken care of by a semantic property. That is, Java is declared to be an instance of 'OO programming language' which is a subtype of 'Programming language'. See here for a list of 101wiki’s namespaces.
  • Bits of content management: We expect that the structure of pages can vary, in our case, depending on the namespace (the ‘type’) of the page. That is, there are different sections that may be used, and each type of section may come with certain expectations regarding its content. For instance, a ‘headline’ is a section that should be used by any 101wiki page, while a ‘motivation’ is (currently) only expected on a page for a system 'feature'. See here for a list of 101wiki’s sections.
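To make the three ideas above concrete, here is a minimal Haskell sketch of typed pages, typed links, and per-namespace section expectations. The names and the selection of namespaces are my own illustration, not part of the 101wiki implementation.

```haskell
-- Typed pages: pages live in namespaces (a small selection for illustration).
data Namespace = Language | Technology | Contribution | Feature
  deriving (Eq, Show)

data Page = Page { namespace :: Namespace, pageName :: String }
  deriving (Eq, Show)

-- Typed links: a property name qualifies the link; the target is either
-- another 101wiki page or an external resource such as a Wikipedia URL.
data Target = Internal Page | External String
  deriving (Eq, Show)

data Link = Link { property :: String, subject :: Page, object :: Target }
  deriving (Eq, Show)

-- Bits of content management: sections expected per namespace.
-- Every page needs a 'headline'; only 'feature' pages need a 'motivation'.
expectedSections :: Namespace -> [String]
expectedSections Feature = ["headline", "metadata", "motivation"]
expectedSections _       = ["headline", "metadata"]

-- For instance, 'Language:Java' is classified via 'instanceOf':
java :: Page
java = Page Language "Java"

javaLinks :: [Link]
javaLinks =
  [ Link "instanceOf" java (Internal (Page Language "OO programming language"))
  , Link "sameAs" java (External "https://en.wikipedia.org/wiki/Java_(programming_language)")
  ]
```

The point of the sketch is that "namespace" plays the role of a type, while properties carry the actual classification (Java being an OO programming language) rather than the page name.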

For instance, here is (most of) the content of 101wiki's page for the Haskell programming language:

Content for https://101wiki.softlang.org/Language:Haskell

In fact, we show the metadata section of the Haskell page separately:

Metadata for https://101wiki.softlang.org/Language:Haskell

That is, Haskell is also located on haskell.org and Wikipedia. We use 'sameAs' to express that these are all resources describing the Haskell language. There is also an 'instanceOf' property to express that Haskell is a functional programming language. 'Inbound' properties are also shown to help the user realize what other pages relate to Haskell.

Semantic wiki self-description

  • Link types are to be declared on the wiki itself: This means, in our case, that there is a type (a ‘namespace’) of properties. It also means that there are ‘meta-properties’ dealing with the properties of properties. That is, each property, just like in the Semantic Web, has a domain and a range.
  • Page types are to be declared on the wiki itself: This means, in our case, that there is a type (a ‘namespace’) of namespaces. It also means that there are ‘meta-properties’ dealing with the properties of (pages as members of) namespaces. That is, each namespace associates with mandatory and optional sections and properties. Accordingly, there is also a type (a ‘namespace’) of sections.
  • Link endpoint types are to be declared on the wiki itself: This means, in our case, that there is a type (a ‘namespace’) of types. There is basically a type for each 101wiki namespace, but there are additional types such as ‘String’ for string-typed properties, ‘URI’ for reaching out of 101wiki, and ‘Any’ to refer to the union of all 101wiki namespaces.
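The self-description above can itself be sketched as data. The following Haskell fragment (my own illustration, not 101wiki's actual representation) models properties with domains and ranges drawn from namespaces plus the built-in types:

```haskell
-- Link endpoint types: namespaces plus built-ins 'String', 'URI', and 'Any'.
data PType = NamespaceT String | StringT | URIT | AnyT
  deriving (Eq, Show)

-- Meta-properties: each property declares a domain and a range.
data PropertyDecl = PropertyDecl
  { propName :: String
  , domain   :: [PType]  -- kinds of pages that may occur as subjects
  , range    :: [PType]  -- kinds of values that may occur as objects
  } deriving (Eq, Show)

-- For illustration: 'sameAs' could relate any 101wiki page to an external URI.
sameAs :: PropertyDecl
sameAs = PropertyDecl "sameAs" [AnyT] [URIT]
```
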
For instance, these are the properties for the namespace of languages:

Metadata for https://101wiki.softlang.org/Namespace:Language

That is, the namespace relates to the concept of 'software language'. Each page in the namespace must have a 'headline' as well as a section with metadata; it may have sections 'details', 'quote', and 'illustration'. The metadata must at least exercise the 'instanceOf' property for classification. The 'exemplifiedBy' property at the bottom of the figure is a bit special; we discuss it just below.

Semantic wiki quality monitoring

Given how much messy content there is on 101wiki, and given how difficult it still is to agree on the semantics of page and link types, we are starting to use one magic property, ‘exemplifiedBy’, to designate 101wiki pages that are reasonably representative of a type (a namespace, a property, a section, etc.). This helps the team consult these exemplars when migrating more legacy content to an emerging 'metamodel'. The metadata for the property is mind-boggling.

Metadata for https://101wiki.softlang.org/Property:exemplifiedBy

That is:

  • The page describing the property is linked to the notion of Exemplar.
  • Subjects of the property may be namespace, section, or property pages. That is, these kinds of pages can be 'exemplified'.
  • Objects of the property may be pages in 'any' namespace. This is a bit weakly typed because we expect, of course, that an exemplar for a namespace is a page in that namespace. (So, basically, 101wiki's type system is not powerful enough to capture all details.)
  • It so happens that the property page for 'exemplifiedBy' itself is a feature page for the property; see 'this exemplifiedBy this'.
  • We also see how the use of the property is documented in the 'metamodel' of the namespaces 'namespace', 'section', and 'property'. 


I take responsibility for the content mess on 101wiki, but I would like to acknowledge some people who have contributed or are contributing to 101 in a significant way, despite my epic failure. Hopefully, this acknowledgment will not be used against them :-)

  • Andrei Varanovich (former developer and content author)
  • Thomas Schmorleiz (former developer)
  • Kevin Klein (the incredible current developer)
  • Marcel Heinz (current content author and ontologist)
  • Johannes Härtel (current content author and data miner)
  • Hakan Aksu (current content author and educator)
  • Wojciech Kwasnik (the team's logo artist acknowledged here)

The logo of '101': it hints at the Tower of Babel and at how the project hopefully illuminates the knowledge area of software languages, technologies, and concepts on the grounds of an advanced chrestomathy approach.



Peano goes Maybe

Just for the fun of it, let's represent Nats as Maybies in Haskell.

import Prelude hiding (succ)
-- A strange representation of Nats
newtype Nat = Nat { getNat :: Maybe Nat }
-- Peano zero
zero :: Nat
zero = Nat Nothing
-- Peano successor
succ :: Nat -> Nat
succ = Nat . Just
-- Primitive recursion for addition
add :: Nat -> Nat -> Nat
add x = maybe x (succ . add x) . getNat
-- Convert primitive Int into strange Nat
fromInt :: Int -> Nat
fromInt 0 = Nat Nothing
fromInt x = succ (fromInt (x-1))
-- Convert strange Nat into primitive Int
toInt :: Nat -> Int
toInt = maybe 0 ((+1) . toInt) . getNat
-- Let's test
main = print $ toInt (add (fromInt 20) (fromInt 22))

I wrote this code in response to a student question: whether and, if so, how one could code recursive functions on maybies. This inspired an exam question: how does the above code compare to more straightforward code that uses an algebraic datatype with Zero and Succ constructors instead of maybies?
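For comparison, here is a sketch of the straightforward version that the exam question alludes to, using an explicit algebraic datatype (the primed names are mine, just to avoid clashes with the code above):

```haskell
-- The conventional Peano datatype with explicit constructors
data Nat' = Zero | Succ Nat'

-- Primitive recursion for addition, mirroring 'add' above
add' :: Nat' -> Nat' -> Nat'
add' x Zero     = x
add' x (Succ y) = Succ (add' x y)

-- Convert primitive Int into Nat'
fromInt' :: Int -> Nat'
fromInt' 0 = Zero
fromInt' n = Succ (fromInt' (n - 1))

-- Convert Nat' into primitive Int
toInt' :: Nat' -> Int
toInt' Zero     = 0
toInt' (Succ n) = 1 + toInt' n

main :: IO ()
main = print (toInt' (add' (fromInt' 20) (fromInt' 22)))  -- prints 42
```

The observation, of course, is that the Maybe-based Nat is isomorphic to Nat': Nothing plays the role of Zero, Just plays the role of Succ, and the 'maybe' function acts as the case analysis that pattern matching performs here.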


An ontological approach to technology documentation

SE talk at Chalmers, Gothenburg, Sweden

An ontological approach to technology documentation

Room 473 / Wed March 1 - 11:00 - 12:00 

Speaker: Ralf Lämmel, University of Koblenz-Landau

Abstract: In this talk, I am going to present an ontological approach to software technology documentation. That is, usage scenarios of a technology (such as an object/relational mapper, a web-application framework, or a model transformation) are captured in terms of the involved entities (e.g., artifacts, languages, abstract processes, programming paradigms, functions, and function applications) and the relationships between them (e.g., membership, conformance, transformation, usage, and reference). I am going to discuss language and tool support for and experiences with developing such technology documentation. In the SoftLang team at Koblenz, we work on the related but broader notion of "linguistic software architecture" or "megamodeling". I will briefly discuss applications of megamodeling other than technology documentation, namely build management and regression testing. More information: http://www.softlang.org/mega

Slides: in preparation