More than you ever wanted to know about grammar-based testing

Preamble: Ever since 1999 +/- 100 years, I have been working (sporadically, intensively) on grammar-based testing. The latest result was our SLE'11 paper on grammar comparison (joint work with Bernd Fischer and Vadim Zaytsev). I had previously tried to compile a comprehensive slide deck on grammar-based testing, also covered on this blog, but that attempt was relatively unambitious. With the new SLE'11 results at hand and with the firm goal of pushing grammar-based testing further into CS education (in the context of both formal language theory and software language engineering), I have now developed an uber-comprehensive slide deck with awesome illustrations for the kids. If you are reading this post ahead of the lecture, and if you are still planning to attend, then you are well advised to bring brains and coffee. You may also bring a body bag, in case you pass out or worse. As it happens, this is "too much stuff" for a regular talk, lecture, or any reasonable format for that matter. I will run a first "user study" on this slide deck in a class on formal language theory in Omaha this Thursday; thanks go to Victor Winter for trusting in the survivability of this stuff, or why else would he share his class with me? As a last resort and an exercise in adaptive talking, I am simply going to skip major parts based on the (missing) background of my audience. To summarize, if I get hit by a bus today, then all the grammar-based testing stuff is documented for mankind. (That's what Victor said.)

Title of the lecture: Quite an introduction to grammar-based testing

Slides of the lecture: [.pdf]

Elevator pitch for the lecture: Grammars are everywhere; resistance is futile. (More verbosely: If it feels like a grammar (after due consideration and subject to a Bachelor+ degree in CS), then it probably is one. Just because some grammars mask themselves as object models, schemas, ducks, and friends, you should not move over to the dark side.) Seriously, non-grammars are cool, but life is short, so we need to focus. (I am sort of focusing on grammars, and I am not even @grammarware.) Now, even grammars and grammar-based software components have problems, and testing may come to the rescue. Perhaps you think you know what's coming, but you don't have a clue.

Abstract of the lecture: Depending on where you draw the line, grammar-based testing was born, proper, in 1972 with Purdom's article on sentence generation for testing parsers. Computer scientists were really obsessed with parsers and compilers in the last millennium, and much work followed in the seventies, eighties, and early nineties. Burgess' survey on the automated generation of test cases for compilers summarized this huge area in 1994. Why would you want to test a compiler: it could suffer from regressions along evolution; it could differ from another compiler that serves as a reference; it could fail to comply with the language specification (perhaps even the grammar in there); it could break when being stressed; it could simply miss some important case. Non-automated testing really does not suffice in these cases. You cannot possibly (certainly not systematically) test a grammar-based software component other than by generating test data (in fact, test cases) automatically, unless the underlying grammar is trivial. Grammar-based testing suddenly becomes super-important when much software turns out to be grammar-based (other than parsers and compilers): virtual machines, de-/serialization frameworks, reverse- and re-engineering tools, embedded DSL interpreters, APIs, and what have you. Such promotion of grammar-based testing to the horizon of software engineering was perhaps first pronounced in Sirer and Bershad's paper on using grammars for software testing in 1999. Grammar-based testing is by no means straightforward, in several dimensions. For instance, coverage criteria for test-data generation must be convincing in terms of "covering all the important cases" and "scaling to non-trivial grammars". Also, the grammars found in practice are "impure" more often than not; think of semantic constraints represented in different ways. Related to the matter of semantics, any automated test-data generation approach relies on an automatic oracle, and getting such an oracle is never easy. This lecture is going to present a certain view on grammar-based testing, which is heavily influenced by the speaker's research and studies. In addition to the speaker's principal admiration of grammars and grammar-based software, the reason for such obsession with grammar-based testing is that this domain is so exciting in terms of combining formal language theory, (automated) software engineering, and declarative programming. This lecture is an attempt to convey important techniques and interesting challenges in grammar-based testing.
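
To make the idea of grammar-based test-data generation a bit more concrete, here is a minimal sketch (in Python) of exhaustive, depth-bounded sentence generation from a toy expression grammar. The grammar, the depth bound, and all names are invented for illustration; this is neither Purdom's algorithm nor any particular coverage criterion from the literature, just the simplest possible generator one could write.

```python
from itertools import product

GRAMMAR = {                        # nonterminal -> list of alternatives
    "Expr":   [["Term", "+", "Expr"], ["Term"]],
    "Term":   [["Factor", "*", "Term"], ["Factor"]],
    "Factor": [["(", "Expr", ")"], ["id"]],
}

def derive(symbol, depth):
    """Return all token sequences derivable from `symbol` within the given derivation depth."""
    if symbol not in GRAMMAR:                  # terminal: the derivation is the token itself
        return [[symbol]]
    if depth == 0:                             # derivation budget exhausted: prune
        return []
    sentences = []
    for alternative in GRAMMAR[symbol]:
        # derive each symbol of the alternative, then combine all the choices
        parts = [derive(sym, depth - 1) for sym in alternative]
        if all(parts):                         # skip alternatives that cannot finish in time
            for combination in product(*parts):
                sentences.append([tok for part in combination for tok in part])
    return sentences

if __name__ == "__main__":
    # every generated sentence is a candidate test input, e.g., for a parser
    for tokens in derive("Expr", 4):
        print(" ".join(tokens))
```

With a depth bound of 4 this sketch yields sentences such as "id", "id + id", and "id * id + id", which could serve as raw test inputs for a parser. A realistic generator would add a proper coverage criterion, deal with semantic constraints, and supply an oracle, which is exactly where the interesting problems of the lecture begin.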

Bio of the speaker: As earlier this week. (Nothing much has happened very recently.)

Acknowledgement: The presented work was carried out over several years in collaboration with (in alphabetical order) Bernd Fischer (University of Southampton, UK), Jörg Harm (akquinet AG, Hamburg, Germany), Wolfram Schulte (MSR, Redmond, WA, USA), Chris Verhoef (Vrije Universiteit, Amsterdam, NL), and Vadim Zaytsev (CWI, Amsterdam, NL).

Related papers by the speaker (and collaborators):

Related patent:

Have fun!

Ralf
