Big Code Science

I am pitching "Big Code Science" (my take on the mashup of mining software repositories, source-code analysis, program comprehension, etc.) to an inter-faculty audience at my university. (I am about to start an extended unpaid leave of absence to join Facebook and do work that may be somewhat related to Big Code Science.) I will have just 15 minutes in a brown-bag setting, so I am going to use images, charts, and simple messages.

Title: Big Code Science

Abstract: Code Science is Data Science for code. Big Code Science is the scientific approach to accessing, analyzing, and understanding big data, where the data is code or data related to software development. There are several reasons why Big Code Science has taken off. (i) Open-source development has exploded in the last 10 years, so that we have access to terabytes of source code, version history, developer communication, documentation, release information, bug-tracking data, etc.; not trying to learn from the past would be crazy. (ii) Big IT and other corporations (Facebook, Google, IBM, Microsoft, Philips, Siemens, ...) critically depend on their super-huge code bases for their businesses to function and to develop further. This is an extraordinary challenge because robustness, performance, security, maintainability, evolvability, and other critical parameters become increasingly hard to control as code bases grow; size does matter and science must come to the rescue. (iii) Machine learning, information retrieval, data mining, parallel programming, text analysis, traceability recovery, program analysis, reverse engineering, and other relevant techniques have matured, also in the context of industrial-scale software engineering, so that we are definitely able to deal with big code both technically and methodologically. In this talk, I am going to look at a few topics that my research team has addressed in the context of Big Code Science over the last few years. I also hint at some challenges ahead, some of which I hope to look into during my appointment at Facebook.

Slides: http://softlang.uni-koblenz.de/180322-koblenz.pdf

Acknowledgment: This is a team effort; I am grateful to these former and current students and team members:

  • Hakan Aksu (current PhD student)
  • Johannes Härtel (current PhD student)
  • Marcel Heinz (current PhD student)
  • Rufus Linke (former diploma student)
  • Ekaterina Pek (former PhD student)
  • Jürgen Starek (former diploma student)
  • Andrei Varanovich (former PhD student)



