Webpage of Jan Claes | Achievements

What is process mining?

Visually explained

Making models

The world is complex. Therefore the structure (who, what, where) of many organizations is complex, and so are their processes (how and in which order). When managers think and deliberate about problems in organizations, they like to rely on diagrams. A good picture can summarize the complex structure and processes of an organization clearly. All kinds of drawings that describe the structure or the processes of an organization are called models. I concentrate mainly on process models: drawings that describe a series of actions that were followed or that should be followed. To be precise, I have to admit that a model isn't always a drawing. It is a representation of reality, so it could also be a piece of text. But most often a model is a graphical representation.
The world is complex, so we make models

Generating models

As I already said, the world is complex. Therefore the structure of many organizations is complex, and therefore the processes in many organizations are complex. As a result, many process models are flawed: they are incomplete, out of date or sometimes just wrong. The reason is that models are mostly made by people, and people tend to make mistakes in complex situations (at least I do). Therefore a technique was developed to let a computer program generate process models automatically. In this way process models can be made faster and can be more accurate and complete. This technique is called process mining. Many computer programs log information about their actions. This information can be useful for analyzing afterwards where things went wrong. But it can also be used to reveal events and their order. We make a nice drawing of these events and end up with a process model.
Programs log information, so we make better models

From log files

The files in which computer programs keep ('log') such information are called log files. They work just like the black box in a plane or the log book on a ship, where the captain regularly writes down the condition of the ship and where every incident is recorded as well. In the same way, computer programs log what they do and what goes wrong. Usually each logged entry contains a timestamp, a username and a message.
Black box in plane, log book on ship, log file in organization
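
To make the idea of a log entry concrete, here is a minimal Python sketch that reads lines containing a timestamp, a username and a message. The line format and the names LogEntry and parse_line are assumptions made for this illustration only; real programs each log in their own format.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One logged entry: when it happened, who did it, and what was logged."""
    timestamp: datetime
    username: str
    message: str


def parse_line(line: str) -> LogEntry:
    # Assumed (hypothetical) format: "<ISO timestamp> <username> <free-text message>"
    stamp, username, message = line.strip().split(" ", 2)
    return LogEntry(datetime.fromisoformat(stamp), username, message)


# Made-up sample lines, only to show the shape of the data.
sample = [
    "2024-03-01T09:15:00 alice Sent sales offer",
    "2024-03-01T11:40:00 bob Booked invoice",
]
entries = [parse_line(line) for line in sample]
```

Each parsed entry keeps the three pieces of information separately, which is the starting point for looking at events and their order.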

How does it work?

Let me try to explain the essence of how process mining works. How does this information get converted into process models? You should realize that a process is normally executed multiple times. On a ship there are multiple journeys, in a sales process there are multiple sales, in a job application process there are multiple applications, and so on. In most log files we therefore find the same order of events many times (from embarkation to disembarkation, from sales offer to booking the invoice payment, from writing a job offer to the new employee's first day). In this way we discover a fixed order of events in a process, and we can make a drawing with a square per step and edges between the squares. Note that we search for repeating orders of steps without needing to know what the steps mean.
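
The search for repeating orders of steps can be sketched in a few lines of Python. The sketch below counts how often one step is directly followed by another, which is one common simplification and not necessarily the exact technique used in any particular process mining tool. It assumes the log has already been split into one list of steps per execution; the step names are made up.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often step a is immediately followed by step b."""
    edges = Counter()
    for trace in traces:
        # Pair every step with its successor within the same execution.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


# Three executions of the same (hypothetical) sales process.
traces = [
    ["offer", "order", "invoice", "payment"],
    ["offer", "order", "invoice", "payment"],
    ["offer", "order", "invoice", "payment"],
]
edges = directly_follows(traces)
```

Every distinct step becomes a square in the drawing and every counted pair becomes an edge between two squares. The code never looks at what a step name means, only at which step follows which.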

Sometimes we discover a certain order of steps in one execution and a different order in another. If each execution follows one of these alternatives, we can add both to our drawing: we then draw multiple paths from start to end. Sometimes real exceptions are logged, and we have to decide whether to include them in our drawing. Sometimes there are errors in the logging itself, and we should try to keep these errors from influencing our drawing. Those are just some, but important, focus points of process mining research.
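
One simple way to handle alternative orders and rare exceptions is to count each distinct order of steps and keep only those that occur often enough. This is a rough Python sketch under that assumption; the function name frequent_variants, the min_share threshold and the sample data are invented for illustration, and real process mining research uses more refined ways to deal with exceptions and logging errors.

```python
from collections import Counter


def frequent_variants(traces, min_share=0.1):
    """Keep the distinct orders of steps that make up at least min_share of all executions."""
    counts = Counter(tuple(trace) for trace in traces)
    total = sum(counts.values())
    return {variant: n for variant, n in counts.items() if n / total >= min_share}


# Ten made-up executions: two common orders and one rare exception.
traces = (
    [["A", "B", "C", "D"]] * 6
    + [["A", "C", "B", "D"]] * 3
    + [["A", "D"]] * 1
)
kept = frequent_variants(traces, min_share=0.2)
```

With this threshold the two common orders are kept as separate paths from start to end, and the single exceptional execution is left out of the drawing. Whether an exception should really be dropped remains a judgment call that the threshold only approximates.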

Combining log files

Not everything in an organization is supported by the same computer program. A sales offer is not mailed with the same program in which you book an invoice, and the invoice itself is made with yet another program. Because the different steps in a process are mostly supported by different computer programs, you probably won't find all information in one log file. Of course there is a trend towards integrating as much as possible in one system, and some software offers a wide range of functionality (e.g. SAP). Yet there will always be steps that are recorded by other programs (e.g. mailing?) or that are not logged by any program at all (e.g. telephone calls?). To get the most out of process mining, I developed techniques to combine the information of multiple log files (from multiple programs) into one more complete process model.
My research: merging log files

Challenges

This is of course not that easy. There is no uniform way of making computer log files, and these log files aren't made for the purpose of process mining. Each computer program logs in its own way, with its own focus and its own goals, and as a consequence records different information. Merging this information from different sources is not trivial. Not only do the programs probably use different names for the same thing (e.g. user, person, owner, ...), logging may also happen at different levels of detail (e.g. one program logs that an account was created, while another records the assignment of a user name, password and e-mail address). Information can also overlap partially or fully (e.g. accounting keeps records of the payment of an invoice, but you can find this on bank statements as well).
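
To give a feeling for the simplest part of this problem, the Python sketch below merges two logs that use different names for the same fields. It is only an illustration and not the merging technique from my research: the alias table, the field names and the sample records are all assumptions, and it ignores the harder issues of different logging levels and overlapping information.

```python
from datetime import datetime

# Hypothetical mapping from program-specific field names to one shared name.
FIELD_ALIASES = {"person": "user", "owner": "user", "time": "timestamp", "action": "event"}


def normalize(record):
    """Rename known aliases; leave all other field names unchanged."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}


def merge_logs(*logs):
    """Combine several logs into one list ordered by timestamp."""
    merged = [normalize(record) for log in logs for record in log]
    return sorted(merged, key=lambda r: datetime.fromisoformat(r["timestamp"]))


# Made-up records from two different programs.
sales_log = [
    {"timestamp": "2024-03-01T09:15:00", "user": "alice", "event": "send offer"},
    {"timestamp": "2024-03-04T10:00:00", "user": "alice", "event": "send reminder"},
]
accounting_log = [
    {"time": "2024-03-02T14:00:00", "owner": "bob", "action": "book invoice"},
    {"time": "2024-03-05T16:30:00", "owner": "bob", "action": "register payment"},
]
merged = merge_logs(sales_log, accounting_log)
```

After merging, the events of both programs appear in one timeline with the same field names, so they can be treated as one log when generating a process model.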
