This post is the first in a series on using the logging facilities in Python. Logging is an incredibly powerful tool for developers during early system development, and also during deployment.

One of the interesting features of the logging package is an optimization where rendering of the log message is delayed as long as possible. What's not obvious is how this optimization will impact execution time.

In order to understand the consequences of how we construct the arguments to a call to a logger, we're going to run three experiments. The first experiment will explore logging performance with a single logger, the second will use two loggers, and the third will use three loggers.

Logging Configuration

In order to study only the effect of the arguments on logging performance, we use a custom logging.Handler subclass that performs no I/O, but does go through the motions of preparing the text for a log message.

1
2
3
4
5
6
7
    class FormattingNullHandler(logging.Handler):
        """Custom logging.Handler for performance measurements."""

        def emit(self, record):
            """Ensure that the record is formatted, but avoid doing I/O. This
            ensures that our expensive argument processing always occurs."""
            ignored_result = self.format(record)

Logging was configured as follows for the single handler experiment:

 1    # Configure logging
 2    logger = logging.getLogger("")
 3    logging._srcfile = None  # Disable file lookup.
 4    logging.logThreads = 0  # Disable logging thread information
 5    logging.logProcesses = 0  # Disable logging process information
 6    logging.logMultiprocessing = 0  # Disable logging multiprocess information
 7
 8    # Configure a single log handler with a minimalist formatter.
 9    handler = FormattingNullHandler()
10    formatter = logging.Formatter('%(message)s')
11    handler.setFormatter(formatter)
12    logger.addHandler(handler)

Experimental Code

 1    def exercise_driver():
 2        """Exercise the logging system, with different reporting levels."""
 3        results = {}
 4        for log_level in LOGGING_LEVELS:
 5            level = logging.getLevelName(log_level)
 6            logger.setLevel(log_level)
 7            results[level] = exercise_log()
 8        return results
 9
10
11    def exercise_log(iteration_limit=10, expensive_iterations=2500000):
12        """Run the experiment with a logger that has been externally configured."""
13        timer = Timer()
14        running_total = 0
15        iteration = 0
16        expensive_operation = ExpensiveCounter(expensive_iterations=expensive_iterations)
17
18        while iteration < iteration_limit:
19            with timer, expensive_operation:
20                for level in LOGGING_LEVELS:
21                    logger.log(level, "%s", expensive_operation)
22            running_total += timer.time()
23            iteration += 1
24
25        object_counter = expensive_operation.count
26        object_time = running_total / iteration_limit
27
28        running_total = 0
29        iteration = 0
30        while iteration < iteration_limit:
31            with timer, expensive_operation:
32                for level in LOGGING_LEVELS:
33                    logger.log(level, "%s", expensive_operation())
34            running_total += timer.time()
35            iteration += 1
36
37        invoked_counter = expensive_operation.count
38        invoked_time = running_total / iteration_limit
39
40        return {'invoked_count': invoked_counter,
41                'invoked_time': invoked_time,
42                'obj_count': object_counter,
43                'obj_time': object_time}

The experimental code makes 2 logging calls in exercise_log. The first call, on line 21, passes an expensive_operation object reference as an argument to the logging call, while the second call, on line 33, invokes the expensive_operation object, passing the result to the logging call. In both cases the logging call is made for each of the standard log levels. The exercise_log function is called for each of the standard log levels. The expensive_operation object is just a while loop that counts from 0 to expensive_iterations by 1s when it is invoked or converted to a string.

Tests

The initial test uses one log handler and has the following results:

Execution times in seconds for logging when passing an object vs a method invocation as log message parameter to a single logger with one handler.
Level Filter Obj Ref Avg. Time Method Call Avg. Time
CRITICAL 1 0.0434 5 0.2182
ERROR 2 0.0861 5 0.2177
WARNING 3 0.1294 5 0.2177
INFO 4 0.1724 5 0.2179
DEBUG 5 0.2427 5 0.2450

The first item of note is the Obj Ref column, which is the count of the number of times the object was converted to a string. In keeping with the behavior that message formatting is deferred as long as possible, we see that the string conversion only happened when the message would be rendered under the logging level configuration. When we passed the result of invoking the object, we performed the expensive operation every time, resulting in longer execution times.

The second test uses two handlers associated with a single logger and had the following results:

Execution times seconds for logging when passing an object vs a method invocation as log message parameter to a single logger with two handlers.
Level Filter Obj Ref Avg. Time Method Call Avg. Time
CRITICAL 2 0.0864 5 0.2175
ERROR 4 0.1732 5 0.2188
WARNING 6 0.2600 5 0.2202
INFO 8 0.3466 5 0.2186
DEBUG 10 0.4310 5 0.2181

The third test uses three handlers associated with a single logger and had the following results:

Execution times in seconds for logging when passing an object vs a method invocation as log message parameter to a single logger with three handlers.
Level Filter Obj Ref Avg. Time Method Call Avg. Time
CRITICAL 3 0.1295 5 0.2174
ERROR 6 0.2595 5 0.2174
WARNING 9 0.3890 5 0.2174
INFO 12 0.5198 5 0.2180
DEBUG 15 0.6494 5 0.2183

Looking at the second and third tests we see that the number of times we convert an object to a string increases with the number of handlers. The initial test might have given the impression that object references are better, but we can see that the situation is more complicated.

Discussion

As we can see from the test results, minimizing the amount of time we spend in logging calls is not simple. We need to be mindful of the number of handlers, the logging level, and the cost of preparing the arguments for the logger. In general it will be better to have the arguments created once for a logging call, avoiding the need to perform expensive operations for each handler.

The logging cookbook recommends using the logger.isEnabledFor(level) function to decide if a logging call should be made. This practice is unsatisfying as it encourages the use of expensive functions as arguments to logging calls. A better option would be to refactor your code so that the generation of arguments for logging calls is inexpensive, using the logger.isEnabledFor(level) method only when inexpensive arguments cannot be used.

An important caveat for logging calls, regardless of how computational expense is managed, is that they should not alter system state. If it is necessary to alter system state via functions that generate input for logging arguments, then those functions should be idempotent.


Published

Category

tech-musings

Contact