Performance Zone is brought to you in partnership with:

Java / JavaEE Architect working in software development (architecture, design and coding) since July 2000, currently aspiring to establish herself as an independent consultant architect in keeping with changing responsibilities and age. Worked in companies like Wipro, Oracle, Siemens, IBM and Sapient. Archanaa has posted 3 posts at DZone. You can read more from them at their website. View Full User Profile

High Performance And Smarter Logging

06.29.2011
| 34486 views |
  • submit to reddit

1. Summary

Logging, often a back-burner activity in the development cycle can actually be the very lifeline of the development team once the application is in production. Assuming that readers are well acquainted with various logging frameworks like Log4J, SLF4J, etc, this article lays guidelines for 'real-world' production logging, examines its influence on the application quality attributes and also rediscovers a forgotten high performance, smarter logging technique with codes and centralized logging.

2         Introduction

Logging is often relegated to a back-burner activity while architecting, designing and developing applications or even when enhancing applications. Finally when the application is ready, you deploy it. And then what happens?

 

Your application now is out of your development environment - it is not running out of your favorite IDE workbench and much less having a debugger at your service. It is now that you realize the importance of logs. And as you wade through a huge set of logs trying to debug any problem with your application, you realize that it is not a trivial task. It is boring, tedious and time-taking. You ask yourself, with all your experience and expertise, is this what you were destined to do - finding a needle in a haystack of logs! Or possibly you off-load this activity to your sub-ordinates or an application maintenance team who equally curse you for their tedious and thankless job! Or possibly they would just throw up their hands, and you the specialized architect/designer would get involved, muttering under your breath how your skills are being wasted away.

 

Logs can be the very lifeline of your production application and should not be taken lightly or as an afterthought. Sure, most of us know the basics - that there are various logging levels which can be turned on or off at will, that there are log categories and using one's favorite logging framework - Log4J, Commons Logging or SLF4J one can direct logs to various destinations such as Files, Databases,  JMS Queues, etc.

 

But how many of us actually plan our logs? How many of us understand how logging affects other system quality attributes? How many of us continually prune logs keeping in mind how logs can impact our system and our lives once the application goes live? How many of us have used the advanced features of our favorite logging frameworks to achieve high performance logging?

This article attempts to emphasize the importance of this oft-ignored but vital aspect of an application and provides guidelines for application logging. Finally it revives the design of a forgotten logging technique that could achieve high quality logging without impacting performance and other quality attributes.

3         Logging with respect to other System Quality Attributes

3.1.1       Monitoring

Logging is the most frequent implementation of the Monitoring attribute. It helps the application developer get to the root cause of application bugs. But often it is made out to be the only implementation of Monitoring, possibly because it is the easiest thing in the world for an application developer to write a System.out.println() or logger.log() statement at various places in the code. However there is such a thing as too much logging, which can result in having useful incident logs being rolled over thus defeating the very purpose and intention of logging. Application developers would do well to find other techniques of monitoring the system, such as using or developing JMX Consoles. This would be discussed in detail in the section 'Logging for production-ready applications'.

3.1.2       Performance

Ask any performance specialist or architect and in nearly 90% of the applications, excessive logging has been found to have a very detrimental effect on performance. Logging is an I/O intensive activity and can get in the way of the application's performance, particularly when logging is done in the traditional way that is, synchronously writing to a FileAppender in the context of application threads. Not only that, it can often contribute to heavy heap consumption and garbage collection due to logging code such as -

if (logger.isDebugEnabled())

     logger.debug("name "+person.name+" age "+person.age+" address "+person.address);

Not only that, the doAppend() method that is internally called for logging to a Log4J Appender is synchronized for thread safety. What this also means is that application threads are not only doing heavy disk I/O operation synchronously, they are also blocking each other while writing to the log! In one of the performance critical situations for an e-Governance Portal, a thread dump revealed that portal threads were blocking each other for doing logging into a single file appender as the application logs were being written to a single appender file set!

AppThreadsLockedOnFileAppender

 

In fact, ascertaining the current log level of the application is the first and foremost thing that a performance specialist does, and often just changing the log level from DEBUG to INFO or WARNING gives significant performance benefits. However, after the performance benchmarking exercise or immediate scalability problem is resolved, the log level is changed back again to DEBUG by some application developer trying to find the root cause of a functional bug in the application. This is an indication of unhealthy logging practices. In 'Logging for production-ready applications' we also discuss discipline and maintaining hygiene of logs. The 'High performance and quality logging' section elaborates techniques to achieve quality logging without compromising on the performance of the application.

3.1.3       Security (auditing  and other sensitive information)

Audit logs are a special class of logs that enable security auditing of the application, and are used to track actions of users. This is an example of how logging aids security practices.

At the other extreme, however, if logs carry sensitive information such as passwords of user accounts, they may expose a system vulnerability.

Thirdly, logging application events and flow may aid developer's monitoring and debugging of an application but may also expose the internal architecture of the application which is certainly not desirable.

In today's cloud application environments, where applications may be hosted on public clouds, such vulnerabilities can pose risks to the Intellectual Property of the application owner.

3.1.4       Scalability and high availability

Logging similarly influences and is influenced by scalability and high availability. By improving performance of an application with 'High performance and quality logging' techniques, you improve scalability and availability of the application with lesser hardware, as your application is already able to pack a heavier punch with the existing resources and has lesser reasons to fail.

When an application is scaled out or is enabled for an active-passive configuration for availability, there are multiple instances of the application and the strategy of logging becomes important. Would the application support or development team like to gather logs from 10 different machines or directories or have a centralized location for collecting logs? This is where centralized distributed logging becomes essential.

3.1.5       Recoverability

Major databases like Oracle already use redo logs to ensure recoverability of transactions. Applications may also borrow a leaf and use a special class of logs for recoverability if required.

3.1.6       Error Handling and Fault tolerance

In most applications, logging is one of the, or sometimes only measure taken for error handling. In cases of recurrent errors such as unavailability of SMS / Mail server or even database for long periods of time, there is no benefit gained and even much harm incurred by logging the error repeatedly and frequently, especially with huge exception stack traces, due to increased I/O activity. In the process, one may discover that important logs that were important for analyzing a week old frustrating issue have been rolled over. This approach only makes a bad situation worse.

3.1.7       Capacity

While doing capacity sizing of an application, architects would do well to consider the size of logs that would be generated by the application during production and estimate the disk space required, the provisioning of centralized file system and so on.

For centralized logging in a distributed environment, it is also relevant to estimate the size of log objects distributed over the network to a remote machine.

4         A case study of missing the importance of logging

This section presents the experiences of the author in the form of an embedded application case study, where the importance of logging was missed during architecture, design and development phase, only to be driven home later on through bitter lessons learned the hard way.

The embedded application in question was a GPS device unit which was to be deployed on vehicles for enabling location tracking. The device did not provide any user interface apart from LEDs and a few buttons, so there would be nobody to administer the application inside the vehicles, unlike the luxury one is used to with server side applications. So if anything went wrong with the device application, how would the application developers supposed to diagnose it, when all logs were being written inside the device being conveyed around the whole of the country inside trucks?

What the application developers had naively assumed was that the Client Organization would have service engineers nearly around the whole of India who could take down the device and bring it to them for analysis. That being not possible, one of the developers had to accompany the service engineers to copy the logs by logging into the device operating system. You can imagine what an effect those late night sojourns had on his health and state of mind.

The next thing done by the application developers was to build a log download button in the desktop servicing application for the device. The service engineers would carry the desktop servicing application in laptops and download the logs from the devices. This was at least an improvement, that the device would not have to be taken off the vehicle and there was respite for the poor developer who was earlier accompanying the service engineers to remote places at ungodly hours. However, soon it became clear that the service engineers also could not travel to remote locations all the time and they were demanding that it should be the application development team's lookout to download the logs.

Finally, the development team had to enhance the device application so that it could send its logs to the backend server on demand over wireless GPRS just like it was sending other tracking data.

The point to be noted here was that all this extra effort had certainly not been captured during requirements gathering or budgeted for during initial estimation and development. The development team had a typical customer functional requirement driven mind-set. Application logging was not a customer-driven requirement and was not a prominent non-functional attribute and the novice developers did not have the foresight to convince their equally short-sighted seniors/managers to give them time to build such a facility as a budgeted effort. And when dozens of those devices went live (with 100s more waiting at the store) and they met all the above problems, they had to burn midnight oil while their managers and clients breathed angrily down their necks. Such a thankless job!

5         Logging for production-ready applications

The previous section on 'Logging with respect to other System Quality Attributes' we have already come across some issues that are faced in production environment. Let us enlist them and more such dos and don'ts in this section, which would make an application production ready.

·         Discipline and code review of logs

Discipline is of prime importance and this single step would pave the path for other best practices discussed in this article. There are many production systems having frivolous logs such as 'Hi', 'Came over here', 'Done', 'XXXYYYYZZZZ'. Most probably these were introduced during debugging of the application or even during unit testing in the development phase of the application.

The excuse given for such logs is that they would be at the debug level and could easily be turned off. However, in practice, this is rarely done and what is more, there are other logs at debug level which are not so frivolous. To control such logs, one would require extremely fine-grained and careful level of configuration of the logging framework and this is quite often missed out in production.

Code reviews have to be more effective and senior developers or team leads reviewing the logs must be firm to get less useful logs removed even on the face of defiant comments such as - 'I have just done it for testing', 'What's the brouhaha about the whole thing; it is just a simple log'. It would be well to have a rule that logs should not be the means of debugging basic problems during development – for that we have debuggers in IDEs!

·         Categorize logs with the same care as you would do for modularizing your application

This seems an obvious practice but it is surprising how many times it has been seen in the production environment that application logs are being written with the Application Server's SystemOut or SystemError files, or as discussed previously, the case where an e-Governance Portal's threads were blocking one another, contending for a single FileAppender and in turn a single file I/O.

At the very least, developers are expected to have package-based or fully qualified class naming convention for obtaining Loggers and possibly use different Appender destinations for different classifications of logs.

It would be well to keep in mind that the log level during production is supposed to be WARNING or depending on the information in logs, at INFO level.

An effective technique would be to list down all the possible logs in a central configuration or constants classes and allow developers to only use those logs. A more effective technique is discussed in 'The Design of High Performance Smarter Logging'.

·         Logging in a clustered and distributed environment

Nearly all production server-side applications have to be clustered and distributed for scalability and/or availability. In a clustered environment, logs should reflect the components, modules, subsystems and even process instances from where they emerged.

There is great benefit of having a centralized log server in a distributed and clustered environment to prevent the hassle of collecting logs from multiple directories and machines. This would also have the extremely beneficial side effect of moving the I/O to a separate machine so that the performance of the application is not impeded by logging I/O.

·         Logging in thick distributed clients or embedded applications

As illustrated in the anecdote covered in section ‘A case study of missing the importance of logging’, the logs in thick clients or embedded applications are not easily accessible to the development team to aid their analysis of defects. The mechanism of transferring the logs remotely should be well thought out and developed with proper inclusion in the project plan.

·         Using Mapped Diagnostic Context and Nested Diagnostic Context

Log4J's MappedDiagnosticContext (MDC) and NestedDiagnosticContext (NDC) use ThreadLocal storage to store context specific information. They can be used to store information such as user name or transaction id to identify all operations done by the particular user or transaction. This obviates the need to pass context specific information across classes and methods just for the purpose of logging. With PatternLayout's %x or %X{key}, the stored value would be rendered in the log.

·         Planning life-cycle and maintenance of logs

This includes planning the size of each log file and the maximum number of files before they get rolled over. This is a no-brainer, but often occurs - the size of the log file should not be so huge that it can’t be opened with a text editor! Just as there are scripts to take regular database backups, there should be scripts to backup and archive logs as well. It is a good idea to zip the log files in case there are limits of disk space, which would also make them easier to transport remotely.

·         Resist the temptation to log source location information at runtime

Location information comes at the price of loss of performance as the logging framework tries to determine the current thread stack and thereby the method, filename and line number, which is a costly operation. Rather, the log information in itself should suffice to find out the source of the log by giving information of the server, subsystem, module, component, thread and so on.

·         Avoid recording errors with long stack traces repeatedly

 If possible, there should be enough information in the log to indicate where the error has occurred and huge stack traces should be avoided as far as possible. Of course, this is not a mandate especially for exceptions such as NullPointerException. But it can be done for some specific application errors which can be easily identified. Also, when there is a recurrent problem that stays for a long duration such as connectivity issue with Email, SMS servers or Database Servers also, it would be well to record the issue every 5 minutes or so rather than filling up the logs with huge stack traces every few seconds.

·         Do not use AOP indiscriminately for injecting logs, especially in production

The most basic textbook AOP example given to a newbie is for logging. The argument is that logging is a cross-cutting concern and one can use the powers of AOP to inject logs before entering or after exiting a method. This example or argument should be seriously considered before applying in production worthy applications. As has already been established above,  logging is no trifling matter but which deserves planning like most other Non Functional Requirements (NFRs).

·         Do not use logging as a replacement for other monitoring strategies

One of the most bizarre cases of using logging is for ‘performance logs’ such as the following –

19 Sept 2010 10:20:30 PERF INFO Thread-25 OrderInsertAction.java Time taken in processing OrderInsertAction: 50ms

19 Sept 2010 10:20:33 PERF INFO Thread-8 OrderInsertDao.java Time taken in insert 30ms

The author has been guilty of doing this herself, not realizing that she was jeopardizing the very performance she was measuring by increasing the I/O.

What would be infinitely better practice is to capture a ‘TimeStatistic’ with total time and a counter to give the average time, the maximum time and the minimum time taken and to display the same with a GUI screen.


6         The Design of High Performance Smarter Logging

The strategy discussed in this section is essentially to combine centralized logging with a logger facade that uses integer codes instead of Strings for logs. This technique has been used by the author at the advice of her mentor with much success.

There are numerous articles on the internet to build centralized logging with JMS queues and topics or even with sockets. Centralized logging enables high performance by moving the I/O activity to a different machine leaving the application nodes to have a slight overhead due to network I/O.

However, the key here is to combine the centralized logging with the usage of codes instead of lengthy strings. The practice of using strings to log information innocently encouraged by existing frameworks like Log4J or Commons Logging has the effect of using up precious memory, disk and network resources where a simple code could suffice.

A separate file can list the mapping between the error codes and the complete human decipherable String.

Consider the following log-

[090822 16:02:48] TX WARNING (Tx-2-thread-1: 1163 transmitData): Server has not responded with an ACK, so trying again.

in contrast with

1300604499194,4,192168001002,20600,1001,2,500000

The above log indicates the timestamp in long, the log level, the ip address of the machine generating the log, integer values for process, module of the process, the instance id of the application and an integral error code which when translated would convey the same meaning as the complete lengthy string. Such an object is also extremely light to transport over the network in binary format. In case there is contextual information to further qualify the information in the log, an Object[] array can also be passed while the main error code would translate to a String with placeholders as in printf() format.

Using short codes for errors is a practice that is quite common in nearly all major products such as Oracle, WebSphere, Microsoft. For example in Microsoft Office applications when there is an error, all one gets in the error dialog box, is an undecipherable integer code which can be sent to Microsoft support to diagnose.

The various error codes could be translated to the complete strings by a log translator while viewing.

There are the following benefits to consider -

·         Reduction of disk space and thereby longer shelf life of logs and smaller file sizes

·         Some security that the application's internal design and working is not revealed by log. The logs can be translated to full text by using a translator for viewing offline.

·         Reduction of memory usage by avoiding construction or network transfer of lengthy strings.

·         As the logs transferred in the network are extremely light, it is possible to keep even debug logging on with minimal overhead.

·         Random construction of logs is prevented

·         More efficient searching of specific errors by custom tools.

Furthermore, logging discipline can be enforced by preventing direct usage of Log4J or other such frameworks and writing a custom facade on top of your favorite logging framework or custom logging such as -

public class LogClientFacade {

      public void log(int logLevel, int instanceId, int subsystemId, int componentId, int errorCode);

      public void logWithContext(int logLevel, int instanceId, int subsystemId, int componentId, int errorCode, Object[] contextInfo);

      public void logWithEx(int logLevel, int instanceId, int subsystemId, int componentId, int errorCode, Throwable ex);

...

}

Such a log interface would ensure that developers take care to log essential information such as subsystems, components and so on for diagnosing problems in a distributed environment and would obviate the need for example, of recording the source line number and file name by expensive runtime operations.

 designOfLoggingFramework

 

 

he above diagram shows the suggested design for such a solution. The intention is to make the process of logging as non-blocking as possible by collecting information in queues and asynchronously processing or transferring them in receiver threads. Maximum benefit would be obtained by transferring the information in binary serialized format over the network, especially if the complete solution is homogeneously using a single language.

There would be a simple GUI to watch logs as they pour into the server and also look at offline logs. It would use a translator to convert the logs into textual format. The logs themselves would be written on the disk in binary format. It may be noted that this methodology has significant relevance in cloud environments to safeguard the confidentiality of the deployer's intellectual property.

6.1       A note about the implementation

It is possible to extend existing frameworks such as Log4J / Commons Logging / SLF4J. However, in doing so there may be some reduction in efficiency to comply with the generic nature of the internal APIs of the frameworks. For example, Log4J serializes log messages and stack traces as strings to send them over network in SocketAppender and JMSAppender. It is fairly easy to extend and override selected portions of the chosen frameworks, such as by adding or extending new Appenders, extending internal data transfer objects like LoggingEvent and doing custom serialization. Or, one could just create a simple custom logging framework which would not take long to implement, if maximum flexibility is desired.

Another interesting decision is whether to use JMS which can be used if the application is running in an application server or by employing a standalone queue such as WebSphereMQ, HornetQ or ActiveMQ. However, the following is the author's advice if the chosen transport is JMS-

·         Use lenient quality attributes - such as avoiding transactions, avoiding persistence and allowing duplicates in the queue transport. Remember that stringent reliability will bring down performance and is most likely not necessary for logs.

·         Within a single JVM, that is either at the log server or at the client subsystems, it is advisable to use either lightweight java.util.concurrent Queues or an in-VM queue implementation which would avoid serialization overheads.

·         It would be advisable to use message brokers or bridges for the transport, rather than having a centralized queue and doing remoting calls to the same.

The author's personal preference is using simple sockets instead.


7         Conclusion

With this article, we have discussed best practices and identified malpractices in general for logging. We have also proposed a technique to combine centralized logging with the usage of codes instead of strings as a high performance smarter logging practice.

The author credits her former mentor, Mr Akash Gupta, Solution Architect at InterGlobe Technologies (http://in.linkedin.com/pub/akash-gupta/3/79/2b3) to have conceived the proposed design of High Performance and Smarter Logging, and under whose guidance she successfully implemented and observed the tremendous performance benefits of this technique.

8         References

Pro Apache Log4J by Samudra Gupta

Log4J Source code

 

Published at DZone with permission of its author, Archanaa Panda.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Jonathan Fisher replied on Wed, 2011/06/29 - 7:05pm

SLF4J does the whole "check for debug" thing for you automagically:

Other loggging frameworks:
if (log.isDebugEnabled){
log.debug("function() param:"+param);
}

SLF4J super awesome-ness:
log.debug("function param:{}", param);
Simple eh? SLF4J rocks! It's not a logging framework, it's a Facade!

Jonathan Fisher replied on Wed, 2011/06/29 - 7:03pm

>>>and writing a custom facade on top of your favorite logging framework or custom logging such...
I wholeheartedly disagree with the statement. You shouldn't have to re-invent the wheel!

That's why I recommend SLF4J; it's is a facade already! You shouldn't have to write a facade on top of your logging framework. If your logging framework (commons-loggin, log4j, JUL) makes you do this, it's time to switch frameworks. SLF4J solves this cross cutting concern in a simple, performant manner, while not binding you to a particular loggin implementation (again, it's not a logging framework, it's a Facade).

Mark Priest replied on Thu, 2011/06/30 - 1:12am in response to: Jonathan Fisher

The point of the check for debug level is to avoid the cost of generating the arguments to pass to the log method. I can't imagine how SLFJ can help much with that if you have to pass the arguments to log.debug anyway. Log4j also internally checks the log level inside the log.debug() method but by then you have already incurred the overhead of building the logging arguments.

Anilchandra Noo... replied on Thu, 2011/06/30 - 2:01am

log4j is best logging framework available but one has to tune it according to the application requirement considering all the points Archanaa Panda has explained.

Nicolas Frankel replied on Thu, 2011/06/30 - 2:43am in response to: Mark Priest

Mark, In the previous example, the argument is already available as a reference and we do not have to "generate" (or better, compute) it again. If we were to to that, I would wholeheartedly agree with you.  SLF4J helps in that you don't get the String concatenation cost, since it's SL4J's job to do it and you won't go there if the configuration is not on the right Log level. Hope it clarifies.

Allen Chee replied on Thu, 2011/06/30 - 3:48am

SLF4J/Logback should be used whenever possible. Log4J should be phased out. There's too many advantages using SLF4J compared to Log4J. Even the original writer of Log4J recommends SLF4J/Logback.

http://logback.qos.ch/reasonsToSwitch.html

Archanaa Panda replied on Thu, 2011/06/30 - 9:04am

Hi,

Thank you all for your comments. Since having written this article, I have used SLF4J with logback and it is definitely a vast improvement over Log4J with its ability to replace placeholders and do the string concatenation, and I would recommend it to as many people as I can :).

Having said that, my article had more emphasis on the logging strategy and the need for discipline as well. Also, the idea of using integer codes instead of strings where high performance and scalability is a premium requirement is something that currently APIs like Log4J and SLF4J/Logback don't support out-of-the-box. This may be an overkill for small applications but does make sense for high performance and scalable solutions.

I strongly believe however, that SLF4J/Logback's architecture may be more amenable to these simple customizations over Log4J, should anybody be inclined to implement them.

Thanks and Regards,

Archanaa

Jonathan Fisher replied on Thu, 2011/06/30 - 10:30am in response to: Mark Priest

@Mark Priest:

Mark, if you notice, the arguments are passed as parameters to the SLF4J debug function. No "+ sign" concat cost is incurred if debug is not enabled, as SLF4J will skip this is debug is not enabled. It's a simple idea that takes boilerplate code out of your business code.

@ The Author:
I'm sorry if I came off brash, it really is a well written article, and from writing technical articles myself, I can appreciate the work that goes into the effort. My apologies if I belittled your work.

Welcome to the SLF4J fanboy/fangirl club however! Maybe you could write another excellent article on SLF4J!

Mark Priest replied on Thu, 2011/06/30 - 12:34pm in response to: Jonathan Fisher

I understand your point about the string concatenation. However, in certain cases for a debug level log call you would be building up some summary of application state including calling toString() on various complex structures, etc. If you know that debug is not enabled then you can save the overhead of doing all that for nothing. So I think that skipping the isDebugEnabled() check is an anti-pattern that causes you to forget about the cost of pulling together debug information to report as a log messages. I agree in the simplistic example in the article that it doesn't matter much but I think it is a good habit to always check debug level for low-level log messages.

Jonathan Fisher replied on Thu, 2011/06/30 - 4:54pm

@Mark

Mark, I completely agree that calling toString when debug is NOT enabled is an antipattern. Fortunately, SLF4J takes care of the calling of toString for you automatically, and ONLY if debug is enabled. The debug method actually takes _Object_ parameters, not String parameters:
debug(String format, Object[] argArray) 
The .toString on those objects is only called if debug is enabled, otherwise the function is completely bypassed.

Check out the method signature here: http://www.slf4j.org/api/org/slf4j/Logger.html#debug(java.lang.String, java.lang.Object[])

Informant Necestpas replied on Mon, 2012/09/17 - 6:18am

I was searching for a centralized logging solution a while back and had this article as a reference when summarizing our requirements. Since it took me a long time to actually find a good solution, I wanted to inform all of you that clusterlog.net met our requirements and the implicit requirements of this article.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.