10+ Java Logging Best Practices: Getting the Most Out of Your Logs
Having visibility into your Java application is crucial for understanding how it works right now, how it behaved in the past, and how it might behave in the future. More often than not, analyzing logs is the fastest way to detect what went wrong, which makes logging in Java critical for ensuring the performance and health of your app and for minimizing downtime. Having a centralized logging and monitoring solution helps reduce the Mean Time To Repair by improving the effectiveness of your Ops or DevOps team.
By following logging best practices you will get more value out of your logs and make them easier to use. You will be able to pinpoint the root cause of errors and poor performance more easily and solve problems before they impact end-users. So today, let me share some of the best practices you should follow when working with Java applications. Let's dig in.
1. Use a Standard Logging Library
Logging in Java can be done in a few different ways. You can use a dedicated logging library, a common API, or even just write logs to a file or directly to a dedicated logging system. However, when choosing the logging library for your system, think ahead. Things to consider and evaluate are performance, flexibility, appenders for new log centralization solutions, and so on. If you tie yourself directly to a single framework, switching to a newer library can take a substantial amount of work and time. Keep that in mind and go for an API that will give you the flexibility to swap logging libraries in the future. Just like with the switch from Log4j to Logback and then to Log4j 2: when using the SLF4J API, the only thing you need to change is the dependency, not the code.
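For illustration, here is a minimal sketch of what coding against the SLF4J API looks like. The class and method names are made up for this example, but the LoggerFactory and Logger usage is the standard SLF4J pattern:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical service class, used only for illustration.
public class PaymentService {
  // The logger is obtained through the SLF4J API, so the backing framework
  // (Logback, Log4j 2, java.util.logging, ...) can be swapped by changing
  // the dependency and its configuration, not this code.
  private static final Logger LOGGER = LoggerFactory.getLogger(PaymentService.class);

  public void processPayment(String paymentId) {
    LOGGER.info("Processing payment {}", paymentId);
  }
}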
2. Select Your Appenders Wisely
Appenders define where your log events will be delivered. The most common appenders are the Console and File appenders. While useful and widely known, they may not fulfill your requirements. For example, you may want to write your logs asynchronously, or you may want to ship them over the network using an appender such as the Syslog one, like this:
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d %level [%t] %c - %m%n"/>
</Console>
<Syslog name="Syslog" host="logsene-syslog-receiver.sematext.com"
port="514" protocol="TCP" format="RFC5424"
appName="11111111-2222-3333-4444-555555555555"
facility="LOCAL0" mdcId="mdc" newLine="true"/>
</Appenders>
However, keep in mind that using appenders like the one shown above makes your logging pipeline susceptible to network errors and communication disruptions. That may result in logs not being shipped to their destination, which may not be acceptable. You also want to avoid logging slowing down your application, which can happen when an appender works in a blocking way. To learn more, check out our Logging Libraries vs Log Shippers blog post.
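If blocking appenders are a concern, Log4j 2, for example, lets you wrap another appender in an Async appender, which buffers events and hands the actual writing off to a separate thread. A rough sketch (the file path is just an example):

<Appenders>
<!-- logs/app.log is only an example path -->
<File name="File" fileName="logs/app.log">
<PatternLayout pattern="%d %level [%t] %c - %m%n"/>
</File>
<!-- The Async appender buffers events and delegates writing to the wrapped
     appender on a separate thread, so application threads are not blocked by slow I/O. -->
<Async name="Async">
<AppenderRef ref="File"/>
</Async>
</Appenders>

Your Loggers section would then reference the Async appender instead of the File appender directly.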
3. Use Meaningful Messages
One of the crucial, yet not so easy, things about creating logs is using meaningful messages. Your log events should include messages that are unique to the given situation, clearly describe it, and inform the person reading them. Imagine a communication error occurred in your application. You might log it like this:
LOGGER.warn("Communication error");
But you could also create a message like this:
LOGGER.warn("Error while sending documents to events Elasticsearch server, response code %d, response message %s. The message sending will be retried.", responseCode, responseMessage);
You can easily see that the first message will inform the person looking at the logs about some communication issues. That person will probably have the context, the name of the logger, and the line number where the warning happened, but that is all. To get more context that person would have to look at the code, know which version of the code the error is related to, and so on. This is not fun and often not easy, and certainly not something one wants to be doing while trying to troubleshoot a production issue as quickly as possible.
The second message is better. It provides exact information about what kind of communication error happened, what the application was doing at the time, what error code it got, and what the response from the remote server was. Finally, it also informs that sending the message will be retried. Working with such messages is definitely easier and more pleasant.
Finally, think about the size and verbosity of the message. Don't log information that is too verbose. This data needs to be stored somewhere in order to be useful. One very long message will not be a problem, but if that line is repeated hundreds of times a minute and you have lots of verbose logs, keeping longer retention of such data may be problematic and, at the end of the day, will also cost more.
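One practical way to keep verbose details cheap is to build them only when the relevant level is actually enabled. A small sketch; the buildVerbosePayloadDump() helper and the request variable are made up for this example:

// With parameterized logging the message is only formatted when DEBUG is enabled,
// but the arguments themselves are still evaluated, so guard expensive ones explicitly.
// buildVerbosePayloadDump() and request are hypothetical names used for illustration.
if (LOGGER.isDebugEnabled()) {
  LOGGER.debug("Full request payload: {}", buildVerbosePayloadDump(request));
}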
4. Logging Java Stack Traces
Java stack traces are one of the very important parts of Java logging. Have a look at the following code:
package com.sematext.blog.logging;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.io.IOException;

public class Log4JExceptionNoThrowable {
  private static final Logger LOGGER = LogManager.getLogger(Log4JExceptionNoThrowable.class);

  public static void main(String[] args) {
    try {
      throw new IOException("This is an I/O error");
    } catch (IOException ioe) {
      LOGGER.error("Error while executing main thread");
    }
  }
}
The above code will result in an exception being thrown and caught, and the log message printed to the console with our default configuration will look as follows:
11:42:18.952 ERROR - Error while executing main thread
As you can see there is not a lot of information there. We only know that the problem occurred, but we don’t know where it happened, or what the problem was, etc. Not very informative.
Now, look at the same code with a slightly modified logging statement:
package com.sematext.blog.logging;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.io.IOException;

public class Log4JException {
  private static final Logger LOGGER = LogManager.getLogger(Log4JException.class);

  public static void main(String[] args) {
    try {
      throw new IOException("This is an I/O error");
    } catch (IOException ioe) {
      LOGGER.error("Error while executing main thread", ioe);
    }
  }
}
As you can see, this time we’ve included the exception object itself in our log message:
LOGGER.error("Error while executing main thread", ioe);
That would result in the following error log in the console with our default configuration:
11:30:17.527 ERROR - Error while executing main thread
java.io.IOException: This is an I/O error
    at com.sematext.blog.logging.Log4JException.main(Log4JException.java:13) [main/:?]
It contains relevant information — i.e. the name of the class, the method where the problem occurred, and finally the line number where the problem happened. Of course, in real-life situations, the stack traces will be longer, but you should include them to give you enough information for proper debugging.
To learn more about how to handle Java stack traces with Logstash see Handling Multiline Stack Traces with Logstash or look at Logagent which can do that for you out of the box.
5. Logging Java Exceptions
When dealing with Java exceptions and stack traces you shouldn’t only think about the whole stack trace, the lines where the problem appeared, and so on. You should also think about how not to deal with exceptions.
Avoid silently ignoring exceptions. You don’t want to ignore something important. For example, do not do this:
try {
throw new IOException("This is an I/O error");
} catch (IOException ioe) {}
Also, don’t just log an exception and throw it further. That means that you just pushed the problem up the execution stack. Avoid things like this as well:
try {
throw new IOException("This is an I/O error");
} catch (IOException ioe) {
LOGGER.error("I/O error occurred during request processing", ioe);
throw ioe;
}
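What you want instead is to either handle the exception where you catch it, logging it once with the cause attached, or translate it and let a single layer deal with it. A sketch, assuming hypothetical sendDocuments(), scheduleRetry() and DocumentShippingException names:

// Option 1: handle it here and log it once, with the cause attached.
// sendDocuments() and scheduleRetry() are hypothetical helpers used for illustration.
try {
  sendDocuments(documents);
} catch (IOException ioe) {
  LOGGER.warn("Sending documents to Elasticsearch failed, scheduling a retry", ioe);
  scheduleRetry(documents);
}

// Option 2: translate it and let the caller (or a top-level handler) log it once.
// DocumentShippingException is a hypothetical application exception type.
try {
  sendDocuments(documents);
} catch (IOException ioe) {
  throw new DocumentShippingException("Failed to ship documents to Elasticsearch", ioe);
}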
6. Use Appropriate Log Level
When writing your application code, think twice about a given log message. Not every bit of information is equally important, and not every unexpected situation is an error or a critical message. Also, use the logging levels consistently: information of a similar type should be logged at a similar severity level.
Both the SLF4J facade and every Java logging framework you may be using provide dedicated methods for each log level. For example:
LOGGER.error("I/O error occurred during request processing", ioe);
7. Log in JSON
If we plan to look at our logs manually in a file or in the standard output, then plain-text logging is more than fine. It is more user friendly, and we are used to it. But that is only viable for very small applications, and even then it is a good idea to use something that lets you correlate metrics data with the logs. Doing such operations in a terminal window isn't fun and sometimes it is simply not possible. If you want to store logs in a log management and centralization system, you should log in JSON. That's because parsing plain-text logs doesn't come for free: it usually means using regular expressions. Of course, you can pay that price in the log shipper, but why do that if you can easily log in JSON? Logging in JSON also means easy handling of stack traces, which is yet another advantage. Well, you could also just log to a Syslog-compatible destination, but that is a different story.
In most cases, to enable logging in JSON in your Java logging framework it is enough to include the proper configuration. For example, let’s assume that we have the following log message included in our code:
LOGGER.info("This is a log message that will be logged in JSON!");
To configure Log4J 2 to write log messages in JSON we would include the following configuration:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<JSONLayout compact="true" eventEol="true"></JSONLayout>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
The result would look as follows:
{"instant":{"epochSecond":1596030628,"nanoOfSecond":695758000},"thread":"main","level":"INFO","loggerName":"com.sematext.blog.logging.Log4J2JSON","message":"This is a log message that will be logged in JSON!","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":1,"threadPriority":5}
8. Keep the Log Structure Consistent
The structure of your log events should be consistent. This is not only true within a single application or set of microservices, but should be applied across your whole application stack. With similarly structured log events it will be easier to look into them, compare them, correlate them, or simply store them in a dedicated data store. It is easier to look into data coming from your systems when you know that they have common fields like severity and hostname, so you can easily slice and dice the data based on that information. For inspiration, have a look at Sematext Common Schema even if you are not a Sematext user.
Of course, keeping the structure consistent is not always possible, because your full stack consists of externally developed servers, databases, search engines, queues, etc., each of which has its own set of logs and log formats. However, to keep your and your team's sanity, minimize the number of different log message structures in the parts that you control.
One way of keeping a common structure is to use the same pattern for your logs, at least the ones that are using the same logging framework. For example, if your applications and microservices use Log4J 2 you could use a pattern like this:
<PatternLayout>
<Pattern>%d %p [%t] %c{35}:%L - %m%n</Pattern>
</PatternLayout>
By using a single or a very limited set of patterns you can be sure that the number of log formats will remain small and manageable.
9. Add Context to Your Logs
Context is what makes information valuable, and for us developers and DevOps engineers log messages are a key source of information. Look at the following log entry:
[2020-06-29 16:25:34] [ERROR ] An error occurred!
We know that an error appeared somewhere in the application. We don’t know where it happened, we don’t know what kind of error it was, we only know when it happened. Now look at a message with slightly more contextual information:
[2020-06-29 16:25:34] [main] [ERROR ] com.sematext.blog.logging.ParsingErrorExample - A parsing error occurred for user with id 1234!
The same log record, but with a lot more contextual information. We know the thread in which it happened and the class in which the error was generated. We modified the message as well to include the user the error happened for, so we can get back to that user if needed. We could also include additional information, like diagnostic contexts (more on that below). Think about what you need and include it.
To include context information you don’t have to do much when it comes to the code that is responsible for generating the log message. For example, the PatternLayout in Log4J 2 gives you all that you need to include the context information. You can go with a very simple pattern like this:
<PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level - %msg%n"/>
That will result in a log message similar to the following one:
17:13:08.059 INFO - This is the first INFO level log message!
But you can also include a pattern that will include way more information:
<PatternLayout pattern="%d{HH:mm:ss.SSS} %c %l %-5level - %msg%n"/>
That will result in a log message like this:
17:24:01.710 com.sematext.blog.logging.Log4j2 com.sematext.blog.logging.Log4j2.main(Log4j2.java:12) INFO - This is the first INFO level log message!
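Another way to add context without stuffing everything into the message itself is the diagnostic context mentioned above. A minimal sketch using Log4j 2's ThreadContext (the key name and value are made up for this example, and the class loosely mirrors the ParsingErrorExample name from the log line above):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.ThreadContext;

public class ParsingErrorExample {
  private static final Logger LOGGER = LogManager.getLogger(ParsingErrorExample.class);

  public static void main(String[] args) {
    // "userId" is a made-up, request-scoped key used only for illustration.
    ThreadContext.put("userId", "1234");
    try {
      LOGGER.error("A parsing error occurred!");
    } finally {
      ThreadContext.clearMap(); // avoid leaking context to reused threads
    }
  }
}

With a pattern that includes %X{userId} (or %X for the whole context map), every log event produced while the context is set will carry the user id without you having to repeat it in each message.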
10. Java Logging in Containers
Think about the environment your application is going to run in. Logging configuration differs when you run your Java code in a VM or on a bare-metal machine, when you run it in a containerized environment, and, of course, when you run your Java or Kotlin code on an Android device.
To set up logging in a containerized environment you need to choose the approach you want to take. You can use one of the provided logging drivers, such as journald, Logagent, Syslog, or the JSON file driver. For that to work, your application shouldn't write its logs to the container's ephemeral storage but to the standard output, which can easily be done by configuring your logging framework to write to the console. For example, with Log4J 2 you would just use the following appender configuration:
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{HH:mm:ss.SSS} - %m %n"/>
</Console>
</Appenders>
You can also skip the logging drivers entirely and ship logs directly to your centralized logging solution, like our Sematext Cloud:
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d %level [%t] %c - %m%n"/>
</Console>
<Syslog name="Syslog" host="logsene-syslog-receiver.sematext.com"
port="514" protocol="TCP" format="RFC5424"
appName="11111111-2222-3333-4444-555555555555"
facility="LOCAL0" mdcId="mdc" newLine="true"/>
</Appenders>
11. Don’t Log Too Much or Too Little
As developers we tend to think that everything might be important, marking each step of our algorithm or business code as important. On the other hand, we sometimes do the opposite: we don't add logging where we should, or we log only FATAL and ERROR levels. Neither approach works well. When writing your code and adding logging, think about what will be important for verifying that the application is working properly and what will be important for diagnosing a bad application state and fixing it. Use this as your guiding light to decide what and where to log. Keep in mind that too many logs will lead to information fatigue, while too few will leave you unable to troubleshoot.
12. Keep the Audience in Mind
In most cases, you will not be the only person looking at the logs. Always remember that. There are multiple actors that may be looking at the logs.
The developer may be looking at the logs for troubleshooting or during debugging sessions. For such people, logs can be detailed and technical and include very deep information about how the system is running. You can assume that such a person has access to the code or even knows it well.
Then there are the DevOps engineers. For them, log events are needed for troubleshooting and should include information helpful in diagnostics. You can assume knowledge of the system, its architecture, its components, and the configuration of those components, but you should not assume knowledge of the platform's code.
Finally, your application logs may be read by your users themselves. In such cases, the logs should be descriptive enough to help fix the issue, if that is even possible, or provide enough information to the support team helping the user. For example, using Sematext for monitoring involves installing and running a monitoring agent. If you are behind a very restrictive firewall and the agent cannot ship metrics to Sematext, it logs errors aimed at Sematext users themselves, so they can look at them, too.
We could go further and identify even more actors who might be looking into logs, but this shortlist should give you a glimpse into what you should think about when writing your log messages.
13. Avoid Logging Sensitive Information
Sensitive information shouldn't be present in logs, or it should be masked. Passwords, credit card numbers, social security numbers, access tokens, and so on: all of that may be dangerous if leaked or accessed by those who shouldn't see it. There are two things you ought to consider.
Think about whether sensitive information is truly essential for troubleshooting. Maybe instead of a credit card number it is enough to keep the transaction identifier and the date of the transaction? Maybe it is not necessary to keep the social security number in the logs when you can easily store the user identifier instead? Think about such situations, think about the data that you store, and only write sensitive data when it is really necessary.
The second thing to consider is shipping logs with sensitive information to a hosted logs service. With very few exceptions, the following advice should be followed: if your logs contain, and need to contain, sensitive information, mask or remove it before sending the logs to your centralized log store. Most popular log shippers, like our own Logagent, include functionality that allows removing or masking sensitive data.
Finally, the masking of sensitive information can be done in the logging framework itself. Let's look at how it can be done by extending Log4j 2. Our code that produces log events looks as follows (the full example can be found on the Sematext GitHub):
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;

public class Log4J2Masking {
  private static final Logger LOGGER = LoggerFactory.getLogger(Log4J2Masking.class);
  private static final Marker SENSITIVE_DATA_MARKER = MarkerFactory.getMarker("SENSITIVE_DATA_MARKER");

  public static void main(String[] args) {
    LOGGER.info("This is a log message without sensitive data");
    LOGGER.info(SENSITIVE_DATA_MARKER, "This is a log message with credit card number 1234-4444-3333-1111 in it");
  }
}
If you were to run the whole example from Github the output would be as follows:
21:20:42.099 - This is a log message without sensitive data
21:20:42.101 - This is a log message with credit card number ****-****-****-**** in it
You can see that the credit card number was masked. This was done because we added a custom Converter that checks whether the given Marker is attached to the log event and tries to replace a defined pattern. The implementation of such a Converter looks as follows:
@Plugin(name = "sample_logging_mask", category = "Converter") @ConverterKeys("sc")
public class LoggingConverter extends LogEventPatternConverter {
private static Pattern PATTERN = Pattern.compile("\\b([0-9]{4})-([0-9]{4})-([0-9]{4})-([0-9]{4})\\b"); public LoggingConverter(String[] options) {
super("sc", "sc");
} public static LoggingConverter newInstance(
final String[] options) {
return new LoggingConverter(options);
} @Override public void format(LogEvent event, StringBuilder
toAppendTo) {
String message = event.getMessage().getFormattedMessage();
String maskedMessage = message;
if (event.getMarker() != null && "SENSITIVE_DATA_MARKER"
.compareToIgnoreCase(event.getMarker().getName()) == 0) {
Matcher matcher = PATTERN.matcher(message);
if (matcher.find()) {
maskedMessage = matcher.replaceAll("****-****-****-****");
}
}
toAppendTo.append(maskedMessage);
}
}
It is very simple and could be written in a more optimized way, and a real implementation should also handle all possible credit card number formats, but it is enough for this purpose.
Before jumping into the code explanation I would also like to show you the log4j2.xml configuration file for this example:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN" packages="com.sematext.blog.logging">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{HH:mm:ss.SSS} - %sc %n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
As you can see, we’ve added the packages attribute in our Configuration to tell the framework where to look for our converter. Then we’ve used the %sc pattern to provide the log message. We do that because we can’t overwrite the default %m pattern. Once Log4j2 finds our %sc pattern it will use our converter which takes the formatted message of the log event and uses a simple regex and replaces the data if it was found. As simple as that.
One thing to notice here is that we are using the Marker functionality. Regex matching is expensive and we don’t want to do that for every log message. That’s why we mark the log events that should be processed with the created Marker, so only the marked ones are checked.
14. Use a Log Management Solution to Centralize & Monitor Java Logs
As the complexity of your applications grows, the volume of your logs grows, too. You may get away with logging to a file and only using logs when troubleshooting is needed, but when the amount of data grows it quickly becomes difficult and slow to troubleshoot this way. When this happens, consider using a log management solution to centralize and monitor your logs. You can either go for an in-house solution based on open-source software, like the Elastic Stack, or use one of the log management tools available on the market, like Sematext Cloud or Sematext Enterprise.
A fully managed log centralization solution will give you the freedom of not having to manage yet another, usually quite complex, part of your infrastructure. Instead, you will be able to focus on your application and will only need to set up log shipping. You may want to include logs like JVM garbage collection logs in your managed log solution. After turning them on for your applications and systems running on the JVM, you will want to have them in a single place for correlation and analysis, and to help you tune garbage collection in your JVM instances. Alert on logs, aggregate the data, save and re-run queries, and hook up your favorite incident management software. Correlating log data with metrics coming from your JVM applications, systems and infrastructure, real users, and API endpoints is something that platforms like Sematext Cloud are capable of. And of course, remember that application logs are not everything.
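As a point of reference for the garbage collection logs mentioned above: on JDK 9 and newer they are typically enabled with the unified logging flag -Xlog:gc* (optionally directed to a file with something like -Xlog:gc*:file=gc.log), while on JDK 8 you would typically use -Xloggc:<file> together with -XX:+PrintGCDetails.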
Conclusion
Incorporating each and every good practice may not be easy right away, especially for applications that are already live in production. But if you take the time and roll the suggestions out one after another, you will see the usefulness of your logs increase. Also, remember that at Sematext we help organizations with their logging setups by offering logging consulting, so reach out if you are having trouble and we will be happy to help.
Originally published at https://sematext.com on August 3, 2020.