Maintenance Matters: Building Helpful Logs

Annie Kiley, Former Application Development Director

Article Categories: #Code, #Back-end Engineering

Posted on

When it comes to maintenance, a little effort spent improving application logs can go a long way.

This article is part of a series focusing on how developers can center and streamline software maintenance. The other articles in the Maintenance Matters series are: Continuous IntegrationCode Coverage, Documentation, Default FormattingTimely Upgrades, Code Reviews, Good Tests, and Monitoring.

A lot of the time, we don't think much about our logs until we're digging around trying to sort out something that has gone wrong. They aren't user-facing or admin-facing or very visible at all until they are needed, so most of the time, only developers think about them. But good logs can save more time than just about anything else during debugging and maintenance. I am a huge fan of customizing logging behavior as a part of feature work. It takes hardly any time and can have big payoffs. Removing useless information, adding helpful data, or setting up custom log files for important processes can be the difference between wading through oceans of data and pinpointing the source of a problem quickly. Here are some of our favorite ways to make application logs work for us.

Declutter #

Like cleaning out a closet, removing extraneous information from logs can make it much easier to find what you need, which in turn saves time. If you're logging a bunch of information that no one will ever care about, it's just cluttering what could be a valuable resource. This can be as simple as adjusting the log level to something more appropriate for an environment or switching up a configuration. It can also be more specific and situational, like this example from our work with WCS.

Some of the work we have done for WCS involves the system that emails people their zoo tickets after they purchase them online. One time, I set the log level on a staging environment to debug to get some more information on a weird bug. I quickly learned that at that level, Rails prints the entire contents of every email in the application logs, including attachments. When the attachments include several pages of pdfs (in our case, one page for each ticket), it will write those attachments to the logs as hex. This left tens of thousands of lines of this in the logs:

255044462d312e330a25c4e5f2e5eba7f3a0d0c4c60a332030206f626a0a
3c3c202f46696c746572202f466c6174654465636f6465202f4c656e6774
68203832203e3e0a73747265616d0a78012b540854285430004233432305
734b2385a2548570853c05fd80d4a2e4d48292d2c41c85a24ca01a903c48
9db9a5819eb9810588ad6766626660606cac909caba0ef996bc8e5920f34
2e1000762e145a0a656e6473747265616d0a656e646f626a0a312030206f
626a0a3c3c202f54797065202f50616765202f506172656e742032203020

I enjoy feeling like I'm monitoring the matrix as much as the next person, but this was a lot of useless code to sift through while I was looking for something else entirely. It only took a few minutes to override the payload to record the attachment filenames instead of the contents, and suddenly those debug level logs were useful again. After that, we were able to quickly find and fix the bug.

Add Useful Information #

Anyone who has ever looked at log files has thought something along the lines of, "This would be so much easier if insert valuable information that isn't being logged was in here." I've thought that a thousand times, and more often than not, left everything as it was once I finished doing whatever it was that sent me to the logs in the first place. But it is usually trivially easy to add a couple of lines of code that will put that information in the logs and make them more useful next time.

Maybe a multi-tenant application doesn't automatically record information about which tenant is active. Or maybe you're logging requests, but there's some information from a header that would make searching the logs much easier. Whatever it is, the effort to improve the logs is usually pretty small and will likely pay off the next time someone needs to be in there.

Utilize Custom Log Files #

Most web frameworks make it easy to customize logging behavior and write to specific files. I particularly like doing this for third-party integrations, background processes, cron jobs, or anything else that's important but happens behind the scenes.

For example, one recent project had a critical integration with a difficult API that often had unexpected errors, inconsistent responses, and long response times or timeouts. Since the integration was important but hard to work with, bugs were often difficult to reproduce and the source of a problem was rarely easy to pinpoint. We set up a log file to record every interaction we had with this API, including the request data, response data, and some relevant data from our application. That way, we could easily search for and find out exactly what happened with the API and use that information to prevent future bugs, even if we couldn't get the API to replicate the behavior.

Similarly, background processes or cron jobs can be difficult to debug because issues usually surface down the line, and it isn't immediately clear to a user that something has gone wrong. It only takes a couple of minutes to add a few lines to a cron job or background process and record relevant info about its behavior to a specific file. Then, if anyone ever needs to see what's going on, there are files like critical_cron_job.log or elasticsearch.log just waiting for them.

Here are a few lines from a custom log file for a set of background jobs on a recent project:

INFO -- : START processing 1188723
INFO -- :   1188723 - item.title=Very Important Item
INFO -- :   1188723 - Item is not in index. Let's wait two seconds.
INFO -- :   1188723 - Item has been indexed. Checking for matches.
INFO -- :   1188723 - Checking for match with First Category (category.id=1252)
INFO -- :   1188723 - No match with First Category (category.id=1252)
INFO -- :   1188723 - Checking for match with Second Category (category.id=234567)
INFO -- :   1188723 - Match with Second Category (category.id=234567)
INFO -- :   1188723 - Recorded match with Second Category (category_item.id=514004)
INFO -- :   1188723 - Skipping alert because Second Category is not currently alerting
INFO -- :   1188723 - Checking for match with Third Category (category.id=393933)
INFO -- :   1188723 - No match with Third Category (category.id=393933)
INFO -- : END processing 1188723

The jobs processed items coming in through an API, then checked them against certain categories to see if alerts should be sent. Since this all happened in the background, bug reports would usually be something like, "I expected to get an alert, and I didn't." An easily searchable and human-readable file lets us quickly see what happened and go from there. Here is most of the code that produced that log:

class ProcessNewItemJob < ApplicationJob
  def perform(item_id:)
    log "START processing #{item_id}"

    # ...
    # Do a bunch of processing to check for matches and send alerts
    # with several `log` statements recording critical steps 
    # ...

    log "END processing #{item_id}"
  end

  def log(msg)
    job_logger.info(msg)
  end

  def job_logger
    @job_logger ||= Logger.new([Rails.root, "log", "item_processing.log"].join("/"))
  end
end

With only a few additional lines of code and log statements sprinkled into the background job, we gave ourselves a debugging tool that saved us countless hours of tracking down issues.

Take Care of Production Logs #

Your production logs probably don't need as much information as your development logs, so set them to the appropriate level. Make sure logs on deployed environments are rotating regularly enough not to use too much memory or storage. This one isn't really optional since writing and storing gigantic files will slow and eventually crash a server.

Consolidating or centralizing logs can also be important. For example, if a load balancer is splitting traffic between a few servers and we're looking for events that might have happened in different places, it can be really nice to have the logs in one place so they are easily searchable and able to be combined if necessary. There are several great tools out there to help with this, like CloudWatch.

And while this isn't strictly related to maintenance, we just can't write about logging without saying that it's important not to have sensitive data anywhere it doesn't belong. Logs are data just like anything else, so we make sure server logs are only accessible by people that should have access to them and strip out information that shouldn't be printed. Passwords and API keys are some common examples of data that should be obscured, but other credentials, contact information, health information, financial information, and many other items might be worth hiding.

Plus, it's easy! #

Among developer chores, taking care of your logs can have some of the most maintenance benefits (measured in time saved or problems solved) for some of the least effort since it's just not that hard to tell computers to write or not write to a file. I like to think of good logs as a present to future me or any other developers who will find the information they need quickly and easily at times when that matters a lot.

The next article in this series is Maintenance Matters: Timely Upgrades.

Related Articles