How to work with Legacy Code

By: Phillip Whitaker

When you’re learning to be a developer, the examples you learn from and the projects you work on are usually built from scratch (known as greenfield projects), and you’re most likely the only person working on the code.

It can be a harsh reality, when you enter the IT industry, to discover that most of your work will involve existing codebases and a team of other people.

The codebase and the team will have a set of conventions you’ll need to learn in order to understand why the code was written and what the team is looking to achieve.

This archaeological dig for context is as much a part of a developer’s job as writing code, and fortunately it’s something that can be made easier.

What is Legacy Code?

Legacy Code, put simply, is code for which you lack the original context of why it was written. This can even be code you’ve written yourself (how many times have you returned to an old project and forgotten why you made particular decisions?).

The most common scenario for Legacy Code is when a team takes on another team’s codebase with minimal time for handover, meaning that while the high-level functionality has been covered, the underlying code is still a mystery.

Other scenarios include:

  • Returning to a project after the team has switched context to other work
  • Working in an area of the codebase that you didn’t write or review and the original team members are unavailable
  • Working with code whose documentation is out of date, so the behaviour of the code no longer matches the documented intent

Understanding the original intent

Retroactively uncovering the original intent can be more or less difficult depending on what resources you have available to you.

Definition of intended behaviour

Having a definition of the intended behaviour makes things easier, as you’ve got an artefact from the time that will help you understand what the code is meant to be doing and what value that behaviour was going to give the user.

This behaviour can be found in:

  • User stories or other work items
  • Acceptance criteria or test scenarios
  • Automated tests such as unit, integration or system tests
  • Pages in the company’s knowledge base covering the functionality
  • Operational and support documentation
  • Version control entries such as Git commits

Access to someone from the original team

This can make things easier, but sometimes it can also make things a bit harder, depending on the individual and the position they held in the team.

I’ve seen the benefit of talking to someone from the original team when they worked directly with the functionality and could recall the intent behind the design decisions that were made.

However, I’ve also seen a person from the original team cause more chaos, as they were retroactively adding behaviour that wasn’t part of the original intent, which left the developer they were working with even more confused. (I used to work on that original team too, which is how I knew they were adding things that weren’t in the original specification.)

Access to a running version of the application or users

This is probably the hardest means of understanding the original intent of the code, as you’re making an educated guess based on the way the application currently works.

If you’ve got access to a user of the application, you might be able to understand how the behaviour has evolved over time, which can add a little more context. However, this can also lead to a subjective portrayal of how that user uses the system rather than what the team intended to happen.

Adding context back into Legacy Code

The easiest means of adding context back into Legacy Code is to add documentation.

This documentation can be static pages on a knowledge base that list the intended behaviour of the code, or it can be dynamic, executable documentation such as automated tests.

To get the most benefit, you should keep this documentation as close to the code as possible. Unit tests are a particularly good way of doing this, as you can write a set to cover the behaviour you know about and add more as edge cases pop up.
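As a minimal sketch of what this can look like, here are a few characterisation tests (tests that pin down what the code currently does rather than what it should do), written in Python in pytest style. The `legacy_discount` function and its rules are hypothetical, standing in for whatever Legacy Code you’re documenting.

```python
# `legacy_discount` is a hypothetical stand-in for real Legacy Code whose
# intent is unclear. Tests are pytest style: plain functions with asserts.
def legacy_discount(total, customer_type):
    if customer_type == "staff":
        return round(total * 0.8, 2)
    if total > 100:
        return round(total * 0.95, 2)
    return total


# Each test records behaviour we know about today, not what we think the
# code should do. The test names double as documentation of that behaviour.
def test_staff_get_twenty_percent_off():
    assert legacy_discount(50, "staff") == 40.0


def test_orders_over_100_get_five_percent_off():
    assert legacy_discount(200, "retail") == 190.0


def test_orders_of_exactly_100_are_not_discounted():
    assert legacy_discount(100, "retail") == 100


# Added later, once an edge case surfaced: the staff discount does not
# stack with the large-order discount.
def test_staff_discount_does_not_stack():
    assert legacy_discount(200, "staff") == 160.0
```

If a test like the last one surprises the team, that’s a prompt to find out whether the behaviour is intended before changing it.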

Sometimes, especially with a large or fragmented codebase, it’s not easy to add unit or integration tests, so I would suggest a set of automated end-to-end tests instead (again, try to keep these as close to the code as possible).

Once you’ve got the behaviour of the functionality documented, you have a safety net that will catch any regressions in that functionality while you work on the code to make it easier to work with.
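When the code is too tangled to test function by function, one way to build that safety net is to record the current output over a grid of inputs and compare against it after every change, a technique sometimes called golden master or approval testing. The Python sketch below is hypothetical: `legacy_shipping_cost`, its rules and the input grid are made up, and in a real project you would likely save the snapshot to a file under version control rather than hold it in memory.

```python
import itertools


def legacy_shipping_cost(weight_kg, country):
    # Hypothetical legacy function whose exact rules nobody remembers.
    base = 5 if country == "UK" else 12
    return base + max(0, weight_kg - 2) * 1.5


def record_behaviour(func, *input_ranges):
    """Call `func` with every combination of inputs and record each result."""
    return {args: func(*args) for args in itertools.product(*input_ranges)}


def find_regressions(snapshot, func):
    """Return the inputs whose output no longer matches the snapshot."""
    return [args for args, expected in snapshot.items() if func(*args) != expected]


WEIGHTS = [0, 1, 2, 2.5, 10]
COUNTRIES = ["UK", "FR", "US"]

# 1. Before changing anything, capture what the code does today.
snapshot = record_behaviour(legacy_shipping_cost, WEIGHTS, COUNTRIES)


# 2. After refactoring, check the new version against the snapshot.
def refactored_shipping_cost(weight_kg, country):
    base = 5 if country == "UK" else 12
    chargeable_kg = max(0, weight_kg - 2)
    return base + chargeable_kg * 1.5


# An empty list means every recorded behaviour was preserved.
assert find_regressions(snapshot, refactored_shipping_cost) == []
```

The snapshot says nothing about whether the behaviour is right, only that it hasn’t changed, which is exactly what you want while restructuring.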

Refactoring Legacy Code

Legacy Code often doesn’t meet the current team’s coding standards and so needs refactoring to bring it up to standard.

Separate concerns

One of the more common Legacy Code issues I’ve seen is functions that handle more than one concern. Now that there’s a clear picture of what the function does, we can break those concerns out into separate functions.

By separating the concerns, the new functions have a clearer scope, and if the team spots similar functionality elsewhere in the codebase, they can replace it with a call to the new function.

As the code is separated, be sure to add unit tests to these smaller functions, even if the functionality is already covered by the tests written against the original function, so you’re not creating even more Legacy Code for future developers.
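A minimal Python sketch of this, assuming a hypothetical `legacy_register` function that parses, validates and formats a contact record all at once. The new `register` composes the extracted functions and is meant to behave exactly like the original, while each extracted function gets its own pytest-style test.

```python
# Before: one hypothetical function parses, validates and formats a record.
def legacy_register(raw):
    parts = raw.split(",")
    name, email = parts[0].strip(), parts[1].strip().lower()
    if "@" not in email:
        raise ValueError("invalid email")
    return f"{name} <{email}>"


# After: each concern has its own small, clearly scoped function.
def parse_record(raw):
    """Split a 'name, email' line into trimmed fields, lower-casing the email."""
    parts = raw.split(",")
    return parts[0].strip(), parts[1].strip().lower()


def validate_email(email):
    """Apply the only rule the legacy code had: the address must contain '@'."""
    if "@" not in email:
        raise ValueError("invalid email")
    return email


def format_contact(name, email):
    return f"{name} <{email}>"


def register(raw):
    # Same behaviour as legacy_register, now composed from the parts above.
    name, email = parse_record(raw)
    return format_contact(name, validate_email(email))


# New pytest-style unit tests for the smaller functions.
def test_parse_record_trims_and_lowercases_email():
    assert parse_record(" Ada , ADA@Example.com ") == ("Ada", "ada@example.com")


def test_validate_email_rejects_address_without_at():
    try:
        validate_email("not-an-email")
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_format_contact():
    assert format_contact("Ada", "ada@example.com") == "Ada <ada@example.com>"
```

`validate_email` is the kind of function the team could then reuse wherever similar checks turn up elsewhere in the codebase.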

Modularise code

More often than not, the reason you’re working with Legacy Code is that you’re trying to make a functional change to an existing system, so your goal is to keep that system running as it does today while you change it.

By creating versioned modules of the code, you can control which increments make it into production, as you can release individual modules separately while also verifying that a given set of modules works together.

Much like with long-lived feature branches, it’s important to integration test often. If possible, release small increments to production, using techniques such as feature toggles to ‘switch off’ functionality until it’s complete and ready for launch.
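A feature toggle can be as simple as a conditional around the new code path. The Python sketch below is hypothetical: the flag name, the `FEATURE_<NAME>` environment variable convention and both pricing functions are assumptions for illustration, and many teams use a dedicated feature-flag service instead.

```python
import os


def legacy_price(quantity):
    # Existing behaviour that must keep running unchanged in production.
    return quantity * 10


def new_price(quantity):
    # Hypothetical unfinished replacement: released, but switched off.
    return quantity * 10 - (5 if quantity >= 10 else 0)


def is_enabled(flag_name, overrides=None):
    """Return True if a toggle is on.

    Explicit overrides (handy in tests) win; otherwise read a hypothetical
    FEATURE_<NAME> environment variable, defaulting to off.
    """
    if overrides and flag_name in overrides:
        return overrides[flag_name]
    return os.environ.get(f"FEATURE_{flag_name.upper()}", "off") == "on"


def price(quantity, overrides=None):
    # Both code paths ship together; the toggle decides which one runs.
    if is_enabled("new_pricing", overrides):
        return new_price(quantity)
    return legacy_price(quantity)
```

Once the new path is complete and launched, the toggle and the legacy branch can be removed so the conditional doesn’t become Legacy Code of its own.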

Create change requests

When working with Legacy Code, it can be tempting, when you find something that doesn’t seem to meet the defined behaviour, to raise a bug or update the existing acceptance criteria on the user story to reflect what you feel is the correct behaviour.

While you may feel that the behaviour you’re seeing is incorrect, there may be contextual information you’ve not yet discovered that means the existing behaviour is actually correct.

By raising change requests instead of bugs or rewriting the ticket, you can track the number of changes needed to bring the system’s behaviour in line with the team’s understanding, while tracking the number of actual defects separately.

This also gives the Product Owner a clearer backlog to work with as they can approach the change requests in a different context to the bugs. Similarly, testers on the team will be able to better track these changes instead of treating the existing functionality as a ‘known issue’.

How to prevent your code from becoming Legacy Code

It’s highly likely that you won’t be the last person to touch the code you write, even if the next person is you, picking it up in the future.

When working with any code, you should aim to make it as easy as possible for this future developer to understand the context of the functionality and the design decisions you’re making.

With a bit of team discipline, it’s relatively easy to embed this context and make it easier to find, using the following techniques.

Make your version control messages meaningful

If you’re using Git, this can be as simple as making sure that branch names and merge commits contain a reference to a user story or work item, so the changes can be traced back to the artefact where the contextual information can be found.
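One way to enforce this with a little automation is a Git `commit-msg` hook: Git runs it with the path of the file containing the proposed message, and a non-zero exit status aborts the commit. The sketch below is in Python; the Jira-style `SHOP-142` reference format is a hypothetical convention, so adjust the pattern to match your own work item IDs.

```python
#!/usr/bin/env python3
"""Sketch of a Git commit-msg hook that requires a work item reference.

Save as .git/hooks/commit-msg and make it executable. Git passes the path of
the file holding the proposed message as the only argument; a non-zero exit
status aborts the commit.
"""
import re
import sys

# Hypothetical convention: Jira-style keys such as "SHOP-142". Deliberately
# loose, so it will also accept strings like "UTF-8"; tighten it for your IDs.
WORK_ITEM_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_work_item_reference(message):
    """True if a non-comment line of the message mentions a work item."""
    lines = (line for line in message.splitlines() if not line.startswith("#"))
    return any(WORK_ITEM_PATTERN.search(line) for line in lines)


def main(message_path):
    with open(message_path, encoding="utf-8") as f:
        message = f.read()
    if has_work_item_reference(message):
        return 0
    print("Commit message must reference a work item, e.g. SHOP-142.", file=sys.stderr)
    return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git supplies the message file path as the first argument.
    sys.exit(main(sys.argv[1]))
```

Hooks in `.git/hooks` aren’t shared through the repository itself, so the team would need to agree on how to distribute one like this.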

If you’re using a system that supports Pull Requests, you can use the description to explain the context of the changes being made, and the comments left by reviewers will also help provide contextual information.

Write automated tests

Sometimes time constraints get in the way, but if you find yourself having to compromise on adding tests, raise this with the person setting the schedule, as they’ll probably appreciate the time saved later when working with the same code.

By writing tests, even high-level system or integration tests, you start to bring the contextual information back into the code and make it easier to refactor at a later date, even if you don’t think you’ll have time to do so.

As the tests are automated, they can be run quickly and provide a lot of feedback on what the code does, so any developer picking up the code can learn in a few minutes what might otherwise take days to unearth.

Use consistent design patterns

Solving different problems in a similar manner makes it easier for developers picking up code in the future to understand the thought process of the original developer.

By using design patterns, you’re giving future developers a tool to frame the decisions being made, and even if they don’t agree with those decisions, they’ll be able to understand them better.
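As a small illustration, assuming two hypothetical features (exporting rows and sending notifications), the Python sketch below solves both with the same shape: a registry of interchangeable functions chosen by name, which is a lightweight form of the Strategy pattern. A developer who understands one already understands the other.

```python
from typing import Callable, Dict, List

# Hypothetical features solved with one shared shape: a registry of
# interchangeable functions chosen by name (a lightweight Strategy pattern).
Exporter = Callable[[List[list]], str]
Notifier = Callable[[str], str]


def to_csv(rows):
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)


def to_tsv(rows):
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


def by_email(text):
    return f"EMAIL: {text}"


def by_sms(text):
    # This sketch truncates SMS text to 160 characters.
    return f"SMS: {text[:160]}"


EXPORTERS: Dict[str, Exporter] = {"csv": to_csv, "tsv": to_tsv}
NOTIFIERS: Dict[str, Notifier] = {"email": by_email, "sms": by_sms}


def export(fmt, rows):
    # Adding a format means adding one function and one registry entry.
    return EXPORTERS[fmt](rows)


def notify(channel, text):
    # Same pattern, different problem.
    return NOTIFIERS[channel](text)
```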

Keep your documentation up-to-date

As a system grows, it can be hard to keep things up to date. Even simple things like documentation strings on functions can be invalidated if they aren’t updated when the function is, and an outdated docstring is essentially useless to someone trying to use it for contextual information.
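One way to stop docstrings drifting from the code is to make their examples executable with Python’s built-in `doctest` module, so an out-of-date example becomes a failing test. The `vat_inclusive` function below is hypothetical.

```python
def vat_inclusive(net, rate=0.2):
    """Return the price including VAT, rounded to two decimal places.

    The examples are executable, so if the function changes and this
    docstring is not updated, the doctest run fails:

    >>> vat_inclusive(10)
    12.0
    >>> vat_inclusive(10, rate=0.05)
    10.5
    """
    return round(net * (1 + rate), 2)


if __name__ == "__main__":
    import doctest

    # Runs every docstring example in this module and reports any mismatch.
    doctest.testmod()
```

The examples can also be checked from the command line with `python -m doctest your_module.py -v`, or as part of a pytest run with `pytest --doctest-modules`.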

To make things easier to keep up to date, aim for a single source of truth and build traceability into the codebase and other artefacts, so future developers know where to go to find more information.

A better approach is to implement bi-directional traceability, so the source of truth also holds contextual information about the code and test cases for that functionality. This makes it easy for future developers to know where to look in the code to make changes and which tests might be impacted.

Summary

Legacy Code is something most courses don’t teach developers to work with, yet it’s something the majority of developers deal with in their day-to-day jobs.

By understanding that Legacy Code is a problem of missing context, a development team can work with the Product Owner to ensure that time is given to build up this contextual understanding and to better estimate the work being asked of them.

Additionally, by refactoring code and putting measures in place to prevent their own code from becoming Legacy Code, the development team can reduce the time that future development in that area will take.