Sharing code and why git submodules are a bad idea

Recently at a client, we had to build RESTful services that talk to each other. We had a choreographed set of services, which meant that any service could talk to any other service. The problem was that we were constantly refactoring the messages the services used to talk to one another. We started off by duplicating the messages in each service. It kind of worked initially, when we had just two services; we only had to change things in two places. Then we added a third service that used the same message, and now we had to change things in three places. We also had a fourth place: the acceptance tests, which tested the services independently. Making every change in four places was cumbersome, and the approach clearly didn’t scale.

One approach we thought about was making the messages a library (a jar in our case). That would have given us one place to change a message, but it also meant that for every change we had to edit the message separately, check it in, and wait for it to build and be published to an artefact repository. That would mean at least five minutes even for a trivial change. We wanted something faster.

We then thought about giving git submodules a try. We made the messages into a separate repository, which was then checked out as a regular source folder in every service project. Now we could finally use the IDE’s magic refactoring on the messages. All looked good initially. What we didn’t realize was:

1. Submodules require a lot of discipline and a good understanding of how git works, plus the idiosyncrasies of submodules themselves. You have to commit the submodule separately and push it before you push the parent repository’s changes. Checking in a bad reference to a submodule (for example, one that points to a commit that was never pushed) is the most common mistake people make.

2. Git checks out a submodule on ‘no branch’ (a detached HEAD) when you use the ‘git submodule update’ command. This is a bit irritating because, when you have changed stuff and want to commit, you can’t commit it to a branch; a commit made there belongs to no branch and is easy to lose. You have to check out master or some other branch and commit there. In practice this means that I stash my changes first, check out master, pop my stash and then commit.

3. Even if I update one repo to point to a new version of a submodule, the other repos still point to the old version. Somebody has to go to each dependent repository, check the new reference in and push it. Done manually, this doesn’t scale even when you have only three repos. So we created a job on our CI server that watches for check-ins to the submodule, checks out all dependent repos and updates them (which adds another manual task: updating the job whenever a new dependent repo comes along). This still doesn’t prevent another developer from happily working on an older version of the submodule, completely unaware that a huge breaking change is coming her way and, more importantly, only finding out maybe three hours later. This defeats the goal of fast feedback inherent to continuous integration. It wouldn’t be a problem if we had used an artefact repository and every build (even on a local workstation) picked up the latest changes, because the developer would then learn about a breaking change much earlier.
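
To make items 1 and 2 concrete, here is a rough sketch of the routine as two shell functions. It is only an illustration, not the exact commands we ran: the function names are made up for this example, and it assumes the submodule's working branch is called master and its remote is called origin.

```shell
#!/bin/sh
# Sketch only. The function names, the "master" branch and the "origin"
# remote are assumptions made for this illustration.

# Item 2: commit edits made in a submodule that is on 'no branch'.
commit_in_submodule() {
    sub_path=$1
    message=$2
    (
        cd "$sub_path" || exit 1
        git stash &&            # park the edits made on the detached HEAD
        git checkout master &&  # switch to a real branch
        git stash pop &&        # bring the edits back
        git commit -am "$message"
    )
}

# Item 1: push the submodule first, then record and push the new
# reference in the parent repository. Run from the parent's root.
publish_submodule_change() {
    sub_path=$1
    message=$2
    (cd "$sub_path" && git push origin master) || return 1
    git add "$sub_path" &&
    git commit -m "$message" &&
    git push
}
```

With a submodule checked out at a (hypothetical) path called messages, the sequence would be commit_in_submodule messages "Rename a field" followed by publish_submodule_change messages "Point at the new messages commit". Reversing the order of the two pushes is exactly how a bad reference ends up in the parent repository.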

Git submodules may look powerful or cool up front, but for all the reasons above it is a bad idea to share code using submodules, especially when the code changes frequently. It only gets worse as more and more developers work on the same repos.
