How to Format a Large Code Base Automatically

If you introduce code formatting rules retroactively, you have to solve the problem of how to format the existing code base according to the new rules. You could check out every repository one by one in your IDE and click on Autoformat for the whole project, but this is boring and a waste of time. Fortunately, IntelliJ IDEA ships with a command-line formatter in its installation. You can find it in <your IntelliJ IDEA installation>/bin; it's called format.sh.

In the following sections I'd like to show you how you can automate formatting a big code base. First, I will cover the preparation steps, such as exporting your code style settings from the IDE. Then, I will demonstrate how to use the CLI tool format.sh. At the end, I will show a small Groovy script that finds all repositories (in this case Git repositories), formats the code, and pushes it back to the remote SCM.

Preparations

First of all, we need the code style settings exported from IntelliJ IDEA. In IntelliJ IDEA, follow these steps:

  1. Open File -> Settings -> Editor -> Code Style
  2. Click on Export...
  3. Choose a name for the XML file (for example, Default.xml) and a location where this file should be saved (for example, /home/foo).

Then, check out or clone your SCM repository and remember the location where you put it (for example, /home/foo/myrepository).

Format Code Base Via format.sh CLI Tool

Three parameters are important for format.sh:

  • -s : The path to the IntelliJ IDEA code style settings .xml file (in our example: /home/foo/Default.xml).
  • -r : Scan directories recursively.
  • path<n> : The path to a file or a directory that should be formatted (in our example: /home/foo/myrepository).

Calling the tool without arguments prints its usage message. The last command formats our example repository:

> ./format.sh
IntelliJ IDEA 2018.2.4, build IC-182.4505.22 Formatter
Usage: format [-h] [-r|-R] [-s|-settings settingsPath] path1 path2...
-h|-help Show a help message and exit.
-s|-settings A path to Intellij IDEA code style settings .xml file.
-r|-R Scan directories recursively.
-m|-mask A comma-separated list of file masks.
path<n> A path to a file or a directory.

> ./format.sh -r -s ~/Default.xml ~/myrepository
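
When format.sh is called from a script, it can help to build the argument list in one place. The following is a minimal sketch, not part of IntelliJ IDEA: the helper name buildFormatCommand and the path /opt/idea are made up for illustration, and only flags from the usage message above (-r, -s, -m) are used. The sketch assumes that the options may appear in any order before the paths.

```groovy
// Hypothetical helper: assembles the argument list for format.sh.
// Only flags shown in the tool's usage message are used (-r, -s, -m).
List<String> buildFormatCommand(String ideaHome, String settingsFile, String target, List<String> masks = []) {
    def command = ["${ideaHome}/bin/format.sh".toString(), '-r', '-s', settingsFile]
    if (masks) {
        // -m expects a comma-separated list of file masks
        command += ['-m', masks.join(',')]
    }
    command << target
    return command
}

// /opt/idea is an assumed installation path, replace it with your own
def command = buildFormatCommand('/opt/idea', '/home/foo/Default.xml', '/home/foo/myrepository', ['*.java', '*.xml'])
println command.join(' ')
```

If such a list is run with Groovy's execute(), no shell is involved, so masks like *.java are passed to the tool unchanged.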

It's possible that the tool aborts the scan with a java.lang.OutOfMemoryError: Java heap space. In that case, you have to increase Java's maximum heap size (-Xmx) in <your IntelliJ IDEA installation>/bin/idea64.vmoptions.

> nano idea64.vmoptions
-Xms128m
-Xmx750m // <- increase the maximum heap size here
-XX:ReservedCodeCacheSize=240m
-XX:+UseConcMarkSweepGC
-XX:SoftRefLRUPolicyMSPerMB=50
-ea
-Dsun.io.useCanonCaches=false
-Djava.net.preferIPv4Stack=true
-Djdk.http.auth.tunneling.disabledSchemes=""
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-Dawt.useSystemAAFontSettings=lcd
-Dsun.java2d.renderer=sun.java2d.marlin.MarlinRenderingEngine
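
If the formatter runs on a build server, editing the file by hand may not be practical. Here is a small sketch of how the -Xmx line could be rewritten in Groovy. The helper name withMaxHeap and the value 2048m are assumptions for illustration, not recommendations.

```groovy
// Hypothetical helper: returns the vmoptions text with the -Xmx line replaced.
String withMaxHeap(String vmOptions, String newSize) {
    vmOptions.readLines()
            .collect { it.startsWith('-Xmx') ? "-Xmx${newSize}".toString() : it }
            .join('\n')
}

// Shortened copy of the file content shown above
def original = '''-Xms128m
-Xmx750m
-XX:ReservedCodeCacheSize=240m'''

println withMaxHeap(original, '2048m')
```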

Groovy Script for Formatting Many Repositories in a Row

Now, we want to bring everything together. The script should do four things:

  1. Find all repository URLs whose code has to be formatted.
  2. Check out / Clone the repositories.
  3. Format the code in all branches of each repository.
  4. Commit and push the change to the remote SCM.

I chose Git as the SCM in my example. How you find the repository URLs depends on the Git management system you use (such as Bitbucket, GitLab, or SCM-Manager), but the approach is the same in all of them:

  1. Call the RESTful API of your Git management system.
  2. Parse the JSON object in the response to extract the URLs.

For example, in Bitbucket it looks like this:

@Grab('org.codehaus.groovy.modules.http-builder:http-builder:0.7.1')
import groovy.json.JsonSlurper
import groovyx.net.http.HTTPBuilder

import static groovyx.net.http.Method.GET

def cloneUrlsForProject() {
    // Placeholders: replace with your Bitbucket project key and access token
    def projectKey = "PROJECT_KEY"
    def projectUrl = "https://scm/rest/api/1.0/projects/${projectKey}/repos?limit=1000"
    def token = "BITBUCKET_TOKEN"

    def projects = [:]
    def cloneUrls = []

    def http = new HTTPBuilder(projectUrl)
    http.request(GET) {
        headers."Accept" = "application/json"
        headers."Authorization" = "Bearer ${token}"

        // Read the raw response body and parse it as JSON
        response.success = { resp -> projects = new JsonSlurper().parseText(resp.entity.content.text) }

        response.failure = { resp ->
            throw new RuntimeException("Error fetching clone urls for '${projectKey}': ${resp.statusLine}")
        }
    }

    // Collect the SSH clone URL of every repository in the project
    projects.values.each { value ->
        def cloneLink = value.links.clone.find { it.name == "ssh" }
        if (cloneLink) {
            cloneUrls.add(cloneLink.href)
        }
    }

    return cloneUrls
}
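
To see the parsing step in isolation, here is a minimal sketch that runs without a server. The sample response is made up and heavily shortened: it only mimics the fields that the function above reads (values, links.clone, name, href), and the host names and ports in it are placeholders.

```groovy
import groovy.json.JsonSlurper

// Made-up, shortened response in the shape the function above expects
def sampleResponse = '''
{
  "values": [
    { "slug": "repo1",
      "links": { "clone": [
        { "name": "http", "href": "https://scm/scm/PROJECT_KEY/repo1.git" },
        { "name": "ssh",  "href": "ssh://git@scm:7999/PROJECT_KEY/repo1.git" } ] } },
    { "slug": "repo2",
      "links": { "clone": [
        { "name": "http", "href": "https://scm/scm/PROJECT_KEY/repo2.git" } ] } }
  ]
}
'''

def page = new JsonSlurper().parseText(sampleResponse)

// Pick the SSH link of each repository; repositories without one are skipped
def cloneUrls = page.values
        .collect { repo -> repo.links.clone.find { it.name == 'ssh' }?.href }
        .findAll { it != null }

println cloneUrls
```

In this sample, repo2 has no SSH link, so only the URL of repo1 ends up in the result.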

Then, we have to clone the repositories and check out each branch. In each branch, format.sh has to be called. For the Git operations we use the JGit library, and for the format.sh call we use Groovy's built-in support for running external processes: you can define the command as a String and call the method execute() on it, as in "ls -l".execute(). The Groovy script for the last three tasks looks like this:

#!/usr/bin/env groovy
@Grab('org.eclipse.jgit:org.eclipse.jgit:5.1.2.201810061102-r')
import org.eclipse.jgit.api.CreateBranchCommand
import org.eclipse.jgit.api.Git
import org.eclipse.jgit.api.ListBranchCommand
import org.eclipse.jgit.transport.UsernamePasswordCredentialsProvider

import java.nio.file.Files

def intellijHome = 'path to your idea home folder'
def codeFormatterSetting = 'path to your exported code formatter setting file'
def allRepositoriesUrls = ["http://scm/repo1", "http://scm/repo2"] // hard-coded to keep the example simple

allRepositoriesUrls.each { repository ->
    // Use the last path segment of the URL as the repository name
    def repositoryName = repository.tokenize('/').last()
    File localPath = Files.createTempDirectory("${repositoryName}-").toFile()
    println "Clone ${repository} to ${localPath}"
    Git.cloneRepository()
       .setURI(repository)
       .setDirectory(localPath)
       .setNoCheckout(true)
       .setCredentialsProvider(new UsernamePasswordCredentialsProvider("user", "password")) // only needed when clone url is https / http
       .call()
       .withCloseable { git ->
           def remoteBranches = git.branchList().setListMode(ListBranchCommand.ListMode.REMOTE).call()
           def remoteBranchNames = remoteBranches.collect { it.name.replace('refs/remotes/origin/', '') }

           println "Found the following branches: ${remoteBranchNames}"

           remoteBranchNames.each { remoteBranch ->
               println "Checkout branch $remoteBranch"
               git.checkout()
                  .setName(remoteBranch)
                  .setCreateBranch(true)
                  .setUpstreamMode(CreateBranchCommand.SetupUpstreamMode.TRACK)
                  .setStartPoint("origin/" + remoteBranch)
                  .call()

               // Pass the command as a list so that paths containing spaces are not split
               def formatCommand = ["${intellijHome}/bin/format.sh", '-r', '-s', codeFormatterSetting, localPath.toString()]

               println formatCommand.execute().text

               // Skip the commit when the formatter did not change anything on this branch
               if (git.status().call().isClean()) {
                   println "Nothing to format on $remoteBranch"
                   return
               }

               git.add()
                  .addFilepattern('.')
                  .call()
               git.commit()
                  .setAuthor("Automator", "no-reply@yourcompany.com")
                  .setMessage('Format code according to IntelliJ setting.')
                  .call()

               println "Commit successful!"
           }

           git.push()
              .setCredentialsProvider(new UsernamePasswordCredentialsProvider("user", "password")) // only needed when clone url is https / http
              .setPushAll()
              .call()

           println "Push is done"
       }
}
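
Reading the .text of a process only returns its standard output and ignores the exit code. As a hedged sketch of a more defensive variant, the following uses echo as a stand-in for format.sh (assuming a Unix-like system) and relies on Groovy's waitForProcessOutput to collect both output streams before checking the exit value.

```groovy
// echo stands in for format.sh here; the list form keeps each argument intact
def process = ['echo', 'formatting done'].execute()

def stdout = new StringBuilder()
def stderr = new StringBuilder()
// Consume both streams and wait until the process has finished
process.waitForProcessOutput(stdout, stderr)

println "exit code: ${process.exitValue()}"
println "stdout: ${stdout.toString().trim()}"

// Fail loudly instead of committing after a broken formatter run
if (process.exitValue() != 0) {
    throw new RuntimeException("Command failed: ${stderr}")
}
```

Whether format.sh signals every failure through a non-zero exit code is something to verify for your IntelliJ IDEA version before relying on this check.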

Do you have another approach? Let me know in a comment below.