How To Remove Secrets From The Git History

[#remove-secrets-with-bfg-repo-cleaner] Removing secrets with BFG Repo-Cleaner [#remove-secrets-with-bfg-repo-cleaner]

BFG Repo-Cleaner is a tool designed for removing large files and sensitive data, such as passwords and credentials, from a Git repository history.

The first thing to do before using BFG on a repository is to clone it on your local machine using the [.inline-code]git clone[.inline-code] command with the [.inline-code]--mirror[.inline-code] flag:

$ git clone --mirror <url>

Where:

  • [.inline-code]<url>[.inline-code] is the URL of the Git repository you want to clone.

This will create an exact and complete replica of the original repository, including all branches, tags, and commits. Note that it is usually recommended to make a backup of this copy in order to prevent unintentional or unexpected data loss.
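As a rough sketch of the backup step, the script below mirror-clones a repository and copies the mirror before any rewriting takes place. Everything in it is an assumption made for illustration: the [.inline-code]source[.inline-code] repository is a throwaway stand-in for your real remote, and the names [.inline-code]project.git[.inline-code] and [.inline-code]project.git.bak[.inline-code] are arbitrary.

```shell
# Scratch area; "source" stands in for the real remote (hypothetical)
work=$(mktemp -d)
git init -q "$work/source"
git -C "$work/source" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"

# Bare mirror clone: includes all branches, tags, and other refs
git clone -q --mirror "$work/source" "$work/project.git"

# Plain directory copy kept as a backup before rewriting history
cp -R "$work/project.git" "$work/project.git.bak"

# The backup should resolve to the same commit as the original
git -C "$work/project.git.bak" rev-parse HEAD
```

With a real repository, only the [.inline-code]git clone --mirror[.inline-code] and [.inline-code]cp -R[.inline-code] lines would be needed.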

Next, you can download the BFG Repo-Cleaner from the official website and run the following command to clean your repository:

$ java -jar bfg.jar --delete-files <files> <repository>

Where:

  • [.inline-code]java -jar bfg.jar[.inline-code] is used to execute the BFG tool.
  • [.inline-code]<files>[.inline-code] is the name, or glob pattern, of the files you want to delete from the repository's history. BFG matches it against file names rather than full paths.
  • [.inline-code]<repository>[.inline-code] is the path to the Git repository on your local machine you want to clean up.
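Once BFG has finished, and before pushing anything, it can be worth checking whether any commit still touches the file. The sketch below is only illustrative: it builds a throwaway repository in which a hypothetical [.inline-code]secrets.env[.inline-code] file was committed and later deleted, then counts the commits that still reference it. The [.inline-code]g[.inline-code] helper is shorthand defined for this example only.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
echo "API_KEY=not-a-real-key" > "$repo/secrets.env"
g add secrets.env
g commit -q -m "Add config"
g rm -q secrets.env
g commit -q -m "Remove config"

# The file is gone from the working tree, but history still holds it:
# count the commits on any ref that touch the path
hits=$(g log --all --oneline -- secrets.env | wc -l | tr -d ' ')
echo "commits touching secrets.env: $hits"
```

In a repository that has been cleaned successfully, the same [.inline-code]git log[.inline-code] command should print nothing for the removed path.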

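Finally, once the files have been removed by BFG, you can push the rewritten history to the remote repository using the [.inline-code]git push[.inline-code] command. Because the repository was cloned with the [.inline-code]--mirror[.inline-code] flag, this updates all of its branches and tags at once: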

$ git push

Note that, although BFG will clean the repository's commits, branches, and tags, it will not physically eliminate the unwanted data. For a thorough removal of this data, please refer to the "Removing all references" section later in this article.

[#remove-secrets-with-git-filter-repo] Removing secrets with git-filter-repo [#remove-secrets-with-git-filter-repo]

git-filter-repo is a versatile and powerful tool designed for filtering and rewriting the history of Git repositories, modifying commit messages, combining repositories, and more. It is especially useful for removing files containing sensitive information, managing large files, and restructuring repositories for optimal efficiency and security.

[#install-git-filter-repo] Installing git-filter-repo [#install-git-filter-repo]

To install it on Windows, you can use the following [.inline-code]pip[.inline-code] command:

$ pip install --user --upgrade git-filter-repo

To install it on macOS, you can use the following [.inline-code]brew[.inline-code] command:

$ brew install git-filter-repo

To install it on Linux, you can download the executable and place it in any directory listed in your [.inline-code]$PATH[.inline-code]. See the installation documentation for more information.

Removing sensitive files

To remove sensitive files from a Git repository history, you can use the following [.inline-code]git filter-repo[.inline-code] command:

$ git filter-repo --path <files> --invert-paths

Where:

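  • [.inline-code]<files>[.inline-code] is the path to the file or directory you want to remove. To remove several paths, repeat the [.inline-code]--path[.inline-code] flag once for each of them.
  • [.inline-code]--invert-paths[.inline-code] inverts the selection, so that the rewritten history keeps every file except those matched by [.inline-code]--path[.inline-code].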
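To make this more concrete, here is a minimal sketch that removes a hypothetical [.inline-code]config/secrets.env[.inline-code] file from a throwaway repository. The file names and the [.inline-code]g[.inline-code] helper are assumptions made for the example, and the rewrite is skipped when git-filter-repo is not installed. The [.inline-code]--force[.inline-code] flag is used only because this scratch repository is not a fresh clone, which git-filter-repo otherwise expects.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
mkdir -p "$repo/config"
echo "API_KEY=not-a-real-key" > "$repo/config/secrets.env"
echo "hello" > "$repo/README.md"
g add .
g commit -q -m "Initial commit"

if command -v git-filter-repo >/dev/null 2>&1; then
  # Keep everything except the listed path
  g filter-repo --force --path config/secrets.env --invert-paths
  # Prints nothing once no commit touches the path any more
  g log --all --oneline -- config/secrets.env
else
  echo "git-filter-repo is not installed; skipping the rewrite"
fi
```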

Note that, by default, the [.inline-code]git filter-repo[.inline-code] command will automatically remove the reference to the remote repository (the [.inline-code]origin[.inline-code] remote) in order to prevent accidental pushes before you have verified the result.

After you've double-checked that the repository is as you want it to be, you can manually re-add that reference using the following [.inline-code]git remote[.inline-code] command:

$ git remote add origin <url>

Where:

  • [.inline-code]<url>[.inline-code] is the URL of the Git repository you are fixing.

To update the remote repository and forcefully overwrite its history, you can use the [.inline-code]git push[.inline-code] command with the [.inline-code]--force[.inline-code] flag as follows:

$ git push origin --force --all

Finally, to ensure that the sensitive files are also removed from your tagged releases, you can perform a force push against your Git tags as follows:

$ git push origin --force --tags

You can learn more about this command by visiting the official user manual page.

Note that both BFG Repo-Cleaner and [.inline-code]git-filter-repo[.inline-code] are heavy-handed: they rewrite every commit that touched these files, all the way back to the beginning of the repository's history. If the secret was only introduced in your most recent commit, undoing that last commit might serve you better.
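As one possible way of doing this, the sketch below amends the last commit so that it no longer contains the file. It assumes that the commit has not been pushed yet, and it uses a throwaway repository, a hypothetical [.inline-code]secrets.env[.inline-code] file, and a [.inline-code]g[.inline-code] helper defined for this example only.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
echo "hello" > "$repo/README.md"
g add README.md
g commit -q -m "Initial commit"

# A commit in which secrets.env slipped in by mistake
echo "more" >> "$repo/README.md"
echo "API_KEY=not-a-real-key" > "$repo/secrets.env"
g add .
g commit -q -m "Update README"

# Drop the file from the index (it stays on disk), then rewrite the last commit
g rm -q --cached secrets.env
g commit -q --amend --no-edit

# Files recorded in the amended commit
tracked=$(g ls-tree -r --name-only HEAD)
echo "$tracked"
```

The original commit remains reachable through the reflog until it is expired, which the "Removing all references" section covers.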

[#easy-recall-syntax-with-ai] Easily retrieve this command using Warp’s AI Command Suggestions [#easy-recall-syntax-with-ai]

If you’re using Warp as your terminal, you can easily retrieve the [.inline-code]git-filter-repo[.inline-code] command using the Warp AI Command Suggestions feature:

Entering [.inline-code]remove secret from git[.inline-code] in the AI Command Suggestions will suggest a [.inline-code]git[.inline-code] command that you can quickly insert into your shell by pressing [.inline-code]CMD+ENTER[.inline-code].

[#remove-all-references] Removing all references [#remove-all-references]

Although tools like BFG and git-filter-repo will remove secrets from your repository's history, the repository may still hold references to some of these files and commits. To remove them all, you have to delete out-of-date references, expire and prune reflogs, force garbage collection, and encourage your collaborators to rebase their branches.

[#remove-out-of-date-references] Deleting out-of-date references [#remove-out-of-date-references]

In order to completely delete all old references (i.e. branches and tags) pointing to the previously cleaned up files and commits, you can use the following [.inline-code]git[.inline-code] command:

$ git for-each-ref --format="delete %(refname)" <namespace> | git update-ref --stdin

Where:

  • The [.inline-code]git for-each-ref[.inline-code] command is used to iterate over references in a Git repository.
  • The [.inline-code]--format="delete %(refname)"[.inline-code] flag prints each reference as a [.inline-code]delete[.inline-code] command followed by the reference name, which produces a list of commands that [.inline-code]git update-ref[.inline-code] can execute.
  • The [.inline-code]<namespace>[.inline-code] argument is the reference prefix, such as [.inline-code]refs/tags/[.inline-code], that limits which references are processed.
  • The [.inline-code]git update-ref --stdin[.inline-code] command is used to read and execute the commands generated by the [.inline-code]git for-each-ref[.inline-code] command.

For example, the following command will delete all the references in the [.inline-code]refs/heads/feature/[.inline-code] namespace, that is, every local branch whose name starts with [.inline-code]feature/[.inline-code]:

$ git for-each-ref --format="delete %(refname)" refs/heads/feature/ | git update-ref --stdin
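Since this pipeline deletes references without asking for confirmation, you may want to run its first half on its own beforehand. The sketch below does so in a throwaway repository: the branch names and the [.inline-code]g[.inline-code] helper are assumptions made for the example.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
g commit -q --allow-empty -m "Initial commit"
g branch feature/login
g branch feature/search

# Dry run: print the generated commands without executing them
g for-each-ref --format="delete %(refname)" refs/heads/feature/

# Same list, this time piped into update-ref to actually delete the refs
g for-each-ref --format="delete %(refname)" refs/heads/feature/ | g update-ref --stdin

# Count what is left under the namespace
remaining=$(g for-each-ref refs/heads/feature/ | wc -l | tr -d ' ')
echo "references left in refs/heads/feature/: $remaining"
```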

[#expire-and-prune-reflogs] Expiring and pruning reflogs [#expire-and-prune-reflogs]

Once all the references have been deleted, the next step is to clear the repository's reflogs, which record the updates made to branch tips and other references over time.

To expire the reflogs, you can use the following [.inline-code]git reflog[.inline-code] command:

$ git reflog expire --expire=now --all

Where:

  • The [.inline-code]--expire=now[.inline-code] flag is used to expire every existing reflog entry immediately, regardless of its age.
  • The [.inline-code]--all[.inline-code] flag is used to include all the reflogs of all the branches and references in the repository.
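To see the effect of this command, you can compare the number of reflog entries before and after running it. The following is a small sketch in a throwaway repository, where the commits and the [.inline-code]g[.inline-code] helper exist only for the sake of the example.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
g commit -q --allow-empty -m "First commit"
g commit -q --allow-empty -m "Second commit"

# Each commit above added an entry to the HEAD reflog
before=$(g reflog | wc -l | tr -d ' ')

g reflog expire --expire=now --all

after=$(g reflog | wc -l | tr -d ' ')
echo "HEAD reflog entries: $before before, $after after"
```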

[#force-garbage-collection] Forcing the garbage collection [#force-garbage-collection]

To thoroughly clean up your repository and permanently erase any residual traces of sensitive data, you can perform a manual garbage collection using the following [.inline-code]git gc[.inline-code] command:

$ git gc --prune=now

Where:

  • The [.inline-code]--prune=now[.inline-code] flag is used to prune every unreachable object, that is, every object no longer referenced by any branch, tag, or reflog, regardless of its age. Without it, Git only prunes unreachable objects older than the default grace period of two weeks.

This will physically delete the objects that contained the sensitive data from your local repository and repack the remaining ones.
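The sketch below illustrates why the reflogs have to be expired before the garbage collection can delete anything. It commits a fake secret on a hypothetical [.inline-code]leak[.inline-code] branch of a throwaway repository, deletes that branch, and checks whether the file's blob is still stored before and after the cleanup. The names and the [.inline-code]g[.inline-code] helper are assumptions made for the example.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
g commit -q --allow-empty -m "Initial commit"
base=$(g symbolic-ref --short HEAD)

# Commit a fake secret on a side branch, then delete that branch
g checkout -q -b leak
echo "API_KEY=not-a-real-key" > "$repo/secrets.env"
g add secrets.env
g commit -q -m "Add secrets by mistake"
blob=$(g rev-parse HEAD:secrets.env)
g checkout -q "$base"
g branch -q -D leak

# No branch points at the commit any more, but the HEAD reflog still does,
# so the blob is still present in the object database
g cat-file -e "$blob" && echo "before: blob still stored"

g reflog expire --expire=now --all
g gc -q --prune=now

if g cat-file -e "$blob" 2>/dev/null; then state=present; else state=gone; fi
echo "after: blob is $state"
```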

[#use-rebase-instead-of-merge] Encouraging your collaborators to rebase [#use-rebase-instead-of-merge]

Lastly, since Git is by design a distributed system, some of your collaborators may still have bits of the old repository's history on their local machines.

This is why you should encourage them to rebase their branches instead of merging them, as it will minimize the chances of unintentionally reintroducing commits and references to the previously removed sensitive data back into the repository's history.

[#prevent-accidental-commits] Preventing future accidental commits [#prevent-accidental-commits]

While knowing how to clean up a repository's history is crucial to maintain its integrity and security, it is equally important to implement practices that will reduce the likelihood of sensitive data exposure.

[#review-staged-changes] Reviewing changes before pushing [#review-staged-changes]

The first thing you can do to prevent pushing sensitive data to a remote repository is to carefully review the staged files with the [.inline-code]git status[.inline-code] command before committing them.
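In addition to [.inline-code]git status[.inline-code], the [.inline-code]git diff --staged[.inline-code] command shows exactly what the next commit would contain. Here is a short sketch in a throwaway repository, where the file names and the [.inline-code]g[.inline-code] helper are assumptions made for the example.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
g commit -q --allow-empty -m "Initial commit"
echo "hello" > "$repo/README.md"
echo "API_KEY=not-a-real-key" > "$repo/secrets.env"
g add -A

# Names of the files that the next commit would include
staged=$(g diff --staged --name-only)
echo "$staged"

# Full content of the staged changes: a last chance to spot a secret
g diff --staged
```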

You can also use other Git tools such as lazygit, which offers a more visual representation of branches, commit history, files, and changes.

Finally, you can enhance your text editor with Git plugins, such as magit for Emacs or fugitive for Vim and Neovim, that provide support for Git commands and functionalities in a more interactive and visual way.

[#use-a-gitignore-file] Adding sensitive files to [.inline-code].gitignore[.inline-code] [#use-a-gitignore-file]

The easiest way to avoid accidentally committing unwanted, sensitive, or large files to a repository is to list their paths in the [.inline-code].gitignore[.inline-code] file.

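This will tell Git to leave these files untracked and keep them out of the staging area, even when staging the entire project using commands such as [.inline-code]git add .[.inline-code] or [.inline-code]git add -A[.inline-code]. Note that this only applies to files that are not tracked yet: a file that has already been committed stays tracked until you remove it from the index, for example with [.inline-code]git rm --cached[.inline-code].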

For example, you can use the [.inline-code]echo[.inline-code] command the following way to add a file to the [.inline-code].gitignore[.inline-code] file:

$ echo "<file_name>" >> .gitignore

Where:

  • [.inline-code]<file_name>[.inline-code] is the path to the file that contains your sensitive data.

Note that the [.inline-code].gitignore[.inline-code] file should also be committed to the repository to ensure consistency across all the environments the project is cloned in:

$ git add .gitignore
$ git commit -m "Update gitignore"
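To check that a rule behaves as intended, you can ask Git which pattern ignores a given file with the [.inline-code]git check-ignore[.inline-code] command. The sketch below does this in a throwaway repository, using a hypothetical [.inline-code].env[.inline-code] file and a [.inline-code]g[.inline-code] helper defined for this example only.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
echo "API_KEY=not-a-real-key" > "$repo/.env"
echo ".env" >> "$repo/.gitignore"

# Show which .gitignore rule matches the file
g check-ignore -v .env

# Stage everything, then list what actually ended up in the index
g add -A
tracked=$(g ls-files)
echo "$tracked"
```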

[#use-git-hooks] Using Git hooks [#use-git-hooks]

With Git hooks, you can write [.inline-code]pre-commit[.inline-code] and [.inline-code]pre-push[.inline-code] scripts that scan the staged files or outgoing commits for secrets using commands like [.inline-code]grep[.inline-code] or [.inline-code]sed[.inline-code], and block the commit or the push until the data is removed.
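As an illustration, here is a minimal sketch of a [.inline-code]pre-commit[.inline-code] hook based on [.inline-code]grep[.inline-code], installed in a throwaway repository. The three patterns are examples chosen for this sketch rather than a complete list, and the [.inline-code]settings.env[.inline-code] file and the [.inline-code]g[.inline-code] helper are hypothetical.

```shell
repo=$(mktemp -d)
# Shorthand for running git inside the scratch repo (hypothetical helper)
g() { git -C "$repo" -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q
g commit -q --allow-empty -m "Initial commit"

# Install a pre-commit hook that scans the lines added in the staged diff
mkdir -p "$repo/.git/hooks"
cat > "$repo/.git/hooks/pre-commit" <<'EOF'
#!/bin/sh
# Example patterns only: an AWS-style access key ID, a private key header,
# and a generic API_KEY= assignment
patterns='AKIA[0-9A-Z]{16}|BEGIN [A-Z ]*PRIVATE KEY|API_KEY='
if git diff --cached -U0 | grep -E "^\+.*($patterns)" >/dev/null; then
  echo "Possible secret in staged changes; commit aborted." >&2
  exit 1
fi
EOF
chmod +x "$repo/.git/hooks/pre-commit"

# Try to commit a file containing a fake key
echo "API_KEY=not-a-real-key" > "$repo/settings.env"
g add settings.env
if g commit -q -m "Add settings"; then result=committed; else result=blocked; fi
echo "commit was $result"
```

Keep in mind that hooks stored in [.inline-code].git/hooks[.inline-code] are local to each clone, so every collaborator has to install them separately.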

Note that you can also use more advanced tools like git-secrets that will automatically scan the files being committed for any matches with a list of user-defined patterns or regular expressions that are commonly associated with sensitive information.