Solution for Ansible git module getting stuck on clone

If you use the git module in Ansible to check out a git repository over SSH and it gets stuck on the initial clone, the most likely cause is that the known_hosts file doesn’t exist or doesn’t contain a host entry for the server you want to clone from.

If you run with the verbose log level (the -vvv option), the output will look something like this:

<> EXEC ['ssh', '-tt', '-q', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/user/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=22', '-o', 'IdentityFile=/home/user/.vagrant.d/insecure_private_key', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=vagrant', '-o', 'ConnectTimeout=10', '', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-1381761458.81-264107624355792 && chmod a+rx $HOME/.ansible/tmp/ansible-1381761458.81-264107624355792 && echo $HOME/.ansible/tmp/ansible-1381761458.81-264107624355792'"]
<> REMOTE_MODULE git repo=[email protected]:user/repo.git dest=/data/code/ version=master
<> PUT /tmp/tmpIR_UtZ TO /home/vagrant/.ansible/tmp/ansible-1381761458.81-264107624355792/git
<> EXEC ['ssh', '-tt', '-q', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/home/user/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'Port=22', '-o', 'IdentityFile=/home/user/.vagrant.d/insecure_private_key', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'User=vagrant', '-o', 'ConnectTimeout=10', '', '/bin/sh -c \'sudo -k && sudo -H -S -p "[sudo via ansible, key=lylesdmziljbasahwtnbbiozodifcccn] password: " -u root /bin/sh -c \'"\'"\'/usr/bin/python /home/vagrant/.ansible/tmp/ansible-1381761458.81-264107624355792/git; rm -rf /home/vagrant/.ansible/tmp/ansible-1381761458.81-264107624355792/ >/dev/null 2>&1\'"\'"\'\'']

Unlike other remote command execution frameworks such as Fabric, Ansible doesn’t propagate prompts (which imo is good, since you really should automate everything when using a deploy tool). The clone is waiting on SSH’s host key confirmation prompt, which nobody can answer, so it simply gets stuck and eventually times out (getting stuck is definitely bad UX, but that’s a different topic).

A lot of online tutorials simply suggest disabling strict host key checking (StrictHostKeyChecking) in the SSH config. Unless you really know what you are doing, this is a bad idea from a security perspective, because it removes the protection against man-in-the-middle attacks.

The correct thing to do is to add the host key of the server you are cloning from to the .ssh/known_hosts file.
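Before changing anything, it can help to confirm that the entry really is missing. The following is a minimal Python sketch, not part of Ansible; the function name has_known_host and the host git.example.com are made up for illustration. It only understands plain-text entries, so hashed entries, wildcard patterns and [host]:port forms are ignored.

```python
import os


def has_known_host(known_hosts_path, hostname):
    """Return True if known_hosts_path has a plain-text entry for hostname.

    Simplifications in this sketch: hashed entries (lines starting with
    "|1|"), wildcard patterns and "[host]:port" forms are not interpreted.
    """
    if not os.path.exists(known_hosts_path):
        return False  # a missing file means no host is known
    with open(known_hosts_path) as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blanks, comments and marker lines (@cert-authority, @revoked).
            if not line or line.startswith(("#", "@")):
                continue
            # The first field is a comma-separated list of host names/addresses.
            names = line.split()[0].split(",")
            if hostname in names:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical host name; replace it with the server you clone from.
    path = os.path.expanduser("~/.ssh/known_hosts")
    print(has_known_host(path, "git.example.com"))
```

Run it on the machine that performs the clone, as the user that performs it, since that is the known_hosts file SSH will consult.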

Since you are using Ansible, you should also use it to manage the known_hosts file. Two approaches for doing this are shown below.

1. Use Ansible to manage the whole known_hosts file (advised)

This approach simply copies the known_hosts file from the local Ansible files/ directory to the remote server.

  - name: Install known_hosts file
    copy: src=known_hosts dest=/home/${ansible_ssh_user}/.ssh/known_hosts owner=${ansible_ssh_user} group=${ansible_ssh_user}

2. Use Ansible to make sure some known_hosts entries are present

This approach assumes the file already exists and makes sure the specified host entries are present.

- name: Add known hosts to known_hosts file
  lineinfile: dest=/home/${ansible_ssh_user}/.ssh/known_hosts regexp=^${item.host} line="${item.line}"
  with_items:
   - { "host": "", "line": ", ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==" }
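To make the behaviour of the second approach concrete, here is a rough Python model of what a regexp plus line pair does to the file: if a line matches the pattern, the last match is replaced, otherwise the line is appended. This is a sketch of the semantics only, not Ansible's actual implementation; ensure_line, git.example.com and the key placeholder are made up for illustration.

```python
import re


def ensure_line(lines, pattern, line):
    """Return a copy of lines in which `line` is present.

    If any existing line matches `pattern`, the last match is replaced;
    otherwise `line` is appended at the end.
    """
    regex = re.compile(pattern)
    result = list(lines)
    last_match = None
    for index, existing in enumerate(result):
        if regex.search(existing):
            last_match = index
    if last_match is None:
        result.append(line)  # no entry for this host yet
    else:
        result[last_match] = line  # refresh the existing entry
    return result


if __name__ == "__main__":
    # Hypothetical entry; a real one carries the server's full public key.
    entry = "git.example.com ssh-rsa PLACEHOLDER_KEY"
    current = ["other.example.com ssh-rsa OTHER_KEY"]
    print(ensure_line(current, r"^git\.example\.com", entry))
```

Applying the function a second time to its own output leaves the list unchanged, which is why the task is safe to run on every deploy. Note that the pattern is a regular expression, so an unescaped dot in a host name matches any character.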