Installing a John the Ripper Cluster (Fedora 23/24)

Last updated on 28 September 2016

John the Ripper is an excellent password cracking tool that I regularly use during penetration tests to recover plaintext passwords from multiple hash formats.

I recently started building a new dedicated rig with the sole purpose of cracking passwords. I didn’t have any money to invest in this project, so I am using whatever servers and workstations are lying around unused in the back of the server room. I therefore decided my best bet for maximum hash cracking goodness would be to use John in parallel across all these machines. This is a first for me, so I thought I had better document how I did it for when it all burns to the ground and I have to start again. There are several guides online describing how to achieve this kind of setup using Kali or Ubuntu, but I prefer Fedora, so I had to alter a lot of the commands to suit and I encountered some odd errors along the way. I hope this (rambling) guide is useful to others as well as my future self.

[Note this post is a little bit of a work in progress as I continue to build and refine the cracking rig.]

I’m using Fedora 23 Server on each of the hosts, have configured a static IP address using the Cockpit interface on port 9090 (i.e. https://ip_address:9090), and have created a user which will be used to authenticate between the hosts.

[Update: I have successfully used the same process on Fedora 24]

On a side note, if when running dnf commands you encounter an error like:

Error: Failed to synchronize cache for repo 'updates' from 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f23&arch=x86_64': Cannot prepare internal mirrorlist: Curl error (60): Peer certificate cannot be authenticated with given CA certificates for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f23&arch=x86_64 [Peer's Certificate has expired.]

If you see this, you may not have working Internet access because your network firewall or proxy is intercepting (man-in-the-middling) the TLS connection. In my case I fixed this by shouting over to the firewall guy in the office.

As I need support for a large variety of hash formats, the version of John in the Fedora repositories is useless to me; instead I am using the community-enhanced (jumbo) edition.

Compiling John

Non-clustered

There are several dependencies that you need before attempting to build:

sudo dnf install openssl openssl-devel gcc

I first tried to install this from the tarball available on the openwall site. However when running:

cd src/
./configure && make

I encountered some errors like this:

In file included from /usr/include/stdio.h:27:0,
 from jumbo.h:20,
 from os-autoconf.h:29,
 from os.h:20,
 from bench.c:25:
 /usr/include/features.h:148:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
 # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
 ^
 gpg2john.c: In function ‘pkt_type’:
 gpg2john.c:1194:7: warning: type of ‘tag’ defaults to ‘int’ [-Wimplicit-int]
 char *pkt_type(tag) {
 ^
 /usr/bin/ar: creating aes.a
 dynamic_fmt.o: In function `DynamicFunc__crypt_md5_to_input_raw_Overwrite_NoLen':
 /opt/john-1.8.0-jumbo-1/src/dynamic_fmt.c:4989: undefined reference to `MD5_body_for_thread'
 dynamic_fmt.o: In function `DynamicFunc__crypt_md5':
 /opt/john-1.8.0-jumbo-1/src/dynamic_fmt.c:4425: undefined reference to `MD5_body_for_thread'
 dynamic_fmt.o: In function `DynamicFunc__crypt_md5_in1_to_out2':
 /opt/john-1.8.0-jumbo-1/src/dynamic_fmt.c:4732: undefined reference to `MD5_body_for_thread'
 dynamic_fmt.o: In function `DynamicFunc__crypt_md5_to_input_raw':
 /opt/john-1.8.0-jumbo-1/src/dynamic_fmt.c:4903: undefined reference to `MD5_body_for_thread'
 dynamic_fmt.o: In function `DynamicFunc__crypt_md5_to_input_raw_Overwrite_NoLen_but_setlen_in_SSE':
 /opt/john-1.8.0-jumbo-1/src/dynamic_fmt.c:4946: undefined reference to `MD5_body_for_thread'
 dynamic_fmt.o:/opt/john-1.8.0-jumbo-1/src/dynamic_fmt.c:4817: more undefined references to `MD5_body_for_thread' follow
 collect2: error: ld returned 1 exit status
 Makefile:294: recipe for target '../run/john' failed
 make[1]: *** [../run/john] Error 1
 Makefile:185: recipe for target 'default' failed
 make: *** [default] Error 2

I have no idea why this is and could not find out how to fix it, however I did discover that the version on GitHub is not affected by this problem.

sudo dnf install git
git clone https://github.com/magnumripper/JohnTheRipper.git
cd JohnTheRipper/src
./configure && make -s clean && make -s

You should now be able to use John:

cd ../run
./john --test

This gives a useable version of John for a single machine, but it will not work for a cluster.

Clustered (openmpi support)

For a cluster we need openmpi support.

First install the dependencies:

sudo dnf install openssl openssl-devel gcc openmpi openmpi-devel mpich

If you now try to build with openmpi support:

cd src/
./configure --enable-mpi && make -s clean && make -s

You will probably encounter an error like this:

checking build system type... x86_64-unknown-linux-gnu
 checking host system type... x86_64-unknown-linux-gnu
 checking whether to compile using MPI... yes
 checking for mpicc... no
 checking for mpixlc_r... no
 checking for mpixlc... no
 checking for hcc... no
 checking for mpxlc_r... no
 checking for mpxlc... no
 checking for sxmpicc... no
 checking for mpifcc... no
 checking for mpgcc... no
 checking for mpcc... no
 checking for cmpicc... no
 checking for cc... cc
 checking for gcc... (cached) cc
 checking whether the C compiler works... yes
 checking for C compiler default output file name... a.out
 checking for suffix of executables...
 checking whether we are cross compiling... no
 checking for suffix of object files... o
 checking whether we are using the GNU C compiler... yes
 checking whether cc accepts -g... yes
 checking for cc option to accept ISO C89... none needed
 checking whether cc understands -c and -o together... yes
 checking for function MPI_Init... no
 checking for function MPI_Init in -lmpi... no
 checking for function MPI_Init in -lmpich... no
 configure: error: in `/opt/john/JohnTheRipper/src':
 configure: error: No MPI compiler found
 See `config.log' for more details

The error here is that the MPI compiler and its libraries (which we installed previously) cannot be found because they are installed to a directory that is not in your path.

To fix this temporarily:

export PATH=$PATH:/usr/lib64/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib

To fix this more permanently, add the following to your ~/.bashrc:

export PATH=$PATH:/usr/lib64/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib
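
Because ~/.bashrc is read by every new shell, plain appends like the ones above can pile up duplicate entries, and appending to an unset LD_LIBRARY_PATH leaves an empty leading element, which the dynamic linker treats as the current directory. A minimal sketch of a guard against both, assuming the same Open MPI directories as above (path_append is a made-up helper name, not something Fedora or Open MPI ships):

```shell
# Append a directory to a colon-separated list only if it is not already there.
# path_append is a hypothetical helper name used for illustration.
path_append() {
    # $1 = current list (may be empty), $2 = directory to add
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present, leave unchanged
        "::")     printf '%s\n' "$2" ;;      # list was empty, avoid a leading colon
        *)        printf '%s\n' "$1:$2" ;;   # normal case, append at the end
    esac
}

PATH=$(path_append "$PATH" /usr/lib64/openmpi/bin)
LD_LIBRARY_PATH=$(path_append "${LD_LIBRARY_PATH:-}" /usr/lib64/openmpi/lib)
export PATH LD_LIBRARY_PATH
```

Sourcing this repeatedly should leave both variables unchanged after the first time.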

You should now be able to build John with openmpi support:

./configure --enable-mpi && make -s clean && make -s

Setting up the cluster

Now that we know how to get John installed successfully with openmpi support on Fedora Server 23, it’s time to set up the other services required to allow hashes to be cracked as part of a cluster.

In the example commands below, the master node is 192.168.0.1 and the slave node is 192.168.0.2.

EDIT: [I have recently noticed that ALL nodes need to be able to authenticate to and communicate with EVERY other node, once the number of nodes passes a certain number. Therefore every node must have an SSH key, with the public key in the authorized_keys file on every other node, and firewall rules allowing traffic between all nodes. I will find out how to automate this process and update this post in the future.]

Setup SSH Keys

On the master node, generate an SSH key. The following command suppresses the passphrase prompt and stores the private key unencrypted (this is not very secure and I will be changing it to a more secure option in the near future; however, since this key is only going to be used to access the slave nodes, and I am the only user on the box, it will do for now…):

ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N ""

Issue the following command on the master node to copy the ssh key to the authorized_keys file for the user of the slave node:

ssh-copy-id -i ~/.ssh/id_rsa.pub username@192.168.0.2

Note it is easiest if the user account on all of the nodes has the same name, however I believe it is possible to use different usernames (and keys) provided the ~/.ssh/config file has appropriate entries.

Ensure the permissions on the slave node are correct if you are having trouble using your key:

chmod 600 ~/.ssh/authorized_keys
chmod 700 ~/.ssh

And make sure that selinux is not getting in the way (again on the slave node):

restorecon -R -v ~/.ssh
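
The all-to-all key requirement mentioned in the edit note above means N nodes need N × (N − 1) ssh-copy-id runs. A rough sketch of how that list could be generated is below. It only prints the commands for review and does not execute anything. print_key_mesh is a hypothetical helper, the third IP address is an invented example, and it assumes every node already has its own key pair created with the ssh-keygen command shown earlier.

```shell
# Print (do not run) one ssh-copy-id command per ordered pair of nodes.
# print_key_mesh is a hypothetical helper; "username" and the IPs are placeholders.
print_key_mesh() {
    user=$1
    shift
    for src in "$@"; do
        for dst in "$@"; do
            # a node does not need to copy its key to itself
            [ "$src" = "$dst" ] && continue
            # -t gives the remote ssh-copy-id a terminal for the password prompt
            printf 'ssh -t %s@%s ssh-copy-id -i ~/.ssh/id_rsa.pub %s@%s\n' \
                "$user" "$src" "$user" "$dst"
        done
    done
}

print_key_mesh username 192.168.0.1 192.168.0.2 192.168.0.3
```

Each printed command still prompts for the destination password once, so this reduces typing rather than fully automating the process.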

Setup NFS

We will use an NFS share to store files that need to be shared between the nodes, for example the john.pot file and any wordlists we wish to use.

NFS and RPCBind are already installed in Fedora Server, they just need some configuration.

Make a directory to use as the nfs share on the master node and change the ownership to the user account we are using:

sudo mkdir /var/johnshare
sudo chown -R username /var/johnshare

Add this directory to the list of “exports”. Note that using ‘*’ in place of the host (in this case ‘192.168.0.2’) is a security vulnerability as it would allow ANY host to mount the share. It should be noted that any user on 192.168.0.2 will be able to mount this share, in this case this is not a serious issue since I have sole control over the node. (Information about securing NFS can be found here).

sudo vim /etc/exports
/var/johnshare  192.168.0.2(rw,sync)

Start exporting this directory and start the service:

sudo exportfs -a
sudo systemctl start nfs

At this point you should be able to see the export on localhost:

showmount -e 127.0.0.1
Export list for 127.0.0.1:
/var/johnshare 192.168.0.2

But if you try it from the slave node you will get the error:

clnt_create: RPC: Port mapper failure - Unable to receive: errno 113 (No route to host)

This is because the ports are not open on the firewall of the master node. The following commands will reconfigure the firewall to allow the services we need (on the master node):

sudo firewall-cmd --permanent --add-service=nfs
sudo firewall-cmd --permanent --add-service=mountd
sudo firewall-cmd --permanent --add-service=rpc-bind
sudo firewall-cmd --reload

Note that the above commands will open several ports through your firewall to ANY host. While this is often not too much of a concern on an internal trusted network, ideally you should limit the rule to only the hosts required (see below).

Mount the NFS Share on the Nodes

Make a directory to mount the nfs share onto on the slave node and change the ownership to the user we are using:

sudo mkdir /var/johnshare
sudo chown -R username /var/johnshare

You could mount the nfs share manually using the following command:

sudo mount 192.168.0.1:/var/johnshare /var/johnshare

However this will not survive a reboot. Instead you could use /etc/fstab as described here, but apparently using autofs is more efficient.

Add the following configuration file to the slave node to use autofs:

cat /etc/auto.john
/var/johnshare -fstype=nfs 192.168.0.1:/var/johnshare

Edit file /etc/auto.master on the slave node and add the line:

/- /etc/auto.john

Start the automount service to mount the share and start it at boot:

sudo systemctl start autofs
sudo systemctl enable autofs

Mpiexec Firewall Rules

Ideally we would add specific ports for mpiexec to the firewall configuration, however a large number of ports are used for communication and are dynamically assigned. According to the documentation and mailing lists it should be possible to restrict the range of ports that will be used and then allow these through the firewall. Unfortunately during my setup I was unable to accomplish this. Unwilling to fall back to turning off the firewall permanently out of frustration I decided to go for the middle ground of allowing all ports but only to specific hosts. While this is not as granular as I would like, it is certainly more secure than no firewall at all.

sudo firewall-cmd --permanent --add-rich-rule 'rule family="ipv4" source address="192.168.0.2/32" accept'
sudo firewall-cmd --reload

Enter the same commands on the slave node, substituting the IP address.

[Update: it is possible to restrict the ports mpiexec uses by entering the following configuration into /etc/openmpi-x86_64/openmpi-mca-params.conf on all of the nodes:

oob_tcp_port_min_v4 = 10000
oob_tcp_port_range_v4 = 100
btl_tcp_port_min_v4 = 10000
btl_tcp_port_range_v4 = 100

This will make mpiexec use TCP ports 10000 to 10100, and you can therefore restrict the firewall configuration by both host and ports to minimise the attack surface. Note that as more nodes are added more ports may be required.]
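
With the port range pinned as described in the update above, the per-host rule could be narrowed to those ports. The following is a sketch only: mpi_rich_rule is a hypothetical helper, the 10000-10100 default simply mirrors the range quoted above, and it assumes SSH from the master is still allowed by another rule (mpiexec needs it to start the remote processes). The commands are printed for review rather than run.

```shell
# Build a firewalld rich rule that admits one peer on the MPI port range only.
# mpi_rich_rule is a hypothetical helper name used for illustration.
mpi_rich_rule() {
    # $1 = peer IPv4 address, $2 = TCP port range (defaults to 10000-10100)
    printf 'rule family="ipv4" source address="%s/32" port port="%s" protocol="tcp" accept\n' \
        "$1" "${2:-10000-10100}"
}

# Print the commands; run them with sudo once they look right.
for peer in 192.168.0.2; do
    printf "firewall-cmd --permanent --add-rich-rule='%s'\n" "$(mpi_rich_rule "$peer")"
done
echo "firewall-cmd --reload"
```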

You can test that your openmpi installation is working by issuing the following command on the master node:

mpiexec -n 1 -host 192.168.0.2 hostname

If all is well you should see the hostname of the slave node. If you see something like this:

ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
 one or more nodes. Please check your PATH and LD_LIBRARY_PATH
 settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
 Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
 Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
 (e.g., on Cray). Please check your configure cmd line and consider using
 one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
 lack of common network interfaces and/or no route found between
 them. Please check network connectivity (including firewalls
 and network routing requirements).

Then as the error indicates you have a problem that could have multiple causes, however the one that I encountered was the local firewall blocking the unsolicited connection from the remote node. Ensure you have added appropriate allow rules and either issued the reload command or restarted the service.

Install John on the Slave Node

Assign a static IP address, add the ssh key (ensure it is possible to authenticate with the key). Follow the steps above to install the dependencies, clone john from the repo, and build it (note that slave nodes also require openmpi). Ensure that it is in the same path as the other nodes. Configure and start autofs.

Running John Across the Cluster

Create a list of nodes and available cores on that node, on the master node:

cat nodes.txt
192.168.0.1 slots=8
192.168.0.2 slots=8
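
Rather than typing the core counts by hand, the hostfile lines can be generated. A small sketch, assuming nproc (from coreutils) is available on each node and that the slot count should equal the number of logical CPUs it reports. hostfile_line is a hypothetical helper name.

```shell
# Emit one Open MPI hostfile line for a node.
# hostfile_line is a hypothetical helper name used for illustration.
hostfile_line() {
    # $1 = node address, $2 = number of slots (processes) to run on it
    printf '%s slots=%s\n' "$1" "$2"
}

# Local (master) node: nproc reports the number of available processing units.
hostfile_line 192.168.0.1 "$(nproc)"
# Remote node, assuming key-based SSH already works:
#   hostfile_line 192.168.0.2 "$(ssh username@192.168.0.2 nproc)"
```

Redirecting the output to nodes.txt gives a file in the same format as the one above.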

To run John on the configured nodes from the master node:

mpiexec -display-map -tag-output -hostfile nodes.txt ./john /var/johnshare/hashes.pwdump --format=nt --pot=/var/johnshare/john.pot --session=/var/johnshare/my_session_name

The -display-map parameter outputs a table of the processes and the host each one is mapped to at the start of the job; -tag-output prefixes every line of output from the program with the job ID and the process rank. I find this information helpful, but both can be omitted if you prefer less verbose output.

Note, it is important that the session file is accessible by all nodes (i.e. it must be on the NFS share) otherwise it will not be possible to resume a crashed/cancelled session. If you do not store the session file on the share, you will see an error like the one below from each of the cores on each of your slave nodes, but the session will resume on the master node:

9@192.168.0.2: fopen: my_session_name.rec: No such file or directory
8@192.168.0.2: fopen: my_session_name.rec: No such file or directory

Another error that you may encounter is:

9@192.168.0.2: 8@192.168.0.2: [192.168.0.2:28647] *** Process received signal ***
[192.168.0.2:28647] Signal: Segmentation fault (11)
[192.168.0.2:28647] Signal code: Address not mapped (1)
[192.168.0.2:28647] Failing at address: 0x1825048b64c4
[192.168.0.2:28648] *** Process received signal ***
[192.168.0.2:28648] Signal: Segmentation fault (11)
[192.168.0.2:28648] Signal code: Address not mapped (1)
[192.168.0.2:28648] Failing at address: 0x1825048b64c4
[192.168.0.2:28647] [ 0] /lib64/libpthread.so.0(+0x109f0)[0x7f504ffb19f0]
[192.168.0.2:28647] [ 1] /lib64/libc.so.6(_IO_vfprintf+0xaef)[0x7f504fc2c38f]
[192.168.0.2:28647] [ 2] /lib64/libc.so.6(+0x4e441)[0x7f504fc2e441]
[192.168.0.2:28647] [ 3] /lib64/libc.so.6(_IO_vfprintf+0x1bd)[0x7f504fc2ba5d]
[192.168.0.2:28647] [ 4] /opt/john/JohnTheRipper/run/john[0x6354eb]
[192.168.0.2:28647] [ 5] /opt/john/JohnTheRipper/run/john[0x625b03]
[192.168.0.2:28647] [ 6] /opt/john/JohnTheRipper/run/john[0x6237cb]
[192.168.0.2:28647] [ 7] /opt/john/JohnTheRipper/run/john[0x624227]
[192.168.0.2:28647] [ 8] /opt/john/JohnTheRipper/run/john[0x62516c]
[192.168.0.2:28647] [ 9] /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f504fc00580]
[192.168.0.2:28647] [10] /opt/john/JohnTheRipper/run/john[0x4065a9]
[192.168.0.2:28647] *** End of error message ***
[192.168.0.2:28648] [ 0] /lib64/libpthread.so.0(+0x109f0)[0x7f386381b9f0]
[192.168.0.2:28648] [ 1] /lib64/libc.so.6(_IO_vfprintf+0xaef)[0x7f386349638f]
[192.168.0.2:28648] [ 2] /lib64/libc.so.6(+0x4e441)[0x7f3863498441]
[192.168.0.2:28648] [ 3] /lib64/libc.so.6(_IO_vfprintf+0x1bd)[0x7f3863495a5d]
[192.168.0.2:28648] [ 4] /opt/john/JohnTheRipper/run/john[0x6354eb]
[192.168.0.2:28648] [ 5] /opt/john/JohnTheRipper/run/john[0x625b03]
[192.168.0.2:28648] [ 6] /opt/john/JohnTheRipper/run/john[0x6237cb]
[192.168.0.2:28648] [ 7] /opt/john/JohnTheRipper/run/john[0x624227]
[192.168.0.2:28648] [ 8] /opt/john/JohnTheRipper/run/john[0x62516c]
[192.168.0.2:28648] [ 9] /lib64/libc.so.6(__libc_start_main+0xf0)[0x7f386346a580]
[192.168.0.2:28648] [10] /opt/john/JohnTheRipper/run/john[0x4065a9]
[192.168.0.2:28648] *** End of error message ***
Session aborted
3 0g 0:00:00:02 0.00% (ETA: Wed 09 Jul 2031 01:57:37 BST) 0g/s 0p/s 0c/s 0C/s
4 0g 0:00:00:02 0.02% (ETA: 00:00:21) 0g/s 305045p/s 305045c/s 859007KC/s bear902..zephyr902
[192.168.0.1][[28350,1],3][btl_tcp_endpoint.c:818:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.2 failed: Connection refused (111)
5 0g 0:00:00:04 0.37% (ETA: 21:56:45) 0g/s 2599Kp/s 2599Kc/s 7557MC/s 47886406..M1911a16406
[192.168.0.1][[28350,1],4][btl_tcp_endpoint.c:818:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.2 failed: Connection refused (111)
6 0g 0:00:00:06 0.78% (ETA: 21:51:27) 0g/s 3527Kp/s 3527Kc/s 10677MC/s skyler.&01..virgil.&01
[192.168.0.1][[28350,1],5][btl_tcp_endpoint.c:818:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.2 failed: Connection refused (111)
7 0g 0:00:00:08 1.12% (ETA: 21:50:34) 0g/s 3873Kp/s 3873Kc/s 11466MC/s Angie18%$..Cruise18%$
[192.168.0.1][[28350,1],6][btl_tcp_endpoint.c:818:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.0.2 failed: Connection refused (111)
--------------------------------------------------------------------------
mpiexec noticed that process rank 7 with PID 0 on node 192.168.0.2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

In my experience this extremely unhelpful error means that the slave node cannot access the NFS share. I first encountered this when a slave node rebooted and I learned that I had not enabled autofs to start on boot.
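
Since the segfault gives no hint about the real cause, a quick pre-flight check of the share on each node can save some head scratching. A minimal sketch, assuming the mount point and hash file names used in this post (share_ready is a hypothetical helper name):

```shell
# Succeed only if the share directory exists and the expected file is readable.
# share_ready is a hypothetical helper name used for illustration.
share_ready() {
    # $1 = mount point, $2 = a file that should be visible on the share
    [ -d "$1" ] && [ -r "$1/$2" ]
}

if share_ready /var/johnshare hashes.pwdump; then
    echo "share looks usable on $(uname -n)"
else
    echo "share NOT usable on $(uname -n)" >&2
fi
# The same test can be run against a slave node from the master, for example:
#   ssh username@192.168.0.2 'test -r /var/johnshare/hashes.pwdump' || echo "check autofs"
```

With autofs, simply touching the path should trigger the mount, so a failure here suggests that autofs is not running or that the export is unreachable.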

You can get each process to output its status by sending the USR1 signal using pkill (you should also run this command before cancelling a session to ensure that only the minimum work possible is lost):

pkill -USR1 mpiexec

You can force a pot sync to stop other nodes from working on hashes and salts that have already been cracked by sending the USR2 signal:

pkill -USR2 mpiexec

Adding New Nodes

Follow the same process as above: add the SSH public key of the master node to the new slave node, install John with openmpi support, configure autofs to mount the NFS share, add the new node to the /etc/exports file on the master node and to the list of nodes on the master node, restrict the ports in use if required, and add the firewall rules on the master and slave nodes to allow full access between them.
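
As a memory aid, the per-node lines that need adding on the master can be printed in one go. This is only a sketch: new_node_lines is a hypothetical helper, and the 192.168.0.3 address and core count of 4 are invented example values. It prints text to copy and changes nothing itself.

```shell
# Print the master-side lines needed for one new node (nothing is modified).
# new_node_lines is a hypothetical helper name used for illustration.
new_node_lines() {
    # $1 = new node IP address, $2 = number of cores on that node
    printf '/etc/exports:  /var/johnshare  %s(rw,sync)\n' "$1"
    printf 'nodes.txt:     %s slots=%s\n' "$1" "$2"
    printf 'rich rule:     rule family="ipv4" source address="%s/32" accept\n' "$1"
}

new_node_lines 192.168.0.3 4
```

After editing /etc/exports, sudo exportfs -a needs to be run again on the master for the new entry to take effect.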

More Troubleshooting

After adding some nodes to my cluster and running some jobs I started to find that long running tasks (such as running john with a large wordlist and some complex rules) hung with the following error:

mca_btl_tcp_frag_recv: readv failed: Connection timed out

The Google results I found talking about this error were mostly discussing when this error was encountered at the start of a job and were often the result of host based firewalling errors. These did not seem to be relevant to my situation since the task completed, but hung on this error instead of exiting.

A quirk of my cluster  is that it is distributed around a network rather than being on its own network segment as recommended for MPI clusters. This means that the connections between the nodes are passing through a firewall, and although the ports that are required are allowed, the firewall turned out to be the cause of my problem.

The idle connection timeout on the firewall rule was set to the default of 180 seconds; increasing this to 86400 seconds (1 day) resolved the issue.

My best guess for why this is the case is that the connection is established at the start of the job, but then remains idle until a hash is cracked. On long running jobs the idle timeout is therefore exceeded and the firewall terminates the connection. I would have expected the program to reestablish the connection when it needs to use it again, but this does not appear to be the case; instead it times out trying to use its existing (terminated) connection.

Obviously a better solution would be to move the nodes onto the same network to remove the firewall from the equation, but unfortunately that isn’t an option right now.


As always, if you have any questions, comments or suggestions please feel free to get in touch.

3 Comments

  1. Bill E. Ghote

    Would like to see performance numbers and a comparison to published benchmarks.

    • GrimHacker

      Hi Bill, thanks for your feedback.
      I’m still in the process of building nodes at the moment. The existing nodes are an old server and laptop which just happened to be to hand, so doubt they will be breaking any records!
      But I’ll add benchmark results in the near future. 🙂
