Tag Archives: Python

Loading Dirty JSON With Python

Recently I needed to parse some data embedded in HTML. At first glance it appeared to be JSON, so after pulling the text out of the HTML using BeautifulSoup, I tried to load it using the json module, however this immediately threw an error:

ValueError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

This is because,  despite first appearances, the data I was trying  to extract was a python object built from strings, lists, integers, floats, and dictionaries which had been passed to the ‘print’ statement. But it was quite close to JSON so I decided that the best course of action in this instance was to ‘fix’ the data so that I could load it as JSON.

First, as the error above indicates, double quotes are required, not the single quotes mostly (but not always prefixed with a ‘u’  (indicating unicode) which my data had.

After removing these I encountered the error:

ValueError: No JSON object could be decoded

This thoroughly unhelpful error sent me scurrying to Google. Apparently this error is thrown in a variety of situations, but the one relevant to my data was the case of the boolean key words (True and False) in python they are capitalised, but in JSON they need to be lowercase. (This error is also thrown when there are trailing commas in lists).

I used regular expression substitution to implement these alterations. I decided to share these few lines of code for my future self and anyone else who may find it useful. (Note that this worked for my use case, but as soon as exceptions stopped being thrown I moved on. Therefore it may not be a robust or complete solution. You have been warned.)

import re
import json

def load_dirty_json(dirty_json):
    regex_replace = [(r"([ \{,:\[])(u)?'([^']+)'", r'\1"\3"'), (r" False([, \}\]])", r' false\1'), (r" True([, \}\]])", r' true\1')]
    for r, s in regex_replace:
        dirty_json = re.sub(r, s, dirty_json)
    clean_json = json.loads(dirty_json)
    return clean_json

GP3Finder – Group Policy Preference Password Finder

Group Policy preferences were introduced by Microsoft in Windows 2008 allowing administrators to configure unmanaged settings (settings which the user can change) from a centrally managed location – Group Policy Objects (GPO) [1].

Among the preference items configurable through Group Policy preferences are several that can contain credentials: Local Groups and User Accounts, Drive Mappings, Schedule Tasks, Services, and Data Sources.

These credentials are stored within the preference item in SYSVOL in the GPO containing that preference item. In order to obscure the password from casual users it is encrypted in the XML source code of the preference item [2]. However anyone who gains access to SYSVOL can decrypt the passwords because Microsoft published the Advanced Encryption Standard (AES) encryption key [1]:

4e 99 06 e8  fc b6 6c c9  fa f4 93 10  62 0f fe e8
f4 96 e8 06  cc 05 79 90  20 9b 09 a4  33 b6 6c 1b

Microsoft addressed this issue in MS14-025 [4] however this update only prevented the creation of new Group Policy Preference items containing credentials; it did not remove any existing instances as this was considered too disruptive. Therefore network administrators must take action to find and remove these vulnerable items.

Several tools exist to exploit this vulnerability including:

Get-GPPPassword (PowerShell – http://obscuresecurity.blogspot.co.uk/2012/05/gpp-password-retrieval-with-powershell.html)

gpp (Metasploit Post Module – http://www.rapid7.com/db/modules/post/windows/gather/credentials/gpp)

gpprefdecrypt.py (Python – http://esec-pentest.sogeti.com/public/files/gpprefdecrypt.py)

gpp-decrypt-string.rb (Ruby – http://carnal0wnage.attackresearch.com/2012/10/group-policy-preferences-and-getting.html)

However each of these existing tools have a significant weakness. Get-GPPPassword must be run from a Windows machine, the gpp Metasploit post module requires a meterpreter session, gpprefdecrypt.py and gpp-decrypt-string.rb require you to manually extract the cpassword for decryption, and finally the version of gpprefdecrypt.py available for download no longer works at the time of writing (due to an update to PyCrypto that removed the default iv of 16 bytes of zeros).

I therefore wrote a new cross platform tool, dubbed GP3Finder (Group Policy Preference Password Finder), to automate the process of finding, extracting and decrypting passwords stored in Group Policy preference items. This tool is written in Python (2.7) and depends on PyCrypto and PyWin32 on Windows or subprocesses on *nix based operating systems.

GP3Finder has been released open source under the GPL2 license here a compiled executable for Windows is also available here.

Update v4.0

On a recent test I had compromised a single Windows host and had remote desktop access as a low privilege user. Since I couldn’t map the C$ share remotely, and didn’t want to search through the dozens of Group Policy Preference items using built in Windows utilities, I quickly added the functionality to gp3finder instead.

Note: Group Policy Preferences are cached locally under the (hidden) directory: “C:\ProgramData\Microsoft\Group Policy\History\” by default.

In this update I also add the option to specify the start path when searching a remote share. This allows you to quickly search for Group Policy Preference passwords when you have access to the C$ share without searching the entire drive.

Another significant change is that you can now specify multiple hosts to search – ideal if you have access to C$ on a number of hosts and want to check all of them. Note, this functionality is not threaded (yet) so can take some time to complete.

Finally I have changed some of the command line options to ensure they are as intuitive as possible (see below or –help).

Example Usage

Decrypt a given cpassword:

gp3finder.py -D CPASSWORD

The following commands output decrypted cpasswords (from Groups.xml etc) and list of xml files that contain the word ‘password’ (for manual review) to a file (‘gp3finder.out’ by default, this can be changed with -o FILE).

Find and decrypt cpasswords on domain controller automatically:

 Password: PASSWORD

Maps DOMAIN_CONTROLLER’s sysvol share with given credentials.

Find and decrypt cpasswords on the local machine automatically:

gp3finder.py -A -l

Searches through “C:\ProgramData\Microsoft\Group Policy\History” (by default) this can be changed with -lr PATH

Find and decrypt cpasswords on a remote host:

gp3finder.py -A -t HOST -u DOMAIN\USER -s C$ -rr "ProgramData\Microsoft\Group Policy\History"

Find and decrypt cpasswords on hosts specified in a file (one per line):

gp3finder.py -A -f HOST_FILE -u DOMAIN\USER -s C$ -rr "ProgramData\Microsoft\Group Policy\History"

Note: the user this script is run as must have permission to map/mount shares if running against a remote host.

Additional options are available:

gp3finder.py --help


[1] [Online]. Available: http://www.microsoft.com/en-us/download/details.aspx?id=24449).
[2] [Online]. Available: http://blogs.technet.com/b/grouppolicy/archive/2009/04/22/passwords-in-group-policy-preferences-updated.aspx.
[3] [Online]. Available: http://msdn.microsoft.com/en-us/library/2c15cbf0-f086-4c74-8b70-1f2fa45dd4be.aspx.
[4] [Online]. Available: http://blogs.technet.com/b/srd/archive/2014/05/13/ms14-025-an-update-for-group-policy-preferences.aspx.


As always, if you have any comments or suggestions please feel free to get in touch.

Raw HTTP Requests to Burp Proxy

On a recent Web application test I encountered a new challenge. The Web application presented a Web API intended to be used by a mobile application, in order for developers to utilise this API the documentation was also served from the Web application.

In order to assess each API function for vulnerabilities I first had to build valid requests from the documentation and then get them into my Web assessment tool of choice Burp Suite Pro.

It would have been possible to accomplish this by reading the documentation and patiently typing the raw HTTP request into Burp repeater. However with over thirty API functions to test and a tight schedule this was not a viable option. I therefore decided to script it.

The first step was to download all of the HTML documentation and parse each page to extract the HTTP method, path, example URL parameters, and, if present, the example body parameters. Using this information I built raw HTTP requests which I stored in text files. (As this first script is quite specific to the client’s application I will not be releasing it at this time).

With a directory full of raw HTTP requests it was time to import them into Burp and start testing proper. However I could not find any method of importing my raw HTTP requests into Burp other than manually copying and pasting them into repeater, an achievable task with the relatively small number of functions I had to test in this instance but a chilling prospect for future, larger tests.

After a coffee I had the idea to simply send the raw HTTP request through Burp by sending them from a Web client with a proxy configured. Since the requests had a variety of HTTP methods and body parameters a Web browser wasn’t an option. I briefly tried using telnet and netcat but these failed for some reason I haven’t identified. I also tried using curl, but this required further processing to issue the request using the curls’s command line options. I therefore turned back to Python and wrote a script to read files from a directory, then for each file: parse them into an object (using BaseHttpRequestHandler), build a request using urllib2 and send this via a proxy.

This resulted in the HTTP request being stored in Burp ready for assessment like any normal request to a Web application – visible in the site map, proxy history and easily sent to Intruder, Repeater, Scanner and Sequencer.

I’ve released this script under the GPLv2 licence in the hope that it will be useful to others, it is available here.

Example Usage

Parse one or more files and send via the default proxy (

raw2proxy.py -f FILENAME FILENAME...

Parse a directory of files and send via a proxy running on port 9001:

raw2proxy.py -d DIRECTORY -p

Additional options are available:

raw2proxy.py --help

As always, if you have any comments or suggestions please feel free to get in touch.

Python Script to Standalone Executable (with Icon)

When releasing tools, and proof of concepts, to the industry and more often to clients, I find I need to provide a standalone executable that can be run without installing Python and any required modules.

To accomplish this I use py2exe . While other options exist (for example pyinstaller) personally I have found py2exe quicker and easier to use once a few stumbling blocks were overcome. I therefore decided to write a short post describing how I setup and use py2exe for when laptop rebuild time comes around and in the hope it will be useful to others.

First, at the time of writing, py2exe does not support creating a single executable using 64 bit Python, throwing the error:

error: bundle-files 1 not yet supported on win64

So step 1 is to install 32 bit Python (being careful not to overwrite your existing 64 bit installation) and 32 bit versions of any non standard library modules that are required by your script.

Next you need to install py2exe itself. The project home page points to the SourceForge project page. Ensure you download the 32 bit version for the version of Python you have installed.

Now you are ready to create the script that will create your standalone executable. There are many options available, but I find the following minimal script very effective. This script will create a single executable (‘bundle_files’) for script.py.

from distutils.core import setup
import py2exe, sys


        options = {
                    'py2exe': {'bundle_files': 1,
                               'compressed': True
        console = [{
                    'script': "script.py"
        zipfile = None,

The one additional option I sometimes use is to add a custom icon to the executable. To do this I first create my icon image (256×256 pixels) in an image editor and export the required sizes (16×16, 32×32, 48×48, 256×256) in the png image format. I then use png2ico to create a .ico file, note the order in which you add the different size images is important it must be largest to smallest otherwise the icon may not be displayed at all! i.e:

png2ico favicon.ico icon_256.png icon_48.png icon_32.png icon_16.png

With the icon (favicon.ico) created the following script can be used to turn script.py into a standalone executable with an icon.

from distutils.core import setup
import py2exe, sys


        options = {
                    'py2exe': {'bundle_files': 1,
                               'compressed': True
        console = [{
                    'script': "script.py",
                    'icon_resources': [(0, 'favicon.ico')]
        zipfile = None,

Once the setup.py script above has been written, the standalone executable can be created simply by running it using your 32 bit Python installation (my 32 bit installation is at ‘C:\Python27_x86\python’):

C:\Python27_x86\python setup.py

By default the executable will be created in the “dist” directory.

As always, if you have any comments or suggestions please feel free to get in touch.


TLDR: Python script to automate the extraction of hashes from ntds.dit and system files. Available here : https://bitbucket.org/grimhacker/esedbxtract.

During an internal Penetration Test, once I’ve gained Domain Administrator access the fun doesn’t stop. In order to test the strength of user account passwords I need to retrieve the password hashes.

There are several ways to do this with either specialist tools or builtin Windows utilities as  @lanjelot discusses here and Inquis discusses here and here.

My preferred method is to use Volume Shadow Copy to extract a copy of the NTDS.dit and SYSTEM files, since this is an administrative task carried out with Windows utilities it does not normal cause alerts or require the Antivirus to be disabled – as is the case with some of the other options.

Once I have these files on my Kali box I use esedbexport from libesedb to export the data and link tables, these tables are used with the SYSTEM file by ntdshashes.py (available here by LaNMaSteR53 based on ntdsusers.py from ntdsxtract) to get the hashes.

This method can take a long time if the Active Directory is very large, but at this point during the assessment I’m not usually in a rush, it is less intensive on the domain controller, and it doesn’t panic the IT staff of the organisation with alerts.

There are a few reasons why I wrote a script to automate the extraction of the hashes from the NTDS.dit and SYSTEM files.

First and foremost, it’s a multi step process that doesn’t actually require any brain power – just for the output of one tool to be fed into another in the correct way. Writing a script to do this means that time isn’t wasted waiting for me to come back to it between each step. And of course it is another success for my continuing mission to replace myself with a small script…

Second ntdshashes.py didn’t work with the the latest version of ntdsxtract when I last rebuilt my machine (because a ntdsxtract added a new required working directory parameter). I patched the script (and raised an Issue on the repository) but decided it wouldn’t take much work to rewrite it as a class and include it in a larger tool.

Finally I hadn’t played with calling external programs as a subprocess in python before and this seemed like a reasonable excuse.

The result is esedbxtraxt.py available on BitBucket: https://bitbucket.org/grimhacker/esedbxtract.

Dependencies are stated in the README.

Normal usage will result in the password hashes being written to ‘hashes.pwdump’:

python esedbxtract.py -n /path/to/NTDS.dit -s /path/to/SYSTEM

Use ‘–help’ or see the README for further options.

As always, if you have any questions, comments or suggestions please feel free to get in touch.



I recently had need to interpret bitfields with Python.
I’m quite happy with the 3 lines of code that I came up with so I thought I’d share them in case they are of use to anyone else.

Bitfields are basically a binary number where each bit is assigned a meaning which can either have a value of True ‘1’ or False ‘0’.
Usually they are interpreted using bit shifting and bitwise AND operations but this seemed to be quite involved to get the data into a usable form so I found another way.

Consider the pwdProperties attribute from Active Directory (http://msdn.microsoft.com/en-us/library/ms679431(v=vs.85).aspx) which contains several settings for the account as a bitfield which can be retrieved using an LDAP query.

Each of the bits of this attribute mean the following:

So if the pwdProperties attribute has a value of 17 in decimal, which equals 010001 in binary, the 1st and 5th bits (from the right) are set to 1 indicating that the domain requires complex passwords and stores passwords in cleartext.

Using python-ldap this attribute is returned in a dictionary as a decimal number represented as a string within a list, i.e.

attrs = {'pwdProperties': ['17']}

So the first step is to extract the string of the number and convert it to an integer:

pwd_properties = int(attrs['pwdProperties'][0])

Next the decimal number is converted to a string representation of the binary number with left 0 padding to the correct length:

pwd_properties = format(pwd_properties, "06b")

Then the binary number string is split into a list:

pwd_properties = list(pwd_properties)

For my purposes I needed the bitfields to be represented as a boolean. To do this a the string replace() method is used to replace instances of ‘0’ with an empty string and then the bool() function is used to convert the result to either True or False while iterating over the list. (Note when dealing with strings an empty string is False and everything else is True).

bitfield_values = [bool(w.replace('0', '')) for w in pwd_properties]

Next a list containing the meaning of each bit is defined (make sure you have them in the correct order to match the bits) :

bitfield_keys = ['refuse_password_change', 'password_store_cleartext', 'lockout_admins', 'password_no_clear_change', 'password_no_anon_change', 'password_complex']

The two lists can then be formed into a list of tuples using zip() which is then used to create a dictionary using dict() :

pwd_properties = dict(zip(bitfield_keys, bitfield_values))

Finally this can all be condensed into :

bitfield_keys = ['refuse_password_change', 'password_store_cleartext', 'lockout_admins', 'password_no_clear_change', 'password_no_anon_change', 'password_complex']
bitfield_values = [bool(w.replace('0', '')) for w in list(format(int(attrs['pwdProperties'][0]), '06b'))]
pwd_properties = dict(zip(bitfield_keys, bitfield_values))

Resulting in a dictionary like this:

{'password_store_cleartext': True,
'password_no_anon_change': False,
'lockout_admins': False,
'refuse_password_change': False,
'password_no_clear_change': False,
'password_complex': True}

A limitation of this method is that it is not easy to go from the resulting dictionary back to the bitfield because a dictionary in Python is unordered. This can probably be overcome by using an ordered dictionary from the collections module. However for my current purpose there is no advantage to implementing this.

I have been mulling this over and come up with the following line to convert the dictionary back to a binary number :

int("{refuse_password_change}{password_store_cleartext}{lockout_admins}{password_no_clear_change}{password_no_anon_change}{password_complex}".format(**pwd_properties).replace('True', '1').replace('False', '0'), 2)

This is probably horribly inefficient due to the string replacement, but it works.
It takes advantage of unpacking and referencing keyword arguments to form a string with the values in the correct order, then replaces the strings ‘True’ and ‘False’ with ‘1’ and ‘0’ respectively before using the int() function to convert the string base 2 number (i.e. binary) to a decimal number.

It might be more efficient to avoid string replacement like this :

int("{refuse_password_change}{password_store_cleartext}{lockout_admins}{password_no_clear_change}{password_no_anon_change}{password_complex}".format(**dict(zip(pwd_properties.keys(), ['1' if pwd_properties[key] == True else '0' for key in pwd_properties.keys()]))), 2)

This recreates the dictionary with ‘1’ and ‘0’ by testing each key for True. Then takes advantage of unpacking and keyword arguments to get the bits in the correct order, before converting to a decimal number using int().

At some point I’ll time these two methods to find which is more efficient and update this post.

As always, if you have any comments or suggestions please feel free to get in touch.

Nettynum – A Windows Domain Enumeration Tool

TL;DR: I’ve written a Windows Domain Enumeration Tool, you can get it from here: https://bitbucket.org/grimhacker/nettynum.

Enumerating information from the Windows Domain is nothing new and has been quite extensively covered in books, blog posts and academic papers.

So what is the point off this post you ask? To release my first tool to the community!

There are a lot of very good tools to extract information from the Windows Domain, some of which have been especially created for use by Penetration Testers while others were created for administration.

However when on tests there are a lot of things that I need to be doing and taking the output of one tool, and feeding into another is not one of them. I wanted a tool that I could set running with no information and have it find everything it could about the Windows Domain, including the Domain Name, Domain Controllers, Domain Groups, Domain Admins, and Accounts Policies.  i.e.

“Replace myself with a small shell script.”

This lead me to create Nettynum, a Python script that automates enumeration of information from the Windows Domain for my final year project at Northumbria University. It currently uses the pywin32 to access the Windows API functions and is therefore a Windows only tool – but I have plans to address this.

For those interested all 190 pages of my dissertation including UML diagrams can be found here: ‘Oliver Morton Individual Project Nettynum‘.
It was accompanied by the first version of Nettynum (those au fait with git can checkout the first commit in the main branch of the BitBucket repository).

The automated domain enumeration option will:

  1. Discover the NetBIOS and DNS style domain names on the network.
  2. Discover the domain controllers for these domains (IP address, NetBIOS name, and DNS name).
  3. Authenticate to the domain controller (with a NULL session by default).
  4. Enumerate a List of Domain Groups (and their comment).
  5. Enumerate the members of groups that match a regular expression (.*admin.*) by default.
  6. Also attempt to enumerate the members of certain groups (“Domain Admins” and “Enterprise Admins” by default) regardless of whether they were in the list of groups or not. – This is to work around the 100 group limit of the Windows API.
  7. For each of the discovered users, enumerate the account information (including comment, whether it is disabled, SID,  Bad Password Count, Password Age, and more.)
  8. Enumerate the groups of which these users are a member.  (At the time of writing I am not aware of another tool that does this.)
  9. Enumerate the Accounts Policies (Lockout Threshold, Duration and Observation Window, Minimum/Maximum Password Age, Minimum Password Length, and Password History Length).
  10. Deauthenticate from the domain controller.
  11. Output this information as an XML file (so it can be easily parsed for use by other tools [I have ideas, watch this space]) with a style sheet applied (so that the file can be opened in a browser and read easily by a human).

So in short, if the domain controller has NULL sessions enabled this:

python nettynum.py -A

Results in this:

There is also an automated option for ‘local’ enumeration which discovers hosts and then proceeds to enumerate information on a ‘local’ level before generating a report.

Nettynum also offers targeting options to limit the scope to specified Domain Names, Domain Controllers, or Groups. It also allows the user to specify authentication credentials, or will use a previously established session with the host (and leave it intact afterwards).

Finally there is also a ‘manual’ mode which allows the user to specify a particular category of information to enumerate but requires prerequisite information to be provided.

Where to get it: clone or download from BitBucket https://bitbucket.org/grimhacker/nettynum

Dependencies and usage are in the README file.

I have no doubt that there are bugs hiding in the code partly due the nightmare that is Windows documentation and partly because of my mediocre programming skills, so if you find any please let me know!

As always, if you have any comments or suggestions please feel free to get in touch.