2019100701-cryptopals-1.6

Finished Code for Cryptopals Challenge 1.6: Cracking the Vigenere Cipher

Discuss this post

This exercise, Break repeating-key XOR, combines all of the tools acquired so far in completing the first set of challenges:

  1. Convert hex to base64
  2. Fixed XOR
  3. Single-byte XOR cipher
  4. Detect single-character XOR
  5. Implement repeating-key XOR
  6. Break repeating-key XOR
  7. AES in ECB mode
  8. Detect AES in ECB mode

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Sep 25 19:47:03 2019

@author: dwrob
"""

"""

Break repeating-key XOR
It is officially on, now.

This challenge isn't conceptually hard, but it involves actual error-prone coding. 
The other challenges in this set are there to bring you up to speed. This one is there to qualify you. If you can do this one, you're probably just fine up to Set 6.

There's a file here. It's been base64'd after being encrypted with repeating-key XOR.

Decrypt it.

Here's how:

    Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
    Write a function to compute the edit distance/Hamming distance between two strings. 
    The Hamming distance is just the number of differing bits. The distance between:

    this is a test

    and

    wokka wokka!!!

    is 37. Make sure your code agrees before you proceed.

    For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, 
    and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
    The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed 
    perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average 
    the distances.
    
    Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
    Now transpose the blocks: make a block that is the first byte of every block, and a block that 
    is the second byte of every block, and so on.
    
    Solve each block as if it was single-character XOR. You already have code to do this.
    For each block, the single-byte XOR key that produces the best looking histogram is the 
    repeating-key XOR key byte for that block. Put them together and you have the key.

This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR 
("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people 
"know how" to break it than can actually break it, and a similar technique breaks something much 
more important.

No, that's not a mistake.

We get more tech support questions for this challenge than any of the other ones. We promise, 
there aren't any blatant errors in this text. In particular: the "wokka wokka!!!" edit distance 
really is 37.
 
"""

import base64, string

debug = 0

ASCII = string.printable
print("\nASCII characters to test:",ASCII,"\n")

ct64 = """
HUIfTQsPAh9PE048GmllH0kcDk4TAQsHThsBFkU2AB4BSWQgVB0dQzNTTmVS
BgBHVBwNRU0HBAxTEjwMHghJGgkRTxRMIRpHKwAFHUdZEQQJAGQmB1MANxYG
DBoXQR0BUlQwXwAgEwoFR08SSAhFTmU+Fgk4RQYFCBpGB08fWXh+amI2DB0P
QQ1IBlUaGwAdQnQEHgFJGgkRAlJ6f0kASDoAGhNJGk9FSA8dDVMEOgFSGQEL
QRMGAEwxX1NiFQYHCQdUCxdBFBZJeTM1CxsBBQ9GB08dTnhOSCdSBAcMRVhI
CEEATyBUCHQLHRlJAgAOFlwAUjBpZR9JAgJUAAELB04CEFMBJhAVTQIHAh9P
G054MGk2UgoBCVQGBwlTTgIQUwg7EAYFSQ8PEE87ADpfRyscSWQzT1QCEFMa
TwUWEXQMBk0PAg4DQ1JMPU4ALwtJDQhOFw0VVB1PDhxFXigLTRkBEgcKVVN4
Tk9iBgELR1MdDAAAFwoFHww6Ql5NLgFBIg4cSTRWQWI1Bk9HKn47CE8BGwFT
QjcEBx4MThUcDgYHKxpUKhdJGQZZVCFFVwcDBVMHMUV4LAcKQR0JUlk3TwAm
HQdJEwATARNFTg5JFwQ5C15NHQYEGk94dzBDADsdHE4UVBUaDE5JTwgHRTkA
Umc6AUETCgYAN1xGYlUKDxJTEUgsAA0ABwcXOwlSGQELQQcbE0c9GioWGgwc
AgcHSAtPTgsAABY9C1VNCAINGxgXRHgwaWUfSQcJABkRRU8ZAUkDDTUWF01j
OgkRTxVJKlZJJwFJHQYADUgRSAsWSR8KIgBSAAxOABoLUlQwW1RiGxpOCEtU
YiROCk8gUwY1C1IJCAACEU8QRSxORTBSHQYGTlQJC1lOBAAXRTpCUh0FDxhU
ZXhzLFtHJ1JbTkoNVDEAQU4bARZFOwsXTRAPRlQYE042WwAuGxoaAk5UHAoA
ZCYdVBZ0ChQLSQMYVAcXQTwaUy1SBQsTAAAAAAAMCggHRSQJExRJGgkGAAdH
MBoqER1JJ0dDFQZFRhsBAlMMIEUHHUkPDxBPH0EzXwArBkkdCFUaDEVHAQAN
U29lSEBAWk44G09fDXhxTi0RAk4ITlQbCk0LTx4cCjBFeCsGHEETAB1EeFZV
IRlFTi4AGAEORU4CEFMXPBwfCBpOAAAdHUMxVVUxUmM9ElARGgZBAg4PAQQz
DB4EGhoIFwoKUDFbTCsWBg0OTwEbRSonSARTBDpFFwsPCwIATxNOPBpUKhMd
Th5PAUgGQQBPCxYRdG87TQoPD1QbE0s9GkFiFAUXR0cdGgkADwENUwg1DhdN
AQsTVBgXVHYaKkg7TgNHTB0DAAA9DgQACjpFX0BJPQAZHB1OeE5PYjYMAg5M
FQBFKjoHDAEAcxZSAwZOBREBC0k2HQxiKwYbR0MVBkVUHBZJBwp0DRMDDk5r
NhoGACFVVWUeBU4MRREYRVQcFgAdQnQRHU0OCxVUAgsAK05ZLhdJZChWERpF
QQALSRwTMRdeTRkcABcbG0M9Gk0jGQwdR1ARGgNFDRtJeSchEVIDBhpBHQlS
WTdPBzAXSQ9HTBsJA0UcQUl5bw0KB0oFAkETCgYANlVXKhcbC0sAGgdFUAIO
ChZJdAsdTR0HDBFDUk43GkcrAAUdRyonBwpOTkJEUyo8RR8USSkOEENSSDdX
RSAdDRdLAA0HEAAeHQYRBDYJC00MDxVUZSFQOV1IJwYdB0dXHRwNAA9PGgMK
OwtTTSoBDBFPHU54W04mUhoPHgAdHEQAZGU/OjV6RSQMBwcNGA5SaTtfADsX
GUJHWREYSQAnSARTBjsIGwNOTgkVHRYANFNLJ1IIThVIHQYKAGQmBwcKLAwR
DB0HDxNPAU94Q083UhoaBkcTDRcAAgYCFkU1RQUEBwFBfjwdAChPTikBSR0T
TwRIEVIXBgcURTULFk0OBxMYTwFUN0oAIQAQBwkHVGIzQQAGBR8EdCwRCEkH
ElQcF0w0U05lUggAAwANBxAAHgoGAwkxRRMfDE4DARYbTn8aKmUxCBsURVQf
DVlOGwEWRTIXFwwCHUEVHRcAMlVDKRsHSUdMHQMAAC0dCAkcdCIeGAxOazkA
BEk2HQAjHA1OAFIbBxNJAEhJBxctDBwKSRoOVBwbTj8aQS4dBwlHKjUECQAa
BxscEDMNUhkBC0ETBxdULFUAJQAGARFJGk9FVAYGGlMNMRcXTRoBDxNPeG43
TQA7HRxJFUVUCQhBFAoNUwctRQYFDE43PT9SUDdJUydcSWRtcwANFVAHAU5T
FjtFGgwbCkEYBhlFeFsABRcbAwZOVCYEWgdPYyARNRcGAQwKQRYWUlQwXwAg
ExoLFAAcARFUBwFOUwImCgcDDU5rIAcXUj0dU2IcBk4TUh0YFUkASEkcC3QI
GwMMQkE9SB8AMk9TNlIOCxNUHQZCAAoAHh1FXjYCDBsFABkOBkk7FgALVQRO
D0EaDwxOSU8dGgI8EVIBAAUEVA5SRjlUQTYbCk5teRsdRVQcDhkDADBFHwhJ
AQ8XClJBNl4AC1IdBghVEwARABoHCAdFXjwdGEkDCBMHBgAwW1YnUgAaRyon
B0VTGgoZUwE7EhxNCAAFVAMXTjwaTSdSEAESUlQNBFJOZU5LXHQMHE0EF0EA
Bh9FeRp5LQdFTkAZREgMU04CEFMcMQQAQ0lkay0ABwcqXwA1FwgFAk4dBkIA
CA4aB0l0PD1MSQ8PEE87ADtbTmIGDAILAB0cRSo3ABwBRTYKFhROHUETCgZU
MVQHYhoGGksABwdJAB0ASTpFNwQcTRoDBBgDUkksGioRHUkKCE5THEVCC08E
EgF0BBwJSQoOGkgGADpfADETDU5tBzcJEFMLTx0bAHQJCx8ADRJUDRdMN1RH
YgYGTi5jMURFeQEaSRAEOkURDAUCQRkKUmQ5XgBIKwYbQFIRSBVJGgwBGgtz
RRNNDwcVWE8BT3hJVCcCSQwGQx9IBE4KTwwdASEXF01jIgQATwZIPRpXKwYK
BkdEGwsRTxxDSToGMUlSCQZOFRwKUkQ5VEMnUh0BR0MBGgAAZDwGUwY7CBdN
HB5BFwMdUz0aQSwWSQoITlMcRUILTxoCEDUXF01jNw4BTwVBNlRBYhAIGhNM
EUgIRU5CRFMkOhwGBAQLTVQOHFkvUkUwF0lkbXkbHUVUBgAcFA0gRQYFCBpB
PU8FQSsaVycTAkJHYhsRSQAXABxUFzFFFggICkEDHR1OPxoqER1JDQhNEUgK
TkJPDAUAJhwQAg0XQRUBFgArU04lUh0GDlNUGwpOCU9jeTY1HFJARE4xGA4L
ACxSQTZSDxsJSw1ICFUdBgpTNjUcXk0OAUEDBxtUPRpCLQtFTgBPVB8NSRoK
SREKLUUVAklkERgOCwAsUkE2Ug8bCUsNSAhVHQYKUyI7RQUFABoEVA0dWXQa
Ry1SHgYOVBFIB08XQ0kUCnRvPgwQTgUbGBwAOVREYhAGAQBJEUgETgpPGR8E
LUUGBQgaQRIaHEshGk03AQANR1QdBAkAFwAcUwE9AFxNY2QxGA4LACxSQTZS
DxsJSw1ICFUdBgpTJjsIF00GAE1ULB1NPRpPLF5JAgJUVAUAAAYKCAFFXjUe
DBBOFRwOBgA+T04pC0kDElMdC0VXBgYdFkU2CgtNEAEUVBwTWXhTVG5SGg8e
AB0cRSo+AwgKRSANExlJCBQaBAsANU9TKxFJL0dMHRwRTAtPBRwQMAAATQcB
FlRlIkw5QwA2GggaR0YBBg5ZTgIcAAw3SVIaAQcVEU8QTyEaYy0fDE4ITlhI
Jk8DCkkcC3hFMQIEC0EbAVIqCFZBO1IdBgZUVA4QTgUWSR4QJwwRTWM=
"""

# https://www.base64decode.net/python-base64-b64decode

ct_bytes = bytearray(base64.b64decode(ct64))

print("Input ciphertext ( type =",type(ct_bytes),"):\n")

print(ct_bytes,"\n\n")

#>>> key = 37
#>>> message = bytearray(b"Hello World")
#>>> s = bytearray(x ^ key for x in message)


def xor_byte_arrays(array_1, array_2):
    """ XOR two bytearrays """
#    if len(array_1) != len(array_2):
#        raise ValueError("Undefined for sequences of unequal length")
    return bytes(b1 ^ b2 for b1, b2 in zip(array_1, array_2))


def multibyte_key_encrypt_decrypt(input_message, key):
    input_message_len = len(input_message)
    key_len = len(key)
    chunks = divmod(input_message_len,key_len)
    fullblocks = chunks[0]
    shortblock = chunks[1]
    
    if shortblock > 0:
       fullblocks = fullblocks + 1
    
    # In the loop, when you get to the last block, if there is a remainder, work will have to be done
  
    output = bytearray("","Latin-1")
   
    print("Key:",key)
    
    count = 0
    
    while (count < fullblocks): 
        
        print("\nCount:",count)
    
    # Bytearray slices:    
    #   block_0 = pt[:3]
   #   block_1 = pt[3:6]
    #   block_2 = pt[6:9]
    # Etc.
    # https://stackoverflow.com/questions/509211/understanding-slice-notation   
        
        input_block = input_message[key_len*count:key_len*(count+1)]   
        output_block = xor_byte_arrays(key,input_block)
        count = count + 1
        
    # build the output string
        output = output + output_block
    
        print("Input block:",input_block)
        print("XORed output:",output_block)
#        print("Cumulative output as hex bytes:\n",binascii.hexlify(output))    
            
    return output 

    
def calculate_score(xor_output_bytes):
    
    """ Score possible plaintexts created by single-character XOR """
    
    if debug == 1:        
        print("\n\n######################################\nScore plaintexts created by single-character XOR\n######################################")     

    xor_output_string = xor_output_bytes.decode("utf-8")

    local_score = 0   

# https://en.wikipedia.org/wiki/Letter_frequency
    char = "e"
    weight = 12.7
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")

    char = "t"
    weight = 9.06        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****") 
    char = "a"
    weight =  8.17       
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****") 
    
    char = "o"
    weight = 7.51        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "i"
    weight = 6.97        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "n"
    weight = 6.75        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "s"
    weight = 6.33        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "h"
    weight = 6.09        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "r"
    weight = 5.99        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "d"
    weight = 4.23        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "l"
    weight = 4.02        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "c"
    weight = 2.78        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "u"
    weight = 2.76        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "m"
    weight = 2.41        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "w"
    weight = 2.36     
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "f"
    weight = 2.23      
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "g"
    weight = 2.02        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "y"
    weight = 1.97        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "p"
    weight = 1.93    
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    

    char = "b"
    weight = 1.49       
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
  
    char = "v"
    weight = 0.978    
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "k"
    weight = 0.772    
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")      
    
#    local_score += xor_output_string.count("j") * 0.153
#    local_score += xor_output_string.count("x") * 0.150
#    local_score += xor_output_string.count("q") * 0.095
#    local_score += xor_output_string.count("z") * 0.074 

    char = "E"
    weight = 12.7
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")

    char = "T"
    weight = 9.06        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")   
    
    char = "A"
    weight =  8.17       
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "O"
    weight = 7.51        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "I"
    weight = 6.97        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "N"
    weight = 6.75        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
 
    char = "S"
    weight = 6.33        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "H"
    weight = 6.09        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "R"
    weight = 5.99        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "D"
    weight = 4.23        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "L"
    weight = 4.02        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "C"
    weight = 2.78        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "U"
    weight = 2.76        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "M"
    weight = 2.41        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "W"
    weight = 2.36     
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "F"
    weight = 2.23      
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "G"
    weight = 2.02        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "Y"
    weight = 1.97        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "P"
    weight = 1.93    
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
 
    char = "B"
    weight = 1.49       
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    

    char = "V"
    weight = 0.978    
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "K"
    weight = 0.772    
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")  

#    local_score += xor_output_string.count("J") * 0.153
#    local_score += xor_output_string.count("X") * 0.150
#    local_score += xor_output_string.count("Q") * 0.095
#    local_score += xor_output_string.count("Z") * 0.074 
    
# https://linguistics.stackexchange.com/questions/12324/special-characters-and-uppercase-frequency 

    char = ","
    weight = 2.38    
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")        

    char = "."
    weight = 2.13  
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "\""
    weight = 0.7       
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
                                          
                                          
#    local_score += xor_output_string.count("\)") * 0.169 
#    local_score += xor_output_string.count("(") * 0.169
#    local_score += xor_output_string.count("?") * 0.162
#    local_score += xor_output_string.count(":") * 0.136
    
# Out of my own head    

    char = " "
    weight = 30      
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "0"
    weight = 2     
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "1"
    weight = 2        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    

    char = "2"
    weight = 2      
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")       
    
    char = "3"
    weight = 2        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")       
    
    char = "4"
    weight = 2            
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")        
 
    char = "5"
    weight = 2            
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "6"
    weight = 2            
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "7"
    weight = 2            
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "8"
    weight = 2            
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
    
    char = "9"
    weight = 2        
    local_score += xor_output_string.count(char) * weight
    if debug == 1 and xor_output_string.count(char) > 0:
        print("**** Debug (",char,")",xor_output_string.count(char),"*",weight,"=",xor_output_string.count(char) * weight,"****")    
      
    if debug == 1: print("\n**** Debug Total raw score:",local_score,"****\n")

    return local_score    


def scan_for_keys():
    
    """ Find Hamming distances for possible keys. We are looking for a repeating similarity 
        between blocks of a certain length, starting at the beginning of the original ciphertext.
        Block length yielding most-similar blocks (lowest Hamming Distance) is the probable key length. 
    """ 
    print("\n\n####################################\nBegin searching for likely key size\n####################################\n\n")
                   
    min_key = 2
    max_key = 40
        
    tally = []

    for key_size in range(min_key,max_key):
        
        block_check = divmod(len(ct_bytes),key_size)        
        blocks = block_check[0]        
        if block_check[1] > 0: blocks -= 1                    
        if blocks % 2 > 0: blocks -= 1
        
        hd = 0
         
        print("\nFor",key_size,"bytes:", blocks,"blocks (",blocks/2,"pairs to XOR )")
           
        for i in range(0, blocks, 2):
            
            """ iterate through pairs of blocks """
            
            test_block_1 = ct_bytes[key_size*i:key_size*(i+1)]
            test_block_2 = ct_bytes[key_size*(i+1):key_size*(i+2)]

            xor_output_bytes = xor_byte_arrays(test_block_1,test_block_2)

            xor_output_bin = ''.join(format(i, '#010b') for i in xor_output_bytes)
            
            """ Hamming Distance hd is given by counting 
            the ones in the XOR output of two binary strings."""
                                         
            hd += xor_output_bin.count("1")     
            local_blockpair_hd = xor_output_bin.count("1")
           
            if debug == 1: print("Local blockpair HD:",local_blockpair_hd,"for",xor_output_bin)
            
          #  print("Block size",len(test_block_1),"bytes:",test_block_1)
       
        pairs = blocks/2
        average = hd/pairs
        normalized = average/key_size

        if debug == 1:
            print("Raw DH score:",hd)
            print("Blocks:",blocks)        
            print("Pairs:",pairs)
            print("Average of",blocks/2,"outputs:",average)
            print("Average HD normalized for key_size (",key_size,"):",normalized)
        
        new_result = { "key_size": key_size,"normalized_HD": normalized }
    
        print("Key size:",key_size, "HD:","Normalized HD:","{0:.1f}".format(normalized),"\n")
               
        tally.append(new_result)
        
    ranked = sorted(tally, key = lambda k: k["normalized_HD"],reverse=False)  
    detected_key_size = ranked[0].get("key_size")
    norm = ranked[0].get("normalized_HD")   

    print("\n\nHamming distance for all key lengths, ranked\n") 
    for i in ranked:
        print(i)
        
    print("\n************* Detected key size:",detected_key_size,"*************")
    print("\nNormalized HD:","{0:.1f}".format(norm),"\n")

    return detected_key_size    
   

def process_ct_blocks(ct_bytes, detected_key_size):
    
    """ Split ct into sequential key-sized blocks, then reshuffle the bytes into target blocks """
    
    print("\n\n############################\nBegin processing blocks\n############################\n\n")
    
    ct_len = len(ct_bytes)
    key_len = detected_key_size
    chunks = divmod(ct_len,key_len)
    fullblocks = chunks[0]
    shortblock = chunks[1]
    
    """ Account for a short block at the end"""
    
    if shortblock > 0:
        fullblocks = fullblocks + 1
    
    """ Initialize lists to contain n target blocks where n = key_len.
        Iterate by key_len blocks through ct_bytes and add each byte to 
        one of n collation lists named for its key byte index.
        
        For a four-byte key:
            
            Byte position:   0 ---> collation[0]
                             1 ---> collation[1] 
                             2 ---> collation[2]      
                             3 ---> collation[3]
                             
                             4 ---> collation[0]
                             5 ---> collation[1] 
                             6 ---> collation[2]      
                             7 ---> collation[3]    (iterate through ct_bytes...)
                   
        Each collation has length len(ct_bytes)/n 
        
        The approach below to dynamic variables creates a list of lists
        with indices matching key_len bytes.
    """
    
    key_position_collations = [[] for i in range(key_len)]
      
    count = 0
    
    while (count < fullblocks):
        
        """ Counting from zero....The slicing here works because the range of key_len is 0 to keylen-1, 
        so the value key_len itself is excluded from the slice."""
        
        input_block = bytes(ct_bytes[key_len*count:key_len*(count+1)])   

        if debug == 1:        
            print("\n\n######################################\nOuter loop, iterating through blocks\n######################################")       
            print("\nBlock count:",count)
            print("Raw input_block:",input_block)
        
        for i in range(key_len):
            key_position_collations[i].append(input_block[i:i+1])
             
            if debug == 1:            
                print("\n\n     #########################################################################\n     Inner loop, iterating through characters and inserting them in lists\n     #########################################################################")                   
                print("\n\n     Position",i,":",input_block[i:i+1])
                print("     Append byte to List",i)

        count = count + 1

    if debug == 1: 
        print("\n\n#####################################################\nCollations completed; display",key_len,"collations\n#####################################################\n")                   

        for i in range(key_len):            
                print("Collation for Key Byte", i,"\n\n",key_position_collations[i],"\n\n")
            
    return key_position_collations


def main():
  
    detected_key_size = scan_for_keys()  
    
#    detected_key_size = 29
    
    key_byte_candidates = [[] for i in range(detected_key_size)]
    
    ranked_candidates = [[] for i in range(detected_key_size)]
    
    key_position_collations = process_ct_blocks(ct_bytes, detected_key_size)
    
    for i in range(detected_key_size):   
        
        for chr in (ASCII):    
            key = bytearray(chr * len(key_position_collations[i]), "Latin-1")
                       
            """ We must convert the key and the list to byte objects """           
            
            list2ba = b''.join(x for x in key_position_collations[i])
           
            pt_candidate = xor_byte_arrays(key,list2ba)    
             
            if debug == 1:
                print("Key type:",type(key))
                print("Key character:", chr)
                print("Key =",chr,"*",len(key_position_collations[i]))
                print("Key:\n\n",key)
                print("\n\nPT candidate type:",type(pt_candidate))            
                print("PT candidate:\n\n",pt_candidate,"\n\n")
          
            score = calculate_score(bytes(pt_candidate))
            if debug == 1: print("Score:","{0:.0f}".format(score),"\n\n")
        
            """ Store dictionaries in the prepared list for this byte iteration """
 
            new_result = { "position": i, "character": chr, "score": score}
            key_byte_candidates[i].append(new_result)
            
    if debug == 1: print("\n\nUnranked lists:\n\n",key_byte_candidates)  
            
    """ Rank each of the lists by score """    

    derived_key = []

    for i in range(detected_key_size): 
        
        print("\n\n#####################################################\nRanked collation",i,"\n#####################################################\n")
              
        ranked_candidates[i] = sorted(key_byte_candidates[i], key = lambda k: k["score"],reverse=True)

        if debug == 1:
        
            for j in range(len(ASCII)):
                
                ascii_char = ranked_candidates[i][j].get("character")
                print("\nASCII Char:",ascii_char,"[",hex(ord(ascii_char)),"]")
                      
                score = ranked_candidates[i][j].get("score")
                print("Score:","{0:.0f}".format(score))
   
        winning_byte = ranked_candidates[i][0].get("character")
        derived_key.append(winning_byte)
        
    """ Key is a list; convert it to something usable """
        
    x = ''.join(x for x in derived_key) # yields a string
    y = x.encode('utf-8') # yields a byte object
    z = y.hex() # yields a hex bytes string
    w = bytearray(x, "utf-8")        
    
    print("\n\nRepresentations of the Derived Key:\n")
    print("String x from applying join to the list:",type(x),x,"\n")
    print("Bytes object y from encoding string x to UTF-8:",type(y),y,"\n")
    print("String z from applying the hex attribute to byte object y:",type(z),z,"\n")
    print("And here is a bytearray w from hex string z:",type(w),w,"\n")
       
#    decryption = multibyte_key_encrypt_decrypt(ct_bytes, bytes("Terminator X: Bring the noise","Latin-1"))    
    
    decryption = multibyte_key_encrypt_decrypt(ct_bytes, y)
    
    print("\n\nDecryption in raw bytes:\n\n",decryption)
    
    print("\n\nFormatted decryption:\n\n",decryption.decode())
    
main()    

Home