By Jesse Buonanno –
During penetration tests, and even more so in covert government operations, password hashes are often recovered. On foreign systems these hashes may be produced from plaintexts in a character set other than English. Traditional password lists, masks, and brute force techniques have not been openly developed to handle plaintexts in other languages. My proposal, at minimum, is to develop a proof-of-concept tool for Russian, Korean, and Chinese/Japanese that can brute force these hashes.
Before coding begins, it’s important to understand the languages we are going to be working with. Each of them has unique traits and characteristics that may need to be considered. There may be good reasons why it’s difficult to find tools already built for this purpose.
Korean was the most difficult of the three to work out. I found https://zkorean.com/hangul/structure to be very useful in understanding why password crackers haven’t implemented it. Each Hangul (the name of the Korean character set) symbol can represent one of three structures in a word: a vowel, a consonant, or a whole syllable. This structure creates two features worth knowing: words are represented in only a couple of symbols, each of which takes up only one character space, and there is a vast number of possible symbol combinations.
An additional hurdle, also found when working with Chinese/Japanese, is that the language and character set have changed throughout history. Different locations will use different symbols for the same representation. This adds to the number of possible password combinations. After combining http://unicode-table.com/en/blocks/hangul-syllables/, http://unicode-table.com/en/alphabets/hangul/, and ASCII punctuation with numbers, there is a total of 11,272 possible symbols per character location. Wow. This is nearly 120 times the 94 printable symbols in ASCII, which makes straight brute force practically infeasible.
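To put those two figures side by side, here is a minimal Python sketch of the keyspace arithmetic. It only uses the counts quoted above (11,272 and 94); the function names are my own and are not part of any existing tool.

```python
import math

HANGUL_SET = 11_272  # symbols per position, the figure quoted above
ASCII_SET = 94       # printable ASCII symbols


def keyspace(charset_size: int, length: int) -> int:
    """Number of candidate strings of exactly `length` symbols."""
    return charset_size ** length


def equivalent_ascii_length(charset_size: int, length: int) -> float:
    """ASCII password length whose keyspace matches `length` symbols
    drawn from a character set of `charset_size` symbols."""
    return length * math.log(charset_size) / math.log(ASCII_SET)


if __name__ == "__main__":
    print(f"symbols per position, Hangul vs ASCII: {HANGUL_SET / ASCII_SET:.1f}x")
    for n in range(1, 5):
        print(n, keyspace(HANGUL_SET, n),
              f"~{equivalent_ascii_length(HANGUL_SET, n):.2f} ASCII characters")
```

By this measure a three-symbol password drawn from the full 11,272-symbol set has a keyspace that falls between a six- and a seven-character ASCII password, so the cost grows far faster per position than it does for ASCII.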
Although the set is large, a few factors can shrink this number. First off, likely common/weak passwords aren’t going to be very long. Three Hangul symbols roughly equate to eight ASCII letters in size, due to the syllable property. Because of this, users may have shorter passwords in Hangul, on average, than their English counterparts. So, although the character set is much larger, fewer character positions need to be iterated over. The second consideration is that when certain Hangul symbols are typed consecutively, they create a new symbol. In doing so the character length is reduced from two, three or four to one. For example, when ㄱ is typed before ㅕ they form a new symbol, 겨. This means that certain combinations cannot appear in a password, as they would instead be combined. Further examples can be tried on this Korean web keyboard, https://www.branah.com/korean.
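The combining behavior can be sketched in Python using the standard Unicode arithmetic for precomposed Hangul syllables (base U+AC00, 19 leading consonants, 21 vowels, 28 tail slots). The helper names below are hypothetical, and the pruning check assumes the password was typed through an input method that always combines a consonant followed by a vowel, which may not hold for every system.

```python
# Compatibility jamo in the order used by the Unicode syllable formula.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
# Index 0 means "no tail consonant".
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

SYLLABLE_BASE = 0xAC00  # first code point of the Hangul Syllables block


def compose(lead: str, vowel: str, tail: str = "") -> str:
    """Combine a leading consonant, a vowel, and an optional tail
    consonant into the single precomposed syllable."""
    l = LEADS.index(lead)
    v = VOWELS.index(vowel)
    t = TAILS.index(tail)
    return chr(SYLLABLE_BASE + (l * len(VOWELS) + v) * len(TAILS) + t)


def has_uncombined_pair(candidate: str) -> bool:
    """True if a leading consonant is directly followed by a vowel.
    Under the assumption above, such a candidate could not have been
    typed and can be skipped during brute force."""
    return any(a in LEADS and b in VOWELS
               for a, b in zip(candidate, candidate[1:]))


if __name__ == "__main__":
    print(compose("ㄱ", "ㅕ"))          # the example from the text
    print(has_uncombined_pair("ㄱㅕ"))  # candidate that could be pruned
```

A brute force loop could call has_uncombined_pair on each candidate and skip the ones that return True, trading a cheap string check for a hash computation.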
Because of the factors listed above, brute force would truly be a last resort when attempting to crack hashed Hangul passwords. A more likely solution would be known password wordlists, which I was unable to obtain without paying some bitcoin, or mangled dictionary-based attacks. Each has its own pros and cons, but both are still much better than attempting to iterate over all 11k+ Hangul symbols, some of which may be outdated or too new to have been included in the Unicode character set.
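As a rough illustration of a mangled dictionary attack, the sketch below hashes each word plus a few simple variants and compares the result against a set of target hashes. The mangling rules and sample words are placeholders I made up for the example, not rules taken from a real wordlist or cracking tool.

```python
import hashlib


def hash_text(plaintext: str, algorithm: str = "md5") -> str:
    """Hash the UTF-8 encoding of a candidate password."""
    return hashlib.new(algorithm, plaintext.encode("utf-8")).hexdigest()


def mangle(word: str):
    """Yield the word and a few variants (illustrative rules only)."""
    yield word
    for suffix in ("1", "123", "!"):
        yield word + suffix


def dictionary_attack(words, target_hashes, algorithm: str = "md5") -> dict:
    """Return {hash: plaintext} for every target hash that was matched."""
    targets = {h.lower() for h in target_hashes}
    cracked = {}
    for word in words:
        for candidate in mangle(word.strip()):
            digest = hash_text(candidate, algorithm)
            if digest in targets:
                cracked[digest] = candidate
    return cracked


if __name__ == "__main__":
    # Sample words chosen only for the demonstration.
    target = hash_text("비밀번호123", "sha256")
    print(dictionary_attack(["사랑", "비밀번호"], [target], "sha256"))
```

The work here scales with the size of the wordlist times the number of mangling rules, rather than with the size of the character set.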
Chinese/Japanese share many of the same properties as the Korean character set. There are overlapping symbols from other character sets that have different meanings, symbols can be combined to save on character spaces, words are expressed in fewer characters, and there are quite a lot of symbols to consider. However, the CJK character set (http://unicode-table.com/en/blocks/cjk-unified-ideographs/) contains the most frequently used and current symbols for Chinese/Japanese. The downside is that, unlike the Hangul set, it is not as encompassing. After adding ASCII punctuation and numbers, the total number of symbols to be iterated over is 1078. From my research, this character set seems to be used much more frequently than any other for representing typed Chinese/Japanese text.
As with Hangul, the search can be optimized because certain characters can be combined to reduce space. I recommend running this character set before Hangul, even when attempting to crack hashed Korean passwords. It will be far quicker and almost as likely to crack low-hanging fruit, as there is some overlap in the character sets.
Russian was by far the most straightforward. There are no special tricks, and the number of characters is very close to that of the ASCII character set. With the characters taken from http://unicode-table.com/en/alphabets/russian/, plus ASCII punctuation and numbers, the total character set contains 103 symbols. The ASCII character set has only 94.
Tools and Conclusion
As a starting point for solving these issues I present Lang_Crack. https://github.com/Bojak4616/lang_crack is a Python tool that can brute force MD5, SHA-1, or SHA-256 hashes with the character sets previously listed. Additionally, it can take a wordlist of plaintext, hash it, and compare it against known hashes to check for a match. Python natively supports Unicode strings and UTF-8 encoding. More information about Lang_Crack can be found on its GitHub page and via python lang_crack.py -h.
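For readers who want a feel for the brute force approach without reading the repository, here is a minimal sketch of the general technique using only the Python standard library. It is not Lang_Crack's actual code; the function name, parameters, and the small lowercase Cyrillic character set are assumptions made for the example.

```python
import hashlib
from itertools import product

# Lowercase Russian letters: U+0430..U+044F plus U+0451 (illustrative subset).
RUSSIAN_LOWER = "".join(chr(c) for c in range(0x0430, 0x0450)) + "\u0451"


def brute_force(target_hash: str, charset: str, max_length: int,
                algorithm: str = "sha256"):
    """Try every string of up to `max_length` symbols from `charset`.
    Returns the matching plaintext, or None if nothing matches."""
    target = target_hash.lower()
    for length in range(1, max_length + 1):
        for symbols in product(charset, repeat=length):
            candidate = "".join(symbols)
            # Hash functions work on bytes, so encode as UTF-8 first.
            digest = hashlib.new(algorithm, candidate.encode("utf-8")).hexdigest()
            if digest == target:
                return candidate
    return None


if __name__ == "__main__":
    target = hashlib.sha1("да".encode("utf-8")).hexdigest()
    print(brute_force(target, RUSSIAN_LOWER, 2, "sha1"))
```

One detail worth keeping in mind is the encoding step: the same visible password produces a different hash if the target system encoded it as something other than UTF-8, so the encoding is effectively part of the guess.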
More research can be done on Korean/Chinese/Japanese to isolate the symbols accepted by most applications. It’s up to the developer to allow or deny certain symbols in a password for their application. An analysis of applications could yield a more defined character set, leading to improved password cracking. Because of its scope, Unicode also seems able to increase password security without affecting the user’s ability to remember their password. For instance, emojis could be added to a password, increasing entropy while adding enough “flavor” for the user to easily remember a seemingly complicated password. Larger character sets may even make passwords fun enough to remove “password” as the most used password.