最近在看这个书,http://book.douban.com/subject…,在 311 页讲到了一个针对正则引擎的 DoS,叫做 ReDoS,wiki 看这里,http://en.wikipedia.org/wiki/R…,原因是正则引擎在匹配的时候是用状态机,然而通过构造一些特殊的输入,可以让这个状态机需要尝试的路径暴涨,于是就耗时也跟着上升,照着书上敲了代码如下,不得不说 python 在这种时候真是不二选择啊
#!/usr/bin/env python # # retime.py - Python test program for regular expression DoS attacks # # # paramemers # # regex: the expression to be tested # maketeststring: a function which generates a test string from a # length patameter # maxiter: maximum number of test iterations to be performed # maxtime: maximum execution time in seconds for one iteration # before the test program is terminated # regex = r"^(a+)+$" maketeststring = lambda n : "a" * n + "!" maxiter = 50 maxtime = 2 # # modules used by this program # import re import time import sys # # main function # def main(): print print "Python Regular Expression DoS demo" print print "Platform: %s %s" % (sys.platform, sys.version) print "Regular expression: %s" % (regex) print "Typical test string: %s" % (maketeststring(10)) print "Max iterations: %s" % (maxiter) print "Max match time: %s sec%s" % (maxtime, "s" if maxtime!=1 else "") print cregex = re.compile(regex) for i in xrange(1, maxiter): time = runtest(cregex, i) if time > maxtime: break return # # run one test # def runtest(regex, n): teststr = maketeststring(n) starttime = time.clock() match = regex.match(teststr) elapsetime = int((time.clock()-starttime) * 1000) count = 0 if match != None: count = match.end() - match.start() print "For n=%d, match time=%d msec%s, match count=%s" % (n, elapsetime, "" if elapsetime==1 or elapsetime==0 else "s", count) return float(elapsetime)/1000 if __name__ == "__main__": main()
输出
Python Regular Expression DoS demo Platform: win32 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] Regular expression: ^(a+)+$ Typical test string: aaaaaaaaaa! Max iterations: 50 Max match time: 2 secs For n=1, match time=0 msec, match count=0 For n=2, match time=0 msec, match count=0 For n=3, match time=0 msec, match count=0 For n=4, match time=0 msec, match count=0 For n=5, match time=0 msec, match count=0 For n=6, match time=0 msec, match count=0 For n=7, match time=0 msec, match count=0 For n=8, match time=0 msec, match count=0 For n=9, match time=0 msec, match count=0 For n=10, match time=0 msec, match count=0 For n=11, match time=0 msec, match count=0 For n=12, match time=0 msec, match count=0 For n=13, match time=1 msec, match count=0 For n=14, match time=2 msecs, match count=0 For n=15, match time=4 msecs, match count=0 For n=16, match time=8 msecs, match count=0 For n=17, match time=16 msecs, match count=0 For n=18, match time=33 msecs, match count=0 For n=19, match time=65 msecs, match count=0 For n=20, match time=131 msecs, match count=0 For n=21, match time=263 msecs, match count=0 For n=22, match time=529 msecs, match count=0 For n=23, match time=1063 msecs, match count=0 For n=24, match time=2120 msecs, match count=0
===================================================
2012-10-14 15:53:55 update 注意到 46 行使用了 xrange,查找资料看到这里,http://ciniao.me/article.php?i…
range
函数说明:range([start,] stop[, step]),根据start与stop指定的范围以及step设定的步长,生成一个序列。
range示例:
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(1,5)
[1, 2, 3, 4]
>>> range(0,6,2)
[0, 2, 4]xrange
函数说明:用法与range完全相同,所不同的是生成的不是一个数组,而是一个生成器。
xrange示例:
>>> xrange(5)
xrange(5)
>>> list(xrange(5))
[0, 1, 2, 3, 4]
>>> xrange(1,5)
xrange(1, 5)
>>> list(xrange(1,5))
[1, 2, 3, 4]
>>> xrange(0,6,2)
xrange(0, 6, 2)
>>> list(xrange(0,6,2))
[0, 2, 4]由上面的示例可以知道:要生成很大的数字序列的时候,用xrange会比range性能优很多,因为不需要一上来就开辟一块很大的内存空间,这两个基本上都是在循环的时候用: