最近在看这个书,http://book.douban.com/subject…,在 311 页讲到了一个针对正则引擎的 DoS,叫做 ReDoS,wiki 看这里,http://en.wikipedia.org/wiki/R…,原因是正则引擎在匹配的时候是用状态机,然而通过构造一些特殊的输入,可以让这个状态机需要尝试的路径暴涨,于是就耗时也跟着上升,照着书上敲了代码如下,不得不说 python 在这种时候真是不二选择啊
#!/usr/bin/env python
#
# retime.py - Python test program for regular expression DoS attacks
#
#
# paramemers
#
# regex: the expression to be tested
# maketeststring: a function which generates a test string from a
# length patameter
# maxiter: maximum number of test iterations to be performed
# maxtime: maximum execution time in seconds for one iteration
# before the test program is terminated
#
regex = r"^(a+)+$"
maketeststring = lambda n : "a" * n + "!"
maxiter = 50
maxtime = 2
#
# modules used by this program
#
import re
import time
import sys
#
# main function
#
def main():
print
print "Python Regular Expression DoS demo"
print
print "Platform: %s %s" % (sys.platform, sys.version)
print "Regular expression: %s" % (regex)
print "Typical test string: %s" % (maketeststring(10))
print "Max iterations: %s" % (maxiter)
print "Max match time: %s sec%s" % (maxtime, "s" if maxtime!=1 else "")
print
cregex = re.compile(regex)
for i in xrange(1, maxiter):
time = runtest(cregex, i)
if time > maxtime:
break
return
#
# run one test
#
def runtest(regex, n):
teststr = maketeststring(n)
starttime = time.clock()
match = regex.match(teststr)
elapsetime = int((time.clock()-starttime) * 1000)
count = 0
if match != None:
count = match.end() - match.start()
print "For n=%d, match time=%d msec%s, match count=%s" %
(n, elapsetime, "" if elapsetime==1 or elapsetime==0 else "s", count)
return float(elapsetime)/1000
if __name__ == "__main__":
main()
输出
Python Regular Expression DoS demo Platform: win32 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] Regular expression: ^(a+)+$ Typical test string: aaaaaaaaaa! Max iterations: 50 Max match time: 2 secs For n=1, match time=0 msec, match count=0 For n=2, match time=0 msec, match count=0 For n=3, match time=0 msec, match count=0 For n=4, match time=0 msec, match count=0 For n=5, match time=0 msec, match count=0 For n=6, match time=0 msec, match count=0 For n=7, match time=0 msec, match count=0 For n=8, match time=0 msec, match count=0 For n=9, match time=0 msec, match count=0 For n=10, match time=0 msec, match count=0 For n=11, match time=0 msec, match count=0 For n=12, match time=0 msec, match count=0 For n=13, match time=1 msec, match count=0 For n=14, match time=2 msecs, match count=0 For n=15, match time=4 msecs, match count=0 For n=16, match time=8 msecs, match count=0 For n=17, match time=16 msecs, match count=0 For n=18, match time=33 msecs, match count=0 For n=19, match time=65 msecs, match count=0 For n=20, match time=131 msecs, match count=0 For n=21, match time=263 msecs, match count=0 For n=22, match time=529 msecs, match count=0 For n=23, match time=1063 msecs, match count=0 For n=24, match time=2120 msecs, match count=0
===================================================
2012-10-14 15:53:55 update 注意到 46 行使用了 xrange,查找资料看到这里,http://ciniao.me/article.php?i…
range
函数说明:range([start,] stop[, step]),根据start与stop指定的范围以及step设定的步长,生成一个序列。
range示例:
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(1,5)
[1, 2, 3, 4]
>>> range(0,6,2)
[0, 2, 4]xrange
函数说明:用法与range完全相同,所不同的是生成的不是一个数组,而是一个生成器。
xrange示例:
>>> xrange(5)
xrange(5)
>>> list(xrange(5))
[0, 1, 2, 3, 4]
>>> xrange(1,5)
xrange(1, 5)
>>> list(xrange(1,5))
[1, 2, 3, 4]
>>> xrange(0,6,2)
xrange(0, 6, 2)
>>> list(xrange(0,6,2))
[0, 2, 4]由上面的示例可以知道:要生成很大的数字序列的时候,用xrange会比range性能优很多,因为不需要一上来就开辟一块很大的内存空间,这两个基本上都是在循环的时候用: