Freesteel Blog » I can crash Python’s re module

I can crash Python’s re module

Friday, May 18th, 2007 at 8:03 pm

Discovered while attempting to parse some processed UN documents. Here it is:

import re

s = "Add.1, 2020 and Add.1, 2021-2023, 2025, 2028 and 2029 and Add.1) R"
# Backslashes restored on the assumption that the blog software stripped them.
r = r"(?:\s|,|and|Add\S*?|Parts?|\([^\)]*\)|[IV\-\d]+)*$"
print("It's going to crash")
print(re.search(r, s))
print("It didn't crash")

This example has been painstakingly iterated down from an extremely long case. It could probably go further, but it’s good enough for a report.

I have been using this regular expression module for three years, more heavily than anyone I know. I have iteratively built gigantic matching expressions and applied them to megabytes of poor-quality data at a time. This is the first time I have ever, ever encountered a problem with it. It is absolutely the most reliable code I know (possibly because it runs Google), so I’m feeling a bit chuffed.

Since Python is a no-nonsense tool of, by, and for the people, with no demigod-like beings such as Donald Knuth kicking around in the background, I probably won’t be getting a reward cheque for this.

Oh well. I’ll just settle down, find out how to submit this bug, and then work out a workaround.
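For what it's worth, here is a rough sketch of one possible workaround. It rests on two assumptions that are not confirmed anywhere in this post: that the pattern originally carried backslashes (`\s`, `\S`, `\(`, `\d` and so on), and that the slowness comes from the `[IV\-\d]+` branch sitting inside the outer `*`, which lets every run of digits be split up in a great many ways before the final `$` fails. Dropping the inner `+` leaves the outer `*` to do the repeating and should accept the same text. The name `safer` is made up for illustration.

```python
import re

s = "Add.1, 2020 and Add.1, 2021-2023, 2025, 2028 and 2029 and Add.1) R"

# Assumption: this is the original pattern with its backslashes restored,
# except that `[IV\-\d]+` has been changed to `[IV\-\d]`.
# The outer `*` already repeats the group, so the inner `+` only added
# extra ways to carve up the same characters.
safer = r"(?:\s|,|and|Add\S*?|Parts?|\([^\)]*\)|[IV\-\d])*$"

m = re.search(safer, s)

# Nothing in the pattern can consume the trailing "R", so the only place
# the search can succeed is an empty match at the very end of the string.
print(m.span())
```

This is only a guess at the cause, not a diagnosis from the Python developers, and a different rewrite may suit the real data better.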

Meanwhile, scary emails from rich people are lurking in my in-box as a result of my rant yesterday. I’m too frightened to open them, and have forwarded them to Francis, my appointed agent for this month of May. He can deal with it. It’s the least he can do for encouraging this irresponsible attitude.

Update: Bug now reported with number 1721518
