Python Regex Help

Hi everyone, I’m pulling my hair out over this.

I have strings that are in this format:

line = "(optional text) this is required text +oneWordOptional @OneWordOptional"

The optional parts can appear in any order, except that the parentheses (if they exist) must come first. E.g., this is also valid:

line = "(optional text) this is required text @OneWordOptional +oneWordOptional"

I’ve got this regex:

optionRe = re.compile(r'(?:\(.+\))?(.+)\+?|@?')

however, it’s including the first + or @ prefaced text in the result, meaning it’s being greedy. Reading through my RE book, I found that doubling the ? (i.e. ??) makes the RE non-greedy. However, it appears Python’s interpretation doesn’t support this. How can I make this non-greedy?

Thanks for your help! I have 2.6 installed, so I guess an update is in order. Thanks for the other tips too. I haven’t done real regex in years, so I haven’t gotten all the kinks out.

Hm, Python 2.7 should understand ?? fine. But I’m not sure I understand what you mean by greedy… greedy is “match as much as possible, in a given string”. It’s not “match as many strings as possible”.
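
To make the greedy vs. non-greedy distinction concrete, here is a small sketch using Python's `re` module. The sample strings and variable names are made up for illustration; the point is only that `??` and `+?` are accepted and prefer the shortest match that still lets the rest of the pattern succeed.

```python
import re

# "?" is greedy: it takes one "a" if it can.
greedy_opt = re.match(r'(a?)(a*)', 'aaa').groups()

# "??" is lazy: it prefers zero repetitions and leaves the rest to a*.
lazy_opt = re.match(r'(a??)(a*)', 'aaa').groups()

# The same contrast for "+" and "+?".
greedy_plus = re.match(r'<(.+)>', '<a><b>').group(1)
lazy_plus = re.match(r'<(.+?)>', '<a><b>').group(1)

print(greedy_opt)   # ('a', 'aa')
print(lazy_opt)     # ('', 'aaa')
print(greedy_plus)  # a><b
print(lazy_plus)    # a
```

Either way the quantifier only changes how much of one string a single match consumes, not how many strings match.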

I’m wondering if you want to say
^(?:\(stuff\))? …

Here, the “(stuff)” is still optional and non-captured, but the regex should be looking for it specifically at the beginning of the string.
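
As a rough sketch of that idea applied to the whole line: the pattern below is not the original poster's regex, and `line_re`, `parse`, and the group names `opt`, `text`, and `tags` are hypothetical names chosen for this example. It assumes the optional tags are single `\w+` words prefixed by `+` or `@` and that they trail the required text.

```python
import re

# Hypothetical pattern: anchored at the start, optional "(...)" prefix,
# lazy required text, then zero or more trailing +word / @word tags.
line_re = re.compile(
    r'^(?:\((?P<opt>[^)]*)\)\s*)?'    # optional parenthesized prefix, only at the start
    r'(?P<text>.+?)'                  # required text, as short as possible
    r'(?P<tags>(?:\s+[+@]\w+)*)\s*$'  # trailing tags in any order
)

def parse(line):
    """Return (optional_prefix, required_text, tag_list), or None if no match."""
    m = line_re.match(line)
    if m is None:
        return None
    return m.group('opt'), m.group('text'), m.group('tags').split()

print(parse("(optional text) this is required text +oneWordOptional @OneWordOptional"))
# ('optional text', 'this is required text', ['+oneWordOptional', '@OneWordOptional'])
print(parse("this is required text @OneWordOptional +oneWordOptional"))
# (None, 'this is required text', ['@OneWordOptional', '+oneWordOptional'])
```

The lazy `.+?` is what keeps the `+`/`@` words out of the required text: it stops as soon as the remainder of the line can be read as tags.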

however, it’s including the first + or @ prefaced text in the result…

So, you have strings:

(optional text) this is required text +oneWordOptional @OneWordOptional

and

this is required text +oneWordOptional @OneWordOptional

but also

+oneWordOptional @OneWordOptional this is required text

which you don’t want to let match?

You may want to forbid those symbols explicitly when at the beginning then:

^[^+@]\w …

*edit: I may be confused by your post. Are you grabbing strings or atoms of a string?
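
A quick sketch of forbidding a leading `+` or `@`, assuming the goal is just to reject lines that start with one of those symbols. The name `starts_ok` and the sample lines are made up for the example.

```python
import re

# ^ anchors at the start of the string; [^+@] is a negated character class,
# so the first character may be anything except "+" or "@".
starts_ok = re.compile(r'^[^+@]')

accepted = bool(starts_ok.match("this is required text +oneWordOptional"))
with_parens = bool(starts_ok.match("(optional text) this is required text"))
rejected = bool(starts_ok.match("+oneWordOptional this is required text"))

print(accepted, with_parens, rejected)  # True True False
```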

Well, I don’t know that 2.6 doesn’t have ??. In general, the C-ish languages follow PCRE, and any exceptions to that are usually known in the community and listed in various places. Nothing wrong with upgrading, but I would be surprised if your version were missing ??.

*edit: Could you be more clear on what exactly you’re doing with these strings?