gregbair — 2010-08-10T08:27:56-04:00 — #1
Hi everyone, I'm pulling my hair out over this.
I have strings that are in this format:
line = "(optional text) this is required text +oneWordOptional @OneWordOptional"
The optional parts can come in any order, except that the parentheses (if they exist) must come first. E.g., this is also valid:
line = "(optional text) this is required text @OneWordOptional +oneWordOptional"
I've got this regex:
optionRe = re.compile(r'(?:\(.+\))?(.+)\+?|@?')
However, the result includes the first + or @ prefixed text, meaning it's being greedy. Reading through my RE book, I found that doubling the ? (i.e. ??) makes the RE non-greedy. However, Python's implementation doesn't appear to support this. How can I make this non-greedy?
gregbair — 2010-08-10T14:12:29-04:00 — #2
Thanks for your help! I have 2.6 installed, so I guess an update is in order. Thanks for the other tips too. I haven't done real regex in years, so I haven't gotten all the kinks out.
stomme_poes — 2010-08-10T14:04:32-04:00 — #3
Hm, Python 2.7 should understand ?? fine. But I'm not sure I understand what you mean by greedy... greedy is "match as much as possible, in a given string". It's not "match as many strings as possible".
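To make the distinction concrete, here is a minimal sketch of greedy versus non-greedy matching on the sample line from the first post. Both patterns are assumptions written for this demo (a simplified version of the original regex, plus a lookahead that is not in the original); it is written for Python 3 but uses nothing that should be missing from 2.x.

```python
import re

line = "(optional text) this is required text +oneWordOptional @OneWordOptional"

# Greedy: .+ runs to the end of the line, so the +word/@word tags
# end up inside the captured group (along with the leading space).
greedy = re.match(r'(?:\(.+\))?(.+)', line).group(1)

# Non-greedy: .+? grows one character at a time and stops as soon as the
# rest of the pattern can match, here at the whitespace in front of the
# first +word or @word (or at the end of the line if there are no tags).
lazy = re.match(r'(?:\(.+\))?\s*(.+?)(?=\s+[+@]\w+|$)', line).group(1)

print(repr(greedy))
print(repr(lazy))
```

Note that a lazy quantifier only helps when something after it tells the engine where to stop; without the lookahead, `(.+?)` at the end of a pattern would capture a single character.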
I'm wondering if you want to say
Here, the "(stuff)" is still optional and non-captured, but the regex should be looking for it specifically at the beginning of the string.
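The snippet that sentence refers to did not survive in this thread, so the following is only a sketch of a pattern that fits the description, not the poster's actual regex. The names `option_re` and `split_line` are hypothetical, and the rule that a tag is `+` or `@` followed by word characters is an assumption.

```python
import re

# Assumed pattern: an optional, non-captured "(...)" allowed only at the
# very start, then the required text captured lazily up to the first
# +word/@word tag or the end of the line.
option_re = re.compile(r'^(?:\([^)]*\)\s*)?(.+?)(?=\s+[+@]\w+|\s*$)')

def split_line(line):
    """Return (required_text, tags), or None if the line does not match."""
    m = option_re.match(line)
    if m is None:
        return None
    # Everything after the required text is scanned for +word / @word tags.
    tags = re.findall(r'[+@]\w+', line[m.end():])
    return m.group(1), tags

with_parens = split_line(
    "(optional text) this is required text +oneWordOptional @OneWordOptional")
without_parens = split_line(
    "this is required text @OneWordOptional +oneWordOptional")
print(with_parens)
print(without_parens)
```

Because the `^` anchor sits in front of the optional group, a parenthesised chunk later in the line is treated as ordinary text rather than as the optional prefix.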
"however, it's including the first + or @ prefaced text in the result..."
So, you have strings:
(optional text) this is required text +oneWordOptional @OneWordOptional
this is required text +oneWordOptional @OneWordOptional
+oneWordOptional @OneWordOptional this is required text
which you don't want to let match?
You may want to forbid those symbols explicitly when at the beginning then:
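One way this could be sketched, assuming the rule is "reject any line whose first non-blank character is + or @", is a negative lookahead at the start of the string. The name `starts_with_text` is hypothetical, and this is an illustration rather than the snippet originally posted here.

```python
import re

# (?!...) is a negative lookahead: it consumes nothing and succeeds only
# if the text ahead does NOT match. Here it rejects a leading + or @,
# with optional whitespace in front of it.
no_leading_tag = re.compile(r'^(?!\s*[+@])')

def starts_with_text(line):
    """True if the line does not begin with a +word or @word tag."""
    return no_leading_tag.match(line) is not None

samples = [
    "(optional text) this is required text +oneWordOptional @OneWordOptional",
    "this is required text +oneWordOptional @OneWordOptional",
    "+oneWordOptional @OneWordOptional this is required text",
]
results = [starts_with_text(s) for s in samples]
print(results)
```

The same lookahead can be placed at the front of a larger pattern so that matching and validation happen in one pass.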
*edit: I may be confused by your post. Are you grabbing whole strings or atoms of a string?
stomme_poes — 2010-08-10T14:17:15-04:00 — #4
Well, I don't know that 2.6 doesn't have ??... in general, the C-ish languages follow PCRE, and any exceptions to that are usually known in the community and listed in various places. Nothing wrong with upgrading, but I would be surprised if your version were missing ??.
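A quick way to check whether an installed version understands ?? is to try it on a tiny string. This is a minimal sketch; the sample text `"+tag"` is made up for the test. Python's `re` module documents `??` alongside `*?` and `+?`, so all three lines should compile and run.

```python
import re

text = "+tag"

# Greedy ?: takes the optional "+" when it is present.
greedy_opt = re.match(r'\+?', text).group()

# Lazy ??: prefers to match nothing at all.
lazy_opt = re.match(r'\+??', text).group()

# Lazy ?? followed by more pattern: the "+" is taken only because
# "tag" cannot match at position 0 otherwise.
lazy_forced = re.match(r'\+??tag', text).group()

print(repr(greedy_opt), repr(lazy_opt), repr(lazy_forced))
```

If the middle result comes back as an empty string, ?? is working; an unexpected match there usually points at the rest of the pattern rather than at missing support.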
*edit: Could you be clearer about what exactly you're doing with these strings?