Some regex help!

skyline · January 15, 2013, 9:12pm

I need to clean up some text.

I need to remove all instances where < occurs as first and only char on a line (but keep the line empty):

eg

the cat sat on the mat
>
the dog sat on the cat

I also need to remove instances where we have an unnecessary < followed by a space and then a <b> (as the first characters on a line)

eg

< The cat sat on the mat

And finally I need to make sure that each line that only contains …any text in here… has one empty line above it (except for the first occurrence in the whole string being processed)

Anyone able to help put this into a regex?

StarLion · January 16, 2013, 1:47pm

Simple - you have 3 conditions. You need to do 3 evaluations, not 1.
~^<$~ will match condition 1.
~^< <b~ will match condition 2.
~\R+.*?\R~ will match condition 3.

NOTE: Conditions 1 and 2 are matched on an array. Condition 3 is matched on a string.

skyline · January 16, 2013, 2:58pm

Thanks! I’m not very familiar with regex. How would I put this into a preg_replace to act on my string?

StarLion · January 16, 2013, 6:08pm

The first two should be put through an -array- based preg_replace (file() the text, or else explode on \n and trim each element). This is done so that the start and end operators can process correctly (otherwise you’d be looking for \n's, which would miss the first line of text…)
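A minimal sketch of the array-based step, for illustration only. The sample `$text` is made up, and condition 2 is assumed to mean a stray "< " sitting in front of a `<b` tag, as the pattern above suggests. One deliberate change from the pattern as posted: the `<b` is moved into a lookahead, since replacing the whole match with an empty string would otherwise delete the start of the tag as well.

```php
<?php
// Hypothetical sample input; $text stands in for the string being cleaned.
$text = "the cat sat on the mat\n<\nthe dog sat on the cat\n< <b>The cat sat on the mat</b>";

// Split into lines and trim each element, so ^ and $ anchor on real line edges.
// Note: trim() also strips any leading indentation and a trailing "\r".
$lines = array_map('trim', explode("\n", $text));

// Condition 1: a line consisting only of "<" becomes an empty line.
// Condition 2: a leading "< " directly before "<b" is dropped;
//              the lookahead (?=<b) checks for the tag without consuming it.
$lines = preg_replace(array('~^<$~', '~^< (?=<b)~'), '', $lines);

// Glue the lines back together for the string-based step.
$cleaned = implode("\n", $lines);
```

When the patterns and the subject are both arrays, preg_replace applies every pattern to every element in order, so both conditions are handled in one call.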
The last one can be preg_replaced directly onto the string, capturing the middle part (so you’ll actually need () around the .*? ) and putting it back in your replacement.
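A rough sketch of the string-based step. It assumes the lines in question are wrapped in `<b>…</b>`, which is a guess based on condition 2 and is not stated for condition 3 in the thread; swap in whatever markers your text really uses. It also differs from the pattern as posted in one respect: the trailing line break is checked with a lookahead instead of being consumed, so two such lines in a row are both caught.

```php
<?php
// Hypothetical sample input; the <b>...</b> wrapping is an assumption.
$text = "<b>First</b>\nsome text\n<b>Second</b>\nmore text\n\n\n<b>Third</b>";

// \R+         one or more line breaks in front of the line (any newline style)
// (<b>.*?</b>) the captured line, put back via $1 in the replacement
// (?=\R|\z)   the line must end here, but the break itself is not consumed
$pattern = '~\R+(<b>.*?</b>)(?=\R|\z)~';

// Collapse whatever breaks preceded the line into exactly two "\n",
// which leaves one empty line above it.
$result = preg_replace($pattern, "\n\n\$1", $text);
```

Because the pattern requires at least one line break before the captured line, a match on the very first line of the string is left untouched. If the first occurrence can appear further down, this sketch will still add an empty line above it, so that case would need separate handling.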

Give it a go, and come back with questions.