skyline — 2013-01-15T16:12:43-05:00 — #1
I need to clean up some text.
- I need to remove all instances where < occurs as first and only char on a line (but keep the line empty):
the cat sat on the mat
the dog sat on the cat
- I also need to remove instances where we have an unnecessary < followed by a space and then a <b> (first char on line)
< <b>The cat sat on the mat
And finally, I need to make sure that each line that only contains <b>...any text in here...</b> has one empty line above it (except for the first occurrence in the whole string being processed)
Anyone able to help put this into a regex?
starlion — 2013-01-16T08:47:28-05:00 — #2
Simple - you have 3 conditions. You need to do 3 evaluations, not 1.
~^<$~ will match condition 1.
~^< <b~ will match condition 2.
~\R+<b>.*?</b>\R~ will match condition 3.
NOTE: Conditions 1 and 2 are matched on an array of lines. Condition 3 is matched on the whole string.
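To see what each pattern picks up, here is a minimal sketch. The sample lines and variable names are made up for illustration; only the three patterns come from the post above.

```php
<?php
// Hypothetical sample lines, made up for illustration.
$lines = ['<', '< <b>The cat sat on the mat', 'the dog sat on the cat'];

// preg_grep() returns the matching elements and keeps their original keys.
$cond1 = preg_grep('~^<$~', $lines);    // lines that are only "<"
$cond2 = preg_grep('~^< <b~', $lines);  // lines starting with "< <b"

// Condition 3 looks across line breaks, so it runs on one string.
$text = "intro\n<b>Title</b>\nbody";
$cond3 = preg_match('~\R+<b>.*?</b>\R~', $text);  // 1 if found, 0 if not

print_r(array_keys($cond1));
print_r(array_keys($cond2));
var_dump($cond3);
```

Without the m modifier, ^ and $ only anchor at the start and end of the subject, which is why the first two patterns are checked one line at a time.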
skyline — 2013-01-16T09:58:53-05:00 — #3
Thanks! I'm not very familiar with regex. How would I put this into a preg_replace to act on my string?
starlion — 2013-01-16T13:08:27-05:00 — #4
The first two should be put through an -array- based preg_replace (file() the text, or else explode on \n and trim each element). This is done so that the start and end operators can process correctly (otherwise you'd be looking for \n's, which would miss the first line of text...)
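A rough sketch of the array-based step, assuming the text is already in a string (the sample $text and the variable names are hypothetical):

```php
<?php
// Hypothetical input, made up for illustration.
$text = "<\n< <b>The cat sat on the mat\nthe dog sat on the cat";

// Split on \n and trim each element so ^ and $ see one clean line each.
// Note: trim() also strips leading spaces, which may or may not be wanted.
$lines = array_map('trim', explode("\n", $text));

// Condition 1: a line that is only "<" becomes an empty line.
// Condition 2: drop the stray "< " but keep the "<b" that follows it.
$lines = preg_replace(['~^<$~', '~^< <b~'], ['', '<b'], $lines);

// Glue the lines back together for the string-based step.
$cleaned = implode("\n", $lines);
```

When the subject is an array, preg_replace() applies every pattern to every element and returns an array. If you read the text with file() instead, passing FILE_IGNORE_NEW_LINES drops the trailing line break from each element.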
the last one can be preg_replaced directly onto the string, capturing the middle part (so you'll actually need () around the .*? ) and replanting it in your replace.
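Something along these lines, as a sketch only: the sample $text is made up, and writing the line breaks back as \n is my assumption (it normalises any \r\n the input had).

```php
<?php
// Hypothetical input, made up for illustration.
$text = "intro\n<b>One</b>\nbody\n\n\n<b>Two</b>\nend";

// \R+ swallows however many line breaks precede the <b> line;
// (.*?) captures the text between the tags so it can be put back as $1.
$pattern = '~\R+<b>(.*?)</b>\R~';

// Exactly two line breaks before the <b> line leaves one empty line above it.
$replacement = "\n\n<b>" . '$1' . "</b>\n";

$result = preg_replace($pattern, $replacement, $text);
```

Two caveats. A <b> line at the very start of the string has no line break before it, so it is left alone, but a first occurrence further down would still get the empty line. And because the trailing \R is consumed by the match, a second <b> line immediately following the first would not be matched.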
Give it a go, and come back with questions.