Regex to replace li tags with asterisk

I’m working with TinyMCE and want to let the editor toggle off HTML mode. What I’m struggling with is converting list items into asterisks:

<ul>
<li>Bullet 1</li>
<li>Bullet 2</li>
<li>Bullet 3</li>
</ul>

Should become

* Bullet 1
* Bullet 2
* Bullet 3

I’ve used a similar regex to convert paragraphs to "\n\n$1\n\n" and that is working, but I can’t seem to get the regex to work for list items. Here’s my code:


// replace p tags with line breaks
strippedValue = strippedValue.replace(/<p>([^<\/p>]*)<\/p>/ig, "\n\n$1\n\n");

alert(strippedValue);

// replace list items with asterisks
strippedValue = strippedValue.replace(/<li>([^<\/li>]*)<\/li>/ig, "* $1\n");

alert(strippedValue);


At both alerts, the content remains the same:

<ul><li>Bullet 1</li><li>Bullet 2</li><li>Bullet 3
</li></ul>

<li>([^<\/li>]*)<\/li>

You are looking for a string that begins with <li>, finishes with </li>, and has any characters other than <, /, l, i and > in between. Since the text "Bullet" contains the letter l, the match fails and no substitutions are made.
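A quick way to see this is the minimal sketch below. The helper name convertWithCharClass and the sample strings are made up for illustration only.

```javascript
// The original pattern: [^<\/li>] is a character class, so it excludes the
// individual characters <, /, l, i and >, not the sequence "</li>".
const charClassPattern = /<li>([^<\/li>]*)<\/li>/ig;

// Hypothetical helper, only for this demonstration.
function convertWithCharClass(html) {
  return html.replace(charClassPattern, "* $1\n");
}

// "Bullet 1" contains the letter l, so nothing matches and the input is unchanged.
console.log(convertWithCharClass("<li>Bullet 1</li>"));

// "Hey 1" contains none of the excluded characters, so this item is replaced.
console.log(convertWithCharClass("<li>Hey 1</li>"));
```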

Try

<li>(.*?)<\/li>
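The ? after .* matters here. As a rough sketch on a single-line version of the list (the variable names are invented for this example), the greedy and lazy forms behave differently:

```javascript
const listHtml = "<ul><li>Bullet 1</li><li>Bullet 2</li><li>Bullet 3</li></ul>";

// Greedy: .* runs to the LAST </li>, so all three items are captured as one.
const greedyResult = listHtml.replace(/<li>(.*)<\/li>/ig, "* $1\n");

// Lazy: .*? stops at the FIRST </li> after each <li>.
const lazyResult = listHtml.replace(/<li>(.*?)<\/li>/ig, "* $1\n");

console.log(greedyResult);
console.log(lazyResult);
```

With the greedy form only one asterisk is produced and the inner tags survive. The lazy form yields one asterisk per item.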

Ah yes, I see the problem: square brackets match against any one of the characters inside them. The greedy .* was dumping all the list items onto one line, but I’ve got it working with this:


strippedValue = strippedValue.replace(/<li[^>]*>([^<]*)<\/li>/ig, "* $1\n");

But it’s asking for trouble if someone uses < within a list item. Is there a way to get the regex to match as I originally wanted?

That is, assign to $1 all characters after <li> and before the next occurrence of </li>. I thought maybe

[^(?:<\/li>)]

would do it, or maybe

(^<\/li>)

but the ^ doesn’t appear to work within parentheses…
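For what it’s worth, inside square brackets the characters (, ?, : and ) are ordinary literals, so a bracketed group is still just a set of single characters, and inside parentheses ^ is a start-of-input anchor rather than a negation. One way to express "anything up to the next </li>" is a negative lookahead. This is only a sketch with a made-up sample string, not necessarily the best fit for TinyMCE output:

```javascript
// (?!<\/li>) asserts that the closing tag does not start at this position;
// [\s\S] then consumes one character, including newlines and "<".
const liPattern = /<li[^>]*>((?:(?!<\/li>)[\s\S])*)<\/li>/ig;

// Hypothetical input: one item contains a bare "<", the other a nested tag
// and a newline before the closing tag.
const sample = "<ul><li>a < b</li><li>Bullet <b>2</b>\n</li></ul>";

const converted = sample.replace(liPattern, "* $1\n");
console.log(converted);
```

Both items are converted, and the bare < and the nested <b> tag are carried into $1 untouched.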

I did a bit more reading and found that (.*?) is not greedy. The problem was caused by the markup having a newline character before the last closing </li> tag. The . operator doesn’t match line breaks, so I have switched to the common workaround, [\s\S], and it works. Here’s the final code:


strippedValue = strippedValue.replace(/<p[^>]*>([\s\S]*?)<\/p>/ig, "$1\n");
strippedValue = strippedValue.replace(/<li[^>]*>([\s\S]*?)<\/li>/ig, "* $1\n");

Thanks for helping, Philip!
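For reference, the two final replacements can be wrapped in a small function and run against the markup from the question. The function name htmlToPlainText is hypothetical. It is only a convenient wrapper for trying the patterns out.

```javascript
// Hypothetical wrapper around the two final replacements.
function htmlToPlainText(html) {
  let strippedValue = html;
  // [\s\S] matches any character including newlines; *? keeps each match
  // as short as possible, so it ends at the first closing tag.
  strippedValue = strippedValue.replace(/<p[^>]*>([\s\S]*?)<\/p>/ig, "$1\n");
  strippedValue = strippedValue.replace(/<li[^>]*>([\s\S]*?)<\/li>/ig, "* $1\n");
  return strippedValue;
}

// The markup from the question, with the newline before the last </li>.
const questionMarkup = "<ul><li>Bullet 1</li><li>Bullet 2</li><li>Bullet 3\n</li></ul>";
console.log(htmlToPlainText(questionMarkup));
```

Note that the <ul> wrapper is left in place and the newline before the last </li> ends up inside $1, so the last item is followed by an extra blank line.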

In the same way that you put i (case insensitive) and g (global) at the end, you can also add s (treat the input as a single line), so that . matches newlines too and matches can span lines.
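A minimal sketch of the s-flag variant, assuming a JavaScript engine with ES2018 dotAll support (older engines reject the flag, which is why the [\s\S] workaround is common). The variable names are made up for the example.

```javascript
const markup = "<ul><li>Bullet 1</li><li>Bullet 3\n</li></ul>";

// Without s, . does not match the newline, so the second item is skipped.
const withoutS = markup.replace(/<li>(.*?)<\/li>/ig, "* $1\n");

// With s (dotAll), . matches newlines too, so both items are replaced.
const withS = markup.replace(/<li>(.*?)<\/li>/igs, "* $1\n");

console.log(withoutS);
console.log(withS);
```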