jQuery Word Counter

Code being used:

<script type="text/javascript">
$('td.c_post').each(function () {
var words = $(this).html().replace(/<blockquote>(.*?)<\/blockquote>/g, '').replace(/<div class="spoiler_toggle">(.*?)<\/div>/g, '').replace(/<div class="spoiler">(.*?)<\/div>/g, '').replace(/<br \/>/g, '').split(' ').length;
$(this).append('<br /><br /><span class="word_count"><big><strong>' + words + '</strong> Words</big></span>');
});
</script>

Here: http://s1.zetaboards.com/Cory/topic/4616513/

See how it only counts each line when there are multiple lines, as post #6 in the example shows? How do I prevent this and make it count all of the words? I’m using replace() so that it won’t count words inside specific HTML elements and to remove the line breaks, and split() to break the text apart on spaces so that every word is counted, but it doesn’t seem to be working correctly.

I see that you’re replacing <br /> but from what I see on the page, there are <br> tags too.
You can deal with that issue by instead using:

.replace(/<br[ \/]*>/g, ' ')

If you now look at the resulting text after it’s split by a space, you’ll find that you have:

["
Alpha", "Bravo
Charlie
Delta
Echo", "Foxtrot", "Gamma
"]

The reason for this is that the newline characters aren’t being split up. You can fix that by splitting using /\s/ instead, which is a regular expression for a white-space separator.

.split(/\s/)

When you do that you’ll then have:

["", "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Gamma", ""]

So you need to trim things off first, before splitting it.

Here’s a way of doing it, where you get the text first, and then rework it for the words.


var text = $(this).html().replace(/<blockquote>(.*?)<\/blockquote>/g, '').replace(/<div class="spoiler_toggle">(.*?)<\/div>/g, '').replace(/<div class="spoiler">(.*?)<\/div>/g, '').replace(/<br[ \/]*>/g, ' '),
    words = $.trim(text).split(/\s/).length;

That will give you a result of 7 words, which is correct for the example.
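To see the three stages side by side, here is a small sketch that runs without jQuery. The sample string is an assumption, reconstructed from the arrays shown above, and the native trim() stands in for $.trim() purely so the snippet is self-contained.

```javascript
// Sample text shaped like the example post: newlines separate some of the words.
var sample = '\nAlpha Bravo\nCharlie\nDelta\nEcho Foxtrot Gamma\n';

// Splitting on a space character leaves the newlines inside the pieces.
var bySpace = sample.split(' ');

// Splitting on any white-space also breaks on newlines,
// but the leading and trailing newline produce empty entries.
var byWhitespace = sample.split(/\s/);

// Trimming first removes those empty entries at the edges.
var trimmed = sample.trim().split(/\s/);

console.log(bySpace.length);      // 4
console.log(byWhitespace.length); // 9
console.log(trimmed.length);      // 7
```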

But what happens when there are multiple spaces between words? Your word count will then be off again.
You can deal with that by using the + symbol, so that the split treats a run of one or more white-space characters as a single separator:

.split(/\s+/)

Which leaves us with:


var text = $(this).html().replace(/<blockquote>(.*?)<\/blockquote>/g, '').replace(/<div class="spoiler_toggle">(.*?)<\/div>/g, '').replace(/<div class="spoiler">(.*?)<\/div>/g, '').replace(/<br[ \/]*>/g, ' '),
    words = $.trim(text).split(/\s+/).length;
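One edge case is worth keeping in mind with this approach: splitting an empty string still returns an array with one (empty) entry, so a post with no text at all would be reported as 1 word. A minimal sketch of a guard follows; countWords is a hypothetical helper name, not something from the code above, and the native trim() is used in place of $.trim() so that it runs on its own.

```javascript
// Hypothetical helper: counts white-space separated words in plain text.
function countWords(text) {
  var trimmed = text.trim();
  // ''.split(/\s+/) gives [''], which has length 1, so handle empty text first.
  if (trimmed === '') {
    return 0;
  }
  return trimmed.split(/\s+/).length;
}

console.log('Alpha  Bravo'.split(/\s/).length);  // 3, the double space adds an empty entry
console.log('Alpha  Bravo'.split(/\s+/).length); // 2
console.log(countWords('  Alpha  Bravo\n'));     // 2
console.log(countWords('   '));                  // 0
```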

That sounds understandable, but I made the corresponding changes and it still gives me the same result.

Shall we get more detailed about things then?

What you want is a way to get the number of words from the following, right?


<img src="http://z3.ifrm.com/63/1/0/e661949/e661949.png" alt=":cblush:">
<img src="http://z3.ifrm.com/63/1/0/e661950/e661950.png" alt=":cdrat:">
<img src="http://z3.ifrm.com/63/1/0/e661951/e661951.png" alt=":facepalm:">
<img src="http://z3.ifrm.com/63/1/0/e661952/e661952.png" alt=":cglare:">
<img src="http://z3.ifrm.com/63/1/0/e661953/e661953.png" alt=":cmeh:">
<img src="http://z3.ifrm.com/63/1/0/e661954/e661954.png" alt=":cP:">

Number of words in the above: 0

and from:


<strong>Test</strong><br><br>
<em>Test</em><br><br>
<u>Test</u><br><br>
<del>Test</del><br><br>
<big>Test</big><br><br>
<small>Test</small><br><br>
<blockquote><dl><dt>Code: </dt><dd>&nbsp;</dd></dl><code style="width: 1079px; display: block; ">Test</code></blockquote><br><br>
<blockquote><dl><dt>Quote:</dt><dd>&nbsp;</dd></dl><div>Test</div></blockquote><br><br>
<a href="http://test.com/" target="_blank" rel="nofollow">Test</a><br><br>
<div class="spoiler_toggle">Spoiler: click to toggle</div>
<div class="spoiler" style="display:none;">Test</div>
<ul><li style="display:none"><br></li><li>Test<br></li><li>Test<br></li></ul>

Number of words in the above: 11

Is that right?

If I understand correctly, try this code:

  
<html>
<head>
<meta charset="utf-8">
<script type="text/javascript" src="http://code.jquery.com/jquery-latest.pack.js"></script> 
<script type="text/javascript">
// http://www.eburhan.com/jquery-dunyasina-adim-atiyoruz/
// http://www.w3schools.com/jsref/jsref_match.asp

$(document).ready(function(){

var text = $('td.c_post').html();
var c = text.match(/(<[^<]+>)(\w+\:*\s*)+(<[^<]+>)/g).join(' ').replace(/(<[^<]+>)/g,'').match(/\w+\:*\s*/g).length;
// var c = text.match(/(<[^<]+>)(\w+\:*\s*)+(<[^<]+>)/g).join(' ').replace(/(<[^<]+>)/g,'').match(/ /g).length;
// alert('word numbers = '+c); // 18
$('td.c_post').append('<br /><br /><span class="word_count"><big><strong>' + c + '</strong> Words</big></span>');

});
</script>
</head>
<body>
<table><tr>
<td class="c_post">
<strong>Test</strong><br><br>
<em>Test</em><br><br>
<u>Test</u><br><br>
<del>Test</del><br><br>
<big>Test</big><br><br>
<small>Test</small><br><br>
<blockquote><dl><dt>Code: </dt><dd>&nbsp;</dd></dl><code style="width: 1079px; display: block; ">Test</code></blockquote><br><br>
<blockquote><dl><dt>Quote:</dt><dd>&nbsp;</dd></dl><div>Test</div></blockquote><br><br>
<a href="http://test.com/" target="_blank" rel="nofollow">Test</a><br><br>
<div class="spoiler_toggle">Spoiler: click to toggle</div>
<div class="spoiler" style="display:none;">Test</div>
<ul><li style="display:none"><br></li><li>Test<br></li><li>Test<br></li></ul>
</td>
</tr></table>
</body>
</html>


The above code is working in Firefox 4.0b9 and Konqueror 4.5.5

  
$(document).ready(function(){

var text = $('td.c_post').html();

alert(text);

var re = /(<[^<]+>)(\w+\:*\s*)+(<[^<]+>)/g;

var t = text.match(re);
alert('t =   '+t);
alert('t.length =   '+t.length);
var tt = t.join(' ').replace(/(<[^<]+>)/g,'');
alert('tt =   '+tt);
var c = tt.match(/\w+\:*\s*/g).length;
alert(c);

});

paul_wilkins: Yes, I basically need to count the number of words in each post and display that number at the bottom of the post. I’m getting 18 words in the two posts with the emoticons, and after I edited the third post to put more text inside the blockquote it now gives me 25 words instead of 11. I don’t want words to be counted when they are inside the HTML that is replaced in the string. Yesterday post #6 was reported as having only 1 word even though there were 10. It appears to be counting the correct number of words on each new line now, so the only issue at the moment is that it still seems to count words in the replaced HTML.

I get this error in Firebug, muazzez: http://prntscr.com/6s97q

Because there are so many changes that you are making to the posts on that site, I want you to give here some examples of the HTML code for posts that you want to count, and to also help clarify the situation with them by showing how many words you expect to find in those examples.

Here’s the post I am concerned about:

<td class="c_post">
						<strong>Test</strong><br><br><em>Test</em><br><br><u>Test</u><br><br><del>Test</del><br><br><big>Test</big><br><br><small>Test</small><br><br><blockquote><dl><dt>Code: </dt><dd>&nbsp;</dd></dl><code style="width: 686px; display: block;">Test</code></blockquote><br><br><blockquote><dl><dt>Quote:</dt><dd>&nbsp;</dd></dl><div>Test Test Test Test Test Test Test Test Test Test Test</div></blockquote><br><br><a rel="nofollow" target="_blank" href="http://test.com/">Test</a><br><br><div class="spoiler_toggle">Spoiler: click to toggle</div><div style="display:none;" class="spoiler">Test</div><ul><li style="display:none"><br></li><li>Test<br></li><li>Test<br></li></ul>
						
						
						<div class="editby">Edited by <strong><a href="http://s1.zetaboards.com/Cory/profile/62973/">Cory</a></strong>, 59 minutes ago.</div>
					<br><br><span class="word_count"><big><strong>25</strong> Words</big></span><span><br><div style="display: none;" class="likebg" id="like4616513.671140"></div></span></td>

It should only count 9 words, but it is counting 25. I don’t want it to count what’s in between the blockquotes and DIVs. I can strip the editby DIV with the replace method I originally used, but I just need it to work correctly. When I added more text inside the blockquote, it added more words to the total word count. Every other post seems to be fine. I don’t really need it to count images like it’s doing in the first two posts, but I don’t mind that as much as it counting the replaced HTML.

Righto - after working through that, the following seems to do the job nicely.


var $html,
    html,
    text,
    words;
$html = $('.c_post');
$('blockquote', $html).remove();
$('div', $html).remove();
html = $html.html().replace(/<br[ \/]*>/gm, ' ');
text = $(html).text();
words = $.trim(text).split(/\s+/).length;

The only part in there that’s difficult to understand is the .replace(/<br[ \/]*>/gm, ' ') piece.

The /<br[ \/]*>/ part matches <br>, <br /> or even <br/>.
The gm part sets the global and multiline flags. The g flag makes replace() act on every match instead of just the first one. The m flag only changes how ^ and $ behave, so it does no harm here but isn’t strictly needed.
The reason why you replace the break with a space is that you don’t want “text<br>text” to end up being “texttext”, which is what happens if the break is just removed.
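As a quick check of that pattern outside of jQuery, here is a small sketch. The helper name breaksToSpaces is made up for the illustration; the regular expression is the one from the code above, with only the g flag.

```javascript
// Hypothetical helper: turns every <br>, <br /> or <br/> into a single space.
function breaksToSpaces(html) {
  return html.replace(/<br[ \/]*>/g, ' ');
}

console.log(breaksToSpaces('text<br>text'));   // "text text"
console.log(breaksToSpaces('text<br />text')); // "text text"
console.log(breaksToSpaces('text<br/>text'));  // "text text"

// Removing the break instead of replacing it glues the words together.
console.log('text<br>text'.replace(/<br[ \/]*>/g, '')); // "texttext"

// Breaks on different lines of the string are all replaced with the g flag alone.
console.log(breaksToSpaces('a<br>\nb<br>c').split(/\s+/).length); // 3
```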

If you find that other pieces of your HTML code aren’t behaving as you expect with the word count, it should be possible to update the script to work with those as well.

Is it supposed to look like this?

<script type="text/javascript">
$('td.c_post').each(function () {
var $html,
    html,
    text,
    words;
$html = $(this);
$('blockquote', $html).remove();
$('div', $html).remove();
html = $html.html().replace(/<br[ \/]*>/gm, ' ');
text = $(html).text();
words = $.trim(text).split(/\s+/).length;
$(this).append('<br /><br /><span class="word_count"><big><strong>' + words + '</strong> Words</big></span>');
});
</script>

If so, the count is only correct in post #3. In the three posts below that it states that there is 1 word, although there are 10 words, and post #7 has 7 words. I haven’t made any edits since the last ones I mentioned. The other thing that appears to be happening is that blockquotes and DIVs are actually being removed from the posts. I don’t want them removed; I just want the text within them left out of the total word count. Sorry for making this so confusing, I suppose I should have explained myself more clearly.

That’s because it has been tested only on the one example that was provided earlier.

Which other examples are different enough from that previous example to require further development?

Other examples:

<td class="c_post">
						One Two Three Four Five Six Seven Eight Nine Ten
						
						
						
					<br><br><span class="word_count"><big><strong>1</strong> Words</big></span></td>

Shows 1 word, should be 10 words.

<td class="c_post">
						One<br><br>Two<br><br>Three<br><br>Four<br><br>Five<br><br>Six<br><br>Seven<br><br>Eight<br><br>Nine<br><br>Ten
						
						
						
					<br><br><span class="word_count"><big><strong>1</strong> Words</big></span></td>

Shows 1 word, should be 10 words.

<td class="c_post">
						Test Test<br><br>Test<br><br>Test<br><br>Test Test Test
						
						
						
					<br><br><span class="word_count"><big><strong>1</strong> Words</big></span></td>

Shows 1 word, should be 7 words.

It seems that the following line has trouble if no tags exist at that stage in the HTML string.


text = $(html).text();

So all that’s needed there is to check whether the HTML contains any tags. If it doesn’t, just assign that tag-free HTML content straight over to the text variable.

Sorry, but how would I do that exactly?

How would you do an if statement?

if ($('td.c_post:contains(blockquote), td.c_post:contains(div)').length) {
//Parse Code
}

Like that?

No, those have already been removed by earlier code, remember?

Perhaps it would be easier for you to just wrap the html inside of a <div>, so that the .html() method can be guaranteed to have something to work with.

Sorry man, but I think I give up. I evidently wasn’t meant to create a script like this, and it’s not even for me, I wanted to create it for someone else. Sorry to waste your time.

What I meant when I said “to just wrap the html inside of a <div>” is this:

text = $('<div>' + html + '</div>').text();

OK, that seemed to work. Now how do I make it so it doesn’t actually remove blockquotes and DIVs, but just doesn’t count the text within them?
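One possible direction, offered only as a rough sketch: do the stripping on the HTML string instead of on the live elements, so that nothing is removed from the page. The function name countVisibleWords and the sample markup below are made up for the illustration. It also assumes that blockquotes and DIVs are not nested inside elements of the same kind, since regular expressions cannot reliably pair up nested tags.

```javascript
// Hypothetical helper: counts words in an HTML string while ignoring
// anything inside <blockquote> or <div> elements. The string is only read,
// so the elements on the page stay where they are.
function countVisibleWords(html) {
  var text = html
    // Drop whole blockquotes; [\s\S] also matches newlines, unlike the dot.
    .replace(/<blockquote\b[^>]*>[\s\S]*?<\/blockquote>/gi, ' ')
    // Drop whole DIVs, whatever attributes they carry.
    .replace(/<div\b[^>]*>[\s\S]*?<\/div>/gi, ' ')
    // Turn every remaining tag (including <br>) into a space.
    .replace(/<[^>]+>/g, ' ')
    // Treat non-breaking space entities as ordinary spaces.
    .replace(/&nbsp;/g, ' ')
    .trim();
  return text === '' ? 0 : text.split(/\s+/).length;
}

// Cut-down sample modelled on the test post above (an assumption, not the real page).
var samplePost = '<strong>Test</strong><br><br>' +
  '<blockquote><dl><dt>Quote:</dt><dd>&nbsp;</dd></dl><div>Test Test Test</div></blockquote><br><br>' +
  '<div class="spoiler_toggle">Spoiler: click to toggle</div>' +
  '<div style="display:none;" class="spoiler">Test</div>' +
  '<ul><li>Test<br></li><li>Test<br></li></ul>';

console.log(countVisibleWords(samplePost));                         // 3
console.log(countVisibleWords('<img src="a.png" alt=":cblush:">')); // 0
console.log(countVisibleWords('One Two<br><br>Three'));             // 3
```

Inside the existing .each() loop this could be called as countVisibleWords($(this).html()). Another option, not shown here, would be to run the remove() calls on a copy of the post made with jQuery’s .clone() rather than on the post itself.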