JS Regular Expression question

pier · February 11, 2010, 12:15am

Hi everyone, I’m new to the site and looking for an answer to this:

I want to match the word(s) that begin with a hashtag (#) on a given string. So far with a code like

var mystring = "Going to the #museum and then to the #gym";
var mypattern = /(#)([a-zA-Z0-9])*/;
var myresult= mypattern.exec(mystring);

myresult contains

#museum,#,m

but:

I only want #museum
It’s not detecting #gym

The main reason for all this is, I want to get all text that is not a hashtag, so in this example I want to get
"Going to the "
"and then to the "
as two separate strings. I thought using regular expressions would help me instead of searching the string for #.

Any help is appreciated!

pier · February 11, 2010, 12:47am

It worked! Thank you, mrhoo, for the solution and the correction in English (I am from Peru).

mrhoo · February 11, 2010, 12:37am

Try it with a global match instead of exec.

Exec finds one match at a time, so you can do something to the matched text.
If you are just collecting them, a match will get them all at once.

var mystring = "Going to the #museum and then to the #gym";

var mypattern = /(#[a-zA-Z0-9]+)/g;

var myresult= mystring.match(mypattern);

/* returned value: ['#museum','#gym'] */
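
For anyone wondering why the original pattern gave back #museum,#,m: a rough sketch, assuming the same test string. Without the g flag, exec returns the first whole match followed by one entry per capture group, and a repeated group only keeps its last repetition. The names firstParts and allTags are made up for this illustration.

```javascript
var mystring = "Going to the #museum and then to the #gym";

// Original pattern from the question: two capture groups, no g flag.
var first = /(#)([a-zA-Z0-9])*/.exec(mystring);

// first[0] is the whole match, first[1] is group 1 (the #),
// first[2] is the LAST repetition of group 2 (the final letter).
var firstParts = [first[0], first[1], first[2]];   // ["#museum", "#", "m"]

// With the g flag and no need for groups, match() returns every whole match.
var allTags = mystring.match(/#[a-zA-Z0-9]+/g);    // ["#museum", "#gym"]

console.log(firstParts, allTags);
```

So the extra entries were the capture groups, and #gym was missing because exec stops at the first match unless you loop with a global pattern.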

By the way, # is called the ‘octothorp’, from a word meaning eight-points.

mrhoo · February 11, 2010, 12:54am

de nada. Mucho gusto en conocerle.

You can use the same pattern in an exec,
but you must loop through the string.

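var result='', pat, rx = /(#\w+)/g;
// each exec call resumes after the previous match (rx.lastIndex)
while((pat=rx.exec(mystring))!=null){
	result+=pat[1].substring(1)+' ';
}
// drop the trailing space; slice returns a new string, so assign it
result=result.slice(0,-1);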

/*  returned value: (String)  'museum gym' */
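
A possible alternative to the loop, assuming an engine that supports String.prototype.matchAll (ES2020 and later, so not the browsers of this thread's era). It is only a sketch: the capture group is moved so it holds the word without the #, and the variable names are illustrative.

```javascript
var mystring = "Going to the #museum and then to the #gym";

// matchAll requires the g flag and yields one match array per hit.
var names = Array.from(mystring.matchAll(/#(\w+)/g), function (m) {
  return m[1];   // group 1 is the word without the leading #
});

var joined = names.join(' ');   // "museum gym"

console.log(joined);
```

Using join avoids having to trim a trailing space afterwards.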

pier · February 11, 2010, 1:46am

mrhoo:

Is there a way (on regular expressions or any other) that I can transform:

"Going to the #museum and then to the #gym";

into an array that looks like:

["Going to the ","#museum"," and then to the ","#gym"]

?

mrhoo · February 11, 2010, 2:32am

Sure- you want to return everything,
keeping the flagged words separate but in source order.

In regular expression syntax a pipe (|) means ‘or’:

var rx=/([^#]+)|(#\w+)/g;

([^#]+) = match one or more characters that are not a #
(#\w+) = match a # followed by one or more word characters (a-zA-Z0-9 and _)

var s="Going to the #museum and then to the #gym";
s.match(/([^#]+)|(#\w+)/g)

/* returns (Array): ['Going to the ','#museum',' and then to the ','#gym'] */
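
For the original goal of getting only the text that is not a hashtag, split might also do the job. This is a sketch under the same assumptions about the input; textOnly and withTags are names invented here.

```javascript
var s = "Going to the #museum and then to the #gym";

// Splitting on the hashtags leaves only the surrounding text.
// A tag at the very end produces an empty final piece, so filter it out.
var textOnly = s.split(/#\w+/).filter(function (piece) {
  return piece !== '';
});
// ["Going to the ", " and then to the "]

// Wrapping the pattern in a capture group keeps the tags in the result.
var withTags = s.split(/(#\w+)/).filter(function (piece) {
  return piece !== '';
});
// ["Going to the ", "#museum", " and then to the ", "#gym"]

console.log(textOnly, withTags);
```

The second form gives the same array as the match with the pipe, so which one to use is mostly a matter of taste.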

Paul_Wilkins · February 11, 2010, 6:11am

People often say that you need to learn about regular expressions, and you don’t bother to because you don’t know what you’re missing.

This thread is a perfect example of why you need to learn about regular expressions.

Thank you mrhoo.

Stomme_poes · February 11, 2010, 11:08am

By the way, # is called the ‘octothorp’, from a word meaning eight-points.

Huh, I didn’t know that either, but English has lots of names for that thing (number, pound, hash, (cross)hatch). For programming I’m going to keep calling it hash(mark) or shebang, depending on where it is, so people know what I’m talking about.

RLM2008 · February 11, 2010, 3:27pm

Came up with this match, which removes the trailing spaces (if you want that?).

Can it be simplified?

var s="Going to the #museum and then to the #gym";
console.log(s.match(/(\w+ ?)+(?= #)|(#\w+)/g)); // 'Going to the','#museum','and then to the','#gym'

Edit: forget it! It fails with "Going to the #museum and then to the #gym and then the pub"

RPG

pier · February 11, 2010, 6:33pm

Thank you indeed! I always thought regular expressions were very helpful, I guess I have to get me some tutorials and/or books. Any links recommended? I couldn’t find any on the Links/Tutorial thread.

mrhoo · February 11, 2010, 6:45pm

This site is devoted to Regular expressions in javascript:

But try to find the book ‘Mastering Regular Expressions’ by Jeffrey E. F. Friedl,
published by O’Reilly. It’s a keeper, and covers other programming languages as well as JavaScript.

Regular expressions are extremely useful in server scripts too.

pier · February 11, 2010, 7:21pm

Thank you mrhoo for the tip!

I have another question revolving around the same problem, but since it’s not related to regular expressions anymore, I will make another post (if I can’t find the answer already in the forums).

RLM2008 · February 12, 2010, 12:45am

Just to fix the above.

/(\w+ ?)+(?= #)?|(#\w+)/g
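
Another way to get the trimmed pieces, offered as a sketch rather than a replacement for the pattern above: keep the simpler alternation from earlier in the thread and trim each piece afterwards. It assumes String.prototype.trim and Array.prototype.map are available (ES5), and pieces is just an illustrative name.

```javascript
var s = "Going to the #museum and then to the #gym and then the pub";

var pieces = s.match(/[^#]+|#\w+/g)
  .map(function (piece) { return piece.trim(); })       // strip surrounding spaces
  .filter(function (piece) { return piece !== ''; });   // drop whitespace-only pieces
// ["Going to the", "#museum", "and then to the", "#gym", "and then the pub"]

console.log(pieces);
```

This also copes with text after the last hashtag, which is the case that tripped up the first attempt.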