REGEX help?

I have a string

When: When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm \
BST\\u003cbr /\\u003e\
\\u003cbr /\\u003eWho: Paul\
\\u003cbr /\\u003eWhere: The O2, Greenwich\
\\u003cbr /\\u003eEvent Status: confirmed

And I’d like to extract two pieces of information from it

First is

The O2, Greenwich

Second is

1pm to 10pm

I’ve always sucked at REGEXs so help me please? :blush:

To further explain: I’m trying to extract two pieces of data from a string (see below), the time and the location.

String example two

When: When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm \
BST\\u003cbr /\\u003e\
\\u003cbr /\\u003eWho: Paul\
\\u003cbr /\\u003eWhere: Lost Region, Timbuktu\
\\u003cbr /\\u003eEvent Status: confirmed

The following should do the job.


var eventInfo = "When: When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm \n" +
        "BST\\u003cbr /\\u003e\n" +
        "\\u003cbr /\\u003eWho: Paul\n" +
        "\\u003cbr /\\u003eWhere: The O2, Greenwich\n" +
        "\\u003cbr /\\u003eEvent Status: confirmed",
    // (.*) stops at the first newline, so this captures only the first line
    whenLine = eventInfo.match(/(.*)/)[1],
    when = whenLine.match(/(\d+(?:am|pm) to )/)[1] + whenLine.match(/(\d+(?:am|pm))\s*$/)[1],
    where = eventInfo.match(/Where: (.*)/)[1];

// when is "1pm to 10pm"
// where is "The O2, Greenwich"
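
The reason /(.*)/ picks out only the first line is that, in JavaScript, the dot does not match line terminators unless the s flag is set. A minimal sketch of that behaviour, using a shortened sample string rather than the full event text:

```javascript
// Shortened, hypothetical sample; the real event text has more lines.
var sample = "When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm \n" +
        "Where: The O2, Greenwich\n" +
        "Event Status: confirmed";

// The dot stops at the newline, so group 1 holds only the first line.
var firstLine = sample.match(/(.*)/)[1];

// The same rule keeps the "Where:" capture from running on to the next line.
var place = sample.match(/Where: (.*)/)[1];

console.log(JSON.stringify(firstLine));
console.log(JSON.stringify(place));
```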

Thank you so much. I really appreciate the help.

I’m finding that it errors out if it doesn’t find a match, however I thought to myself that I should encapsulate it in a conditional and thus I’ve tried

if (eventinfo[index]['details'].match(^(20|21|22|23|[01]\d|\d)(([:.][0-5]\d){1,2})$))

But it’s giving me a parse error, what am I doing wrong?

It errors when it doesn’t find a match, so let’s deal with that issue instead. We can get the match separately, and then check whether the match contains anything useful before getting the [1] index from it. If it doesn’t contain anything useful, we can give it a default value of an empty string instead.


var eventInfo = "Some non-matching content",
    whenMatch = eventInfo.match(/(.*)/),
    whenLine = (whenMatch && whenMatch[1]) || '',
    fromWhenMatch = whenLine.match(/(\d+(?:am|pm) to )/),
    toWhenMatch = whenLine.match(/(\d+(?:am|pm))\s*$/),
    whereMatch = eventInfo.match(/Where: (.*)/),
    fromWhen = (fromWhenMatch && fromWhenMatch[1]) || '',
    toWhen = (toWhenMatch && toWhenMatch[1]) || '',
    when = fromWhen + toWhen,
    where = (whereMatch && whereMatch[1]) || '';
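
As an aside on the parse error in the conditional from the question: a regular expression literal has to be wrapped in slashes, otherwise the leading ^ is read as an operator. The sketch below assumes the intent was a yes/no check, so it uses test() instead of match(). The name looksLikeTime is hypothetical, and because of the ^ and $ anchors the pattern only succeeds when the whole string is a time such as 13:45.

```javascript
// The pattern from the question, wrapped in slashes so that it parses as a regex literal.
var timePattern = /^(20|21|22|23|[01]\d|\d)(([:.][0-5]\d){1,2})$/;

// Hypothetical helper: test() returns true or false, so a non-match cannot throw.
function looksLikeTime(details) {
    return timePattern.test(details);
}

console.log(looksLikeTime("13:45"));     // true
console.log(looksLikeTime("confirmed")); // false
```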

We can even simplify things further by putting parts of this into some functions:


function getFirstLine(info) {
    var match = info.match(/(.*)/),
        firstLine = (match && match[1]) || '';

    return firstLine;
}

function getWhenInfo(firstLine) {
    var fromMatch = firstLine.match(/(\d+(?:am|pm) to )/),
        toMatch = firstLine.match(/(\d+(?:am|pm))\s*$/),
        from = (fromMatch && fromMatch[1]) || '',
        to = (toMatch && toMatch[1]) || '',
        when = from + to;

    return when;
}

function getWhereInfo(info) {
    var match = info.match(/Where: (.*)/),
        where = (match && match[1]) || '';

    return where;
}

var eventInfo = "Some non-matching info",
    firstLine = getFirstLine(eventInfo),
    when = getWhenInfo(firstLine),
    where = getWhereInfo(eventInfo);
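
Each of those functions repeats the same "match, then take group 1 or fall back to an empty string" step. One possible further refactoring, sketched here with a hypothetical helper called firstGroup (not part of the code above), is to fold that step into a single place:

```javascript
// Hypothetical helper: returns capture group 1, or '' when there is no match.
function firstGroup(text, pattern) {
    var match = text.match(pattern);
    return (match && match[1]) || '';
}

// Variant of getWhereInfo built on the helper.
function getWhereInfo(info) {
    return firstGroup(info, /Where: (.*)/);
}

console.log(getWhereInfo("Where: The O2, Greenwich")); // "The O2, Greenwich"
console.log(getWhereInfo("Some non-matching info") === ''); // true
```

Whether the extra helper is worth it depends on how many of these lookups there are; with only three, the separate functions read just as well.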

Thank you Paul. You’re awesome. That works perfectly.

You’re welcome. Just to explain briefly, here is where this syntax comes from:

where = (match && match[1]) || '';

First we start with how that might be written in full, as:


if (match.length > 0) {
    where = match[1];
} else {
    where = '';
}

Which can then easily be turned into a ternary expression of the format (…) ? … : …;


where = (match.length > 0) ? match[1] : '';

And that might do just fine, except that it’s not expressive enough. Currently the above code says that the where variable is either an array value or an empty string, but it doesn’t make clear why that should be the case. Sure, we can infer some kind of connection between the length check and match[1], but there are more expressive ways to go about this.

What we can do is use the && operator as a guard condition. Only if the preceding condition is truthy will JavaScript carry on and evaluate the next one. This is commonly used to check that something exists before using it.


if (targ && targ.nodeName && targ.nodeName === 'A') {
    ...
}

And there is also the || operator which is used as a default value, because JavaScript will keep on checking the different conditions until it comes across one that is truthy in nature. This is commonly used to assign default values to a variable:


function onclickEventHandler(evt) {
    evt = evt || window.event;
    var targ = evt.target || evt.srcElement;
    ...
}
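
Both techniques rely on the fact that && and || hand back one of their operands rather than a plain true or false. A small sketch with made-up values (noMatch and goodMatch stand in for the result of a failed and a successful match):

```javascript
// Hypothetical stand-ins for a failed and a successful match result.
var noMatch = null;
var goodMatch = ["Where: The O2", "The O2"];

// && returns the first falsy operand, or the last operand if all are truthy.
var guardedMiss = noMatch && noMatch[1];     // null; noMatch[1] is never evaluated
var guardedHit = goodMatch && goodMatch[1];  // "The O2"

// || returns the first truthy operand, or the last operand if none are truthy.
var defaultedMiss = guardedMiss || '';       // ''
var defaultedHit = guardedHit || '';         // "The O2"

console.log(guardedMiss, guardedHit, defaultedMiss === '', defaultedHit);
```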

So we can put the two together, to say that match[1] is being guarded first in case it doesn’t exist, and that if nothing is found there, a default value of an empty string should be used instead.


where = (match.length > 0) && match[1] || '';

Now since an array is always considered to be a truthy value, even if the array is completely empty, and a failed regular expression match doesn’t give an array but null instead (so checking match.length on it would throw an error), we can just test to see if the match is truthy or not.


where = match && match[1] || '';

And finally, to help make things clearer to someone who is reading the code, we can use parentheses to clarify that the first two parts are related to each other:


where = (match && match[1]) || '';
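
The point about a failed match giving null rather than an empty array can be checked directly. The sketch below uses a made-up non-matching string and also shows why the earlier length check is not safe on its own:

```javascript
// An empty array is still truthy.
var emptyArrayIsTruthy = Boolean([]);

// A failed match() without the g flag returns null, not an empty array.
var failed = "no place here".match(/Where: (.*)/);

// Reading .length on null throws a TypeError, so the length check cannot be used as the guard.
var lengthCheckThrew = false;
try {
    var unused = failed.length > 0;
} catch (e) {
    lengthCheckThrew = e instanceof TypeError;
}

// The truthiness guard copes with null and falls through to the default.
var where = (failed && failed[1]) || '';

console.log(emptyArrayIsTruthy, failed, lengthCheckThrew, where === '');
```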

So you could have ended up with a whole lot of if/else statements in your code.


if (match.length > 0) {
    where = match[1];
} else {
    where = '';
}

But you now have code that is more expressive instead. I wouldn’t call this a one-liner, because reducing code to a single line is not a good goal to have. Instead, it makes use of some well-known JavaScript techniques to be even more expressive than the if/else code from before.


where = (match && match[1]) || '';

Thanks again Paul. Me and JS have never gotten along that well, but you’re actually amazing at explaining this stuff. Truly, thank you.

I eventually figured out how I wanted to wrap up the discussion.

The ternary expression is equivalent to this code:


// var where = (match.length > 0) ? match[1] : '';

var where;
if (match.length > 0) {
    where = match[1];
} else {
    where = '';
}

Whereas, the preferred code using a guard operator and a default operator, is a lot closer to this:


// var where = (match && match[1]) || '';

var where = ''; // default value
if (match) { // guard
    where = match[1];
}
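
"A lot closer" rather than "identical", because the || default also kicks in when the match succeeds but group 1 itself is falsy. A sketch of that edge case, using a hypothetical pattern with an optional group (not one from this thread):

```javascript
// Hypothetical pattern with an optional group: the match succeeds, but group 1 is undefined.
var match = "Where:".match(/Where:( .*)?/);

// if-based version: assigns whatever match[1] holds, here undefined.
var viaIf = '';
if (match) {
    viaIf = match[1];
}

// Guard-and-default version: falls back to '' for any falsy group value.
var viaOperators = (match && match[1]) || '';

console.log(viaIf === undefined, viaOperators === '');
```

For the patterns used in this thread, where group 1 is never optional, the two forms give the same result.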

Before this discussion, I had no idea you could do something like:


where = (match && match[1]) || '';

I thought you’d have to use a ternary. It actually surprised me, though maybe that’s because my background is PHP.

Hi Paul, I’ve discovered an issue.

Sometimes I have

function getWhenInfo(firstLine) {
	// Written by Paul Wilkins @ Sitepoint
    var fromMatch = firstLine.match(/(\d+(?:am|pm) to )/),
        toMatch = firstLine.match(/(\d+(?:am|pm))\s*$/),
        from = (fromMatch && fromMatch[1]) || '',
        to = (toMatch && toMatch[1]) || '',
        when = from + to;

    return when;
}

only returning

6pm to

From

When: Tue Oct 16, 2012 6pm to 10pm \
BST\\u003cbr /\\u003e\
\\u003cbr /\\u003e

I’ve been unable to figure out how to make it recognise that without breaking multi-day usage. I thought it’d be something like

fromMatch = firstLine.match(/(\d+(?:am|pm) to )/ || /(\d+(?:am|pm) to \d+(?:am|pm))/),

but it’s not.

The fromMatch part doesn’t need changing. You should put that back to what it was before.

It’s the toMatch line that needs to be made more flexible, by having it match from the "to " keyword of the line.


toMatch = firstLine.match(/to (\d+(?:am|pm))/),
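
One thing to watch: a pattern that expects the time immediately after "to " fits the single-day line, but on the multi-day line from earlier there is a date between "to " and the end time. A possible way to cover both, offered as an assumption rather than something confirmed in this thread, is to let a lazy .*? skip whatever sits in between:

```javascript
function getWhenInfo(firstLine) {
    var fromMatch = firstLine.match(/(\d+(?:am|pm) to )/),
        // Assumption: lazily skip any date text between "to " and the end time.
        toMatch = firstLine.match(/to .*?(\d+(?:am|pm))/),
        from = (fromMatch && fromMatch[1]) || '',
        to = (toMatch && toMatch[1]) || '';

    return from + to;
}

// First lines taken from the two kinds of event text shown in this thread.
var singleDay = getWhenInfo("When: Tue Oct 16, 2012 6pm to 10pm ");
var multiDay = getWhenInfo("When: When: Thu Aug 23, 2012 1pm to Fri Aug 24, 2012 10pm ");

console.log(singleDay); // "6pm to 10pm"
console.log(multiDay);  // "1pm to 10pm"
```

This still assumes the first "to " on the line is the one separating the two times, which holds for both sample lines.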

Thank you Paul. I most definitely owe you a beer.
