Removing hashtags (#example) from a string

I’m pulling tweets into my webpage using the XML feed from Twitter.

Is it possible to remove the hashtags from the tweet string when you don’t know in advance what the hashtag will be?

For example, by removing everything from the hash symbol to the next bit of whitespace?

E.g.

“Isn’t this amazing! #html5”

becomes

“Isn’t this amazing!”

So you only need part of the string? Then you need substr.

And you want to find the last occurrence of a # and take all the text up to that point (so you basically need the position of the last occurrence of # in the string); for that you need strrpos.

Using those two functions I’m pretty sure you can figure it out. If you have any questions let me know :slight_smile:
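For the case described above (a single hashtag at the end of the tweet), a minimal sketch combining the two functions might look like this. The sample tweet is made up for illustration, and the check against false is an assumption added to cover tweets that contain no # at all.

```php
<?php
// Hypothetical sample tweet; assumes at most one hashtag, at the end.
$tweet = "Isn't this amazing! #html5";

// Position of the last '#', or false when there is none.
$pos = strrpos($tweet, '#');

if ($pos !== false) {
    // Keep everything before the '#', then drop the space left behind.
    $tweet = rtrim(substr($tweet, 0, $pos));
}

echo $tweet; // Isn't this amazing!
```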

Thanks ScallioXTX.

Hmmm. There might be more than one, so I guess you’d run the same function again to remove the next one, and so on?

Hm, I was under the impression there would be only one at the end. In that case I’d use strpos rather than strrpos (strrpos would work as well, but working from left to right is a bit more natural than from right to left ;))

And you’re right, you’d work your way through the string until all the tags are gone.

A general outline:


$tweet='tweet #tweet diddly tweet #html';
while (($pos = strpos($tweet, '#')) !== false) {
  // $spacepos = position of first space after $pos -- find this using strpos
  // now that you know where the tag starts ($pos) and ends ($spacepos), use
  // substr_replace to get it out of there
}

:slight_smile:

Twitter themselves have released code which they recommend using for parsing usernames and hash tags. I don’t know if it will do exactly as you want but it might be useful as a starting point at least. http://github.com/mzsanford/twitter-text-php

Excellent ScallioXTX. Thank you for your guidance!

chestertondevelopment, I shall take a look. Thank you.

Though I want to try the ‘manual’ way of doing things, not least because it will come in handy for other parts of this project and not just the hashtags, e.g. removing links. I think so, anyway!

I may need some additional help - sorry.

$tweet='tweet #tweet diddly tweet #html';

while (($pos = strpos($tweet, '#')) !== false) {

	$spacepos= strpos($tweet, ' ')
	$tweet=substr_replace($var, '', $pos, $spacepos)
} 

Does this look right to you? How do I get out the loop?

No, $var should be $tweet :slight_smile:

Also, you should start looking for a space after the position of the #

So, $spacepos=strpos($tweet, ' ', $pos);

Otherwise in a string ‘hello #hi bye’ it will find the space after ‘hello’, while that’s not what you want, since you want the space before ‘bye’. The code above will give you exactly that :slight_smile:
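To make the difference concrete, here is a small sketch using the ‘hello #hi bye’ string from above. The variable names $firstSpace and $spaceAfterTag are only illustrative.

```php
<?php
$tweet = 'hello #hi bye';

// Where the tag starts.
$pos = strpos($tweet, '#');                 // 6

// Without an offset, strpos finds the space after 'hello'.
$firstSpace = strpos($tweet, ' ');          // 5

// With $pos as the offset, the search starts at the '#',
// so it finds the space before 'bye'.
$spaceAfterTag = strpos($tweet, ' ', $pos); // 9
```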

Also, you need to take into account that there doesn’t necessarily have to be a space: if #something is at the end of the string, there is no space after it.

So you need something like this:


while (($pos = strpos($tweet, '#')) !== false) {
  if ($spacepos = strpos($tweet, ' ', $pos)) {
    $tweet=substr_replace($tweet, '', // fill this in //, // fill this in //);
  } else {
    $tweet=substr_replace($tweet, '', // fill this in //);
  }
}
$tweet=trim($tweet);

I’ll leave the //fill this in// parts as an exercise for you :slight_smile:

You’d need to use preg_replace…

The problem is you’d need to decide on where to stop.

Looking at your code I’d use:


$tweet = 'tweet #tweet diddly tweet #html';
$tweet = preg_replace('/#([^ \\r\\n\\t]+)/', '', $tweet);
echo $tweet;

hth

Yup, you could also do that, although I’d replace [^ \r\n\t] with [^\s], since \s covers [ \r\n\t] (plus a few other whitespace characters), and I’d remove the capturing parentheses, since we’re not actually interested in what the tag says and don’t need to capture it for later use; we just want to remove it.
Also, if you remove everything up to the next space, you risk ending up with two consecutive spaces in the string, so I’d add a \s to the end as well, but make it optional since it doesn’t have to be there (e.g., when the tag is at the end of the string).
Lastly, if there is a tag at the end of the string we could end up with a trailing space, but that’s nothing rtrim can’t handle.


$tweet = 'tweet #tweet diddly tweet #html';
$tweet = rtrim( preg_replace('/#[^\\s]+\\s?/', '', $tweet) );
echo $tweet;

:slight_smile:
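As a quick illustration of the double-space point, this sketch runs the same sample string through the pattern with and without the optional \s. The variable names are made up for the comparison.

```php
<?php
$tweet = 'tweet #tweet diddly tweet #html';

// Without the optional \s, the space after each tag is left behind:
// two spaces in the middle and one trailing space.
$withoutSpace = preg_replace('/#[^\s]+/', '', $tweet);

// With \s?, each tag takes one following whitespace character with it,
// so only the trailing space remains for rtrim to clean up.
$withSpace = preg_replace('/#[^\s]+\s?/', '', $tweet);

var_dump($withoutSpace); // string(20) "tweet  diddly tweet "
var_dump($withSpace);    // string(19) "tweet diddly tweet "
```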

BTW. The solution using strpos is


$tweet = 'tweet #tweet diddly tweet #html';
while (($pos = strpos($tweet, '#')) !== false) {
  if ($spacepos = strpos($tweet, ' ', $pos)) {
    $tweet=substr_replace($tweet, '', $pos, $spacepos-$pos+1);
  } else {
    $tweet=substr_replace($tweet, '', $pos);
  }
}
$tweet=rtrim($tweet); 
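If you want to reuse this elsewhere in the project, one option is to wrap the loop in a function. This is only a sketch: the name removeHashtags is hypothetical, and the type declarations assume PHP 7 or later. Note that it removes anything starting with #, so a lone # in ordinary text would be stripped too.

```php
<?php
// Hypothetical helper; the body is the strpos loop from above.
function removeHashtags(string $tweet): string
{
    while (($pos = strpos($tweet, '#')) !== false) {
        // Look for the end of the tag, starting at the '#'.
        $spacepos = strpos($tweet, ' ', $pos);
        if ($spacepos !== false) {
            // Remove the tag and the single space that follows it.
            $tweet = substr_replace($tweet, '', $pos, $spacepos - $pos + 1);
        } else {
            // Tag runs to the end of the string: cut everything from '#'.
            $tweet = substr_replace($tweet, '', $pos);
        }
    }
    return rtrim($tweet);
}

echo removeHashtags('tweet #tweet diddly tweet #html'), "\n"; // tweet diddly tweet
echo removeHashtags('#php is fun'), "\n";                     // is fun
echo removeHashtags('no tags here'), "\n";                    // no tags here
```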

Following chestertondevelopment’s suggestion, I’d recommend looking at existing solutions for inspiration.

Thank you ScallioXTX! Your code worked perfectly to do what I requested. I also learnt a lot.

However, I still need to look into the existing solutions as I’ve run into some other issues.

Good, I’m glad to hear that :slight_smile: