I have a list of keywords that I want to extract from a string that will contain HTML. I only want to match the exact word on its own, e.g.
String = "This is a long string <a href="">cat</a> link. I love cats. I love cat.";
Word = "cat";
I only want to get the word "cat" when it's on its own. I don't want it when it's part of a link, or part of another word such as "cats". Just "cat". I will probably also need to allow for punctuation next to the word, like full stops and commas.
You could do it in two passes with an if/else statement.
First you do a preg_match for "href". If it appears, the script stops there, because the string contains a link, which you don't want. If it does not appear, you do a second preg_match for "cat".
So it will return true for "cat", "cats", and "tabby-cat.jpg" but not for <a href="cat.html">
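A rough sketch of that two-step check, in case it helps. The function name containsKeywordOutsideLink is made up for illustration, and the case-insensitive matching is my assumption. Note that it rejects the whole string as soon as "href" appears anywhere in it, and that it matches the keyword inside longer words too.

```php
<?php
// Hypothetical helper (name is illustrative, not from any library).
// Returns true when $text contains no "href" and contains $keyword
// anywhere, even as part of a longer word.
function containsKeywordOutsideLink(string $text, string $keyword): bool
{
    // Step 1: if "href" appears, treat the text as a link and stop.
    if (preg_match('/href/i', $text) === 1) {
        return false;
    }

    // Step 2: look for the keyword; preg_quote() escapes any regex
    // metacharacters the keyword might contain.
    return preg_match('/' . preg_quote($keyword, '/') . '/i', $text) === 1;
}

var_dump(containsKeywordOutsideLink('cat', 'cat'));                 // bool(true)
var_dump(containsKeywordOutsideLink('cats', 'cat'));                // bool(true)
var_dump(containsKeywordOutsideLink('tabby-cat.jpg', 'cat'));       // bool(true)
var_dump(containsKeywordOutsideLink('<a href="cat.html">', 'cat')); // bool(false)
```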
I don't think either of those solutions will solve my problem. The string could contain pretty much anything (any HTML, or any random stuff, since it's a user's WordPress post). The process is to go through the whole post looking for the single word on its own. I think the second solution is closer to what I want, but not exactly, since a user could have already linked "cat", and I don't want to remove that link and then link the word again. I only want to link the word if it's on its own and untouched (if that makes sense).
So, to clarify, the process is:
Get the user’s submitted WordPress post which can contain HTML.
Look through the post for single keywords (in this case, the example was cat).
If "cat" is completely on its own (no HTML surrounding it, not part of another word), then link it.
I ended up using a solution that involves DOMDocument. It seems to work well so far; I just need to test it against lots of different scenarios to see if it breaks. Thanks again.
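For anyone finding this later, here is a minimal sketch of how a DOMDocument approach could look. This is not the exact code used above, just one way to do it under some assumptions: the function name linkKeyword, the wrapper id "wrap", and the target URL are all made up, matching is case-sensitive, and "on its own" is approximated with the regex word boundary \b.

```php
<?php
// Hypothetical helper: wraps whole-word occurrences of $keyword in a link
// to $url, skipping any text that already sits inside an <a> element.
function linkKeyword(string $html, string $keyword, string $url): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // user-submitted HTML is often messy
    // The XML declaration is a common trick to make loadHTML() read UTF-8;
    // the wrapper div lets us pull the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    // Every text node under the wrapper that has no <a> ancestor.
    $nodes = iterator_to_array(
        $xpath->query('.//text()[not(ancestor::a)]', $wrap)
    );
    $pattern = '/\b(' . preg_quote($keyword, '/') . ')\b/u';

    foreach ($nodes as $node) {
        // With DELIM_CAPTURE, odd-indexed parts are the keyword matches.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no whole-word match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the wrapper's children, not the implied <html>/<body>.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkKeyword(
    'This is a long string <a href="/pets">cat</a> link. I love cats. I love cat.',
    'cat',
    '/cat'
);
```

With this sketch the already-linked "cat" and the word "cats" should be left alone, and only the final standalone "cat" gets wrapped. One caveat: \b treats hyphens and dots as word breaks, so text like "tabby-cat.jpg" outside a tag would still be linked. A stricter pattern would be needed if that matters.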