How to use preg_replace to hide parts of title - php

I tried explaining this, but I don't think anyone understood.
I have a lot of titles, which are all different. What I'm trying to do is remove every "Lyrics" word from the title, and also everything that appears before the "-" symbol.
I managed to do the lyrics part with str_ireplace:
<?php echo str_ireplace('lyrics', '', get_the_title()); ?>
How can I do the second part? What can I add to the code to achieve that?
Example of what I want it to do:
"random title - more random title lyrics" turns to "more random title"
The code should delete the "random title - " part from every single title on my website.
Someone here previously gave me this; I don't know whether it helps:
$string = preg_replace("/[^-]+-(.*) Lyrics/", "$1", $string);

If you're trying to do it all in one regexp, that's probably asking too much. Why not break it up into two steps? First, remove everything up to and including the first - (and any spaces after it):
$string = preg_replace('/^[^-]*-\s*/', '', $string);
Then remove every "lyrics" (case-insensitive), if any, and trim the spaces left behind:
$string = trim(preg_replace('/lyrics/i', '', $string));
This can get more complicated -- what if there are no or multiple hyphens, what if there are no or multiple "lyrics", what do you want to do about spaces or punctuation after the hyphen and around "lyrics", can "lyrics" be part of another word, what about arbitrary capitalization of "lyrics", etc.? You need to make sure it works as expected in all these cases.
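The two steps can be wrapped in one helper. This is a minimal sketch, assuming PHP 7+; the function name clean_title is hypothetical, and the \b word boundaries around lyrics are an addition (not in the answer) so that words such as "lyricist" are left alone. It still shares the caveats listed above, e.g. a hyphen inside the artist name.

```php
<?php
// Hypothetical helper: strip the "artist - " prefix and every "lyrics" word.
function clean_title(string $title): string
{
    // Step 1: drop everything up to and including the first "-" plus following spaces.
    $title = preg_replace('/^[^-]*-\s*/', '', $title);
    // Step 2: remove the whole word "lyrics" in any capitalization.
    $title = preg_replace('/\blyrics\b/i', '', $title);
    // Step 3: collapse leftover runs of whitespace and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $title));
}

echo clean_title('random title - more random title lyrics'), "\n";
```

In a WordPress template this would be used as `echo clean_title(get_the_title());`.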

Related

Regex to find hashtag in string - without taking the initial hashtag symbol

I'm trying to do this in PHP, and I'm not great with regex.
I'm trying to find all hashtags in a string and wrap them in a link to Twitter. To do this, I need the content of the hashtag without the symbol.
I want to match #hashtag but get back only hashtag, without the preceding #.
I'd like to do it in one line, but I'm currently doing a preg_replace followed by a str_replace, as shown:
$string = preg_replace('/\B#([a-z0-9_-]+)/i', '$0 ', $string);
$string = str_replace('https://twitter.com/hashtag/#', 'https://twitter.com/hashtag/', $string);
Any guidance is appreciated!
I was using a regex tester and found the answer.
preg_replace provides two backreferences: $0 holds the whole match (#hashtag) and $1 holds the captured group (hashtag, without the # symbol).
Tested here (select preg_replace): http://www.phpliveregex.com/p/kOn
Perhaps it has something to do with the regex itself; I'm not sure. Hopefully this helps someone else too.
My one liner is:
$string = preg_replace('/\B#([a-z0-9_-]+)/i', '$0 ', $string);
Edit: I understand it now. The parentheses ( ) around the square brackets create a capturing group, which is what populates $1. Without them, only $0 (the whole match) is available.
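A sketch of the single-call version, assuming the twitter.com/hashtag/ URL from the question is the desired target: the captured name $1 (no #) goes into the URL, while the full match $0 (with #) becomes the link text, so no follow-up str_replace is needed. The function name link_hashtags is hypothetical.

```php
<?php
// Hypothetical helper: wrap each #hashtag in a link in a single preg_replace.
function link_hashtags(string $text): string
{
    // \B requires that "#" is not directly preceded by a word character,
    // so "x#y" is not treated as a hashtag.
    return preg_replace(
        '/\B#([a-z0-9_-]+)/i',
        '<a href="https://twitter.com/hashtag/$1">$0</a>',
        $text
    );
}

echo link_hashtags('I love #php and #regex_101'), "\n";
```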

Regex trying to get a match in php

I'm trying to make a regex pattern for strings that contain [[Title#Night|Anchor]] or just [[Title|Anchor]], and extract Title and Anchor. Basically two variables: the first part between [[ and |, and the second part between | and ]], no matter what characters are inside (excluding \n and \r).
I have tried writing different patterns, and none worked the way I wanted. The code can be seen here, with sample content that I need to apply it to.
\[\[(.*?)|(.*?)\]\]
This may help you:
$str = ' [[Title#Night|Anchor]] ';
preg_match('/\[\[([\s\S]*?)\]\]/',$str,$match);
print_r(explode('|',$match[1]));
update:
preg_match('/\[\[([\s\S]*?)\|([\s\S]*?)\]\]/',$str,$match);
print_r($match);
update 2:
preg_match('/\[\[(.*?)\|(.*?)\]\]/',$str,$match);
print_r($match);
update 3:
preg_match('/\[\[([^|\n\r]*)\|([^\]\n\r]*)\]\]/',$str,$match);
print_r($match);
The following should work for alphanumeric entries; whether it is enough depends on what you want Title and Anchor to contain.
\[\[([a-zA-Z0-9#]*)\|([a-zA-Z0-9]*)\]\]
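The question's pattern \[\[(.*?)|(.*?)\]\] fails because the unescaped | is alternation, so it means "\[\[(.*?)" OR "(.*?)\]\]". A sketch that escapes it and collects every pair with preg_match_all; the function extract_wiki_links and the title/anchor keys are hypothetical, and both parts exclude \r and \n as the question requires.

```php
<?php
// Hypothetical helper: return every [[Title|Anchor]] or [[Title#Section|Anchor]] pair.
function extract_wiki_links(string $text): array
{
    // Part 1 stops at "|" or "]"; part 2 stops at "]". "\|" is a literal pipe.
    preg_match_all(
        '/\[\[([^|\]\r\n]+)\|([^\]\r\n]+)\]\]/',
        $text,
        $matches,
        PREG_SET_ORDER // one array per match: [full, title, anchor]
    );
    $links = [];
    foreach ($matches as $m) {
        $links[] = ['title' => $m[1], 'anchor' => $m[2]];
    }
    return $links;
}

print_r(extract_wiki_links('See [[Title#Night|Anchor]] and [[Other|Text]].'));
```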

PHP regex to match expressions not containing a pattern

I have the following text:
"This is a test text. Test, comma instead of space."
I iterate through each word and want to replace each one with a distinct link. Let's say:
wordToReplace
My problem is that later replacements also match inside the href and anchor text of links that were already inserted (with the word "test", to use the above example), so I'm left with links inside links, which is not good at all.
(To give a basic idea of my problem, this is what I'm left with; it has some additional markup.)
This is a<a href="/index.php?r<a href="/index.php?r=texts/addWord"
class="wordLink info label" id="yt2">text</a>/addWord" class="wordLink
info label" id="yt1">test</a>text.<a href="/index.php?r=texts/addWord"
class="wordLink info label" id="yt3">Test</a><a
href="/index.php?r=texts/addWord" class="wordLink info label"
id="yt4">comma</a>instead of<a href="/index.php?r=texts/addWord"
class="wordLink info label" id="yt6">space</a>
I'm trying
preg_replace("/[^\">]".$word."[^\"<]/", $link, $text->text);
but I don't think I'm on the right track at all.
Thanks for your time.
From your one line of code, I'm assuming that you have that line in a loop which iterates through all the $words you want to replace, which causes the problem.
What you need to do is put all those replacements into a single preg_replace call. For that, regexes provide alternation. Say your list of words consists of test, text and this. Then you could do:
preg_replace('/(test|text|this)/', $link, $text->text);
And if you have all your words in an array $words then you can generate the regex simply with:
$wordList = implode('|', $words);
preg_replace('/('.$wordList.')/', $link, $text->text);
You might want to add an i at the very end of your regex if your matching is supposed to be case-insensitive.
In case some of your words are parts of other words (e.g. you want to replace text and texture), you could check for word boundaries:
preg_replace('/\b('.$wordList.')\b/', $link, $text->text);
Is this what you are looking for?
EDIT: If you previously generated $link for every $word inside your loop, you can now refer to the matched word as $1 instead. If your $link was something like
$link = ''.$word.'';
you can now simply use
$link = '$1';
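A sketch that puts the answer's steps together, assuming PHP 7.4+ (for the arrow function) and reusing the href from the question as a placeholder; link_words is a hypothetical name. It additionally passes each word through preg_quote(), which the plain implode() omits, so words containing regex metacharacters such as "." are matched literally. Because preg_replace makes one pass, the links it inserts are never rescanned.

```php
<?php
// Hypothetical helper: link every listed word in ONE preg_replace call.
function link_words(string $text, array $words): string
{
    // Escape each word so it is matched literally inside the alternation.
    $quoted = array_map(fn ($w) => preg_quote($w, '/'), $words);
    // \b...\b avoids matching inside longer words; /i makes it case-insensitive.
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    // $1 is the word as it appeared in the text, so capitalization is kept.
    $link = '<a href="/index.php?r=texts/addWord" class="wordLink">$1</a>';
    return preg_replace($pattern, $link, $text);
}

echo link_words('This is a test text. Test, comma instead of space.', ['test', 'text']), "\n";
```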

preg_replace proper regex for repeating character

OK, I'm stuck again; this time it's a problem with the regex. I searched Google and SO, but didn't find a post that solved it. So, to make a long story short:
$text = database entry string (could be anything)
$text gets parsed, and the regex should replace everything between two * characters with:
[bla].$matchedtext.[blub]
So I've tried to find the right regex for that and that's what I came up with:
$text= preg_replace('~(/\*([^\"]*?)\*/)~', "$1<b>$2</b>", $text);
The two * characters of each match should disappear as well.
Obviously it doesn't work, otherwise I wouldn't be posting. Any ideas?
This should probably do it:
preg_replace('/\*([^"*]*)\*/', '<b>\1</b>', $text);
A few comments on your earlier regular expression:
[^\"]*?
The non-greedy *? is not necessary. With a negated character set, simply add * to the set itself, so the match cannot run past the closing asterisk, and a plain greedy * is enough. Also, the double quote doesn't need escaping inside a single-quoted PHP string.
[^"*]*
You only need capturing groups for the parts you want to keep; in your case, you don't need to remember the opening and closing asterisks. So you can do the whole match with just one capturing group.
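A minimal sketch of the same idea as a function; stars_to_bold is a hypothetical name, and the pattern differs slightly from the answer's: it allows double quotes inside the bold text and uses + instead of *, so an empty "**" is left untouched. It does not escape HTML in the database text.

```php
<?php
// Hypothetical helper: turn *text* into <b>text</b>, dropping both asterisks.
function stars_to_bold(string $text): string
{
    // [^*]+ stops at the next asterisk, so *a* and *b* stay separate matches.
    return preg_replace('/\*([^*]+)\*/', '<b>$1</b>', $text);
}

echo stars_to_bold('plain *bold* plain *more bold*'), "\n";
```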

Automatically convert keywords to links in php

I am trying to convert specific keywords in text, which are stored in array, to the links.
Example text:
$text='This text contains many keywords, but also formated keywords.'
Now I want to convert the word keywords into a link to #keywords.
I used the very simple preg_replace function
preg_replace('/keywords/i',' keywords ',$text);
but it also converts the keyword that is already formatted as a link, so I get messy HTML like:
$text='This text contains many keywords, but also formated keywords" title="keywords">keywords</a>.'
Expected result:
$text='This text contains many keywords, but also formated keywords.'
Any suggestions?
Thanks!
EDIT
We are one step away from the perfect function, but it still doesn't work in this case:
$text='This text contains many keywords, but also formated
keywords.'
In this case it also replaces the word keywords inside the href, so we again get messy code like:
keywords.com/keywords" title="keywords">keywords</a>
I'm not great with regular expressions, but maybe this one will work:
/[^#>"]keywords/i
I think it will ignore any instances of #keywords, >keywords, and "keywords, and match the rest.
EDIT:
After testing it, it turns out that pattern replaces the space before the word as well, and doesn't work if keywords is at the beginning of the string. It also doesn't preserve the original capitalization. I have tested the following one, and it works perfectly for me:
$string = "Keywords and keywords, plus some more keywords with the original keywords.";
$string = preg_replace("/(?<![#>\"])keywords/i", "$0", $string);
echo $string;
The first three are replaced, preserving the original capitalization, and the last one is left untouched. This one uses a negative lookbehind and backreferences.
EDIT 2:
OP edited question. With the new example provided, the following regex will work:
$string = 'This text contains many keywords, but also formated keywords.';
$string = preg_replace("/(?<![#>\".\/])keywords/i", "$0", $string);
echo $string;
// outputs: This text contains many keywords, but also formated keywords.
This will replace all instances of keywords that are not preceded by #, >, ", ., or /.
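A minimal sketch of the lookbehind idea with a complete replacement string, assuming the link target is #keywords (a hypothetical href) and adding \b word boundaries so that words such as "mykeywords" are skipped; link_keyword is a hypothetical name. As the next answer points out, it will still link a keyword that sits inside existing link text after a space.

```php
<?php
// Hypothetical helper: link "keywords" unless preceded by #, >, ", . or /.
function link_keyword(string $text): string
{
    // (?<!...) is a negative lookbehind; $0 reinserts the match with its original case.
    return preg_replace(
        '/(?<![#>".\/])\bkeywords\b/i',
        '<a href="#keywords">$0</a>',
        $text
    );
}

echo link_keyword('Keywords and #keywords'), "\n";
```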
Here is the problem:
The keyword could be inside the href, the title, or the text of the link, and anywhere in there (for example, if the keyword was sanity and you already had href="insanity"). Or, even worse, you could have a non-keyword link that happens to contain a keyword, something like:
Click here to find more keywords and such!
In the above example, even though it meets every other possible criterion (spaces before and after the word being the easiest one to test for), it would still result in a link within a link, which I think breaks the internet.
Because of this, you need to use lookaheads and lookbehinds to check whether the keyword is wrapped in a link. But there is one catch: in PHP's PCRE engine a lookbehind has traditionally had to match a fixed-length string, so quantified wildcards such as .? or .* are not allowed inside it.
I thought I'd be the hero and show you the easy fix for your issue, which would be something to the effect of:
'/(?<!\<a.?>)(?:list|of|keywords)(?!\<\/a>)/'
Except you can't do that because the lookbehind in this case has that wildcard. Without it, you end up with a super greedy expression.
So my proposed alternative is to use a regex to find all link elements, use str_replace to swap them out for placeholders, and then put the original links back in place of the placeholders at the end.
Here's how I did it:
$text='This text contains many keywords, but also formated keywords.';
$keywords = array('text', 'formatted', 'keywords');
// First, find every existing link element. The lazy .*? keeps each match
// inside a single <a>...</a>, and /s lets a link span several lines.
preg_match_all('/<a\b.*?<\/a>/is', $text, $links);
$links = array_unique($links[0]); // Cleaning up array for next step.
// Second, swap out all matches with a placeholder, and build restore array:
$restore_links = array();
foreach($links as $count => $link) {
    $link_key = "xxx_{$count}_xxx";
    $restore_links[$link_key] = $link;
    $text = str_replace($link, $link_key, $text);
}
// Third, we build a nice replacement array for the keywords:
$keyword_links = array();
foreach($keywords as $keyword) {
    $keyword_links[$keyword] = "<a href='#$keyword'>$keyword</a>";
}
// Add the restore links after the keyword links for one mass replacement.
// strtr() makes a single pass, so links it inserts are never rescanned.
$keyword_links = $keyword_links + $restore_links;
$text = strtr($text, $keyword_links);
echo $text;
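As an alternative to the placeholder approach above (a different technique, not the answer's), the text can be split into existing links and the text between them with preg_split, and keywords linked only in the non-link chunks. A sketch, assuming PHP 7.4+ and that links are plain, non-nested <a>...</a> elements; the function name and the #keyword href are hypothetical. For arbitrary HTML, a real parser such as DOMDocument is more robust than any regex.

```php
<?php
// Hypothetical helper: link keywords everywhere except inside existing <a> elements.
function link_keywords_outside_links(string $text, array $keywords): string
{
    $quoted = array_map(fn ($k) => preg_quote($k, '/'), $keywords);
    $keywordPattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    // The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps each link as its own chunk,
    // so the result alternates: text, link, text, link, ..., text.
    $chunks = preg_split('/(<a\b.*?<\/a>)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($chunks as $i => $chunk) {
        if ($i % 2 === 0) { // even indexes are plain text, odd indexes are links
            $chunks[$i] = preg_replace($keywordPattern, '<a href="#$1">$1</a>', $chunk);
        }
    }
    return implode('', $chunks);
}

echo link_keywords_outside_links('many keywords, <a href="#x">formatted keywords</a>.', ['keywords']), "\n";
```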
You can change your regex so that it only targets keywords with a space in front, since the already-formatted keywords are not preceded by a space. Here is an example:
$text = preg_replace('/ keywords/i',' keywords',$text);
