I would like to extract the last 4 words from a string as a single chunk.
For example, if I have:
"Are You Looking For The Best Website Review"
I'd like to catch the last four words as:
"The Best Website Review"
I only have basic coding knowledge and have tried every variation I could find within this forum.
The closest I've come is by using the suggestion of Rick Kuipers (How to obtain the last word of a string) but this gives me the words as individual values.
$split = explode(" ", $str);
echo $split[count($split)-4];
echo $split[count($split)-3];
echo $split[count($split)-2];
echo $split[count($split)-1];
My coding knowledge is limited so any suggestions would be appreciated.
Simply use str_word_count together with array_splice, like so:
$str = "Are You Looking For The Best Website Review";
$arr = str_word_count($str,1);
echo implode(' ',array_splice($arr,-4));
Edited
If your text contains hyphens, as in Are-You-Looking-For-The-Best-Website-Review, then you can use str_replace first, like so:
$arr = str_word_count(str_replace('-',' ',$str),1);
echo implode(' ',array_splice($arr,-4));
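A variant that stays close to the explode() attempt in the question may also help. This is only a sketch: the function name last_words and its $count parameter are made up for illustration, and it assumes that words are separated by whitespace. It uses array_slice, which (unlike array_splice) leaves the original array untouched.

```php
<?php
// Hypothetical helper: return the last $count whitespace-separated words as one string.
function last_words(string $str, int $count): string
{
    // Split on any run of whitespace and drop empty pieces (leading/trailing spaces).
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    // A negative offset takes elements from the end; shorter inputs are returned whole.
    return implode(' ', array_slice($words, -$count));
}

echo last_words("Are You Looking For The Best Website Review", 4);
```

If the string has fewer than four words, the whole string comes back rather than an error.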
You can use a simple regular expression for this:
<?php
$subject = "Are You Looking For The Best Website Review";
$pattern = '/(\w+\s+\w+\s+\w+\s+\w+)\s*$/u';
preg_match($pattern, $subject, $tokens);
var_dump($tokens[1]); // group 1 holds the four words without any trailing whitespace
The output is:
string(23) "The Best Website Review"
This also has the advantage that the type and amount of whitespace between words is irrelevant, and trailing whitespace at the end of the string is ignored.
Edit: considering your comment on the answer posted by @Uchiha, this is a variant of the pattern that shows how you can easily match words delimited by characters other than whitespace:
$pattern = '/(\w+[-\s]+\w+[-\s]+\w+[-\s]+\w+)\s*$/u';
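The repeated \w+[-\s]+ part can probably be written more compactly with a quantifier. The following is a sketch of that idea rather than a tested drop-in; the variable names are only illustrative.

```php
<?php
$subject = "Are-You-Looking-For-The-Best-Website-Review  ";
// (?:\w+[-\s]+){3} = three words, each followed by hyphens or whitespace,
// then a fourth word; \s*$ allows trailing whitespace before the end.
$pattern = '/((?:\w+[-\s]+){3}\w+)\s*$/u';

$lastFour = null;
if (preg_match($pattern, $subject, $tokens)) {
    $lastFour = $tokens[1]; // capture group 1, without the trailing whitespace
}
var_dump($lastFour);
```

Changing the 3 to N-1 would capture the last N words.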
Related
Imagine if:
$string = "abcdabcdabcdabcdabcdabcdabcdabcd";
How do I remove the repeated sequence of characters (all characters, not just letters) in the string so that the new string would only have "abcd"? Perhaps by running a function that returns a new string with the repetitions removed.
$new_string = remove_repetitions($string);
The possible string before removing the repetition is always like above. I don’t know how else to explain since English is not my first language. Other examples are,
$string = "EqhabEqhabEqhabEqhabEqhab";
$string = "o=98guo=98guo=98gu";
Note that I want it to work with other sequence of characters as well. I tried using Regex but I couldn't figure out a way to accomplish it. I am still new to php and Regex.
For details : https://algorithms.tutorialhorizon.com/remove-duplicates-from-the-string/
Different programming languages have different ways to remove duplicate characters from a string.
Example in PHP:
<?php
$str = "Hello World!";
echo count_chars($str,3);
?>
Output: " !HWdelor" (each distinct character once, sorted by byte value, so the space comes first)
https://www.w3schools.com/php/func_string_count_chars.asp
Here, if we wish to remove the repeating substrings, I can't think of a way other than knowing in advance what we wish to collect, since the patterns seem complicated.
In that case, we could simply use a capturing group, put our desired output in it, and then remove everything else:
(abcd|Eqhab|guo=98)
I'm guessing there should be a simpler way to do this, though.
Test
$re = '/.+?(abcd|Eqhab|guo=98)\1.+/m';
$str = 'abcdabcdabcdabcdabcdabcdabcdabcd
EqhabEqhabEqhabEqhabEqhab
o98guo=98guo=98guo=98guo=98guo=98guo=98guo98';
$subst = '$1';
$result = preg_replace($re, $subst, $str);
echo $result;
You did not say what exactly to remove. A "sequence of characters" can be as small as just 1 character.
So this simple regex should work (note that PHP has no g modifier; preg_replace already replaces every match):
echo preg_replace('/(.)(?=.*?\1)/', '', 'abcdabcdabcdabcdabcdabcd');
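If the whole string is always one unit repeated end to end, as in the question's examples, a backreference can find the shortest unit without knowing it in advance. This is a minimal sketch under that assumption; remove_repetitions is the name the question used, but the body here is my own guess at what is wanted.

```php
<?php
// Collapse a string that consists of one unit repeated two or more times.
// Strings that are not a pure repetition are returned unchanged.
function remove_repetitions(string $string): string
{
    // ^(.+?)  the shortest possible unit (lazy), captured as group 1
    // \1+     followed by one or more copies of that same unit
    // \z      and nothing else up to the very end; s lets . match newlines
    return preg_replace('/^(.+?)\1+\z/s', '$1', $string);
}

echo remove_repetitions("abcdabcdabcdabcdabcdabcdabcdabcd"), "\n";
echo remove_repetitions("EqhabEqhabEqhabEqhabEqhab"), "\n";
echo remove_repetitions("o=98guo=98guo=98gu"), "\n";
```

Because the unit is matched lazily, a string like "aaaa" collapses to "a" rather than "aa".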
I can't tell you guys how many hours I've spent on this one. I simply want to IGNORE any instances of keywords that are BETWEEN the strong tags. Whether they are directly next to the tags or somewhere in between with other words. All while keeping the keywords case-insensitive.
Example:
The man drove in his car. Then <strong>the man walked to the boat.</strong>
The word boat (inside the strong tags) should be ignored and the word car should be replaced.
$keywords = array(
'boat',
'car',
);
$p = implode('|', array_map('preg_quote', $keywords));
$string = preg_replace("/\b($p)\b/i", 'gokart', $string, 4);
You can use a SKIP-FAIL regex to replace only what is clearly outside a pair of non-identical delimiters:
<strong>.*?<\/strong>(*SKIP)(*FAIL)|\b(boat|car)\b
Here is an IDEONE demo:
$str = "The man drove in his car.Then <strong>the man walked to the boat.</strong>";
$keywords = array('boat','car');
$p = implode('|', array_map('preg_quote', $keywords));
$result = preg_replace("#<strong>.*?<\/strong>(*SKIP)(*FAIL)|\b($p)\b#i", "gokart", $str);
echo $result;
NOTE that in this case, we most probably are not interested in a tempered greedy token solution inside the SKIP-FAIL block (that I posted initially, see revision history) since we do not care what is in between the delimiters.
Regex is probably not the best way to do something like this.
It would probably be best to use a DOM parser or something similar to properly find the <strong> tags.
A few of the answers here offer some good options: RegEx: Matching text that is not inside and part of a HTML tag
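For what it's worth, the DOM route could look roughly like the sketch below. It assumes PHP's DOM extension is available; the function name replace_outside_strong and the wrapper <div> are my own choices for illustration, not anything from the question.

```php
<?php
// Replace keywords in text nodes only, skipping anything inside <strong>.
function replace_outside_strong(string $html, array $keywords, string $replacement): string
{
    $quoted = array_map(function ($k) { return preg_quote($k, '/'); }, $keywords);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root element; the flags
    // stop libxml from adding <html>/<body> wrappers and a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Select every text node that has no <strong> ancestor.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()[not(ancestor::strong)]') as $node) {
        $node->nodeValue = preg_replace($pattern, $replacement, $node->nodeValue);
    }

    // Serialize the children of the wrapper <div>, leaving the wrapper out.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = "The man drove in his car. Then <strong>the man walked to the boat.</strong>";
echo replace_outside_strong($str, array('boat', 'car'), 'gokart');
```

This is heavier than the regex, but it keeps working when the tags carry attributes or are nested.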
How can I replace a string starting with 'a' and ending with 'z'?
Basically, I want to be able to do the same thing as str_replace but be indifferent to the values in between two strings in a 'haystack'.
Is there a built-in function for this? If not, how would I go about efficiently making a function that accomplishes it?
That can be done with Regular Expression (RegEx for short).
Here is a simple example:
$string = 'coolAfrackZInLife';
$replacement = 'Stuff';
$result = preg_replace('/A.*Z/', $replacement, $string);
echo $result;
The above example will return coolStuffInLife
A little explanation of the given RegEx /A.*Z/:
- The slashes indicate the beginning and end of the RegEx;
- A and Z are the start and end characters between which you need to replace;
- . matches any single character (except a newline by default);
- * matches zero or more of the preceding token (in our case, any characters);
- You can optionally use + instead of *, which will match only if there is at least one character in between.
Take a look at Rubular.com for a simple way to test your RegExs. It also provides short RegEx reference
$string = "I really want to replace aFGHJKz with booo";
$new_string = preg_replace('/a[a-zA-Z]+z/', 'boo', $string); // A-Z, not A-z, which would also match [ \ ] ^ _ and `
echo $new_string;
Be wary of the regex: do you want to stop at the first z or the last z? Can only letters appear in between, or alphanumerics too? There are various scenarios you'd need to explain before I could expand on the regex.
Use preg_replace so you can use regex patterns.
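To make the first-z versus last-z point concrete, here is a small sketch with a made-up input string. The greedy .* runs to the last Z it can find, while the lazy .*? stops at the first one.

```php
<?php
$string = 'xxAfooZbarZyy'; // hypothetical input containing two Z characters

// Greedy: .* grabs as much as possible, so the match ends at the LAST Z.
$greedy = preg_replace('/A.*Z/', '-', $string);

// Lazy: .*? grabs as little as possible, so the match ends at the FIRST Z.
$lazy = preg_replace('/A.*?Z/', '-', $string);

echo $greedy, "\n"; // xx-yy
echo $lazy, "\n";   // xx-barZyy
```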
I am trying to convert specific keywords in text, which are stored in array, to the links.
Example text:
$text='This text contains many keywords, but also formated keywords.'
So now I want to turn each occurrence of the word keywords into a link pointing to #keywords.
I used a very simple preg_replace call:
preg_replace('/keywords/i',' keywords ',$text);
but obviously it also converts the occurrence that is already formatted as a link, so I get messy HTML like:
$text='This text contains many keywords, but also formated keywords" title="keywords">keywords</a>.'
Expected result:
$text='This text contains many keywords, but also formated keywords.'
Any suggestions?
THX
EDIT
We are one step from the perfect function, but still not working well in this case:
$text='This text contains many keywords, but also formated
keywords.'
In this case it replaces also the word keywords in the href, so we again get the messy code like
keywords.com/keywords" title="keywords">keywords</a>
I'm not great with regular expressions, but maybe this one will work:
/[^#>"]keywords/i
What I think it will do is ignore any instances of #keywords, >keywords, and "keywords and find the rest.
EDIT:
After testing it out, it looks like that replaces the space before the word as well, and doesn't work if keywords is the beginning of the string. It also didn't preserve original capitalization. I have tested this one, and it works perfectly for me:
$string = "Keywords and keywords, plus some more keywords with the original keywords.";
$string = preg_replace("/(?<![#>\"])keywords/i", "$0", $string);
echo $string;
The first three are replaced, preserving the original capitalization, and the last one is left untouched. This one uses a negative lookbehind and backreferences.
EDIT 2:
OP edited question. With the new example provided, the following regex will work:
$string = 'This text contains many keywords, but also formated keywords.';
$string = preg_replace("/(?<![#>\".\/])keywords/i", "$0", $string);
echo $string;
// outputs: This text contains many keywords, but also formated keywords.
This will replace all instances of keywords that are not preceded by #, >, ", ., or /.
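As a self-contained illustration of the negative lookbehind, here is a sketch that assumes the links look like <a href="#keywords">keywords</a>. That markup is my assumption about the intended format, as is the sample string.

```php
<?php
// One occurrence is already a link; the other two are plain text.
$string = 'Keywords and keywords, plus <a href="#keywords">keywords</a>.';

// (?<![#>"]) fails when the character just before the match is #, > or ",
// which is how the occurrences inside the existing link are skipped.
// $0 in the replacement is the matched text, so capitalization is preserved.
$result = preg_replace('/(?<![#>"])keywords/i', '<a href="#$0">$0</a>', $string);

echo $result;
```

The check only looks at one preceding character, so link text such as >more keywords< would still be matched; that limitation is inherent to this approach.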
Here is the problem:
The keyword could be inside the href, the title, or the text of the link, and anywhere in there (like if the keyword was sanity and you already had href="insanity"). Or even worse, you could have a non-keyword link that happens to contain a keyword, something like:
Click here to find more keywords and such!
In the above example, even though it meets every other possible criterion (having spaces before and after is the easiest one to test for), it would still result in a link within a link, which I think breaks the internet.
Because of this, you need to use lookaheads and lookbehinds to check if the keyword is wrapped in a link. But there is one catch: lookbehinds must have a fixed length (meaning no open-ended quantifiers such as * or +).
I thought I'd be the hero and show you the easy fix for your issue, which would be something to the effect of:
'/(?<!\<a.?>)[list|of|keywords](?!\<\/a>)/'
Except you can't do that because the lookbehind in this case has that wildcard. Without it, you end up with a super greedy expression.
So my proposed alternative is to use regex to find all link elements, then str_replace to swap them out with placeholders, and then swap the original links back in for the placeholders at the end.
Here's how I did it:
$text='This text contains many keywords, but also formated keywords.';
$keywords = array('text', 'formatted', 'keywords');
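// This is just to make the regex easier: an alternation such as (?:text|formatted|keywords)
$quoted = array_map(function ($k) { return preg_quote($k, '/'); }, $keywords);
$keyword_list_pattern = '(?:' . implode('|', $quoted) . ')';
// First, get all link elements that contain a keyword (lazy .*? keeps each match as short as possible)
preg_match_all('/<a\b.*?' . $keyword_list_pattern . '.*?<\/a>/', $text, $links);
$links = array_unique($links[0]); // Cleaning up array for next step.
$restore_links = array(); // Stays empty when the text contains no links.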
// Second, swap out all matches with a placeholder, and build restore array:
foreach($links as $count => $link) {
$link_key = "xxx_{$count}_xxx";
$restore_links[$link_key] = $link;
$text = str_replace($link, $link_key, $text);
}
// Third, we build a nice replacement array for the keywords:
foreach($keywords as $keyword) {
$keyword_links[$keyword] = "<a href='#$keyword'>$keyword</a>";
}
// Merge the restore links to the bottom of the keyword links for one mass replacement:
$keyword_links = array_merge($keyword_links, $restore_links);
$text = str_replace(array_keys($keyword_links), $keyword_links, $text);
echo $text;
You can change your RegEx so that it only targets keywords with a space in front, since the formatted keywords do not contain a space. Here is an example.
$text = preg_replace('/ keywords/i',' keywords',$text);
I am being "thick this morning", so please excuse this simple question. I have an array of keywords, e.g. array('keyword1','keyword2'....), and I have a string of text (a bit like blog content in length, i.e. not just a few words but maybe 200-800 words). What is the best way to search the string for the keywords and replace them with an href link? So in the text, 'keyword1' (as plain text) will become <a href='apage'>keyword1</a> and so on.
See, I said I was being thick this morning.
Thanks in advance.
Typical preg_replace case:
$text = "There is some text keyword1 and lorem ipsum keyword2.";
$keywords = array('keyword1', 'keyword2');
$regex = '/('.implode('|', $keywords).')/i';
// You might want to make the replacement string more dependent on the
// keyword matched, but you'll have to tell us more about it
$output = preg_replace($regex, '\\1', $text);
print_r($output);
Now the above doesn't do a very "smart" replace in the sense that the href is not a function of the matched keyword, while in practice you will probably want to do that. Look into preg_replace_callback for more flexibility here, or edit the question and provide more information regarding your goal.
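A rough sketch of the preg_replace_callback route follows. The keyword-to-page map ($pages and its file names) is invented for illustration, since the question doesn't say where each keyword should link.

```php
<?php
$text = "There is some text keyword1 and lorem ipsum keyword2.";

// Hypothetical map from keyword to target page; adjust to your own URLs.
$pages = array(
    'keyword1' => 'page-one.html',
    'keyword2' => 'page-two.html',
);

// Escape each keyword for use inside a /.../ pattern and match whole words only.
$quoted = array_map(function ($k) { return preg_quote($k, '/'); }, array_keys($pages));
$regex = '/\b(' . implode('|', $quoted) . ')\b/i';

// The callback receives the match and picks the href for that keyword.
$output = preg_replace_callback($regex, function ($m) use ($pages) {
    $href = $pages[strtolower($m[1])];
    return '<a href="' . htmlspecialchars($href) . '">' . $m[1] . '</a>';
}, $text);

echo $output;
```

The lookup uses strtolower because the pattern is case-insensitive while the array keys are lowercase.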
WHY would you use regex instead of just str_replace()!? Regex works, but it overcomplicates such an incredibly simple question.
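A regex-free version might look like the sketch below. It uses strtr with an array rather than str_replace, because strtr does not re-scan text it has already replaced; 'anotherpage' is a made-up target next to the 'apage' from the question.

```php
<?php
$text = "There is some text keyword1 and lorem ipsum keyword2.";

// Map each keyword to its full replacement link.
$replacements = array(
    'keyword1' => "<a href='apage'>keyword1</a>",
    'keyword2' => "<a href='anotherpage'>keyword2</a>", // hypothetical target
);

// strtr replaces all keys in one pass over the original text.
$output = strtr($text, $replacements);

echo $output;
```

The trade-off is that this is case-sensitive and also matches inside longer words, which is where the regex versions with \b and the i flag earn their keep.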