PHP Find all occurrences of a whole word from a text

Here is my problem. I have a text in PHP:
$text = "Car is going with 10 meters/second";
$find = array("meters", "meters/second");
Now when I do this:
foreach ($find as $f)
{
    $count = substr_count($text, $f);
    echo "$f -> $count\n";
}
The output is:
meters -> 1
meters/second -> 1
I consider meters/second a whole word because no space separates its parts, so meters on its own should not be counted, only meters/second.
What I expect:
meters -> 0
meters/second -> 1

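You can do it with a regular expression. \b won't work here because / is a non-word character, so there is a word boundary between meters and /. Something like this should work (the commas are the pattern delimiters, which avoids escaping /):
preg_match_all(",meters([^/]|$),", $text, $matches);
print_r($matches[0]);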

$exists = preg_match("/\bmeters\b/", $text);
\b stands for word boundary. Note that / is not a word character, so this pattern also matches the meters in meters/second.

To do what you want, you will have to use regular expressions. Something like:
$text = "Car is going with 10 meters/second";
$find = array("/\bmeters\b/", "/\bmeters\/second\b/");
foreach($find as $f) {
print(preg_match_all($f, $text));
}
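
The \b-based patterns above still count meters inside meters/second, because \b sees a boundary before the slash. A minimal sketch of one alternative, assuming a "whole word" means a run of characters delimited by whitespace or the ends of the string (countWholeWord is a hypothetical helper name, not a built-in):

```php
<?php
// Hypothetical helper: counts $word only when it is delimited by whitespace
// or the string edges, so "meters" inside "meters/second" is not counted.
function countWholeWord(string $text, string $word): int
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    $pattern = '/(?<!\S)' . preg_quote($word, '/') . '(?!\S)/';
    return (int) preg_match_all($pattern, $text);
}

$text = "Car is going with 10 meters/second";
foreach (array("meters", "meters/second") as $f) {
    echo $f, " -> ", countWholeWord($text, $f), PHP_EOL;
}
```

The lookarounds (?<!\S) and (?!\S) only assert that no non-whitespace character touches the word, so punctuation such as a trailing comma would also block a match; whether that is wanted depends on the data.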

Related

php regex replace each character with asterisk

I am trying to do something like this: hide usernames except for the first 3 characters.
Examples:
apple -> app**
google -> goo***
abc12345 -> abc*****
I am currently using PHP like this:
$string = "abcd1234";
$regex = '/(?<=^(.{3}))(.*)$/';
$replacement = '*';
$changed = preg_replace($regex,$replacement,$string);
echo $changed;
and the result is:
abc*
But I want to replace every single character except the first 3, like:
abc*****
How should I do this?

Don't use regex, use substr_replace:
$var = "abcdef";
$charToKeep = 3;
echo strlen($var) > $charToKeep ? substr_replace($var, str_repeat('*', strlen($var) - $charToKeep), $charToKeep) : $var;
This will output:
abc***
Keep in mind that regular expressions are good for matching patterns in strings, but there are a lot of functions already designed for string manipulation.

Try this function. You can specify how many characters should stay visible and which character should be used as the mask:
$string = "abcd1234";
echo hideCharacters($string, 3, "*");
function hideCharacters($string, $visibleCharactersCount, $mask)
{
    // Strings shorter than the visible part are returned unchanged
    if (strlen($string) < $visibleCharactersCount)
        return $string;
    $part = substr($string, 0, $visibleCharactersCount);
    // Pad the visible part with the mask up to the original length
    return str_pad($part, strlen($string), $mask, STR_PAD_RIGHT);
}
Output:
abc*****

Your regex matches all the characters after the first 3 as a single match, so they are all replaced with one hard-coded *.
You can use
'~(^.{3}|(?!^)\G)\K.~'
and replace with *.
This regex matches the first 3 characters (with ^.{3}) or the end of the previous successful match, but not the start of the string (with (?!^)\G), then drops what has been matched so far from the match value (with \K) and matches any character but a newline with the final dot.
$re = '~(^.{3}|(?!^)\G)\K.~';
$strs = array("aa","apple", "google", "abc12345", "asdddd");
foreach ($strs as $s) {
$result = preg_replace($re, "*", $s);
echo $result . PHP_EOL;
}

Another possible solution is to concatenate the first three characters with a string of * repeated the correct number of times:
$text = substr($string, 0, 3).str_repeat('*', max(0, strlen($string) - 3));
The max() call keeps str_repeat() from receiving a negative argument, which triggers a warning (a ValueError as of PHP 8). This situation happens when the length of $string is less than 3.
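
The strlen()/substr() based answers above count bytes, so a multibyte UTF-8 name could be cut in the middle of a character. A minimal sketch of a character-aware variant, assuming UTF-8 input (maskTail is a hypothetical helper name, and preg_split is used only to avoid depending on the mbstring extension):

```php
<?php
// Hypothetical helper, not from the answers above: masks every character
// after the first $keep, counting UTF-8 characters rather than bytes.
function maskTail(string $s, int $keep = 3, string $mask = '*'): string
{
    // Split into UTF-8 characters; the "u" flag requires valid UTF-8 input.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $s; // invalid UTF-8: leave the string untouched
    }
    // max() guards against a negative repeat count for short strings.
    $hidden = max(0, count($chars) - $keep);
    return implode('', array_slice($chars, 0, $keep)) . str_repeat($mask, $hidden);
}

foreach (array("aa", "apple", "abc12345", "éèàçù") as $s) {
    echo maskTail($s), PHP_EOL;
}
```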

Regex to delete words with numbers

I would like to delete words containing numbers (references) and short words (2 characters or fewer) from my product names, but I can't find the right regex.
Some examples:
"Chaine anti-rebond ECS-2035" should become "Chaine anti-rebond"
"Guide 35 cm Oregon Intenz" should become "Guide Oregon Intenz"
"Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V" should become "Tronçonneuse sans fil AKE - Guide"
I'm doing this in PHP:
preg_replace('#([^A-Za-z-]+)#', ' ',' '.wd_remove_accents($modele).' ');

You don't need to do everything in RegExp you know:
<?php
$str = "Chaine anti-rebond ECS-2035 cm 30 v";
$result = array();
$split = explode(" ", $str); //Split to an array
foreach ($split as $word) {
if ((strlen($word) <= 2) || (preg_match("|\d|", $word))) { //If word is <= 2 char long, or contains a digit
continue; //Continue to next iteration immediately
}
$result[] = $word; //Add word to result array (would only happen if the above condition was false)
}
$result = implode(" ", $result); //Implode result back to string
echo $result;
For word based string manipulation, parsing the string itself, conditioning exactly what you want on a word basis, is often much better than a string-level RegExp.

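To deal with Unicode characters like the ç in tronçonneuse you could use the following pattern with the u modifier, which makes PCRE treat the pattern and subject as UTF-8:
/\b(?:[\pL-]+\pN+|\pN+[\pL-]+|\pN+|\pL{1,2})\b/u
where \pL stands for any letter and \pN stands for any numeric character.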
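
As a rough sketch of how that pattern could be applied, assuming UTF-8 input and the u modifier (stripRefs is a hypothetical helper name; the whitespace cleanup is an addition, since removing words leaves gaps behind):

```php
<?php
// Sketch: remove reference-like and very short words with the Unicode-aware
// pattern, then collapse the whitespace left behind by the removed words.
function stripRefs(string $name): string
{
    $pattern = '/\b(?:[\pL-]+\pN+|\pN+[\pL-]+|\pN+|\pL{1,2})\b/u';
    $stripped = preg_replace($pattern, '', $name);
    return trim(preg_replace('/\s+/u', ' ', $stripped));
}

echo stripRefs("Chaine anti-rebond ECS-2035"), PHP_EOL;
echo stripRefs("Guide 35 cm Oregon Intenz"), PHP_EOL;
echo stripRefs("Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V"), PHP_EOL;
```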

Your requirements aren't specific enough for a final answer, but this would do it for your example:
$subject = 'Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V';
$regex = '/(\\s+\\w{1,2}(?=\\W+))|(\\s+[a-zA-Z0-9_-]+\\d+)/';
$result = preg_replace($regex, '', $subject);

Well, for the combinations in your example the following regex would do:
/\b(?:[-A-Za-z]+[0-9]+|[0-9]+[-A-Za-z]+|\d{1,2}|[A-Za-z]{1,2})\b/
Then just replace the match with an empty string.
However, it doesn't allow for strings like aaa897bbb - just aaa786 or 876aaa (and an optional dash).
I don't know what it is that you require - you would have to specify the rules in more detail before the regex can be refined.

Use preg_replace_callback and filter in the callback function http://www.php.net/manual/en/function.preg-replace-callback.php
This will work for all 3 test strings:
<?php
$str = "Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V";
function filter_cb($matches)
{
$word = trim($matches[0]);
if ($word !== '-' && (strlen($word) <= 2 || (preg_match("/\d/", $word)))) {
return '';
}
return $matches[0];
}
$result = preg_replace_callback('/([\p{L}\p{N}-]+\s*)/u', "filter_cb", $str);
echo trim($result);

Regex: Using capture data further in the regex

I want to parse some text that starts with ":" and may be surrounded by parentheses that end the match, so:
"abcd:(someText)efgh" and
"abcd:someText"
will both return someText.
But I have a problem making the parentheses optional.
I wrote this, but it does not work:
$reg = '#:([\\(]){0,1}([a-z]+)$1#i';
$v = 'abc:(someText)def';
var_dump(preg_match($reg,$v,$matches));
var_dump($matches);
The $1 makes it fail.
I don't know how to express this rule:
If there is a "(" at the beginning, there must be a ")" at the end.

In general you can't use a regular expression to check that the count of one thing equals the count of another. Classic regular expressions only describe regular languages (http://en.wikipedia.org/wiki/Regular_language), and matching arbitrarily nested parentheses needs a context-free language (http://en.wikipedia.org/wiki/Context-free_language).
For a single optional pair, though, you can simply list both forms as alternatives. Note that group 1 includes the parentheses when they are present:
'/:(\([a-z]+\)|[a-z]+)/i'
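
The rule from the question ("if there is a ( at the beginning, there must be a ) at the end") can also be written with a PCRE conditional subpattern. A minimal sketch, where extractAfterColon is a hypothetical helper name:

```php
<?php
// Sketch: group 1 captures an optional "(", and the conditional (?(1)\))
// demands a closing ")" only when group 1 took part in the match.
function extractAfterColon(string $input): ?string
{
    if (preg_match('/:(\()?([a-z]+)(?(1)\))/i', $input, $m)) {
        return $m[2]; // the text without the parentheses
    }
    return null;
}

var_dump(extractAfterColon('abcd:(someText)efgh'));
var_dump(extractAfterColon('abcd:someText'));
var_dump(extractAfterColon('abcd:(someText'));
```

This only checks an opening parenthesis for its partner; a stray ")" after an unparenthesized word is simply ignored.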

To have different sub-patterns of the regex write their match to the same element of the $matches array, you can use named subpatterns together with the internal option J, which allows duplicate names. The key of the element in $matches is the name of the subpattern:
$pattern = '~(?J:.+:\((?<text>[^)]+)\).*|.+:(?<text>.+))~';
$texts = array(
'abc:(someText)def',
'abc:someText'
);
foreach($texts as $text)
{
preg_match($pattern, $text, $matches);
echo $text, ' -> ', $matches['text'], '<br>';
}
Result:
abc:(someText)def -> someText
abc:someText -> someText

This regex matches either :word or :(word); groups 1 and 2 hold the respective results. The alternation is grouped so that the colon is required in both cases:
if (preg_match('/:(?:([a-z]+)|\(([a-z]+)\))/i', $subject, $regs)) {
    $result = ($regs[1] !== '') ? $regs[1] : $regs[2];
} else {
    $result = "";
}

regex: with look-behind
"(?<=:\(|:)[^()]+"
test with grep:
kent$ echo "abcd:(someText)efgh
dquote> abcd:someOtherText"|grep -Po "(?<=:\(|:)[^()]+"
someText
someOtherText

Try this
.+:\((.+)\).*|.+:(.+)
if $1 is empty there are no parentheses and $2 has your text.

Identifying a random repeating pattern in a structured text string

I have a string that has the following structure:
ABC_ABC_PQR_XYZ
Where PQR has the structure:
ABC+JKL
and
ABC itself is a string that can contain alphanumeric characters and a few other characters like "_", "-", "+", "." and follows no set structure,
e.g. qWe_rtY-asdf or pkl123
so, in effect, the string can look like this:
qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ
My goal is to find out what string constitutes ABC.
I was initially just using
$arrString = explode("_",$string);
to return $arrString[0] before I was made aware that ABC ($arrString[0]) itself can contain underscores, thus rendering it incorrect.
My next attempt was exploding it on "_" anyway and then comparing each of the exploded parts with the first part until I get a semblance of a pattern:
function getPatternABC($string)
{
$count = 0;
$pattern ="";
$arrString = explode("_", $string);
foreach($arrString as $expString)
{
if(strcmp($expString,$arrString[0])!==0 || $count==0)
{
$pattern = $pattern ."_". $arrString[$count];
$count++;
}
else break;
}
return substr($pattern,1);
}
This works great - but I wanted to know if there was a more elegant way of doing this using regular expressions?

Here is the regex solution:
'^([a-zA-Z0-9_+-]+)_\1_\1\+'
What this does is match (starting from the beginning of the string) the longest possible sequence consisting of the characters inside the square brackets (edit that per your spec; for example, add . if ABC can contain dots). The sequence must appear exactly twice, each time followed by an underscore, and then must appear once more followed by a plus sign (this is actually the first half of PQR with the delimiter before JKL). The rest of the input is ignored.
You will find ABC captured as capture group 1.
So:
$input = 'qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ';
$result = preg_match('/^([a-zA-Z0-9_+-]+)_\1_\1\+/', $input, $matches);
if ($result) {
    echo $matches[1];
}

Sure, just make a regular expression that matches your pattern. In this case, something like this:
preg_match('/^([a-zA-Z0-9_+.-]+)_\1_\1\+JKL_XYZ$/', $string, $match);
Your ABC is in $match[1].

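If underscores rarely occur inside these strings, it may be worth checking whether a simple explode() will do before bothering with regex.
<?php
$str = 'ABC_ABC_PQR_XYZ';
// Exactly three underscores means ABC contains none of its own
if (substr_count($str, '_') == 3)
    // Index the array directly; reset() needs a variable, not a function result
    $abc = explode('_', $str)[0];
else
    $abc = regexy_function($str); // placeholder for a regex-based lookup
?>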
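
A sketch of how the fast path and a regex fallback could be combined, assuming the ABC_ABC_ABC+JKL_XYZ layout from the question (findAbc is a hypothetical stand-in for the regexy_function() placeholder, and the character class is an assumption based on the characters the question lists):

```php
<?php
// Hypothetical stand-in for the regexy_function() placeholder above.
function findAbc(string $str): ?string
{
    // Fast path: exactly three underscores means ABC has none of its own.
    if (substr_count($str, '_') === 3) {
        return explode('_', $str)[0];
    }
    // Otherwise ABC must repeat as ABC_ABC_ABC+ at the start of the string.
    if (preg_match('/^([a-zA-Z0-9_+.-]+)_\1_\1\+/', $str, $m)) {
        return $m[1];
    }
    return null; // the string does not follow the expected layout
}

echo findAbc('pkl123_pkl123_pkl123+JKL_XYZ'), PHP_EOL;
echo findAbc('qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ'), PHP_EOL;
```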

Auto-link URLs in a string

I have a plain message output, $msg. If it contains links (text starting with http:// or www.), I want them turned into clickable links, e.g. http://google.com.
I have already stripped HTML from the messages:
$msg = htmlspecialchars(strip_tags($show["status"]), ENT_QUOTES, 'utf-8');
How can that be done? I have seen it in many places.

I had the same problem as @SublymeRick (the match stops after the first dot, see Auto-link URLs in a string).
With a little inspiration from https://stackoverflow.com/a/8218223/593957 I changed it to
$msg = preg_replace('/((http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?)/', '<a href="\1">\1</a>', $msg);

Use a regular expression for this, via PHP's preg_replace() function.
Something like this:
preg_replace('/\b(https?:\/\/(.+?))\b/', '<a href="\1">\1</a>', $text);
Explanation:
It looks for (https?://(.+?)) surrounded by \b, which is a beginning-of-word / end-of-word marker.
https?:// is obvious (the s? means that the 's' is optional).
(.+?) means one or more of any character: 'any character' is represented by the dot; 'one or more' is the plus sign. The question mark means it isn't greedy, so it will allow the item after it (i.e. the \b end of word) to match at the first opportunity. This stops it from carrying on till the end of the string. Be aware that \b also matches right after www in http://www.google.com, so in practice this pattern stops at the first dot.
The whole expression is in brackets so that it gets picked up by the replacement system and can be re-inserted using \1 in the second parameter.

Something like:
preg_replace('#(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)#', '<a href="$1">$1</a>', $text);
maybe?

function AutoLinkUrls($str,$popup = FALSE){
if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches)){
$pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
for ($i = 0; $i < count($matches['0']); $i++){
$period = '';
if (preg_match("|\.$|", $matches['6'][$i])){
$period = '.';
$matches['6'][$i] = substr($matches['6'][$i], 0, -1);
}
$str = str_replace($matches['0'][$i],
$matches['1'][$i].'<a href="http'.
$matches['4'][$i].'://'.
$matches['5'][$i].
$matches['6'][$i].'"'.$pop.'>http'.
$matches['4'][$i].'://'.
$matches['5'][$i].
$matches['6'][$i].'</a>'.
$period, $str);
}//end for
}//end if
return $str;
}//end AutoLinkUrls
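
For comparison, a shorter sketch of the same idea built on preg_replace_callback. It assumes the message has already been passed through htmlspecialchars(), as in the question, and that trailing punctuation should stay outside the link; autoLink is a hypothetical helper name and the URL pattern is deliberately loose:

```php
<?php
// Sketch: wraps http(s):// and www. URLs in anchors. Assumes $escaped has
// already been HTML-escaped, so the matched text is safe to place in href.
function autoLink(string $escaped): string
{
    return preg_replace_callback(
        '~\b(?:https?://|www\.)[^\s<]+~i',
        function (array $m): string {
            // Keep trailing sentence punctuation outside the link.
            $url  = rtrim($m[0], '.,;:!?)');
            $tail = substr($m[0], strlen($url));
            // Bare www. links need a scheme to be clickable.
            $href = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;
            return '<a href="' . $href . '">' . $url . '</a>' . $tail;
        },
        $escaped
    );
}

echo autoLink('see www.google.com.'), PHP_EOL;
```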
