Regex to delete words with numbers - php

I would like to delete words containing numbers (references) and short words (2 characters or fewer) from my product names, but I can't find the right regex.
Some examples:
"Chaine anti-rebond ECS-2035" should become "Chaine anti-rebond"
"Guide 35 cm Oregon Intenz" should become "Guide Oregon Intenz"
"Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V" should become "Tronçonneuse sans fil AKE - Guide"
I'm doing this in PHP:
preg_replace('#([^A-Za-z-]+)#', ' ',' '.wd_remove_accents($modele).' ');

You don't need to do everything in RegExp you know:
<?php
$str = "Chaine anti-rebond ECS-2035 cm 30 v";
$result = array();
$split = explode(" ", $str); //Split to an array
foreach ($split as $word) {
if ((strlen($word) <= 2) || (preg_match("|\d|", $word))) { //If word is <= 2 char long, or contains a digit
continue; //Continue to next iteration immediately
}
$result[] = $word; //Add word to result array (would only happen if the above condition was false)
}
$result = implode(" ", $result); //Implode result back to string
echo $result;
For word based string manipulation, parsing the string itself, conditioning exactly what you want on a word basis, is often much better than a string-level RegExp.
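The word-by-word idea above can also be sketched as a small function. This is only an illustration: strip_reference_words() is a hypothetical name, keeping a lone "-" as a separator is an assumption taken from the third example in the question, and the length check counts UTF-8 characters rather than bytes.

```php
<?php
// Hypothetical helper: drop tokens that contain a digit or are 1-2 characters long.
function strip_reference_words(string $name): string
{
    // Split on runs of whitespace and discard empty pieces.
    $words = preg_split('/\s+/u', $name, -1, PREG_SPLIT_NO_EMPTY);

    $kept = array_filter($words, function (string $word): bool {
        if ($word === '-') {
            return true; // assumption: a lone dash is a separator worth keeping
        }
        // /^.{1,2}$/u counts UTF-8 characters, not bytes.
        $isShort = preg_match('/^.{1,2}$/u', $word) === 1;
        $hasDigit = preg_match('/\d/', $word) === 1;
        return !$isShort && !$hasDigit;
    });

    return implode(' ', $kept);
}

echo strip_reference_words('Chaine anti-rebond ECS-2035'), PHP_EOL; // Chaine anti-rebond
echo strip_reference_words('Guide 35 cm Oregon Intenz'), PHP_EOL;   // Guide Oregon Intenz
```

Because the filtering happens per token, the rule for what counts as a "word to drop" stays in one readable place and is easy to adjust.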

To deal with Unicode characters like the ç in tronçonneuse you could use:
/\b(?:[\pL-]+\pN+|\pN+[\pL-]+|\pN+|\pL{1,2})\b/u
where \pL stands for any letter and \pN for any number character. The u modifier is needed so that the pattern and the subject are treated as UTF-8.
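A minimal sketch of how that pattern might be applied, assuming the u modifier and a follow-up pass to collapse the spaces left behind. clean_product_name() is a hypothetical helper name, not something from the question.

```php
<?php
// Unicode-aware pattern: letters+digits, digits+letters, bare numbers, or 1-2 letter words.
$pattern = '/\b(?:[\pL-]+\pN+|\pN+[\pL-]+|\pN+|\pL{1,2})\b/u';

// Hypothetical helper: remove the matches, then tidy up the leftover whitespace.
function clean_product_name(string $name, string $pattern): string
{
    $stripped = preg_replace($pattern, '', $name);
    // Removed words leave double spaces and trailing blanks behind.
    return trim(preg_replace('/\s+/u', ' ', $stripped));
}

foreach (['Chaine anti-rebond ECS-2035', 'Guide 35 cm Oregon Intenz'] as $name) {
    echo clean_product_name($name, $pattern), PHP_EOL;
}
// Chaine anti-rebond
// Guide Oregon Intenz
```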

Your requirements aren't specific enough for a final answer, but this would do it for your example:
$subject = 'Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V';
$regex = '/(\\s+\\w{1,2}(?=\\W+))|(\\s+[a-zA-Z0-9_-]+\\d+)/';
$result = preg_replace($regex, '', $subject);

Well, for the combinations in your example the following regex would do:
/\b(?:[-A-Za-z]+[0-9]+|[0-9]+[-A-Za-z]+|\d{1,2}|[A-Za-z]{1,2})\b/
Then just replace the match with an empty string.
However, it doesn't allow for strings like aaa897bbb - just aaa786 or 876aaa (and an optional dash).
I don't know what it is that you require - you would have to specify the rules in more detail before the regex can be refined.

Use preg_replace_callback and filter in the callback function http://www.php.net/manual/en/function.preg-replace-callback.php
This will work for all 3 test strings:
<?php
$str = "Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V";
function filter_cb($matches)
{
$word = trim($matches[0]);
if ($word !== '-' && (strlen($word) <= 2 || (preg_match("/\d/", $word)))) {
return '';
}
return $matches[0];
}
$result = preg_replace_callback('/([\p{L}\p{N}-]+\s*)/u', "filter_cb", $str);
echo trim($result);

Related

php regex replace each character with asterisk

I am trying to do something like this:
hide usernames except for the first 3 characters.
EX)
apple -> app**
google -> goo***
abc12345 ->abc*****
I am currently using php like this:
$string = "abcd1234";
$regex = '/(?<=^(.{3}))(.*)$/';
$replacement = '*';
$changed = preg_replace($regex,$replacement,$string);
echo $changed;
and the result looks like this:
abc*
But I want to replace every single character except the first 3, like this:
abc*****
How should I do this?
Don't use regex, use substr_replace:
$var = "abcdef";
$charToKeep = 3;
echo strlen($var) > $charToKeep ? substr_replace($var, str_repeat ( '*' , strlen($var) - $charToKeep), $charToKeep) : $var;
This will output:
abc***
Keep in mind that regexes are good for matching patterns in strings, but there are many functions already designed for string manipulation.
Try this function. You can specify how many characters should stay visible and which character should be used as the mask:
$string = "abcd1234";
echo hideCharacters($string, 3, "*");
function hideCharacters($string, $visibleCharactersCount, $mask)
{
if(strlen($string) < $visibleCharactersCount)
return $string;
$part = substr($string, 0, $visibleCharactersCount);
return str_pad($part, strlen($string), $mask, STR_PAD_RIGHT);
}
Output:
abc*****
Your regex matches everything after the first 3 characters as a single match, so the whole run is replaced with one hard-coded *.
You can use
'~(^.{3}|(?!^)\G)\K.~'
and replace with *.
This regex matches either the first 3 characters (with ^.{3}) or the end of the previous successful match (with \G; the (?!^) lookahead stops \G from also matching at the start of the string). \K then drops everything matched so far from the match value, and the final . matches any single character except a newline.
Demo:
$re = '~(^.{3}|(?!^)\G)\K.~';
$strs = array("aa","apple", "google", "abc12345", "asdddd");
foreach ($strs as $s) {
$result = preg_replace($re, "*", $s);
echo $result . PHP_EOL;
}
Another possible solution is to concatenate the first three characters with a string of * repeated the correct number of times:
$text = substr($string, 0, 3).str_repeat('*', max(0, strlen($string) - 3));
The max() call is needed to keep str_repeat() from failing when it receives a negative argument (a warning in older PHP versions, a ValueError since PHP 8). That happens when $string is shorter than 3 characters.
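The substr() and strlen() based answers above count bytes, so a name containing multibyte UTF-8 characters could be cut in the middle of a character. A rough sketch of a character-aware variant follows; mask_after() is a hypothetical name, and splitting with an empty /u pattern is just one way to get characters without relying on the mbstring extension.

```php
<?php
// Hypothetical helper: keep the first $keep characters, mask the rest.
function mask_after(string $s, int $keep = 3, string $mask = '*'): string
{
    // An empty pattern with the u modifier splits the string into UTF-8 characters.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    $head = implode('', array_slice($chars, 0, $keep));
    // max() guards against a negative repeat count for short strings.
    $hidden = max(0, count($chars) - $keep);

    return $head . str_repeat($mask, $hidden);
}

echo mask_after('apple'), PHP_EOL;    // app**
echo mask_after('abc12345'), PHP_EOL; // abc*****
echo mask_after('ab'), PHP_EOL;       // ab
```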

Regex to remove everything but numbers and one character

I need to remove everything from a string except the numbers and, if present, one letter. It's a street name, and I need to extract the house number from it. There may be more content after it, but not necessarily.
The original string is something like
Wagnerstrasse 3a platz53,eingang 3,Zi.3005
I extract the street with number like this:
preg_match('/^([^\d]*[^\d\s]) *(\d.*)$/', $address, $match);
Then, I do an if statement on "Wagnerstrasse 3a"
if (preg_replace("/[^0-9]/","",$match[2]) == $match[2])
I need to change the regex so that it also captures one following letter, even if there is a space in between, but only if it is a single letter, so that my if is true for that case. Better still would be a regex that removes everything except the following:
Wagnerstrasse 3a <-- expected result: 3a
Wagnerstrasse 3 a <--- expected result 3 a
Wagnerstrasse 3 <--- expected result 3
Wagnerstrasse 3 a bac <--- expected result 3 a
You can try something like this that uses word boundaries:
preg_match('~\b\d+(?: ?[a-z])?\b~', $txt, $m)
The letter is in an optional group with an optional space before. Even if there is no letter the last word boundary will match with the digit and what follows (space, comma, end of the string...).
Note: to avoid a number in the street name, you can try to anchor your pattern at the first comma in a lookahead, for example:
preg_match('~\b\d+(?: ?[a-z])?\b(?= [^\s]*,)~', $txt, $m)
I'll leave it to you to refine this subpattern for your cases.
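To check the word-boundary pattern against the four cases from the question, a quick sketch like the following can help. house_number() is a hypothetical wrapper; the pattern itself is the one given above, unchanged (so it only accepts a lowercase letter).

```php
<?php
$pattern = '~\b\d+(?: ?[a-z])?\b~';

// Hypothetical wrapper: return the first house-number-like match, or null.
function house_number(string $address, string $pattern): ?string
{
    return preg_match($pattern, $address, $m) === 1 ? $m[0] : null;
}

$samples = [
    'Wagnerstrasse 3a',      // expected: 3a
    'Wagnerstrasse 3 a',     // expected: 3 a
    'Wagnerstrasse 3',       // expected: 3
    'Wagnerstrasse 3 a bac', // expected: 3 a
];

foreach ($samples as $address) {
    echo $address, ' => ', var_export(house_number($address, $pattern), true), PHP_EOL;
}
```

In the last case the optional group takes " a" because a word boundary follows the single letter; with " bac" only the first letter would be tried, the boundary check would fail, and the match would fall back to the bare number.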
<?php
$s1 = 'Wagnerstrasse 3 platz53,eingang 3,Zi.3005';
$s2 = 'Wagnerstrasse 3a platz53,eingang 3,Zi.3005';
$s3 = 'Wagnerstrasse 3A platz53,eingang 3,Zi.3005';
$s4 = 'Wagnerstrasse 3 a platz53,eingang 3,Zi.3005';
$s5 = 'Wagnerstrasse 3 A platz53,eingang 3,Zi.3005';
//test each of the $s strings
preg_match('#^(.+? [0-9]* *[A-Za-z]?)[^A-Za-z]#', $s1, $m);
//if you want only the house number
//preg_match('#^.+? ([0-9]* *[A-Za-z]?)[^A-Za-z]#', $s1, $m);
echo $m[1];
?>
After more research and hours of checking addresses (so many addresses), I found a solution that has not failed so far. I may have missed a case, but it seems to work quite well. It is also a regex I had not seen before. The regex fails if there are no numbers in the line, so I added a hack (note the long run of nines).
Basically, the regex is excellent at finding numbers at the end and preserves numbers in the middle of the text, but it fails in the case mentioned above and when the street starts with a number. So I added another small hack: move the leading number to the end and capture it as the house number.
if ($this->startsWithNumber($data))
{
$tmp = explode(' ', $data);
$data = trim(str_replace($tmp[0], '', $data)) . ' ' . $tmp[0];
}
if (!preg_match('/[0-9]/',$data))
{
$data .= ' 99999999999999999999999999999999999999999999999999999999999999999999999';
}
$data = preg_replace("/[^ \w]+/",'',$data);
$pcre = '/\A\s*
(.*?) # street
\s*
\x2f? # slash
(
\pN+\s*[a-zA-Z]? # number + letter
(?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)* # cut
) # number
\s*\z/ux';
preg_match($pcre, $data, $h);
$compare = strpos($h[2],'999999999999999999999999999999999999999999999999999999999999999999999999');
if ($compare !== false) {
$h[2] = null;
}
$this->receiverStreet[] = (isset($h[1])) ? $h[1] : null;
$this->receiverHouseNo[] = (isset($h[2])) ? $h[2] : null;
public function startsWithNumber($str)
{
return preg_match('/^\d/', $str) === 1;
}

Unicode (UTF8) string word count in PHP

I need to have the word count of the following unicode string. Using str_word_count:
$input = 'Hello, chào buổi sáng';
$count = str_word_count($input);
echo $count;
the result is
7
which is apparently wrong.
How can I get the desired result (4)?
$tags = 'Hello, chào buổi sáng';
$word = explode(' ', $tags);
echo count($word);
Here's a demo: http://codepad.org/667Cr1pQ
Here is a quick and dirty regex-based (using Unicode) word counting function:
function mb_count_words($string) {
preg_match_all('/[\pL\pN\p{Pd}]+/u', $string, $matches);
return count($matches[0]);
}
A "word" is anything that contains one or more of:
Any alphabetic letter
Any digit
Any hyphen/dash
This would mean that the following contains 5 "words" (4 normal, 1 hyphenated):
echo mb_count_words('Hello, chào buổi sáng, chào-sáng');
Now, this function is not well suited for very large texts; though it should be able to handle most of what counts as a block of text on the internet. This is because preg_match_all needs to build and populate a big array only to throw it away once counted (it is very inefficient). A more efficient way of counting would be to go through the text character by character, identifying unicode whitespace sequences, and incrementing an auxiliary variable. It would not be that difficult, but it is tedious and takes time.
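One way to avoid holding every match in a single array is to walk through the text one match at a time. The sketch below does that with preg_match() and a moving byte offset instead of a hand-written character loop. count_words_stepwise() is a hypothetical name, and this has not been benchmarked: it trades the large array for many separate regex calls, so it is not necessarily faster.

```php
<?php
// Hypothetical helper: count "words" (letters, digits, dashes) one match at a time.
function count_words_stepwise(string $text): int
{
    $count = 0;
    $offset = 0; // byte offset where the next search starts

    while (preg_match('/[\pL\pN\p{Pd}]+/u', $text, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $count++;
        // $m[0][0] is the matched text, $m[0][1] its byte offset in $text.
        $offset = $m[0][1] + strlen($m[0][0]);
    }

    return $count;
}

echo count_words_stepwise('Hello, chào buổi sáng, chào-sáng'), PHP_EOL;
```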
You may use this function to count unicode words in given string:
function count_unicode_words( $unicode_string ){
// First remove all the punctuation marks & digits
$unicode_string = preg_replace('/[[:punct:][:digit:]]/', '', $unicode_string);
// Now replace all the whitespaces (tabs, new lines, multiple spaces) by single space
$unicode_string = preg_replace('/[[:space:]]/', ' ', $unicode_string);
// The words are now separated by single spaces and can be split into an array
// I have included \n\r\t here as well, but only space will also suffice
$words_array = preg_split( "/[\n\r\t ]+/", $unicode_string, 0, PREG_SPLIT_NO_EMPTY );
// Now we can get the word count by counting array elements
return count($words_array);
}
All credits go to the author.
I'm using this code to count words. You can try it:
$s = 'Hello, chào buổi sáng';
$s1 = array_map('trim', explode(' ', $s));
$s2 = array_filter($s1, function($value) { return $value !== ''; });
echo count($s2);

PHP Find all occurences of a whole word from a text

Here is my problem. I have a text in PHP:
$text = "Car is going with 10 meters/second";
$find = array("meters","meters/second");
now when I do this :
foreach ($find as $f)
{
$count = substr_count($text,$f);
}
The output is :
meters -> 1
meters/second -> 1
I consider meters/second a whole word, so meters shouldn't be counted on its own, only meters/second, since no space separates them.
So what I expect is:
meters -> 0
meters/second -> 1
You can do it with a regular expression. \b won't work because / is not a word character, so there is a word boundary between meters and /. Something like this should work:
preg_match_all(",meters([^/]|$),", $text, $matches);
print_r($matches[0]);
$exists = preg_match("/\bmeters\b/", $text) ;
\b stands for word boundary.
To do what you want, you will have to use regular expressions. Something like:
$text = "Car is going with 10 meters/second";
$find = array("/\bmeters\b/", "/\bmeters\/second\b/");
foreach($find as $f) {
print(preg_match_all($f, $text));
}
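Since \b treats the / in meters/second as a boundary, another option is to define a "whole word" as a whitespace-delimited token and use lookarounds instead. This is a sketch under that assumption, not one of the answers above; count_whole_token() is a hypothetical name.

```php
<?php
// Hypothetical helper: count occurrences of $needle that are delimited by
// whitespace or by the start/end of the string.
function count_whole_token(string $text, string $needle): int
{
    // (?<!\S): not preceded by a non-space character.
    // (?!\S):  not followed by a non-space character.
    $pattern = '/(?<!\S)' . preg_quote($needle, '/') . '(?!\S)/u';
    return (int) preg_match_all($pattern, $text);
}

$text = "Car is going with 10 meters/second";
foreach (["meters", "meters/second"] as $f) {
    echo $f, ' -> ', count_whole_token($text, $f), PHP_EOL;
}
// meters -> 0
// meters/second -> 1
```

preg_quote() with '/' as the delimiter argument escapes the slash in meters/second, so the needle can be any literal string.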

PHP string letters

We have a variable $string that contains some text like:
About 200 million CAPTCHAs are solved by humans around the world every day.
How can we get the first or last 2-3 letters of each word that is longer than 3 letters?
We will then check them for matching text with foreach():
if ('ey' is matched in the end of some word) {
replace 'ey' with 'ei' in this word;
}
Thanks.
First, I'll give you an example of how to loop through a string and work with each word in the string.
Second, I'll explain each part of the code so that you can modify it to your exact needs.
Here is how to switch out the last 2 letters (if they are "ey") of each word that is more than 3 letters long.
<?php
// Example string
$string = 'Hey they ey shay play stay nowhey';
// Create array of words splitting at spaces
$string = explode(" ", $string);
// The search and replace strings
$lookFor = "ey";
$switchTo = "ei";
// Cycle through the words
foreach($string as $key => $word)
{
// If the word has more than 3 letters
if(strlen($word) > 3)
{
// If the last two letters are what we want
if ( substr($word, -2) == $lookFor )
{
// Replace the last 2 letters of the word
$string[$key] = substr_replace($word, $switchTo, -2);
}
}
}
// Recreate string from array
$string = implode(" ", $string);
// See what we got
echo $string;
// The above will print:
// Hey thei ey shay play stay nowhei
?>
I'll explain each function so that you can modify the above to exactly how you want it, since I don't precisely understand all your specifications:
explode() will take a string and split it apart into an array. The first argument is what you use to split it. The second argument is the string, so explode(" ", $string) will split $string by the use of spaces. The spaces will not be included in the array.
foreach() will cycle through each element of an array. foreach($string as $key => $word) will go through each element of $string and for each element it will assign the index number to $key and the value of the element (the word in this case) to $word.
strlen() returns how long a string is.
substr() returns a portion of a string. The first argument is the string, the second argument is where the substring starts, and an optional third argument is the length of the substring. With a negative start, the position is counted from the end of the string, and without a length the substring runs to the end of the string. In other words, substr($word, -2) returns the last two letters. If you want the first two letters, use substr($word, 0, 2), since you start at the very beginning and want a length of 2.
substr_replace() replaces a substring within a string. The first argument is the entire string, the second is the replacement substring, the third is where the replacement starts, and the optional fourth is the length of the part being replaced. So substr_replace($word, $switchTo, -2) takes $word and, starting two characters from the end, replaces what is there with $switchTo. In this case, that switches out the last two letters. If you want to replace the first two letters, use substr_replace($word, $switchTo, 0, 2).
implode() is the opposite of explode. It takes an array and forms it into a string using the separator specified.
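The function descriptions above can be tried out directly. A small sketch using the word "nowhey" from the example string (the variable names are only for illustration):

```php
<?php
$word = 'nowhey';

// explode() splits on the separator; implode() joins the pieces back together.
$parts = explode(' ', 'Hey they ey');
echo count($parts), PHP_EOL;                       // 3
echo implode('+', $parts), PHP_EOL;                // Hey+they+ey

echo strlen($word), PHP_EOL;                       // 6

// substr(): negative start counts from the end of the string.
echo substr($word, -2), PHP_EOL;                   // ey
echo substr($word, 0, 2), PHP_EOL;                 // no

// substr_replace(): replace the last two or the first two letters.
echo substr_replace($word, 'ei', -2), PHP_EOL;     // nowhei
echo substr_replace($word, 'ei', 0, 2), PHP_EOL;   // eiwhey
```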
$string = 'About 200 million CAPTCHAs are solved by humans around the world every day.';
$result = array();
$words = explode(" ",$string);
foreach($words as $word){
if(strlen($word) > 3){
$result[] = substr($word,0,3); //first 3 characters; use substr($word,-3) if you want the last three
}
}
function get_symbols($str, $reverse = false)
{
$symbols = array();
foreach (explode(' ', $str) as $word)
{
if ($reverse)
$word = strrev($word);
if (strlen($word) > 3)
$word = substr($word, 0, 3);
array_push($symbols, $word);
}
return $symbols;
}
EDIT:
function change_reverse_symbol_in_word($str, $symbol, $replace_to)
{
$result = array();
foreach (explode(' ', $str) as $word)
{
// Only words longer than 3 characters that end with $symbol are changed
if (strlen($word) > 3 && substr($word, -strlen($symbol)) === $symbol)
{
// Swap the matching ending for the replacement
$word = substr($word, 0, -strlen($symbol)) . $replace_to;
}
$result[] = $word;
}
// Join the words back together without a trailing space
return implode(' ', $result);
}
And if you want to use this as described in your question, call it like this:
$string_malformed = change_reverse_symbol_in_word($str, "ey", "ei");
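For comparison, the same "ey" to "ei" replacement can be sketched with a single regular expression instead of a loop. This is a different technique from the answers above, shown under the assumption that "longer than 3 letters" means at least two word characters before the ending; replace_ey_ending() is a hypothetical name.

```php
<?php
// Hypothetical helper: replace a trailing "ey" with "ei" in words longer than 3 characters.
function replace_ey_ending(string $text): string
{
    // (\w{2,}) requires at least two characters before "ey", so the word has 4+ characters.
    return preg_replace('/\b(\w{2,})ey\b/', '$1ei', $text);
}

echo replace_ey_ending('Hey they ey shay play stay nowhey'), PHP_EOL;
// Hey thei ey shay play stay nowhei
```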
