String manipulations in PHP

I have a function:
function Validate($name)
{
    $rename = 'Rename' . $name;
    if (strlen($rename) > 50) {
        $rename = substr($rename, 0, 48) . '..';
    }
    return $rename;
}
The function is called as follows:
$data['name'] = Validate($duplicate->name."_").$i++;
If the name is longer than 50 characters, it is cut to 48 characters and .. is appended at the end. But if the name ends with _somedigits, the result should end with .._somedigits instead, i.e.
$rename = substr($rename, 0, 45) . '..' . $suffix;
I would like to add this extra check for the suffix.
Any help would be appreciated.

You are concatenating $rename = 'Rename' . $name; and then taking strlen() of that $rename, so the 6 characters of "Rename" count toward the 50-character limit.
If that is your intention, you could use the regex _\d+$ to check whether the string ends with an underscore followed by one or more digits.
function Validate($name)
{
    $suffix = "suffix";
    $rename = 'Rename' . $name;
    $re = '/_\d+$/';
    if (strlen($rename) > 50) {
        if (preg_match($re, $rename)) {
            return substr($rename, 0, 45) . '..' . $suffix;
        }
        return substr($rename, 0, 48) . '..';
    }
    return $rename;
}
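The $suffix = "suffix" above is a placeholder. A minimal sketch, assuming you want to re-attach whatever _digits tail the name actually ends with (the function name validate_keep_suffix and its $max parameter are hypothetical, not from the answer), could compute the cut length from the suffix instead of hard-coding 45:

```php
<?php
// Hypothetical variant of Validate(): keeps the real "_digits" tail of the
// string instead of a fixed placeholder suffix.
function validate_keep_suffix(string $name, int $max = 50): string
{
    $rename = 'Rename' . $name;
    if (strlen($rename) <= $max) {
        return $rename; // short enough, nothing to cut
    }
    if (preg_match('/_\d+$/', $rename, $m)) {
        $suffix = $m[0]; // e.g. "_12"
        // Leave room for ".." plus the suffix
        $keep = max(0, $max - 2 - strlen($suffix));
        return substr($rename, 0, $keep) . '..' . $suffix;
    }
    return substr($rename, 0, $max - 2) . '..';
}

echo validate_keep_suffix(str_repeat('a', 60) . '_12'), "\n";
echo validate_keep_suffix('short_7'), "\n";
```

For a 3-character suffix such as _12 this cuts at 45, exactly like the answer above. Note that strlen() and substr() count bytes, so multibyte names would need the mb_* equivalents.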

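You can use the regex _\d+$ to check whether the string ends with an underscore followed by digits.
If this regex matches, cut the string to 45 characters instead of 48, so that the .. and the suffix still fit.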

$strings = [
    'abcdefghijklmnopqrstuvwxyz',
    'abcdefghijklmnopqrstuvwxyz_12',
    'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx',
    'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz',
    'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz_12',
];
foreach ($strings as $str) {
    if (preg_match('/_\d+$/', $str)) {
        echo preg_replace('/^(?=.{51}).{45}\K.+(_\d+)$/', '..$1', $str), "\n";
    } else {
        echo preg_replace('/^(?=.{51}).{48}\K.+$/', '..', $str), "\n";
    }
}
Output:
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz_12
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuv..
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs.._12
Explanation of the first regex:
^ : beginning of string
(?=.{51}) : lookahead, a zero-length assertion that makes sure there are at least 51 characters
.{45} : exactly 45 characters
\K : reset the match start, so everything matched so far is kept out of the replacement
.+ : one or more of any character
(_\d+) : group 1, underscore and trailing digits
$ : end of string
Explanation of the second regex:
^ : beginning of string
(?=.{51}) : lookahead, a zero-length assertion that makes sure there are at least 51 characters
.{48} : exactly 48 characters
\K : reset the match start, so everything matched so far is kept out of the replacement
.+ : one or more of any character
$ : end of string

Related

PHP regex to check if a string ends with a space and 15 other characters

I'm looking for a PHP regex to check if a string ends with a space followed by 15 characters that contain only 0-9 and a-f (lower case).
Match:
$myString = "something here 62ffe537a66ddcf"; // space after "here" and then 15 characters
No Matches:
$myString1 = "something here62ffe537a66ddcf"; // space missing before the 6
$myString2 = "something here 62ffe537a66ddc"; // only 15 characters (including the space)
$myString3 = "something here 62ffe537A66ddC"; // contains upper case characters
My attempt. There might be a shorter way?
$myString = "something here 62ffe537a66ddcf"; // space after "here" and then 15 characters
if (stringEndsWithId($myString)) {
    echo "string ends with id";
}
else {
    echo "string does not end with id";
}
function stringEndsWithId($str) {
    if (str_starts_with(right($str, 16), ' ')) {
        return preg_match('/[^a-f0-9]/', $str);
    }
    return false;
}
function right($string, $count) {
    return substr($string, strlen($string) - $count, $count);
}
Start the pattern with a literal space, then use [0-9a-f] to match any of the 0-9 and a-f chars, followed by {15} to match exactly 15 of them, followed by $ to match the end of the string.
function stringEndsWithId($str) {
    return preg_match('/ [0-9a-f]{15}$/', $str);
}
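A quick sketch checking the answer's pattern against the examples from the question; the only change is the === 1 comparison, which turns preg_match()'s 1/0 (or false on error) into a strict boolean:

```php
<?php
function stringEndsWithId(string $str): bool
{
    // A literal space, then exactly 15 lowercase hex characters, then end of string
    return preg_match('/ [0-9a-f]{15}$/', $str) === 1;
}

$examples = [
    'something here 62ffe537a66ddcf', // should match
    'something here62ffe537a66ddcf',  // space missing
    'something here 62ffe537a66ddc',  // only 14 hex characters
    'something here 62ffe537A66ddC',  // upper case characters
];
foreach ($examples as $s) {
    echo $s, ' => ', stringEndsWithId($s) ? 'match' : 'no match', "\n";
}
```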

How to use a regexp to match dashes and other specific chars anywhere in a string that contains numbers?

I need to write a regexp that matches a string that has only numbers in it, where these numbers can be separated by a comma, dash, underscore, or slash/backslash.
For example:
$reg = '/^\d+$/';
$phoneWithDashes = '343-1431-4412';
$phoneWithoutDashes = '34314314412';
echo preg_match($reg, $phoneWithDashes); // 0
echo preg_match($reg, $phoneWithoutDashes); // 1
How do I tell to this regexp '/^\d+$/' that I also want to match if there are dashes anywhere in the string?
Since the separators can appear anywhere (between two digits), I would split on them and then check each piece individually. Let's see how that translates to PHP code.
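function match_phone($phone) {
    // Split on '-', '_', ',', '/' or '\'. In a single-quoted PHP string '\\\\'
    // becomes \\ in the regex, which matches one literal backslash.
    $arr = preg_split('~[-_,/\\\\]~', $phone);
    $reg = '/^\d+$/';
    foreach ($arr as $str) {
        if (!preg_match($reg, $str)) {
            return 0;
        }
    }
    return 1;
}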
echo match_phone('343-1431-4412/7'); // 1
echo match_phone('343143144127'); // 1
echo match_phone('1234-illegal'); // 0
echo match_phone('11--22'); // 0
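If you'd rather keep a single preg_match() call, a possible alternative (a different technique from the split above) is to describe the whole string as digits optionally followed by groups of "one separator, then digits". The function name match_phone_re is hypothetical:

```php
<?php
// Digits, then zero or more groups of one separator (- _ , / or \) followed by digits.
// In a single-quoted PHP string, \\\\ becomes \\ in the pattern: one literal backslash.
function match_phone_re(string $phone): int
{
    return preg_match('~^\d+(?:[-_,/\\\\]\d+)*$~', $phone);
}

echo match_phone_re('343-1431-4412/7'); // 1
echo match_phone_re('343143144127');    // 1
echo match_phone_re('1234-illegal');    // 0
echo match_phone_re('11--22');          // 0
```

Because each separator must be followed by digits, doubled separators such as 11--22 and leading or trailing separators are rejected, just as the empty pieces are rejected in the split version.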

php regex to separate out characters stuck to left, right or in the middle

Looking for a php regex that will allow me to separate out certain characters from words (if they're sticking to the left or right of the word, or even anywhere within the word).
For example,
hello. -> hello .
.hello -> . hello
hello.hello -> hello . hello
I have the below code but it won't work for all cases. Please note that $value could be '.', '?', or any character.
$regex = "/(?<=\S)\\" . $value . "|\\" . $value . "(?=\S)/";
$this->str = preg_replace_callback($regex, function($word) {
    return ' ' . $word[0];
}, $this->str);
Also, please help with specifying the part where I can turn on (or off) the 3rd condition.
[UPDATE]
I think there might be confusion about the exact requirements, so let me be more specific. I want a regex that helps me separate out certain characters which are either at the end or at the beginning of a group of text. What is a group of text? It can be of any length (>= 1) and contain any characters, but it must begin with a-z or 0-9. It would be nice if this aspect were highlighted in the solution, so that it is possible to let a group of text begin and end with more characters (not just a-z or 0-9).
$character = '.', string is ".hello.world." => ". hello.world ."
$character = '.', string is ".1ello.worl2." => ". 1ello.worl2 ."
$character = '.', string is ".?1ello.worl2." => ".?1ello.worl2 ."
$character = '.', string is "4/5.5" => "4/5.5"
$character = '.', string is "4.?1+/5" => "4.?1+/5"
$character = '.', string is ".4/5.5." => ". 4/5.5 ."
$character = '/', string is ".hello?.world/" => ".hello?.world /"
$character = '/', string is ".hello?.worl9/" => ".hello?.worl9 /"
Hopefully it's clearer now.
You can use 3 alternatives each captured into its own capture group, and use a preg_replace_callback to apply the corresponding replacement:
$wrd = ".";
// Pass the '~' delimiter to preg_quote() so it is escaped too if $wrd is '~'
$re = '~(?<=\S)(' . preg_quote($wrd, '~') . ')(?=\S)|(?<=\S)(' . preg_quote($wrd, '~') . ')|(' . preg_quote($wrd, '~') . ')(?=\S)~';
$str = "hello.\n.hello\nhello.hello";
$result = preg_replace_callback($re, function($m) {
    if (!empty($m[1])) {
        return " " . $m[1] . " ";
    } else if (!empty($m[2])) {
        return " " . $m[2];
    } else return $m[3] . " ";
}, $str);
echo $result;
The regex will be
(?<=\S)(\.)(?=\S)|(?<=\S)(\.)|(\.)(?=\S)
with capture groups 1, 2 and 3 in the first, second and third alternative, respectively.
The first group is your Case 3 (hello.hello -> hello . hello), the second group is your Case 1 (hello. -> hello .) and the third group signals your Case 2 (.hello -> . hello).
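The question also asks how to turn the third condition on or off. One way, sketched here with a hypothetical separate_char() helper and $splitMiddle flag, is to keep the same three-group regex and simply return group 1 unchanged when the flag is off:

```php
<?php
// Hypothetical helper: $splitMiddle toggles Case 3 (hello.hello -> hello . hello).
function separate_char(string $str, string $char, bool $splitMiddle = true): string
{
    $q = preg_quote($char, '~'); // escape regex metacharacters and the ~ delimiter
    $re = '~(?<=\S)(' . $q . ')(?=\S)|(?<=\S)(' . $q . ')|(' . $q . ')(?=\S)~';
    return preg_replace_callback($re, function (array $m) use ($splitMiddle): string {
        if (isset($m[1]) && $m[1] !== '') {
            // Case 3: char enclosed by non-spaces; left as-is when switched off
            return $splitMiddle ? ' ' . $m[1] . ' ' : $m[1];
        }
        if (isset($m[2]) && $m[2] !== '') {
            return ' ' . $m[2]; // Case 1: char preceded by a non-space
        }
        return $m[3] . ' '; // Case 2: char followed by a non-space
    }, $str);
}

echo separate_char("hello.\n.hello\nhello.hello", '.'), "\n";
echo separate_char("hello.\n.hello\nhello.hello", '.', false), "\n";
```

Comparing with isset()/!== '' instead of !empty() also keeps the callback correct if the character is "0", which empty() would treat as empty.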
UPDATE (handling exceptions)
If you have exceptions, you can add more capturing groups. E.g., if you want to protect the dot in float numbers, add a (\d\.\d) alternative and check inside the callback function whether that group is non-empty. If it is, just restore the match unchanged with return $m[n]:
$wrd = ".";
$re = '~(\d\.\d)|(?<=\S)(' . preg_quote($wrd, '~') . ')(?=\S)|(?<=\S)(' . preg_quote($wrd, '~') . ')|(' . preg_quote($wrd, '~') . ')(?=\S)~';
$str = "hello.\n.hello\nhello.hello\nhello. 3.5/5\nhello.3\na./b";
$result = preg_replace_callback($re, function($m) {
    if (!empty($m[1])) {            // The dot as a decimal separator
        return $m[1];               // No space is inserted
    } else if (!empty($m[2])) {     // A special char is enclosed with non-spaces
        return " " . $m[2] . " ";   // Add spaces around
    } else if (!empty($m[3])) {     // A special char is preceded by a non-space
        return " " . $m[3];         // Add a space before the special char
    } else return $m[4] . " ";      // A special char is followed by a non-space, add a space on the right
}, $str);
echo $result;
Another code demo, based on matching the locations before and after the . that are not enclosed with spaces (and protecting a float value), adapted from bobblebubble's (deleted) solution:
$wrd = ".";
$re = '~(\d\.\d)|(?<!\s)(?=' . preg_quote($wrd, '~') . ')|(?<=' . preg_quote($wrd, '~') . ')(?!\s)~';
$str = "hello.\n.hello\nhello.hello\nhello. 3.5/5\nhello.3\na./b";
$result = preg_replace_callback($re, function($m) {
    if (!empty($m[1])) { // The dot as a decimal separator
        return $m[1];    // No space is inserted
    }
    else return " ";     // Just insert a space
}, $str);
echo $result;
SUMMARY:
You cannot use \b since your . / ? etc. can appear in mixed "word" and "non-word" contexts
You need to use capturing and preg_replace_callback since there are different replacement schemes
You can use a regex based on word boundaries.
\b(?=\.(?!\S))|(?<=(?<!\S)\.)\b
It matches the zero-width boundary between a word character and a literal dot, but only if the dot is not followed by a non-whitespace character \S (first alternative) or not preceded by one (second alternative); the lookarounds do these checks.
Use it in a PHP function that takes the character as a parameter, and replace each match with a space.
// $v = character
function my_func($str, $v=".")
{
    $v = preg_quote($v, '/');
    return preg_replace('/\b(?='.$v.'(?!\S))|(?<=(?<!\S)'.$v.')\b/', " ", $str);
}
From what I understand the . can be any non-word character. If that's the case, try this:
$patron = '/(\W+)/';
$this->str = trim(preg_replace($patron, ' $1 ', $this->str));
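A quick sketch for checking this approach against the question's examples (pad_non_word is a hypothetical wrapper around the two lines above). Note that it pads every run of non-word characters, including the / and . inside 4/5.5, which the question's update expects to stay untouched:

```php
<?php
// Pads every run of non-word characters (\W+) with spaces, then trims both ends.
function pad_non_word(string $str): string
{
    return trim(preg_replace('/(\W+)/', ' $1 ', $str));
}

foreach (['hello.', '.hello', 'hello.hello', '4/5.5'] as $s) {
    echo $s, ' => ', pad_non_word($s), "\n";
}
```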
(\s?[.]\s?)
If you use the above regex, you can simply replace all the matches with " . "
How it works:
I used \s? to capture a leading and trailing whitespace, if there is any.
[.] is a char class, so you should add all of the "certain characters" you want to find.
A regex that catches the first 2 conditions and never the third is (\s[.]\s?|\s?[.]\s). (Again, you'll need to replace the capture with " . ", and also add your "certain characters" to the char classes.)
You can then choose which regex you will use.

match whole word only without regex

Since I can't use preg_match (UTF-8 support is somehow broken: it works locally but breaks in production), I want to find another way to match words against a blacklist. The problem is that I want to search a string for an exact (whole-word) match only, not just the first occurrence of the substring.
This is how I do it with preg_match:
preg_match('/\b(badword)\b/', strtolower($string));
Example string:
$string = "This is a string containing badwords and one badword";
I want to only match the "badword" (at the end) and not "badwords".
strpos($string, 'badword') matches the first one (inside "badwords").
Any ideas?
Assuming you can do some pre-processing, you could replace all your punctuation marks with whitespace, convert everything to lowercase, and then either:
Use strpos with something like strpos($document, ' badword ') in a while loop to keep iterating through your entire document;
Split the string at whitespace and compare each word with your list of bad words.
So if you were trying the first option, it would look something like this (untested sketch):
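// $text holds the body of text to process; pad both ends so words at the start/end are found too
$document = ' ' . strtolower($text) . ' ';
// Replace punctuation with spaces so every word is surrounded by whitespace
$document = str_replace(str_split('!@#$%^&*(),./' /* ... */), ' ', $document);
$arr_badWords = [/* ... */];
foreach ($arr_badWords as $word) {
    $offset = 0;
    while (($badWordIndex = strpos($document, ' ' . $word . ' ', $offset)) !== false) {
        // Whole-word match of $word starting at $badWordIndex + 1
        $offset = $badWordIndex + 1;
    }
}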
EDIT: As per jonhopkins' suggestion, adding a space at the end should cater for the scenario where the wanted word is at the end of the document and is not followed by a punctuation mark.
If you want to mimic the \b word boundary of regex, you can try something like this:
$offset = 0;
$word = 'badword';
$matched = array();
// Characters that count as word boundaries (extend this list as needed)
$boundaries = array(' ', '-' /* , ... */);
while (($pos = strpos($string, $word, $offset)) !== false) {
    $end = $pos + strlen($word); // index just after the match
    $leftBoundary = false;
    // If the word starts the string, it has a boundary on the left
    if ($pos === 0) {
        $leftBoundary = true;
    // Else, if it is in the middle of the string, we must check the previous char
    } elseif (in_array($string[$pos - 1], $boundaries, true)) {
        $leftBoundary = true;
    }
    $rightBoundary = false;
    // If the word ends the string, it has a boundary on the right
    if ($end === strlen($string)) {
        $rightBoundary = true;
    // Else, we must check the char right after the word
    } elseif (in_array($string[$end], $boundaries, true)) {
        $rightBoundary = true;
    }
    // If it has both boundaries, we add the index to the matched ones...
    if ($leftBoundary && $rightBoundary) {
        $matched[] = $pos;
    }
    $offset = $end;
}
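The same loop can be wrapped in a reusable function. In this sketch, whole_word_positions and its boundary rule are assumptions rather than part of the answer above: instead of listing boundary characters explicitly, any byte that is not an ASCII letter, digit or underscore counts as a boundary, and bytes of multibyte UTF-8 characters are treated as word bytes so accented letters do not create false boundaries:

```php
<?php
// Hypothetical helper: byte offsets of whole-word occurrences of $word in $string.
function whole_word_positions(string $string, string $word): array
{
    // Assumed word bytes: ASCII letters, digits, '_' and any byte >= 0x80 (UTF-8 multibyte)
    $isWordByte = function (string $c): bool {
        $o = ord($c);
        return ($o >= 48 && $o <= 57) || ($o >= 65 && $o <= 90)
            || ($o >= 97 && $o <= 122) || $o === 95 || $o >= 0x80;
    };
    $len = strlen($word);
    if ($len === 0) {
        return [];
    }
    $positions = [];
    $offset = 0;
    while (($pos = strpos($string, $word, $offset)) !== false) {
        $end = $pos + $len;
        $left = $pos === 0 || !$isWordByte($string[$pos - 1]);
        $right = $end === strlen($string) || !$isWordByte($string[$end]);
        if ($left && $right) {
            $positions[] = $pos;
        }
        $offset = $pos + 1;
    }
    return $positions;
}

$string = "This is a string containing badwords and one badword";
print_r(whole_word_positions($string, 'badword'));
```

No preg_* function is involved, which matters here because preg_match is what fails in the asker's production environment.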
You can use strrpos() instead of strpos:
strrpos — Find the position of the last occurrence of a substring in a string
$string = "This is a string containing badwords and one badword";
var_dump(strrpos($string, 'badword'));
Output:
int(45)
A simple way to emulate word boundaries with Unicode properties:
preg_match('/(?:^|[^\pL\pN_])(badword)(?:[^\pL\pN_]|$)/u', $string);
In fact, proper Unicode word boundaries are much more complicated than this.
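Here is the corrected pattern applied to the question's string. has_word is a hypothetical wrapper name, PREG_OFFSET_CAPTURE is used only to show where the whole word was found, and the /u modifier requires PCRE with UTF-8 support:

```php
<?php
// Matches $word only when it is not surrounded by letters (\pL), numbers (\pN) or '_'.
function has_word(string $string, string $word): bool
{
    $w = preg_quote($word, '/');
    return preg_match('/(?:^|[^\pL\pN_])(' . $w . ')(?:[^\pL\pN_]|$)/u', $string) === 1;
}

$string = "This is a string containing badwords and one badword";
var_dump(has_word($string, 'badword'));              // bool(true): the last word
var_dump(has_word('only badwords here', 'badword')); // bool(false)

preg_match('/(?:^|[^\pL\pN_])(badword)(?:[^\pL\pN_]|$)/u', $string, $m, PREG_OFFSET_CAPTURE);
echo $m[1][1], "\n"; // 45 (byte offset of the captured word)
```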

regex replace for decimals

I'm trying to build a regex that will replace any characters not of the format:
any number of digits, then optional (single decimal point, any number of digits)
i.e.
123 // 123
123.123 // 123.123
123.123.123a // 123.123123
123a.123 // 123.123
I am using ereg_replace in PHP, and the closest to a working regex I have managed is
ereg_replace("[^.0-9]+", "", $data);
which is almost what I need (apart from allowing any number of decimal points)
i.e.
123.123.123a // 123.123.123
My next attempt was
ereg_replace("[^0-9]+([^.]?[^0-9]+)?", "", $data);
which was meant to translate as
[^0-9]+ // any number of digits, followed by
( // start of optional segment
[^.]? // decimal point (0 or 1 times) followed by
[^0-9]+ // any number of digits
) // end of optional segment
? // optional segment to occur 0 or 1 times
but this just seems to allow any number of digits and nothing else.
Please help
Thanks
Try these steps:
remove any character except 0-9 and .
remove any . after the first decimal point.
Here's an implementation with regular expressions:
$str = preg_replace('/[^0-9.]+/', '', $str);
$str = preg_replace_callback('/^([0-9]*\.)(.*)/', function ($m) {
    // keep everything up to the first '.', strip any later dots
    return $m[1] . str_replace('.', '', $m[2]);
}, $str);
$val = floatval($str);
And another one with just one regular expression:
$str = preg_replace('/[^0-9.]+/', '', $str);
if (($pos = strpos($str, '.')) !== false) {
    $str = substr($str, 0, $pos+1).str_replace('.', '', substr($str, $pos+1));
}
$val = floatval($str);
This should be faster, actually. And it is way more readable. ;-)
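Wrapping the strpos() version in a function makes it easy to check against the examples from the question. sanitize_decimal is a hypothetical name, and this sketch returns the cleaned string instead of calling floatval(), so the results can be compared with the expected strings:

```php
<?php
// Keep digits and the first decimal point only.
function sanitize_decimal(string $str): string
{
    $str = preg_replace('/[^0-9.]+/', '', $str); // drop everything except 0-9 and '.'
    if (($pos = strpos($str, '.')) !== false) {
        // keep the first '.', remove any later ones
        $str = substr($str, 0, $pos + 1) . str_replace('.', '', substr($str, $pos + 1));
    }
    return $str;
}

foreach (['123', '123.123', '123.123.123a', '123a.123'] as $in) {
    echo $in, ' => ', sanitize_decimal($in), "\n";
}
```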
$s = preg_replace('/[^.0-9]/', '', '123.123a.123');
if (1 < substr_count($s, '.')) {
    $a = explode('.', $s);
    $s = array_shift($a) . '.' . implode('', $a);
}
