Basically, I'm just wondering whether a function like this exists:
$string = 'helloWorld';
// 1 uppercase, 1 lower case, 1 number and at least 8 of length
$regex = '/^\S*(?=\S{8,})(?=\S*[a-z])(?=\S*[A-Z])(?=\S*[\d])\S*$/';
$percent = matchPercent($string, $regex);
echo "the string match {$percent}% of the given regex";
Then, the result could be something like this:
the string match 75% of the given regex
After seeing another post and question, I can do something like this:
$uppercase = preg_match('#[A-Z]#', $password);
$lowercase = preg_match('#[a-z]#', $password);
$number = preg_match('#[0-9]#', $password);
But the goal is for the function to work with any regex pattern.
If you want to do it the regex way and based on the use-case you've provided, we need to make the whole regex optional. Also we'll be using capturing groups in our lookaheads.
But first things first, let's improve your regex:
[\d] is redundant; just use \d.
In \S*(?=\S{8,}), remove the leading \S*; we already have it at the end.
Our regex will look like ^(?=\S{8,})(?=\S*[a-z])(?=\S*[A-Z])(?=\S*\d)\S*$
Now comes the tricky part: we will add groups in our lookaheads and make them optional:
^(?=(\S{8,})?)(?=(\S*[a-z])?)(?=(\S*[A-Z])?)(?=(\S*\d)?)\S*$
You might ask why? The groups are made so that we can track them later on. We make them optional so that our regex will always match. That way, we can do some math!
$regex = '~^(?=(\S{8,})?)(?=(\S*[a-z])?)(?=(\S*[A-Z])?)(?=(\S*\d)?)\S*$~';
$input = 'helloWorld';
preg_match_all($regex, $input, $m);
array_shift($m); // Get rid of group 0
for($i = 0, $j = $k = count($m); $i < $j; $i++){ // Loop over the groups
    if(!isset($m[$i][0]) || $m[$i][0] === ''){ // No match for that particular group ("0" still counts as a match)
        $k--;
    }
}
$percentage = round(($k / $j) * 100);
echo $percentage;
EDIT: I see that Hamza had pretty much the same idea.
Sure! That's a really fun question.
Here is a solution for a simplified validation regex.
$str = 'helloword';
$regex = '~^(?=(\S{8,}))?(?=(\S*[a-z]))?(?=(\S*[A-Z]))?(?=(\S*[\d]))?.*$~';
if(preg_match($regex,$str,$m)) {
$totaltests = 4;
$passedtests = count(array_filter($m)) -1 ;
echo $passedtests / $totaltests;
}
Output: 0.5
How does it work?
For each condition (expressed by a lookahead), we capture the text that can be matched.
We define $totaltests as the total number of tests
We count the number of tests passed with count(array_filter($m)) -1 which removes the empty groups and Group 0, i.e. the overall match.
We divide.
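The same counting idea can be sketched without packing everything into one pattern. This is only an illustration: it assumes the rules can be supplied as a list of independent patterns, which is not what the question's single-regex signature asks for, so the second parameter of matchPercent() here is hypothetical.

```php
<?php
// Hypothetical helper: $conditions is a list of independent patterns, one per rule.
function matchPercent(string $input, array $conditions): float
{
    if ($conditions === []) {
        return 100.0; // nothing to fail
    }
    $passed = 0;
    foreach ($conditions as $pattern) {
        // preg_match() returns 1 when the pattern matches
        if (preg_match($pattern, $input) === 1) {
            $passed++;
        }
    }
    return round($passed / count($conditions) * 100, 2);
}

// Length, lowercase, uppercase and digit rules from the question.
$rules = ['/^\S{8,}$/', '/[a-z]/', '/[A-Z]/', '/\d/'];
echo matchPercent('helloWorld', $rules); // 75
```

Keeping each rule as its own pattern also makes it easy to report which rule failed, not just how many.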
Related
I want to take a post description but only display the first, for example, 30 letters but ignore any tabs and spaces.
$msg = 'I only need the first, let us just say, 30 characters; for the time being.';
$msg .= ' Now I need to remove the spaces out of the checking.';
$amount = 30;
// if tabs or spaces exist, alter the amount
if(preg_match("/\s/", $msg)) {
$stripped_amount = strlen(str_replace(' ', '', $msg));
$amount = $amount + (strlen($msg) - $stripped_amount);
}
echo substr($msg, 0, $amount);
echo '<br /> <br />';
echo substr(str_replace(' ', '', $msg), 0, 30);
The first output gives me 'I only need the first, let us just say, 30 characters;' and the second output gives me: Ionlyneedthefirst,letusjustsay so I know this isn't working as expected.
My desired output in this case would be:
I only need the first, let us just say
Thanks in advance, my maths sucks.
You could get the part with the first 30 characters with a regular expression:
$msg_short = preg_replace('/^((\s*\S\s*){0,30}).*/s', '$1', $msg);
With the given $msg value, you will get in $msg_short:
I only need the first, let us just say
Explanation of the regular expression
^: match must start at the beginning of the string
\s*\S\s*: a non-white-space (\S) surrounded by zero or more white-space characters (\s*)
(\s*\S\s*){0,30}: repeat finding this sequence up to 30 times (greedy; get as many as possible within that limit)
((\s*\S\s*){0,30}): the parentheses make this series of characters group number 1, which can be referenced as $1
.*: any other characters. This will match all remaining characters, because of the s modifier at the end:
s: makes the dot match new line characters as well
In the replacement only the characters are maintained that belong to group one ($1). All the rest is ignored and not included in the returned string.
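As a sketch of how the pattern explained above could be wrapped for reuse, here is a small function with the limit as a parameter. The function name is made up for illustration, the inner group is simplified to (?:\s*\S), and the u modifier assumes the input is valid UTF-8.

```php
<?php
// Hypothetical wrapper: keep text up to and including the $limit-th
// non-whitespace character.
function truncateIgnoringWhitespace(string $text, int $limit): string
{
    $limit = max(0, $limit);
    // (?:\s*\S){0,N} = up to N non-whitespace characters, each optionally
    // preceded by whitespace; .* with the s modifier swallows the rest.
    $pattern = '/^((?:\s*\S){0,' . $limit . '}).*/su';
    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to the input.
    return preg_replace($pattern, '$1', $text) ?? $text;
}

$msg = 'I only need the first, let us just say, 30 characters; for the time being.';
echo truncateIgnoringWhitespace($msg, 30); // I only need the first, let us just say
```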
Spontaneously, there are two ways to achieve that I can think of.
The first one is close to what you did already. Take the first 30 characters, count the spaces and take as many next characters as you found spaces until the new set of letters has no spaces in it anymore.
$msg = 'I only need the first, let us just say, 30 characters; for the time being.';
$msg .= ' Now I need to remove the spaces out of the checking.';
$amount = 30;
$offset = 0;
$final_string = '';
while ($amount > 0 && $offset < strlen($msg)) { // also stop at the end of the string
    $tmp_string = substr($msg, $offset, $amount);
    $amount -= strlen(str_replace(' ', '', $tmp_string));
    $offset += strlen($tmp_string);
    $final_string .= $tmp_string;
}
print $final_string;
The second technique would be to explode your string at spaces and put them back together one by one until you hit your threshold (where you would eventually need to break down a single word into characters).
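A minimal sketch of that second technique might look like this. The function name is made up, and it assumes that only plain spaces separate words (tabs would be counted as characters) and that the text is single-byte.

```php
<?php
// Hypothetical helper: rebuild the text word by word until $limit
// non-space characters have been used.
function takeByWords(string $text, int $limit): string
{
    $out = '';
    $remaining = $limit;
    foreach (explode(' ', $text) as $i => $word) {
        if ($remaining <= 0) {
            break;
        }
        // Cut the last word down to the characters that are still allowed.
        $piece = substr($word, 0, $remaining);
        $out .= ($i > 0 ? ' ' : '') . $piece;
        $remaining -= strlen($piece);
    }
    return $out;
}

$msg = 'I only need the first, let us just say, 30 characters; for the time being.';
echo takeByWords($msg, 30); // I only need the first, let us just say
```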
Try this and see if it works:
<?php
$string= 'I only need the first, let us just say, 30 characters; for the time being.';
echo "Everything: ".strlen($string);
echo '<br />';
echo "Only alphabetical: ".strlen(preg_replace('/[^a-zA-Z]/', '', $string));
?>
It can be done this way.
$tmp=str_split($msg);//split the string into single characters
$result="";
$i=0;$j=0;
while(isset($tmp[$i]) && $j<30){
    if(trim($tmp[$i]) !== ''){//count non-whitespace characters only ("0" must count too)
        $j++;
    }
    $result .= $tmp[$i++];
}
print $result;
I don't know regex too well so...
<?php
$msg = 'I only need the first, let us just say, 30 characters; for the time being. Now I need to remove the spaces out of the checking.';
$non_space_hit = 0;
for($i = 0; $i < strlen($msg); ++$i)
{
echo $msg[$i];
$non_space_hit+= (int)($msg[$i] !== ' ' && $msg[$i] !== "\t");
if($non_space_hit === 30)
{
break;
}
}
You end up with:
I only need the first, let us just say
I have a string, something like this:
$str ="it is a test string.";
// for more clarification: each character (␣ = space) above its position
i  t  ␣  i  s  ␣  a  ␣  t  e   s   t   ␣   s   t   r   i   n   g   .
1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
Now I need to check all characters whose positions are multiples of 4 (plus the first character), like these:
1 => i
4 => i
8 => [space]
12 => t
16 => r
20 => .
Now, I need to compare them with Y (Y is a variable (symbol), for example Y = 'r' in here). So I want to replace Y with X (X is a variable (symbol) too, for example X = 'm' in here).
So, I want this output:
it is a test stming.
Here is my solution. I can do that using some PHP functions:
strlen($str): to count the number of characters (named $sum)
$sum / 4: to find the characters that are multiples of 4
substr($str, 4, 1): to select a specific character (named $char) {the problem is here}
if ($char == 'r') {}: to compare
str_replace('r', 'm', $char): to replace
And then combining all the $char values with each other.
But my solution has two problems:
substr() does not count the [space] character (as I mentioned above)
combining the characters is a bit complicated (it needs some wasteful processing)
Well, is there any solution? I would like to do this using regex. Is it possible?
Could just use a simple regex with callback (add u flag if utf-8, s for . to match newline).
$str = preg_replace_callback(['/^./', '/.{3}\K./'], function ($m) {
return $m[0] == "r" ? "m" : $m[0];
}, $str);
echo $str;
Output: it is a test stming.
1st pattern: ^. any first character
2nd pattern: \K resets after .{3} any three characters, only want to check the fourth .
Anonymous functions require PHP >= 5.3.
Update: @Mariano demonstrated in his very nice answer that it is even possible with a single regex replacement. Thank you for the benchmark, which reveals rather bad performance for the preg_replace_callback solution. Here is a more efficient variant without a callback (but still with two patterns).
$str = preg_replace(['/^r/', '/\G(?:...[^r])*...\Kr/'], 'm', $str); // \G keeps every attempt aligned to the four-character groups
I also included @revo's answer from 2017 in Mariano's benchmark and ran it on tio.run (100k loops). With newer PHP and PCRE2 the numbers seem to have changed slightly; "no regex" leads at tio.run.
In .NET or modern browser JavaScript regex, it could also be done with a variable-length lookbehind.
If all characters in your string are in single byte, you can use something from PHP's official language reference:
$str ="it is a test string.";
$y="r";
$x="m";
$len=strlen($str);
if($str[0]==$y)
{
$str=substr_replace($str,$x,0,1);
}
if($len>=3)
{
for($i=3;$i<$len;$i+=4)
{
if($str[$i]==$y)
{
$str=substr_replace($str,$x,$i,1);
}
}
}
var_dump($str);
Outputs it is a test stming.
Edit:
As @Don'tPanic points out, strings are mutable via the [] operator, so instead of using
$str=substr_replace($str,$x,$i,1);
you can just use
$str[$i]=$x;
This is an alternative using preg_replace()
$y = 'r';
$y = preg_quote($y, '/');
$x = 'M';
$x = strtr($x, ['\\' => '\\\\', '$' => '\\$']); // a replacement string only needs \ and $ escaped, not preg_quote()
$subject = 'rrrrrr rrrrr rrrrrr rrrr rrrr.';
$regex = "/\\G(?:^|(?(?<!^.).)..(?:.{4})*?)\\K$y/s";
$result = preg_replace($regex, $x, $subject);
echo $result;
// => MrrMrr MrrrM rrMrrr rrrM rrMr.
Regex:
\G(?:^|(?(?<!^.).)..(?:.{4})*?)\Kr
\G is an assertion to the end of last match (or start of string)
(?:^|(?(?<!^.).)..(?:.{4})*?) matches:
^ start of string, to check at position 1
(?(?<!^.).) is an if clause that yields:
..(?:.{4})*? 2 chars + a multiple of 4 if it has just replaced at position 1
...(?:.{4})*? 3 chars + a multiple of 4 for successive matches
\K resets the text matched to avoid using backreferences
I must say though, regex is an overkill for this task. This code is counterintuitive and a typical regex that proves difficult to understand/debug/maintain.
EDIT. There was a later discussion about performance vs. code readability, so I did a benchmark to compare:
RegEx with a callback (@bobblebubble's answer).
RegEx with 2 replacements in an array (@bobblebubble's suggestion in comment).
No RegEx with substr_replace (@Passerby's answer).
Pure RegEx (this answer).
Result:
Code #1(with_callback): 0.548 secs/50k loops
Code #2(regex_array): 0.158 secs/50k loops
Code #3(no_regex): 0.120 secs/50k loops
Code #4(pure_regex): 0.118 secs/50k loops
Benchmark in ideone.com
Try this
$str ="it is a test string.";
$y="r";
$x="m";
$splite_array = str_split($str);
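foreach ($splite_array as $key => $val)
{
    // keys are 0-based: position 1 is key 0; positions 4, 8, 12... are keys 3, 7, 11...
    if(($key == 0 || ($key + 1) % 4 == 0) && $val == $y)
    {
        $splite_array[$key] = $x;
    }
}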
$yout_new_string = implode($splite_array);
This piece of code could help you on your way:
// Define variables
$string = "it is a test string.";
$y = 'r';
$x = 'm';
// Convert string to array
$chars = str_split($string);
// Loop through all characters
foreach ($chars as $key => $char) {
    // Array keys start at 0, so we add 1
    $keyCount = $key+1;
    // Check whether this is the first character, or whether dividing the
    // position by 4 leaves no remainder, i.e. it is divisible by 4
    if (($keyCount == 1 || $keyCount % 4 == 0) && $char == $y) {
        $chars[$key] = $x;
    }
}
// Convert back to string
$string = implode($chars);
Here is one other way to do this using string access and modification by character. (Consequently, it is only useful for single-byte encoded strings.)
// First character handled outside the loop because its index doesn't match the pattern
if ($str[0] == $y) $str[0] = $x;
// access every fourth character
for ($i=3; isset($str[$i]) ; $i+=4) {
// change it if it needs to be changed
if ($str[$i] == $y) $str[$i] = $x;
}
This modifies the original string rather than creating a new string, so if that shouldn't happen, it should be used on a copy.
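For strings that are not single-byte, the same loop can be sketched over code points instead of bytes. This is an assumption-laden variant, not part of the answer above: the function name is made up, and it relies on preg_split() with the u modifier, which expects valid UTF-8.

```php
<?php
// Hypothetical multibyte-aware variant: positions are counted in
// characters (code points), not bytes.
function replaceEveryFourth(string $str, string $y, string $x): string
{
    // Split into single UTF-8 characters; returns false on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $str; // leave invalid input untouched
    }
    foreach ($chars as $i => $ch) {
        $position = $i + 1; // 1-based position, as in the question
        if (($position === 1 || $position % 4 === 0) && $ch === $y) {
            $chars[$i] = $x;
        }
    }
    return implode('', $chars);
}

echo replaceEveryFourth('it is a test string.', 'r', 'm'); // it is a test stming.
```

Unlike the in-place version, this returns a new string and leaves the original unchanged.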
Late to the party. Putting aside the \G anchor, I'd go with the (*SKIP)(*F) method:
$str = "it is a test string.";
echo preg_replace(['~\Ar~', '~.{3}\K(?>r|.(*SKIP)(?!))~'], 'm', $str);
Short and clean.
I am trying to remove the word "John" a certain number of times from a string. I read in the PHP manual that str_replace accepts a 4th parameter called "count", so I figured it could be used to specify how many instances of the search string should be replaced. But that doesn't seem to be the case, since the following:
$string = 'Hello John, how are you John. John are you happy with your life John?';
$numberOfInstances = 2;
echo str_replace('John', 'dude', $string, $numberOfInstances);
replaces all instances of the word "John" with "dude" instead of doing it just twice and leaving the other two Johns alone.
For my purposes it doesn't matter which order the replacement happens in, for example the first 2 instances can be replaced, or the last two or a combination, the order of the replacement doesn't matter.
So is there a way to use str_replace() in this way or is there another built in (non-regex) function that can achieve what I'm looking for?
As Artelius explains, the last parameter to str_replace() is set by the function. There's no parameter that allows you to limit the number of replacements.
Only preg_replace() features such a parameter:
echo preg_replace('/John/', 'dude', $string, $numberOfInstances);
That is as simple as it gets, and I suggest using it because its performance cost is tiny compared to the tedium of the following non-regex solution:
$len = strlen('John');
while ($numberOfInstances-- > 0 && ($pos = strpos($string, 'John')) !== false)
$string = substr_replace($string, 'dude', $pos, $len);
echo $string;
You can choose either solution though, both work as you intend.
You've misunderstood the wording of the manual.
If passed, this will be set to the number of replacements performed.
The parameter is passed by reference and its value is changed by the function to indicate how many times the string was found and replaced. Its initial value is discarded.
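A short sketch of that behaviour, using the string from the question (the variable names are only for illustration):

```php
<?php
$string = 'Hello John, how are you John. John are you happy with your life John?';
$count = 2; // this initial value is ignored by str_replace()
$result = str_replace('John', 'dude', $string, $count);

echo $result, "\n"; // every "John" has been replaced
echo $count, "\n";  // 4, the number of replacements that were performed
```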
There are a few things you could do to achieve this, but I can't think of one specific php function that will easily let you do this.
One option is to create your own replace function and utilize strripos and substr to do the replaces.
Another thing you can do is use preg_replace_callback and count the number of replacements you have done in the callback.
There's probably more ways but that's all I can think of on the fly. If performance is an issue I suggest you give both a try and do some simple benchmarks.
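The preg_replace_callback idea could be sketched roughly as follows. The function name is hypothetical, and the search text is passed through preg_quote() so that it is treated literally.

```php
<?php
// Hypothetical helper: replace only the first $n occurrences by counting
// matches inside the callback.
function replaceFirstN(string $search, string $replace, string $subject, int $n): string
{
    $seen = 0;
    $pattern = '/' . preg_quote($search, '/') . '/';
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use (&$seen, $replace, $n) {
            // Replace while under the limit, otherwise keep the match as it was.
            return ++$seen <= $n ? $replace : $m[0];
        },
        $subject
    );
    return $result ?? $subject; // null would indicate a PCRE error
}

$string = 'Hello John, how are you John. John are you happy with your life John?';
echo replaceFirstN('John', 'dude', $string, 2);
// Hello dude, how are you dude. John are you happy with your life John?
```

Because the decision is made per match, the callback could just as well skip the first few occurrences or replace every other one.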
The cleanest, most-direct, single function call is to use preg_replace(). Its replacement limiting parameter makes the task intuitive and readable.
$string = preg_replace('/John/', 'dude', $string, $numberOfInstances);
The function is also attractive because making the search case-insensitive is as simple as adding the i pattern modifier to the end of the pattern. I won't delve into the usefulness of word boundaries (\b).
If a search string might contain characters with special meaning to the regex engine, then preg_quote() will be necessary -- this diminishes the beauty of the technique but not prohibitively so.
$search = '$5.99';
$pattern = '/' . preg_quote($search, '/') . '/';
$string = preg_replace($pattern, 'free', $string, $numberOfInstances);
For anyone who has an unnatural bias against regex functions, this can be done without regex and without looping -- it will be case-sensitive though.
Limited Explode & Implode: (Demo)
$numberOfInstances = 2;
$string = 'Hello John, how are you John. John are you happy with your life John?';
// explode here -^^^^ and ---------^^^^ only to create the following array:
// 0 => 'Hello ',
// 1 => ', how are you ',
// 2 => '. John are you happy with your life John?'
echo implode('dude', explode('John', $string, $numberOfInstances + 1));
Output:
Hello dude, how are you dude. John are you happy with your life John?
Notice the explode's limiting parameter dictates how many elements are generated, not how many explosions are executed on the string.
function str_replace_occurrences($find, $replace, $string, $count = -1) {
    // current occurrence
    $current = 0;
    // search position; moved past each replacement so it is never matched again
    $offset = 0;
    $len = strlen($find);
    // while any occurrence (strict comparison: a match at position 0 is valid)
    while (($pos = strpos($string, $find, $offset)) !== false) {
        // found next one
        $current++;
        // check if we've reached our target
        // -1 is used to replace all occurrences
        if($current <= $count || $count == -1) {
            // do replacement
            $string = substr_replace($string, $replace, $pos, $len);
            $offset = $pos + strlen($replace);
        } else {
            // we've reached our limit
            break;
        }
    }
    return $string;
}
Artelius has already described how the function works; I'll just show you how to do this manually:
function str_replace_occurrences($find,$replace,$string,$count = 0)
{
    if($count == 0)
    {
        return str_replace($find,$replace,$string);
    }
    $pos = 0;
    $done = 0; // number of replacements made so far
    $len = strlen($find);
    while($done < $count && false !== ($pos = strpos($string,$find,$pos)))
    {
        $string = substr_replace($string,$replace,$pos,$len);
        $pos += strlen($replace); // continue searching after the replacement
        $done++;
    }
    return $string;
}
This is untested but should work.
I have a very big .txt file with our clients' orders and I need to move it into a MySQL database. However, I don't know what kind of regex to use, as the information is not very different.
-----------------------
4046904
KKKKKKKKKKK
Laura Meyer
MassMutual Life Insurance
153 Vadnais Street
Chicopee, MA 01020
US
413-744-5452
lmeyer#massmutual.co...
KKKKKKKKKKK
373074210772222 02/12 6213 NA
-----------------------
4046907
KKKKKKKKKKK
Venkat Talladivedula
6105 West 68th Street
Tulsa, OK 74131
US
9184472611
venkat.talladivedula...
KKKKKKKKKKK
373022121440000 06/11 9344 NA
-----------------------
I tried something, but I couldn't even extract the name. Here is a sample of my unsuccessful attempt:
$htmlContent = file_get_contents("orders.txt");
//print_r($htmlContent);
$pattern = "/KKKKKKKKKKK(.*)\n/s";
preg_match_all($pattern, $htmlContent, $matches);
print_r($matches);
$name = $matches[1][0];
echo $name;
You may want to avoid regexes for something like this. Since the data is clearly organized by line, you could repeatedly read lines with fgets() and parse the data that way.
You could read this file with regex, but it may be quite complicated to create a regex that could read all the fields.
I recommend that you read this file line by line, and parse each one, detecting which kind of data it contains.
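A rough sketch of such a line-by-line parser is shown below. Everything in it is an assumption based on the sample in the question: records are separated by lines of dashes, marker lines consist only of the letter K, the name follows the first marker, and the payment line follows the second. The function names, field names, and sample data are made up for illustration.

```php
<?php
// Hypothetical helper: turns the lines of one record into named fields.
function buildOrder(array $lines): array
{
    // Indexes of the marker lines that consist only of the letter K.
    $markers = [];
    foreach ($lines as $i => $line) {
        if (preg_match('/^K+$/', $line) === 1) {
            $markers[] = $i;
        }
    }
    $first  = $markers[0] ?? 0;
    $second = $markers[1] ?? count($lines);

    return [
        'id'      => $lines[0],
        'name'    => $lines[$first + 1] ?? null,
        // Address, phone, etc.: everything between the name and the second marker.
        'details' => array_slice($lines, $first + 2, max(0, $second - $first - 2)),
        'payment' => $lines[$second + 1] ?? null,
    ];
}

// Hypothetical parser: groups non-empty lines into records at each dashed line.
function parseOrders(string $text): array
{
    $orders = [];
    $current = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        if (preg_match('/^-{5,}$/', $line) === 1) {
            if ($current !== []) {
                $orders[] = buildOrder($current);
                $current = [];
            }
            continue;
        }
        $current[] = $line;
    }
    if ($current !== []) {
        $orders[] = buildOrder($current); // last record without a closing separator
    }
    return $orders;
}

// Placeholder data in the same layout as the question's sample.
$sample = "-----\n1001\nKKKKK\nJane Doe\n1 Example Street\nKKKKK\nplaceholder payment line\n-----\n";
print_r(parseOrders($sample));
```

For a very large file, the same logic could be fed from fgets() one line at a time instead of loading the whole text first.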
As you know exactly where your data is (i.e. which line it's on), why not just get it that way?
i.e. something like
$htmlContent = file_get_contents("orders.txt");
$arrayofclients = explode("-----------------------",$htmlContent);
$newlinesep = "\r\n";
for($i = 0;$i < count($arrayofclients);$i++)
{
$temp = explode($newlinesep,$arrayofclients[$i]);
$idnum = $temp[0];
$name = $temp[4];
$houseandstreet = $temp[6];
//etc
}
or simply read the file line by line using fgets() - something like:
$i = 0;$j = 0;
$file = fopen("orders.txt","r");
$clients = [];
while ($line = fgets($file) )
{
if($line !== false)
{
$i++;
switch($i)
{
case 2:
$clients[$j]["idnum"] = $line;
break;
case 6:
$clients[$j]["name"] = $line;
break;
//add more cases here for each line up to:
case 18:
$j++;
$i = 0;
break;
//there are 18 lines per client if i counted right, so increment $j and reset $i.
}
}
}
fclose($file);
You could use regex's, but they are a bit awkward for this situation.
Nico
For the record, here is the regex that will capture the names for you. (Granted, speed may well be an issue.)
(?<=K{10}\s{2})\K[^\r\n]++(?!\s{2}-)
Explanation:
(?<=K{10}\s{2}) #Positive lookbehind for KKKKKKKKKK then 2 return/newline characters
\K[^\r\n]++ #Greedily match 1 or more non-return/newline characters
(?!\s{2}-) #Negative lookahead for return/newline character then dash
Here is a Regex Demo.
You will notice that my regex pattern changes slightly between the Regex Demo and my PHP Demo. Slight tweaking depending on environment may be required to match the return / newline characters.
Here is the php implementation (Demo):
if(preg_match_all("/(?<=K{10}\s{2})\K[^\r\n]++(?!\s{2}-)/",$htmlContent,$matches)){
var_export($matches[0]);
}else{
echo "no matches";
}
By using \K in my pattern I avoid actually having to capture with parentheses. This cuts down array size by 50% and is a useful trick for many projects. The \K basically says "start the fullstring match from this point", so the matches go in the first subarray (fullstrings, key=0) of $matches instead of generating a fullstring match in 0 and the capture in 1.
Output:
array (
0 => 'Laura Meyer',
1 => 'Venkat Talladivedula',
)
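To illustrate the \K point made above, here is a small comparison on made-up input. The text and variable names are only for illustration.

```php
<?php
$text = "id: 42\nid: 7\n";

// With a capturing group: $withGroup[0] holds the full matches,
// $withGroup[1] holds the captured digits.
preg_match_all('/id: (\d+)/', $text, $withGroup);

// With \K: everything before \K is dropped from the reported match,
// so the digits land directly in $withK[0] and no group is needed.
preg_match_all('/id: \K\d+/', $text, $withK);

var_export($withGroup);
var_export($withK);
```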
Those regular expressions drive me crazy. I'm stuck with this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]], and if there is an option split, choose the later one, so the output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
This works for part 1 of the task, but before that I think I should do the option split. My best solution:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. Could someone with a few free minutes help me? Thanks!
I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]] and the second will ditch the brackets).
Edit 1: By way of explanation,
\[\[ # Opening [[
(?: # A non-capturing group (we don't want this bit)
[^\[\]] # Non-bracket characters
* # Zero or more of them
\| # A literal '|' character representing the end of the discarded options
)? # This group is optional: if there is only one option, it won't be present
( # The group we're actually interested in ($1)
[^\[\]] # All the non-bracket characters
+ # Must be at least one
) # End of $1
\]\] # End of the grouping.
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets as you can do something like:
$oldtext = "";
$newtext = $text;
while ($newtext != $oldtext)
{
$oldtext = $newtext;
$newtext = preg_replace('/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/', '$1', $oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so there are probably syntax errors in the above.
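In PHP, the same repeat-until-stable loop can be sketched with the by-reference count argument of preg_replace(), which avoids comparing strings. The function name is made up for illustration; the pattern is the one from this answer.

```php
<?php
// Hypothetical helper: keep replacing innermost [[...]] groups until
// a pass makes no replacement.
function resolveLinks(string $text): string
{
    $pattern = '/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/';
    do {
        // The fifth argument receives the number of replacements in this pass.
        $text = preg_replace($pattern, '$1', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

$text = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
echo resolveLinks($text);
// test1:link test2:silver test3:out1insideout2 test4:this|not
```

Every replacement removes at least the four bracket characters, so the loop is guaranteed to end.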
This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes. Keep looping while there are matches, performing replacement (only keeping match group 1.)
Difference from other regular expressions here is that it will allow you to have single brackets in the content, without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:link test2:si[lv]er
test3:out1in[si]deout2 test4:this|not
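For comparison, the recursive idea mentioned at the start of this answer could be sketched with a recursive pattern plus a callback, so that no outer loop is needed. This is a different technique from the multi-pass regex above: the function name is made up, and unlike the pattern above it does not allow single brackets inside the content.

```php
<?php
// Hypothetical helper: resolve nested [[...]] groups in one call.
function resolveNested(string $text): string
{
    // (?R) recurses into the whole pattern, so $m[1] is the content of one
    // balanced, outermost [[...]] group.
    $pattern = '/\[\[((?:[^\[\]]++|(?R))*)\]\]/';
    $result = preg_replace_callback($pattern, function (array $m) {
        $inner = resolveNested($m[1]); // resolve nested groups first
        $bar = strrpos($inner, '|');   // keep only the text after the last |
        return $bar === false ? $inner : (string) substr($inner, $bar + 1);
    }, $text);
    return $result ?? $text; // null would indicate a PCRE error
}

$text = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
echo resolveNested($text);
// test1:link test2:silver test3:out1insideout2 test4:this|not
```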
Why try to do it all in one go. Remove the [[]] first and then deal with options, do it in two lines of code.
When trying to get something going favour clarity and simplicity.
Seems like you have all the pieces.
Why not just simply remove any brackets that are left?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);
Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach($m[2] as $pos => $match) {
if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE ) {
$opt = explode('|', $match); $match = $opt[count($opt)-1];
}
$m[2][$pos] = str_replace(array('[', ']'),'', $match );
}
foreach($m[1] as $k=>$v) $result[$k] = $v.':'.$m[2][$k];
This is C# using only verbatim (non-escaped) strings, hence you will have to double the backslashes in other languages.
String input = "test1:[[link]] " +
"test2:[[gold|silver]] " +
"test3:[[out1[[inside]]out2]] " +
"test4:this|not";
String step1 = Regex.Replace(input, @"\[\[([^|\[\]]+)\|([^\]]+)\]\]", @"[[$2]]");
String step2 = Regex.Replace(step1, @"\[\[|\]\]", String.Empty);
// Prints "test1:link test2:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
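$s = preg_split("/\s+/",$str);
foreach ($s as $k=>$v){
    // drop everything up to the last "|" inside a [[...]] group only,
    // so a plain "this|not" is left alone
    $v = preg_replace("/\[\[[^\[\]]*\|/","[[",$v);
    // then remove the remaining [[ and ]]
    $v = preg_replace("/\[\[|\]\]/","",$v);
    print $v."\n";
}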