I am having a difficult time getting my regular expression code to work properly in PHP. Here is my code:
$array = array(); // Used to satisfy the 3rd argument requirement of preg_match_all.
$regex = '/(012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432)/';
$subject = '123456';
echo preg_match_all($regex, $subject, $array).'<br />';
print_r($array);
When this code is run, it outputs:
2
Array
(
[0] => Array
(
[0] => 123
[1] => 456
)
[1] => Array
(
[0] => 123
[1] => 456
)
)
What can I do so that it will match 123, 234, 345 and 456?
Thanks in advance!
Regex is not the right tool for this job (it's not going to return "sub-matches"). Simply use strpos in a loop.
$subject = '123456';
$seqs = array('012', '345', '678', '987', '654', '321', '123', '456', '234');
foreach ($seqs as $seq) {
if (strpos($subject, $seq) !== false) {
// found
}
}
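If the goal is to collect the matches in order rather than just test for them, the same non-regex idea could be written as a sliding window. This is only a sketch: `find_sequences()` is a hypothetical helper name, and it assumes every sequence has the same length (3 here).

```php
<?php
// Hypothetical helper: returns every listed sequence found in $subject,
// in order of position, including overlapping ones.
// Assumes all sequences in $seqs are exactly $length characters long.
function find_sequences(string $subject, array $seqs, int $length = 3): array
{
    $found = array();
    for ($i = 0; $i + $length <= strlen($subject); $i++) {
        $window = substr($subject, $i, $length);
        if (in_array($window, $seqs, true)) {
            $found[] = $window;
        }
    }
    return $found;
}

$seqs = array('012', '345', '678', '987', '654', '321', '123', '456',
              '789', '876', '543', '210', '234', '567', '765', '432');
$result = find_sequences('123456', $seqs);
print_r($result); // 123, 234, 345, 456
```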
$regex = '/(?=(012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432))/';
$subject = '123456';
preg_match_all($regex, $subject, $array);
print_r($array[1]);
output:
Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
You're trying to retrieve matches that overlap each other in the subject string, which in general is not possible. However, in many cases you can fake it by wrapping the whole regex in a capturing group, then wrapping that in a lookahead. Because the lookahead doesn't consume any characters when it matches, the regex engine manually bumps forward one position after each successful match, to avoid getting stuck in an infinite loop. But capturing groups still work, so you can retrieve the captured text in the usual way.
Notice that I only printed the contents of the first capturing group ($array[1]). If I had printed the whole array of arrays ($array), it would have looked like this:
Array
(
[0] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
)
[1] => Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
)
see it in action on ideone
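The lookahead trick can be wrapped up for reuse. A minimal sketch, assuming the alternatives are literal strings; `overlapping_matches()` is a made-up name, not a built-in:

```php
<?php
// Hypothetical helper: builds /(?=(alt1|alt2|...))/ from literal strings
// and returns the contents of capture group 1 for every position that matches.
function overlapping_matches(array $alternatives, string $subject): array
{
    $quoted = array_map(function ($alt) {
        return preg_quote($alt, '/'); // escape regex metacharacters and the delimiter
    }, $alternatives);
    $regex = '/(?=(' . implode('|', $quoted) . '))/';
    preg_match_all($regex, $subject, $matches);
    return $matches[1];
}

$found = overlapping_matches(array('123', '234', '345', '456', '567'), '0123456');
print_r($found); // 123, 234, 345, 456
```

Note that this yields at most one capture per starting position: if two alternatives could both match at the same position, only the first one listed is captured.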
It can be done with regular expressions. The problem with your original code is that as soon as a match occurs, the character is consumed and the regular expression will not backtrack. Here's one way to do it:
$array = array(); // Used to satisfy the 3rd argument requirement of preg_match_all.
$regex = '/012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432/';
$subject = '123456';
$tempSubject = $subject;
$finalAnswer = array();
do {
$matched = preg_match($regex, $tempSubject, $array);
$finalAnswer = array_merge($finalAnswer, $array);
$tempSubject = substr($tempSubject, 1);
} while ($matched && (strlen($tempSubject) >= 3));
print_r($finalAnswer);
As suggested in another answer, however, regular expressions might not be the correct tool to use in this situation, depending on your larger goal. In addition, the above code may not be the most efficient way (in terms of memory or performance) to solve this with regular expressions. It's just a straightforward fulfill-the-requirement solution.
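A variation on the same idea, sketched here as an assumption rather than as the approach above: instead of chopping the subject string, pass an offset to preg_match() and restart one character after the start of each match. PREG_OFFSET_CAPTURE reports where each match began, which also avoids picking up the same match twice when it does not start at the current position.

```php
<?php
$regex = '/012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432/';
$subject = '123456';
$offset = 0;
$finalAnswer = array();

// With PREG_OFFSET_CAPTURE, $m[0] is array(matched text, byte offset of the match).
while (preg_match($regex, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
    $finalAnswer[] = $m[0][0];
    $offset = $m[0][1] + 1; // resume one character after the start of this match
}
print_r($finalAnswer); // 123, 234, 345, 456
```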
Yeah it's a hack but you can use RegEx
<?php
$subject = '123456';
$rs = findmatches($subject);
echo '<pre>'.print_r($rs,true).'</pre><br />';
function findmatches($x) {
$regex = '/(\d{3})/';
$rs_array = array(); // matches found at each starting offset
$rs = array();
// Loop through the subject string
for($counter = 0; $counter < strlen($x); $counter++) {
$y = substr($x, $counter);
if(preg_match_all($regex, $y, $array)) {
$rs_array[$counter] = array_unique($array);
}
}
// Parse results array
foreach($rs_array as $tmp_arr) {
$rs[] = $tmp_arr[0][0];
}
return $rs;
}
?>
Returns:
Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
NOTE: This would only work with concurrent numbers
Consider I have this string 'aaaabbbaaaaaabbbb' I want to convert this to array so that I get the following result
$array = [
'aaaa',
'bbb',
'aaaaaa',
'bbbb'
]
How to go about this in PHP?
PHP code demo
Regex: (.)\1{1,}
(.): Match and capture a single character.
\1: Backreference to the character captured by the first group.
\1{1,}: Match that same character one or more additional times.
<?php
ini_set("display_errors", 1);
$string="aaaabbbaaaaaabbbb";
preg_match_all('/(.)\1{1,}/', $string,$matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => aaaa
[1] => bbb
[2] => aaaaaa
[3] => bbbb
)
[1] => Array
(
[0] => a
[1] => b
[2] => a
[3] => b
)
)
Or:
PHP code demo
<?php
$string="aaaabbbaaaaaabbbb";
$array=str_split($string);
$start=0;
$end= strlen($string);
$indexValue=$array[0];
$result=array();
$resultantArray=array();
while($start!=$end)
{
if($indexValue==$array[$start])
{
$result[]=$array[$start];
}
else
{
$resultantArray[]=implode("", $result);
$result=array();
$result[]=$indexValue=$array[$start];
}
$start++;
}
$resultantArray[]=implode("", $result);
print_r($resultantArray);
Output:
Array
(
[0] => aaaa
[1] => bbb
[2] => aaaaaa
[3] => bbbb
)
I have written a one-liner using only preg_split() that generates the expected result with no wasted memory (no array bloat):
Code (Demo):
$string = 'aaaabbbaaaaaabbbb';
var_export(preg_split('/(.)\1*\K/', $string, 0, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => 'aaaa',
1 => 'bbb',
2 => 'aaaaaa',
3 => 'bbbb',
)
Pattern:
(.) #match any single character
\1* #match the same character zero or more times
\K #keep what is matched so far out of the overall regex match
The real magic happens with the \K, for more reading go here.
The 0 parameter in preg_split() is the limit and means "no limit". This is the default behavior, but it needs to hold its place in the function call so that the next parameter is used appropriately as a flag.
The final parameter is PREG_SPLIT_NO_EMPTY which removes any empty matches.
Sahil's preg_match_all() method preg_match_all('/(.)\1{1,}/', $string,$matches); is a good attempt but it is not perfect for two reasons:
The first issue is that his use of preg_match_all() returns two subarrays which is double the necessary result.
The second issue is revealed when $string="abbbaaaaaabbbb";. His method will ignore the first lone character. Here is its output:
Array (
[0] => Array
(
[0] => bbb
[1] => aaaaaa
[2] => bbbb
)
[1] => Array
(
[0] => b
[1] => a
[2] => b
)
)
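For what it's worth, the lone-character problem comes from the {1,} quantifier, which demands at least one repeat. A sketch of the same preg_match_all() call with `\1*` (zero or more repeats) keeps single characters; the full-match subarray `$matches[0]` then holds the runs. Like the original pattern, `.` will not match a newline.

```php
<?php
$string = 'abbbaaaaaabbbb';
// (.) captures one character, \1* allows zero or more repeats of it,
// so a run of length 1 still counts as a match.
preg_match_all('/(.)\1*/', $string, $matches);
$runs = $matches[0];
print_r($runs); // a, bbb, aaaaaa, bbbb
```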
Sahil's second attempt produces the correct output, but requires much more code. A more concise non-regex solution could look like this:
$array = str_split($string);
$last = "";
$result = array();
foreach ($array as $v) {
if ($last === "" || strpos($last, $v) !== false) { // === "" so that a run of "0" is not treated as empty
$last .= $v;
} else {
$result[] = $last;
$last = $v;
}
}
$result[] = $last;
var_export($result);
Is it possible to nest a capturing group inside another capturing group, so that I can get an array like this?
regex = (asd1).(lol1),(asd2).(asd2)
string = asd1.lol1,asd2.lol2
return_array[0]=>group[0]='asd1';
return_array[0]=>group[1]='lol1';
return_array[1]=>group[0]='asd2';
return_array[1]=>group[1]='lol2';
While using regular expressions can get what you want, you could also use strtok() to iterate through what seems to simply be comma separated sets:
$results = array();
$str = 'asd1.lol1,asd2.lol2';
$token = strtok($str, ',');
while ($token !== false) {
$results[] = explode('.', $token, 2);
$token = strtok(',');
}
Output:
Array
(
[0] => Array
(
[0] => asd1
[1] => lol1
)
[1] => Array
(
[0] => asd2
[1] => lol2
)
)
With regular expressions your pattern needs to only include the two terms surrounding a period, i.e.:
$pattern = '/(?<=^|,)(\w+)\.(\w+)/';
preg_match_all($pattern, $str, $result, PREG_SET_ORDER);
The (?<=^|,) is a look-behind assertion; it makes sure to only match what comes after if preceded by either the start of your search string or a comma, but it doesn't "consume" anything.
Output:
Array
(
[0] => Array
(
[0] => asd1.lol1
[1] => asd1
[2] => lol1
)
[1] => Array
(
[0] => asd2.lol2
[1] => asd2
[2] => lol2
)
)
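If you want exactly the shape from the question (no full match at index 0), one possible follow-up step is to drop the first element of each set. A small sketch using the same pattern; `$pairs` is just an illustrative variable name:

```php
<?php
$str = 'asd1.lol1,asd2.lol2';
preg_match_all('/(?<=^|,)(\w+)\.(\w+)/', $str, $sets, PREG_SET_ORDER);

// Each $set is array(full match, group 1, group 2); keep only the groups.
$pairs = array_map(function ($set) {
    return array_slice($set, 1);
}, $sets);
print_r($pairs);
```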
You're probably looking for preg_match_all.
$regex = '/^((\w+)\.(\w+)),((\w+)\.(\w+))$/';
$string = 'asd1.lol1,asd2.lol2';
preg_match_all($regex, $string, $matches);
This function creates a 2-dimensional array. The first dimension is indexed by capture group (index 0 holds the whole matched string, and each pair of parentheses gets the next index); each entry is a subarray with one element per match (only one in this case).
[0] => ("asd1.lol1,asd2.lol2") // a view of $matches
[1] => ("asd1.lol1")
[2] => ("asd1")
[3] => ("lol1")
[4] => ("asd2.lol2")
[5] => ("asd2")
[6] => ("lol2")
Your best bet to have groups is to process the first dimension of the array that you want and to then process them further, i.e. get "asd1.lol1" from 1 and 4 and then process these further into asd1 and lol1.
You wouldn't need as many parentheses in your first run:
$regex = '/^(\w+\.\w+),(\w+\.\w+)$/';
will yield:
[0] => ("asd1.lol1,asd2.lol2")
[1] => ("asd1.lol1")
[2] => ("asd2.lol2")
Then you can split the array in 1 and 2 into more granular values.
Flags can be set to preg_match_all to order the output differently. Particularly, PREG_SET_ORDER allows you to have all matched instances in the same subarray. This is of little importance if you're only processing one string, but if you're matching a pattern in a text, it might be more convenient to have all info about one match in $matches[0], and so forth.
Note that if you're just separating a string by comma and then by any periods, you might not need regular expressions and could conveniently use explode() as so:
$string = 'asd1.lol1,asd2.lol2';
$matches = explode(',', $string);
foreach($matches as &$match) {
$match = explode('.', $match);
}
This will give you exactly what you want, but do note that you don't have as much control over the process as with regular expressions – for instance, asd1.lol1.lmao,asd2.lol2.rofl.hehe will also work and they'll produce bigger arrays than you may want. You can check with count() on the size of the subarray and handle the cases when the array isn't of the appropriate size, though. I still believe that's more comfortable than using regular expressions.
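The count() check mentioned above might look roughly like this. It is a sketch with a made-up input; what to do with malformed chunks (here they are collected in `$rejected`) is an assumption and depends on your use case.

```php
<?php
// Hypothetical input: the middle chunk has three parts instead of two.
$string = 'asd1.lol1,asd2.lol2.rofl,asd3.lol3';
$pairs = array();
$rejected = array();

foreach (explode(',', $string) as $chunk) {
    $parts = explode('.', $chunk);
    if (count($parts) === 2) {
        $pairs[] = $parts;      // well-formed "a.b" pair
    } else {
        $rejected[] = $chunk;   // wrong number of parts; handle as needed
    }
}
print_r($pairs);
print_r($rejected);
```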
I'm processing a single string which contains many pairs of data. Each pair is separated by a ; sign. Each pair contains a number and a string, separated by an = sign.
I thought it would be easy to process, but I've found that the string half of the pair can contain the = and ; signs, making simple splitting unreliable.
Here is an example of a problematic string:
123=one; two;45=three=four;6=five;
For this to be processed correctly I need to split it up into an array that looks like this:
'123', 'one; two'
'45', 'three=four'
'6', 'five'
I'm at a bit of a dead end, so any help is appreciated.
UPDATE:
Thanks to everyone for the help, this is where I am so far:
$input = '123=east; 456=west';
// split matches into array
preg_match_all('~(\d+)=(.*?);(?=\s*(?:\d|$))~', $input, $matches);
$newArray = array();
// extract the relevant data
for ($i = 0; $i < count($matches[2]); $i++) {
$type = $matches[2][$i];
$price = $matches[1][$i];
// add each key-value pair to the new array
$newArray[$i] = array(
'type' => "$type",
'price' => "$price"
);
}
Which outputs
Array
(
[0] => Array
(
[type] => east
[price] => 123
)
)
The second item is missing because it doesn't have a semicolon at the end; I'm not sure how to fix that.
I've now realised that the numeric part of the pair sometimes contains a decimal point, and that the last pair does not have a semicolon after it. Any hints would be appreciated as I'm not having much luck.
Here is the updated string taking into account the things I missed in my initial question (sorry):
12.30=one; two;45=three=four;600.00=five
You need a look-ahead assertion for this; the look-ahead matches if a ; is followed by a digit or the end of your string:
$s = '12.30=one; two;45=three=four;600.00=five';
preg_match_all('/(\d+(?:\.\d+)?)=(.+?)(?=(;\d|$))/', $s, $matches);
print_r(array_combine($matches[1], $matches[2]));
Output:
Array
(
[12.30] => one; two
[45] => three=four
[600.00] => five
)
I think this is the regex you want:
\s*(\d+)\s*=(.*?);(?=\s*(?:\d|$))
The trick is to consider only the semicolon that's followed by a digit as the end of a match. That's what the lookahead at the end is for.
You can see a detailed visualization on www.debuggex.com.
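In PHP, that pattern could be used along these lines. This is a sketch against the original example string (with its trailing semicolon); `$pairs` is an illustrative name. It shares the limitation of the pattern itself: a value containing a semicolon followed by a digit would be cut short.

```php
<?php
$input = '123=one; two;45=three=four;6=five;';
preg_match_all('~\s*(\d+)\s*=(.*?);(?=\s*(?:\d|$))~', $input, $matches, PREG_SET_ORDER);

$pairs = array();
foreach ($matches as $m) {
    // $m[1] is the number, $m[2] the text up to the terminating semicolon.
    $pairs[] = array($m[1], $m[2]);
}
print_r($pairs);
```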
You can use following preg_match_all code to capture that:
$str = '123=one; two;45=three=four;6=five;';
if (preg_match_all('~(\d+)=(.+?);(?=\d|$)~', $str, $arr))
print_r($arr);
Live Demo: http://ideone.com/MG3BaO
$str = '123=one; two;45=three=four;6=five;';
preg_match_all('/(\d+)=([a-zA-Z ;=]+)/', $str,$matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
Output:
Array
(
[0] => Array
(
[0] => 123=one; two;
[1] => 45=three=four;
[2] => 6=five;
)
[1] => Array
(
[0] => 123
[1] => 45
[2] => 6
)
[2] => Array
(
[0] => one; two;
[1] => three=four;
[2] => five;
)
)
Then you can combine them:
echo '<pre>';
print_r(array_combine($matches[1],$matches[2]));
echo '</pre>';
Output:
Array
(
[123] => one; two;
[45] => three=four;
[6] => five;
)
Try this. The code is written in C#, but you can convert it to PHP:
string[] res = Regex.Split("123=one; two;45=three=four;6=five;", @";(?=\d)");
--SJ
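A rough PHP equivalent of that split, as a sketch: preg_split() on a semicolon that is followed by a digit, then explode() on the first = only. The rtrim() call is an added assumption to drop the trailing semicolon of the last pair.

```php
<?php
$str = '123=one; two;45=three=four;6=five;';
$pairs = array();

// Split only on semicolons that are immediately followed by a digit.
foreach (preg_split('/;(?=\d)/', $str) as $chunk) {
    // Drop a trailing semicolon, then split on the first "=" only.
    $pairs[] = explode('=', rtrim($chunk, ';'), 2);
}
print_r($pairs);
```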
I am a bit confused on the use of preg_replace_callback()
I have a $content with some URLs inside.
Previously I used
$content = preg_match_all( '/(http[s]?:[^\s]*)/i', $content, $links );
foreach ($links[1] as $link ) {
// we have the link. find image , download, replace the content with image
// echo '</br>LINK : '. $link;
$url = esc_url_raw( $link );
$url_name = parse_url($url);
$url_name = $description = $url_name['host'];// get rid of http://..
$url = 'http://somescriptonsite/v1/' . urlencode($url) . '?w=' . $width ;
}
return $url;
But what I really need is to REPLACE the original URL with my parsed URL...
So I tried the preg_replace_callback:
function o99_simple_parse($content){
$content = preg_replace_callback( '/(http[s]?:[^\s]*)/i', 'o99_simple_callback', $content );
return $content;
}
and :
function o99_simple_callback($url){
// how to get the URL which is actually the match? and width ??
$url = esc_url_raw( $link );
$url_name = parse_url($url);
$url_name = $description = $url_name['host'];// get rid of http://..
$url = 'http://something' . urlencode($url) . '?w=' . $width ;
return $url; // what i really need to replace
}
I assumed that the callback would be called for EVERY match (recursively?) and return a result, thus allowing me to replace the URLs in $content on the fly with the parsed $url from o99_simple_callback().
But another question here (and especially this comment) triggered my doubts .
If preg_replace_callback() actually passes the whole array of matches, then what is the difference between what I used previously (preg_match_all() in the first example) and the callback example?
What am I missing / misunderstanding ??
What would be the correct way of replacing the URLS found in $content with the parsed urls ?
The other answers may have been sufficient, but let me give you one more take using a simpler example.
Let's say we have the following data in $subject,
RECORD Male 1987-11-29 New York
RECORD Female 1987-07-13 Tennessee
RECORD Female 1990-04-14 New York
and the following regular expression in $pattern,
/RECORD (Male|Female) (\d\d\d\d)-(\d\d)-(\d\d) ([\w ]+)/
Let's compare three approaches.
preg_match_all
First, the vanilla preg_match_all:
preg_match_all($pattern, $subject, $matches);
Here's what $matches comes out to be:
Array
(
[0] => Array
(
[0] => RECORD Male 1987-11-29 New York
[1] => RECORD Female 1987-07-13 Tennessee
[2] => RECORD Female 1990-04-14 New York
)
[1] => Array
(
[0] => Male
[1] => Female
[2] => Female
)
[2] => Array
(
[0] => 1987
[1] => 1987
[2] => 1990
)
[3] => Array
(
[0] => 11
[1] => 07
[2] => 04
)
[4] => Array
(
[0] => 29
[1] => 13
[2] => 14
)
[5] => Array
(
[0] => New York
[1] => Tennessee
[2] => New York
)
)
Whether we're talking about the gender field in my example or the URL field in your example, it's clear that looping through $matches[1] iterates through just that field:
foreach ($matches[1] as $match)
{
$gender = $match;
// ...
}
However, as you noticed, changes you make to $matches[1], even if you iterated through its subarrays by reference, do not reflect in $subject, i.e. you cannot perform replacements via preg_match_all.
preg_match_all with PREG_SET_ORDER
Before we jump into preg_replace_callback though, let's take a look at one of preg_match_all's commonly used flags, PREG_SET_ORDER.
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
This outputs something (seemingly) completely different!
Array
(
[0] => Array
(
[0] => RECORD Male 1987-11-29 New York
[1] => Male
[2] => 1987
[3] => 11
[4] => 29
[5] => New York
)
[1] => Array
(
[0] => RECORD Female 1987-07-13 Tennessee
[1] => Female
[2] => 1987
[3] => 07
[4] => 13
[5] => Tennessee
)
[2] => Array
(
[0] => RECORD Female 1990-04-14 New York
[1] => Female
[2] => 1990
[3] => 04
[4] => 14
[5] => New York
)
)
Now, each subarray contains the set of capture groups per match, as opposed to the set of matches, per capture group. (In yet other words, this is the transpose of the other array.) If you wanted to play with the gender (or URL) of each match, you'd now have to write this:
foreach ($matches as $match)
{
$gender = $match[1];
// ...
}
preg_replace_callback
And that's what preg_replace_callback is like. It calls the callback for each set of matches (that is, including all its capture groups, at once), as if you were using the PREG_SET_ORDER flag. That is, contrast the way preg_replace_callback is used,
preg_replace_callback($pattern, 'my_callback', $subject);
function my_callback($match)
{
$gender = $match[1];
// ...
return $gender;
}
to the PREG_SET_ORDER example. Note how the two examples iterate through matches in exactly the same way, the only difference being that preg_replace_callback gives you an opportunity to return a value for replacement.
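To make that concrete, here is a small end-to-end sketch using two of the records above. The replacement format is invented purely for illustration; the point is that the callback receives one match (with all its capture groups) at a time and its return value replaces that match.

```php
<?php
$subject = "RECORD Male 1987-11-29 New York\n"
         . "RECORD Female 1987-07-13 Tennessee";
$pattern = '/RECORD (Male|Female) (\d\d\d\d)-(\d\d)-(\d\d) ([\w ]+)/';

$result = preg_replace_callback($pattern, function ($match) {
    // $match[0] is the whole record; $match[1] to $match[5] are the capture groups.
    return $match[5] . ': ' . $match[1] . ', born '
         . $match[4] . '/' . $match[3] . '/' . $match[2];
}, $subject);

echo $result;
```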
It does not pass all matches, but invokes the callback for each match. The callback won't receive a single string parameter, but a list of strings. $match[0] is the whole match, and $match[1] the first capture group (what's in your regex between the first parens).
So this is how your callback should look:
function o99_simple_callback($match){
$url = $match[1];
//$url = esc_url_raw( $link );
$url_name = parse_url($url);
$url_name = $description = $url_name['host'];// get rid of http://..
$url = 'http://something' . urlencode($url) . '?w=' . $width ;
return $url; // what i really need to replace
}
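One thing the callback above leaves open is where $width comes from, since a named function does not see the caller's local variables. A possible way around that, sketched here with made-up values for `$content` and `$width` and without the WordPress-specific esc_url_raw(), is an anonymous function that imports it with `use`:

```php
<?php
// Hypothetical values for illustration only.
$content = 'see http://example.com/a now';
$width = 300;

$content = preg_replace_callback('/(http[s]?:[^\s]*)/i', function ($match) use ($width) {
    $url = $match[1]; // first capture group: the URL that was found
    return 'http://somescriptonsite/v1/' . urlencode($url) . '?w=' . $width;
}, $content);

echo $content;
```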
Please also see the manual examples on preg_replace_callback
preg_replace_callback
Using preg_replace_callback() to Replace Patterns
Generating replacement strings with a callback function
Generating replacement strings with an anonymous function
I need to save the number between every pair of curly brackets as a variable.
{2343} -> $number
echo $number;
Output = 2343
I don't know how to do the '->' part.
I've found a similar function, but it simply removes the curly brackets and does nothing else.
preg_replace('#{([0-9]+)}#','$1', $string);
Is there any function I can use?
You'll probably want to use preg_match with a capture:
$subject = "{2343}";
$pattern = '/\{(\d+)\}/';
preg_match($pattern, $subject, $matches);
print_r($matches);
Output:
Array
(
[0] => {2343}
[1] => 2343
)
The $matches array will contain the result at index 1 if it is found, so:
if(!empty($matches) && isset($matches[1])){
$number = $matches[1];
}
If your input string can contain many numbers, then use preg_match_all:
$subject = "{123} {456}";
$pattern = '/\{(\d+)\}/';
preg_match_all($pattern, $subject, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => {123}
[1] => {456}
)
[1] => Array
(
[0] => 123
[1] => 456
)
)
$string = '{1234}';
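// Note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this only runs on older versions.
preg_replace('#{([0-9]+)}#e','$number = $1;', $string);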
echo $number;
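On current PHP versions, a roughly equivalent sketch replaces the evaluated string with preg_replace_callback() and a closure that writes to `$number` by reference. This is an assumption about what the snippet above was meant to do, not a drop-in from the manual; for a single number, plain preg_match() is simpler.

```php
<?php
$string = '{1234}';
$number = null;

// The closure receives each match; $m[1] is the digits between the braces.
preg_replace_callback('#\{([0-9]+)\}#', function ($m) use (&$number) {
    $number = (int) $m[1];
    return $m[1];
}, $string);

echo $number; // 1234
```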