How can I match just the first line of occurrence? - php

I have this string:
$str = "11ff11
22mm22
33gg33
mm22mm
vv55vv
77ll77
55kk55
kk22kk
bb11bb";
There are two kinds of patterns:
{two numbers}{two letters}{two numbers}
{two letters}{two numbers}{two letters}
I'm trying to match the first line each time the pattern changes. So I want to match these:
11ff11 -- this
22mm22
33gg33
mm22mm -- this
vv55vv
77ll77 -- this
55kk55
kk22kk -- this
bb11bb
Here is my current pattern:
/(\d{2}[a-z]{2}\d{2})|([a-z]{2}\d{2}[a-z]{2})/
But it matches every line. How can I limit it to match only the first line of each run of the same pattern?

I could not do it with lookarounds because of the whitespace between lines, but it can be done with a classic regex. It finds each sequence of lines that repeat the same pattern and captures only the first line of the sequence:
(?:(\d{2}[a-z]{2}\d{2})\s+)(?:\d{2}[a-z]{2}\d{2}\s+)*|(?:([a-z]{2}\d{2}[a-z]{2})\s+)(?:[a-z]{2}\d{2}[a-z]{2}\s+)*
demo and some explanation
To show how it works, I made a simple example where the two patterns are a single digit and the letter a:
(?:(\d)\s+)(?:\d\s+)*|(?:(a)\s+)(?:a\s+)*
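A minimal sketch of how a pattern of this shape might be applied with preg_match_all. The function name firstLineOfEachRun and the separator (?:\s+|\z) are assumptions that are not part of the answer above; the separator lets the final line take part in a run even when no whitespace follows it.

```php
<?php
// Hypothetical helper: returns the first line of every run of same-pattern lines.
function firstLineOfEachRun(string $str): array
{
    $a   = '\d{2}[a-z]{2}\d{2}';     // {two numbers}{two letters}{two numbers}
    $b   = '[a-z]{2}\d{2}[a-z]{2}';  // {two letters}{two numbers}{two letters}
    $sep = '(?:\s+|\z)';             // whitespace between lines, or end of input (assumption)

    // Each branch captures the first line of a run, then consumes the rest of that run.
    $regex = '/(' . $a . ')' . $sep . '(?:' . $a . $sep . ')*'
           . '|(' . $b . ')' . $sep . '(?:' . $b . $sep . ')*/';

    preg_match_all($regex, $str, $matches, PREG_SET_ORDER);

    $firsts = [];
    foreach ($matches as $m) {
        // Group 2 is filled only when the second branch matched; otherwise use group 1.
        $firsts[] = ($m[2] ?? '') !== '' ? $m[2] : $m[1];
    }
    return $firsts;
}

$str = "11ff11\n22mm22\n33gg33\nmm22mm\nvv55vv\n77ll77\n55kk55\nkk22kk\nbb11bb";
print_r(firstLineOfEachRun($str));
```

Because every run is consumed as a single match, the next match can only start where the pattern changes.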

Not sure if you can do this with only one expression, but you can iterate over the lines of your string and record each line where the pattern changes:
<?php
$str = "11ff11
22mm22
33gg33
mm22mm
vv55vv
77ll77
55kk55
kk22kk
bb11bb";
$exploded = preg_split('/\R/', $str); // split on any line ending, not only this platform's PHP_EOL
$patternA = '/(\d{2}[a-z]{2}\d{2})/';
$patternB = '/([a-z]{2}\d{2}[a-z]{2})/';
$result = [];
$currentPattern = '';
//get first and check what pattern is
if (preg_match($patternA, $exploded[0])) {
    $currentPattern = $patternA;
    $result[] = $exploded[0];
} elseif (preg_match($patternB, $exploded[0])) {
    $currentPattern = $patternB;
    $result[] = $exploded[0];
} else {
    //.. no pattern on first element, should we continue?
}
//toggle
$currentPattern = $currentPattern == $patternA ? $patternB : $patternA;
foreach ($exploded as $e) {
    if (preg_match($currentPattern, $e)) {
        //toggle
        $currentPattern = $currentPattern == $patternA ? $patternB : $patternA;
        $result[] = trim($e);
    }
}
echo "<pre>";
var_dump($result);
echo "</pre>";
Output:
array(4) {
[0]=>
string(6) "11ff11"
[1]=>
string(6) "mm22mm"
[2]=>
string(6) "77ll77"
[3]=>
string(6) "kk22kk"
}
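The same idea can be sketched more compactly by classifying each line and keeping it only when its type differs from the previous line's type. The names lineType and firstOfEachRun are hypothetical, and the anchored patterns assume each line holds exactly one six-character token. A line that matches neither pattern resets the run here, which is a design choice rather than something the question specifies.

```php
<?php
// Hypothetical helper: 'A' for digits-letters-digits, 'B' for letters-digits-letters, null otherwise.
function lineType(string $line): ?string
{
    if (preg_match('/^\d{2}[a-z]{2}\d{2}$/', $line)) {
        return 'A';
    }
    if (preg_match('/^[a-z]{2}\d{2}[a-z]{2}$/', $line)) {
        return 'B';
    }
    return null;
}

// Hypothetical helper: keeps a line only when its type differs from the previous line's type.
function firstOfEachRun(string $str): array
{
    $result   = [];
    $previous = null;
    foreach (preg_split('/\R/', $str) as $line) {
        $line = trim($line);
        $type = lineType($line);
        if ($type !== null && $type !== $previous) {
            $result[] = $line;
        }
        $previous = $type;
    }
    return $result;
}

print_r(firstOfEachRun("11ff11\n22mm22\n33gg33\nmm22mm\nvv55vv\n77ll77\n55kk55\nkk22kk\nbb11bb"));
```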

Here's my take. I have never used lookbehinds before, and my regex skills are not that good, but this does seem to return what you want:
/^.*|(?<=[a-z]{2}\n)\d{2}[a-z]{2}\d{2}|(?<=\d{2}\n)[a-z]{2}\d{2}[a-z]{2}/
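A short sketch of running this pattern with preg_match_all. The wrapper name firstLinesByLookbehind is hypothetical, and the line-ending normalization is an added assumption: the lookbehinds expect a bare \n, so input with \r\n endings would otherwise only yield the first line.

```php
<?php
// Hypothetical wrapper around the lookbehind pattern from the answer above.
function firstLinesByLookbehind(string $str): array
{
    // Assumption: convert \r\n and \r to \n so the \n inside the lookbehinds applies.
    $normalized = str_replace(["\r\n", "\r"], "\n", $str);

    // ^.* takes the first line; the lookbehinds take a line whose predecessor
    // ended in the other pattern's final two characters.
    $regex = '/^.*|(?<=[a-z]{2}\n)\d{2}[a-z]{2}\d{2}|(?<=\d{2}\n)[a-z]{2}\d{2}[a-z]{2}/';
    preg_match_all($regex, $normalized, $matches);

    return $matches[0];
}

print_r(firstLinesByLookbehind("11ff11\n22mm22\n33gg33\nmm22mm\nvv55vv\n77ll77\n55kk55\nkk22kk\nbb11bb"));
```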

Related

How to grab number after a word or symbol in PHP?

I want to grab text with PHP. For example, given the data "The apple=10", I want to grab only the number, which always comes right after the equals sign.
My problem is that the number in the source can be 2 or 3 characters long; in other words, its length is not constant.
Please help me solve this :)
$string = "Apple=10 | Orange=3 | Banana=7";
$elements = explode("|", $string);
$values = array();
foreach($elements as $element)
{
$element = trim($element);
$val_array = explode("=", $element);
$values[$val_array[0]] = $val_array[1];
}
var_dump($values);
Output:
array(3) {
["Apple"]=> string(2) "10"
["Orange"]=> string(1) "3"
["Banana"]=> string(1) "7"
}
Hope that's what you need :)
Well, PHP is lenient about int conversion, so 12345blablabla is converted to 12345:
$value = intval(substr($str, strpos($str, '=') + 1));
Of course, this is not the cleanest way but it is simple. If you want something cleaner, you could use a regexp:
preg_match ('#=([0-9]+)#', $str, $matches);
$value = intval($matches[1]) ;
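The regex variant above reads $matches[1] without checking whether preg_match found anything. A small sketch with that check added; the function name numberAfterEquals and the null return for "no number found" are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the integer that follows the first "=", or null if there is none.
function numberAfterEquals(string $str): ?int
{
    // \s* tolerates optional spaces after "="; \d+ accepts a number of any length.
    if (preg_match('/=\s*(\d+)/', $str, $matches)) {
        return (int) $matches[1];
    }
    return null;
}

var_dump(numberAfterEquals('The apple=10'));
var_dump(numberAfterEquals('The apple=123 pieces'));
var_dump(numberAfterEquals('no number here'));
```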
Try the below code:
$givenString= "The apple=10";
$required_string = substr($givenString, strpos($givenString, "=") + 1);
echo "output = ".$required_string ; // output = 10
strpos() finds the position of the first occurrence of a substring in a string,
and substr() returns part of a string.

How to use preg_replace_callback if I want to use the same "string" twice in a callback?

I wrote a BBCode function that finds a match and does the replacement. However, when the same tag is nested inside itself, the match is not returned correctly.
The code is:
<?php
//CODE EXAMPLE
class BBCode {
    protected $bbcode = array();
    public function __construct() {
        // Replace [div class="class name(s)"]...[/div] with <div class="...">...</div>
        $this->bbcode["/\[div class=\"([^\"]+)\"\](.*?)\[\/div\]/is"] = function ($match) {
            return "<div class=\"$match[1]\">$match[2]</div>";
        };
    }
    public function rander($str) {
        foreach ($this->bbcode as $key => $val) {
            $str = preg_replace_callback($key, $val, $str);
        }
        return $str;
    }
}
?>
If I use just one tag, it works fine, like this:
$str = '[div class="class1"]this is a div[/div]';
Even if I use different tags, it works great:
$str = '[div class="class1"][p]this is a paragraph inside a div[/p][/div]';
But when I try to nest the same tag:
$str = '[div class="class1"][div class="class2"]A div inside a div[/div][/div]';
it does not work, and the output is:
<div class="class1">[div class="class2"]A div inside a div</div>[/div]
instead of:
<div class="class1"><div class="class2">A div inside a div</div></div>
How can I fix it to work correctly?
Thanks!
A link to the whole bbcode class code on github
Your regexp will match from the first [div class="..."] up to the first [/div] it finds. Therefore, the opening and closing divs do not pair up correctly (as you can see in your example as well). To prevent this, use a look-around to prevent matching when there is another [div or [/div in between:
"/\[div class=\"([^\"]+)\"\]((?!\[div|\[\/div).)*\[\/div\]/is"
Note that this will match only once (only the inner div-pair), so you have to repeat the matching until nothing is found anymore.
Demo:
$str = "a[div class=\"b\"]c[div class=\"d\"]e[/div]f[/div]g";
$reg = "/\[div class=\"([^\"]+)\"\]((?!\[div|\[\/div).)*\[\/div\]/is";
preg_match($reg, $str, $matches);
var_dump($matches);
will output:
array(3)
{
[0]=>
string(22) "[div class="d"]e[/div]"
[1]=>
string(1) "d"
[2]=>
string(1) "e"
}
Edit: Yes, as you commented, it replaces only the first match. The * is at the wrong place. Try this code:
$str = 'a[div class="b"]c[div class="d"]e[/div]f[/div]g';
$reg = "/^(.*)(\[div class=\"([^\"]+)\"\])((?!\[div|\[\/div).*)\[\/div\](.*)$/is";
$result = $str;
$step = 1;
echo "step 0: $result\n";
do {
$result = preg_replace($reg, "$1<div class=\"$3\">$4</div>$5", $result, -1, $count);
echo "step $step: $result\n";
$step++;
} while ($count > 0);
This outputs:
step 0: a[div class="b"]c[div class="d"]e[/div]f[/div]g
step 1: a[div class="b"]c<div class="d">e[/div]f</div>g
step 2: a<div class="b">c<div class="d">e</div>f</div>g
step 3: a<div class="b">c<div class="d">e</div>f</div>g
Note: right now, it is matching one time too often, the loop is not optimal.
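The repeat-until-nothing-matches idea can also be sketched with the negative lookahead applied to every character between the tags, and with the whole body wrapped in a single capturing group. This is a variation on the patterns above rather than the answer's exact code; the function name renderDivs is hypothetical.

```php
<?php
// Hypothetical helper: converts nested [div class="..."]...[/div] tags, innermost pair first.
function renderDivs(string $str): string
{
    // (?:(?!\[div|\[/div).)* consumes one character at a time and stops before any
    // other opening or closing div tag, so a pass only matches pairs with no div inside.
    $pattern = '~\[div class="([^"]+)"\]((?:(?!\[div|\[/div).)*)\[/div\]~is';

    do {
        // $count receives the number of replacements made in this pass.
        $str = preg_replace($pattern, '<div class="$1">$2</div>', $str, -1, $count);
    } while ($count > 0);

    return $str;
}

echo renderDivs('a[div class="b"]c[div class="d"]e[/div]f[/div]g'), "\n";
// a<div class="b">c<div class="d">e</div>f</div>g
```

Each pass converts the innermost remaining pairs, so the loop runs once per nesting level plus one final pass that finds nothing.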

Regex: check if "|" is in a string

I want to check whether a string contains the character "|" (exactly once, and not at the beginning or end of the string). If so, the string should be split. The second check is whether the first part is "X" or not.
Example:
$string = "This|is an example"; // Output: $var1 = "This"; $var2 = "is an example";
RegEx is really difficult for me. This is my poor attempt:
if (preg_match('/(.*?)\|(.*?)/', $string, $m)) {
    $var1 = $m[1];
    $var2 = $m[2];
    if ($var1 == "X") {
        // do anything
    } else {
        // do something else
    }
}
A pure regex solution would be:
^ -- start of input
[^|]+ -- some non-pipes
\| -- a pipe
[^|]+ -- some non-pipes
$ -- end of input
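Assembled into a single pattern, the pieces above might be used as follows. The two capture groups, the D modifier (so that $ does not also match before a trailing newline) and the function name splitOnSinglePipe are additions made for this sketch.

```php
<?php
// Hypothetical helper: returns [left, right] when the string contains exactly one "|"
// that is neither the first nor the last character; otherwise returns null.
function splitOnSinglePipe(string $input): ?array
{
    if (preg_match('/^([^|]+)\|([^|]+)$/D', $input, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}

foreach (['This|is an example', '|abc', 'abc|', 'a|b|c', 'abc'] as $s) {
    $parts = splitOnSinglePipe($s);
    echo $s, ' => ', $parts === null ? 'no match' : implode(' / ', $parts), "\n";
}
```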
However, string functions might work better in this case, since you're going to split it anyways:
$x = explode('|', $input);
if(count($x) == 2 && strlen($x[0]) && strlen($x[1]))
// all right
If you don't know regex you might want a solution which doesn't use regex.
$test = ["|sdf", "asd|asad", "asd|", "asdf", "sd|sdf|sd"];
foreach ($test as $string) {
$res = explode("|", $string);
if (2 === count($res) && strlen($res[0]) && strlen($res[1])) {
var_dump($res);
}
}
Result:
array(2) {
[0]=>
string(3) "asd"
[1]=>
string(4) "asad"
}

Parse words in PHP

From the string of words, can I get only the words with a capitalized first letter? For example, I have this string:
Page and Brin originally nicknamed THEIR new search engine "BackRub",
because the system checked backlinks to estimate the importance of a
site.
I need to get: Page, Brin, THEIR, BackRub
A non-regex solution (based on Mark Baker's comment):
$result = array_filter(str_word_count($str, 1), function($item) {
return ctype_upper($item[0]);
});
print_r($result);
Output:
Array
(
[0] => Page
[2] => Brin
[5] => THEIR
[9] => BackRub
)
You can match that with
preg_match("/\b[A-Z][a-zA-Z]*/", $searchText)
You can see on php.net how preg_match can be applied.
http://ca1.php.net/preg_match
EDIT, TO ADD EXAMPLE
Here's an example of how to get the array with full matches
$searchText = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
preg_match_all("/\b[A-Z][a-zA-Z]*/", $searchText, $matches );
var_dump( $matches );
The output is:
array(1) {
[0]=>
array(4) {
[0]=>
string(4) "Page"
[1]=>
string(4) "Brin"
[2]=>
string(5) "THEIR"
[3]=>
string(7) "BackRub"
}
}
The way I would do it is explode by space, ucfirst the exploded strings, and check them against the original.
here is what I mean:
$str = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$strings = explode(' ', $str);
$i = 0;
$out = array();
foreach($strings as $s)
{
if($strings[$i] == ucfirst($s))
{
$out[] = $s;
}
++$i;
}
var_dump($out);
http://codepad.org/QwrS4HpE
I would use strtok function (http://pl1.php.net/strtok), which returns the words in the string, one by one. You can specify the delimiter between words:
$string = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$delimiter = ' ,."'; // specify valid delimiters here (add others as needed)
$capitalized_words = array(); // array to hold the found words
$tok = strtok($string,$delimiter); // get first token
while ($tok !== false) {
$first_char = substr($tok,0,1);
if (strtoupper($first_char)===$first_char) {
// this word ($tok) is capitalized, store it
$capitalized_words[] = $tok;
}
$tok = strtok($delimiter); // get next token
}
var_dump($capitalized_words); // print the capitalized words found
This prints:
array(4) {
[0]=>
string(4) "Page"
[1]=>
string(4) "Brin"
[2]=>
string(5) "THEIR"
[3]=>
string(7) "BackRub"
}
Good luck!
Only drawback I can see is that it doesn't handle multibyte. If you have only English characters, then you're ok. If you have international characters, a modified/different solution may be needed.
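For international characters, one possible variation is a regex with Unicode properties and the u modifier, which assumes the input is valid UTF-8. The function name capitalizedWords is hypothetical, and this swaps strtok for preg_match_all rather than extending the code above.

```php
<?php
// Hypothetical helper: returns the words whose first letter is an uppercase letter in any script.
function capitalizedWords(string $text): array
{
    // (?<!\p{L}) requires that no letter precedes the match, so a capital inside a word
    // is not treated as a word start. \p{Lu} is an uppercase letter, \p{L}* the rest of the word.
    preg_match_all('/(?<!\p{L})\p{Lu}\p{L}*/u', $text, $matches);
    return $matches[0];
}

$text = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
print_r(capitalizedWords($text));

// "\u{C9}" is the UTF-8 encoding of the capital E with an acute accent.
print_r(capitalizedWords("\u{C9}lan and berlin"));
```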
You can do this using explode and loop through with regex:
$string = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$list = explode(' ',$string);
$matches = array();
foreach($list as $str) {
if (preg_match('/[A-Z]+[a-zA-Z]*/', $str)) $matches[] = $str;
}
print_r($matches);

Compare strings and extract variables?

Could someone tell me how I would do this. I have 3 strings.
$route = '/user/$1/profile/$2';
$path = '/user/profile/$1/$2';
$url = '/user/jason/profile/friends';
What I need to do is check to see if the url conforms to the route. I am trying to do this as follows.
$route_segments = explode('/', $route);
$url_segments = explode('/', $url);
$count = count($url_segments);
for($i=0; $i < $count; $i++) {
if ($route_segments[$i] != $url_segments[$i] && ! preg_match('/\$[0-9]/', $route_segments[$i])) {
return false;
}
}
I assume the regex works, it's the first I have ever written by myself. :D
This is where I am stuck. How do I compare the following strings:
$route = '/user/$1/profile/$2';
$url = '/user/jason/profile/friends';
So I end up with:
array (
'$1' => 'jason',
'$2' => 'friends'
);
I assume that with this array I could then str_replace these values into the $path variable?
$route_segments = explode('/',$route);
$url_segments = explode('/',$url);
$combined_segments = array_combine($route_segments,$url_segments);
Untested and not sure how it reacts with unequal array lengths, but that's probably what you're looking for regarding an element-to-element match. Then you can pretty much iterate the array and look for $ and use the other value to replace it.
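A sketch of the element-to-element match with the length check made explicit. It pairs segments by index instead of calling array_combine, because array_combine collapses duplicate keys and, as of PHP 8, throws a ValueError when the two arrays differ in length. The function name extractRouteParams and the null return for "no match" are assumptions.

```php
<?php
// Hypothetical helper: returns ['$1' => ..., '$2' => ...] when $url conforms to $route, else null.
function extractRouteParams(string $route, string $url): ?array
{
    $routeSegments = explode('/', trim($route, '/'));
    $urlSegments   = explode('/', trim($url, '/'));

    // A different number of segments can never match.
    if (count($routeSegments) !== count($urlSegments)) {
        return null;
    }

    $params = [];
    foreach ($routeSegments as $i => $segment) {
        if (preg_match('/^\$[0-9]+$/', $segment)) {
            // Placeholder segment: remember the URL value under the placeholder name.
            $params[$segment] = $urlSegments[$i];
        } elseif ($segment !== $urlSegments[$i]) {
            // Literal segment that differs from the URL.
            return null;
        }
    }
    return $params;
}

var_export(extractRouteParams('/user/$1/profile/$2', '/user/jason/profile/friends'));
```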
EDIT
/^\x24[0-9]+$/
Your regex is close, but the $ needs to be escaped because in a regex it is the metacharacter for end of string (hence the \x24; \$ works as well). [0-9]+ matches one or more digits. The ^ anchors the match to the beginning of the string and, as explained, the $ anchors it to the end. This ensures the segment is always a dollar sign followed by a number.
(actually, netcoder has a nice solution)
I did something similar in a small framework of my own.
My solution was to transform the template URL: /user/$1/profile/$2
into a regexp capable of parsing parameters: ^\/user\/([^/]*)/profile/([^/]*)\/$
I then check if the regexp matches or not.
You can have a look at my controller code if you need to.
You could do this:
$route = '/user/$1/profile/$2';
$path = '/user/profile/$1/$2';
$url = '/user/jason/profile/friends';
$regex_route = '#'.preg_replace('/\$[0-9]+/', '([^/]*)', $route).'#';
if (preg_match($regex_route, $url, $matches)) {
$real_path = $path;
for ($i=1; $i<count($matches); $i++) {
$real_path = str_replace('$'.$i, $matches[$i], $real_path);
}
echo $real_path; // outputs /user/profile/jason/friends
} else {
// route does not match
}
You could replace any occurrence of $n by a named group with the same number (?P<_n>[^/]+) and then use it as pattern for preg_match:
$route = '/user/$1/profile/$2';
$path = '/user/profile/$1/$2';
$url = '/user/jason/profile/friends';
$pattern = '~^' . preg_replace('/\\\\\$([1-9]\d*)/', '(?P<_$1>[^/]+)', preg_quote($route, '~')) . '$~';
if (preg_match($pattern, $url, $match)) {
var_dump($match);
}
This prints in this case:
array(5) {
[0]=>
string(27) "/user/jason/profile/friends"
["_1"]=>
string(5) "jason"
[1]=>
string(5) "jason"
["_2"]=>
string(7) "friends"
[2]=>
string(7) "friends"
}
Using a regular expression allows you to use the wildcards at any position in the path and not just as a separate path segment (e.g. /~$1/ for /~jason/ would work too). And named subpatterns allows you to use an arbitrary order (e.g. /$2/$1/ works as well).
And for a quick fail you can additionally use the atomic grouping syntax (?>…).
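To finish the job, the named groups can be substituted back into $path. The sketch below uses preg_replace_callback on the path so that $1 and a hypothetical $10 cannot be confused; the function name rewritePath and the null return for a non-matching URL are assumptions.

```php
<?php
// Hypothetical helper: matches $url against $route and fills the placeholders in $path.
function rewritePath(string $route, string $path, string $url): ?string
{
    // Turn each $n in the (quoted) route into a named group _n, as in the answer above.
    $pattern = '~^'
        . preg_replace('/\\\\\$([1-9]\d*)/', '(?P<_$1>[^/]+)', preg_quote($route, '~'))
        . '$~';

    if (!preg_match($pattern, $url, $match)) {
        return null; // route does not match
    }

    // Replace every $n in the path with the value captured for group _n.
    // A placeholder with no matching group is left unchanged.
    return preg_replace_callback('/\$([1-9]\d*)/', function (array $m) use ($match) {
        return $match['_' . $m[1]] ?? $m[0];
    }, $path);
}

echo rewritePath('/user/$1/profile/$2', '/user/profile/$1/$2', '/user/jason/profile/friends'), "\n";
// /user/profile/jason/friends
```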
