PHP Regex Match getting unexpected output

I'm trying to create a simple PHP script that retrieves info from a string and puts it into an array. I've looked at a few sites about capturing multiple values with one regex pattern, but I can't get the output I'm looking for.
Currently this is my script:
$input = "username: jack number: 20";
//$input = file_get_contents("test.txt");
preg_match_all("/username: ([^\s]+)|number: ([^\s]+)/", $input, $data);
var_dump($data);
Which produces this output:
0 =>
array (size=2)
0 => string 'username: jack' (length=14)
1 => string 'number: 20' (length=10)
1 =>
array (size=2)
0 => string 'jack' (length=4)
1 => string '' (length=0)
2 =>
array (size=2)
0 => string '' (length=0)
1 => string '20' (length=2)
I'm looking to get the data into this form:
0 =>
array (size=x)
0 => string 'jack'
1 =>
array (size=x)
0 => string '20'
Or as two separate arrays whose keys line up, so that the same index in each array refers to the same user/number pair.

You can use match-reset \K:
preg_match_all('/\b(?:username|number):\h*\K\S+/', $input, $data);
print_r($data[0]);
Array
(
[0] => jack
[1] => 20
)
Regex breakdown:
\b => a word boundary
(?:username|number) => matches username or number; (?:..) is a non-capturing group
:\h* => matches a colon followed by optional horizontal whitespace
\K => match reset; the engine drops everything matched so far from the reported match
\S+ => matches 1 or more non-whitespace characters
Or else you can use a capturing group to get your matched data like this:
preg_match_all('/\b(?:username|number):\h*(\S+)/', $input, $data);
print_r($data[1]);
Array
(
[0] => jack
[1] => 20
)
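If the goal is the second form from the question (two arrays whose keys line up), named groups are one option. This is a minimal sketch, assuming every username is immediately followed by its number; the second record (jill / 31) is made up for illustration and is not part of the original input:

```php
<?php
// Hypothetical input with two records; only the first comes from the question.
$input = "username: jack number: 20 username: jill number: 31";

// Each match covers one username/number pair, so the two named groups
// are filled together and their indexes stay aligned.
preg_match_all(
    '/\busername:\h*(?<username>\S+)\s+number:\h*(?<number>\S+)/',
    $input,
    $data
);

$usernames = $data['username']; // index n belongs to the same pair as $numbers[n]
$numbers   = $data['number'];

print_r($usernames);
print_r($numbers);
```

Because both groups belong to the same match, a record that lacks a number is skipped entirely instead of shifting the indexes.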

(?<=username:|number:)\s*(\S+)
You can use a lookbehind here. See the demo:
https://regex101.com/r/mG8kZ9/10
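A minimal sketch of how that pattern could be plugged into preg_match_all, assuming the same $input as in the question. PCRE accepts this lookbehind because each alternative has a fixed length. The values end up in capture group 1, since the full match also includes the leading whitespace:

```php
<?php
$input = "username: jack number: 20";

// The lookbehind asserts that "username:" or "number:" sits directly
// before the current position without making it part of the match.
preg_match_all('/(?<=username:|number:)\s*(\S+)/', $input, $data);

// Group 1 holds the values without the leading whitespace.
print_r($data[1]);
```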

Related

How to combine Regex with removing space and # character

I have this array, from which I need to remove whitespace and the # (hashtag) character:
array (size=7)
0 => string 'darwin' (length=6)
1 => string ' #nature' (length=8)
2 => string ' explore' (length=8)
3 => string ' galapagos' (length=10)
4 => string 'karma' (length=5)
foreach ($feedSinglePosts["hashtags_list"] as $key=>&$item) {
$item = preg_replace('/(\s|^)/', '', $item);
$item = preg_replace('/\#+/', '', $item);
}
The Regex above works well but I want to make it one line if possible.
When I do: /(\s|^)\#+/ it outputs this:
array (size=7)
0 => string 'darwin' (length=6)
1 => string 'nature' (length=6)
2 => string ' explore' (length=8)
3 => string ' galapagos' (length=10)
4 => string 'karma' (length=5)
How can I turn this into a one-line regex that removes both the whitespace and the hashtag?

It appears that the characters will be at the beginning or end. If so then no need for loops or regex:
array_walk($feedSinglePosts["hashtags_list"], function(&$v) { $v = trim($v, "\n\r #"); });
If you need to remove them anywhere:
$feedSinglePosts["hashtags_list"] = str_replace(["\n","\r"," ","#"], "", $feedSinglePosts["hashtags_list"]);

A non-regex way with array_walk() and trim(),
<?php
$array = ['darwin' ,' #nature',' explore', ' galapagos','karma'];
function remove_hash_space(&$value,$key){
$value = trim($value,'# ');
}
array_walk($array, 'remove_hash_space');
print_r($array);
?>
DEMO: https://3v4l.org/nOadH
OR with single line array_map(),
$array = array_map(function($e){return trim($e,'# ');},$array);
DEMO: https://3v4l.org/OaS1F

You may use
$arr = ['darwin',' #nature',' explore',' galapagos','karma'];
print_r( preg_replace('~^[\s#]+~', '', $arr) );
// => Array ( [0] => darwin [1] => nature [2] => explore [3] => galapagos [4] => karma )
See the regex demo
The ^[\s#]+ pattern matches 1 or more occurrences (+) of whitespace or # characters ([\s#]) at the start of the string (^).
If your strings may contain some unusual Unicode whitespace, consider adding the u modifier: ~^[\s#]+~u.
If you only need to handle horizontal whitespace, replace \s with \h.
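For comparison, the two replacements from the question can also be merged into one character class that removes whitespace and # wherever they occur, not only at the start. A minimal sketch, assuming the five sample values shown in the question:

```php
<?php
$arr = ['darwin', ' #nature', ' explore', ' galapagos', 'karma'];

// [\s#]+ matches any run of whitespace and/or # characters anywhere in the
// string; preg_replace accepts an array subject and returns an array.
$clean = preg_replace('/[\s#]+/', '', $arr);

print_r($clean);
```

Unlike the anchored ^[\s#]+ version, this also strips spaces or # signs in the middle of a value, so it only fits if tags never legitimately contain them.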

how to pull elseif - preg_match_all

I need advice on how to pull content from this string.
$string = <<<'TXT'
{elseif "xxx"=="xxx"} text {elseif "xx2"!="xx2"}
text text
text
{elseif ....} text
TXT;
//or 'xxx'=='xxx'
$regex = "??";
preg_match_all($regex, $string, $out, PREG_SET_ORDER);
var_dump($out);
And my idea of var_dump output is:
array
0 =>
array
0 => string 'xxx' (length=3)
1 => string '==' (length=2)
2 => string 'xxx' (length=3)
3 => string 'text' (length=4)
1 =>
array
1 => string 'xx2' (length=)
2 => string '!=' (length=)
3 => string 'xx2' (length=)
4 => string 'text text
text' (length=)
2 =>
array
...
The output doesn't have to look exactly like this, but it should contain the same content.
My attempt:
$regex = "~{elseif ([\"\'](.*)[\"\'])(!=|==|===|<=|<|>=|>)([\"\'](.*)[\"\'])}(.*)~sU";
But I get wrong output or none at all.

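Do you mean something like this? Note that PHP's preg functions have no g modifier (preg_match_all already returns every match), and single quotes around the pattern avoid having to escape the double quotes inside it.
$regex = '/\{\s*elseif\s*("[^"]+")\s*([^"]+)\s*("[^"]+")\s*\}\s*([^{]*)\s*/i';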
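A runnable sketch along the same lines, under a few assumptions that are not in the question: operands never contain their own quote character, the text after a tag never contains {, and !== is added to the operator list. One likely cause of the empty output in the original attempt is the trailing (.*) under the U modifier, which becomes lazy and is satisfied by an empty string.

```php
<?php
// Sample input modelled on the question; a nowdoc keeps the inner quotes literal.
$string = <<<'TXT'
{elseif "xxx"=="xxx"} text {elseif "xx2"!="xx2"}
text text
text
{elseif ....} text
TXT;

// Groups: 1 = opening quote, 2 = left operand, 3 = operator,
//         4 = opening quote, 5 = right operand, 6 = text up to the next "{".
// \1 and \4 require the closing quote to match the opening one.
$regex = '~\{elseif\s+(["\'])(.*?)\1\s*(===|!==|==|!=|<=|>=|<|>)\s*(["\'])(.*?)\4\}\s*([^{]*)~';

preg_match_all($regex, $string, $out, PREG_SET_ORDER);

// Keep only the interesting groups and trim the trailing text.
$rows = [];
foreach ($out as $m) {
    $rows[] = [$m[2], $m[3], $m[5], trim($m[6])];
}

var_dump($rows);
```

The placeholder tag {elseif ....} has no quoted operands, so this pattern skips it.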

the fastest way to replace (and store in array) links in the text with their order numbers

There is a $str string that may contain html text including <a >link</a> tags.
I want to store links in array and set the proper changes in the $str.
For example, with this string:
$str="some text <a href='/review/'>review</a> here <a class='abc' href='/about/'>link2</a> hahaha";
we get:
$linkArray[0] = "<a href='/review/'>review</a>";
$positionArray[0] = 10; // position of the first link in the string
$linkArray[1] = "<a class='abc' href='/about/'>link2</a>";
$positionArray[1] = 45; // position of the second link in the string
$changedStr = "some text [[0]] here [[1]] hahaha";
Is there a faster way (in terms of performance) to do this than walking through the whole string with a for loop?

This can be done with preg_match_all and the PREG_OFFSET_CAPTURE flag.
For example:
$str="some text <a href='/review/'>review</a> here <a class='abc' href='/about/'>link2</a> hahaha";
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",$str,$out,PREG_OFFSET_CAPTURE);
var_dump($out);
Here the output array is $out. With PREG_OFFSET_CAPTURE, every match is returned as a pair: the matched string and the offset in the subject string where it starts.
The above code will output:
array (size=2)
0 =>
array (size=2)
0 =>
array (size=2)
0 => string '<a href='/review/'>review</a>' (length=29)
1 => int 10
1 =>
array (size=2)
0 => string '<a class='abc' href='/about/'>link2</a>' (length=39)
1 => int 45
1 =>
array (size=2)
0 =>
array (size=2)
0 => string 'review' (length=6)
1 => int 29
1 =>
array (size=2)
0 => string 'link2' (length=5)
1 => int 75
For more information, see http://php.net/manual/en/function.preg-match-all.php
To build $changedStr, let $out be the output array from preg_match_all:
$count= 0;
foreach($out[0] as $result) {
$temp=preg_quote($result[0],'/');
$temp ="/".$temp."/";
$str =preg_replace($temp, "[[".$count."]]", $str,1);
$count++;
}
var_dump($str);
This gives the output :
string 'some text [[0]] here [[1]] hahaha' (length=33)
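As an alternative to matching first and replacing afterwards, the collection and the replacement can happen in one pass with preg_replace_callback. This is a sketch under some assumptions: it relies on the flags parameter of preg_replace_callback (available from PHP 7.4), it only targets <a> tags, and, like any regex-based approach, it does not handle nested or malformed HTML.

```php
<?php
$str = "some text <a href='/review/'>review</a> here <a class='abc' href='/about/'>link2</a> hahaha";

$linkArray = [];
$positionArray = [];

// With PREG_OFFSET_CAPTURE the callback receives each group as
// [matched text, offset in the original subject].
$changedStr = preg_replace_callback(
    '~<a\b[^>]*>.*?</a>~is',
    function (array $m) use (&$linkArray, &$positionArray) {
        $index = count($linkArray);
        $linkArray[] = $m[0][0];     // the whole link
        $positionArray[] = $m[0][1]; // where it started in $str
        return '[[' . $index . ']]'; // placeholder with the link's order number
    },
    $str,
    -1,
    $count,
    PREG_OFFSET_CAPTURE
);

var_dump($linkArray, $positionArray, $changedStr);
```

This avoids building and running a second regex per link, which is what the preg_quote / preg_replace loop does.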

I would use a regular expression for this. Check this:
http://weblogtoolscollection.com/regex/regex.php
try them here:
http://www.solmetra.com/scripts/regex/index.php
And use this:
http://php.net/manual/en/function.preg-match-all.php
Find a regular expression that covers every case you expect. With the right pattern, preg_match_all will return an array containing every link you want.
Edit:
In your case, assuming you want to keep the "<a>", this may work:
$arr = array();
preg_match_all('/<a.*.a>/', '{{your data}}', $arr, PREG_PATTERN_ORDER);
Input example:
test
Lkdlasdk
llkdla
xx
Output with the above regexp:
Array
(
[0] => Array
(
[0] => test
[1] => Lkdlasdk
[2] => xx
)
)
Hope this helps

How to get colon delimited values from a string in php

I have a string
$style = "font-color:#000;font-weight:bold;background-color:#fff";
I need only
font-color
font-weight
background-color
I have tried
preg_match_all('/(?<names>[a-z\-]+:)/', $style, $matches);
var_dump($matches);
it gives me following output
array
0 =>
array
0 => string 'font-color:' (length=11)
1 => string 'font-weight:' (length=12)
2 => string 'background-color:' (length=17)
'names' =>
array
0 => string 'font-color:' (length=11)
1 => string 'font-weight:' (length=12)
2 => string 'background-color:' (length=17)
1 =>
array
0 => string 'font-color:' (length=11)
1 => string 'font-weight:' (length=12)
2 => string 'background-color:' (length=17)
There are three problems with this output
1. It is two or three dimensional array, I need one dimensional array.
2. It is repeating the information
3. It is appending ":" at the end of each element.
I need a single array like this
array
0 => 'font-color'
1 => 'font-weight'
2 => 'background-color'

Move the colon outside the capturing group:
$style = "font-color:#000;font-weight:bold;background-color:#fff";
preg_match_all('/(?<names>[a-z\-]+):/', $style, $matches);
var_dump($matches['names']);
Then use $matches['names'], since you named the group. That gives you a single flat array without the redundant information.
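If a plain one-dimensional result without any capturing group is preferred, a lookahead is another option. A minimal sketch, assuming property names consist only of lowercase letters and hyphens as in the sample string:

```php
<?php
$style = "font-color:#000;font-weight:bold;background-color:#fff";

// (?=:) requires a colon right after the name but keeps it out of the match,
// so $matches[0] is already the flat list of property names.
preg_match_all('/[a-z-]+(?=:)/', $style, $matches);

$names = $matches[0];
print_r($names);
```

Values such as bold are not picked up because they are followed by a semicolon (or the end of the string) rather than a colon.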

Split by whitespace only if not surrounded by [,<,{ or ],>,}

I have a string like this one:
traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] [no-dns] [source <ip-address>] [tos <type-of-service>] {router <router-instance>] | all}
I'd like to create an array like this:
$params = array(
<ip-address|dns-name>
[ttl <ttl>]
[wait <milli-seconds]
[no-dns]
[source <ip-address>]
[tos <tos>]
{router <router-instance>] | all}
);
Should I use preg_split('/someregex/', $mystring) ?
Or is there any better solution?

Use negative lookarounds. This one uses a negative lookahead for a <. This means it will not split if it finds a < ahead of the whitespace.
$regex='/\s(?!<)/';
$mystring='traceroute <192.168.1.1> [ttl <120>] [wait <1500>] [no-dns] [source <192.168.1.11>] [tos <service>] {router <instance>] | all}';
$array=array();
$array = preg_split($regex, $mystring);
var_dump($array);
And my output is
array
0 => string 'traceroute <192.168.1.1>' (length=24)
1 => string '[ttl <120>]' (length=11)
2 => string '[wait <1500>]' (length=13)
3 => string '[no-dns]' (length=8)
4 => string '[source <192.168.1.11>]' (length=23)
5 => string '[tos <service>]' (length=15)
6 => string '{router <instance>]' (length=19)
7 => string '|' (length=1)
8 => string 'all}' (length=4)

You could use preg_match_all such as:
preg_match_all("/\\[[^]]*]|<[^>]*>|{[^}]*}/", $str, $matches);
And get your result from the $matches array.
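A runnable version of that idea, assuming the traceroute string from the question is stored in $str. The pattern is written in single quotes here, so the brackets are escaped once rather than twice:

```php
<?php
$str = 'traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] '
     . '[no-dns] [source <ip-address>] [tos <type-of-service>] '
     . '{router <router-instance>] | all}';

// Three alternatives, tried in order at each position:
//   [ ... ]  up to the first closing bracket
//   < ... >  up to the first closing angle bracket
//   { ... }  up to the first closing brace
// A <...> nested inside [...] or {...} is consumed with its outer token.
preg_match_all('/\[[^\]]*\]|<[^>]*>|\{[^}]*\}/', $str, $matches);

$params = $matches[0];
print_r($params);
```

The leading command name (traceroute) is not wrapped in any bracket, so it is left out of $params, which matches the array the question asks for.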

Yes, preg_split makes sense and is probably the most efficient way to do this.
Try:
preg_split('/[\{\[<](.*?)[>\]\}]/', $mystring);
Or if you want to match rather than split, you may want to try:
$matches=array();
preg_match('/[\{\[<](.*?)[>\]\}]/',$mystring,$matches);
print_r($matches);
Updated
I missed that you're trying to get the tokens, not the content of the tokens. I think you are going to need preg_match_all. Try something like this as a starting point:
$matches = array();
preg_match_all('/(\{.*?[\}])|(\[.*?\])|(<.*?>)/', $mystring,$matches);
var_dump($matches);
I get:
Array
(
[0] => Array
(
[0] => <ip-address|dns-name>
[1] => [ttl <ttl>]
[2] => [wait <milli-seconds>]
[3] => [no-dns]
[4] => [source <ip-address>]
[5] => [tos <type-of-service>]
[6] => {router <router-instance>] | all}
)
