Compilation failed: POSIX collating elements are not supported at offset - php

I would like to transform a string into an array using a pattern, but my regex gives me the warning above.
This is the string:
$string = 'typ="bar" title="Example" enabled=true count=true style="float: left; width: 30%;"';
My regex:
$regex='/(.*?)[=\"|=](.*?)\"*\s*/';
preg_match_all($regex, $string, $matchesreg, PREG_SET_ORDER);
With the following regex the output is not correct. The last array must be split further:
$regex='/(.*?)="(.*?)"\s*/';
preg_match_all($regex, $string, $matchesreg, PREG_SET_ORDER);
The output:
Array
(
[0] => Array
(
[0] => typ="bar"
[1] => typ
[2] => bar
) ...
[2] => Array
(
[0] => enabled=true count=true style="float: left; width: 30%;"
[1] => enabled=true count=true style
[2] => float: left; width: 30%;
)
)
My desired output is:
Array
(
[0] => Array
(
[0] => typ="bar"
[1] => typ
[2] => bar
)
[1] => Array
(
[0] => title="Example"
[1] => title
[2] => Example
)
[2] => Array
(
[0] => enabled=true
[1] => enabled
[2] => true
)
[3] => Array
(
[0] => count=true
[1] => count
[2] => true
)
[4] => Array
(
[0] => style="float: left; width: 30%;"
[1] => style
[2] => float: left; width: 30%;
)
)

You may use
preg_match_all('~([^\s=]+)=(?|"([^"]*)"|(\S+))~', $s, $m, PREG_SET_ORDER, 0)
See the PHP demo
Details
([^\s=]+) - Group 1: one or more chars other than whitespace and =
= - a = char
(?|"([^"]*)"|(\S+)) - a branch reset group matching either of
"([^"]*)" - ", then any 0 or more chars other than " are captured into Group 2, and then " is matched
| - or
(\S+) - Group 2: one or more non-whitespace chars.
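To see the pattern above in action, here is a minimal sketch that runs it against the string from the question and folds the matches into a key => value map. The names $attrs and $match are illustrative and not part of the original answer.

```php
<?php
$s = 'typ="bar" title="Example" enabled=true count=true style="float: left; width: 30%;"';

// Branch reset (?|...) makes both alternatives write into Group 2,
// so quoted and unquoted values end up in the same slot.
preg_match_all('~([^\s=]+)=(?|"([^"]*)"|(\S+))~', $s, $m, PREG_SET_ORDER);

// Illustrative post-processing: attribute name => attribute value.
$attrs = array();
foreach ($m as $match) {
    $attrs[$match[1]] = $match[2];
}

print_r($attrs);
```

Because of the branch reset group, no extra empty capture groups appear for the alternative that did not match.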

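Your expression seems to work fine. There were two = characters in your character class, so I removed one of them. A class that begins with [= and ends with =] is parsed by PCRE as POSIX collating syntax, which is what raises the warning: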
(.*?)[=\"|](.*?)\"*\s*
Code
$string1 = 'typ="bar" title="Example" enabled=true count=true style="float: left; width: 30%;';
$regex = '/(.*?)[=\"|](.*?)\"*\s*/s';
preg_match_all($regex, $string1, $matchesreg, PREG_SET_ORDER);
var_dump($matchesreg);
Output
array(7) {
[0]=>
array(3) {
[0]=>
string(5) "typ=""
[1]=>
string(3) "typ"
[2]=>
string(0) ""
}
[1]=>
array(3) {
[0]=>
string(5) "bar" "
[1]=>
string(3) "bar"
[2]=>
string(0) ""
}
[2]=>
array(3) {
[0]=>
string(7) "title=""
[1]=>
string(5) "title"
[2]=>
string(0) ""
}
[3]=>
array(3) {
[0]=>
string(9) "Example" "
[1]=>
string(7) "Example"
[2]=>
string(0) ""
}
[4]=>
array(3) {
[0]=>
string(8) "enabled="
[1]=>
string(7) "enabled"
[2]=>
string(0) ""
}
[5]=>
array(3) {
[0]=>
string(11) "true count="
[1]=>
string(10) "true count"
[2]=>
string(0) ""
}
[6]=>
array(3) {
[0]=>
string(12) "true style=""
[1]=>
string(10) "true style"
[2]=>
string(0) ""
}
}

Related

Regex date formatting for templates not working

I'm trying to replace variables like {{{month}}} in a template with the current month, and {{{month+1}}} with the current month + 1.
That is not the hard part of my code; the problem is that the regex I wrote doesn't yield the expected results.
$string = '{{{year}}}{{{month+1}}}';
preg_match_all('/{{{(?:([yY])ear|([mM])onth|([dD])ay)(?:(?<operation>[-|+])(?<amount>[1-9]+))?}}}/m', $string, $matches);
var_dump($matches);
Why do I have so many empty array entries?
I was expecting
[0] => array('{{{year}}}', '{{{month+1}}}')
[1] => array('y', 'm')
[2] => array('', '+')
[3] => array('', '1')
What am I doing wrong?
The output of the above code is:
array(8) {
[0]=>
array(2) {
[0]=>
string(10) "{{{year}}}"
[1]=>
string(13) "{{{month+1}}}"
}
[1]=>
array(2) {
[0]=>
string(1) "y"
[1]=>
string(0) ""
}
[2]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(1) "m"
}
[3]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(0) ""
}
["operation"]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(1) "+"
}
[4]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(1) "+"
}
["amount"]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(1) "1"
}
[5]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(1) "1"
}
}
You may use a "generic" character class to match the first letters of month, year and day, and then use an alternation with positive look-behinds to make sure we match what we need.
preg_match_all('/{{{([yYmMdD])(?:(?<=[Yy])ear|(?<=[Mm])onth|(?<=[Dd])ay)(?:([-+])([1-9]+))?}}}/m', $string, $matches);
See IDEONE demo
And this is the print_r view:
Array
(
[0] => Array
(
[0] => {{{year}}}
[1] => {{{month+1}}}
)
[1] => Array
(
[0] => y
[1] => m
)
[2] => Array
(
[0] =>
[1] => +
)
[3] => Array
(
[0] =>
[1] => 1
)
)
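As a quick check of the look-behind pattern, the sketch below runs it on the sample string with the default PREG_PATTERN_ORDER. The braces are escaped here only as a precaution, and $pattern is an illustrative variable name, not something from the answer.

```php
<?php
$string = '{{{year}}}{{{month+1}}}';

// One shared group for the first letter; the look-behinds tie each
// letter to its own suffix (y -> ear, m -> onth, d -> ay).
$pattern = '/\{\{\{([yYmMdD])(?:(?<=[Yy])ear|(?<=[Mm])onth|(?<=[Dd])ay)(?:([-+])([1-9]+))?\}\}\}/';

preg_match_all($pattern, $string, $matches);

print_r($matches[1]); // first letters of the matched units
print_r($matches[2]); // operation, empty when absent
print_r($matches[3]); // amount, empty when absent
```

With only three unnamed groups there are four result arrays in total. The eight entries in the question come from using one capture group per alternative and from named groups being reported both by name and by index.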

Exclude data from brackets using regex (preg_match_all)

Input string:
:txt{sometext}:alpha
I want to extract the data inside the brackets. Expected result using preg_match_all():
sometext
I tried the following, but it does not work:
php > preg_match_all('/^(\:txt)(.*)+(\{)(.*)+(\})/i', ':txt{sometext}:alpha', $m); var_dump($m);
array(6) {
[0] =>
array(1) {
[0] =>
string(14) ":txt{sometext}"
}
[1] =>
array(1) {
[0] =>
string(1) ":"
}
[2] =>
array(1) {
[0] =>
string(0) ""
}
[3] =>
array(1) {
[0] =>
string(1) "{"
}
[4] =>
array(1) {
[0] =>
string(0) ""
}
[5] =>
array(1) {
[0] =>
string(1) "}"
}
}
Note: my real input looks like :txt{sometext}:alpha:another{mydata}, so I also want to be able to extract the data from :another and get mydata.
RESULTS:
Result from Sniffer:
php > preg_match_all('/(?<=:txt{)([^}]+)(?=})/', ':txt{sometext}:alpha', $x); var_dump($x);
array(2) {
[0] =>
array(1) {
[0] =>
string(8) "sometext"
}
[1] =>
array(1) {
[0] =>
string(8) "sometext"
}
}
Result from Jerry:
php > preg_match_all('/^:txt\{([^}]+)\}/', ':txt{sometext}:alpha', $x); var_dump($x);
array(2) {
[0] =>
array(1) {
[0] =>
string(14) ":txt{sometext}"
}
[1] =>
array(1) {
[0] =>
string(8) "sometext"
}
}
Why all this, why not just:
(?<=:txt{)([^}]+)(?=})
Regex101 Demo
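For the longer input mentioned in the question's note, the same idea can be parameterised by tag name. This is only a sketch: extractTagged() is a hypothetical helper, not a built-in function and not from any of the answers, and it uses a capture group rather than look-arounds.

```php
<?php
// Hypothetical helper: returns every {...} payload that directly follows ":<tag>".
function extractTagged(string $input, string $tag): array
{
    // preg_quote() escapes the tag so it is matched literally.
    $pattern = '/:' . preg_quote($tag, '/') . '\{([^}]+)\}/';
    preg_match_all($pattern, $input, $m);
    return $m[1]; // Group 1 holds the text between the braces
}

$input = ':txt{sometext}:alpha:another{mydata}';
var_dump(extractTagged($input, 'txt'));
var_dump(extractTagged($input, 'another'));
```

A tag with no braces after it, such as :alpha, simply yields an empty array.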

Confusion with multidimensional arrays and merging

I have had success merging two arrays by difference using the following code:
$a=array("2013-12-22"=>"12","2013-08-25"=>"5","2013-08-27"=>"10");
$b=array("2013-08-22"=>"1","2013-08-23"=>"3","2013-08-25"=>"5","2013-08-27"=>"10","2013-08-29"=>"5");
foreach ($b as $key => $value){
if(!array_key_exists($key, $a)){
$a[$key]=0;
}
}
This will return:
Array
(
[2013-08-22] => 0
[2013-08-23] => 0
[2013-08-25] => 5
[2013-08-27] => 10
[2013-08-29] => 0
[2013-12-22] => 12
)
The idea is for $a to additionally hold the elements from $b that are not present in $a.
I am having issues now doing the same thing for the following array format:
$a=array(array("2013-12-22","12"),array("2013-08-25","5"),array("2013-08-27","10"));
$b=array(array("2013-08-22","1"),array("2013-08-23","3"),array("2013-08-25","5"),array("2013-08-27","10"),array("2013-08-29","5"));
I tried this:
foreach ($b as $key => $value){
if(!array_key_exists($key, $a)){
$a[$key]=array($value[0], 0);
}
}
But the returned result is far from what I need:
Array
(
[0] => Array
(
[0] => 2013-12-22
[1] => 12
)
[1] => Array
(
[0] => 2013-08-25
[1] => 5
)
[2] => Array
(
[0] => 2013-08-27
[1] => 10
)
[3] => Array
(
[0] => 2013-08-27
[1] => 0
)
[4] => Array
(
[0] => 2013-08-29
[1] => 0
)
)
I understand the keys are no longer the dates, but how should I go about checking each array and making sure I don't get duplicate entries?
$a = array(
array("2013-12-22","12"),
array("2013-08-25","5"),
array("2013-08-27","10"));
$b = array(
array("2013-08-22","1"),
array("2013-08-23","3"),
array("2013-08-25","5"),
array("2013-08-27","10"),
array("2013-08-29","5"));
$exists = array();
foreach ($a as $data) {
$exists[$data[0]] = 1;
}
foreach ($b as $data) {
if (array_key_exists($data[0], $exists)) {
continue;
}
$a[] = array($data[0], $data[1]);
}
$a now contains:
array(6) {
[0]=>
array(2) {
[0]=>
string(10) "2013-12-22"
[1]=>
string(2) "12"
}
[1]=>
array(2) {
[0]=>
string(10) "2013-08-25"
[1]=>
string(1) "5"
}
[2]=>
array(2) {
[0]=>
string(10) "2013-08-27"
[1]=>
string(2) "10"
}
[3]=>
array(2) {
[0]=>
string(10) "2013-08-22"
[1]=>
string(1) "1"
}
[4]=>
array(2) {
[0]=>
string(10) "2013-08-23"
[1]=>
string(1) "3"
}
[5]=>
array(2) {
[0]=>
string(10) "2013-08-29"
[1]=>
string(1) "5"
}
}
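Note that the code above appends the value from $b, while the first snippet in the question filled in 0 for missing dates. If the zero fill is what is wanted, a variant could look like the sketch below. mergeMissingDates() and its $fill parameter are hypothetical names introduced here for illustration.

```php
<?php
// Hypothetical helper: appends rows of $b whose date (column 0) is missing
// from $a, storing $fill instead of the value found in $b.
function mergeMissingDates(array $a, array $b, $fill = 0): array
{
    // Turn the date column of $a into a lookup table: date => index.
    $seen = array_flip(array_column($a, 0));

    foreach ($b as $row) {
        if (!isset($seen[$row[0]])) {
            $a[] = array($row[0], $fill);
            $seen[$row[0]] = true; // also guards against duplicates inside $b
        }
    }
    return $a;
}

$a = array(array("2013-12-22", "12"), array("2013-08-25", "5"), array("2013-08-27", "10"));
$b = array(array("2013-08-22", "1"), array("2013-08-23", "3"), array("2013-08-25", "5"),
           array("2013-08-27", "10"), array("2013-08-29", "5"));

$merged = mergeMissingDates($a, $b);
print_r($merged);
```

The lookup table keeps the check at one isset() per row instead of scanning $a for every element of $b.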

How can I split a sentence into words and punctuation marks?

For example, I want to split this sentence:
I am a sentence.
Into an array with 5 parts: I, am, a, sentence, and the period (.).
I'm currently using preg_split after trying explode, but I can't seem to find something suitable.
This is what I've tried:
$sentence = explode(" ", $sentence);
/*
returns array(4) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(9) "sentence."
}
*/
And also this:
$sentence = preg_split("/[.?!\s]/", $sentence);
/*
returns array(5) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence"
[4]=>
string(0) ""
}
*/
How can this be done?
You can split on word boundaries:
$sentence = preg_split("/(?<=\w)\b\s*/", 'I am a sentence.');
In short, the regex looks for a position preceded by a word character, where it must then match a word boundary followed by optional whitespace.
Output:
array(5) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence"
[4]=>
string(1) "."
}
I was looking for the same solution and landed here. The accepted solution does not work with non-word characters like apostrophes and accent marks and so forth. Below, find the solution that worked for me.
Here is my test sentence:
Claire’s favorite sonata for piano is Mozart’s Sonata no. 15 in C Major.
The accepted answer gave me the following results:
Array
(
[0] => Claire
[1] => ’s
[2] => favorite
[3] => sonata
[4] => for
[5] => piano
[6] => is
[7] => Mozart
[8] => ’s
[9] => Sonata
[10] => no
[11] => . 15
[12] => in
[13] => C
[14] => Major
[15] => .
)
The solution I came up with follows:
$parts = preg_split("/\s+|\b(?=[!\?\.])(?!\.\s+)/", $sentence);
It gives the following results:
Array
(
[0] => Claire’s
[1] => favorite
[2] => sonata
[3] => for
[4] => piano
[5] => is
[6] => Mozart’s
[7] => Sonata
[8] => no.
[9] => 15
[10] => in
[11] => C
[12] => Major
[13] => .
)
If anyone is interested in a simple solution which ignores punctuation:
preg_split( '/[^a-zA-Z0-9]+/', 'I am a sentence' );
would split into
array(4) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(8) "sentence"
}
Or an alternative solution where the punctuation is included in the adjacent word
preg_split( '/\b[^a-zA-Z0-9]+\b/', 'I am a sentence.' );
would split into
array(4) {
[0]=>
string(1) "I"
[1]=>
string(2) "am"
[2]=>
string(1) "a"
[3]=>
string(9) "sentence."
}
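Another option, closer to the character-class attempt in the question, is to let preg_split() keep the punctuation through its flags. This is a sketch rather than a drop-in replacement for the answers above: it relies on the standard PREG_SPLIT_DELIM_CAPTURE and PREG_SPLIT_NO_EMPTY flags, and it treats every ., ? or ! as its own token.

```php
<?php
$sentence = 'I am a sentence.';

// Whitespace is a plain delimiter and is discarded; punctuation is wrapped
// in a capture group, so PREG_SPLIT_DELIM_CAPTURE returns it as an element.
// PREG_SPLIT_NO_EMPTY drops the empty string after the final period.
$parts = preg_split(
    '/\s+|([.?!])/',
    $sentence,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

print_r($parts);
```

Like the word-boundary approach, this would separate the period in an abbreviation such as "no." from the word before it.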

need preg_match_all links

I have a string like this one:
$string = "some text
http://dvz.local/index/index/regionId/28
http://stuff.kiev.ua/roadmap_page.php http://192.168.3.192/roadmap_page.php
http://192.168.3.192/roadmap_page.php#qwe";
I need to get all links.
I tried it this way: /http:\/\/(.*)[|\s]?/
It returns:
array(2) {
[0] =>
array(3) {
[0] =>
string(42) "http://dvz.local/index/index/regionId/28\r\n"
[1] =>
string(77) "http://stuff.kiev.ua/roadmap_page.php http://192.168.3.192/roadmap_page.php\r\n"
[2] =>
string(41) "http://192.168.3.192/roadmap_page.php#qwe"
}
[1] =>
array(3) {
[0] =>
string(34) "dvz.local/index/index/regionId/28\r"
[1] =>
string(69) "stuff.kiev.ua/roadmap_page.php http://192.168.3.192/roadmap_page.php\r"
[2] =>
string(34) "192.168.3.192/roadmap_page.php#qwe"
}
}
EDIT 1:
Expected:
array(2) {
[0] =>
array(4) {
[0] =>
string(40) "http://dvz.local/index/index/regionId/28"
[1] =>
string(37) "http://stuff.kiev.ua/roadmap_page.php"
[2] =>
string(37) "http://192.168.3.192/roadmap_page.php"
[3] =>
string(41) "http://192.168.3.192/roadmap_page.php#qwe"
}
[1] =>
array(4) {
[0] =>
string(33) "dvz.local/index/index/regionId/28"
[1] =>
string(30) "stuff.kiev.ua/roadmap_page.php"
[2] =>
string(30) "192.168.3.192/roadmap_page.php"
[3] =>
string(34) "192.168.3.192/roadmap_page.php#qwe"
}
}
Try this one:
/http:\/\/([^\s]+)/
Try this:
preg_match_all('|http://([^\s]*)|', $string, $matches);
var_dump($matches);
All links from text
http[s]?[^\s]*
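Putting the suggestions above together, a minimal sketch against the string from the question might look like this. The \r\n line endings are an assumption based on the var_dump output shown in the question, and the optional s (for https) goes slightly beyond what was asked.

```php
<?php
$string = "some text\r\n"
    . "http://dvz.local/index/index/regionId/28\r\n"
    . "http://stuff.kiev.ua/roadmap_page.php http://192.168.3.192/roadmap_page.php\r\n"
    . "http://192.168.3.192/roadmap_page.php#qwe";

// \S+ stops at the first whitespace character (space, \r or \n),
// so two links on the same line are returned separately.
preg_match_all('~https?://(\S+)~', $string, $matches);

print_r($matches[0]); // full URLs
print_r($matches[1]); // URLs without the scheme
```

Using ~ as the delimiter avoids having to escape the slashes in the pattern.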
Many pages contain only links relative to the main document (so there is no http(s):// prefix to parse). For those, matching on the href attribute works fine:
preg_match_all('|href="([^\s]*)"><\/a>|', $html, $output_array);
Or even simpler:
preg_match_all('|href="(.*?)"><\/a>|', $html, $output_array);
Example output:
[0]=>
string(56) "/broadcast/bla/xZr300"
[1]=>
string(50) "/broadcast/lol/fMoott"
