Convert Regexp in Js into PHP? - php

I have the following regular expression in JavaScript and I would like to have the exact same functionality (or similar) in PHP:
// -=> REGEXP - match "x bed", "x or y bed":
var subject = query;
var myregexp1 = /(\d+) bed|(\d+) or (\d+) bed/img;
var match = myregexp1.exec(subject);
while (match != null) {
    if (match[1]) { console.log("X => " + match[1]); }
    else { console.log("X => " + match[2] + " AND Y => " + match[3]); }
    match = myregexp1.exec(subject);
}
This code searches a string for a pattern matching "x beds" or "x or y beds".
When a match is found, the values x and y are needed for further processing.
QUESTION:
How do you construct this code snippet in php?
Any assistance appreciated guys...

You can use the regex unchanged. PCRE supports everything this pattern uses, except the /g flag, which doesn't exist in PHP. Instead you have preg_match_all(), which fills an array with all the results:
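preg_match_all('/(\d+) bed|(\d+) or (\d+) bed/im', $subject, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    if ($match[1] !== '') { echo "X => " . $match[1] . "\n"; }
    else { echo "X => " . $match[2] . " AND Y => " . $match[3] . "\n"; }
}
PREG_SET_ORDER is the other trick here: it keeps each $match array similar to what exec() returns in JavaScript.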
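Since the question says x and y are needed for further processing, here is a small sketch that collects them instead of echoing them. The function name extractBedCounts() and the ['x' => ..., 'y' => ...] return shape are my own assumptions, not part of the question; the m modifier is dropped because the pattern has no anchors.

```php
<?php
// Hypothetical helper: collect x (and y, when present) from every
// "x bed" / "x or y bed" occurrence, mirroring the JavaScript loop.
function extractBedCounts(string $subject): array
{
    $results = [];
    if (preg_match_all('/(\d+) bed|(\d+) or (\d+) bed/i', $subject, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            if ($match[1] !== '') {
                // First alternative matched: only x exists.
                $results[] = ['x' => (int) $match[1], 'y' => null];
            } else {
                // Second alternative matched: group 1 is '' and groups 2/3 hold x and y.
                $results[] = ['x' => (int) $match[2], 'y' => (int) $match[3]];
            }
        }
    }
    return $results;
}

print_r(extractBedCounts('2 bed flat, 3 or 4 bed house'));
```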

I've found RosettaCode to be useful when answering these kinds of questions.
It shows how to do the same thing in various languages. Regex is just one example; they also have file I/O, sorting, all kinds of basic stuff.

You can use preg_match_all( $pattern, $subject, &$matches, $flags, $offset ) to run a regular expression over a string and store all the matches in an array.
After running the regexp, all the matches can be found in the array you passed as the third argument. You can then iterate through these matches using foreach.
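Without setting $flags (so the default PREG_PATTERN_ORDER applies), your array is grouped by capture group rather than by match. For a subject such as "5 beds, 8 or 9 beds" (the pattern itself stops at "bed"):
$array[0] => array ( // all full-pattern matches
0 => "5 bed",
1 => "8 or 9 bed"
);
$array[1] => array ( // everything captured by the first group, one entry per match ("" where the group did not take part)
0 => "5",
1 => ""
);
$array[2] => array ( // the second group
0 => "",
1 => "8"
);
$array[3] => array ( // the third group
0 => "",
1 => "9"
);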
This behaviour isn't the same as JavaScript's, and I personally don't like it that much. To get a more "JavaScript-like" structure, set $flags to PREG_SET_ORDER. The array is then grouped by match, much like successive exec() results in JavaScript:
$array[0] => array(
0 => "5 bed", // the full match
1 => "5", // the first group; trailing groups that did not take part are left out
);
$array[1] => array(
0 => "8 or 9 bed",
1 => "", // the first group did not take part (JavaScript gives undefined here)
2 => "8",
3 => "9"
);
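If you want to check the two layouts yourself, this sketch runs the same pattern over an assumed sample subject with the default flag and with PREG_SET_ORDER; the variable names are only illustrative.

```php
<?php
$subject = '5 beds, 8 or 9 beds';
$pattern = '/(\d+) bed|(\d+) or (\d+) bed/i';

// Default (PREG_PATTERN_ORDER): one sub-array per capture group,
// padded with '' where a group did not take part in a match.
preg_match_all($pattern, $subject, $byGroup);

// PREG_SET_ORDER: one sub-array per match, like successive exec() calls in JavaScript.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);

var_export($byGroup);
echo "\n";
var_export($byMatch);
```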

Related

PHP regular expression, match the last occurrence

I have a PHP function that splits product names from their color name in WooCommerce.
The full string is generally of this form "product name - product color", like for example:
"Boxer Welbar - ligth grey" splits into "Boxer Welbar" and "light grey"
"Longjohn Gari - marine stripe" splits into "Longjohn Gari" and "marine stripe"
But in some cases it can be "Tee-shirt - product color", and in this case the split doesn't work as I want, because the "-" in Tee-shirt is detected as well.
How to circumvent this problem? Should I use a "lookahead" statement in the regexp?
function product_name_split($prod_name) {
$currenttitle = strip_tags($prod_name);
$splitted = preg_split("/–|[\p{Pd}\xAD]|(–)/", $currenttitle);
return $splitted;
}
I'd go for a negative lookahead.
Something like this:
-(?!.*-)
which means: search for a - that is not followed by any other -.
This works as long as the color name never contains a -.
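In PHP, the lookahead could be used with preg_split() roughly like this. It is a sketch: the function name split_by_lookahead() is hypothetical, and \s* is added on both sides of the dash so the surrounding spaces are consumed too.

```php
<?php
// Split on a "-" that is not followed by any other "-" (negative lookahead),
// swallowing the whitespace around it. Assumes the color never contains "-".
function split_by_lookahead(string $title): array
{
    return preg_split('/\s*-(?!.*-)\s*/', $title);
}

print_r(split_by_lookahead('Tee-shirt - product color'));
```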
What about taking the whitespace that surrounds the dash into account?
For example:
function product_name_split($prod_name) {
$currenttitle = strip_tags($prod_name);
$splitted = preg_split("/\s(–|[\p{Pd}\xAD]|(–))\s/", $currenttitle);
return $splitted;
}
This automatically trims spaces from split parts as well.
If you have " - " as the delimiter (note the spaces around the dash), you can simply use explode(...). If not, use
\s*-(?=[^-]+$)\s*
or
\w+-\w+(*SKIP)(*FAIL)|-
with preg_split(); see the demos on regex101.com.
In PHP this could be:
<?php
$strings = ["Tee-shirt - product color", "Boxer Welbar - ligth grey", "Longjohn Gari - marine stripe"];
foreach ($strings as $string) {
print_r(explode(" - ", $string));
}
foreach ($strings as $string) {
print_r(preg_split("~\s*-(?=[^-]+$)\s*~", $string));
}
?>
Both approaches will yield
Array
(
[0] => Tee-shirt
[1] => product color
)
Array
(
[0] => Boxer Welbar
[1] => ligth grey
)
Array
(
[0] => Longjohn Gari
[1] => marine stripe
)
To collect the split items, use array_map(...):
$splitted = array_map( function($item) {return preg_split("~\s*-(?=[^-]+$)\s*~", $item); }, $strings);
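The (*SKIP)(*FAIL) variant mentioned above has no PHP snippet, so here is a minimal sketch of it. The helper name split_on_standalone_dash() is hypothetical. Because this pattern only consumes the dash itself, the parts are trimmed afterwards; a hyphenated color such as "blue-grey" would be left intact as well.

```php
<?php
// Split on a "-" unless it sits between two word characters (as in "Tee-shirt").
// \w+-\w+(*SKIP)(*FAIL) consumes such words, then fails and resumes after them.
function split_on_standalone_dash(string $title): array
{
    $parts = preg_split('~\w+-\w+(*SKIP)(*FAIL)|-~', $title);
    // The spaces around the delimiter stay in the parts, so trim them.
    return array_map('trim', $parts);
}

print_r(split_on_standalone_dash('Tee-shirt - product color'));
```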
Your sample inputs convey that the neighboring whitespace around the delimiting hyphen/dash is just as critical as the hyphen/dash itself.
I recommend doing all of the HTML and special-entity decoding before executing your regex; that's what these other functions are built for, and it will make your regex pattern much simpler to read and maintain.
\p{Pd} will match any hyphen/dash. Reinforce the business logic in the code by declaring a maximum of 2 elements to be generated by the split.
As a general rule, I discourage declaring single-use variables.
Code:
function product_name_split($prod_name) {
return preg_split(
"/ \p{Pd} /u",
strip_tags(
html_entity_decode(
$prod_name
)
),
2
);
}
$tests = [
'Tee-shirt - product color',
'Boxer Welbar - ligth grey',
'Longjohn Gari - marine stripe',
'En dash – green',
'Entity &ndash; blue',
];
foreach ($tests as $test) {
echo var_export(product_name_split($test), true) . "\n";
}
Output:
array (
0 => 'Tee-shirt',
1 => 'product color',
)
array (
0 => 'Boxer Welbar',
1 => 'ligth grey',
)
array (
0 => 'Longjohn Gari',
1 => 'marine stripe',
)
array (
0 => 'En dash',
1 => 'green',
)
array (
0 => 'Entity',
1 => 'blue',
)
As usual, there are several options for this; this is one of them (it returns only the part after the last dash):
explode — Split a string by a string
end — Set the internal pointer of an array to its last element
$currenttitle = 'Tee-shirt - product color';
$array = explode( '-', $currenttitle );
echo trim( end( $array ) ); // "product color" (trim() removes the leading space)
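If you need both halves rather than just the color, a plain-string alternative is to split at the last " - " with strrpos(). This sketch assumes the delimiter is always a hyphen with one space on each side; split_at_last_delimiter() is a hypothetical name.

```php
<?php
// Hypothetical helper: split at the LAST " - " so hyphens inside the
// product name (e.g. "Tee-shirt") are left alone.
function split_at_last_delimiter(string $title): array
{
    $pos = strrpos($title, ' - ');
    if ($pos === false) {
        return [$title]; // no delimiter found: return the whole title
    }
    // 3 = strlen(' - ')
    return [substr($title, 0, $pos), substr($title, $pos + 3)];
}

print_r(split_at_last_delimiter('Tee-shirt - product color'));
```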

Extracting javascript object from html using regex & php

I am trying to extract a specific JavaScript object from a page containing the usual HTML markup.
I have tried to use regex, but I can't get it to parse the HTML correctly when the HTML contains a line break.
An example can be seen here: https://regex101.com/r/b8zN8u/2
The HTML I am trying to extract looks like this:
<script>
DATA.tracking.user = {
age: "19",
name: "John doe"
}
</script>
Using the following regex: DATA.tracking.user=(.*?)}
<?php
$re = '/DATA.tracking.user = (.*?)\}/m';
$str = '<script>
DATA.tracking.user = { age: "19", name: "John doe" }
</script>';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
If I parse DATA.tracking.user = { age: "19", name: "John doe" } without any line breaks, then it works fine, but if I try to parse:
DATA.tracking.user = {
age: "19",
name: "John doe"
}
It does not like dealing with the line breaks.
Any help would be greatly appreciated.
Thanks.
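You will need to account for whitespace (\s) in your pattern in order to parse JavaScript code containing line breaks.
For example, this code works for an object with exactly two property lines (each .* stops at a newline, so more lines would need more .* parts):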
<?php
$re = '/DATA.tracking.user = \{\s*.*\s*.*\s*\}/';
$str = '<script>
DATA.tracking.user = {
age: "19",
name: "John doe"
}
</script>';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches[0]);
?>
You will get the following output:
Array
(
[0] => DATA.tracking.user = {
age: "19",
name: "John doe"
}
)
The simple solution to your problem is to use the s pattern modifier to command the . (any character) to also match newline characters -- which it does not by default.
And you should:
escape your literal dots.
write the \{ outside of your capture group.
omit the m pattern modifier because you aren't using anchors.
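Putting those fixes together with the s modifier, a minimal version could look like this. The sample HTML is the question's; trim() is only there to tidy the captured text.

```php
<?php
$html = <<<HTML
<script>
DATA.tracking.user = {
    age: "19",
    name: "John doe"
}
</script>
HTML;

// s: let . match newlines; literal dots escaped; \{ kept outside the
// capture group; no m modifier because there are no anchors.
if (preg_match('/DATA\.tracking\.user = \{(.*?)\}/s', $html, $m)) {
    echo trim($m[1]), "\n";
}
```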
...BUT...
If this was my task and I was going to be processing the data from the extracted string, I would probably start breaking up the components at extraction-time with the power of \G.
Code:
$htmls[] = <<<HTML
DATA.tracking.user = { age: "19", name: "John doe", int: 55 } // This works
HTML;
$htmls[] = <<<HTML
DATA.tracking.user = {
age: "20",
name: "Jane Doe",
int: 49
} // This does not work
HTML;
foreach ($htmls as $html) {
var_export(preg_match_all('~(?:\G(?!^),|DATA\.tracking\.user = \{)\s+([^:]+): (\d+|"[^"]*")~', $html, $out, PREG_SET_ORDER) ? $out : []);
echo "\n --- \n";
}
Output:
array (
0 =>
array (
0 => 'DATA.tracking.user = { age: "19"',
1 => 'age',
2 => '"19"',
),
1 =>
array (
0 => ', name: "John doe"',
1 => 'name',
2 => '"John doe"',
),
2 =>
array (
0 => ', int: 55',
1 => 'int',
2 => '55',
),
)
---
array (
0 =>
array (
0 => 'DATA.tracking.user = {
age: "20"',
1 => 'age',
2 => '"20"',
),
1 =>
array (
0 => ',
name: "Jane Doe"',
1 => 'name',
2 => '"Jane Doe"',
),
2 =>
array (
0 => ',
int: 49',
1 => 'int',
2 => '49',
),
)
---
Now you can simply iterate over the matches and work with [1] (the keys) and [2] (the values). This is a basic solution that can be further tailored to suit your project data. Admittedly, this doesn't account for values that contain an escaped double quote. Adding this feature would be no trouble. Accounting for more complex value types may be more of a challenge.
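As a sketch of that extension, the quoted-value branch can be widened from "[^"]*" to "(?:[^"\\]|\\.)*" so that a backslash and the character after it are consumed as a pair. The sample input with an escaped quote is made up for illustration.

```php
<?php
// Nowdoc keeps the backslashes in the sample exactly as written.
$html = <<<'JS'
DATA.tracking.user = { name: "John \"JD\" Doe", int: 55 }
JS;

// Same \G approach as above; a quoted value is now: a quote, then any mix of
// non-quote/non-backslash characters or backslash+any character, then a quote.
$pattern = '~(?:\G(?!^),|DATA\.tracking\.user = \{)\s+([^:]+): (\d+|"(?:[^"\\\\]|\\\\.)*")~';

preg_match_all($pattern, $html, $out, PREG_SET_ORDER);
foreach ($out as $match) {
    echo $match[1], ' => ', $match[2], "\n";
}
```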
You need to add the 's' modifier to the end of your regex; otherwise, "." does not match newlines. See this:
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
So basically change your regex to be:
'/DATA.tracking.user = (.*?)\}/ms'
Also, you should escape your other dots (otherwise you will also match "DATAYtrackingZuser"). So...
'/DATA\.tracking\.user = (.*?)\}/ms'
I'd also add in the opening curly bracket and not enforce the single space around the equal sign, so:
'/DATA\.tracking\.user\s*=\s*\{(.*?)\}/ms'
Since you seem to be scraping/reading the page anyway (so you have a local copy), you can simply replace all the newline characters in the HTML page with spaces; then it should work without even changing your regex.
Refer to this for the ASCII values:
https://www.techonthenet.com/ascii/chart.php
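A minimal sketch of that idea, assuming the HTML is already in a string: str_replace() turns \r\n, \r and \n into spaces, after which the question's original pattern matches.

```php
<?php
$html = "<script>\nDATA.tracking.user = {\n    age: \"19\",\n    name: \"John doe\"\n}\n</script>";

// Flatten the page first: every line break becomes a space
// ("\r\n" is listed first so it is replaced as one unit).
$flat = str_replace(["\r\n", "\r", "\n"], ' ', $html);

// The question's original pattern now matches unchanged.
$found = preg_match_all('/DATA.tracking.user = (.*?)\}/m', $flat, $matches, PREG_SET_ORDER, 0);
print_r($matches);
```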

Parsing parameters from command line with RegEx and PHP

I have this as an input to my command line interface as parameters to the executable:
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
What I want is to get all of the parameters into a key-value / associative array with PHP, like this:
$result = [
'Parameter1' => '1234',
'Parameter2' => '38518',
'param3' => 'Test \"escaped\"',
'param4' => '10',
'param5' => '0',
'param6' => 'TT',
'param7' => 'Seven',
'param8' => 'secret',
'SuperParam9' => '4857',
'SuperParam10' => '123',
];
The problems are:
the parameter's prefix can be - or --
the parameter's glue (value assignment operator) can be either an = sign or a whitespace ' '
some parameters may be inside a quoted block, which can use different separators, glues and prefixes, e.g. a ? as the separator.
What I have so far (I'm really bad with regex and still learning it) is this:
/(-[a-zA-Z]+)/gui
With which I can get all the parameters starting with an -...
I could explode the entire thing and parse it manually, but there are way too many contingencies to think about.
You can try this pattern, which uses the branch reset feature (?|...|...) to deal with the different possible formats of the values:
$str = '-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"';
$pattern = '~ --?(?<key> [^= ]+ ) [ =]
(?|
" (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) "
|
([^ ?"]*)
)~x';
preg_match_all ($pattern, $str, $matches);
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
In a branch reset group, the capture groups have the same number or the same name in each branch of the alternation.
This means that (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) is (obviously) the value named capture, but that ([^ ?"]*) is also the value named capture.
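To see the branch reset behaviour in isolation, here is a tiny made-up example: both alternatives write to group 1, so $m has only two entries ([0] and [1]) and $m[1] always holds the value, whichever branch matched.

```php
<?php
// Branch reset (?|...|...): the quoted branch and the bare-word branch
// both number their capture as group 1.
$pattern = '~(?|"([^"]*)"|(\w+))~';
preg_match_all($pattern, 'alpha "two words" gamma', $m);

print_r($m[1]); // the values, without the surrounding quotes
```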
You could use
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\)"
|
\h+(?P<value>\H+)
)
See a demo on regex101.com.
Which in PHP would be:
<?php
$data = <<<DATA
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
DATA;
$regex = '~
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\\\)"
|
\h+(?P<value>\H+)
)~x';
if (preg_match_all($regex, $data, $matches)) {
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
}
?>
This yields
Array
(
[Parameter1] => 1234
[Parameter2] => 38518
[param3] => Test \"escaped\"
[param4] => 10
[param5] => 0
[param6] => TT
[param7] => Seven
[param8] => secret
[SuperParam9] => 4857
[SuperParam10] => 123
)

split string in php "genesis 1:3-16" to "genesis", "1", "3", "16"

In php when user saves a text, I need to split string as
"genesis1:3-16" ==> "genesis", "1", "3", "16"
"revelation2:3-5" ==> "revelation", "2", "3", "5"
The conditions: there will be no whitespace between any of the characters, and I need to split on the symbols ":" and "-" and where the letters end. The numbers go up to '999' only (3 digits).
$sample = "genesis1:3-16";
//magic happens....
$book = ""; // genesis
$chapter = ""; // 1
$start_verse = ""; // 3
$end_verse = ""; //16
I have limited knowledge of regular expressions and can't figure it out using only strpos and substr...
Thank you in advance
I think this regex would accomplish what you are after:
([a-z]+)(\d{1,3}):(\d{1,3})-(\d{1,3})
Demo (with explanation of what each part does): https://regex101.com/r/uP4gW6/1
PHP Usage:
preg_match('~([a-z]+)(\d{1,3}):(\d{1,3})-(\d{1,3})~', 'genesis1:3-16', $data);
print_r($data);
Output:
Array
(
[0] => genesis1:3-16
[1] => genesis
[2] => 1
[3] => 3
[4] => 16
)
With preg_match, index 0 holds the whole match. The subsequent indexes hold each captured group.
If the book can only be one of a fixed set of names, you could replace [a-z]+ with that list separated by |, for example revelation|genesis|othername.
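A sketch of that idea combined with named groups, so the results can go straight into the question's variables. The book list (genesis|revelation|othername) is only an example, the ^...$ anchors and the i flag are my additions, and revelation2:3-5 is the question's second sample.

```php
<?php
// Book restricted to a fixed (example) list; each number is 1-3 digits.
$pattern = '~^(?<book>genesis|revelation|othername)(?<chapter>\d{1,3}):(?<start>\d{1,3})-(?<end>\d{1,3})$~i';

if (preg_match($pattern, 'revelation2:3-5', $m)) {
    $book = $m['book'];
    $chapter = $m['chapter'];
    $start_verse = $m['start'];
    $end_verse = $m['end'];
    echo "$book $chapter:$start_verse-$end_verse\n";
}
```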
$parts = array();
preg_match('/^(.*?)\s*(\d+):(\d+)-(\d+)$/', $sample, $parts);
$book = $parts[1];
$chapter = $parts[2];
$startVerse = $parts[3];
$endVerse = $parts[4];
You could use this simple pattern:
([a-zA-Z]+|\d{1,3})
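With that pattern you would use preg_match_all() and pick the pieces out of the first group. A sketch, assuming the input always yields exactly four pieces (PHP 7.1+ for the [...] list syntax); note that \d{1,3} would silently split a longer number such as 1234 into 123 and 4.

```php
<?php
$sample = 'genesis1:3-16';

// Each match is either a run of letters or a 1-3 digit number;
// ":" and "-" match neither alternative, so they are skipped.
preg_match_all('~([a-zA-Z]+|\d{1,3})~', $sample, $m);

// Assumes exactly four pieces were found.
[$book, $chapter, $start_verse, $end_verse] = $m[1];
echo "$book / $chapter / $start_verse / $end_verse\n";
```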

Negotiate arrays inside an array

When I run the following regular expression:
preg_match_all('~(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)~', $content, $turls);
print_r($turls);
I get an array inside an array. I need a single array only.
How do I negotiate the arrays inside another array?
By default preg_match_all() uses the PREG_PATTERN_ORDER flag, which means:
Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.
See http://php.net/preg_match_all
Here is sample output:
array(
0 => array( // Full pattern matches
0 => 'http://www.w3.org/TR/html4/strict.dtd',
1 => ...
),
1 => array( // First parenthesized subpattern.
// In your case it is the same as full pattern, because first
// parenthesized subpattern includes all pattern :-)
0 => 'http://www.w3.org/TR/html4/strict.dtd',
1 => ...
),
2 => array( // Second parenthesized subpattern.
0 => 'www.w3.org',
1 => ...
),
...
)
So, as R. Hill answered, you need $matches[0] to access all matched urls.
And as budinov.com pointed out, you should remove the outer parentheses so that the second entry doesn't just duplicate the first, e.g.:
preg_match_all('~https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?~', $content, $turls);
// where $turls[0] is what you need
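A quick way to confirm this, using a made-up $content string: with the outer parentheses gone, $turls[0] is the flat list of full URLs.

```php
<?php
$content = 'See http://www.w3.org/TR/html4/strict.dtd and https://example.com:8080/a?b=1 today.';

// Outer parentheses removed: $turls[0] holds only the full URL matches.
preg_match_all('~https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?~', $content, $turls);

print_r($turls[0]);
```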
Not sure what you mean by 'negotiate'. If you mean fetching the inner array, this should work:
$urls = preg_match_all('~(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)~', $content, $matches) ? $matches[0] : array();
if ( count($urls) ) {
...
}
Generally, you can replace your regexp with one that doesn't contain parentheses (). This way your results will be held only in $turls[0]:
preg_match_all('/https?\:\/\/[^\"\'\s]+/i', file_get_contents('http://www.yahoo.com'), $turls);
and then make the URLs unique like this:
$result = array_keys(array_flip($turls[0]));

Categories