Get values from formatted, delimited string with quoted labels and values - php

I have an input string like this:
"Day":June 8-10-2012,"Location":US,"City":Newyork
I need to match 3 value substrings:
June 8-10-2012
US
Newyork
I don't need the labels.

Per my comment above, if this is JSON, you should definitely use the JSON functions (such as json_decode()), as they are better suited to this.
However, you can use the following regex. PHP has no g flag, so call preg_match_all() to collect every match; the values end up in $matches[1].
/:([a-zA-Z0-9\s-]*)/
<?php
preg_match_all('/:([a-zA-Z0-9\s-]*)/', '"Day":June 8-10-2012,"Location":US,"City":Newyork', $matches);
print_r($matches[1]);
The regex demo is here:
https://regex101.com/r/BbwVQ5/1

Here are a couple of simple ways:
Code: (Demo)
$string = '"Day":June 8-10-2012,"Location":US,"City":Newyork';
var_export(preg_match_all('/:\K[^,]+/', $string, $out) ? $out[0] : 'fail');
echo "\n\n";
var_export(preg_split('/,?"[^"]+":/', $string, 0, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => 'June 8-10-2012',
1 => 'US',
2 => 'Newyork',
)
array (
0 => 'June 8-10-2012',
1 => 'US',
2 => 'Newyork',
)
Pattern #1: \K restarts the full-string match after the :, so a positive lookbehind can be avoided. Matching all following characters that are not a comma means a capture group can be avoided as well. Both choices save "steps" and improve pattern efficiency.
Pattern #2: ,? makes the leading comma optional, and "[^"]+": matches the whole double-quoted "key" through the : colon that follows it. The string is split on each of those matches.
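To make the trade-off behind pattern #1 concrete, here is a minimal sketch that extracts the same values three ways: with a capture group, with a lookbehind, and with \K. The variable names ($viaGroup, $viaLookbehind, $viaK) are made up for illustration; the input string is the one from the question.

```php
<?php
$string = '"Day":June 8-10-2012,"Location":US,"City":Newyork';

// 1) Capture group: the values land in $m[1], the full matches (with ":") in $m[0].
preg_match_all('/:([^,]+)/', $string, $m);
$viaGroup = $m[1];

// 2) Positive lookbehind: the values are the full matches in $m[0].
preg_match_all('/(?<=:)[^,]+/', $string, $m);
$viaLookbehind = $m[0];

// 3) \K: the colon is consumed but dropped from the reported match.
preg_match_all('/:\K[^,]+/', $string, $m);
$viaK = $m[0];

// All three approaches produce the same list of values.
var_export($viaGroup === $viaLookbehind && $viaLookbehind === $viaK);
```

All three return the same array; the difference is only in which element of the matches array holds the values and how much work the regex engine does to get there.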

Related

How to replace a substring with help of preg_replace

I have a string that consists of repeated words. I want to replace a substring 'OK' located between 'L3' and 'L4'. Below you can find my code:
$search = "/(?<=L3).*(OK).*(?=L4)/";
$replace = "REPLACEMENT";
$subject = "'L1' => ('Vanessa', 'Prague', 'OK'), 'L2' => ('Alex', 'Paris', 'OK'), 'L3' => ('Paul', 'Paris', 'OK'), 'L4' => ('John', 'Madrid', 'OK')";
$str = preg_replace($search, $replace, $subject);
If I use that pattern with preg_match, it finds the correct substring (the third 'OK'). However, when I apply that pattern to preg_replace, it replaces the substring that matches the full pattern instead of the parenthesized subpattern.
So could you please advise me on what I should change in my code? I know that there are plenty of similar questions about regex, but as I understand it my pattern is correct and I'm only confused by the preg_replace function.
It is true that your regex matches a place in the string that is preceded with L3 then contains the last OK substring after 0+ chars other than linebreak symbols and then matches any 0+ chars up to the place followed with L4. See your regex demo.
A possible solution is to use 2 capturing groups around the subpatterns before and after the OK, and use backreferences in the replacement pattern:
$search = "/(L3.*?)OK(.*?L4)/";
$replace = "REPLACEMENT";
$subject = "'L1' => ('Vanessa', 'Prague', 'OK'), 'L2' => ('Alex', 'Paris', 'OK'), 'L3' => ('Paul', 'Paris', 'OK'), 'L4' => ('John', 'Madrid', 'OK')";
$str = preg_replace($search, '$1'.$replace.'$2', $subject);
echo $str; // => 'L1' => ('Vanessa', 'Prague', 'OK'), 'L2' => ('Alex', 'Paris', 'OK'), 'L3' => ('Paul', 'Paris', 'REPLACEMENT'), 'L4' => ('John', 'Madrid', 'OK')
See the PHP demo
If there cannot be any L3.5 in between L3 and L4, the (L3.*?)OK(.*?L4) pattern is safe to use. It will match and capture L3 and then 0+ chars other than a linebreak up to the first OK, then will match OK, and then will match and capture 0+ chars up to the first L4.
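If L4 may be absent from the string, use a (?:(?!L4).)* tempered greedy token, which matches any symbol other than a linebreak symbol that does not start an L4 sequence (the replacement then becomes '$1'.$replace, since the pattern has only one capturing group):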
'~(L3(?:(?!L4).)*)OK~'
See the regex demo
NOTE: If you want to make the regexps safer, add ' around L# inside the patterns.
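As a sketch of that note, the tempered greedy token pattern can be written with the quotes included, so that only the literal 'L3' and 'L4' keys and the quoted 'OK' value are considered. The $pattern and $result names are illustrative, and ${1} is used in the replacement so the backreference stays unambiguous whatever text follows it.

```php
<?php
$subject = "'L1' => ('Vanessa', 'Prague', 'OK'), 'L2' => ('Alex', 'Paris', 'OK'), "
         . "'L3' => ('Paul', 'Paris', 'OK'), 'L4' => ('John', 'Madrid', 'OK')";
$replace = 'REPLACEMENT';

// Group 1: 'L3' plus everything up to the 'OK' that precedes 'L4'.
// (?:(?!'L4').)* consumes one character at a time, but never steps into 'L4'.
$pattern = "~('L3'(?:(?!'L4').)*)'OK'~";

// Re-insert group 1 and put the new quoted value in place of 'OK'.
$result = preg_replace($pattern, '${1}' . "'" . $replace . "'", $subject);

echo $result;
```

Only the 'OK' inside the 'L3' entry is replaced; the other three stay as they are.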

Regex to match specific pattern

I am so bad at creating regex and I'm struggling with what I am SURE is a simple, stupid regex.
I am using PHP to do this match. Here is what I have until now.
Test string: 8848842356063003
if(!preg_match('/^[0-2]|[7-9]{16}/', $token)) {
return array('status' => 'failed', 'message' => "Invalid token", 'token' => '');
}
The regex must comply to this: Start with 0-2 or 7-9 and have EXACTLY 16 characters. What am I doing wrong? Because I get, as a match:
array(
0 => 8
)
And I should get:
array(
0 => 8848842356063003
)
By the way: I am using PHP Live Regex to test my regex string.
Thanks in advance,
Ares D.
The regex must comply to this: Start with 0-2 or 7-9 and have EXACTLY 16 characters
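In your pattern the | splits the whole expression, so it means "^[0-2]" or "[7-9]{16}" rather than applying the start anchor and the length to both ranges. You can put the starting digits in the same character class and use an end anchor after matching 15 more characters: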
/^[0-27-9].{15}$/
If you want to match only digits then use:
/^[0-27-9]\d{15}$/
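A small sketch of the digits-only version in use. The isValidToken() helper is a made-up name for illustration, and the extra test tokens are invented to show the boundary cases.

```php
<?php
// Hypothetical helper: true when the token starts with 0-2 or 7-9
// and consists of exactly 16 digits.
function isValidToken(string $token): bool
{
    return preg_match('/^[0-27-9]\d{15}$/', $token) === 1;
}

// The alternation in the original pattern has the lowest precedence,
// so a single leading digit from 0-2 is already enough to match it.
$originalAcceptsOneChar = preg_match('/^[0-2]|[7-9]{16}/', '1') === 1;

var_export([
    'valid 16 digits'    => isValidToken('8848842356063003'),
    'wrong first digit'  => isValidToken('3848842356063003'),
    'only 15 digits'     => isValidToken('884884235606300'),
    '17 digits'          => isValidToken('88488423560630031'),
    'original accepts 1' => $originalAcceptsOneChar,
]);
```

Note that $ also matches just before a trailing newline; if that matters for your tokens, the D modifier (or \z instead of $) makes the end anchor strict.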

Stop regex splitting a matched url with preg_split

Given the following code:
$regex = '/(http\:\/\/|https\:\/\/)([a-z0-9-\.\/\?\=\+_]*)/i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
it's returning an array such as:
array (size=4)
0 => string '...' (length=X)
1 => string 'https://' (length=8)
2 => string 'duckduckgo.com/?q=how+much+wood+could+a+wood-chuck+chuck+if+a+wood-chuck+could+chuck+wood' (length=89)
3 => string '...' (length=X)
I would prefer it if the returned array had size=3, with one single URL. Is this possible?
Sure, that can be done: just remove the extra capturing groups from your regex. Try the following code:
$regex = '#(https?://[a-z0-9./?=+_-]*)#i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
Now the resulting array will have 3 elements instead of 4.
Besides removing the extra grouping, I have also simplified your regex, since most of the special characters don't need to be escaped inside a character class. Using # as the delimiter means / does not need escaping either, but it must stay in the character class so that the path part of the URL is still matched.
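Here is a minimal sketch of the single-group split on a sample note. The $note text is invented for illustration, and the character class includes /, as the pattern in the question does, so the path and query string stay part of the URL.

```php
<?php
// Sample input, made up for this sketch.
$note = 'Search: https://duckduckgo.com/?q=wood+chuck for details';

// One capturing group around the whole URL, so PREG_SPLIT_DELIM_CAPTURE
// returns the URL as a single element between the surrounding text.
$regex = '#(https?://[a-z0-9./?=+_-]*)#i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);

var_export($text);
```

With one URL in the input this gives three elements: the text before, the URL, and the text after.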

Parse string of alternating letters and numbers (no delimiters) to associative array

I need to parse a string of alternating letters and numbers and populate an array where the letters are the keys and the numbers are the values.
Example:
p10s2z1234
Output
Array(
'p' => 10,
's' => 2,
'z' => 1234
)
Use regex to get desired values and then combine arrays to get associative array. For example:
$str = 'p10s2z1234';
preg_match_all('/([a-z]+)(\d+)/', $str, $matches); //handles only lower case chars. feel free to extend regex
print_r(array_combine($matches[1], $matches[2]));
Scenario 1: You want to parse the string which has single letters to be keys, will produce three pairs of values, and you want the digits to be cast as integers. Then the best, most direct approach is sscanf() with array destructuring -- a single function call does it all. (Demo)
$str = 'p10s2z1234';
[
$k1,
$result[$k1],
$k2,
$result[$k2],
$k3,
$result[$k3]
] = sscanf($str, '%1s%d%1s%d%1s%d');
var_export($result);
Output:
array (
'p' => 10,
's' => 2,
'z' => 1234,
)
Scenario 2: You want the same parsing and output as scenario 1, but the substrings to be keys have variable/unknown length. (Demo)
$str = 'pie10sky2zebra1234';
[
$k1,
$result[$k1],
$k2,
$result[$k2],
$k3,
$result[$k3]
] = sscanf($str, '%[^0-9]%d%[^0-9]%d%[^0-9]%d');
var_export($result);
Scenario 3: You want to parse the string with regex and don't care that the values are "string" data-typed. (Demo)
$str = 'pie10sky2zebra1234';
[
$k1,
$result[$k1],
$k2,
$result[$k2],
$k3,
$result[$k3]
] = preg_split('/(\d+)/', $str, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
var_export($result);
Scenario 4: If you don't know how many pairs will be generated by the input string, use array_combine(). (Demo)
$str = 'pie10sky2zebra1234extra999';
var_export(
preg_match_all('/(\D+)(\d+)/', $str, $m)
? array_combine($m[1], $m[2])
: []
);
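If you want scenario 4's flexibility but also integer values as in scenarios 1 and 2, one option is to cast the captured digits before combining. This is only a sketch; parsePairs() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: parse any number of key/number pairs and
// return the numbers as integers instead of strings.
function parsePairs(string $str): array
{
    // preg_match_all() returns 0 (or false on error) when nothing matches.
    if (!preg_match_all('/(\D+)(\d+)/', $str, $m)) {
        return [];
    }
    // $m[1] holds the keys, $m[2] the digit strings; cast the latter to int.
    return array_combine($m[1], array_map('intval', $m[2]));
}

$result = parsePairs('pie10sky2zebra1234extra999');
var_export($result);
```

As with any array_combine() approach, a key that appears twice in the input keeps only its last value.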

Negotiate arrays inside an array

When I run this regular expression
preg_match_all('~(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)~', $content, $turls);
print_r($turls);
I get an array inside an array. I need a single array only.
How do I negotiate the arrays inside another array?
By default preg_match_all() uses PREG_PATTERN_ORDER flag, which means:
Orders results so that $matches[0] is
an array of full pattern matches,
$matches[1] is an array of strings
matched by the first parenthesized
subpattern, and so on.
See http://php.net/preg_match_all
Here is sample output:
array(
0 => array( // Full pattern matches
0 => 'http://www.w3.org/TR/html4/strict.dtd',
1 => ...
),
1 => array( // First parenthesized subpattern.
// In your case it is the same as full pattern, because first
// parenthesized subpattern includes all pattern :-)
0 => 'http://www.w3.org/TR/html4/strict.dtd',
1 => ...
),
2 => array( // Second parenthesized subpattern.
0 => 'www.w3.org',
1 => ...
),
...
)
So, as R. Hill answered, you need $matches[0] to access all matched URLs.
And as budinov.com pointed out, you should remove the outer parentheses so that the first subpattern does not simply duplicate the full match, e.g.:
preg_match_all('~https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?~', $content, $turls);
// where $turls[0] is what you need
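If you need the grouping but not the captured pieces, non-capturing groups (?:...) are another option: with no capturing groups at all, the matches array contains only index 0. The sketch below uses a simplified variant of the pattern, not the original one, and the $content string is invented for illustration.

```php
<?php
// Sample input, made up for this sketch.
$content = 'See http://www.w3.org/TR/html4/strict.dtd and '
         . 'https://example.com:8080/path?x=1 for details';

// Simplified variant of the question's pattern with every group made
// non-capturing, so only full matches are reported.
$pattern = '~https?://[-\w.]+(?::\d+)?(?:/(?:[\w/.]*(?:\?\S+)?)?)?~';

preg_match_all($pattern, $content, $turls);

// $turls has a single inner array: the full matches.
$urls = $turls[0];
var_export($urls);
```

Here count($turls) is 1, so $turls[0] is the single flat list of URLs you are after.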
Not sure what you mean by 'negotiate'. If you mean fetching the inner array, this should work:
$urls = preg_match_all('~(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)~', $content, $matches) ? $matches[0] : array();
if ( count($urls) ) {
...
}
Generally you can replace your regexp with one that doesn't contain parentheses (). This way your results will be held only in $turls[0]:
preg_match_all('/https?\:\/\/[^\"\'\s]+/i', file_get_contents('http://www.yahoo.com'), $turls);
and then make the URLs unique like this:
$result = array_keys(array_flip($turls[0]));
