PHP Regex Parse query string containing un-encoded ampersands - php

I'm receiving a query string (from a terrible payment system whose name I do not wish to sully publicly) that contains un-encoded ampersands:
name=joe+jones&company=abercrombie&fitch&other=no
parse_str can't handle this, and I don't know enough regex to come up with my own scheme (though I did try). My hang-up was look-ahead, which I did not quite understand.
What I'm looking for:
Array
(
[name] => joe jones
[company] => abercrombie&fitch
[other] => no
)
I thought about traipsing through the string, ampersand by ampersand, but that just seemed silly. Help?

How about this:
If two ampersands occur with no = between them, encode the first one. Then pass the result to the normal query string parser.
That should accomplish your task. This works because the pattern for a "normal" query string should always alternate equals signs and ampersands; thus two ampersands in a row means one of them should have been encoded, and as long as keys don't have ampersands in them, the last ampersand in a row is always the "real" ampersand preceding a new key.
You should be able to use the following regex to do the encoding:
$better_qs = preg_replace("/&(?=[^=]*&)/", "%26", $bad_qs);
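To see the whole pipeline end to end, here is a minimal sketch that runs the regex above on the question's sample string and hands the result to parse_str. The variable $params is just a name picked for illustration.

```php
<?php
$bad_qs = "name=joe+jones&company=abercrombie&fitch&other=no";

// Encode every "&" that is followed by another "&" before any "=" shows up.
$better_qs = preg_replace('/&(?=[^=]*&)/', '%26', $bad_qs);

// parse_str decodes "+" and "%26" while filling $params.
parse_str($better_qs, $params);

print_r($params);
// Array
// (
//     [name] => joe jones
//     [company] => abercrombie&fitch
//     [other] => no
// )
```

One limitation to keep in mind: a stray ampersand in the very last value (say other=a&b) has no later "&" to look ahead to, so it is not encoded and b would still come out as its own key.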

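You could also use explode() to split the string on ampersands, then split each element again on the delimiter "=" (split() itself was deprecated in PHP 5.3 and removed in PHP 7). Something like this:
$resultarray = array();
$myarray = explode("&", $mystring);
foreach ($myarray as $element) {
    // A limit of 2 keeps any further "=" inside the value.
    $keyvalue = explode("=", $element, 2);
    $resultarray[$keyvalue[0]] = isset($keyvalue[1]) ? $keyvalue[1] : "";
}
print_r($resultarray);
Not tested! But you should get the idea. Note that a fragment without an "=" (such as fitch in the example) ends up as its own key, so it still needs special handling.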
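For the ampersand-by-ampersand route, one possible sketch is to glue any fragment that has no "=" back onto the previous value. The function name parse_loose_query is made up for this example, and the code assumes keys never contain ampersands. Unlike parse_str, it does not understand array-style keys such as a[]=1.

```php
<?php
// Hypothetical helper, not part of PHP: parses a query string whose values
// may contain un-encoded ampersands.
function parse_loose_query($qs)
{
    $result = array();
    $lastKey = null;
    foreach (explode('&', $qs) as $part) {
        if (strpos($part, '=') === false && $lastKey !== null) {
            // No "=": treat the fragment as the tail of the previous value.
            $result[$lastKey] .= '&' . urldecode($part);
            continue;
        }
        // Split on the first "=" only, so later "=" stay in the value.
        $pair = explode('=', $part, 2);
        $lastKey = urldecode($pair[0]);
        $result[$lastKey] = isset($pair[1]) ? urldecode($pair[1]) : '';
    }
    return $result;
}

print_r(parse_loose_query('name=joe+jones&company=abercrombie&fitch&other=no'));
```

With the question's sample string this should give name, company and other as keys, with company holding abercrombie&fitch.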

Related

Use regex to quote the name in name-value pair of a list of pairs

I am trying to put quotes around the names of name-value pairs separated by commas. I use preg_replace and regex to achieve that. However, my pattern is not working properly.
$str="f1=1,f2='2',f3='a',f4=4,f5='5'";
$newstr=Preg_replace(/'(?.[^=]+)'/,"'$1'",$str);
I expected $newstr to come out like so:
'f1'=1,'f2'='2','f3'='a','f4'=4,'f5'='5'
But it doesn't, and the quotes don't contain the name.
What should the pattern be and how can I use the comma to get all of them correctly?
There are a few issues with your attempt:
PHP does not have a regex-literal syntax as in JavaScript, so starting the regex value with a forward slash is a syntax error. It should be a string, so start with a quote. Maybe you accidentally swapped the slash and quote at the start and the end.
(?. is not valid. Maybe you intended (?:, but then there is no capture group and $1 is not a valid back reference. To have the capture group, you should not have (?., but just (.
[^=]+ could include substrings like 1,f2. There should be logic to not start matching while still inside a value (whether quoted or not).
I would suggest a regex where you match both parts around the = (both key and value), and then in the replacement, just reproduce the second part without change. This will ensure you don't accidentally use anything on the value side for wrapping in quotes:
$newstr = preg_replace("/([^,=]+)=('[^']*'|[^,]*)/","'$1'=$2",$str);
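As a quick check of why matching the value side matters, here is a small sketch that runs the pattern above on an input where a quoted value contains a comma. The sample string is invented for illustration and is not from the question.

```php
<?php
// Invented sample: f2 holds a quoted value with a comma in it.
$str = "f1=1,f2='a,b',f3=3";

// Group 1: the name (no commas or "=").
// Group 2: either a single-quoted value or a bare value up to the next comma.
$newstr = preg_replace("/([^,=]+)=('[^']*'|[^,]*)/", "'$1'=$2", $str);

echo $newstr, "\n";
// 'f1'=1,'f2'='a,b','f3'=3
```

Because the quoted value is consumed as part of the match, the comma inside 'a,b' is never mistaken for a separator.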
Basically, match the beginning of the string or a comma with a lookbehind (so it is not part of the match), then capture everything up to the next =:
$reg = "/(?<=^|,)([^=]+)/";
$str = "f1=1,f2='2',f3='a',f4=4,f5='5'";
print_r(preg_replace($reg, "'$1'", $str));
// output:
// 'f1'=1,'f2'='2','f3'='a','f4'=4,'f5'='5'
This also works. It is a different approach, and it assumes that commas appear only as separators (never inside names or values) and that every name is at least two characters long:
$newstr = preg_replace("/(.)(?==)|(?<=,|^)(.)/", "$1'$2", $str);
But I believe string and simple array operations will be faster here, because the regex is getting complex and takes many steps to find the characters. Here is the same output using array functions only:
$newstr = implode(",", array_map(function($element){ return "'". implode("'=", explode("=", $element)); }, explode(",", $str)));
Regex is not always faster than string or array operations, but it can do complex things with very little code.

How can I extract a query string from these logs?

I have a bunch of lines in a log file where I need to extract only the query string part. I have identified these patterns:
/path/optin.html?e=somebase64string&l=somedifferentbase64string HTTP...
"/path/optin.html?e=somebase64string%3D&l=somedifferentbase64string" "browser info"...
"/path/optin.html?" "browser info"...
Some notes:
Sometimes path and query string are enclosed in double quotes
Sometimes there's no query string at all, obviously the ones with no query string are to be discarded.
Sometimes the base64 string was url encoded, so the ending "=" part comes as "%3D" instead. I don't think this has affected my script but I'd thought I'd note it also.
So, I was able to correctly extract - hopefully - all of the lines that follow the first pattern above, but the others I'm having some trouble with.
This is the pattern I'm trying with:
$pattern = '/html\?(.*)\s*HTTP/';
then I run a preg_match against the log line.
Can anyone help me out with a better regex pattern?
I need to grab this part off the log lines:
e=somebase64string&l=somedifferentbase64string
Thanks
You can use a pattern like ~\?(\S*)~ to match everything after a ? until you reach a whitespace character (assuming a rule that "URLs will never have spaces in them [that aren't %20]"):
$pattern = '~\?(\S*)~';
preg_match_all($pattern, $logs, $output);
Then trim off any quotes (e.g. in your last example):
$output = array_map(function($var) { return rtrim($var, '"'); }, $output[1]);
Giving you:
Array
(
[0] => e=somebase64string&l=somedifferentbase64string
[1] => e=somebase64string%3D&l=somedifferentbase64string
[2] =>
)
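Since lines without a query string are supposed to be discarded, a possible follow-up is to filter out the empty matches. The sketch below is a variation, not the answer's exact code: it stops the capture at a double quote as well as at whitespace (so no trimming is needed), and the three log lines are made-up stand-ins for the shapes listed in the question.

```php
<?php
// Hypothetical log lines modeled on the three shapes in the question.
$lines = array(
    'GET /path/optin.html?e=abc&l=def HTTP/1.1',
    '"/path/optin.html?e=abc%3D&l=def" "browser info"',
    '"/path/optin.html?" "browser info"',
);
$logs = implode("\n", $lines);

// Capture everything after "?" up to the next whitespace or double quote.
preg_match_all('~\?([^\s"]*)~', $logs, $m);

// Drop the entries whose query string is empty, then reindex.
$queries = array_values(array_filter($m[1], function ($q) {
    return $q !== '';
}));

print_r($queries);
```

This should leave only the two non-empty query strings. It assumes each log line contains at most one "?", which may not hold for every log format.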

Pass a string into an array PHP

I have list of strings like this:
'PYRAMID','htc_europe','htc_pyramid','pyramid','pyramid','HTC','1.11.401.110
CL68035
release-keys','htc_europe/pyramid/pyramid:4.0.3/IML74K/68035.110:user/release-keys'
It looks like elements of an array,
But when I use
<?php
$string = "'PYRAMID','htc_europe','htc_pyramid','pyramid','pyramid','HTC','1.11.401.110 CL68035 release-keys','htc_europe/pyramid/pyramid:4.0.3/IML74K/68035.110:user/release-keys'";
$arr = array($string);
print_r($arr);
?>
It doesn't work the way I want:
Array ( [0] =>
'PYRAMID','htc_europe','htc_pyramid','pyramid','pyramid','HTC','1.11.401.110
CL68035
release-keys','htc_europe/pyramid/pyramid:4.0.3/IML74K/68035.110:user/release-keys')
Instead of:
Array ( [0] => PYRAMID, [1] => htc_europe, [2] => htc_pyramid,
...
I don't want to use explode() because my strings are already in array format and many strings have the ',' character.
Please help me, thanks.
Your string is not in an array format. From the way it looks and based on your comments, I would say that you have comma separated values, CSV. So the best way to parse that would be to use functions specifically made for that format like str_getcsv():
$str = "'PYRAMID','htc_europe','htc_pyramid','pyramid','pyramid','HTC','1.11.401.110 CL68035 release-keys','htc_europe/pyramid/pyramid:4.0.3/IML74K/68035.110:user/release-keys'";
// this will get you the result you are looking for
$arr = str_getcsv($str, ',', "'");
var_dump($arr);
The use of the second and third parameters ensures that it gets parsed correctly also when a string contains a comma.
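To illustrate the point about commas inside values, here is a small sketch with an invented input whose first field contains a comma between the quotes. The escape character is passed explicitly as well, using its documented default.

```php
<?php
// Invented sample: the first field contains a comma inside the quotes.
$str = "'1.11.401.110, CL68035','htc_europe','HTC'";

// Arguments: input, separator, enclosure, escape character.
$arr = str_getcsv($str, ',', "'", "\\");

print_r($arr);
// Array
// (
//     [0] => 1.11.401.110, CL68035
//     [1] => htc_europe
//     [2] => HTC
// )
```

A plain explode(',', $str) on the same input would return four pieces and leave the quotes in place.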
$string is still a string, so you need to explode it if you want to make an array out of it.
If your problem is that the strings contain the ',' character, use some other separator, maybe |
$string = "'PYRAMID'|'htc_europe'|'htc_pyramid'|'pyramid'|'pyramid'|'HTC'|'1.11.401.110 CL68035 release-keys'|'htc_europe/pyramid/pyramid:4.0.3/IML74K/68035.110:user/release-keys'";
$arr = explode('|',$string);
print_r($arr);
<?php
$int = preg_match_all(
"/'(.+?)'/",
"'PYRAMID','htc_europe','htc_pyramid','pyramid','pyramid','HTC','1.11.401.110 CL68035 release-keys','htc_europe/pyramid/pyramid:4.0.3/IML74K/68035.110:user/release-keys'",
$matches);
print_r($matches[1]);
You can test it here http://micmap.org/php-by-example/en/function/preg_match_all
Due to the edits in the question, my answer is now out of date. I will leave it here because it explains why, in a particular case, explode is a valid solution.
As you can read in the online PHP manual, there is a very precise syntax for creating an array. This is the reference:
http://php.net/manual/en/function.array.php
As you can see, the correct way to use array() to create a new array is to declare each value separated by a comma, or each index => value pair separated by a comma.
There is no way to pass a single string to array() and have it split (you may be thinking of something JSON-like in JavaScript or Java, but that is off topic): array() does not parse its arguments, so it takes the whole string as is and stores it under a single index (index zero in your case).
That is why I am telling you to use explode(), or to parse your string some other way first (split() was removed in PHP 7).
You probably want each phone model as a separate string inside the array, so you will have to remove the single quotes first:
$stringReplaced = str_replace("'", "", $yourString);
And then you will have to split the string into an array using:
$array = explode(',', $stringReplaced);
I hope you will take this into consideration.
Of course, as my colleague said above, you can treat this string as comma-separated values and use str_getcsv.
~Though you will need to remove the single quotes to have the pure string.~
(The struck-out statement is wrong, because you can use the enclosure parameter provided by str_getcsv.)

Is this $string[1] valid PHP?

I have coded in PHP for over 10 years and this is the first time I have come across this (legacy code base):
$string = "123456789";
$new_number = $string[1];
$new_number is now 2.
Am I going crazy? I never knew you could add to a string with [x].
EDIT: As many have pointed out, strings in PHP are stored as an array of characters. However, am I right in thinking this is quite unique to PHP? Could you do this in a strongly typed language like C++/C#/Java?
No, you're not going crazy. In PHP you can treat any string like an array and access each character individually, as you can see here in the manual:
Characters within strings may be accessed and modified by specifying the zero-based offset of the desired character after the string using square array brackets, as in $str[42]. Think of a string as an array of characters for this purpose. [...]
Because a string is effectively an array of chars in PHP, you can get individual characters by accessing the right index. This isn't adding anything; it's just fetching the second character of the string, which is '2'. Hope this helps.
Yep! You are not crazy. In PHP you can access the characters in a string by their index. It's treated a lot like an array. You cannot add to the string with $string[] as you would an array, but you can change the value to another character.
$string = "test";
$string[1] = "s";
echo $string; // Echoes 'tsst'
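Putting the reading and writing together, a minimal sketch might look like this. One extra detail worth knowing: the offsets count bytes, not characters, so a multi-byte UTF-8 character takes up more than one offset (the last lines spell out the two bytes of "é" explicitly to show this).

```php
<?php
$string = "123456789";

$second = $string[1];   // read the character at offset 1
$string[0] = "X";       // overwrite the character at offset 0

echo $second, "\n";          // 2
echo $string, "\n";          // X23456789
echo strlen($string), "\n";  // 9

// Offsets count bytes: "\xC3\xA9" is the UTF-8 encoding of "é" (two bytes).
$accented = "\xC3\xA9";
echo strlen($accented), "\n";  // 2
```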

Regex issue: replace the first and the last $ character from all the elements of an array

I am trying to replace a few words in a huge string with preg_replace(),
using it like this:
preg_replace($match[0], $variable_value_array, $form_body)
Here $match[0] is an array with values of the form:
$contacts-firstname$
$contacts-lastname$
$contacts-mobile$
$leads-leadstatus$
$leads-noofemployees$
and $variable_value_array is also an array, with values like:
Linda
William
(091) 115-9385
Value Not Present
Value Not Present
and $form_body is a really long string.
The function is replacing the values in $form_body, but instead of replacing the whole $contacts-firstname$ with Linda, it replaces only contacts-firstname, leaving $Linda$. What should I do to replace both $ signs as well?
Thanks.
That's because $ is interpreted as the delimiter of your regex. You should use a simple str_replace instead. The signature is exactly the same:
str_replace($match[0], $variable_value_array, $form_body);
If you desperately want to use preg_replace, you need to do two things. Firstly, you need to wrap every array element in explicit delimiters (/ is kind of the standard choice). Secondly, you need to run every array element through preg_quote, otherwise the $ will be treated as an end-of-string anchor:
$patterns = array();
foreach($match[0] as $value)
$patterns[] = '/'.preg_quote($value, '/').'/';
And then use $patterns instead of $match[0]. But that is only intended as a nice-to-know if you ever actually need to use an array of literal search-strings inside a more complex pattern.
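To compare the two routes side by side, here is a small sketch. The $form_body text is invented for illustration (the question only says it is a long string), and the placeholder and value lists are shortened versions of the ones in the question.

```php
<?php
// Invented template; single quotes keep PHP from interpolating the "$".
$form_body = 'Dear $contacts-firstname$ $contacts-lastname$, call $contacts-mobile$.';

$search  = array('$contacts-firstname$', '$contacts-lastname$', '$contacts-mobile$');
$replace = array('Linda', 'William', '(091) 115-9385');

// Route 1: plain string replacement, no regex involved.
$viaStrReplace = str_replace($search, $replace, $form_body);

// Route 2: regex replacement with quoted, explicitly delimited patterns.
$patterns = array();
foreach ($search as $value) {
    $patterns[] = '/' . preg_quote($value, '/') . '/';
}
$viaPregReplace = preg_replace($patterns, $replace, $form_body);

echo $viaStrReplace, "\n";
// Dear Linda William, call (091) 115-9385.
var_dump($viaStrReplace === $viaPregReplace);
// bool(true)
```

One difference to watch for with the regex route: preg_replace interprets sequences like $1 or \1 in the replacement values as backreferences, while str_replace inserts them literally.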
Try str_replace instead of preg_replace; it's faster and much more appropriate for your use case.
