If I pass PHP variables with . in their names via $_GET, PHP auto-replaces the dots with _ characters. For example:
<?php
echo "url is ".$_SERVER['REQUEST_URI']."<p>";
echo "x.y is ".$_GET['x.y'].".<p>";
echo "x_y is ".$_GET['x_y'].".<p>";
... outputs the following:
url is /SpShipTool/php/testGetUrl.php?x.y=a.b
x.y is .
x_y is a.b.
... my question is this: is there any way I can get this to stop? I cannot for the life of me figure out what I've done to deserve this.
PHP version I'm running with is 5.2.4-2ubuntu5.3.
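The behaviour can be reproduced without a web server, because parse_str() applies the same dot and space replacement to names (its manual page says so). A minimal sketch; the query string here is made up for illustration:

```php
<?php
// parse_str() mangles names the same way the $_GET population does:
// dots and spaces in the name become underscores, values are untouched.
parse_str('x.y=a.b&first+name=Ann', $parsed);

var_dump(isset($parsed['x.y']));   // the dotted name is not present
var_dump($parsed['x_y']);          // the value arrives under the mangled name
var_dump($parsed['first_name']);   // "+" decodes to a space, which is mangled too
```

Only the names are rewritten; the value a.b keeps its dot.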
Here's PHP.net's explanation of why it does it:
Dots in incoming variable names
Typically, PHP does not alter the names of variables when they are passed into a script. However, it should be noted that the dot (period, full stop) is not a valid character in a PHP variable name. For the reason, look at it:
<?php
$varname.ext; /* invalid variable name */
?>
Now, what the parser sees is a variable named $varname, followed by the string concatenation operator, followed by the barestring (i.e. unquoted string which doesn't match any known key or reserved words) 'ext'. Obviously, this doesn't have the intended result.
For this reason, it is important to note that PHP will automatically replace any dots in incoming variable names with underscores.
That's from http://ca.php.net/variables.external.
Also, according to this comment these other characters are converted to underscores:
The full list of field-name characters that PHP converts to _ (underscore) is the following (not just dot):
chr(32) ( ) (space)
chr(46) (.) (dot)
chr(91) ([) (open square bracket)
chr(128) - chr(159) (various)
So it looks like you're stuck with it, so you'll have to convert the underscores back to dots in your script using dawnerd's suggestion (I'd just use str_replace though.)
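Converting underscores back blindly would also rewrite underscores that were genuinely sent, so one option is to restore only the names the script expects. A minimal sketch under that assumption; restoreDottedKeys() and the name list are hypothetical, not part of PHP:

```php
<?php
// Hypothetical helper: rename mangled keys back to the dotted names
// that the script knows it should receive.
function restoreDottedKeys(array $input, array $expectedNames)
{
    $restored = $input;
    foreach ($expectedNames as $name) {
        // Work out what PHP would have turned this name into
        $mangled = str_replace(array('.', ' '), '_', $name);
        if ($mangled !== $name && array_key_exists($mangled, $restored)) {
            $restored[$name] = $restored[$mangled];
            unset($restored[$mangled]);
        }
    }
    return $restored;
}

// parse_str() stands in for $_GET here; it mangles names the same way
parse_str('x.y=a.b&plain_key=1', $mangledInput);
$fixed = restoreDottedKeys($mangledInput, array('x.y'));
```

Keys that are not on the list, such as plain_key, are left alone.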
Long-since answered question, but there is actually a better answer (or work-around). PHP lets you get at the raw input stream, so you can do something like this:
$query_string = file_get_contents('php://input');
which will give you the POST body in query string format, with periods as they should be.
You can then parse it if you need to (as per POSTer's comment):
<?php
// Function to fix up PHP's messing up input containing dots, etc.
// `$source` can be either 'POST' or 'GET'
function getRealInput($source) {
    $pairs = explode("&", $source == 'POST' ? file_get_contents("php://input") : $_SERVER['QUERY_STRING']);
    $vars = array();
    foreach ($pairs as $pair) {
        // Skip empty segments (empty input or a stray "&")
        if ($pair === '') { continue; }
        // Split on the first "=" only, so values containing "=" stay intact
        $nv = explode("=", $pair, 2);
        $name = urldecode($nv[0]);
        // A name sent without "=" gets an empty value
        $value = isset($nv[1]) ? urldecode($nv[1]) : '';
        $vars[$name] = $value;
    }
    return $vars;
}
// Wrapper functions specifically for GET and POST:
function getRealGET() { return getRealInput('GET'); }
function getRealPOST() { return getRealInput('POST'); }
?>
Hugely useful for OpenID parameters, which contain both '.' and '_', each with a certain meaning!
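To make the OpenID point concrete, here is a small sketch comparing raw parsing with PHP's own parsing. parseRawPairs() is a hypothetical variant that takes the raw string as an argument so it can run outside a web request, and the claimed_id URL is a placeholder:

```php
<?php
// Hypothetical variant of the raw parser that takes the raw string directly
function parseRawPairs($raw)
{
    $vars = array();
    foreach (explode('&', $raw) as $pair) {
        if ($pair === '') { continue; }
        $nv = explode('=', $pair, 2);
        $vars[urldecode($nv[0])] = isset($nv[1]) ? urldecode($nv[1]) : '';
    }
    return $vars;
}

$raw = 'openid.mode=id_res&openid.claimed_id=https%3A%2F%2Fexample.com%2Fuser';

$real = parseRawPairs($raw);   // names kept exactly as sent
parse_str($raw, $mangled);     // names after PHP's replacement

print_r(array_keys($real));
print_r(array_keys($mangled));
```

After PHP's replacement, the dot and the underscore in openid.claimed_id can no longer be told apart, which is why replacing underscores afterwards is lossy for these parameters.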
Highlighting an actual answer by Johan in a comment above - I just wrapped my entire post in a top-level array which completely bypasses the problem with no heavy processing required.
In the form you do
<input name="data[database.username]">
<input name="data[database.password]">
<input name="data[something.else.really.deep]">
instead of
<input name="database.username">
<input name="database.password">
<input name="something.else.really.deep">
and in the post handler, just unwrap it:
$posdata = $_POST['data'];
For me this was a two-line change, as my views were entirely templated.
FYI. I am using dots in my field names to edit trees of grouped data.
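The wrapping trick works because only the top-level part of a name is rewritten, while text inside the brackets is used as an array key unchanged. A quick sketch, with parse_str() standing in for the form submission:

```php
<?php
// First two fields are wrapped in data[...]; the last one is not
$body = 'data[database.username]=root'
      . '&data[something.else.really.deep]=1'
      . '&database.username=root';

parse_str($body, $post);   // stands in for $_POST
$postdata = $post['data'];

var_dump(array_keys($postdata));                // dotted keys survive inside the wrapper
var_dump(isset($post['database_username']));    // the unwrapped field was mangled
```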
Do you want a solution that is standards compliant and works with deep arrays (for example: ?param[2][5]=10)?
To fix all possible sources of this problem, you can apply this at the very top of your PHP code:
$_GET = fix( $_SERVER['QUERY_STRING'] );
$_POST = fix( file_get_contents('php://input') );
$_COOKIE = fix( $_SERVER['HTTP_COOKIE'] );
The working of this function is a neat idea that I came up with during my summer vacation of 2013. Do not be discouraged by the simple regex: it just grabs all query names, encodes them (so dots are preserved), and then uses the normal parse_str() function.
function fix($source) {
    $source = preg_replace_callback(
        '/(^|(?<=&))[^=[&]+/',
        function($key) { return bin2hex(urldecode($key[0])); },
        $source
    );
    parse_str($source, $post);
    $result = array();
    foreach ($post as $key => $val) {
        $result[hex2bin($key)] = $val;
    }
    return $result;
}
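The encode, parse, decode idea can be seen on a single name. This is only an illustration of the round trip, not a replacement for the function above; the name and value are arbitrary:

```php
<?php
$name = 'openid.claimed_id';

// Hex digits contain no dots, spaces or brackets, so parse_str() leaves them alone
$hexName = bin2hex($name);
parse_str($hexName . '=abc', $parsed);

// Turn each hex name back into the original name
$restored = array();
foreach ($parsed as $key => $value) {
    // Cast because an all-digit hex string becomes an integer array key
    $restored[hex2bin((string) $key)] = $value;
}

var_dump($restored);
```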
This happens because a period is an invalid character in a variable's name, the reason for which lies very deep in the implementation of PHP, so there are no easy fixes (yet).
In the meantime you can work around this issue by:
Accessing the raw query data via either php://input for POST data or $_SERVER['QUERY_STRING'] for GET data
Using a conversion function.
The below conversion function (PHP >= 5.4) encodes the names of each key-value pair into a hexadecimal representation and then performs a regular parse_str(); once done, it reverts the hexadecimal names back into their original form:
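function parse_qs($data)
{
    // Match each name up to "=", "[" or "&"; excluding "&" keeps a name
    // that was sent without a value from swallowing the next pair
    $data = preg_replace_callback('/(?:^|(?<=&))[^=&[]+/', function($match) {
        return bin2hex(urldecode($match[0]));
    }, $data);
    parse_str($data, $values);
    return array_combine(array_map('hex2bin', array_keys($values)), $values);
}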
// work with the raw query string
$data = parse_qs($_SERVER['QUERY_STRING']);
Or:
// handle posted data (this only works with application/x-www-form-urlencoded)
$data = parse_qs(file_get_contents('php://input'));
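Since the raw body is only a query string for application/x-www-form-urlencoded requests, it may be worth checking the content type before parsing. A minimal sketch; isFormUrlEncoded() is a hypothetical helper, and in a real request the argument would come from $_SERVER['CONTENT_TYPE']:

```php
<?php
// Hypothetical helper: true only for url-encoded form bodies
function isFormUrlEncoded($contentType)
{
    if (!is_string($contentType)) {
        return false;
    }
    // Drop parameters such as "; charset=UTF-8" and compare case-insensitively
    $parts = explode(';', $contentType, 2);
    $mediaType = strtolower(trim($parts[0]));
    return $mediaType === 'application/x-www-form-urlencoded';
}

var_dump(isFormUrlEncoded('application/x-www-form-urlencoded; charset=UTF-8'));
var_dump(isFormUrlEncoded('multipart/form-data; boundary=x'));
```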
This approach is an altered version of Rok Kralj's, with some tweaks to make it work, to improve efficiency (it avoids unnecessary callbacks, encoding and decoding on unaffected keys) and to handle array keys correctly.
A gist with tests is available and any feedback or suggestions are welcome here or there.
public function fix(&$target, $source, $keep = false) {
    if (!$source) {
        return;
    }
    $keys = array();
    $source = preg_replace_callback(
        '/
        # Match at start of string or &
        (?:^|(?<=&))
        # Exclude cases where the period is in brackets, e.g. foo[bar.blarg]
        [^=&\[]*
        # Affected cases: periods and spaces
        (?:\.|%20)
        # Keep matching until assignment, next variable, end of string or
        # start of an array
        [^=&\[]*
        /x',
        function ($key) use (&$keys) {
            $keys[] = $key = base64_encode(urldecode($key[0]));
            return urlencode($key);
        },
        $source
    );
    if (!$keep) {
        $target = array();
    }
    parse_str($source, $data);
    foreach ($data as $key => $val) {
        // Only unprocess encoded keys
        if (!in_array($key, $keys)) {
            $target[$key] = $val;
            continue;
        }
        $key = base64_decode($key);
        $target[$key] = $val;
        if ($keep) {
            // Keep a copy in the underscore key version
            $key = preg_replace('/(\.| )/', '_', $key);
            $target[$key] = $val;
        }
    }
}
This happens because of PHP's old register_globals functionality. The . character is not valid in a variable name, so PHP converts it to an underscore to keep incoming names compatible with variable names.
In short, it's not good practice to use periods in URL variables.
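The link to variable names can be seen with extract(), which, like register_globals once did, turns array keys into variables and skips keys that are not valid variable names. A small sketch; countImportable() is a hypothetical wrapper added only for illustration:

```php
<?php
// Hypothetical wrapper: how many keys could become real variables?
function countImportable(array $incoming)
{
    // extract() returns the number of variables it actually imported
    return extract($incoming);
}

var_dump(countImportable(array('x.y' => 'a.b')));   // dotted name cannot be a variable
var_dump(countImportable(array('x_y' => 'a.b')));   // underscore name can
```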
If you are looking for a way to literally get PHP to stop replacing '.' characters in the $_GET or $_POST arrays, then one such way is to modify PHP's source (and in this case it is relatively straightforward).
WARNING: Modifying PHP C source is an advanced option!
Also see this PHP bug report which suggests the same modification.
To explore you'll need to:
download PHP's C source code
disable the . replacement check
./configure, make and deploy your customized build of PHP
The source change itself is trivial and involves updating just one half of one line in main/php_variables.c:
....
/* ensure that we don't have spaces or dots in the variable name (not binary safe) */
for (p = var; *p; p++) {
    if (*p == ' ' /*|| *p == '.'*/) {
        *p='_';
....
Note: compared to the original, || *p == '.' has been commented out.
Example Output:
given a QUERY_STRING of a.a[]=bb&a.a[]=BB&c%20c=dd,
running <?php print_r($_GET); now produces:
Array
(
    [a.a] => Array
        (
            [0] => bb
            [1] => BB
        )
    [c_c] => dd
)
Notes:
this patch addresses the original question only (it stops replacement of dots, not spaces).
running on this patch will be faster than script-level solutions, but those pure-PHP answers are still generally preferable (because they avoid changing PHP itself).
in theory a polyfill approach is possible here and could combine approaches -- test for the C-level change using parse_str() and (if unavailable) fall-back to slower methods.
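The polyfill idea in the last note could look roughly like this. It is a sketch under assumptions: dotsArePreserved() and parseQuery() are hypothetical names, and the fallback borrows the hex-encoding technique from the other answers:

```php
<?php
// Probe whether this PHP build keeps dots in incoming names
function dotsArePreserved()
{
    parse_str('a.b=1', $probe);
    return isset($probe['a.b']);
}

function parseQuery($raw)
{
    if (dotsArePreserved()) {
        // Patched build: the native parser already does the right thing
        parse_str($raw, $values);
        return $values;
    }
    // Stock build: hex-encode names, parse, then decode the names again
    $encoded = preg_replace_callback('/(?:^|(?<=&))[^=&\[]+/', function ($m) {
        return bin2hex(urldecode($m[0]));
    }, $raw);
    parse_str($encoded, $values);
    $result = array();
    foreach ($values as $key => $value) {
        // Cast because an all-digit hex string becomes an integer array key
        $result[hex2bin((string) $key)] = $value;
    }
    return $result;
}

$get = parseQuery('a.a[]=bb&a.a[]=BB&c%20c=dd');
print_r($get['a.a']);
```

Note that the two branches differ for spaces: the patched build still turns c%20c into c_c, while this fallback keeps the space in the key.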
My solution to this problem was quick and dirty, but I still like it. I simply wanted to post a list of filenames that were checked on the form. I used base64_encode to encode the filenames within the markup and then just decoded it with base64_decode prior to using them.
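A minimal sketch of that base64 idea, assuming the encoded filename is used directly as the field name; the filenames are made up and parse_str() stands in for the form submission:

```php
<?php
$files = array('report.final.pdf', 'notes v2.txt');

// Build the body a browser would send for checkboxes named base64(filename).
// The base64 alphabet has no dot, space or bracket, so PHP does not rewrite it.
$pairs = array();
foreach ($files as $file) {
    $pairs[] = urlencode(base64_encode($file)) . '=on';
}

parse_str(implode('&', $pairs), $post);   // stands in for $_POST

// Decode the field names back into filenames
$checked = array();
foreach (array_keys($post) as $name) {
    $checked[] = base64_decode((string) $name);
}

print_r($checked);
```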
After looking at Rok's solution I have come up with a version which addresses the limitations in my answer below, crb's above and Rok's solution as well. See my improved version.
@crb's answer above is a good start, but there are a couple of problems.
It reprocesses everything, which is overkill; only those fields that have a "." in the name need to be reprocessed.
It fails to handle arrays in the same way that native PHP processing does, e.g. for keys like "foo.bar[]".
The solution below addresses both of these problems now (note that it has been updated since originally posted). This is about 50% faster than my answer above in my testing, but will not handle situations where the data has the same key (or a key which gets extracted the same, e.g. foo.bar and foo_bar are both extracted as foo_bar).
<?php
public function fix2(&$target, $source, $keep = false) {
    if (!$source) {
        return;
    }
    preg_match_all(
        '/
        # Match at start of string or &
        (?:^|(?<=&))
        # Exclude cases where the period is in brackets, e.g. foo[bar.blarg]
        [^=&\[]*
        # Affected cases: periods and spaces
        (?:\.|%20)
        # Keep matching until assignment, next variable, end of string or
        # start of an array
        [^=&\[]*
        /x',
        $source,
        $matches
    );
    foreach (current($matches) as $key) {
        $key = urldecode($key);
        $badKey = preg_replace('/(\.| )/', '_', $key);
        if (isset($target[$badKey])) {
            // Duplicate values may have already unset this
            $target[$key] = $target[$badKey];
            if (!$keep) {
                unset($target[$badKey]);
            }
        }
    }
}
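The key-collision limitation mentioned above comes from the native parsing itself, which any approach that renames keys after the fact inherits. A short sketch, with parse_str() standing in for the $_GET population:

```php
<?php
// Both names are extracted as foo_bar, so the later value overwrites the earlier one
parse_str('foo.bar=1&foo_bar=2', $target);

var_dump($target);
```

By the time $target exists, the value sent as foo.bar is already gone, so renaming keys afterwards cannot bring it back.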
Well, the function I include below, "getRealPostArray()", isn't a pretty solution, but it handles arrays and supports both names: "alpha_beta" and "alpha.beta":
<input type='text' value='First-.' name='alpha.beta[a.b][]' /><br>
<input type='text' value='Second-.' name='alpha.beta[a.b][]' /><br>
<input type='text' value='First-_' name='alpha_beta[a.b][]' /><br>
<input type='text' value='Second-_' name='alpha_beta[a.b][]' /><br>
whereas var_dump($_POST) produces:
'alpha_beta' =>
  array (size=1)
    'a.b' =>
      array (size=4)
        0 => string 'First-.' (length=7)
        1 => string 'Second-.' (length=8)
        2 => string 'First-_' (length=7)
        3 => string 'Second-_' (length=8)
var_dump( getRealPostArray()) produces:
'alpha.beta' =>
  array (size=1)
    'a.b' =>
      array (size=2)
        0 => string 'First-.' (length=7)
        1 => string 'Second-.' (length=8)
'alpha_beta' =>
  array (size=1)
    'a.b' =>
      array (size=2)
        0 => string 'First-_' (length=7)
        1 => string 'Second-_' (length=8)
The function, for what it's worth:
function getRealPostArray() {
    if ($_SERVER['REQUEST_METHOD'] !== 'POST') {#Nothing to do
        return null;
    }
    $neverANamePart = '~#~'; #Any arbitrary string never expected in a 'name'
    $postdata = file_get_contents("php://input");
    $post = [];
    $rebuiltpairs = [];
    $postraws = explode('&', $postdata);
    foreach ($postraws as $postraw) { #Each is a string like: 'xxxx=yyyy'
        $keyvalpair = explode('=',$postraw);
        #isset() rather than empty(), so that a value of '0' is kept
        if (!isset($keyvalpair[1])) {
            $keyvalpair[1] = '';
        }
        $pos = strpos($keyvalpair[0],'%5B');
        if ($pos !== false) {
            $str1 = substr($keyvalpair[0], 0, $pos);
            $str2 = substr($keyvalpair[0], $pos);
            $str1 = str_replace('.',$neverANamePart,$str1);
            $keyvalpair[0] = $str1.$str2;
        } else {
            $keyvalpair[0] = str_replace('.',$neverANamePart,$keyvalpair[0]);
        }
        $rebuiltpair = implode('=',$keyvalpair);
        $rebuiltpairs[]=$rebuiltpair;
    }
    $rebuiltpostdata = implode('&',$rebuiltpairs);
    parse_str($rebuiltpostdata, $post);
    $fixedpost = [];
    foreach ($post as $key => $val) {
        $fixedpost[str_replace($neverANamePart,'.',$key)] = $val;
    }
    return $fixedpost;
}
Building on crb's answer, I wanted to recreate the $_POST array as a whole. Keep in mind that you'll still have to ensure you're encoding and decoding correctly at both the client and the server. It's important to understand when a character is truly invalid and when it is truly valid. Additionally, always escape client data before using it in any database command, without exception.
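<?php
unset($_POST);
$_POST = array();
$p0 = explode('&',file_get_contents('php://input'));
foreach ($p0 as $value)
{
    // Skip empty segments (empty body or a stray "&")
    if ($value === '') { continue; }
    // Split on the first "=" only; a name sent without "=" gets an empty value
    $p1 = explode('=',$value,2);
    $p1[1] = isset($p1[1]) ? $p1[1] : '';
    $_POST[$p1[0]] = $p1[1];
    //OR...
    //$_POST[urldecode($p1[0])] = urldecode($p1[1]);
}
print_r($_POST);
?>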
I recommend using this for individual cases only; offhand I'm not sure about the downsides of putting this at the top of your primary header file.
My current solution (based on previous replies in this topic):
function parseQueryString($data)
{
    // Do not decode the whole string up front: an encoded "&" or "=" inside a
    // value would turn into a separator. Names are decoded in the callback.
    $pattern = '/(?:^|(?<=&))[^=&\[]+/';
    $data = preg_replace_callback($pattern, function ($match){
        return bin2hex(urldecode($match[0]));
    }, $data);
    parse_str($data, $values);
    return array_combine(array_map('hex2bin', array_keys($values)), $values);
}
$_GET = parseQueryString($_SERVER['QUERY_STRING']);
Related
If I pass PHP variables with . in their names via $_GET PHP auto-replaces them with _ characters. For example:
<?php
echo "url is ".$_SERVER['REQUEST_URI']."<p>";
echo "x.y is ".$_GET['x.y'].".<p>";
echo "x_y is ".$_GET['x_y'].".<p>";
... outputs the following:
url is /SpShipTool/php/testGetUrl.php?x.y=a.b
x.y is .
x_y is a.b.
... my question is this: is there any way I can get this to stop? Cannot for the life of me figure out what I've done to deserve this
PHP version I'm running with is 5.2.4-2ubuntu5.3.
Here's PHP.net's explanation of why it does it:
Dots in incoming variable names
Typically, PHP does not alter the
names of variables when they are
passed into a script. However, it
should be noted that the dot (period,
full stop) is not a valid character in
a PHP variable name. For the reason,
look at it:
<?php
$varname.ext; /* invalid variable name */
?>
Now, what
the parser sees is a variable named
$varname, followed by the string
concatenation operator, followed by
the barestring (i.e. unquoted string
which doesn't match any known key or
reserved words) 'ext'. Obviously, this
doesn't have the intended result.
For this reason, it is important to
note that PHP will automatically
replace any dots in incoming variable
names with underscores.
That's from http://ca.php.net/variables.external.
Also, according to this comment these other characters are converted to underscores:
The full list of field-name characters that PHP converts to _ (underscore) is the following (not just dot):
chr(32) ( ) (space)
chr(46) (.) (dot)
chr(91) ([) (open square bracket)
chr(128) - chr(159) (various)
So it looks like you're stuck with it, so you'll have to convert the underscores back to dots in your script using dawnerd's suggestion (I'd just use str_replace though.)
Long-since answered question, but there is actually a better answer (or work-around). PHP lets you at the raw input stream, so you can do something like this:
$query_string = file_get_contents('php://input');
which will give you the $_POST array in query string format, periods as they should be.
You can then parse it if you need (as per POSTer's comment)
<?php
// Function to fix up PHP's messing up input containing dots, etc.
// `$source` can be either 'POST' or 'GET'
function getRealInput($source) {
$pairs = explode("&", $source == 'POST' ? file_get_contents("php://input") : $_SERVER['QUERY_STRING']);
$vars = array();
foreach ($pairs as $pair) {
$nv = explode("=", $pair);
$name = urldecode($nv[0]);
$value = urldecode($nv[1]);
$vars[$name] = $value;
}
return $vars;
}
// Wrapper functions specifically for GET and POST:
function getRealGET() { return getRealInput('GET'); }
function getRealPOST() { return getRealInput('POST'); }
?>
Hugely useful for OpenID parameters, which contain both '.' and '_', each with a certain meaning!
Highlighting an actual answer by Johan in a comment above - I just wrapped my entire post in a top-level array which completely bypasses the problem with no heavy processing required.
In the form you do
<input name="data[database.username]">
<input name="data[database.password]">
<input name="data[something.else.really.deep]">
instead of
<input name="database.username">
<input name="database.password">
<input name="something.else.really.deep">
and in the post handler, just unwrap it:
$posdata = $_POST['data'];
For me this was a two-line change, as my views were entirely templated.
FYI. I am using dots in my field names to edit trees of grouped data.
Do you want a solution that is standards compliant, and works with deep arrays (for example: ?param[2][5]=10) ?
To fix all possible sources of this problem, you can apply at the very top of your PHP code:
$_GET = fix( $_SERVER['QUERY_STRING'] );
$_POST = fix( file_get_contents('php://input') );
$_COOKIE = fix( $_SERVER['HTTP_COOKIE'] );
The working of this function is a neat idea that I came up during my summer vacation of 2013. Do not be discouraged by a simple regex, it just grabs all query names, encodes them (so dots are preserved), and then uses a normal parse_str() function.
function fix($source) {
$source = preg_replace_callback(
'/(^|(?<=&))[^=[&]+/',
function($key) { return bin2hex(urldecode($key[0])); },
$source
);
parse_str($source, $post);
$result = array();
foreach ($post as $key => $val) {
$result[hex2bin($key)] = $val;
}
return $result;
}
This happens because a period is an invalid character in a variable's name, the reason for which lies very deep in the implementation of PHP, so there are no easy fixes (yet).
In the meantime you can work around this issue by:
Accessing the raw query data via either php://input for POST data or $_SERVER['QUERY_STRING'] for GET data
Using a conversion function.
The below conversion function (PHP >= 5.4) encodes the names of each key-value pair into a hexadecimal representation and then performs a regular parse_str(); once done, it reverts the hexadecimal names back into their original form:
function parse_qs($data)
{
$data = preg_replace_callback('/(?:^|(?<=&))[^=[]+/', function($match) {
return bin2hex(urldecode($match[0]));
}, $data);
parse_str($data, $values);
return array_combine(array_map('hex2bin', array_keys($values)), $values);
}
// work with the raw query string
$data = parse_qs($_SERVER['QUERY_STRING']);
Or:
// handle posted data (this only works with application/x-www-form-urlencoded)
$data = parse_qs(file_get_contents('php://input'));
This approach is an altered version of Rok Kralj's, but with some tweaking to work, to improve efficiency (avoids unnecessary callbacks, encoding and decoding on unaffected keys) and to correctly handle array keys.
A gist with tests is available and any feedback or suggestions are welcome here or there.
public function fix(&$target, $source, $keep = false) {
if (!$source) {
return;
}
$keys = array();
$source = preg_replace_callback(
'/
# Match at start of string or &
(?:^|(?<=&))
# Exclude cases where the period is in brackets, e.g. foo[bar.blarg]
[^=&\[]*
# Affected cases: periods and spaces
(?:\.|%20)
# Keep matching until assignment, next variable, end of string or
# start of an array
[^=&\[]*
/x',
function ($key) use (&$keys) {
$keys[] = $key = base64_encode(urldecode($key[0]));
return urlencode($key);
},
$source
);
if (!$keep) {
$target = array();
}
parse_str($source, $data);
foreach ($data as $key => $val) {
// Only unprocess encoded keys
if (!in_array($key, $keys)) {
$target[$key] = $val;
continue;
}
$key = base64_decode($key);
$target[$key] = $val;
if ($keep) {
// Keep a copy in the underscore key version
$key = preg_replace('/(\.| )/', '_', $key);
$target[$key] = $val;
}
}
}
The reason this happens is because of PHP's old register_globals functionality. The . character is not a valid character in a variable name, so PHP coverts it to an underscore in order to make sure there's compatibility.
In short, it's not a good practice to do periods in URL variables.
If looking for any way to literally get PHP to stop replacing '.' characters in $_GET or $_POST arrays, then one such way is to modify PHP's source (and in this case it is relatively straightforward).
WARNING: Modifying PHP C source is an advanced option!
Also see this PHP bug report which suggests the same modification.
To explore you'll need to:
download PHP's C source code
disable the . replacement check
./configure, make and deploy your customized build of PHP
The source change itself is trivial and involves updating just one half of one line in main/php_variables.c:
....
/* ensure that we don't have spaces or dots in the variable name (not binary safe) */
for (p = var; *p; p++) {
if (*p == ' ' /*|| *p == '.'*/) {
*p='_';
....
Note: compared to original || *p == '.' has been commented-out
Example Output:
given a QUERY_STRING of a.a[]=bb&a.a[]=BB&c%20c=dd,
running <?php print_r($_GET); now produces:
Array
(
[a.a] => Array
(
[0] => bb
[1] => BB
)
[c_c] => dd
)
Notes:
this patch addresses the original question only (it stops replacement of dots, not spaces).
running on this patch will be faster than script-level solutions, but those pure-.php answers are still generally-preferable (because they avoid changing PHP itself).
in theory a polyfill approach is possible here and could combine approaches -- test for the C-level change using parse_str() and (if unavailable) fall-back to slower methods.
My solution to this problem was quick and dirty, but I still like it. I simply wanted to post a list of filenames that were checked on the form. I used base64_encode to encode the filenames within the markup and then just decoded it with base64_decode prior to using them.
After looking at Rok's solution I have come up with a version which addresses the limitations in my answer below, crb's above and Rok's solution as well. See a my improved version.
#crb's answer above is a good start, but there are a couple of problems.
It reprocesses everything, which is overkill; only those fields that have a "." in the name need to be reprocessed.
It fails to handle arrays in the same way that native PHP processing does, e.g. for keys like "foo.bar[]".
The solution below addresses both of these problems now (note that it has been updated since originally posted). This is about 50% faster than my answer above in my testing, but will not handle situations where the data has the same key (or a key which gets extracted the same, e.g. foo.bar and foo_bar are both extracted as foo_bar).
<?php
public function fix2(&$target, $source, $keep = false) {
if (!$source) {
return;
}
preg_match_all(
'/
# Match at start of string or &
(?:^|(?<=&))
# Exclude cases where the period is in brackets, e.g. foo[bar.blarg]
[^=&\[]*
# Affected cases: periods and spaces
(?:\.|%20)
# Keep matching until assignment, next variable, end of string or
# start of an array
[^=&\[]*
/x',
$source,
$matches
);
foreach (current($matches) as $key) {
$key = urldecode($key);
$badKey = preg_replace('/(\.| )/', '_', $key);
if (isset($target[$badKey])) {
// Duplicate values may have already unset this
$target[$key] = $target[$badKey];
if (!$keep) {
unset($target[$badKey]);
}
}
}
}
Well, the function I include below, "getRealPostArray()", isn't a pretty solution, but it handles arrays and supports both names: "alpha_beta" and "alpha.beta":
<input type='text' value='First-.' name='alpha.beta[a.b][]' /><br>
<input type='text' value='Second-.' name='alpha.beta[a.b][]' /><br>
<input type='text' value='First-_' name='alpha_beta[a.b][]' /><br>
<input type='text' value='Second-_' name='alpha_beta[a.b][]' /><br>
whereas var_dump($_POST) produces:
'alpha_beta' =>
array (size=1)
'a.b' =>
array (size=4)
0 => string 'First-.' (length=7)
1 => string 'Second-.' (length=8)
2 => string 'First-_' (length=7)
3 => string 'Second-_' (length=8)
var_dump( getRealPostArray()) produces:
'alpha.beta' =>
array (size=1)
'a.b' =>
array (size=2)
0 => string 'First-.' (length=7)
1 => string 'Second-.' (length=8)
'alpha_beta' =>
array (size=1)
'a.b' =>
array (size=2)
0 => string 'First-_' (length=7)
1 => string 'Second-_' (length=8)
The function, for what it's worth:
function getRealPostArray() {
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {#Nothing to do
return null;
}
$neverANamePart = '~#~'; #Any arbitrary string never expected in a 'name'
$postdata = file_get_contents("php://input");
$post = [];
$rebuiltpairs = [];
$postraws = explode('&', $postdata);
foreach ($postraws as $postraw) { #Each is a string like: 'xxxx=yyyy'
$keyvalpair = explode('=',$postraw);
if (empty($keyvalpair[1])) {
$keyvalpair[1] = '';
}
$pos = strpos($keyvalpair[0],'%5B');
if ($pos !== false) {
$str1 = substr($keyvalpair[0], 0, $pos);
$str2 = substr($keyvalpair[0], $pos);
$str1 = str_replace('.',$neverANamePart,$str1);
$keyvalpair[0] = $str1.$str2;
} else {
$keyvalpair[0] = str_replace('.',$neverANamePart,$keyvalpair[0]);
}
$rebuiltpair = implode('=',$keyvalpair);
$rebuiltpairs[]=$rebuiltpair;
}
$rebuiltpostdata = implode('&',$rebuiltpairs);
parse_str($rebuiltpostdata, $post);
$fixedpost = [];
foreach ($post as $key => $val) {
$fixedpost[str_replace($neverANamePart,'.',$key)] = $val;
}
return $fixedpost;
}
Using crb's I wanted to recreate the $_POST array as a whole though keep in mind you'll still have to ensure you're encoding and decoding correctly both at the client and the server. It's important to understand when a character is truly invalid and it is truly valid. Additionally people should still and always escape client data before using it with any database command without exception.
<?php
unset($_POST);
$_POST = array();
$p0 = explode('&',file_get_contents('php://input'));
foreach ($p0 as $key => $value)
{
$p1 = explode('=',$value);
$_POST[$p1[0]] = $p1[1];
//OR...
//$_POST[urldecode($p1[0])] = urldecode($p1[1]);
}
print_r($_POST);
?>
I recommend using this only for individual cases only, offhand I'm not sure about the negative points of putting this at the top of your primary header file.
My current solution (based on prev topic replies):
function parseQueryString($data)
{
$data = rawurldecode($data);
$pattern = '/(?:^|(?<=&))[^=&\[]*[^=&\[]*/';
$data = preg_replace_callback($pattern, function ($match){
return bin2hex(urldecode($match[0]));
}, $data);
parse_str($data, $values);
return array_combine(array_map('hex2bin', array_keys($values)), $values);
}
$_GET = parseQueryString($_SERVER['QUERY_STRING']);
If I pass PHP variables with . in their names via $_GET PHP auto-replaces them with _ characters. For example:
<?php
echo "url is ".$_SERVER['REQUEST_URI']."<p>";
echo "x.y is ".$_GET['x.y'].".<p>";
echo "x_y is ".$_GET['x_y'].".<p>";
... outputs the following:
url is /SpShipTool/php/testGetUrl.php?x.y=a.b
x.y is .
x_y is a.b.
... my question is this: is there any way I can get this to stop? Cannot for the life of me figure out what I've done to deserve this
PHP version I'm running with is 5.2.4-2ubuntu5.3.
Here's PHP.net's explanation of why it does it:
Dots in incoming variable names
Typically, PHP does not alter the
names of variables when they are
passed into a script. However, it
should be noted that the dot (period,
full stop) is not a valid character in
a PHP variable name. For the reason,
look at it:
<?php
$varname.ext; /* invalid variable name */
?>
Now, what
the parser sees is a variable named
$varname, followed by the string
concatenation operator, followed by
the barestring (i.e. unquoted string
which doesn't match any known key or
reserved words) 'ext'. Obviously, this
doesn't have the intended result.
For this reason, it is important to
note that PHP will automatically
replace any dots in incoming variable
names with underscores.
That's from http://ca.php.net/variables.external.
Also, according to this comment these other characters are converted to underscores:
The full list of field-name characters that PHP converts to _ (underscore) is the following (not just dot):
chr(32) ( ) (space)
chr(46) (.) (dot)
chr(91) ([) (open square bracket)
chr(128) - chr(159) (various)
So it looks like you're stuck with it, so you'll have to convert the underscores back to dots in your script using dawnerd's suggestion (I'd just use str_replace though.)
Long-since answered question, but there is actually a better answer (or work-around). PHP lets you at the raw input stream, so you can do something like this:
$query_string = file_get_contents('php://input');
which will give you the $_POST array in query string format, periods as they should be.
You can then parse it if you need (as per POSTer's comment)
<?php
// Function to fix up PHP's messing up input containing dots, etc.
// `$source` can be either 'POST' or 'GET'
function getRealInput($source) {
$pairs = explode("&", $source == 'POST' ? file_get_contents("php://input") : $_SERVER['QUERY_STRING']);
$vars = array();
foreach ($pairs as $pair) {
$nv = explode("=", $pair);
$name = urldecode($nv[0]);
$value = urldecode($nv[1]);
$vars[$name] = $value;
}
return $vars;
}
// Wrapper functions specifically for GET and POST:
function getRealGET() { return getRealInput('GET'); }
function getRealPOST() { return getRealInput('POST'); }
?>
Hugely useful for OpenID parameters, which contain both '.' and '_', each with a certain meaning!
Highlighting an actual answer by Johan in a comment above - I just wrapped my entire post in a top-level array which completely bypasses the problem with no heavy processing required.
In the form you do
<input name="data[database.username]">
<input name="data[database.password]">
<input name="data[something.else.really.deep]">
instead of
<input name="database.username">
<input name="database.password">
<input name="something.else.really.deep">
and in the post handler, just unwrap it:
$posdata = $_POST['data'];
For me this was a two-line change, as my views were entirely templated.
FYI. I am using dots in my field names to edit trees of grouped data.
Do you want a solution that is standards compliant, and works with deep arrays (for example: ?param[2][5]=10) ?
To fix all possible sources of this problem, you can apply at the very top of your PHP code:
$_GET = fix( $_SERVER['QUERY_STRING'] );
$_POST = fix( file_get_contents('php://input') );
$_COOKIE = fix( $_SERVER['HTTP_COOKIE'] );
The working of this function is a neat idea that I came up during my summer vacation of 2013. Do not be discouraged by a simple regex, it just grabs all query names, encodes them (so dots are preserved), and then uses a normal parse_str() function.
function fix($source) {
$source = preg_replace_callback(
'/(^|(?<=&))[^=[&]+/',
function($key) { return bin2hex(urldecode($key[0])); },
$source
);
parse_str($source, $post);
$result = array();
foreach ($post as $key => $val) {
$result[hex2bin($key)] = $val;
}
return $result;
}
This happens because a period is an invalid character in a variable's name, the reason for which lies very deep in the implementation of PHP, so there are no easy fixes (yet).
In the meantime you can work around this issue by:
Accessing the raw query data via either php://input for POST data or $_SERVER['QUERY_STRING'] for GET data
Using a conversion function.
The below conversion function (PHP >= 5.4) encodes the names of each key-value pair into a hexadecimal representation and then performs a regular parse_str(); once done, it reverts the hexadecimal names back into their original form:
function parse_qs($data)
{
$data = preg_replace_callback('/(?:^|(?<=&))[^=[]+/', function($match) {
return bin2hex(urldecode($match[0]));
}, $data);
parse_str($data, $values);
return array_combine(array_map('hex2bin', array_keys($values)), $values);
}
// work with the raw query string
$data = parse_qs($_SERVER['QUERY_STRING']);
Or:
// handle posted data (this only works with application/x-www-form-urlencoded)
$data = parse_qs(file_get_contents('php://input'));
This approach is an altered version of Rok Kralj's, with some tweaks to make it work, to improve efficiency (it avoids unnecessary callbacks, encoding and decoding on unaffected keys), and to handle array keys correctly.
A gist with tests is available and any feedback or suggestions are welcome here or there.
// Method excerpt: place it inside a class, or drop "public" to use it as a plain function.
public function fix(&$target, $source, $keep = false) {
    if (!$source) {
        return;
    }
    $keys = array();
    $source = preg_replace_callback(
        '/
        # Match at start of string or &
        (?:^|(?<=&))
        # Exclude cases where the period is in brackets, e.g. foo[bar.blarg]
        [^=&\[]*
        # Affected cases: periods and spaces
        (?:\.|%20)
        # Keep matching until assignment, next variable, end of string or
        # start of an array
        [^=&\[]*
        /x',
        function ($key) use (&$keys) {
            $keys[] = $key = base64_encode(urldecode($key[0]));
            return urlencode($key);
        },
        $source
    );
    if (!$keep) {
        $target = array();
    }
    parse_str($source, $data);
    foreach ($data as $key => $val) {
        // Only unprocess encoded keys. Compare as strings: parse_str() turns numeric
        // names into integer keys, which a loose in_array() can match by accident.
        if (!in_array((string) $key, $keys, true)) {
            $target[$key] = $val;
            continue;
        }
        $key = base64_decode($key);
        $target[$key] = $val;
        if ($keep) {
            // Keep a copy in the underscore key version
            $key = preg_replace('/(\.| )/', '_', $key);
            $target[$key] = $val;
        }
    }
}
This happens because of PHP's old register_globals functionality. The . character is not valid in a variable name, so PHP converts it to an underscore to keep incoming names compatible with variable names.
In short, it's not good practice to use periods in URL variable names.
If you literally want PHP to stop replacing '.' characters in the $_GET or $_POST arrays, one way is to modify PHP's source (and in this case it is relatively straightforward).
WARNING: Modifying PHP C source is an advanced option!
Also see this PHP bug report which suggests the same modification.
To explore you'll need to:
download PHP's C source code
disable the . replacement check
./configure, make and deploy your customized build of PHP
The source change itself is trivial and involves updating just one half of one line in main/php_variables.c:
....
/* ensure that we don't have spaces or dots in the variable name (not binary safe) */
for (p = var; *p; p++) {
    if (*p == ' ' /*|| *p == '.'*/) {
        *p='_';
....
Note: compared to the original, || *p == '.' has been commented out.
Example output:
given a QUERY_STRING of a.a[]=bb&a.a[]=BB&c%20c=dd,
running <?php print_r($_GET); now produces:
Array
(
    [a.a] => Array
        (
            [0] => bb
            [1] => BB
        )

    [c_c] => dd
)
Notes:
this patch addresses the original question only (it stops replacement of dots, not spaces).
running with this patch will be faster than script-level solutions, but those pure-.php answers are still generally preferable (because they avoid changing PHP itself).
in theory a polyfill approach is possible here and could combine approaches: test for the C-level change using parse_str() and (if unavailable) fall back to slower methods.
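The polyfill idea in the last note could be sketched roughly as follows. Both function names (dots_are_preserved, parse_preserving_dots) are hypothetical, and the fallback is just one of the script-level approaches from this thread, so treat it as an illustration rather than a finished implementation.

```php
<?php
// Hypothetical probe: true only on a PHP build that keeps dots in variable names.
function dots_are_preserved()
{
    parse_str('a.b=1', $probe);
    return isset($probe['a.b']);
}

// Hypothetical script-level fallback: hex-encode top-level names, parse, decode.
function parse_preserving_dots($query)
{
    $encoded = preg_replace_callback('/(?:^|(?<=&))[^=&\[]+/', function ($m) {
        return bin2hex(urldecode($m[0]));
    }, $query);
    parse_str($encoded, $values);
    $result = array();
    foreach ($values as $key => $val) {
        // Cast first: all-digit hex names come back from parse_str() as integers.
        $result[hex2bin((string) $key)] = $val;
    }
    return $result;
}

$query = 'a.a[]=bb&a.a[]=BB';
if (dots_are_preserved()) {
    parse_str($query, $data);               // patched build: native parsing is enough
} else {
    $data = parse_preserving_dots($query);  // stock build: slower script-level path
}
// Either way, $data is ['a.a' => ['bb', 'BB']]
```

The probe costs one parse_str() call, so it can run once at startup and the result can be cached.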
My solution to this problem was quick and dirty, but I still like it. I simply wanted to post a list of filenames that were checked on the form. I used base64_encode to encode the filenames within the markup and then just decoded it with base64_decode prior to using them.
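A minimal sketch of that idea, assuming the filename is used as the field name of a checkbox. The sample filename is made up, and http_build_query() plus parse_str() merely stand in for the browser submitting the form and PHP parsing it.

```php
<?php
$filename = 'report.final v2.txt';

// When rendering the form: use the base64 text as the field name.
$field = base64_encode($filename);

// Simulate the submitted body and PHP's parsing of it.
$body = http_build_query(array($field => 'on'));
parse_str($body, $post);

// In the handler: decode the names back into filenames.
$checked = array_map('base64_decode', array_keys($post));
// $checked is ['report.final v2.txt']
```

Base64 output uses only letters, digits, +, / and =, so it contains none of the characters PHP replaces with an underscore.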
After looking at Rok's solution I have come up with a version which addresses the limitations in my answer below, crb's above and Rok's solution as well. See my improved version.
@crb's answer above is a good start, but there are a couple of problems.
It reprocesses everything, which is overkill; only those fields that have a "." in the name need to be reprocessed.
It fails to handle arrays in the same way that native PHP processing does, e.g. for keys like "foo.bar[]".
The solution below addresses both of these problems now (note that it has been updated since originally posted). This is about 50% faster than my answer above in my testing, but will not handle situations where the data has the same key (or a key which gets extracted the same, e.g. foo.bar and foo_bar are both extracted as foo_bar).
<?php
public function fix2(&$target, $source, $keep = false) {
    if (!$source) {
        return;
    }
    preg_match_all(
        '/
        # Match at start of string or &
        (?:^|(?<=&))
        # Exclude cases where the period is in brackets, e.g. foo[bar.blarg]
        [^=&\[]*
        # Affected cases: periods and spaces
        (?:\.|%20)
        # Keep matching until assignment, next variable, end of string or
        # start of an array
        [^=&\[]*
        /x',
        $source,
        $matches
    );
    foreach (current($matches) as $key) {
        $key = urldecode($key);
        $badKey = preg_replace('/(\.| )/', '_', $key);
        if (isset($target[$badKey])) {
            // Duplicate values may have already unset this
            $target[$key] = $target[$badKey];
            if (!$keep) {
                unset($target[$badKey]);
            }
        }
    }
}
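The collision mentioned before the function is easy to reproduce. This short sketch uses parse_str() to stand in for PHP's request parsing; the query string is made up for illustration.

```php
<?php
// Both names are registered as foo_bar, so the later value overwrites the earlier one.
parse_str('foo.bar=1&foo_bar=2', $parsed);
// $parsed is ['foo_bar' => '2']
```

Because only one value survives in the already-parsed array, a function that works from that array cannot tell which of the two names the remaining value belonged to.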
Well, the function I include below, getRealPostArray(), isn't a pretty solution, but it handles arrays and supports both names, "alpha_beta" and "alpha.beta". Take this form:
<input type='text' value='First-.' name='alpha.beta[a.b][]' /><br>
<input type='text' value='Second-.' name='alpha.beta[a.b][]' /><br>
<input type='text' value='First-_' name='alpha_beta[a.b][]' /><br>
<input type='text' value='Second-_' name='alpha_beta[a.b][]' /><br>
With these inputs, var_dump($_POST) merges both names into a single key:
'alpha_beta' =>
  array (size=1)
    'a.b' =>
      array (size=4)
        0 => string 'First-.' (length=7)
        1 => string 'Second-.' (length=8)
        2 => string 'First-_' (length=7)
        3 => string 'Second-_' (length=8)
var_dump(getRealPostArray()) produces:
'alpha.beta' =>
  array (size=1)
    'a.b' =>
      array (size=2)
        0 => string 'First-.' (length=7)
        1 => string 'Second-.' (length=8)
'alpha_beta' =>
  array (size=1)
    'a.b' =>
      array (size=2)
        0 => string 'First-_' (length=7)
        1 => string 'Second-_' (length=8)
The function, for what it's worth:
function getRealPostArray() {
    if ($_SERVER['REQUEST_METHOD'] !== 'POST') { #Nothing to do
        return null;
    }
    $neverANamePart = '~#~'; #Any arbitrary string never expected in a 'name'
    $postdata = file_get_contents("php://input");
    $post = [];
    $rebuiltpairs = [];
    $postraws = explode('&', $postdata);
    foreach ($postraws as $postraw) { #Each is a string like: 'xxxx=yyyy'
        $keyvalpair = explode('=', $postraw);
        #Use isset() rather than empty(), so that a submitted value of "0" is kept
        if (!isset($keyvalpair[1])) {
            $keyvalpair[1] = '';
        }
        #Only replace dots in the part of the name before the first (encoded) bracket
        $pos = strpos($keyvalpair[0], '%5B');
        if ($pos !== false) {
            $str1 = substr($keyvalpair[0], 0, $pos);
            $str2 = substr($keyvalpair[0], $pos);
            $str1 = str_replace('.', $neverANamePart, $str1);
            $keyvalpair[0] = $str1.$str2;
        } else {
            $keyvalpair[0] = str_replace('.', $neverANamePart, $keyvalpair[0]);
        }
        $rebuiltpair = implode('=', $keyvalpair);
        $rebuiltpairs[] = $rebuiltpair;
    }
    $rebuiltpostdata = implode('&', $rebuiltpairs);
    parse_str($rebuiltpostdata, $post);
    $fixedpost = [];
    foreach ($post as $key => $val) {
        $fixedpost[str_replace($neverANamePart, '.', $key)] = $val;
    }
    return $fixedpost;
}
Building on crb's answer, I wanted to recreate the $_POST array as a whole. Keep in mind that you'll still have to make sure you're encoding and decoding correctly on both the client and the server. It's important to understand when a character is truly invalid and when it is truly valid. Additionally, always escape client data before using it in any database command, without exception.
<?php
unset($_POST);
$_POST = array();
$p0 = explode('&', file_get_contents('php://input'));
foreach ($p0 as $key => $value)
{
    // Split on the first "=" only, and default to '' when a pair has no "=" at all.
    $p1 = explode('=', $value, 2);
    $_POST[$p1[0]] = isset($p1[1]) ? $p1[1] : '';
    //OR...
    //$_POST[urldecode($p1[0])] = urldecode(isset($p1[1]) ? $p1[1] : '');
}
print_r($_POST);
?>
I recommend using this for individual cases only; offhand, I'm not sure what the downsides are of putting this at the top of your primary header file.
My current solution (based on the previous replies in this topic):
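function parseQueryString($data)
{
    // Caveat: decoding the whole string first also decodes any "%26" or "%3D" inside
    // values, which are then treated as separators.
    $data = rawurldecode($data);
    // Match each top-level name: everything up to "=", "&" or "[".
    $pattern = '/(?:^|(?<=&))[^=&\[]*/';
    $data = preg_replace_callback($pattern, function ($match){
        return bin2hex(urldecode($match[0]));
    }, $data);
    parse_str($data, $values);
    return array_combine(array_map('hex2bin', array_keys($values)), $values);
}
$_GET = parseQueryString($_SERVER['QUERY_STRING']);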
I am getting an "Array to string conversion" error in PHP.
I am using the "variable" (which should be a string) as the third parameter to str_replace. In summary (a very simplified version of what's going on):
$str = "very long string";
str_replace("tag", $some_other_array, $str);
$str is throwing the error, and I have been trying to fix it all day. These are the things I have tried:
if(is_array($str)) die("its somehow an array");
serialize($str); //inserted this before str_replace call.
I have spent all day on it, and no, it's not something stupid like variables passed the wrong way around; it is something bizarre. I have even dumped it to a file, and it's a string.
My hypothesis:
The string is too long for PHP to deal with, so it somehow turns into an array.
The $str value in this case is nested and the function is called recursively; the general flow could be explained like this:
--code
//pass by reference
function the_function ($something, &$OFFENDING_VAR, $something_else) {
    while(preg_match($something, $OFFENDING_VAR)) {
        $OFFENDING_VAR = str_replace($x, $y, $OFFENDING_VAR); // this is the error
    }
}
So it may be something strange in str_replace, but that would mean that at some point str_replace would have to return an array.
Please help me work this out; it's very confusing and I have wasted a day on it.
---- ORIGINAL FUNCTION CODE -----
//This function gets called with multiple different "Target Variables" Target is the subject
//line, from and body of the email filled with << tags >> so the str_replace function knows
//where to replace them
function perform_replacements($replacements, &$target, $clean = TRUE,
                              $start_tag = '<<', $end_tag = '>>', $max_substitutions = 5) {
    # Construct separate tag and replacement value arrays for use in the substitution loop.
    $tags = array();
    $replacement_values = array();
    foreach ($replacements as $tag_text => $replacement_value) {
        $tags[] = $start_tag . $tag_text . $end_tag;
        $replacement_values[] = $replacement_value;
    }
    # TODO: this badly needs refactoring
    # TODO: auto upgrade <<foo>> to <<foo_html>> if foo_html exists and acting on html template
    # Construct a regular expression for use in scanning for tags.
    $tag_match = '/' . preg_quote($start_tag) . '\w+' . preg_quote($end_tag) . '/';
    # Perform the substitution until all valid tags are replaced, or the maximum substitutions
    # limit is reached.
    $substitution_count = 0;
    while (preg_match ($tag_match, $target) && ($substitution_count++ < $max_substitutions)) {
        $target = serialize($target);
        $temp = str_replace($tags,
                            $replacement_values,
                            $target); //This is the line that is failing.
        unset($target);
        $target = $temp;
    }
    if ($clean) {
        # Clean up any unused search values.
        $target = preg_replace($tag_match, '', $target);
    }
}
How do you know $str is the problem and not $some_other_array?
From the manual:
If search and replace are arrays, then str_replace() takes a value
from each array and uses them to search and replace on subject. If
replace has fewer values than search, then an empty string is used for
the rest of replacement values. If search is an array and replace is a
string, then this replacement string is used for every value of
search. The converse would not make sense, though.
The second parameter can only be an array if the first one is as well.
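To check the replacement side rather than the subject, something like the sketch below may help. The helper name find_array_replacements and the sample data are hypothetical. It only lists the entries whose replacement value is itself an array, which is one way an "Array to string conversion" message can come out of a str_replace() call whose subject is a plain string.

```php
<?php
// Hypothetical helper: returns the tags whose replacement value is an array, not a string.
function find_array_replacements(array $replacements)
{
    return array_keys(array_filter($replacements, 'is_array'));
}

$replacements = array('name' => 'Ann', 'items' => array('a', 'b'));
$bad = find_array_replacements($replacements);
// $bad is ['items']

// With search and replace given as parallel arrays of strings, the call behaves
// as the manual excerpt describes.
$result = str_replace(
    array('<<name>>', '<<city>>'),
    array('Ann', 'Oslo'),
    'Hi <<name>> from <<city>>'
);
// $result is 'Hi Ann from Oslo'
```

Running a check like this on $replacements before building $replacement_values shows quickly whether the subject string was ever the culprit.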
I'm trying to create a form that lets the user define their own custom query key. While testing the form's validation function, I noticed that %20 in a URL query key is converted to an underscore in the $_GET array.
$key = 'a b';
$key = rawurlencode($key);
$value = 'value';
print_r($_GET); // output: Array ( [a_b] => value )
echo '<p>key:' . $key . '</p>';
echo '<p>value:' . $value . '</p>';
echo '<p>test</p>';
Are other characters converted irregularly as well? I'm not sure "irregular" is the right word here, since there might be a rule for this behavior, but I didn't expect this to happen.
PHP replaces certain characters with an underscore because they are illegal in variable names. Even though they are legal in array keys, earlier versions of PHP would put form variables directly in variables (i.e. $a_b; see Register Globals), so this conversion was put in. This is done with space, dot, open square bracket, and control characters between 128 and 159.
This is only done with the names themselves, not to, for example, any array key parameters (i.e. http://example.com/foo.php?a[b.%20c]=1) since any character is legal in an array key. (Note that the array parameter feature itself means that open square bracket will not be replaced with _ as implied by the above in certain situations - the example will give $_GET['a']['b. c'] == 1.)
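The difference between names and array keys can be checked with parse_str(), which applies the same conversion as request parsing. This is a small sketch with a made-up query string.

```php
<?php
parse_str('a%20b=1&c.d=2&e[f.%20g]=3', $parsed);

// Top-level names are rewritten; the key inside the brackets is left alone.
// $parsed is ['a_b' => '1', 'c_d' => '2', 'e' => ['f. g' => '3']]
```

So wrapping a name in brackets under some parent key is enough to keep its dots and spaces.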
Source: http://ca.php.net/variables.external
Related question: Get PHP to stop replacing '.' characters in $_GET or $_POST arrays?
This function would fix those strings.
$key = 'a b.c[d';
$key = fix_key($key);
$value = 'value';
$_GET[$key] = $value;
print_r($_GET);
echo '<p>test</p>';
function fix_key($strKey) {
    $search = array(chr(32), chr(46), chr(91));
    for ($i=128; $i <= 159; $i++) array_push($search, chr($i));
    return str_replace ( $search , '_', $strKey);
}