Understanding 'parse_str' in PHP - php

I'm a PHP newbie trying to find a way to use parse_str to parse a number of URLs from a database (note: not from the request; they are already stored in a database, don't ask... so $_GET won't work).
So I'm trying this:
$parts = parse_url('http://www.jobrapido.se/?w=teknikinformat%C3%B6r&l=malm%C3%B6&r=auto');
parse_str($parts['query'], $query);
return $query['w'];
Please note that here I am just supplying an example URL, in the real application the URL will be passed in as a parameter from the database. And if I do this it works fine. However, I don't understand how to use this function properly, and how to avoid errors.
First of all, here I used "w" as the index to return, because I could clearly see it was in the query. But how do these things work? Is there a set of specific values I can use to get the entire query string? I mean, if I look further, I can see "l" and "r" here as well...
Sure I could extract those too and concatenate the result, but will these value names be arbitrary, or is there a way to know exactly which ones to extract? Of course there's the "q" value, which I originally thought would be the only one I would need, but apparently not. It's not even in the example URL, although I know it's in lots of others.
So how do I do this? Here's what I want:
1. Extract all parts of the query string that gives me a readable output of the search string part of the URL (so in the above it would be "teknikinformatör Malmö auto". Note that I would need to translate the URL encoding to Swedish characters, any easy way to do that in PHP?)
2. Handle errors so that if the above doesn't work for some reason, the method should only return an empty string, thus not breaking the code. Because at this point, if I were to use the above with an actual parameter, $url, passed in instead of the example URL, I would get errors, because many of the URLs do not have the "w" parameter, some may be empty fields in the database, some may be malformed, etc. So how can I handle such errors stably, and just return a value if the parsing works, and return empty string otherwise?
3. There seems to be a very strange problem that occurs that I cannot see during debugging. I put this test code in just to see what is going on:
function getQuery($url)
{
    try
    {
        $parts = parse_url($url);
        parse_str($parts['query'], $query);
        if (isset($query['q'])) {
            /* return $query['q']; */
            return '';
        }
    } catch (Exception $e) {
        return '';
    }
}
Now, obviously in the real code I would want something like the commented out part to be returned. However, the puzzling thing is this:
With this code, as far as I see, every path should lead to returning an empty string. But this does not work - it gives me a completely empty grid in the result page. No errors or anything during debugging, and objects look fine when I step through them during debugging.
However, if I remove everything from this method except return ''; then it works fine - of course the field in the grid where the query is supposed to be is empty, but all the other fields have all the information as they should. So this was just a test. But how is it possible that code that should only be able to return an empty string does not work, while the one that only returns an empty string and does nothing else does work? I'm thoroughly confused...

The meaning of the query parameters is entirely up to the application that handles the URL, so there is no "right" parameter - it might be w, q, or searchquery. You can heuristically search for the most common variables (=guess), or return an array of all arguments. It depends on what you're trying to achieve.
parse_str already decodes urlencoding. Note that urlencoding is a way to encode bytes, not characters. It depends on what encoding the application expects. Usually (and in this example query), that should be UTF-8 everywhere, so you should be covered on 1.
Test whether the value exists, and if not, return the empty string, like this:
$heuristicFields = array('q', 'w', 'searchquery');
foreach ($heuristicFields as $hf) {
if (isset($query[$hf])) return $query[$hf];
}
return '';
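As written, the function returns an empty string only when q is set; in every other case it falls off the end and returns null. When the URL is invalid or has no query string, PHP emits notices or warnings rather than throwing exceptions, so the try...catch block has no effect.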
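Putting the points above together, here is a minimal sketch of a function that returns an empty string on every failure path. The name getSearchTerm and the default field list are assumptions made for illustration; only parse_url, parse_str and the isset check come from the discussion above.

```php
<?php
// Hypothetical helper: returns the first search-like parameter found in
// the URL's query string, or '' when anything is missing or malformed.
function getSearchTerm($url, array $fields = array('q', 'w', 'searchquery'))
{
    // Empty database fields and non-string values are rejected up front.
    if (!is_string($url) || $url === '') {
        return '';
    }

    // parse_url() returns null when there is no query part and false
    // when the URL is seriously malformed; neither is a string.
    $queryString = parse_url($url, PHP_URL_QUERY);
    if (!is_string($queryString)) {
        return '';
    }

    // parse_str() also URL-decodes the values.
    parse_str($queryString, $query);

    foreach ($fields as $field) {
        // Values such as q[]=a are parsed into arrays, so check the type too.
        if (isset($query[$field]) && is_string($query[$field])) {
            return $query[$field];
        }
    }

    return '';
}
```

For the example URL from the question, getSearchTerm($url) should give the decoded value of w, and an empty or malformed URL should give ''. Because every path ends in an explicit return, the function never returns null.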

It turned out the problem was with Swedish characters - if I used utf8_encode() on the value before returning it, it worked fine.

Related

PHP functions: passing variables by reference (and if not possible) by value

I am trying to build an "escape" function (as an exercise). The objective of this function is to transform "dangerous" values into safe values to be inserted in a database. The content of this function is not important.
function escape(&$value){
//some code
return $value;
}
Here's the problem: I want to make this function very handy to use, therefore it should be able to support 2 possible scenarios:
1) returning a safe value:
$safe_val = escape($unsafe_val);
2) changing a variable "by reference":
escape($value);
At the moment, my function does its job, however...if I pass something like:
$safe_val = escape(php_native_change_string_to_something($value));
PHP gets angry and says:
Notice: Only variables should be passed by reference
How can I make PHP accept that if something can't be passed by reference it does not matter and it should just ignore the error and continue the execution?
PHP is complaining because the value being passed into escape by escape(php_native_change_string_to_something($value)) is a temporary value (rvalue). The argument has no permanent memory address so it does not make sense to modify the value.
However, despite this not making sense, PHP will still do what you want. You are receiving a notice, not an error. Your code should still produce the output you are expecting. This short program models your setup:
<?php
function escape(&$s) {
    return $s;
}
$s = 'TEXT TO ESCAPE';
// strtolower() returns a temporary value, which triggers the notice
$new_s = escape(strtolower($s));
echo "s: $s\n";
echo "new_s: $new_s\n";
and produces the following results:
s: TEXT TO ESCAPE
new_s: text to escape
If you would like to get rid of the notice you will need to use the error control operator (@): @escape(php_native_change_string_to_something($value)).
Despite this being something that will work in PHP I would suggest avoiding this type of usage as it will decrease code readability and is not suggested by PHP (as the notice indicates).
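One way to avoid the notice without suppressing it is to give the temporary value a name before passing it. The sketch below assumes addslashes() purely as a stand-in for the real escaping logic, which the question leaves unspecified; it is not a recommendation for database escaping.

```php
<?php
function escape(&$value)
{
    // Stand-in for the real escaping logic (an assumption for illustration).
    $value = addslashes($value);
    return $value;
}

$value = "O'Reilly";

// Assign the function result to a variable first, so there is
// something that can legitimately be passed by reference.
$lowered = strtolower($value);
$safe = escape($lowered);
```

Here $lowered is modified in place and $safe holds the same escaped string, while the original $value is left untouched.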

Checking for vulnerabilities including remote file, with parameter, in PHP script

I'm including a remote file with file_get_contents() like so:
function checkData($serial) {
file_get_contents("http://example.com/page.php?somevar=".$serial."&check=1");
return $http_response_header;
}
This remote page performs some basic data manipulation, and looks up the serial number in a database (The input is sanitised and I'm using PDO, so I don't have to worry about SQL injections), and then returns a value in the response header. The input $serial is a get parameter - So completely controlled by the user. I'm wondering if there are any inputs to this function that would lead to undesirable behaviour, for example getting contents of another page other than the one desired.
Thanks in advance.
If the $serial variable is always going to be numeric, you can wrap it in intval() to ensure the value is always an integer and cannot contain other non-numeric data used for path traversal, remote file inclusion (RFI), etc.
E.G.
file_get_contents("http://example.com/page.php?somevar=".intval($serial)."&check=1");
Alternatively you can use preg_replace to strip unwanted characters, should you need alpha characters also.
http://php.net/manual/en/function.preg-replace.php
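If the serial can contain letters as well, an alternative to stripping characters is to reject anything that does not match a whitelist, and to let http_build_query() do the encoding. This is only a sketch: the function name buildCheckUrl and the assumed format (1 to 32 ASCII letters or digits) are not from the question.

```php
<?php
// Hypothetical helper: builds the remote URL only for serials that match
// an assumed format of 1 to 32 ASCII letters or digits.
function buildCheckUrl($serial)
{
    if (!is_string($serial) || !preg_match('/\A[A-Za-z0-9]{1,32}\z/', $serial)) {
        return null; // reject instead of trying to repair the input
    }

    // http_build_query() percent-encodes the values, so the serial
    // cannot inject extra parameters such as "&check=0".
    $query = http_build_query(array('somevar' => $serial, 'check' => 1), '', '&');

    return 'http://example.com/page.php?' . $query;
}
```

The caller would then pass the result to file_get_contents() only when it is not null.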

PHP - evaluating param

I have following code:
<?php
$param = $_GET['param'];
echo $param;
?>
when I use it like:
mysite.com/test.php?param=2+2
or
mysite.com/test.php?param="2+2"
it prints
2 2
not
4
I tried also eval - neither worked
+ is decoded as a space in query strings. To have an actual plus sign in your string, you should use %2B.
However, it should be noted this will not perform the actual addition. I do not believe it is possible to perform actual addition inside the query string.
Now, I would like to stress that you should avoid eval: if eval is the answer, you're almost certainly asking the wrong question. It's very dangerous and can create more problems than it's worth, as the manual says of this function:
The eval() language construct is very dangerous because it allows
execution of arbitrary PHP code. Its use thus is discouraged. If you
have carefully verified that there is no other option than to use this
construct, pay special attention not to pass any user provided data
into it without properly validating it beforehand.
So everything you pass into eval should be screened against very strict criteria: strip out function calls and anything else potentially malicious, and make sure that what you pass into eval is exactly what you need. No more, no less.
A very basic scenario for your problem would be:
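if (!isset($_GET['Param'])) {
    // Redirect with the plus sign percent-encoded as %2B
    $Append = urlencode("2+2");
    header("Location: index.php?Param=" . $Append);
    exit; // stop here; the code below runs on the redirected request
}
// WARNING: evaluating request data is dangerous unless it is strictly validated
$Code_To_Eval = '$Result = ' . $_GET['Param'] . ';';
eval($Code_To_Eval);
echo $Result;
The opening if block only shows how to correctly pass a character such as a plus symbol; the remaining lines work with the data string. And as @andreiP stated: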
Unless I'm not mistaking the "+" is used for URL encoding, so it would
be translated to a %, which further translates to a white space.
That's why you're getting 2 2
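This is essentially correct: a "+" in the query string is decoded as a space, which explains your current output. Also note that $_GET values are already URL-decoded, so calling:
echo urldecode($_GET['Param']);
on the value decodes it a second time and turns the + back into a space, which is the output you want to avoid.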
I would highly suggest looking into an alternative before using what I've posted.
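As one possible alternative, a simple sum can be computed without eval by validating the string and adding the numbers directly. This is a minimal sketch that only handles non-negative integers joined by plus signs; the function name sumExpression is made up for illustration.

```php
<?php
// Hypothetical helper: evaluates sums such as "2+2" without eval().
function sumExpression($expr)
{
    // Accept only non-negative integers separated by plus signs.
    if (!is_string($expr) || !preg_match('/\A\s*\d+(?:\s*\+\s*\d+)*\s*\z/', $expr)) {
        return null;
    }

    return array_sum(array_map('intval', explode('+', $expr)));
}

// rawurlencode() turns the plus sign into %2B when building the link.
$link = 'test.php?param=' . rawurlencode('2+2');

// On the receiving side, $_GET['param'] would already be decoded to "2+2".
$result = sumExpression('2+2');
```

Anything that is not a plain sum, including the "2 2" produced by an unencoded plus sign, yields null rather than being executed.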

Regular Expression to Isolate a String of Characters in the proper context

So, I have a dashboard which I'm currently writing (PHP). The idea is that it is supposed to display data in a database relative to a given url specified. If the user wishes to just grab everything, they simply need to specify "all". If they wish to scrape data for specific URLs AND display everything at once, they will specify additional URLs with the "all" directive.
I discovered a bug, however.
If I have a URL which has the characters "all" in it (such as, say, http://everythingallatonce.com <-- that's just an example - I have no idea if that actually exists), the dashboard's parsing algorithm which takes the instruction given won't work properly. In fact, according to this logic, it will think that the user specified a given URL as well AS the words "all", without actually checking off the "perform scrape?" checkbox, which makes no sense at all (hence, it just throws an exception/dies with an error message).
So far, I just have a function like the following:
function _strExists( $needle, $haystack )
{
$pos = strpos( $haystack, $needle );
return ( $pos !== false );
}
Which I use to detect to see if the word "all" exists in the query, like so:
$fetchEverything = _strExists('all', $urls);
What would be a good work around for something like this, to avoid ambiguity between URLs specified which have "all" in them, and the actual query of all by itself? I'm thinking regular expressions, but I'm not sure...
Also
I have considered just using *, but I'd like to avoid that if possible.
If some value for all is being passed in the URL (e.g. all=1), then you should check for its existence in the $_GET superglobal (i.e. isset($_GET['all'])).
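If the directive has to stay inside the same input string as the URLs, comparing whole tokens instead of searching for a substring avoids the ambiguity. The sketch below assumes the entries are separated by whitespace or commas, which the question does not state; hasAllDirective is a hypothetical name.

```php
<?php
// Hypothetical helper: true only when "all" appears as a separate entry,
// not merely as part of a URL.
function hasAllDirective($input)
{
    // Assumed input format: entries separated by whitespace and/or commas.
    $tokens = preg_split('/[\s,]+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);

    return in_array('all', $tokens, true);
}
```

With this check, 'http://everythingallatonce.com' alone does not trigger the directive, while 'http://everythingallatonce.com, all' does.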

Should you verify parameter types in PHP functions?

I'm used to the habit of checking the type of my parameters when writing functions. Is there a reason for or against this? As an example, would it be good practice to keep the string verification in this code or remove it, and why?
function rmstr($string, $remove) {
if (is_string($string) && is_string($remove)) {
return str_replace($remove, '', $string);
}
return '';
}
rmstr('some text', 'text');
There are times when you may expect different parameter types and run different code for them, in which case the verification is essential, but my question is if we should explicitly check for a type and avoid an error.
Yes, it's fine. However, PHP is not strongly typed to begin with, so I think this is not very useful in practice.
Additionally, if someone passes something other than a string, an exception is more informative; therefore, I'd avoid just returning an empty string at the end, because nothing about an empty result explains that rmstr(array, object) was called with the wrong types.
My opinion is that you should perform such verification if you are accepting input from the user. If those strings were not accepted from the user or are sanitized input from the user, then doing verification there is excessive.
For me, type checking matters for data obtained from the user at the top level of abstraction. After that, when you call most of your functions you should already know the types of their arguments, so you don't need to check them in every method; doing so hurts performance and readability.
Note: you can document which types are allowed for your functions' arguments with phpDoc.
It seems local folks understood this question as "Should you verify parameters" where it was "Should you verify parameter types", and made nonsense answers and comments out of it.
Personally, I never check operand types and have never had any trouble because of it.
It depends which code you produce. If it's actually production code, you should ensure that your function is working properly under any circumstances. This includes checking that parameters contain the data you expect. Otherwise throw an exception or have another form of error handling (which your example is totally missing).
If it's not for production use and you don't need to code defensively, you can ignore anything and follow the garbage-in-garbage-out principle (or the three shit principle: code shit, process shit, get shit).
In the end it is all about matching expectations: If you don't need your function to work properly, you don't need to code it properly. If you are actually relying on your code to work precisely, you even need to validate input data per each unit (function, class).
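For reference, on PHP 7.0 and later the manual is_string() checks can be expressed as scalar type declarations, in which case PHP itself throws a TypeError for a value that cannot be used as a string, such as an array. This is a sketch of that variant of rmstr, not something proposed in the answers above.

```php
<?php
// Variant of rmstr() using type declarations (requires PHP 7.0+).
function rmstr(string $string, string $remove): string
{
    return str_replace($remove, '', $string);
}

// Passing an array is rejected by PHP itself with a TypeError,
// instead of silently producing an empty string.
try {
    rmstr(array('not a string'), 'text');
    $error = '';
} catch (TypeError $e) {
    $error = get_class($e);
}
```

Whether an exception or an empty string is the better failure mode still depends on the expectations discussed above.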
