I am writing a small crawler that extracts data from some 5 to 10 sites. While collecting the links I am getting some URLs like this:
../tets/index.html
If it were /test/index.html, I could simply append it to the base URL: http://www.example.com/test/index.html
What can I do with this kind of URL?
URLs like these are relative URLs. ".." means "parent directory", whereas "." simply means "this directory", as in bash.
For instance, if you are looking at the page http://www.someserver/test/foo/bar.html and it contains a URL like "../baz/foobar.html", it will in fact point to http://www.someserver/test/baz/foobar.html, I think. Just test it.
Use dirname() to get the base directory, remove the .. using substr() and append the rest. Note that this treats $currentURL as the current directory and only handles a single leading "..". Like this:
<?php
$url = "../tets/index.html";
$currentURL = "http://example.com/somedir/anotherdir";
echo dirname($currentURL).substr($url, 2);
?>
This outputs:
http://example.com/somedir/tets/index.html
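If a link can start with more than one "../", the same dirname() idea can be repeated in a loop. This is only a rough sketch: the function name resolveParentSegments() is made up for illustration, it assumes $baseDir is the directory of the current page, and it does not stop you from climbing above the host name.

```php
<?php
// Hypothetical helper (not a PHP built-in): resolve a link that starts with
// "../" or "./" against the directory URL of the current page.
function resolveParentSegments(string $baseDir, string $relative): string
{
    $baseDir = rtrim($baseDir, '/');

    // Each leading "../" moves one directory up.
    while (strpos($relative, '../') === 0) {
        $baseDir  = dirname($baseDir);
        $relative = substr($relative, 3);
    }

    // A leading "./" means "this directory" and can simply be dropped.
    while (strpos($relative, './') === 0) {
        $relative = substr($relative, 2);
    }

    return $baseDir . '/' . $relative;
}

echo resolveParentSegments('http://example.com/somedir/anotherdir', '../tets/index.html'), "\n";
// http://example.com/somedir/tets/index.html
echo resolveParentSegments('http://example.com/a/b/c', '../../x.html'), "\n";
// http://example.com/a/x.html
```

It does not handle ".." in the middle of a path; for that you would need proper normalization.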
Take a look into this URL Normalization Wikipedia page.
Related
I have a url (as below). I'd like to get the value of "2". How can I get that?
http://domain.com/site1/index.php/page/2
What you're looking for is a combination of pathinfo and parse_url:
pathinfo(parse_url($url)['path'])['filename'];
pathinfo will break the path into well-defined parts, of which filename is the last part you're looking for (2). If you're looking instead for an absolute position in the path, you may want to split the path on / and simply take the value at index 3 (counting from 0, after trimming the leading slash).
We can test this like so:
<?php
$url = "http://domain.com/site1/index.php/page/2";
$value=pathinfo(parse_url($url)['path'])['filename'];
echo $value."\n";
And then on the command line:
$ php url.php
2
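The split-on-/ alternative mentioned above could look roughly like this. It is a sketch that assumes the path always has the shape /site1/index.php/page/N, so the wanted value is the fourth segment.

```php
<?php
$url = "http://domain.com/site1/index.php/page/2";

// Extract only the path component: "/site1/index.php/page/2"
$path = parse_url($url, PHP_URL_PATH);

// Trim the leading slash so the first segment is not an empty string,
// then split: ["site1", "index.php", "page", "2"]
$segments = explode('/', trim($path, '/'));

echo $segments[3], "\n";                    // fixed position in the path
echo $segments[count($segments) - 1], "\n"; // last segment, wherever it is
```

Both lines print 2 for this URL. The fixed index breaks as soon as the path gets deeper or shallower, so the last-segment variant is usually the safer choice.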
I use Nathaniel Ford's example, but you may run into a problem if files are named '2.html': some servers will load those even though the URL only has '2'.
You can also do this.
home.php?page=2 as the web address
home.php
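<?php
// Read "page" from the query string; fall back to an empty string if it is missing
$page = isset($_GET['page']) ? $_GET['page'] : '';
// Keep digits only
$page = preg_replace('/\D/', '', $page);
if ($page !== '') {
    // query the stuff for page $page (e.g. page two) or whatever you need
}
?>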
I want to show on my site an element depending on my site's url.
Currently I have the following code:
<?php
if(URL matches)
{
echo $something;
}
else
{
echo $otherthing;
}
?>
I want to know how to get the URL in the if condition, because I need a single PHP file that is shown on many different pages.
EDIT: The solution provided by Rixhers Ajazi doesn't work for me. When I use your code I get the same URI for both of my pages, so the if statement always takes the else branch. Is there any way to get the exact string you see in the browser into the PHP code?
http://img339.imageshack.us/img339/5774/sinttulocbe.png
This is the place where it changes, but the URL I get on both pages is the same. I'm a little bit confused.
To get the URL, use:
$url = 'http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
Use the following syntax for the URL:
http://mysite.com/index.php?var1=val&var2=val
Now you can read the values from the $_GET array and use them in the if condition, like:
if($_GET['var1'])
You can do so by using the $_SERVER superglobal, like so:
$url = $_SERVER['PHP_SELF']; // or: $url = $_SERVER['SERVER_NAME'];
Read up on this more here
<?php
if($url == 'WHATEVER')
{
echo $something;
}
else
{
echo $otherthing;
}
?>
You can use different variables, e.g., $_SERVER["PHP_SELF"], or $_SERVER["REQUEST_URI"]. The first one contains the path after the server name and until a possible ? in the URL (the part with the GET parameters is excluded). The second one contains also the GET parameters. You can also retrieve the hostname used to connect to the server (in case you have a virtual host situation) using $_SERVER["HTTP_HOST"]. Therefore by concatenating all these you can reconstruct the full URL (if you really need it, maybe the script name is enough).
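Putting those pieces together might look like the sketch below. The function name currentUrl() is made up, and it takes the server array as a parameter only so the example can run outside a web request; in a real page you would pass $_SERVER. The HTTPS check is a common convention and may not hold behind a proxy.

```php
<?php
// Hypothetical helper: rebuild the requested URL from $_SERVER-style values.
function currentUrl(array $server): string
{
    // 'HTTPS' is typically non-empty and not "off" on secure requests.
    $https  = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    $scheme = $https ? 'https' : 'http';

    // HTTP_HOST is the host name the client asked for;
    // REQUEST_URI is the path including the query string.
    return $scheme . '://' . $server['HTTP_HOST'] . $server['REQUEST_URI'];
}

// Example values standing in for $_SERVER:
$example = [
    'HTTP_HOST'   => 'mysite.com',
    'REQUEST_URI' => '/index.php?var1=val&var2=val',
];

$url = currentUrl($example);
echo $url, "\n"; // http://mysite.com/index.php?var1=val&var2=val

if ($url == 'http://mysite.com/index.php?var1=val&var2=val') {
    echo "something\n";
} else {
    echo "otherthing\n";
}
```

Keep in mind that HTTP_HOST comes from the client's request, so do not trust it for anything security-related.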
I am able to scrape a page for URLs, but I want to know what is the easiest way to convert the various formats that these links can be in, into a fully fledged url. For example:
If I scrape: www.mysite.com/some/place/in/space.html
And I get the following urls:
../img.jpg
img.jpg
../../bla.jpg
inc/bla.jpg
/
./
They should resolve to
www.mysite.com/some/place/img.jpg
www.mysite.com/some/place/in/img.jpg
www.mysite.com/some/bla.jpg
www.mysite.com/some/place/in/inc/bla.jpg
www.mysite.com/
www.mysite.com/some/place/in/
Is there a function that does this for all cases or is it something I would have to code?
I use this function for a crawler I wrote a long time ago: http://codepad.org/1VxMECNj
Call the function with the host prepended:
relativeUrl('http://host/dir/dir2/../../file.html');
//> returns http://host/file.html
You can just add www.mysite.com/some/place/in/ in front of the URLs. www.mysite.com/some/place/in/../img.jpg should resolve, I think.
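If you want the cleaned-up form rather than a URL that still contains "../", you can prepend the directory and then collapse the dot segments yourself. Below is a rough sketch of that idea. resolveUrl() is a made-up name, and it assumes the base URL includes a scheme and host (http://www.mysite.com/..., not just www.mysite.com/...). Query strings, fragments and protocol-relative links ("//host/...") are not treated specially.

```php
<?php
// Hypothetical helper: resolve $rel against an absolute $base URL.
function resolveUrl(string $base, string $rel): string
{
    // A link that already has a scheme is absolute; return it unchanged.
    if (is_string(parse_url($rel, PHP_URL_SCHEME))) {
        return $rel;
    }

    $parts    = parse_url($base);
    $root     = $parts['scheme'] . '://' . $parts['host']
              . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $basePath = $parts['path'] ?? '/';

    if ($rel !== '' && $rel[0] === '/') {
        // A leading slash means "relative to the host root".
        $path = $rel;
    } else {
        // Otherwise prepend the directory of the base page.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $rel;
    }

    // Collapse "." and ".." segments using a stack.
    $segments = explode('/', ltrim($path, '/'));
    $last     = count($segments) - 1;
    $out      = [];
    foreach ($segments as $i => $seg) {
        if ($seg === '.' || $seg === '..') {
            if ($seg === '..') {
                array_pop($out);
            }
            if ($i === $last) {
                $out[] = ''; // keep the trailing slash for "dir/." and "dir/.."
            }
        } else {
            $out[] = $seg;
        }
    }

    return $root . '/' . implode('/', $out);
}

$base = 'http://www.mysite.com/some/place/in/space.html';
foreach (['../img.jpg', 'img.jpg', '../../bla.jpg', 'inc/bla.jpg', '/', './'] as $link) {
    echo resolveUrl($base, $link), "\n";
}
```

With this sketch "../img.jpg" becomes http://www.mysite.com/some/place/img.jpg and "./" becomes http://www.mysite.com/some/place/in/. Note that "/" resolves to the host root, http://www.mysite.com/, because a leading slash is always relative to the host.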
You could use a regex to replace the relative links with absolute URLs:
$data = preg_replace('#(href|src)="([^:"]*)("|(?:(?:%20|\s|\+)[^"]*"))#', '$1="' . $site_url . '$2$3', $data);
I would like to create a bookmarklet for adding bookmarks. So you just click on the Bookmark this Page JavaScript Snippet in your Bookmarks and you are redirected to the page.
This is my current bookmarklet:
"javascript: location.href='http://…/bookmarks/add/'+encodeURIComponent(document.URL);"
This gives me an URL like this when I click on it on the Bookmarklet page:
http://localhost/~mu/cakemarks/bookmarks/add/http%3A%2F%2Flocalhost%2F~mu%2Fcakemarks%2Fpages%2Fbookmarklet
The server does not like that though:
The requested URL /~mu/cakemarks/bookmarks/add/http://localhost/~mu/cakemarks/pages/bookmarklet was not found on this server.
This gives the desired result, but is pretty useless for my use case:
http://localhost/~mu/cakemarks/bookmarks/add/test-string
The typical CakePHP mod_rewrite setup is in place, and it should transform the last part into a parameter for my BookmarksController::add($url = null) action.
What am I doing wrong?
I had a similar problem, and tried different solutions, only to be confused by the cooperation between CakePHP and my Apache-config.
My solution was to encode the URL in Base64 with JavaScript in browser before sending the request to server.
Your bookmarklet could then look like this:
javascript:(function(){function myb64enc(s){s=window.btoa(s);s=s.replace(/=/g, '');s=s.replace(/\+/g, '-');s=s.replace(/\//g, '_');return s;} window.open('http://…/bookmarks/add/'+myb64enc(window.location));})()
I strip the = padding and make two replacements here to make the Base64 encoding URL-safe. Now you only have to reverse those two replacements and Base64-decode on the server side. This way you won't confuse your URL controller with slashes...
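The server side of this could look something like the sketch below. urlSafeBase64Decode() is a made-up helper name, not a CakePHP or PHP built-in. The encoding line only mimics in PHP what the bookmarklet does in the browser, so the round trip can be tried in one script.

```php
<?php
// Hypothetical helper: undo the URL-safe replacements made by the bookmarklet
// and decode the result.
function urlSafeBase64Decode($encoded)
{
    // Reverse the two replacements: "-" back to "+", "_" back to "/".
    $b64 = strtr($encoded, '-_', '+/');

    // Restore the "=" padding that was stripped in the browser.
    $remainder = strlen($b64) % 4;
    if ($remainder) {
        $b64 .= str_repeat('=', 4 - $remainder);
    }

    // Strict mode: returns false if the input is not valid Base64.
    return base64_decode($b64, true);
}

// Simulate what the bookmarklet sends for this page:
$original = 'http://localhost/~mu/cakemarks/pages/bookmarklet';
$encoded  = rtrim(strtr(base64_encode($original), '+/', '-_'), '=');

echo urlSafeBase64Decode($encoded), "\n";
// http://localhost/~mu/cakemarks/pages/bookmarklet
```

In the controller action you would call the helper on the $url parameter before using it.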
Based on poplitea's answer, I translate the troublesome characters /, : and # manually so that I do not need any special function.
function esc(s) {
s=s.replace(/\//g, '__slash__');
s=s.replace(/:/g, '__colon__');
s=s.replace(/#/g, '__hash__');
return s;
}
In PHP I convert it back easily.
$url = str_replace("__slash__", "/", $url);
$url = str_replace("__colon__", ":", $url);
$url = str_replace("__hash__", "#", $url);
I am not sure what happens with chars like ? and so …
Not sure, but I hope it helps.
You should add this to your routes.php:
Router::connect (
'/crazycontroller/crazyaction/crazyparams/*',
array('controller'=>'somecontroller', 'action'=>'someaction')
);
After that, your site will be able to read URLs like this:
http://site.com/crazycontroller/crazyaction/crazyparams/http://crazy.com
I need to be grabbing the URL of the current page in a Drupal site. It doesn't matter what content type it is - can be any type of node.
I am NOT looking for the path to theme, or the base url, or Drupal's get_destination. I'm looking for a function or variable that will give me the following in full:
http://example.com/node/number
Either with or without (more likely) the http://.
drupal_get_destination() has some internal code that points at the correct place to get the current internal path. To translate that path into an absolute URL, the url() function should do the trick. If the 'absolute' option is passed in, it will generate the full URL, not just the internal path. It will also swap in any path aliases for the current path.
$path = isset($_GET['q']) ? $_GET['q'] : '<front>';
$link = url($path, array('absolute' => TRUE));
This is what I found to be useful
global $base_root;
$current_url = $base_root . request_uri();
This includes the query string, and it is what core uses in page_set_cache().
You can also do it this way:
$current_url = 'http://' .$_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
It's a bit faster.
Try the following:
url($_GET['q'], array('absolute' => true));
All of these are old methods. In Drupal 7 we can get it very simply:
current_path()
http://example.com/node/306 returns "node/306".
http://example.com/drupalfolder/node/306 returns "node/306" while base_path() returns "/drupalfolder/".
http://example.com/path/alias (which is a path alias for node/306) returns "node/306" as opposed to the path alias.
And there is another function with a tiny difference:
request_path()
http://example.com/node/306 returns "node/306".
http://example.com/drupalfolder/node/306 returns "node/306" while base_path() returns "/drupalfolder/".
http://example.com/path/alias (which is a path alias for node/306) returns "path/alias" as opposed to the internal path.
http://example.com/index.php returns an empty string (meaning: front page).
http://example.com/index.php?page=1 returns an empty string.
I find using tokens pretty clean.
It is integrated into core in Drupal 7.
<?php print token_replace('[current-page:url]'); ?>
The following is more Drupal-ish:
url(current_path(), array('absolute' => true));
For Drupal 8 you can do this :
$url = 'YOUR_URL';
$url = \Drupal\Core\Url::fromUserInput('/' . $url, array('absolute' => TRUE))->toString();
Maybe what you want is just plain old predefined variables.
Consider trying
$_SERVER['REQUEST_URI']
Or read more here.