PHP - matching and replacing URLs

If I have multiple URLs like these in a variable called $url that a user enters:
http://www.example.com/home
/home
?home=true
home/
All of them are supposed to belong to the www.example.com website.
How can I rewrite each URL into its full form?
Something like this:
http://www.example.com/home => http://www.example.com/home
/ihome => http://www.example.com/ihome
?home=true => http://www.example.com/ihome?home=true
home/ => http://www.example.com/ihome/home/
The last two are resolved with the current page set to /ihome.

I'm assuming your question is about relative URLs.
Suppose you are at http://example.com/home and want to link to the about page, whose absolute URL is http://example.com/about. From within 'home.php' (or whatever serves /home) you can reach it with a relative link to "about" instead of the full address. This follows the standard rules for resolving relative paths and won't change much. Plenty of these examples are documented in the link I provided for more information.
Edit: Initial question was too vague for my answer to be true to what the question really calls for. Proper answer, how to replace, is below.
Use str_replace(), as indicated in the comment below. Here is an example of how to convert "/ihome", supposing we are at http://example.com/about.
$body = ...; // assuming this holds the whole document
$url = "http://{$_SERVER['HTTP_HOST']}{$_SERVER['REQUEST_URI']}";
$tokens = explode('/', $url);
array_pop($tokens); // remove "about"
$url = implode('/', $tokens); // rejoin with "/" to get "http://example.com"
$body = str_replace("/ihome", $url . "/ihome", $body); // str_replace() returns the result
The last line is certainly up for debate, but this should work. If you have multiple links you want to check, you may want to go with a regex approach (ie any links defined as just "/sub" or any queries "?query").

There's probably a way to do this with a single preg_replace call, but something like this loop should do the trick.
<?php
foreach ($urls as $key => $val) {
    // Prefix anything that is not already an absolute URL on the site.
    if (!preg_match('/^http:\/\/www\.example\.com\//', $val)) {
        $urls[$key] = 'http://www.example.com/' . ltrim($val, '/');
    }
}
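The loop above treats every non-absolute value as root-relative, so "?home=true" and "home/" would not end up under /ihome as the question asks. Here is a rough sketch that handles the four shapes from the question separately. The function name resolveUrl() and the $currentPage parameter are made up for illustration, and the sketch assumes the current page URL is known and well-formed.

```php
<?php
// Hypothetical helper: resolve $link against the page it appeared on.
// Covers only the four shapes from the question; it does not handle
// "../" segments, fragments, or protocol-relative ("//host") links.
function resolveUrl(string $link, string $currentPage): string
{
    // Already absolute: leave untouched.
    if (preg_match('~^https?://~i', $link)) {
        return $link;
    }

    $base = parse_url($currentPage);
    $root = $base['scheme'] . '://' . $base['host'];
    $path = $base['path'] ?? '/';

    // Root-relative: "/ihome" hangs directly off the host.
    if ($link !== '' && $link[0] === '/') {
        return $root . $link;
    }

    // Query-only: "?home=true" keeps the current path.
    if ($link !== '' && $link[0] === '?') {
        return $root . $path . $link;
    }

    // Anything else is appended to the current page, as the question asks.
    return $root . rtrim($path, '/') . '/' . $link;
}

$current = 'http://www.example.com/ihome';
foreach (['http://www.example.com/home', '/ihome', '?home=true', 'home/'] as $link) {
    echo $link, ' => ', resolveUrl($link, $current), PHP_EOL;
}
```

Note that appending "home/" to /ihome follows the mapping requested in the question. A browser resolving the same link per RFC 3986 would drop the last segment of the current path first and arrive at http://www.example.com/home/ instead.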


php __FILE__ inside includes?

I have (maybe) an unusual issue with using __FILE__ in a file within a file.
I created a snippet of code (in the php 5 my server mandates) to take elements of the current filename and put it into a variable to use later. After some headache, I got it working totally fine. However, I realized I didn't want to have to write it every time and realized "oh no, if I include this it's only going to work on the literal filename of the include". If I wanted to grab the filename of the page the user is looking at, as opposed to the literal name of the included file, what's the best approach? Grab the URL from the address bar? Use a different magic variable?
EDIT1: Example
I probably should have provided an example in the first draft, pfft. Say I have numbered files, and the header where the include takes place in is 01header.php, but the file it's displayed in is Article0018.html. I used:
$bn = (int) filter_var(__FILE__, FILTER_SANITIZE_NUMBER_INT);
…to get the article number, but realized it would get the 1 in the header instead.
EDIT2: Temporary Solution
I've """solved""" the issue by creating a function to get the URL / URI and putting it into the variable $infile, and replaced all former appearances of __FILE__ with $infile, like so:
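function getAddress() {
    // $_SERVER['HTTPS'] is unset on plain HTTP requests, so check that it exists first.
    $protocol = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 'https' : 'http';
    return $protocol.'://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
}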
$infile = urlencode(getAddress());
$bn = (int) filter_var($infile, FILTER_SANITIZE_NUMBER_INT);
echo "$bn";
So if the file the user is looking at is called "005-extremelynormalfile.html", I can display the number 5 inside the page, e.g., to say it's article number five.
While it's not as bad as I initially thought based on your description, your code is still very fragile and really only works by accident. If the URL contains any other digits or hyphens, it's going to go wrong, as below.
$infile = 'https://example-123.com/foo/42/bar/005-extremelynormalfile.html?x=8&y=9';
var_dump(
filter_var($infile, FILTER_SANITIZE_NUMBER_INT),
(int)filter_var($infile, FILTER_SANITIZE_NUMBER_INT)
);
Output:
string(12) "-12342005-89"
int(-12342005)
Sanitize functions are a blunt instrument for destroying data, and should only ever be used as a last resort when all other good sense has failed.
You need to use a proper parsing function to parse the url into its component parts, and then a simple regular expression to get what you want out of the filename.
function getIdFromURL($url) {
    $url_parts = parse_url($url);
    $path = $url_parts['path'];
    $path_parts = explode('/', $path);
    $filename = end($path_parts);
    if( preg_match('/^(\d+)/', $filename, $matches) ) {
        return (int)$matches[1];
    }
    return null;
}
var_dump(
    getIdFromURL($infile)
);
Lastly, a lot of people are tempted to cram as much logic as possible into a regular expression. If I wanted to, the above could be a single regex, but it would also be rigid, unreadable, and unmaintainable. Use regular expressions sparingly, as there's nearly always a parser/library that already does what you want, or the majority of it.
Quickly threw together a function that gets the url from the page as a variable, and replaced all occurrences of __FILE__ with that variable, and it worked correctly. Assuming the user cannot edit the URL / URI in any way, this should work well enough.
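For what it's worth, here is a small sketch of an alternative that avoids rebuilding the full URL. Inside an included file, __FILE__ names the include, while $_SERVER['SCRIPT_NAME'] still refers to the script that was requested. The function name articleNumberFromPath() is invented for this example, and whether SCRIPT_NAME matches the visible address depends on how the server maps URLs to files, so treat it as an assumption to verify.

```php
<?php
// Hypothetical helper: first run of digits in the file name of $path, or null.
function articleNumberFromPath(string $path): ?int
{
    // Look at the file name only, so digits in directories are ignored.
    $filename = basename($path);
    if (preg_match('/\d+/', $filename, $matches)) {
        return (int) $matches[0];
    }
    return null;
}

var_dump(articleNumberFromPath('/articles/Article0018.html'));    // int(18)
var_dump(articleNumberFromPath('/005-extremelynormalfile.html')); // int(5)

// Inside an included header one would pass the requested script instead, e.g.
// articleNumberFromPath($_SERVER['SCRIPT_NAME']);
```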

How do I get the depth of a URL using PHP?

I'd like to echo the depth (or number of directories from my home) of my current page's URL using PHP. How would I do that?
For example, if I'm on mysite.com, the output displays "0", if I'm on mysite.com/recipes, the output displays "1", and if I'm on mysite.com/recipes/pies, the output displays "2", and so on.
How do I do that?
I tried simplifying it and doing this, but it's exporting as 0:
$folder_depth = substr_count($_SERVER["PHP_SELF"] , "/");
echo $folder_depth;
Just for fun, here is my cheap and cheesy solution using PHP's parse_url() and its PHP_URL_PATH component, along with a couple of other functions:
$url = 'http://universeofscifi.com/content/tagged/model/battlestar_galactica.html';
var_dump(parse_url($url, PHP_URL_PATH)); // var_dump() prints by itself, no echo needed
echo count(explode('/', parse_url($url, PHP_URL_PATH))) - 2;
This returns:
string(47) "/content/tagged/model/battlestar_galactica.html"
3
I subtract 2 from the count to discard the empty element before the leading slash and the file at the end, leaving only the directory depth count.
If you won't have a query string, you can explode on /. If you will have a query string, you need to remove that first, such as...
$url = preg_replace('/\?.*$/','',$url); // "?" must be escaped to match a literal question mark
If you have http:// or https:// at the front of your URL, that can mess it up also. So remove it...
$url = preg_replace('~^https?://~','',$url);
Now, you only have the url as example.com/some/path/to/something. You can explode on / and get a count:
$a = explode('/',$url);
The size of $a will be 1 more than what you want. So, you need to subtract one:
$depth = sizeof($a)-1;
New problem... I just counted the file itself, such as example.com/links.html will come up as 1, not just 0. So, before the explode I need to get rid of the file name. But... how do I know if it is a file or a directory? That isn't built into the URL specification. For example, example.com/test could be a file or it could be a directory (and then it automatically goes to example.com/test/index.html). You need to assume what file extensions you will have and remove those files before you explode, such as:
$url = preg_replace('~/[^/]+\.(html|php|gif|png|mp3)$~','',$url);
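The steps above can be sketched as one function. This version leans on parse_url() to drop the scheme, host, and query string rather than on the regexes, and it guesses "file" from a dot in the last segment instead of a fixed extension list. Both the name urlDepth() and that dot heuristic are assumptions made for the example, not anything built into PHP.

```php
<?php
// Hypothetical helper: count directory levels in a URL's path.
// Assumption: a last segment containing a dot is a file, not a directory.
function urlDepth(string $url): int
{
    // parse_url() leaves out the scheme, host, and query string.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return 0;
    }

    // Keep non-empty segments; comparing against '' keeps a directory named "0".
    $segments = array_values(array_filter(
        explode('/', $path),
        static fn (string $s): bool => $s !== ''
    ));

    // Drop a trailing file name such as "links.html".
    if ($segments !== [] && strpos(end($segments), '.') !== false) {
        array_pop($segments);
    }

    return count($segments);
}

echo urlDepth('http://mysite.com'), PHP_EOL;                  // 0
echo urlDepth('http://mysite.com/recipes'), PHP_EOL;          // 1
echo urlDepth('http://mysite.com/recipes/pies?x=1'), PHP_EOL; // 2
echo urlDepth('http://example.com/links.html'), PHP_EOL;      // 0
```

For the current request you would pass $_SERVER['REQUEST_URI'] in place of the literal URLs.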
@kainaw, I like your answer! Thanks!
I took a spin on that. First, I noticed I was using the wrong PHP function to get the part of the URL I needed. Second, I needed to use @kainaw's example to get the parts of the URL which I'm supposed to count, and ignore the others.
I also had to account for urls without content between the "/", so something like /word//// would still count as 1. Therefore, I only counted array elements after explode() which were not empty.
Here's my code:
$url = $_SERVER['REQUEST_URI'];
//echo "*".$_SERVER['REQUEST_URI']."*";
//$url = preg_replace('/\?.*$/','',$url);
//$url = preg_replace('~^https?://~','',$url);
//$url = preg_replace('~/[^/]+\.(html|php|gif|png|mp3)$~','',$url);
$a = explode('/',$url);
$depth =count(array_filter($a));
echo $depth;
I commented out some of those lines because I didn't need them, but they were mentioned above.
Thanks!

PHP - Get the end of a URL

So I think I need to submit a new question for this...
Here is my old question: PHP - Get path minus root
I need a way in PHP to take the URL being any of the following...
http://kenthomes.net/plan_detail.php?mod=39
http://kenthomes.net/Amelia-Cove
and get everything after the domain, leaving me with...
"plan_detail.php?mod=39" // If there is no alias for that page
OR
"Amelia-Cove" // If that page has an alias being applied
In reality, they are the same page, because of the alias, but not all of these pages have aliases associated with them such as...
http://kenthomes.net/plan_detail.php?mod=52
unlike...
http://kenthomes.net/Amelia-Cove
Currently I am using...
trim($_SERVER['REQUEST_URI'],'/')
which gives me...
"Amelia-Cove" // Which is fine.
OR
"plan_detail.php" // Which is not okay.
I need..
"Amelia-Cove" // Which is fine.
OR
"plan_detail.php?mod=39" // Which is fine.
How do I do this?
You can get all the parts of a URL via parse_url().
For example:
$parts = parse_url('http://kenthomes.net/plan_detail.php?mod=39');
print_r($parts);
Should give you something like this:
Array
(
[scheme] => http
[host] => kenthomes.net
[path] => /plan_detail.php
[query] => mod=39
)
Which you can use to create your own URL containing the parts that you need
$_SERVER["REQUEST_URI"] normally already includes the query string. If in your setup it only contains the path, as in the question, you also need $_SERVER["QUERY_STRING"] for the part after the ?.
Use:
trim($_SERVER['REQUEST_URI'], '/') . '?' . $_SERVER['QUERY_STRING'];
Append $_SERVER['QUERY_STRING'] to $_SERVER['REQUEST_URI'], separated by a "?".
PHP: $_SERVER - Manual
You can get the query string (the bit after the question mark), via $_SERVER['QUERY_STRING']
Use parse_url, and if you just want the rightmost part, splitting the path with explode() would be sufficient.
$data = parse_url($url, PHP_URL_PATH);
Should be enough.
Otherwise if responding to the current request, $_SERVER['REQUEST_URI'] might work, as that is the entire URI.
You can try this:
$uri = $_SERVER['REQUEST_URI'];
$qs = $_SERVER['QUERY_STRING'];
// Join with "?" only when there is a query string to add.
echo trim($uri, '/') . ($qs !== '' ? '?' . $qs : '');
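Since the answers disagree on whether REQUEST_URI carries the query string, here is a sketch that works either way and never adds the query twice. The name requestTail() and its two parameters are invented for this example; it is only one way to combine the two values.

```php
<?php
// Hypothetical helper: path plus query string, without the leading slash.
// Works whether or not $requestUri already carries the query string.
function requestTail(string $requestUri, string $queryString = ''): string
{
    // Split off any query string that is already part of the URI.
    [$path, $query] = array_pad(explode('?', $requestUri, 2), 2, '');
    if ($query === '') {
        $query = $queryString; // fall back to the separately supplied query
    }

    $tail = trim($path, '/');

    return $query === '' ? $tail : $tail . '?' . $query;
}

echo requestTail('/plan_detail.php?mod=39', 'mod=39'), PHP_EOL; // plan_detail.php?mod=39
echo requestTail('/plan_detail.php', 'mod=39'), PHP_EOL;        // plan_detail.php?mod=39
echo requestTail('/Amelia-Cove'), PHP_EOL;                      // Amelia-Cove
```

In a request handler the call would be requestTail($_SERVER['REQUEST_URI'], $_SERVER['QUERY_STRING'] ?? ''). One caveat: if a rewrite rule maps the alias to plan_detail.php?mod=39 internally, QUERY_STRING may hold the rewritten query, in which case passing only REQUEST_URI is the safer choice.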

Remove certain part of string in PHP [duplicate]

I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name.
For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.
Does anyone know of a function or a way to remove certain parts of strings? Maybe it could search for http and, when found, remove everything up to and including the //. Then it could search for www and, if found, remove everything up to the first dot. Then it would keep everything until the next dot and remove everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org, because I'd be left with only en.
Any ideas (preferably in PHP, but JavaScript is also welcome)?
EDIT 1:
Thanks to great feedback I think I've been able to work out a function that does what I want:
function getdomain($url) {
    $parts = parse_url($url);
    // Add a scheme only when none is present, so parse_url() can find the host.
    if (empty($parts['scheme'])) {
        $url = 'http://'.$url;
    }
    $parts2 = parse_url($url);
    $host = $parts2['host'];
    $remove = explode('.', $host);
    $result = $remove[0];
    if($result == 'www') {
        $result = $remove[1];
    }
    return $result;
}
It's not perfect, at least considering subdomains, but I think it's possible to do something about it. Maybe add a second if statement at the end to check the length of the array. If it's bigger than two, then choose item nr1 instead of item nr0. This obviously gives me trouble with any domain using .co.uk (because that'll be three items long, but I don't want to return co). I'll try to work on it a little bit and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:
$url = 'http://facebook.com/blahblah';
$parts = parse_url($url);
$host = $parts['host']; // facebook.com
$foo = explode('.', $host);
$result = $foo[0]; // facebook
You can use PHP's parse_url function, which returns exactly what you want.
Use the parse_url function in PHP to get domain.com, and then replace .com with an empty string.
I am a little rusty on my regular expressions but this should work.
$url='http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); // Will return www.en.wikipedia.org
$domain = preg_replace('/\.(com|org)$/', '', $domain); // a PCRE pattern needs delimiters
http://php.net/manual/en/function.parse-url.php
PHP REGEX: Get domain from URL
http://rubular.com/r/MvyPO9ijnQ //Check regular expressions
You're looking for info on Regular Expression. It's a bit complicated, so be prepared to read up. In your case, you'll best utilize preg_match and preg_replace. It searches for a match based on your pattern and replaces the matches with your replacement.
preg_match
preg_replace
I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.
// preg_replace() returns the new string, so assign it back; no preg_match() check is needed.
$url = preg_replace("/^http:\/\//i", "", $url);
$url = preg_replace("/www\./i", "", $url);
$url = preg_replace("/\.com/i", "", $url);
$url = preg_replace("/\/*$/", "", $url);
^ = at the start of the string
i = case insensitive
\ = escape char
$ = the end of the string
This will have to be played around with and tweaked, but it should get you pointed in the right direction.
Javascript:
document.domain.replace(".com","")
PHP:
$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google
This is quite a quick method but should do what you want in PHP:
function getDomain( $URL ) {
return explode('.',$URL)[1];
}
I will update it when I get chance but basically it splits the URL into pieces by the full stop and then returns the second item which should be the domain. A bit more logic would be required for longer domains such as www.abc.xyz.com but for normal urls it would suffice.
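Pulling the ideas in this thread together, here is one possible sketch that copes with www, extra subdomains, and a few two-part suffixes such as .co.uk. The function name domainLabel() and the short $twoPartSuffixes list are illustrative assumptions only. A robust version would consult the full Public Suffix List rather than a hand-written array.

```php
<?php
// Hypothetical helper: the registrable label of a host ("stackoverflow",
// "wikipedia", "bbc"). The suffix list below is a tiny illustrative sample,
// not a complete list of multi-part public suffixes.
function domainLabel(string $url): ?string
{
    // parse_url() only finds a host when a scheme is present.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $labels = explode('.', strtolower($host));
    $twoPartSuffixes = ['co.uk', 'org.uk', 'com.au', 'co.jp'];

    // Drop the suffix: two labels for entries in the sample list, else one.
    $lastTwo = implode('.', array_slice($labels, -2));
    $suffixLength = in_array($lastTwo, $twoPartSuffixes, true) ? 2 : 1;
    $labels = array_slice($labels, 0, -$suffixLength);

    // The last remaining label is the domain name itself.
    return $labels === [] ? null : end($labels);
}

echo domainLabel('http://www.stackoverflow.com/blahblahblah'), PHP_EOL; // stackoverflow
echo domainLabel('facebook.com/user/bacon'), PHP_EOL;                   // facebook
echo domainLabel('http://www.en.wikipedia.org'), PHP_EOL;               // wikipedia
echo domainLabel('http://www.bbc.co.uk/news'), PHP_EOL;                 // bbc
```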

Preg_Replace Change URL

I am trying to grab content from another one of my sites. This works fine, except that all the links are incorrect.
include_once('../simple_html_dom.php');
$page = file_get_html('http://www.website.com');
$ret = $page->find('div[id=header]');
echo $ret[0];
Is there any way to make all the links show the full URL instead of a relative one, using preg_replace?
$ret[0] = preg_replace('#(http://([\w-.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)#',
'http://fullwebsitellink.com$1', $ret[0]);
I guess it would be something like the above, but I don't understand it.
Thanks
Your question doesn't really explain what is "incorrect" about the links, but I'm guessing you have something like this:
<div id="header"><a href="/">Home</a> | <a href="/sitemap">Sitemap</a></div>
and you want to embed it in another site, where those links need to be fully qualified with a domain name, like this:
<div id="header"><a href="http://example.com/">Home</a> | <a href="http://example.com/sitemap">Sitemap</a></div>
Assuming this is the case, the replacement you want is so simple you don't even need a regex: find all href attributes beginning "/", and add the domain part (I'll use "http://example.com") to their beginning to make them absolute:
$scraped_html = str_replace('href="/', 'href="http://example.com/', $scraped_html);
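If the scraped markup might also use single-quoted attributes or protocol-relative links, a slightly more tolerant variant could look like the sketch below. The function name absolutizeHrefs() and the sample markup are made up for illustration, and this is still plain text substitution rather than real HTML parsing.

```php
<?php
// Hypothetical helper: prefix root-relative href values with $origin.
// Handles single- and double-quoted attributes and skips protocol-relative
// links ("//cdn..."). It is still a text substitution, not an HTML parser.
function absolutizeHrefs(string $html, string $origin): string
{
    $origin = rtrim($origin, '/');

    return preg_replace_callback(
        '~\bhref\s*=\s*(["\'])/(?!/)~i',
        // Rebuild the attribute start with the same quote character.
        static fn (array $m): string => 'href=' . $m[1] . $origin . '/',
        $html
    );
}

$scraped = '<div id="header"><a href="/">Home</a> | <a href=\'/sitemap\'>Sitemap</a></div>';
echo absolutizeHrefs($scraped, 'http://example.com'), PHP_EOL;
```

Links that are already absolute are left alone because their value does not start with a slash.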
