I am dynamically loading a website via file_get_contents with the following script.
<?php
header('Content-Type: text/html; charset=iso-8859-1');
$url = (substr($_GET['url'], 0, 7) == 'http://') ? $_GET['url'] : "http://{$_GET['url']}";
$base_url = explode('/', $url);
$base_url = (substr($url, 0, 7) == 'http://') ? $base_url[2] : $base_url[0];
if (file_get_contents($url) != false) {
$content = @file_get_contents($url);
// $search = array('#(<a\s*[^>]*href=[\'"]?(?![\'"]?http))#', '|(<img\s*[^>]*src=[\'"]?)|');
// $replace = array('\1proxy2.php?url=', '\1'.$url.'/');
// $new_content = preg_replace($search, $replace, $content);
function prepend_proxy($matches) {
$url = (substr($_GET['url'], 0, 7) == 'http://') ? $_GET['url'] : "http://{$_GET['url']}";
$prepend = $matches[2] ? $matches[2] : $url;
$prepend = 'http://h899310.devhost.se/proxy/proxy2.php?url='. $prepend .'/';
return $matches[1] . $prepend . $matches[3];
}
function imgprepend_proxy($matches2) {
$url = (substr($_GET['url'], 0, 7) == 'http://') ? $_GET['url'] : "http://{$_GET['url']}";
$prepend2 = $matches2[2] ? $matches2[2] : $url;
$prepend2 = $prepend2 .'/';
return $matches2[1] . $prepend2 . $matches2[3];
}
$new_content = preg_replace_callback(
'|(href=[\'"]?)(https?://)?([^\'"\s]+[\'"]?)|i',
'prepend_proxy',
preg_replace_callback(
'|(src=[\'"]?)(https?://)?([^\'"\s]+[\'"]?)|i',
'imgprepend_proxy',
$content
)
);
echo "<base href='http://{$base_url}' />";
echo $new_content;
} else {
echo "Sidan kan inte visas";
}
?>
Now the problem is that some pictures don't show up on the proxied websites, for example on sites that link to external CSS files. I think it is a CSS problem.
You can test the script here to see what I mean:
http://h899310.devhost.se/proxy/index.html
How can I fix this?
It would appear that one of your URL replacement methods is adding a slash too many. Visit one of the pages your proxy provides, and you will see several URLs beginning with:
http:///www.msdn.com
Take for example loading msdn.com; the CSS won't load, because when looking at the source code of the proxied page, we see the URL to the CSS is (note the three forward slashes):
http://h899310.devhost.se/proxy/proxy2.php?url=http:///i3.msdn.microsoft.com/global/global-bn20090721.css
Viewing the URL directly reveals a warning in your script showing that file_get_contents can't load the URL:
Warning: file_get_contents(http:///i3.msdn.microsoft.com/global/global-bn20090721.css) [function.file-get-contents]: failed to open stream: No error in D:\users\u190790\h899310.devhost.se\Wwwroot\proxy\proxy2.php on line 9
Sidan kan inte visas
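Looking briefly at your code, the problem seems to be with $prepend: the trailing slash is appended even when the link already carries its own http:// prefix in $matches[2], which yields http:///. Append the slash only when falling back to $url, and drop the . '/' from the line that follows. The same change applies to $prepend2 in imgprepend_proxy:
<?php
// The slash belongs to the fallback branch only ("." binds tighter than "?:").
$prepend = $matches[2] ? $matches[2] : $url . '/';
?>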
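To make the fix above concrete, here is a minimal sketch of a callback that adds the slash only for relative links. The helper name prepend_base and the example.com URLs are made up for illustration; the pattern is the src pattern from the question.

```php
<?php
// Hypothetical helper: builds the replacement for one href/src match.
// $matches[1] = attribute prefix (e.g. src="), $matches[2] = optional scheme,
// $matches[3] = rest of the URL; $base is the page being proxied.
function prepend_base(array $matches, $base)
{
    // Absolute link: keep its own scheme. Relative link: base plus ONE slash.
    $prefix = !empty($matches[2]) ? $matches[2] : rtrim($base, '/') . '/';
    // Strip a leading slash from the remainder so it is not doubled.
    return $matches[1] . $prefix . ltrim($matches[3], '/');
}

$pattern = '|(src=[\'"]?)(https?://)?([^\'"\s]+[\'"]?)|i';
$html = '<img src="http://i3.example.com/a.png"><img src="/img/b.png">';
$fixed = preg_replace_callback($pattern, function ($m) {
    return prepend_base($m, 'http://www.example.com');
}, $html);
echo $fixed, PHP_EOL;
```

With this shape, an absolute URL passes through unchanged and a relative one gets exactly one slash between the base and the path.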
header('Content-Type: text/html; charset=iso-8859-1');
This makes your proxy serve every response as text/html, so CSS and images requested through the proxy are sent with the wrong content type and won't load (or at least won't display correctly).
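One possible way around this is to forward the content type that the remote server reported instead of hard-coding text/html. The sketch below is only an illustration: pick_content_type is a hypothetical helper, and the sample header lines are made up. It expects raw header lines in the format PHP exposes in $http_response_header after file_get_contents() on an http:// URL.

```php
<?php
// Hypothetical helper: pick the Content-Type out of raw response header lines.
function pick_content_type(array $headers, $default = 'text/html')
{
    $type = $default;
    foreach ($headers as $line) {
        // Header names are case-insensitive. Keep the last match so the
        // final response wins if redirects were followed.
        if (stripos($line, 'Content-Type:') === 0) {
            $type = trim(substr($line, strlen('Content-Type:')));
        }
    }
    return $type;
}

// Sample header lines, invented for this example.
$sample = array('HTTP/1.1 200 OK', 'content-type: text/css; charset=utf-8');
echo pick_content_type($sample), PHP_EOL;
```

In the proxy, the result could be passed to header('Content-Type: ' . ...) before echoing the fetched content. The URL rewriting should then probably be applied only when the type is text/html.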
Related
This is how I get my url on my localhost:
$url = (!empty($_SERVER['HTTPS'])) ? "https://".$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'] : "http://".$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
echo $url;
It returns http://localhost/CodeSensei/menu because I am on the menu page.
How can I trim this? I only want "http://localhost/CodeSensei".
I know I can trim it like this:
echo trim($url,"menu");
But the problem is that "menu" is dynamic; it keeps changing depending on the page. Is there any way to trim my URL so that it always prints only "http://localhost/CodeSensei" on any page?
There are many ways to achieve this. You can play around with different string manipulators like explode() etc.
Here is a solution using explode()
$variable = "http://localhost/CodeSensei/menu";
$variable = (explode("/",$variable));
$url='';
for($i=2;$i<count($variable)-1;$i++)
{
$url .= "/".$variable[$i];
}
$final_url = "http:/".$url;
echo $final_url;
Your output
http://localhost/CodeSensei
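The loop above hard-codes the "http:/" prefix. A minimal variant that keeps whatever scheme the URL has could look like the sketch below; strip_last_segment is a hypothetical name, and it assumes the input is an absolute URL of the form scheme://host/path.

```php
<?php
// Hypothetical helper: drop the last path segment of an absolute URL.
function strip_last_segment($url)
{
    $parts = explode('/', rtrim($url, '/'));
    // "scheme:", "" and "host" are the first three pieces; only drop a
    // piece when there is at least one path segment after them.
    if (count($parts) > 3) {
        array_pop($parts);
    }
    return implode('/', $parts);
}

echo strip_last_segment('http://localhost/CodeSensei/menu'), PHP_EOL;
```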
This function may help you. It takes a URL and returns it without the part after the last /.
<?php
function getSubUrl($originUrl) {
$url = parse_url($originUrl);
$url['scheme'] .= '://';
$url['path'] = dirname($url['path']);
return implode($url);
}
echo getSubUrl('http://localhost/CodeSensei/menu/123') . PHP_EOL;
// http://localhost/CodeSensei/menu
echo getSubUrl('http://localhost/CodeSensei/menu') . PHP_EOL;
// http://localhost/CodeSensei
echo getSubUrl('http://localhost/CodeSensei') . PHP_EOL;
// http://localhost/
echo getSubUrl('http://localhost/') . PHP_EOL;
// http://localhost/
echo getSubUrl('http://localhost') . PHP_EOL;
// http://localhost
I'm trying to check the string after the last trailing slash in my URL.
My code is as follows:
$url = "http://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
$data = substr($url, strrpos($url, '/') + 1);
if($data == "dashboard") {
require_once VIEW_ROOT . '/cp/dashboard_view.php';
} else {
echo $data;
}
Once I go to http://MYURL/dashboard/in, it should show "in" as the $data. Instead it gives me a 500 error.
You can simply use the explode() function to break up the string. Alternatively, $_SERVER['REQUEST_URI'] gives you everything after the host name.
But for the data after the last '/', explode() will work best.
This will work.
$url = "http://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
$x = explode('/',$url);
$data = $x[sizeof($x)-1];
echo $data;
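If the URL can carry a query string or a trailing slash, splitting the whole string may return more than the segment. A sketch that looks only at the path component is shown below; last_path_segment is a hypothetical helper, and ignoring a trailing slash is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: return the last non-empty segment of the URL path.
function last_path_segment($url)
{
    // parse_url() with PHP_URL_PATH returns only the path, without the
    // query string or fragment (null when the URL has no path).
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    $segments = explode('/', trim($path, '/'));
    return end($segments);
}

echo last_path_segment('http://example.com/dashboard/in'), PHP_EOL;
```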
You should try:
$url = "http://".$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
You need to join the http:// string with $_SERVER['HTTP_HOST'] and then $_SERVER['REQUEST_URI'] using the . (dot) operator. Outside a double-quoted string the array keys must be quoted.
I'm working on a project where I need to scrape some content from the same site, but a subfolder, and store it. I know it's not ideal, but it's sadly the best approach for the client.
I need to change all references from relative to absolute URLs.
All the references (images, CSS, JS) are relative, in both of these forms:
"../../imgs/"
"/js/"
... which means they don't work in my sub-folder. I need a function that matches the regex on these references and replaces the path.
When I try this:
function getRelativeContent($url) {
$page = file_get_contents($url);
//url needs trailing /
if (substr($url, -1, 1) != "/")
$url .= "/";
$page = preg_replace('/src="(\/)?([\w_\-\/\.\?&=@%#]*)"/i','src="' . $url . '$2"', $page);
$page = preg_replace('/href="(\/)?([\w_\-\/\.\?&=@%#]*)"/i','href="' . $url . '$2"', $page);
return $page;
}
echo getRelativeContent($url);
Then these URLs don't work:
<link href="/cassette.axd/stylesheet/fdbdaa59cb97b35f06f65fd41cb60caa3975cc0f/forbrug-rwd_(max-width 767px)" type="text/css" rel="stylesheet" media="(max-width: 767px)">
<img src="https://www.domain.dk/~/media/2561BD6AFBD64402877E4ACED01F97FD.ashx" />
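Both failing references fall outside what the character class can match: the first contains spaces and parentheses, the second is already absolute. One way to approach this is to match any quoted src/href value and decide per value how to resolve it. The sketch below is only an illustration under stated assumptions: to_absolute and absolutize_html are hypothetical names, attributes are assumed to be double-quoted, and "." or ".." inside a query string is not treated specially.

```php
<?php
// Hypothetical helper: resolve $ref against $base (an absolute http/https URL).
function to_absolute($ref, $base)
{
    // Leave empty values, fragments, absolute and protocol-relative URLs alone.
    if ($ref === '' || $ref[0] === '#'
        || preg_match('#^(?:[a-z][a-z0-9+.-]*:|//)#i', $ref)) {
        return $ref;
    }
    $b = parse_url($base);
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = isset($b['path']) ? $b['path'] : '/';

    if ($ref[0] === '/') {
        $path = $ref; // root-relative, e.g. "/js/app.js"
    } else {
        // Relative to the directory part of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $ref;
    }
    // Collapse "." and ".." segments without climbing above the root.
    $out = array();
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    return $root . implode('/', $out);
}

// Hypothetical wrapper: rewrite every double-quoted src/href in $html.
function absolutize_html($html, $base)
{
    return preg_replace_callback('/\b(src|href)="([^"]*)"/i', function ($m) use ($base) {
        return $m[1] . '="' . to_absolute($m[2], $base) . '"';
    }, $html);
}

echo to_absolute('../../imgs/logo.png', 'https://www.domain.dk/a/b/page.html'), PHP_EOL;
```

For anything beyond simple pages, parsing the HTML with DOMDocument instead of a regular expression is likely to be more robust.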
I have a location menu that has to change the location. The good thing is that every URL exists in every city, and every city is a subdomain:
city1.domain.com.uk/index.php?page=category/238/12
city2.domain.com.uk/index.php?page=category/238/12
I'm trying to break up the URL to remove the subdomain, so I can replace it for each item in the menu.
I want to get index.php?page=category/238/12
<?PHP
$protocol = strpos(strtolower($_SERVER['SERVER_PROTOCOL']),'https')=== FALSE ? 'http' : 'https';
$host = $_SERVER['HTTP_HOST'];
$script = $_SERVER['SCRIPT_NAME'];
$params = $_SERVER['QUERY_STRING'];
$url = $protocol . '://' . $host . $script . '?' . $params;
// break it up using the "."
$urlb = explode('.',$url);
// get the domain
$dns = $urlb[count($urlb)-1];
// get the extension
$ext = $urlb[count($urlb)+0];
//put it back together
$fullDomain = $dns.'.'.$ext;
echo $fullDomain;
?>
But I get this: php?page=category/238/12
Also, I haven't thought of a solution for an issue I will be facing with this.
If I'm looking at a product, the URL changes to something like
city2.domain.com.uk/index.php?page=item/preview/25
But the products don't exist in every city, so my user will get a 404.
=(
How can I add a condition to the process so that if page=item/preview/25, I replace it with
page=index/index
You can split the domain as:
$url = "city1.domain.com.uk/index.php?page=category/238/12";
list($subDomain, $params) = explode('?', $url);
list($domain, $sub) = explode('/', $subDomain);
$newUrl = $sub . "?" . $params;
echo $newUrl;
Cheers!
How about this:
<?php
$protocol = strpos(strtolower($_SERVER['SERVER_PROTOCOL']),'https')=== FALSE ? 'http' : 'https';
$host = $_SERVER['HTTP_HOST'];
$script = $_SERVER['SCRIPT_NAME'];
$params = $_SERVER['QUERY_STRING'];
$url = $protocol . '://' . $host . $script . '?' . $params;
$url=(parse_url($url));
$dns = substr($url['host'],stripos($url['host'],'.')+1);
// The URL built above never has a fragment, so $url['fragment'] is not set.
$fullDomain = $url['scheme']."://".$dns.$url['path']."?".$url['query'];
if (substr($url['query'],stripos($url['query'],'=')+1,stripos($url['query'],'/')-stripos($url['query'],'=')-1)=='item') {
echo "redirect";
} else {
echo "don't redirect";
}
echo "<br>".$fullDomain;
?>
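The redirect check above can also be folded into the code that builds each menu link. The following is a rough sketch, not a drop-in solution: city_link is a hypothetical helper, the http scheme is hard-coded, and any query parameters other than page are dropped for brevity.

```php
<?php
// Hypothetical helper: build the menu link for one city from the current
// script name and query string.
function city_link($city, $domain, $script, $query)
{
    parse_str($query, $params);
    $page = isset($params['page']) ? $params['page'] : 'index/index';
    // Item pages exist only in some cities, so send those to the front page.
    if (strpos($page, 'item/') === 0) {
        $page = 'index/index';
    }
    return 'http://' . $city . '.' . $domain . $script . '?page=' . $page;
}

echo city_link('city2', 'domain.com.uk', '/index.php', 'page=item/preview/25'), PHP_EOL;
```

In the real page, $script and $query would come from $_SERVER['SCRIPT_NAME'] and $_SERVER['QUERY_STRING'], as in the code above.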
I have a string like this:
http://www.downlinegoldmine.com/viralmarketing
I need to remove http://www. from the string if it exists, as well as http:// if www is not included.
In short, I just need the domain name without any protocol.
parse_url is the perfect tool for the job. You would first call it to split the url in parts, then check the hostname part to see if it starts with www. and strip it, then assemble the url back.
Update: code
echo normalize_url('http://www.downlinegoldmine.com/viralmarketing');
function normalize_url($url) {
$parts = parse_url($url);
unset($parts['scheme']);
// parse_url() stores the host name under the 'host' key
if (substr($parts['host'], 0, 4) == 'www.') {
$parts['host'] = substr($parts['host'], 4);
}
if (function_exists('http_build_url')) {
// This PECL extension makes life a lot easier
return http_build_url($parts);
}
// Otherwise it's the hard way
$result = '';
// parse_url() uses the keys 'user' and 'pass' for credentials
if (!empty($parts['user'])) {
$result .= $parts['user'];
if (!empty($parts['pass'])) {
$result .= ':'.$parts['pass'];
}
$result .= '@';
}
$result .= $parts['host'].$parts['path'];
if (!empty($parts['query'])) {
$result .= '?'.$parts['query'];
}
if (!empty($parts['fragment'])) {
$result .= '#'.$parts['fragment'];
}
return $result;
}
Just use parse_url (see: http://php.net/manual/de/function.parse-url.php ). It also handles different protocols, paths, etc.
$nvar = preg_replace("#http://(www\.)?#i", "", "http://www.downlinegoldmine.com/viralmarketing");
Test:
php> echo preg_replace("#http://(www\.)?#i", "", "http://www.downlinegoldmine.com/viralmarketing");
downlinegoldmine.com/viralmarketing
php> echo preg_replace("#http://(www\.)?#i", "", "http://downlinegoldmine.com/viralmarketing");
downlinegoldmine.com/viralmarketing
There's probably a better way, but:
$url = preg_replace("#^(http://)?(www\\.)?#i", "", $url);
$url = strncmp('http://', $url, 7) ? $url : substr($url, 7);
$url = strncmp('www.', $url, 4) ? $url : substr($url, 4);
You can use the following to remove a leading https://, http://, and www. from a URL. The pattern is anchored to the start of the string and the dot is escaped, so a "www" followed by any character elsewhere in the URL is left alone.
$url = 'http://www.downlinegoldmine.com/viralmarketing';
echo preg_replace('/^(?:https?:\/\/)?(?:www\.)?/i', '', $url);
The above returns downlinegoldmine.com/viralmarketing
You can use the following to remove the URL's path as well as the https://, http://, and www.
$url = 'http://www.downlinegoldmine.com/viralmarketing';
echo implode('/', array_slice(explode('/', preg_replace('/^(?:https?:\/\/)?(?:www\.)?/i', '', $url)), 0, 1));
The above returns downlinegoldmine.com