I'm trying to fetch the contents of a URL, and to avoid getting blocked I want to use a proxy for every request.
But neither approach I tried seems to work...
EDIT:
Now I tried this, but my server log keeps logging my real IP.
$page = file_get_contents("https://free-proxy-list.net/");
preg_match_all("/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}<\/td><td>[0-9]{1,5}/", $page, $matches);
$randomproxy = $matches[0][array_rand($matches[0])];
$randomproxy = "tcp://".str_replace("</td><td>", ":", $randomproxy);
echo $randomproxy;
// configure default context to use proxy
$opts = array(
'tcp' => array(
'proxy' => $randomproxy
)
);
$context = stream_context_create($opts);
$sFile = file_get_contents("https://www.website.tld/inner.html", False, $context);
var_dump($sFile);
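A likely reason the real IP still shows up in the snippet above: stream context options are keyed by wrapper name, and for http:// and https:// URLs the proxy setting is read from the 'http' section. An entry under 'tcp' is simply ignored, so the request goes out directly. Below is a minimal sketch of the same idea with the options under 'http'. The helper names extractProxies and proxyContextOptions and the sample markup are assumptions made for illustration, and whether a given free proxy will actually tunnel an HTTPS target is not something this sketch can guarantee.

```php
<?php
// Hypothetical helper: pull "ip</td><td>port" pairs out of table markup
// and return them as tcp:// proxy URIs.
function extractProxies(string $html): array
{
    $pattern = '/(\d{1,3}(?:\.\d{1,3}){3})<\/td><td>(\d{1,5})/';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $proxies = [];
    foreach ($matches as $m) {
        $proxies[] = "tcp://{$m[1]}:{$m[2]}";
    }
    return $proxies;
}

// Context options are keyed by wrapper name; the proxy belongs under 'http'.
function proxyContextOptions(string $proxy): array
{
    return [
        'http' => [
            'proxy'           => $proxy,
            'request_fulluri' => true,
        ],
    ];
}

// Assumed sample markup; the real list page may be laid out differently.
$sample  = '<tr><td>203.0.113.7</td><td>8080</td></tr>'
         . '<tr><td>198.51.100.23</td><td>3128</td></tr>';
$proxies = extractProxies($sample);
$options = proxyContextOptions($proxies[array_rand($proxies)]);
$context = stream_context_create($options);
```

The resulting $context can then be passed as the third argument of file_get_contents, exactly as in the original snippet.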
I’ve been trying to access this particular REST service from a PHP page I’ve created on our server. I narrowed the problem down to these two lines. So my PHP page looks like this:
$websiteUrl = "https://www.doofootball.com/";
$dom = file_get_html($websiteUrl);
var_dump($dom);
I remember having a similar problem with simple_html_dom. Suddenly it didn't work any longer without passing a context variable. I don't recall where I found this solution but it has been working for quite some time now. Just try this please and let me know whether this resolves your problem.
$context = stream_context_create(
array(
// 'http' => array(
// 'follow_location' => false
// ),
'ssl' => array(
"verify_peer"=>false,
"verify_peer_name"=>false,
),
)
);
$websiteUrl = "https://www.doofootball.com/";
$dom = file_get_html($websiteUrl, false, $context);
file_get_html expects these parameters:
function file_get_html(
$url,
$use_include_path = false,
$context = null,
$offset = 0,
$maxLen = -1,
$lowercase = true,
$forceTagsClosed = true,
$target_charset = DEFAULT_TARGET_CHARSET,
$stripRN = true,
$defaultBRText = DEFAULT_BR_TEXT,
$defaultSpanText = DEFAULT_SPAN_TEXT)
Don't remember why I commented out the lines with "follow_location" ... You'll figure it out. Good luck!
I have a very simple script that works perfectly on most sites, but not on the main site I want it to work with. The code below accesses a sample site perfectly. However, when I use it on the site I actually want to access, http://www.livescore.com, I get an error.
This works.
<?php
$url = "http://www.cambodia.me.uk";
$page = file_get_contents($url);
$outfile = "contents.html";
file_put_contents($outfile, $page);
?>
This does not work.....
<?php
$url = "http://www.livescore.com";
$page = file_get_contents($url);
$outfile = "contents.html";
file_put_contents($outfile, $page);
?>
and gives the following error
Warning: file_get_contents(http://www.livescore.com)
[function.file-get-contents]: failed to open stream: HTTP request
failed! HTTP/1.0 404 Not Found in C:\Program Files
(x86)\EasyPHP-5.3.8.1\www\Livescore\attempt-1-read-page.php on line 3
Thanks for any assistance
In the common case you can simply tell file_get_contents to follow redirects:
$context = stream_context_create(
array(
'http' => array(
'follow_location' => true
)
)
);
$html = file_get_contents('http://www.example.com/', false, $context);
This site inspects the User-Agent HTTP header and fails if it is missing. Try adding a User-Agent header:
<?php
$context = stream_context_create(
array(
'http' => array(
'header' => "User-agent: chrome",
'ignore_errors' => true,
'follow_location' => true
)
)
);
$html = file_get_contents('http://www.livescore.com/', false, $context);
echo substr($html, 0, 200)."\n";
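With ignore_errors enabled, file_get_contents returns the body even for error responses, so it helps to check which status the server actually sent. The sketch below reads the status code from a list of raw header lines. The helper name lastStatusCode and the example headers are assumptions for illustration, not output captured from the site.

```php
<?php
// Hypothetical helper: return the status code of the last "HTTP/..." status
// line in a list of raw response header lines, or null if there is none.
// After redirects there can be several status lines; the last one is final.
function lastStatusCode(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Assumed example of the header lines a redirected request might produce.
$exampleHeaders = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/home',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
echo lastStatusCode($exampleHeaders), "\n";
```

After an HTTP request made with file_get_contents, PHP fills the local variable $http_response_header with lines of this shape, so lastStatusCode($http_response_header) can be called right after the request.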
Most likely www.livescore.com is doing a hidden redirect which file_get_contents is too basic to catch.
Do you have lynx installed on your server?
$page= shell_exec("lynx -source 'http://www.livescore.com'");
lynx is a full browser and can 'bypass' certain redirects.
I would like to stop a simplexml_load_file call if it takes too long to load and/or the host isn't reachable (occasionally the site serving the XML goes down), since I don't want my site to lag completely when theirs isn't up.
I tried to experiment a bit myself, but haven't managed to make anything work.
Thank you so much in advance for any help!
You can't have an arbitrary function quit after a specified time. What you can do instead is to try to load the contents of the URL first - and if it succeeds, continue processing the rest of the script.
There are several ways to achieve this. The easiest is to use file_get_contents() with a stream context set:
$context = stream_context_create(array('http' => array('timeout' => 5)));
$xmlStr = file_get_contents($url, FALSE, $context);
$xmlObj = simplexml_load_string($xmlStr);
Or you could use a stream context with simplexml_load_file() via the libxml_set_streams_context() function:
$context = stream_context_create(array('http' => array('timeout' => 5)));
libxml_set_streams_context($context);
$xmlObj = simplexml_load_file($url);
You could wrap it as a nice little function:
function simplexml_load_file_from_url($url, $timeout = 5)
{
$context = stream_context_create(
array('http' => array('timeout' => (int) $timeout))
);
$data = file_get_contents($url, FALSE, $context);
if(!$data) {
trigger_error("Couldn't get data from: '$url'", E_USER_NOTICE);
return FALSE;
}
return simplexml_load_string($data);
}
Alternatively, you can consider using cURL (the extension is bundled with PHP, though it may need to be enabled). The benefit of using cURL is that you get fine-grained control over the request and over how the response is handled.
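As a rough illustration of the cURL route, here is a minimal sketch of a fetch with separate connect and total timeouts. The function name fetchWithCurl and the choice to return false on every failure are assumptions made for this example, not part of the answer above.

```php
<?php
// Hypothetical helper: fetch a URL with cURL, giving up after $timeout seconds.
// Returns the response body as a string, or false on any failure.
function fetchWithCurl(string $url, int $timeout = 5)
{
    if (!function_exists('curl_init')) {
        return false; // cURL extension is not loaded
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeout, // limit for establishing the connection
        CURLOPT_TIMEOUT        => $timeout, // limit for the whole transfer
        CURLOPT_FAILONERROR    => true,     // treat HTTP status >= 400 as a failure
    ]);

    $body = curl_exec($ch);
    return is_string($body) ? $body : false;
}
```

The returned string can be handed to simplexml_load_string once it has been checked against false, which keeps the slow network step separate from the XML parsing.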
You should be using a stream context with a timeout option, coupled with file_get_contents:
$context = stream_context_create(array('http' => array('timeout' => 5))); //<---- Setting timeout to 5 seconds...
and then pass that context to file_get_contents:
$xml_load = file_get_contents('http://yoururl', FALSE, $context);
$xml = simplexml_load_string($xml_load);
function PostRequest($url) {
$opts = array('http' =>
array(
'method' => 'POST',
'header' => 'Cookie: testcookie=blah; testcookie2=haha;'
)
);
//$context = stream_context_create($opts);
$context = stream_context_create($opts);
$result = file_get_contents($url, false, $context);
return $result;
}
After sending the cookies, I still get a "not logged in" message, but when I browse the pages with a browser I am logged in.
I sent the request from localhost, then tried to send it via AJAX, but that returned status 0......
Is there any way to send this request?
If you want to play with HTTP scripting, I have a library you can use: https://github.com/toopay/CI-Proxy-Library. It was originally written for CodeIgniter, but with a little tweaking you should be able to use it in any PHP script.
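For a plain-PHP alternative, the sketch below builds the 'http' context options for a form POST that also carries cookies. It is not claimed to fix the login problem, since the thread does not say which cookies or form fields the site checks. The helper name postContextOptions and the field name q are assumptions for illustration.

```php
<?php
// Hypothetical helper: build 'http' context options for a form POST that
// also sends cookies. Multiple header lines are joined with CRLF.
function postContextOptions(array $cookies, array $fields): array
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }

    $headers = [
        'Content-Type: application/x-www-form-urlencoded',
        'Cookie: ' . implode('; ', $pairs),
    ];

    return [
        'http' => [
            'method'        => 'POST',
            'header'        => implode("\r\n", $headers),
            'content'       => http_build_query($fields), // URL-encoded request body
            'ignore_errors' => true, // still return the body on 4xx/5xx responses
        ],
    ];
}

// Cookie values taken from the question; the form field is made up.
$opts = postContextOptions(
    ['testcookie' => 'blah', 'testcookie2' => 'haha'],
    ['q' => 'example']
);
$context = stream_context_create($opts);
```

Passing $context to file_get_contents sends the request. Comparing the cookies listed here with the ones the browser actually sends is a reasonable first check when the server still reports the session as logged out.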
At work we have to go through a proxy to reach anything external (port 80, for example), and each user has their own custom login.
My temporary workaround is using curl to log in as myself through the proxy and access the external data I need.
Is there some sort of advanced PHP setting I can set so that, internally, whenever it tries to invoke something like file_get_contents() it always goes through a proxy? I'm on Windows at the moment, so it'd be a pain to recompile if that's the only way.
The reason my workaround is temporary is that I need a solution that's generic and works for multiple users instead of using one user's credentials (I've considered requesting a separate user account solely for this, but passwords change often and this technique needs to be deployed across a dozen or more sites). Basically, I don't want to hard-code credentials in order to use the curl workaround.
To use file_get_contents() through a proxy that doesn't require authentication, something like this should do:
(I'm not able to test this one: my proxy requires authentication.)
$aContext = array(
'http' => array(
'proxy' => 'tcp://192.168.0.2:3128',
'request_fulluri' => true,
),
);
$cxContext = stream_context_create($aContext);
$sFile = file_get_contents("http://www.google.com", False, $cxContext);
echo $sFile;
Of course, replace the IP and port of my proxy with the ones that are right for yours ;-)
If you're getting this kind of error:
Warning: file_get_contents(http://www.google.com) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.0 407 Proxy Authentication Required
it means your proxy requires authentication.
If the proxy requires authentication, you'll have to add a couple of lines, like this:
$auth = base64_encode('LOGIN:PASSWORD');
$aContext = array(
'http' => array(
'proxy' => 'tcp://192.168.0.2:3128',
'request_fulluri' => true,
'header' => "Proxy-Authorization: Basic $auth",
),
);
$cxContext = stream_context_create($aContext);
$sFile = file_get_contents("http://www.google.com", False, $cxContext);
echo $sFile;
The same applies to the IP and port and, this time, also to LOGIN and PASSWORD ;-) Check out all valid http options.
Now you are passing a Proxy-Authorization header to the proxy, containing your login and password.
And... The page should be displayed ;-)
Use the stream_context_set_default function. It is much easier to use, as you can then call file_get_contents or similar functions directly without passing any additional parameters.
This blog post explains how to use it. Here is the code from that page.
<?php
// Edit the four values below
$PROXY_HOST = "proxy.example.com"; // Proxy server address
$PROXY_PORT = "1234"; // Proxy server port
$PROXY_USER = "LOGIN"; // Username
$PROXY_PASS = "PASSWORD"; // Password
// Username and Password are required only if your proxy server needs basic authentication
$auth = base64_encode("$PROXY_USER:$PROXY_PASS");
stream_context_set_default(
array(
'http' => array(
'proxy' => "tcp://$PROXY_HOST:$PROXY_PORT",
'request_fulluri' => true,
'header' => "Proxy-Authorization: Basic $auth"
// Remove the 'header' option if proxy authentication is not required
)
)
);
$url = "http://www.pirob.com/";
print_r( get_headers($url) );
echo file_get_contents($url);
?>
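Since the question asks to avoid hard-coded credentials, one possible variation is to read the proxy settings from the environment and add the Proxy-Authorization header only when credentials are present. This is a sketch under assumptions: the helper name proxyDefaultOptions, the environment variable names PROXY_HOST, PROXY_PORT, PROXY_USER and PROXY_PASS, and the fallback port 3128 are all made up for illustration.

```php
<?php
// Hypothetical helper: build default-context options for a proxy, adding the
// Proxy-Authorization header only when both credentials are supplied.
function proxyDefaultOptions(string $host, int $port, ?string $user = null, ?string $pass = null): array
{
    $http = [
        'proxy'           => "tcp://$host:$port",
        'request_fulluri' => true,
    ];
    if ($user !== null && $pass !== null) {
        $http['header'] = 'Proxy-Authorization: Basic ' . base64_encode("$user:$pass");
    }
    return ['http' => $http];
}

// Assumed environment variable names; getenv() returns false when unset.
$host = getenv('PROXY_HOST');
if ($host !== false) {
    $user = getenv('PROXY_USER');
    $pass = getenv('PROXY_PASS');
    stream_context_set_default(proxyDefaultOptions(
        $host,
        (int) (getenv('PROXY_PORT') ?: 3128), // fallback port is an assumption
        $user === false ? null : $user,
        $pass === false ? null : $pass
    ));
}
```

With this in a file that every script includes, later calls to file_get_contents or get_headers pick up the proxy without any extra arguments, and each deployment can supply its own values.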
Depending on how the proxy login works, stream_context_set_default might help you.
$context = stream_context_set_default(
array(
'http'=>array(
'header'=>'Authorization: Basic ' . base64_encode('username'.':'.'userpass')
)
)
);
$result = file_get_contents('http://..../...');
There's a similar post here: http://techpad.co.uk/content.php?sid=137 which explains how to do it.
function file_get_contents_proxy($url,$proxy){
// Create context stream
$context_array = array('http'=>array('proxy'=>$proxy,'request_fulluri'=>true));
$context = stream_context_create($context_array);
// Use context stream with file_get_contents
$data = file_get_contents($url,false,$context);
// Return data via proxy
return $data;
}