simple_html_dom does not retrieve data from some websites. For www.google.pl it downloads the source of the page, but for others such as gearbest.com and stooq.pl it does not download any data.
<?php
require('simple_html_dom.php');

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.google.com/"); // works
/*
curl_setopt($ch, CURLOPT_URL, "https://www.gearbest.com/"); // doesn't work
curl_setopt($ch, CURLOPT_URL, "https://stooq.pl/"); // doesn't work
*/
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);

$html = new simple_html_dom();
$html->load($response);
echo $html;
What should I change in the code to receive data from these websites?
The root problem here (at least on my machine; it may be different with your version) is that the site returns gzipped data, and it isn't being decompressed properly by PHP and curl before being passed to the DOM parser. If you are using PHP 5.4+, you can use gzdecode() together with file_get_contents() to decompress it yourself; on older versions you can strip the gzip header off and use gzinflate(), as below.
<?php
// download the site
$data = file_get_contents("http://www.tsetmc.com/loader.aspx?ParTree=151311&i=49776615757150035");
// decompress it (a bit hacky: strip the 10-byte gzip header and the 8-byte trailer)
$data = gzinflate(substr($data, 10, -8));
include("simple_html_dom.php");
// parse and use
$html = str_get_html($data);
echo $html->root->innertext();
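On PHP 5.4+ you can skip the manual header stripping and replace the gzinflate() line with the built-in equivalent:
$data = gzdecode($data);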
Note that this hack will not work on most sites. The underlying cause seems to be that curl doesn't announce that it accepts gzipped data, but the web server on that domain ignores the Accept-Encoding header and gzips the response anyway. Then neither curl nor PHP actually checks the Content-Encoding header on the response; both assume it isn't gzipped, so it is passed through without an error and without being gunzipped. Bugs in both the server and the client here!
For a more robust solution, you can use curl to fetch the headers and inspect them yourself to determine whether you need to decompress the body. Or you can just use this hack for this site and the normal method for others, to keep things simple.
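A minimal sketch of that header-inspection approach, using one of the sites from the question (gzdecode() needs PHP 5.4+):
$ch = curl_init("https://stooq.pl/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HEADER, true); // include the response headers in the output
$response = curl_exec($ch);
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE); // where the headers end
curl_close($ch);
$headers = substr($response, 0, $headerSize);
$body = substr($response, $headerSize);
// only gunzip when the server actually declared gzip encoding
if (preg_match('/^Content-Encoding:\s*gzip/mi', $headers)) {
    $body = gzdecode($body); // PHP 5.4+; use the gzinflate() hack on older versions
}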
It may still also help to set the character encoding on your output. Add this before you echo anything, to ensure the data you fetched isn't corrupted again in the user's browser by being read as the wrong charset:
header('Content-Type: text/html; charset=utf-8');
Related
I want to be able to go to mydoma.in/tunnel.php?file=http://otherdoma.in/music.mp3 and have the data of http://otherdoma.in/music.mp3 streamed to the client.
I tried doing this via header(), but it redirects instead of "tunnelling" the data.
How can I do this?
Use cURL for streaming:
<?php
// Stream the remote file straight through to the client.
$url = $_GET["file"];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);       // don't pass the remote response headers through
curl_setopt($ch, CURLOPT_BUFFERSIZE, 256); // small buffer, so data is flushed out as it arrives
// Without CURLOPT_RETURNTRANSFER, curl_exec() writes the body directly to the output.
curl_exec($ch);
curl_close($ch);
?>
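Since the remote headers are discarded here (CURLOPT_HEADER is 0), the browser won't know the stream is an mp3; you may want to send a Content-Type header yourself before the curl_exec() call:
header('Content-Type: audio/mpeg');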
If the files are small, you might be able to use file_get_contents(). Otherwise, you should probably use cURL: fetch the URL from the GET variable "file", save it to a local temporary location with PHP, then use header() to direct the client to the local file. Deleting the temporary file is the only real issue, as there isn't a reliable way to determine when the client has finished downloading it. You might be able to sleep or delay the file removal, but you may find it's a better option to use a cron job to clean up all of the temporary files later.
Have your PHP script pull the remote content:
$data = file_get_contents($remote_url);
And then just spit it out:
echo $data;
Or simply:
echo file_get_contents($remote_url);
You might have to add some headers to indicate the content type.
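For the mp3 from the question, that could be as simple as (audio/mpeg being the standard MIME type for mp3):
header('Content-Type: audio/mpeg');
echo file_get_contents($remote_url);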
Alternatively, you could configure a proxy with something like nginx -- this will allow you to rewrite particular URLs to a remote site and then serve them as local, no coding required.
I want to allow the user to specify, via a URL variable, which file they would like to download from a remote server, e.g. /download.php?url=fvr_anim_foxintro_V4_01.jpg
<?php
$url = $_GET['url'];
header("Location: http://fvr.homestead.com/files/animation/" . $url);
?>
The above is purely an example I grabbed from Google Images. The problem is that I do not want the end user to see where the file originally comes from, so the server needs to fetch the file and pass it along to the end user. Is there a method of doing this?
I can find many examples for serving files hosted on the server itself, but none for serving files hosted on a remote server. In other words, I would be passing them along. The files would be quite large (up to 100MB).
Thanks in advance!
You can use cURL for this:
<?php
$url = "http://share.meebo.com/content/katy_perry/wallpapers/3.jpg";
$ch = curl_init();
$timeout = 0;
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// Get the response back as a binary string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1); // no effect as of PHP 5.1.3, but harmless
$image = curl_exec($ch);
curl_close($ch);
// output to browser
header("Content-type: image/jpeg");
echo $image;
?>
Source: http://forums.phpfreaks.com/topic/120308-solved-curl-get-image/
Of course, this example is just for an image (as you've suggested) but you can use cURL for all kinds of remote data retrieval via HTTP GET, POST, PUT, DELETE, etc. Search around the web for "php curl" and you'll find an endless supply of information.
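One caveat, since the question mentions files up to 100MB: CURLOPT_RETURNTRANSFER buffers the entire response in PHP's memory, which can hit the memory limit. Here is a minimal sketch of a streaming alternative (assuming $url as in the example above; the closure needs PHP 5.3+), which hands each chunk to the client as it arrives:
$ch = curl_init($url);
header("Content-type: image/jpeg"); // send headers before any body output
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) {
    echo $chunk;           // pass each chunk straight through to the client
    flush();
    return strlen($chunk); // tell cURL the whole chunk was consumed
});
curl_exec($ch);
curl_close($ch);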
The ideal solution would be to use PHP's cURL library, but if you're using shared hosting, keep in mind that this library may be disabled.
Assuming you can use cURL, you simply send a Content-Type header with the appropriate MIME type and then echo the results from curl_exec().
To get a basic idea of how to use the cURL library, look at the example under the curl_init() function.
I'm currently trying to gather some data from Politifact using simple html dom, but a lot of the time I get weird garbage instead of the expected HTML.
The goal is not to bruteforce the site but to request it once or twice a day and cache the result.
Here is most of what I get back:
‹������í]{wÛ6²ÿ»=g¿ªn#»1EËJœÄ–µ×vœ&ÙÄñÚn²{r{|( ’S$ÇeuÛï~3न‡c'ÛísNÄ`f0˜Úß=}sxþ¯“#1ŠÆŽ8ùùàÕ‹CQ3Ló]ëÐ4Ÿž?ÿ|~þú•h66Åy`¹¡Ùžk9¦yt\µQù;¦9™L“...
And here's the super simple code :
$html = file_get_html('http://www.politifact.com/personalities/barack-obama');
print_r($html->plaintext);
Do you have any idea why? Some sort of protection/redirection on the website's side?
Thank you very much!
You received the expected page, but in gzip format. It looks like the server doesn't mind that the Accept-Encoding header is missing from the request, and instead of sending a plain-text response by default, it sends gzipped data anyway.
I don't think simple-html-dom can unzip the data, but you can use cURL for that purpose:
$ch = curl_init('http://www.politifact.com/personalities/barack-obama/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Advertise gzip support and let cURL decompress the response transparently;
// an empty string ('') would accept any encoding cURL supports.
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
$data = curl_exec($ch);
$html = str_get_html($data);
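With that in place, the snippet from the question should work as expected:
print_r($html->plaintext);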
I am trying to set a cookie through cURL in PHP but it fails. My PHP code looks like this:
<?php
$ch = curl_init();
$url = "http://localhost/javascript%20cookies/test_cookies.html";
curl_setopt($ch, CURLOPT_COOKIE, 'user=1'); // cookie sent with the request to the server
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
?>
The file test_cookies.html contains JavaScript that checks for a cookie and, if it finds it, displays the content with additional user content.
But when I use the above script, it displays the contents of test_cookies.html without the additional user content, which means the cookie is not being set.
I tried writing another script like this:
<?php
header("Set-Cookie:user=1");
header("Location:test_cookies.html");
?>
This works: it sets the cookie and shows the additional user content too.
I also tried using
curl_setopt($ch,CURLOPT_COOKIEFILE,"cookie.txt");
curl_setopt($ch,CURLOPT_COOKIEJAR,"cookie.txt");
This writes the cookie information to the file, but doesn't read it back when fetching the page.
Can somebody help?
Since the JavaScript conducts the check in the browser, you need to set the cookie before sending the output to the browser. So you need to combine both of your scripts:
$ch = curl_init();
$url = "http://localhost/javascript%20cookies/test_cookies.html";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
header("Set-Cookie: user=1"); // set the cookie in the browser before any output is sent
echo $contents;
Explanation:
Please note that we are looking at two transfers here:
data is fetched by curl, and then
it is sent to the browser.
This is a special case where you are using curl to get the content from localhost, but in real-life uses you'd use curl to get content from a third-party host.
If you receive different content depending on whether a cookie is sent in the request, then you should set the cookie with curl; in most cases you can then send the content to the browser with no additional tweaking. But here the decision is made by checking for a cookie in the browser, so you don't need the first kind of cookie setting, and you do need the second one.
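For that third-party case, the cookie goes on the curl request itself, exactly as in the question's first script:
curl_setopt($ch, CURLOPT_COOKIE, 'user=1'); // sent to the remote server, not to the browser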
JavaScript will not run on a page fetched with curl; it only executes in the browser.
I want to use the SimpleXML class in PHP 5 to handle a small XML file. But to obtain that file, the script has to send a specific POST request to a remote server, which will "give" me an XML file in return. So I believe I can't use the simplexml_load_file() method. The file is only needed for processing; afterwards it can, or even should, be gone/deleted.
I've got an HTTP header of this type:
$header = 'POST '.$gateway.' HTTP/1.0'."\r\n" .
'Host: '.$server."\r\n".
'Content-Type: application/x-www-form-urlencoded'."\r\n".
'Content-Length: '.strlen($param)."\r\n".
'Connection: close'."\r\n\r\n";
And I have not much idea of what to do next with it. There is fsockopen, but I'm not sure whether that would be appropriate or how to go about it.
My advice would be to use something like the Zend_Http_Client library or cURL. Getting everything right with fsockopen will be a pain to debug.
Zend_Http_Client has a nice interface and would work fabulously.
cURL isn't too much of a pain either, and it is already part of most PHP builds.
Example below:
$ch = curl_init();
// set the URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/"); // replace with your URL
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $param); // the url-encoded POST body from your header example
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch); // returns the XML string of data
// Parse the output into SimpleXML.
// You'll probably want to validate here that the returned output really is XML.
$xml = simplexml_load_string($output);
I'd use an HTTP client library like Zend_Http_Client (or cURL if you're a masochist) to create the POST request, then feed the response body into simplexml_load_string() or SimpleXMLElement::__construct().