PHP - characters instead of HTML

Why does the code below not print HTML content?
$url = 'http://clashofclans.com';
echo file_get_contents($url);
It works with every other website, but not with this $url. I get this:
‹í}}{ÛƱïÿùÛ[É-á…IÛrŽÍØqzœØO¤º·'ÍÕ ˆ˜$ÔK÷;¿™¾åö9µËÅîìÌì¼íìxúå×o»çÿx÷Rd£á³/žâ¢ ƒñÕi#7P½g_hÚÓQ”Z8¦i”6fYßh7NæwY61¢_fñõiãÿ{nt“Ñ$ÈâËaÔÐÂdœEcêöíËÓ¨w•;Ž

Because the response content is gzipped.
Try gzdecode:
gzdecode(file_get_contents($url));
Consider using cURL instead, which can decompress the response for you (when CURLOPT_ENCODING is set) and should be more robust, as described in this SO answer.
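A minimal sketch of both suggestions, assuming PHP 8 with the zlib and curl extensions loaded. The function names maybe_gunzip() and fetch_with_curl() are hypothetical. Checking the gzip magic bytes (0x1f 0x8b) before calling gzdecode() means the same code also works for sites that send uncompressed HTML.

```php
<?php
// Hypothetical helper: decompress only when the body starts with the gzip
// magic bytes 0x1f 0x8b. gzdecode() is provided by the zlib extension.
function maybe_gunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or corrupt data): return the body unchanged.
    return $body;
}

// Hypothetical cURL variant: an empty CURLOPT_ENCODING makes libcurl advertise
// every encoding it supports and decompress the response automatically.
// The handle is freed when $ch goes out of scope (PHP 8+).
function fetch_with_curl(string $url): string|false
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '',
    ]);
    return curl_exec($ch);
}
```

Usage: $html = file_get_contents($url); if ($html !== false) { echo maybe_gunzip($html); }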

Related

PHP - file_get_html not returning anything

I am trying to scrape data from this site. Using "Inspect" I checked the class of the div, but when I try to get it, nothing is displayed:
Trying to get the "Diamond" below "Supremacy".
What I am using:
<?php
include('simple_html_dom.php');
$memberName = $_GET['memberName'];
$html = file_get_html('https://destinytracker.com/d2/profile/pc/'.$memberName.'');
preg_match("/<div id=\"dtr-rating\".*span>/", $html, $data);
var_dump($data);
?>
FYI, simple_html_dom is a package available on SourceForge at http://simplehtmldom.sourceforge.net/. See the documentation.
file_get_html(), from simple_html_dom, does not return a string; it returns an object that has methods you can call to traverse the HTML document. To get a string from the object, do:
$url = 'https://destinytracker.com/d2/profile/pc/' . $memberName;
$html_str = file_get_html($url)->outertext; // whole document as an HTML string (->plaintext would strip the tags)
But if you are going to do that, you might as well just do:
$html_str = file_get_contents($url);
and then run your regex on $html_str.
BUT ... if you want to use the power of simple_html_dom ...
$html_obj = file_get_html($url);
$the_div = $html_obj->find('div[id=dtr-rating]', 0); // null if no such div exists
$inner_str = $the_div ? $the_div->innertext : '';
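For comparison, the same lookup can be done without simple_html_dom by using PHP's built-in DOM extension (DOMDocument and DOMXPath). This is a swapped-in technique, not the simple_html_dom API, and div_text_by_id() is a hypothetical name. It returns null when no matching div exists instead of failing on a missing element.

```php
<?php
// Hypothetical helper using the DOM extension: returns the trimmed text of the
// first <div> with the given id, or null if there is no such div.
function div_text_by_id(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $id contains no double quotes, since it is spliced into the XPath expression.
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

Usage: $rating = div_text_by_id(file_get_contents($url), 'dtr-rating');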
I'm not sure how to do exactly what you want, because when I look at the source of the web link you provided, I cannot find a <div> with id="dtr-rating".
My other answer is about using simple_html_dom. After looking at the HTML document in more detail, I see the problem is different from what I first thought (I'll leave that answer there for pointers on better use of simple_html_dom).
I see that the web page you are scraping is a VueJS application. That means the HTML sent by the web server causes JavaScript to run and build the dynamic contents of the page that you see displayed. So the <div> you are looking for with your regex DOES NOT EXIST in the HTML sent by the server. Your regex cannot find it because it is not there.
In Chrome, press Ctrl+U to see what the web server sent (no "Supremacy"). Press Ctrl+Shift+I and look under the "Elements" tab to see the HTML after the JavaScript has done its magic (this does have "Supremacy").
This means you won't be able to get the data you want by fetching the initial HTML of the page and scraping it.

file_get_html() not working with airbnb

I have a problem with file_get_html(); I don't understand why it doesn't work. Can you help me? My code:
$html = file_get_html('https://www.airbnb.fr/');
if ($html) {
echo "good";
}
Have a good day!
I think the server just blocks your request; you will not be able to fetch data from it using simple HTTP requests.
You can try using cURL, proxies, or both (there are ready-to-use solutions for this, such as AngryCurl or RollingCurl).
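Before reaching for cURL or a proxy library, one cheap thing to try is sending browser-like headers with the plain stream functions. This is a minimal sketch that assumes the server filters on request headers, which may well not be the case. browser_context() is a hypothetical helper and the header values are illustrative.

```php
<?php
// Hypothetical helper: a stream context that makes file_get_contents() send
// browser-like headers. The header values are illustrative assumptions; a site
// that blocks scrapers may still refuse the request.
function browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'user_agent'      => $userAgent,
            'header'          => "Accept: text/html\r\nAccept-Language: fr-FR,fr;q=0.9\r\n",
            'follow_location' => 1,
            'timeout'         => 10,
            // Return the body even for 4xx/5xx responses so the error page can be inspected.
            'ignore_errors'   => true,
        ],
    ]);
}

// Usage (performs a real network request):
// $html = file_get_contents('https://www.airbnb.fr/', false, browser_context('Mozilla/5.0'));
// if ($html === false) { echo "request failed"; }
```

The 'http' context options also apply to https:// URLs, so the same context works for the airbnb.fr address.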
It doesn't work because you have to include the simple_html_dom library to make it work. You can find the code on its official page:
http://simplehtmldom.sourceforge.net/
Then you can simply get the HTML and output it like this:
include('simple_html_dom.php');
// Print the full HTML (including tags)
echo file_get_html('http://www.google.com/')->outertext;
or, if you want to save the result in a variable:
// Store the full HTML (including tags); use ->plaintext for the text without tags
$html = file_get_html('http://www.google.com/')->outertext;
More info: http://simplehtmldom.sourceforge.net/

How to use snoopy class in PHP?

I'm a PHP beginner. I'm making a simple program that crawls a website (no private information). The result I expected is the HTML code, like:
<html><head><title>blabla blabla</title></head>...................
But when I checked the result, the rendered page showed up, not the raw code. For example:
include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->fetch("http://stackoverflow.com/");
echo $snoopy->results;
How do I get the HTML code? Also, is there another good parsing library in PHP (like BeautifulSoup in Python or Jsoup in Java)?
**The result of the above code: not HTML code, but the rendered page.**
To see the source code in your browser instead of having it render the HTML, your last line should be:
echo htmlspecialchars($snoopy->results);
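A small extension of that idea, as a sketch: wrapping the escaped markup in <pre> keeps the original line breaks and indentation readable. html_as_source() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: escape fetched HTML so the browser shows it as text,
// wrapped in <pre> so line breaks and indentation are preserved.
function html_as_source(string $html): string
{
    return '<pre>' . htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

// Usage with Snoopy, as in the question:
// echo html_as_source($snoopy->results);
```

Another option is to send header('Content-Type: text/plain; charset=utf-8'); before any output and echo the raw HTML, so the browser does not interpret it as a page at all.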
It's very simple
// Add snoopy class and initiate it
require "snoopy/Snoopy.class.php";
$snoopy = new Snoopy;
// This fetches the HTML
$snoopy->fetch("http://www.php.net/");
$text = $snoopy->results;
// This fetches the text with html tags stripped
$snoopy->fetchtext("http://www.php.net/");
$text = $snoopy->results;
// This fetches all the links
$snoopy->fetchlinks('http://www.php.net/');
$linksarray = $snoopy->results;
Snoopy works great for me, so I hope that helps.
If you want to fetch the HTML from a URL, you can simply use PHP's file_get_contents() function.
$url = 'http://stackoverflow.com/';
$html = file_get_contents($url);
// echo $url -> wrong
echo $html;

How to extract source code from https://twitter.com in PHP

I am trying to download the source code of a Twitter page with PHP:
$continut_pp = file_get_contents('https://twitter.com/');
echo $continut_pp;
The problem is that the result is null. I think the problem comes from HTTPS. How can I get the source of an HTTPS page in PHP?
Try using the file_get_contents function. Just give it the full web address and it should return the HTML source. I hope this helps.
Since the first function I suggested did not work, you could try this one: var markup = document.documentElement.innerHTML; However, it is JavaScript, not PHP.
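To test the HTTPS hypothesis from the question: a failed file_get_contents() call returns false (which echo prints as nothing) and emits a warning, and for https:// URLs two common causes are that the OpenSSL extension is not loaded (so the https:// stream wrapper is not registered) or that allow_url_fopen is disabled. A minimal diagnostic sketch; https_fetch_diagnostics() is a hypothetical helper.

```php
<?php
// Hypothetical helper: report the prerequisites for file_get_contents() on https:// URLs.
function https_fetch_diagnostics(): array
{
    return [
        // The https:// stream wrapper is only available when OpenSSL is loaded.
        'openssl_loaded'  => extension_loaded('openssl'),
        'https_wrapper'   => in_array('https', stream_get_wrappers(), true),
        // With allow_url_fopen off, file_get_contents() cannot open any URL.
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
    ];
}
```

Usage: run var_dump(https_fetch_diagnostics()); once. If any value is false, fix that configuration in php.ini before looking elsewhere.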

Encoding and decoding in PHP doesn't work

In my HTML file I am encoding a URL like this:
encodeURIComponent(url);
In my PHP file I use $_GET to grab the URL and sanitize it:
$url = filter_var(filter_var(($_GET['url']), FILTER_SANITIZE_URL),FILTER_SANITIZE_STRING);
I'm having trouble when I add other parameters to the URL. When I remove the JavaScript encodeURIComponent() call, everything works, but I would really like to use encodeURIComponent(), so I tried decoding the URL on the PHP side:
$url = filter_var(filter_var((urldecode($_GET['url'])), FILTER_SANITIZE_URL),FILTER_SANITIZE_STRING);
It doesn't work, and I tried other things with no luck. I don't see why the above wouldn't work. What else can I try? What am I doing wrong?
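One point worth checking: PHP already percent-decodes query-string values before putting them in $_GET, and the PHP manual for urldecode() warns that decoding $_GET values again can give unexpected results. The sketch below uses parse_str(), which performs the same single decoding, to simulate a request without a web server. url_param_from_query() is a hypothetical helper; it validates with FILTER_VALIDATE_URL because FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.

```php
<?php
// Hypothetical helper: read a "url" parameter that the client encoded with
// encodeURIComponent(). $query stands in for the raw query string; in a real
// request you would read $_GET['url'] directly, since PHP has already decoded it.
function url_param_from_query(string $query): ?string
{
    parse_str($query, $params); // decodes exactly once, like $_GET
    $url = $params['url'] ?? null;
    if (!is_string($url)) {
        return null;
    }
    // Validate rather than sanitize; returns false for anything that is not a URL.
    $valid = filter_var($url, FILTER_VALIDATE_URL);
    return $valid === false ? null : $valid;
}
```

For example, encodeURIComponent('http://example.com/page?a=1&b=2') produces http%3A%2F%2Fexample.com%2Fpage%3Fa%3D1%26b%3D2, and url_param_from_query('url=' . that string) gives back the original URL with no extra urldecode() call.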