scraping a webpage returns encrypted characters - php

I have tried quite a few methods of downloading the page below using PHP ($url = '';). However, I always receive a page with encrypted characters.
I searched for possible solutions before posting and tried out a few, but I haven't been able to get any of them to work yet.
Please see the methods I have tried below and suggest a solution. I am looking for a PHP solution.
Approach 1 - using file_get_contents - returns encrypted characters
//$contents = file_get_contents($url, $use_include_path, $context, $offset);
$url = '';
$html = str_get_html(utf8_encode(file_get_contents($url)));
echo $html;
Approach 2 - using file_get_html - returns encrypted characters
$url = '';
$encoded = htmlentities(utf8_encode(file_get_html($url)));
echo $encoded;
Approach 3 - using gzread - returns blank page
$url = '';
$fp = gzopen($url,'r');
$contents = '';
while ($html = gzread($fp, 256000)) {
    $contents .= $html;
}
Approach 4 - using gzinflate - returns empty page
//function gzdecode($data)
// return gzinflate(substr($data,10,-8));
//$contents = file_get_contents($url, $use_include_path, $context, $offset);
$url = '';
$html = str_get_html(utf8_encode(file_get_contents($url)));
echo gzinflate(substr($html,10,-8));
Approach 5 - using fopen and fgets - returns encrypted characters
$handle = fopen($url, "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        echo $line;
    }
} else {
    // error opening the file.
    echo "could not open the wikipedia URL!";
}
Approach 6 - adding ob_start at the beginning of script - page does not load
$url = '';
$handle = fopen($url, "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        echo $line;
    }
} else {
    // error opening the file.
    echo "could not open the wikipedia URL!";
}
Approach 7 - using curl - returns empty page
$url = '';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($cr, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_TIMEOUT,5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
$return = curl_exec($ch);
$info = curl_getinfo($ch);
$html = str_get_html("$return");
echo $html;
Approach 8 - using R - returns encrypted characters
> thepage = readLines('')
There were 29 warnings (use warnings() to see them)
> thepage[1:5]
[1] "\037‹\b"
[2] "+SC®\037\035ÕpšÐ\032«F°{¼…àßá$\030±ª\022ù˜ú×Gµ."
[3] "\023\022&ÒÅdDjÈÉÎŽj\t¹Iꬩ\003ä\fp\024“ä(M<©U«ß×Ðy2\tÈÂæœ8ž­\036â!9ª]ûd<¢QR*>öÝdpä’kß!\022?ÙG~è'>\016¤ØÁ\0019Re¥†\0264æ’؉üQâÓ°Ô^—\016\t¡‹\\:\016\003Š]4¤aLiˆ†8ìS\022Ão€'ðÿ\020a;¦Aš`‚<\032!/\"DF=\034'EåX^ÔˆÚ4‰KDCê‡.¹©¡ˆ\004Gµ4&8r\006EÍÄO\002r|šóóZðóú\026?\0274Š ½\030!\týâ;W8Ž‹k‡õ¬™¬ÉÀ\017¯2b1ÓA< \004„š€&J"
[4] "#ƒˆxGµz\035\032Jpâ;²C‡u\034\004’Ñôp«e^*Wz-Óz!ê\022\001èÌI\023ä;LÖ\v›õ‡¸O⺇¯Y!\031þ\024-mÍ·‡G#°›„¦Î#º¿ÉùÒò(ìó¶³f\177¤?}\017½<Cæ_eÎ\0276\t\035®ûÄœ\025À}rÌ\005òß$t}ï/IºM»µ*íÖšh\006\t#kåd³¡€âȹE÷CÌG·!\017ý°èø‡x†ä\a|³&jLJõìè>\016ú\t™aᾞ[\017—z¹«K¸çeØ¿=/"
[5] "\035æ\034vÎ÷Gûx?Ú'ûÝý`ßßwö¯v‹bÿFç\177F\177\035±?ÿýß\177þupþ'ƒ\035ösT´°ûï¢<+(Òx°Ó‰\"<‘G\021M(ãEŽ\003pa2¸¬`\aGýtÈFíî.úÏîAQÙ?\032ÉNDpBÎ\002Â"
Approach 9 - using BeautifulSoup (python) - returns encrypted characters
import urllib
htmltext = urllib.urlopen("").read()
print htmltext
Approach 10 - using wget on the linux terminal - gets a page with encrypted characters
wget -O page
Approach 11 -
tried manually by pasting the url to the below service - works
Approach 12 -
tried manually by pasting the url to the below service - works
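The R output in Approach 8 starts with "\037‹\b", which matches the two magic bytes of a gzip stream (0x1f 0x8b) followed by the deflate method byte. That suggests the "encrypted" characters are a gzip-compressed response body rather than encryption. A minimal sketch under that assumption follows; the helper name decodeIfGzipped is hypothetical, and gzdecode() requires the zlib extension.

```php
<?php
// Hypothetical helper: return the body as-is, or inflated when it looks gzip-compressed.
function decodeIfGzipped(string $body): string
{
    // gzip streams begin with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or decoding failed): hand back the original bytes.
    return $body;
}
```

With this in place, Approach 1 could be tried as echo decodeIfGzipped(file_get_contents($url));. For the cURL variant, setting CURLOPT_ENCODING to an empty string asks cURL to accept every encoding it supports and to decode the response itself, which may make the helper unnecessary.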


PHP file_get_contents and cURL raise http error 500 internal server

I am using a function inside a PHP class that reads images from an array of URLs and writes them to the local computer.
Something like below:
function ImageUpload($urls)
{
    $image_urls = explode(',', $urls);
    foreach ($image_urls as $url) {
        $url = trim($url);
        $img_name = //something
        $source = file_get_contents($url);
        $handle = fopen($img_name, "w");
        fwrite($handle, $source);
        fclose($handle);
    }
}
It successfully reads and writes 1 or 2 images, but raises a 500 Internal Server Error when reading the 2nd or 3rd image.
There is nothing important in the Apache log file. I also replaced the file_get_contents call with the following cURL statements, but the result is the same (it seems cURL reads one more image than file_get_contents).
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER,false);
$source = curl_exec($ch);
Also, the problem only occurs when reading from http URLs; if the images are stored somewhere local, reading and writing them works without any problem.
I don't see any handle for reading in the loop. Your $handle = fopen($img_name, "w"); is only for writing; you also need $handle = fopen($img_name, "r"); for reading, because you can't read (fread()) from a handle opened with fopen($img_name, "w").
Additional answer:
Could you change it to the following and see if it works:
$img_name = //something
$context = stream_context_create($image_urls );
$source= file_get_contents( $url ,false,$context);
I have made some changes to your code, hope that helps :)
$opts = array(
    'http' => array(
        'header' => "Content-Type: text/html; charset=utf-8"
    )
);
$context = stream_context_create($opts);
$image_urls = explode(',', $urls);
foreach ($image_urls as $url) {
    $result = file_get_contents(trim($url), TRUE, $context);
    if ($result === FALSE) {
        print "Error with this URL : " . $url . "<br />";
    }
    $handle = fopen($img_name, "a+");
    fwrite($handle, $result);
    fclose($handle);
}

PHP Get metadata of remote .mp3 file (from URL)

I am trying to get the song name, artist name, song length, bitrate, etc. from a remote .mp3 file.
I have tried the getID3 script, but from what I understand it doesn't work for remote files, as I got this error: "Remote files are not supported - please copy the file locally first"
Also, this code:
$tag = id3_get_tag( "" );
did not work either.
"Fatal error: Call to undefined function id3_get_tag() in /home4/shiro/public_html/scr/index.php on line 2"
As you haven't mentioned your error, I am assuming the common undefined function case.
The error you get (undefined function) means the ID3 extension is not enabled in your PHP configuration:
If you don't have the ID3 extension file, just check here for installation info.
Firstly, I didn’t create this; I’m just making it easy to understand with a full example.
You can read more of it here, but only because of
To begin, download this library from here:
When you open the zip folder, you’ll see ‘getid3’. Save that folder in to your working folder.
Next, create a folder called “temp” in that working folder that the following script is going to be running from.
Basically, what it does is download the first 64k of the file, and then read the metadata from the file.
I enjoy a simple example. I hope this helps.
require_once 'getid3/getid3.php'; // the 'getid3' folder saved in the working folder

$url_media = "";
$a = getfileinfo($url_media);
echo $a['tags']['id3v2']['album'][0] . "\n";
echo $a['tags']['id3v2']['artist'][0] . "\n";
echo $a['tags']['id3v2']['title'][0] . "\n";
echo $a['tags']['id3v2']['year'][0] . "\n";
echo "\n-----------------\n";
echo "-----------------\n";

function getfileinfo($remoteFile)
{
    $url = $remoteFile;
    $uuid = uniqid("designaeon_", true);
    // Temporary local copy inside the "temp" folder created earlier
    $file = "temp/" . $uuid . ".mp3";
    $ch = curl_init($remoteFile);
    //==============================Get Size==========================//
    $contentLength = 'unknown';
    $ch1 = curl_init($remoteFile);
    curl_setopt($ch1, CURLOPT_NOBODY, true);
    curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch1, CURLOPT_HEADER, true);
    curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true); //not necessary unless the file redirects (like the PHP example we're using here)
    $data = curl_exec($ch1);
    curl_close($ch1);
    if (preg_match('/Content-Length: (\d+)/', $data, $matches)) {
        $contentLength = (int)$matches[1];
    }
    // $size stays 0 when the server did not report a length
    $size = is_int($contentLength) ? $contentLength : 0;
    //==============================Get Size==========================//
    if (!$fp = fopen($file, "wb")) {
        echo 'Error opening temp file for binary writing';
        return false;
    } else if (!$urlp = fopen($url, "r")) {
        echo 'Error opening URL for reading';
        return false;
    }
    try {
        $to_get = 65536; // 64 KB
        $chunk_size = 4096; // Haven't bothered to tune this, maybe other values would work better??
        $got = 0;
        $data = null;
        // Grab the first 64 KB of the file
        while (!feof($urlp) && $got < $to_get) {
            $data = $data . fgets($urlp, $chunk_size);
            $got += $chunk_size;
        }
        fwrite($fp, $data);
        // Grab the last 64 KB of the file, if we know how big it is.
        if ($size > 0) {
            curl_setopt($ch, CURLOPT_FILE, $fp);
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_RESUME_FROM, $size - $to_get);
            curl_exec($ch);
        }
        curl_close($ch);
        // Now $fp should be the first and last 64KB of the file!!
        fclose($fp);
        fclose($urlp);
    } catch (Exception $e) {
        echo 'Error transferring file using fopen and cURL !!';
        return false;
    }
    $getID3 = new getID3;
    $ThisFileInfo = $getID3->analyze($file);
    return $ThisFileInfo;
}

php my crawler crash after some time segmentation fault error

I am a newbie in PHP, and with my limited knowledge I built a script in PHP, but after some time it crashes.
I tested it on 5-6 different Linux distributions (Debian, Ubuntu, Red Hat, Fedora, etc.). Only on Fedora does it not crash, but after 3-4 hours of running it stops without giving me any error. The process remains open; it doesn't crash, it just stops working, and this happens only on Fedora.
Here's my script code:
include_once 'simple_html_dom.php'; // provides file_get_html()
ini_set('max_execution_time', 0);
$file = fopen("t.txt", "r");
while (!feof($file)) {
    $line = fgets($file);
    $line = trim($line);
    $line = crawler($line);
}
fclose($file);

function crawler($line) {
    $site = $line;
    // Check target.
    $agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; pt-pt) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $line);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode <= 300) {
        $check2 = $html = @file_get_html($site);
        if ($check2 === false) {
            return $line;
        } else {
            foreach ($html->find('a') as $element) {
                $checkurl = parse_url($element->href);
                $checkline = parse_url($line);
                if (isset($checkurl['scheme'], $checkurl['host'])) {
                    if ($checkurl['host'] !== $checkline['host']) {
                        $split = str_split($checkurl['host']);
                        $replacethis = ".";
                        $replacewith = "dot";
                        for ($i = 0; $i < count($split); $i++) {
                            if ($split[$i] == $replacethis) {
                                $split[$i] = $replacewith;
                            }
                        }
                        foreach ($split as $element2) {
                            if (!chdir($element2)) { mkdir($element2); chdir($element2); }
                        }
                        $save = fopen('results.txt', 'a');
                        $txt = "$line,$element->innertext\n";
                        fwrite($save, $txt);
                        fclose($save);
                    }
                }
            }
        }
    }
}
So my script crawls all backlinks from the targets I specified in t.txt, but only outgoing backlinks. Then it descends into directories and saves the information.
Here are the errors I got:
Allowed memory size of 16777216 bytes exhausted (tried to allocate 24 bytes)
Segmentation fault (core dumped)
It seems there is a bug somewhere, something is wrong. Any idea? Thanks.
Such an error can be thrown when you run out of free memory. I believe it happens inside simple_html_dom. You need to call
void clear () Clean up memory.
when using it in a loop, according to its documentation.
Also, you perform two HTTP requests for each line, but a single cURL request is enough. Just save the response
$html = curl_exec($ch);
and then use str_get_html($html) instead of file_get_html($site);
It's also bad practice to use the error suppression operator @. If the call can throw an exception, it is better to handle it with a try ... catch construct.
Also, you don't need to do things like
$site = $line;
just use $line
Finally, instead of your long line $save = fopen('results.txt', 'a');............... you can simply use file_put_contents()
I also suggest printing to the console what the script is currently doing, for example
echo "getting HTML from URL ".$line;
echo "parsing text...";
so you can follow the process.
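The advice above (fetch each page once, parse the saved response, keep memory bounded) can be sketched without simple_html_dom at all. The sketch below swaps in PHP's built-in DOMDocument as the parser, which is a different technique from the str_get_html() call the answer recommends; the function name outgoingLinks is hypothetical, and the dom extension is assumed to be installed.

```php
<?php
// Hypothetical helper: list [href, link text] pairs that point to a host
// different from the page they were found on.
function outgoingLinks(string $html, string $sourceUrl): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $sourceHost = parse_url($sourceUrl, PHP_URL_HOST);

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // Relative links have no host and are therefore skipped.
        if (is_string($host) && $host !== $sourceHost) {
            $links[] = [$href, trim($a->textContent)];
        }
    }
    return $links;
}
```

In the crawler, the string passed in would be the result of the single curl_exec($ch) call (with CURLOPT_RETURNTRANSFER set), and each pair could be appended with file_put_contents('results.txt', "$line,$text\n", FILE_APPEND). Because the document object goes out of scope when the function returns, nothing accumulates between iterations.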

Youtube channel subscriber count

I'm trying to pull the subscriber count for a particular YouTube channel. I looked at some links on Stack Overflow as well as external sites, and came across links like this. Almost all of them suggested using the YouTube GData API and pulling the count from subscriberCount, but the following code
$data = file_get_contents("");
$xml = simplexml_load_string($data);
returns no such subscriberCount. Is there any other way of getting subscribers count or am I doing something wrong?
The YouTube API v2.0 is deprecated. Here's how to do it with 3.0. OAuth is not needed.
1) Log in to a Google account and go to You may have to start a new project.
2) Navigate to APIs & auth and go to Public API Access -> Create a New Key
3) Choose the option you need (I used 'browser applications') This will give you an API key.
4) Navigate to your channel in YouTube and look at the URL. The channel ID is here:
5) Use the API key and channel ID to get your result with this query:
Great success!
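The steps above can be sketched in PHP as follows. The endpoint and field names are my understanding of the YouTube Data API v3 channels.list call (part=statistics, with the count under items[0].statistics.subscriberCount), so treat them as assumptions to check against the documentation; both function names are hypothetical.

```php
<?php
// Build the request URL from the channel ID and API key obtained in the steps above.
function buildChannelStatsUrl(string $channelId, string $apiKey): string
{
    return 'https://www.googleapis.com/youtube/v3/channels?' . http_build_query([
        'part' => 'statistics',
        'id'   => $channelId,
        'key'  => $apiKey,
    ]);
}

// Extract subscriberCount from the JSON response body; null when it is absent.
function parseSubscriberCount(string $json): ?int
{
    $data = json_decode($json, true);
    if (!isset($data['items'][0]['statistics']['subscriberCount'])) {
        return null;
    }
    // The API returns the count as a string, so cast it.
    return (int) $data['items'][0]['statistics']['subscriberCount'];
}
```

A call would then look like parseSubscriberCount(file_get_contents(buildChannelStatsUrl($channelId, $apiKey))), where both variables hold your own values. Returning null rather than 0 keeps "channel not found" distinguishable from a channel with no subscribers.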
Documentation is actually pretty good, but there's a lot of it. Here's a couple of key links:
Channel information documentation:
"Try it" page:
Try this ;)
$data = file_get_contents('');
$xml = new SimpleXMLElement($data);
$stats_data = (array)$xml->children('yt', true)->statistics->attributes();
$stats_data = $stats_data['@attributes'];
/********* OR **********/
$data = file_get_contents('');
$data = json_decode($data, true);
$stats_data = $data['entry']['yt$statistics'];
echo 'lastWebAccess = '.$stats_data['lastWebAccess'].'<br />';
echo 'subscriberCount = '.$stats_data['subscriberCount'].'<br />';
echo 'videoWatchCount = '.$stats_data['videoWatchCount'].'<br />';
echo 'viewCount = '.$stats_data['viewCount'].'<br />';
echo 'totalUploadViews = '.$stats_data['totalUploadViews'].'<br />';
I could do it with a regex for my page; I'm not sure whether it works for you. Check the following code:
$channel = '';
$t = file_get_contents($channel);
$pattern = '/yt-uix-tooltip" title="(.*)" tabindex/';
preg_match($pattern, $t, $matches, PREG_OFFSET_CAPTURE);
echo $matches[1][0];
//this code was written by Abdu ElRhoul
function retrieveContent($url) {
    $file = fopen($url, "rb");
    if (!$file)
        return "";
    $salida = "";
    while (feof($file) === false) {
        $line = fgets($file, 1024);
        $salida .= $line;
    }
    fclose($file);
    return $salida;
}
$content = retrieveContent(""); //replace rhoula with the channel name
$start = strpos($content, '<span class="about-stat"><b>');
$end = strpos($content, '</b>', $start + 1);
$output = substr($content, $start, $end - $start);
echo "Number of Subscribers = $output";
echo get_subscriber("UCOshmVNmGce3iwozz55hpww");
function get_subscriber($channel, $use = "user") {
    $subs = 0;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "".$use."/".$channel."/about?disable_polymer=1");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POST, 0);
    curl_setopt($ch, CURLOPT_REFERER, '');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0');
    $result = curl_exec($ch);
    $R = curl_getinfo($ch);
    curl_close($ch);
    if ($R["http_code"] == 200) {
        $pattern = '/yt-uix-tooltip" title="(.*)" tabindex/';
        // Only read the match when the pattern was actually found
        if (preg_match($pattern, $result, $matches, PREG_OFFSET_CAPTURE)) {
            $subs = intval(str_replace(',', '', $matches[1][0]));
        }
    }
    // Retry with the "channel" URL form when the "user" form gave nothing
    if ($subs == 0 && $use == "user") return get_subscriber($channel, "channel");
    return $subs;
}

Caching JSON output in PHP

I've got a slight issue. I've been playing with the Facebook and Twitter APIs and getting the JSON output of status search queries with no problem. However, I've read up further and realised that I could end up being "rate limited", as quoted from the documentation.
I was wondering whether it is easy to cache the JSON output each hour so that I can at least try to prevent this from happening. If so, how is it done? I tried a YouTube video, but it only showed how to write the contents of a directory listing to a cache.php file. It didn't point out whether this can be done with JSON output, and it certainly didn't say how to use a time interval of 60 minutes or how to get the information back out of the cache file.
Any help or code would be very much appreciated, as there seem to be very few tutorials on this sort of thing.
Here's a simple function that adds caching to getting some URL contents:
function getJson($url) {
    // cache files are created like cache/abcdef123456...
    $cacheFile = 'cache' . DIRECTORY_SEPARATOR . md5($url);
    if (file_exists($cacheFile)) {
        $fh = fopen($cacheFile, 'r');
        $size = filesize($cacheFile);
        $cacheTime = trim(fgets($fh));
        // if data was cached recently, return cached data
        if ($cacheTime > strtotime('-60 minutes')) {
            return fread($fh, $size);
        }
        // else delete cache file
        fclose($fh);
        unlink($cacheFile);
    }
    $json = /* get from Twitter as usual */;
    $fh = fopen($cacheFile, 'w');
    fwrite($fh, time() . "\n");
    fwrite($fh, $json);
    fclose($fh);
    return $json;
}
It uses the URL to identify cache files, so a repeated request to the identical URL will be read from the cache the next time. It writes the timestamp into the first line of the cache file, and cached data older than an hour is discarded. It's just a simple example and you'll probably want to customize it.
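For readers who want something runnable end to end, here is a minimal sketch of the same scheme: an md5 of the URL as the file name, the timestamp on the first line, and a one-hour lifetime. The function name getCached, the $fetch callback (standing in for the actual API request) and the $cacheDir parameter are assumptions made for illustration, and the cache directory is assumed to exist already.

```php
<?php
// Hypothetical helper: return the cached body for $url when it is younger
// than $ttl seconds; otherwise call $fetch($url) and cache what it returns.
function getCached(string $url, callable $fetch, string $cacheDir, int $ttl = 3600): string
{
    $cacheFile = $cacheDir . DIRECTORY_SEPARATOR . md5($url);

    if (is_file($cacheFile)) {
        $raw = file_get_contents($cacheFile);
        if ($raw !== false) {
            // First line holds the timestamp, the remainder is the payload.
            $parts = explode("\n", $raw, 2);
            if (count($parts) === 2 && (int) $parts[0] > time() - $ttl) {
                return $parts[1];
            }
        }
    }

    // Cache miss or stale entry: fetch again and overwrite the cache file.
    $body = $fetch($url);
    file_put_contents($cacheFile, time() . "\n" . $body, LOCK_EX);
    return $body;
}
```

Passing the fetch step in as a callback keeps the caching logic independent of whether the data comes from Twitter, Facebook or anywhere else, and makes it easy to exercise without network access.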
It's a good idea to use caching to avoid the rate limit.
Here's some example code that shows how I did it for Google+ data,
in some php code I wrote recently.
private function getCache($key) {
    $cache_life = intval($this->instance['cache_life']); // minutes
    if ($cache_life <= 0) return null;
    // fully-qualified filename
    $fqfname = $this->getCacheFileName($key);
    if (file_exists($fqfname)) {
        if (filemtime($fqfname) > (time() - 60 * $cache_life)) {
            // The cache file is fresh.
            $fresh = file_get_contents($fqfname);
            $results = json_decode($fresh, true);
            return $results;
        }
        else {
            return null; // stale cache file
        }
    }
    return null; // no cache file
}
private function putCache($key, $results) {
    $json = json_encode($results);
    $fqfname = $this->getCacheFileName($key);
    file_put_contents($fqfname, $json, LOCK_EX);
}
and to use it:
// $cacheKey is a value that is unique to the
// concatenation of all params. A string concatenation
// might work.
$results = $this->getCache($cacheKey);
if (!$results) {
    // cache miss; must call out
    $results = $this->getDataFromService(....);
    $this->putCache($cacheKey, $results);
}
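One possible way to build the $cacheKey mentioned in the comments is to hash the request parameters instead of concatenating them by hand. This is only a sketch: the function name makeCacheKey and the service prefix are hypothetical, not part of the code above.

```php
<?php
// Hypothetical helper: derive a stable cache key from a service name and its parameters.
function makeCacheKey(string $service, array $params): string
{
    // Sort by parameter name so the same parameters in a different order give the same key.
    ksort($params);
    return $service . '_' . md5(json_encode($params));
}
```

Hashing keeps the key short and safe to use as a file name even when the parameters contain slashes or spaces.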
I know this post is old, but it shows up in Google, so for everyone looking: I made this simple function that curls a JSON URL and caches the response in a file inside a specific folder. When the JSON is requested again, it is fetched with curl if 5 minutes have passed; if not, it is served from the file. It uses a timestamp as the file name to track time. Enjoy.
function ccurl($url, $id) {
    $path = "./private/cache/$id/";
    $files = scandir($path);
    $files = array_values(array_diff(scandir($path), array('.', '..')));
    if (count($files) > 1) {
        // more than one cache file: remove them all and start fresh
        foreach ($files as $file) {
            unlink($path . $file);
            $files = scandir($path);
            $files = array_values(array_diff(scandir($path), array('.', '..')));
        }
    }
    if (empty($files)) {
        // nothing cached yet: fetch and store under the current timestamp
        $c = curl_init();
        curl_setopt($c, CURLOPT_URL, $url);
        curl_setopt($c, CURLOPT_TIMEOUT, 15);
        curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($c, CURLOPT_USERAGENT,
            'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');
        $response = curl_exec($c);
        curl_close($c);
        $fp = file_put_contents($path . time() . '.json', $response);
        return $response;
    } else {
        if (time() - str_replace('.json', '', $files[0]) > 300) {
            // cached copy is older than 5 minutes: replace it
            unlink($path . $files[0]);
            $c = curl_init();
            curl_setopt($c, CURLOPT_URL, $url);
            curl_setopt($c, CURLOPT_TIMEOUT, 15);
            curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($c, CURLOPT_USERAGENT,
                'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');
            $response = curl_exec($c);
            curl_close($c);
            $fp = file_put_contents($path . time() . '.json', $response);
            return $response;
        } else {
            return file_get_contents($path . $files[0]);
        }
    }
}
For usage, create a directory for all cached files (for me it's /private/cache), then create another directory inside it for the request cache, x for example, and pass that directory name as the id when calling the function.
Here x is the id. If you have questions, please ask me ^_^ Enjoy (I might update it later so it doesn't use a directory for the id).
