Trying to get a large XML file through api.php - PHP

I'm trying to get the contents of a large XML file from an API. My problem is that it takes forever to respond; I waited an hour and still got nothing. Even if I open the URL in my browser, it just times out. Is there a better way to get the file?
Here is my code:
$fp = fopen('a.xml','wb');
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
curl_close($ch);
fclose($fp);
Update:
I used this code, but it's the same: it keeps loading. I think it's a memory problem; it tries to load the whole file before passing it to SimpleXML. If I set RETURNTRANSFER to false, will it lower memory usage?
$options = array(
    CURLOPT_RETURNTRANSFER => true,  // return web page
    CURLOPT_HEADER         => false, // don't return headers
    CURLOPT_FOLLOWLOCATION => true,  // follow redirects
    CURLOPT_ENCODING       => "",    // handle all encodings
    CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0", // something like Firefox
    CURLOPT_AUTOREFERER    => true,  // set referer on redirect
    CURLOPT_CONNECTTIMEOUT => 0,     // timeout on connect
    CURLOPT_TIMEOUT        => 0,     // timeout on response
    CURLOPT_MAXREDIRS      => 10,    // stop after 10 redirects
);
$curl = curl_init('someurl.com');
curl_setopt_array( $curl, $options );
$content = curl_exec($curl);
curl_close($curl);

One problem I notice immediately with your code snippet is that you open a.xml by assigning the file handle to $fp, but you reference it in your curl_setopt as $fh.
How big is the XML? Depending on how much memory you allow PHP to consume, you could get the entire contents of the file with something as simple as:
$xmlString = file_get_contents('a.xml');
Or if you plan to process the xml further, you could load the file into a SimpleXMLElement:
$xml = simplexml_load_file('a.xml');
or into a DOMDocument:
$xmlDom = new DOMDocument();
$xmlDom->load('a.xml');
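To the update: CURLOPT_RETURNTRANSFER => true makes curl_exec() buffer the entire body in a PHP string, so for a very large file it is better to keep the first approach of writing straight to disk with CURLOPT_FILE and then parse the file incrementally with XMLReader instead of SimpleXML. A minimal sketch (file name and $url reused from the question; the 'item' element name is just a placeholder for whatever the feed actually contains):
// Stream the response straight to disk instead of buffering it in memory
$fp = fopen('a.xml', 'wb');
$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_FILE           => $fp,   // write the body directly to the file handle
    CURLOPT_HEADER         => false,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_CONNECTTIMEOUT => 30,
    CURLOPT_TIMEOUT        => 0,     // no limit on the transfer itself
));
curl_exec($ch);
curl_close($ch);
fclose($fp);

// Parse node by node so the whole document never sits in memory at once
$reader = new XMLReader();
$reader->open('a.xml');
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $node = simplexml_load_string($reader->readOuterXml());
        // process $node here
    }
}
$reader->close();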

Related

cURL is not working to send a GET request in PHP

I am using cURL to send a GET request to a server, but it is not producing the required output.
The script works fine when loaded directly in a browser.
It is a script that combines given PNG images into an animated GIF.
<?php
// generate GIF
$name   = $_GET['name_of_final_gif'];
$images = $_GET['images'];
$images = json_decode($images, true);
// include GIF maker class based on GD library
include('GIFEncoder.class.php');
/******************************************************/
foreach ($images as $image_link) {
    // Open the source image and add the text.
    $image = imagecreatefrompng($image_link);
    // Generate GIF from the $image.
    // We want to put the binary GIF data into an array to be used later,
    // so we use the output buffer.
    ob_start();
    imagegif($image);
    $frames[] = ob_get_contents();
    $framed[] = 300; // Delay in the animation.
    ob_end_clean();
}
// Generate the animated gif and save it
$gif = new GIFEncoder($frames, $framed, 0, 2, 0, 0, 0, 'bin');
$fp  = fopen("gifs/$name", 'w');
fwrite($fp, $gif->GetAnimation());
fclose($fp);
?>
Update:
Below is my cURL code, which runs on a different server and sends a GET request to the script above:
$images = $class_name->get_images_links(); // get image links from database in JSON FORMAT
$name = "something.gif"; //name for output GIF image
$url = "http://example.com/make_gif.php?images=$images&&name_of_final_gif=$name";
// Get cURL resource
$curl = curl_init();
// Set some options
curl_setopt_array($curl, array(
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL            => $url
));
// Send the request & save response to $resp
$resp = curl_exec($curl);
// Close request to clear up some resources
curl_close($curl);
Try this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com/make_gif.php?images=$images&&name_of_final_gif=$name");
curl_exec($ch);
curl_close($ch);
Try this:
function curl($url) {
    // Assigning cURL options to an array
    $options = array(
        CURLOPT_RETURNTRANSFER => TRUE, // Setting cURL's option to return the webpage data
        CURLOPT_FOLLOWLOCATION => TRUE, // Setting cURL to follow 'location' HTTP headers
        CURLOPT_AUTOREFERER    => TRUE, // Automatically set the referer when following 'location' HTTP headers
        CURLOPT_TIMEOUT        => 120,  // Setting the maximum amount of time for cURL to execute queries
        CURLOPT_MAXREDIRS      => 10,   // Setting the maximum number of redirections to follow
        CURLOPT_USERAGENT      => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8", // Setting the useragent
        CURLOPT_URL            => $url, // Setting cURL's URL option with the $url variable passed into the function
    );
    $ch = curl_init();                // Initialising cURL
    curl_setopt_array($ch, $options); // Setting cURL's options using the previously assigned array data in $options
    $data = curl_exec($ch);           // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);                  // Closing cURL
    return $data;                     // Returning the data from the function
}
$page = curl($link);
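One thing worth ruling out before changing cURL options: in the URL above, the JSON from the database is dropped into the query string without URL encoding (a browser quietly encodes it for you, which may be why the script works there), and the parameters are joined with && instead of &. A minimal sketch, assuming the same make_gif.php endpoint from the question, that builds the query string with http_build_query() so the JSON survives the trip:
$images = $class_name->get_images_links(); // JSON string from the database
$name   = "something.gif";

// http_build_query() URL-encodes each value before joining them with &
$query = http_build_query(array(
    'images'            => $images,
    'name_of_final_gif' => $name,
));
$url = "http://example.com/make_gif.php?" . $query;

$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL            => $url,
    CURLOPT_RETURNTRANSFER => true,
));
$resp = curl_exec($curl);
if ($resp === false) {
    echo curl_error($curl); // surface the transport error instead of failing silently
}
curl_close($curl);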

How to parse website content retrieved with cURL

I am trying to read the content of a website using cURL to compare some data. I managed to retrieve the content of the webpage with cURL, but when I try to extract some data out of the content it does not work. I parse the content with DOMDocument, but it seems that characters like & and € do not get converted properly, so it crashes. That is why I added htmlentities, but that does not work either.
This is one of the errors I receive:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 37 in URL on line 40
Can anyone suggest what I should do differently?
This is how I get the content of a website:
function get_web_page( $url )
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // set request type post or get
        CURLOPT_POST           => false,        // set to GET
        CURLOPT_USERAGENT      => $user_agent,  // set user agent
        CURLOPT_COOKIEFILE     => "cookie.txt", // set cookie file
        CURLOPT_COOKIEJAR      => "cookie.txt", // set cookie jar
        CURLOPT_RETURNTRANSFER => true,         // return web page
        CURLOPT_HEADER         => false,        // don't return headers
        CURLOPT_FOLLOWLOCATION => false,        // don't follow redirects
        CURLOPT_ENCODING       => "",           // handle all encodings
        CURLOPT_AUTOREFERER    => true,         // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect
        CURLOPT_TIMEOUT        => 120,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
$html = get_web_page("url of a website");
And this is how I thought I should parse it:
$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html["content"], 'HTML-ENTITIES', 'UTF-8'));
foreach ($dom->getElementsByTagName('div') as $div) {
    echo $div->nodeValue . "<br>";
}
But actually I am looking for a value from a specific div with a class. Do you know how I can get only that value?
I use SimpleHTMLDom; it is quite easy to use and well documented.
You can even find a bunch of questions about it here on Stack Overflow.
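If you would rather stay with DOMDocument, here is a minimal sketch using DOMXPath to read the text of a div by class; the class name 'price' is only a placeholder for whatever class you are actually targeting:
$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep malformed-HTML warnings from bubbling up
$dom->loadHTML(mb_convert_encoding($html["content"], 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// match class="... price ..." exactly, not just any class containing the substring
$nodes = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' price ')]");
if ($nodes->length > 0) {
    echo trim($nodes->item(0)->nodeValue);
}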

XAMPP echoing part of my PHP code

I'm working within the XAMPP environment on a Windows 7 64-bit machine. I have the Apache 2.4 service installed. The issue I'm having has baffled me for about a day now.
My PHP files have all executed as expected up to this point. Recently, I've created a file which begins with the following:
function get_web_page($url, $attempt = 1) {
    if ($attempt < 4) {
        $options = array(
            CURLOPT_RETURNTRANSFER => true,  // return web page
            CURLOPT_HEADER         => false, // don't return headers
            CURLOPT_FOLLOWLOCATION => true,  // follow redirects
            CURLOPT_ENCODING       => "",    // handle all encodings
            CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1", // who am i
            CURLOPT_AUTOREFERER    => true,  // set referer on redirect
            CURLOPT_CONNECTTIMEOUT => 30,    // timeout on connect
            CURLOPT_TIMEOUT        => 30,    // timeout on response
            CURLOPT_MAXREDIRS      => 10,    // stop after 10 redirects
        );
        $ch = curl_init( $url );
        curl_setopt_array( $ch, $options );
        $content = curl_exec( $ch );
        $err     = curl_errno( $ch );
        $errmsg  = curl_error( $ch );
        $header  = curl_getinfo( $ch );
        curl_close( $ch );
        if ($err == 0) {
            return $content;
        } else {
            return get_web_page( $url, $attempt + 1 );
        }
    } else {
        return FALSE;
    }
}
A simple function to retrieve a web page, and it doesn't echo anything, either.
But when I visit this page in a browser (which at this point ONLY defines a function and nothing else), it prints to the page everything following the first instance of "=>" (without quotes). I don't understand why this is. All of my other PHP files in the same directory behave as expected.
Please help me understand why this is happening and what steps I should take to resolve it.
Look at the source of the page given to your browser and you'll probably see the entire PHP source in plain text. It's only rendering what's after the first => because that's likely the first closing > it finds after the opening < in <?php. The first part doesn't render because your browser thinks it's inside some strange HTML tag.
Check your Apache config, because it's not routing requests for *.php pages through the PHP interpreter.
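A quick way to confirm that diagnosis is to request a file that contains nothing but a phpinfo() call; if the browser shows the raw source instead of the PHP information page, Apache is serving .php files as plain text and the PHP handler/module section of the XAMPP Apache configuration needs to be fixed:
<?php
// test.php - if this source shows up verbatim in the browser,
// Apache is not passing .php files through the PHP interpreter
phpinfo();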

cURL script in PHP to check an IP's blacklist status using XPath

I want to make a little script that returns a result depending on how many blacklists an IP appears on.
The result should look like 23/100, meaning that 23 lists have blacklisted that IP, or 45/100, 2/100, and so on.
First of all I fetch http://whatismyipaddress.com/blacklist-check through cURL, sending some data in a POST request:
<?php
/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
 * array containing the HTTP server response header fields and content.
 */
function get_web_page($url, $argument1)
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,  // return web page
        CURLOPT_HEADER         => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_ENCODING       => "",    // handle all encodings
        CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)", // who am i
        CURLOPT_AUTOREFERER    => true,  // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,   // timeout on connect
        CURLOPT_TIMEOUT        => 120,   // timeout on response
        CURLOPT_MAXREDIRS      => 10,    // stop after 10 redirects
        CURLOPT_POST           => 1,
        CURLOPT_POSTFIELDS     => "LOOKUPADDRESS=" . $argument1,
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

echo "<pre>";
$result = get_web_page("http://whatismyipaddress.com/blacklist-check", "75.122.17.117");
// print_r($result['content']);
// in $result['content'] we have the whole page

// Creating XPath and filling it with data
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTMLFile($result['content']); // loads your html
$xpath = new DOMXPath($doc);

// Get that table
$value = $xpath->evaluate("string(/html/body/div/div/div/table/text())");
echo "Table with blacklists: [$value]\n"; // prints the extracted value
die;
?>
Now what I want is to parse the data with the XPath /html/body/div/div/div/table/text() and, wherever I see the (!) image, mark that entry as blacklisted; otherwise do nothing.
Can anyone help me?
I also observed that viewing the (!) image requires a token. I might switch to another site, but I like this particular website because it covers all the blacklists.
Thank you!
Definitely, you need this :)
Simple DOM Parser
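For a plain-DOM attempt, here is a sketch, not a verified scrape: the row/image structure of the blacklist table is an assumption and will likely need adjusting against the real markup. Note that loadHTMLFile() expects a filename, so the fetched string in $result['content'] has to go through loadHTML() instead:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($result['content']); // loadHTML() takes a string; loadHTMLFile() takes a filename
libxml_clear_errors();

$xpath  = new DOMXPath($doc);
$rows   = $xpath->query("//table//tr"); // every row of the blacklist table (assumed structure)
$total  = $rows->length;
$listed = 0;
foreach ($rows as $row) {
    // assume a listed entry is marked by an <img> (the "!" icon) inside the row
    if ($xpath->query(".//img", $row)->length > 0) {
        $listed++;
    }
}
echo $listed . "/" . $total;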

How to proxy another page in PHP

I'm looking for the fastest and easiest way to proxy a page in PHP. I don't want the user to be redirected; I just want my script to return the same content, response code, and headers as another remote URL.
echo file_get_contents('proxypage');
Would that work?
EDIT:
The first answer was a bit short, and I don't believe it will handle headers the way you would like.
However, you can also do this:
function get_proxy_site_page( $url )
{
    $options = [
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_HEADER         => true, // return headers
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING       => "",   // handle all encodings
        CURLOPT_AUTOREFERER    => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,  // timeout on connect
        CURLOPT_TIMEOUT        => 120,  // timeout on response
        CURLOPT_MAXREDIRS      => 10,   // stop after 10 redirects
    ];
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $remoteSite = curl_exec($ch);
    $header = curl_getinfo($ch);
    curl_close($ch);
    $header['content'] = $remoteSite;
    return $header;
}
This will return an array containing lots of information about the remote page. $header['content'] will hold both the headers and the body of the response, and $header['header_size'] will contain the length of the header block, so you can use substr to split them apart.
Then it's just a matter of using echo and header() to proxy the page.
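A minimal sketch of that last step, assuming the array returned by get_proxy_site_page() above (http_code, content_type and header_size all come from curl_getinfo()):
$page = get_proxy_site_page('http://www.example.com/');

// split the raw response into header block and body using header_size
$rawHeaders = substr($page['content'], 0, $page['header_size']); // parse this if you want to forward more headers
$body       = substr($page['content'], $page['header_size']);

// forward the status code and content type, then the body
http_response_code($page['http_code']);
header('Content-Type: ' . $page['content_type']);
echo $body;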
You can use the PHP cURL functions to achieve this functionality:
http://www.php.net/curl
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// grab URL and pass it to the browser
$urlContent = curl_exec($ch);
From this point, you would grab the response header information using http://www.php.net/curl-getinfo. (There are several values you can grab, all listed in the documentation).
// Check if any error occurred
if (!curl_errno($ch)) {
    $info = curl_getinfo($ch);
    header('Content-Type: ' . $info['content_type']);
    echo $urlContent;
}
Make sure to close out the cURL handle.
// close cURL resource, and free up system resources
curl_close($ch);
You can get the HTML of the remote page with cURL and then echo the response.
