Why does my PDF document generated from HTML look weird? - php

I have made a simple Laravel app that creates a PDF document from a page given its URL. But the PDF doesn't pick up the right styles from the page and sometimes looks weird. Am I doing it wrong?
This is google.com
This is what I'm doing with dompdf
$pdf->loadHTML($content); // HTML fetched with cURL
$pdf->setPaper('A2', 'portrait');
$output = $pdf->output();
This is how I get the HTML as a string.
protected function get_web_page( $url )
{
/**
* Send a GET request using cURL
* @param string $url URL to request
* @return array cURL transfer info plus 'errno', 'errmsg' and 'content' keys
*/
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET", //set request type post or get
CURLOPT_POST =>false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 10, // timeout on connect
CURLOPT_TIMEOUT => 10, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}

You're probably not getting the page's styles via cURL: the request returns only the HTML, and any external CSS files it links to are not part of that string.
See if this helps you get everything into your $content variable:
PHP: Get all CSS files of an HTML web page
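To make the point above concrete, here is a minimal sketch of how one might list the stylesheets a page refers to, so they can be fetched separately. The helper name find_stylesheet_hrefs and the sample markup are my own assumptions for illustration, not something from the question or from dompdf.

```php
<?php
// Hypothetical helper: list the stylesheet URLs referenced by an HTML string.
function find_stylesheet_hrefs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        // Only <link rel="stylesheet" href="..."> entries matter here.
        $isStylesheet = strtolower($link->getAttribute('rel')) === 'stylesheet';
        if ($isStylesheet && $link->getAttribute('href') !== '') {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// Sample markup (an assumption, standing in for the string returned by cURL).
$sample = '<html><head>'
    . '<link rel="stylesheet" href="/css/site.css">'
    . '<link rel="icon" href="/favicon.ico">'
    . '</head><body><p>Hi</p></body></html>';
print_r(find_stylesheet_hrefs($sample));
```

The returned hrefs are often relative, so they would still need to be resolved against the page URL before downloading them.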

Related

php curl_exec doesn't output anything [duplicate]

I found this function that does an AWESOME job (IMHO): http://nadeausoftware.com/articles/2007/06/php_tip_how_get_web_page_using_curl
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
The only problem I have is that it doesn't work for https://. Any ideas what I need to do to make this work for https? Thanks!
Quick fix, add this in your options:
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
Now you have no idea what host you're actually connecting to, because cURL will not verify the certificate in any way. Hope you enjoy man-in-the-middle attacks!
Or just add it to your current function:
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_SSL_VERIFYPEER => false // Disabled SSL Cert checks
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
I was trying to use CURL to do some https API calls with php and ran into this problem. I noticed a recommendation on the php site which got me up and running: http://php.net/manual/en/function.curl-setopt.php#110457
Please everyone, stop setting CURLOPT_SSL_VERIFYPEER to false or 0. If
your PHP installation doesn't have an up-to-date CA root certificate
bundle, download the one at the curl website and save it on your
server:
http://curl.haxx.se/docs/caextract.html
Then set a path to it in your php.ini file, e.g. on Windows:
curl.cainfo=c:\php\cacert.pem
Turning off CURLOPT_SSL_VERIFYPEER allows man in the middle (MITM)
attacks, which you don't want!
Another option, similar to Gavin Palmer's answer, is to use the .pem file but point cURL at it with an option:
Download the latest .pem file from https://curl.haxx.se/docs/caextract.html and save it somewhere on your server (outside the public folder).
Set the option in your code instead of in the php.ini file.
In your code
curl_setopt($ch, CURLOPT_CAINFO, $_SERVER['DOCUMENT_ROOT'] . "/../cacert-2017-09-20.pem");
NOTE: setting the cainfo in php.ini like @Gavin Palmer did is better than setting it in your code like I did, because it saves a disk I/O every time the function is called. I only do it this way in case you want to test the cainfo file on the fly instead of changing php.ini while testing your function.
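As a rough sketch of the approach described above, the options below keep certificate verification switched on and point cURL at a downloaded CA bundle. The helper name secure_curl_options and the bundle path are assumptions for illustration; adjust the path to wherever you saved cacert.pem.

```php
<?php
// Hypothetical helper: cURL options that keep certificate verification on.
// $caBundlePath is an assumed location for the downloaded cacert.pem.
function secure_curl_options(string $caBundlePath): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_SSL_VERIFYPEER => true, // verify the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,    // verify the certificate matches the host name
        CURLOPT_CAINFO         => $caBundlePath,
    ];
}

$options = secure_curl_options('/etc/ssl/cacert.pem'); // path is an assumption

// Usage sketch (not executed here because it needs network access):
// $ch = curl_init('https://example.com/');
// curl_setopt_array($ch, $options);
// $body = curl_exec($ch);
// curl_close($ch);
```

These entries can be merged into the $options array of get_web_page() in place of CURLOPT_SSL_VERIFYPEER => false.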
One important note: the solutions mentioned above did not work for me on localhost; I had to upload my code to a server, and then it worked. I was getting no error at first, then a bad request. The problem was that I was using localhost (test.dev, myproject.git). Both solutions above work, but the one that uses the SSL certificate is recommended.
Go to https://curl.haxx.se/docs/caextract.html and download the latest cacert.pem. Store it somewhere (preferably not in the public folder, although it will work regardless).
Use this code
".$result;
//echo "Path:".$_SERVER['DOCUMENT_ROOT'] . "/ssl/cacert.pem";
// this is for troubleshooting only ?>
Upload the code to live server and test.

How to parse website content received from a website with curl

I am trying to read the content of a website using cURL to compare some data. I managed to fetch the content of the web page with cURL, but extracting data from it does not work. I parse the content with DOMDocument, but it seems that characters like & and € are not converted properly, so parsing fails. That is why I added htmlentities, but that does not work either.
This is one of the errors I receive:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 37 in URL on line 40
Can anyone suggest what I should do differently?
This is how I get the content of a website:
function get_web_page( $url )
{
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET", //set request type post or get
CURLOPT_POST =>false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_COOKIEFILE =>"cookie.txt", //set cookie file
CURLOPT_COOKIEJAR =>"cookie.txt", //set cookie jar
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => false, // do not follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
$html = get_web_page("url of a website");
And this is how I thought I should parse it:
$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html["content"], 'HTML-ENTITIES', 'UTF-8'));
foreach($dom->getElementsByTagName('div') as $div){
echo $div->nodeValue."<br>";
}
But actually I am looking for the value of one specific div with a given class, and only that value. Do you know how I can get it?
I use SimpleHTMLDom; it is quite easy and well documented.
You can even find a bunch of questions about it here on StackOverflow.
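If you would rather stay with the built-in DOM extension than add SimpleHTMLDom, a sketch along these lines may also work. It uses DOMXPath to pick divs by class and libxml_use_internal_errors() to keep the entity warnings quiet. The helper name div_text_by_class, the class name "price" and the sample markup are assumptions for illustration.

```php
<?php
// Hypothetical helper: return the text of every <div> carrying the given class.
// Assumes $class contains no quote characters, since it is placed inside the XPath string.
function div_text_by_class(string $html, string $class): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep entity warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the space-separated class attribute.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $values = [];
    foreach ($xpath->query($query) as $div) {
        $values[] = trim($div->textContent);
    }
    return $values;
}

// Sample markup with an unescaped ampersand, similar to what triggers the warning.
$sample = '<div class="price big">Tom & Jerry</div><div class="note">n/a</div>';
print_r(div_text_by_class($sample, 'price'));
```

With the real page, pass $html["content"] from get_web_page() as the first argument.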

Using CURL To Run PHP Scripts

I have created an internal billing system where I need to generate invoices for customers based on their billing schedule. However, I have run into a problem when running PHP scripts from cURL and was wondering if there is any way around it.
I currently have a cron task that runs a PHP script called crontask.php.
crontask.php then calculates whether the customer needs an invoice generated and emailed to them. If so, it uses cURL to call a URL that creates the invoice and sends the email, e.g. www.internal.co.uk/invoicing/geninvoice.php?CUST=10
function get_web_page($url)
{
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13';
echo "curl:url<pre>".$url."</pre><BR>";
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => true, // return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => $ua, // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 15, // timeout on connect
CURLOPT_TIMEOUT => 15, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init($url);
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$effective_url = curl_getinfo( $ch, CURLINFO_EFFECTIVE_URL ); // final URL after redirects (a string, not an array)
curl_close( $ch );
// curl_errno() returns 0 on success, so report only real failures
if ($err !== 0) {
echo "CURL:".$errmsg."<BR>";
}
return $content;
}
When running this I get "access denied" when calling the URL from cURL in PHP.
The server is running Virtualmin/Webmin and I have root access. Is there something I need to change, or do I need to add authentication to the script?

CURL script in PHP for blacklist of an ip using XPATH

I want to make a little script that returns a result depending on how many blacklists an IP appears on.
The result must look like 23/100, meaning that 23 lists have blacklisted that IP, or 45/100, 2/100, and so on.
First of all, I fetch some data through cURL from http://whatismyipaddress.com/blacklist-check by sending a POST request:
<?php
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page($url,$argument1)
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_POST => 1,
CURLOPT_POSTFIELDS => "LOOKUPADDRESS=".$argument1,
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
echo "<pre>";
$result = get_web_page("http://whatismyipaddress.com/blacklist-check","75.122.17.117");
// print_r($result['content']);
// in $result['content'] we have the whole page
// Creating xpath and fill it with data
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($result['content']); // load the HTML string (loadHTMLFile() expects a file path)
$xpath = new DOMXPath($doc);
// Get that table
$value = $xpath->evaluate("string(/html/body/div/div/div/table/text())");
echo "Table with blacklists: [$value]\n"; // prints the table text
die;
?>
Now what I want is to parse the data with the XPath /html/body/div/div/div/table/text() and, wherever I see the (!) image, mark that entry as blacklisted; otherwise do nothing.
Can anyone help me?
I also noticed that viewing the (!) image requires a token. I might switch to another site, but I like this particular website because it covers all the blacklist sites.
Thank you!
You definitely need this :)
Simple DOM Parser
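For what it's worth, the counting itself can also be sketched with the built-in DOMXPath, without an extra parser library. I don't know the real markup of that page, so the sample table below is entirely made up: it assumes one table row per blacklist and an <img> in the rows that are flagged. The helper name blacklist_score is hypothetical too.

```php
<?php
// Hypothetical helper: count table rows that contain an <img>
// (assumed here to be the "(!)" marker) and return "flagged/total".
function blacklist_score(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // loadHTML() takes a string; loadHTMLFile() expects a path
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath count() yields a float, so cast before building the string.
    $total   = (int) $xpath->evaluate('count(//table//tr)');
    $flagged = (int) $xpath->evaluate('count(//table//tr[.//img])');
    return $flagged . '/' . $total;
}

// Made-up sample markup; the real page structure is an unknown.
$sample = '<table>'
    . '<tr><td>list-a.example</td><td><img src="warn.png" alt="!"></td></tr>'
    . '<tr><td>list-b.example</td><td>ok</td></tr>'
    . '<tr><td>list-c.example</td><td><img src="warn.png" alt="!"></td></tr>'
    . '</table>';
echo blacklist_score($sample), "\n";
```

The two XPath expressions would need adjusting once you have inspected the actual table on the page.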

Capture a redirect URL using PHP

I want to use PHP to get the URL of the page to which the following address redirects:
http://peacecorpsjournals.com/journal/6731
The script should return the following URL to which the URL above redirects:
http://ghanakimsuri.blogspot.com/
One way (of many) to do this is to open the URL with fopen, then use stream_get_meta_data to grab the headers. This is a quick snippet I grabbed from something I wrote a while back:
$fh = fopen($uri, 'r');
$details = stream_get_meta_data($fh);
foreach ($details['wrapper_data'] as $line) {
if (preg_match('/^Location: (.*?)$/i', $line, $m)) {
// There was a redirect to $m[1]
}
}
Note you can have multiple redirections, and they can be relative as well as absolute.
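To illustrate the note about multiple and relative redirections, here is a small sketch that collects every Location header from a list of raw header lines (the shape of the 'wrapper_data' array) and flags the relative ones. The helper name extract_redirects and the sample header lines are assumptions for illustration only.

```php
<?php
// Hypothetical helper: pull every Location value out of raw header lines,
// such as the 'wrapper_data' array returned by stream_get_meta_data().
function extract_redirects(array $headerLines): array
{
    $redirects = [];
    foreach ($headerLines as $line) {
        if (preg_match('/^Location:\s*(.+?)\s*$/i', $line, $m)) {
            $redirects[] = [
                'target'   => $m[1],
                // A target without a scheme is relative to the URL that issued it.
                'relative' => parse_url($m[1], PHP_URL_SCHEME) === null,
            ];
        }
    }
    return $redirects;
}

// Made-up header lines showing a relative hop followed by an absolute one.
$lines = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: /moved/here',
    'HTTP/1.1 302 Found',
    'Location: http://example.com/final',
    'HTTP/1.1 200 OK',
];
print_r(extract_redirects($lines));
```

The last entry in the returned list is the final redirect target; relative targets still have to be resolved against the previous URL.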
You can do this using cURL.
<?php
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => true, // return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
//$header['errno'] = $err;
// $header['errmsg'] = $errmsg;
//$header['content'] = $content;
// $header['url'] holds the final URL after redirects
return $header;
}
$thisurl = "http://www.example.com/redirectfrom";
$myUrlInfo = get_web_page( $thisurl );
echo $myUrlInfo["url"];
?>
Code found here: http://forums.devshed.com/php-development-5/curl-get-final-url-after-inital-url-redirects-544144.html
I've found this resource to be the most complete, thought-out approach and explanation. The code isn't the shortest snippet, but you'll end up being able to track multiple redirects with a couple of lines like this:
$result = get_all_redirects('http://bit.ly/abc123');
print_r($result);
I found out that you may simply use the following code to get the redirect URL on a simple redirection. This will not work on recursive redirections.
$headers = get_headers("https://graph.facebook.com/me/picture?access_token=__token__", 1);
$image_url = $headers['Location'];
** The example above is to capture the Facebook profile image url from the Graph API call, which is issued along with HTTP 302 header.
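One caveat, as far as I know: when a request goes through more than one redirect, the associative form of get_headers() reports the repeated Location header as an array rather than a string. A small sketch for handling both shapes follows; the helper name final_location and the sample arrays are assumptions for illustration.

```php
<?php
// Hypothetical helper: pick the last Location from a get_headers($url, 1)-style array.
function final_location(array $headers): ?string
{
    // Header names may differ in case between servers, so normalise them first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['location'])) {
        return null; // no redirect happened
    }
    $location = $headers['location'];
    // With several redirects the same header repeats and the value becomes an array.
    return is_array($location) ? end($location) : $location;
}

// Made-up arrays in the shape get_headers($url, 1) returns.
$single = [0 => 'HTTP/1.1 302 Found', 'Location' => 'https://example.com/a.jpg'];
$chain  = [
    0 => 'HTTP/1.1 301 Moved Permanently',
    'Location' => ['https://example.com/b', 'https://example.com/c'],
];
echo final_location($single), "\n";
echo final_location($chain), "\n";
```

In real use you would pass the result of get_headers($url, 1) straight into the helper.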
