I have created an internal billing system where I need to generate invoices for customers based on their billing schedule. However, I have run into a problem when running PHP scripts via cURL and was wondering if there is any way around it.
I currently have a cron task that runs a PHP script called crontask.php.
crontask.php then calculates whether the customer needs an invoice generated and sent to them via email. If it calculates that they do, it uses cURL to call a URL that creates the invoice and sends the email, e.g. www.internal.co.uk/invoicing/geninvoice.php?CUST=10
function get_web_page($url)
{
    $ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13';
    echo "curl:url<pre>".$url."</pre><BR>";
    $options = array(
        CURLOPT_RETURNTRANSFER => true,  // return web page
        CURLOPT_HEADER         => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_ENCODING       => "",    // handle all encodings
        CURLOPT_USERAGENT      => $ua,   // who am i
        CURLOPT_AUTOREFERER    => true,  // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 15,    // timeout on connect
        CURLOPT_TIMEOUT        => 15,    // timeout on response
        CURLOPT_MAXREDIRS      => 10,    // stop after 10 redirects
    );
    $ch = curl_init($url);
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );    // 0 on success
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );  // full info array (with CURLINFO_EFFECTIVE_URL it returns only a string)
    curl_close( $ch );
    $header['errno']  = $err;
    $header['errmsg'] = $errmsg;
    if ($err !== 0) {                // report any cURL error (the old isset($errno) check never fired)
        echo "CURL:".$errmsg."<BR>";
    }
    return $content;
}
When running this I am getting "access denied" when trying to request the URL via cURL from PHP.
The server is running Virtualmin/Webmin and I have root access. Is there something I need to change, or do I need to add authentication to the script?
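If the "access denied" comes from directory protection (Virtualmin sites are often behind HTTP Basic auth), cURL can send credentials; and since both scripts live on the same box, skipping HTTP entirely is another option. A minimal sketch, with placeholder credentials and an assumed filesystem path:

// Option 1: hypothetical Basic-auth credentials, added to the $options array above
$options[CURLOPT_HTTPAUTH] = CURLAUTH_BASIC;
$options[CURLOPT_USERPWD]  = 'cronuser:secret';   // placeholder user:password

// Option 2: run the generator in-process instead of over HTTP
$_GET['CUST'] = 10;                               // what the query string would have carried
require '/home/internal/public_html/invoicing/geninvoice.php'; // assumed path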
I have made a simple Laravel app to create a PDF document from a page through a URL. But my PDF doesn't get the right style from the page and sometimes looks weird. Am I doing it wrong?
[Screenshot: google.com rendered to PDF, with the styling coming out wrong]
This is what I'm doing with dompdf:
$pdf->loadHTML($content); // the HTML fetched with cURL
$pdf->setPaper('A2', 'portrait');
$output = $pdf->output();
This is how I get the HTML as a string:
/**
 * Send a GET request using cURL.
 * @param string $url the URL to request
 * @return array the response headers, error info, and content
 */
protected function get_web_page( $url )
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",       // set request type post or get
        CURLOPT_POST           => false,       // set to GET
        CURLOPT_USERAGENT      => $user_agent, // set user agent
        CURLOPT_RETURNTRANSFER => true,        // return web page
        CURLOPT_HEADER         => false,       // don't return headers
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_AUTOREFERER    => true,        // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 10,          // timeout on connect
        CURLOPT_TIMEOUT        => 10,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,          // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
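Since get_web_page() returns an array, the HTML string itself sits under the 'content' key; a minimal usage sketch tying it to the dompdf call above (the URL is a placeholder):

$page    = $this->get_web_page('https://example.com'); // placeholder URL
$content = $page['content'];                           // the raw HTML string
$pdf->loadHTML($content);                              // hand only the HTML to dompdf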
You're probably not getting the page's styles via your cURL request, only the raw HTML.
See if this helps bring everything into your $content variable:
PHP: Get all CSS files of an HTML web page
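One way to do that, sketched below, is to inline each linked stylesheet into the fetched HTML before handing it to dompdf. This assumes the stylesheet URLs are absolute (relative ones would first need resolving against the page URL) and that allow_url_fopen is enabled:

// Sketch: replace each <link rel="stylesheet"> with an inline <style> block
function inline_stylesheets($html)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);          // tolerate sloppy real-world markup
    $dom->loadHTML($html);

    // snapshot the live node list so we can safely replace nodes while looping
    foreach (iterator_to_array($dom->getElementsByTagName('link')) as $link) {
        if (strtolower($link->getAttribute('rel')) !== 'stylesheet') {
            continue;
        }
        $css = @file_get_contents($link->getAttribute('href')); // needs allow_url_fopen
        if ($css === false) {
            continue;                          // leave the link alone if the fetch fails
        }
        $style = $dom->createElement('style');
        $style->appendChild($dom->createTextNode($css));
        $link->parentNode->replaceChild($style, $link);
    }
    return $dom->saveHTML();
}

$pdf->loadHTML(inline_stylesheets($content));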
I am trying to read the content of a website using cURL to compare some data. I managed to receive the content of the webpage with cURL, but when I try to extract some data out of the content, it does not work. I parse the content with DOMDocument, but it seems that characters like & and € do not get converted properly, so it crashes. That is why I added htmlentities, but that does not work either.
This is one of the errors I receive:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 37 in URL on line 40
Can anyone suggest what I should do differently?
This is how I get the content of a website:
function get_web_page( $url )
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // set request type post or get
        CURLOPT_POST           => false,        // set to GET
        CURLOPT_USERAGENT      => $user_agent,  // set user agent
        CURLOPT_COOKIEFILE     => "cookie.txt", // set cookie file
        CURLOPT_COOKIEJAR      => "cookie.txt", // set cookie jar
        CURLOPT_RETURNTRANSFER => true,         // return web page
        CURLOPT_HEADER         => false,        // don't return headers
        CURLOPT_FOLLOWLOCATION => false,        // don't follow redirects
        CURLOPT_ENCODING       => "",           // handle all encodings
        CURLOPT_AUTOREFERER    => true,         // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect
        CURLOPT_TIMEOUT        => 120,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
$html = get_web_page("url of a website");
And this is how I thought I should parse it:
$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html["content"], 'HTML-ENTITIES', 'UTF-8'));
foreach ($dom->getElementsByTagName('div') as $div) {
    echo $div->nodeValue."<br>";
}
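On the htmlParseEntityRef warnings specifically: DOMDocument can be told to collect libxml errors quietly instead of raising them, which usually copes with real-world HTML. A minimal sketch of the same parse with that switch:

libxml_use_internal_errors(true);   // collect parse errors instead of raising warnings
$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html["content"], 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();              // discard the collected notices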
But actually I am looking for the value of one specific div with a given class, and only that value. Do you know how I can get that?
I use SimpleHTMLDom; it is quite easy and well documented.
You can even find a bunch of questions about it here on Stack Overflow.
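For the specific-div question, a short sketch with SimpleHTMLDom; the class name price is a placeholder for whatever the real page uses:

require_once 'simple_html_dom.php';     // the SimpleHTMLDom library file

$dom = str_get_html($html['content']);  // parse the fetched HTML string
$div = $dom->find('div.price', 0);      // first <div> with class "price" (placeholder name)
if ($div !== null) {
    echo $div->plaintext;               // just the text content of that div
}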
I am trying to make a cURL request. The problem I am facing is that the page has different text depending on which country the request comes from, so I would like the cURL request to use the language en_US (English) so that it gets the English text on the website.
Currently I have this code, but it's not getting the US text.
$url = 'http://testurl.com'; // Not the real URL
$options = array(
    CURLOPT_RETURNTRANSFER => true,  // return web page
    CURLOPT_HEADER         => false, // don't return headers
    CURLOPT_FOLLOWLOCATION => true,  // follow redirects
    CURLOPT_HTTPHEADER     => array("Accept-Language: en-US;q=0.6,en;q=0.4"),
    CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15", // must be a string, not an array
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
$err     = curl_errno($ch);
$errmsg  = curl_error($ch);
$header  = curl_getinfo($ch);
curl_close($ch);
echo htmlspecialchars($content);
So, to make this simple: I would like cURL to send the request with the US language, if possible.
Right now the response comes back in Dutch; I think this is because my hosting server is located in the Netherlands. But I would like to change it to English.
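With the user-agent option fixed to a plain string, the Accept-Language header above should go out as intended; if the page still comes back in Dutch, the site is probably geolocating the server's IP rather than reading headers. One way to confirm what cURL actually sent is to record the outgoing request headers, as in this sketch:

$ch = curl_init($url);
curl_setopt_array($ch, $options);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);    // record the request headers cURL sends
$content = curl_exec($ch);
$sent = curl_getinfo($ch, CURLINFO_HEADER_OUT); // the raw request headers as a string
curl_close($ch);
echo htmlspecialchars($sent);                   // check that Accept-Language went out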
I'm working within the XAMPP environment on a Windows 7 64-bit machine, with the Apache 2.4 service installed. The issue I'm having has baffled me for about a day now.
My PHP files have all executed as expected up to this point. Recently, I've created a file which begins with the following:
function get_web_page($url, $attempt = 1){
    if($attempt < 4){
        $options = array(
            CURLOPT_RETURNTRANSFER => true,  // return web page
            CURLOPT_HEADER         => false, // don't return headers
            CURLOPT_FOLLOWLOCATION => true,  // follow redirects
            CURLOPT_ENCODING       => "",    // handle all encodings
            CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1", // who am i
            CURLOPT_AUTOREFERER    => true,  // set referer on redirect
            CURLOPT_CONNECTTIMEOUT => 30,    // timeout on connect
            CURLOPT_TIMEOUT        => 30,    // timeout on response
            CURLOPT_MAXREDIRS      => 10,    // stop after 10 redirects
        );
        $ch = curl_init( $url );
        curl_setopt_array( $ch, $options );
        $content = curl_exec( $ch );
        $err     = curl_errno( $ch );
        $errmsg  = curl_error( $ch );
        $header  = curl_getinfo( $ch );
        curl_close( $ch );
        if($err == 0){
            return $content;
        }else{
            return get_web_page( $url, $attempt + 1 );
        }
    }else{
        return FALSE;
    }
}
It's a simple function to retrieve a web page, and it doesn't echo anything either.
But when I visit this page in a browser (and at this point the file ONLY defines a function, nothing else), it prints to the page everything following the first instance of => (without quotes). I don't understand why. All of my other PHP files in the same directory behave as expected.
Please help me understand why this is happening and what steps I should take to resolve it.
Look at the source of the page in your browser and you'll probably see the entire PHP source in plaintext. It only renders what's after the first => because that contains the first closing > found after the opening < in <?php; the first part doesn't render because your browser thinks it's inside some strange HTML tag.
Check your Apache config: it's not routing requests for *.php pages through the PHP interpreter.
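On a typical XAMPP install the handler is wired up in Apache's own config; the exact file and paths vary by XAMPP version, but the directives look roughly like this (the paths below are assumptions, not your actual install):

# usually in xampp/apache/conf/extra/httpd-xampp.conf (location varies by version)
LoadModule php5_module "C:/xampp/php/php5apache2_4.dll"
AddHandler application/x-httpd-php .php
PHPIniDir "C:/xampp/php"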
I want to make a little script that returns a result showing how many blacklists an IP appears on.
The result should look like 23/100, meaning that 23 of the 100 lists have blacklisted that IP, or 45/100, 2/100, and so on.
First of all I fetch http://whatismyipaddress.com/blacklist-check through cURL, sending some data in a POST request:
<?php
/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
 * array containing the HTTP server response header fields and content.
 */
function get_web_page($url, $argument1)
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,  // return web page
        CURLOPT_HEADER         => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_ENCODING       => "",    // handle all encodings
        CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)", // who am i
        CURLOPT_AUTOREFERER    => true,  // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,   // timeout on connect
        CURLOPT_TIMEOUT        => 120,   // timeout on response
        CURLOPT_MAXREDIRS      => 10,    // stop after 10 redirects
        CURLOPT_POST           => 1,
        CURLOPT_POSTFIELDS     => "LOOKUPADDRESS=".$argument1,
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

echo "<pre>";
$result = get_web_page("http://whatismyipaddress.com/blacklist-check", "75.122.17.117");
// print_r($result['content']);
// $result['content'] now holds the whole page

// Create an XPath object and fill it with the fetched data
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($result['content']); // loadHTML() parses a string; loadHTMLFile() expects a filename
$xpath = new DOMXPath($doc);

// Get that table
$value = $xpath->evaluate("string(/html/body/div/div/div/table/text())");
echo "Table with blacklists: [$value]\n";
die;
?>
Now what I want is to parse the data with the XPath /html/body/div/div/div/table/text() and, wherever I see the (!) image, mark that list as blacklisted; otherwise do nothing.
Can anyone help me?
I also observed that viewing the (!) image requires a token. I might switch to another site, but I like that particular website because it has all the blacklists.
Thank you!
You definitely need this :)
Simple DOM Parser
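A rough sketch of the counting step with Simple HTML DOM Parser. The selectors are guesses at the page structure (one table row per blacklist, a warning <img> marking a hit), so they would need checking against the real markup:

require_once 'simple_html_dom.php';     // the parser recommended above

$dom    = str_get_html($result['content']);
$rows   = $dom->find('table tr');       // assumed: one row per blacklist
$total  = count($rows);
$listed = 0;

foreach ($rows as $row) {
    if (count($row->find('img')) > 0) { // assumed: the (!) image marks a blacklisted entry
        $listed++;
    }
}

echo $listed . "/" . $total;            // e.g. 23/100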