I have been struggling with this task for almost three days now, and I suspect I am missing some basic cURL skills.
I start with:
In the IE F12 developer tools I see two POSTs on the first page. (I notice the first one gets a 302, which is supposed to be a redirect, but with cURL I only get 200.)
Filling up the captcha:
on the second page (after captcha):
traffic:
This is my code (and I cannot move on with it because it doesn't work for the early stages):
I built a special form that submits via GET to my own page, which in turn uses cURL to access the website:
$id=$_GET['id']; // getting the biznumber
$humanCode=$_GET['nobot'];
$curl = curl_init();
curl_setopt ($curl, CURLOPT_URL, "https://www.*******.******.***");
// setting some https to be able to access the website from my local computer.
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($curl, CURLOPT_CAINFO, "c:/xampp/htdocs/CAcerts/curl-ca-bundle.crt");
// I know the values for the ASPX vars like __EVENTTARGET, __EVENTARGUMENT, __VIEWSTATE are arbitrary now. I need to take care of that but I don't yet know how.
$postarr= array (
"__EVENTTARGET"=>"",
"__VIEWSTATE=" =>"%2FwEPDwULLTEzMzI2OTg4NDYPZBYCZg9kFgQCBA8PZBYCHgdvbmNsaWNrBQxnb1RvTWl2emFrKClkAgYPD2QWAh8ABQxnb1RvTWl2emFrKClkZM6iZZ0Qaf2CpfXoJJdZ0IqaWsDO",
"__EVENTARGUMENT=" =>"",
"__EVENTVALIDATION" =>"%2FwEWBQKgysLGCwL2r7SGDQLh4ri%2BAwLWws7NDwLWwpLPD%2F1HuCAFYzs2seaziWbYEXjDfigP",
"hidUrlFileIshurim"=>"https%3A%2F,
"cod"=>"3322"
);
$fields_string='';
foreach($postarr as $key=>$value) { $fields_string .= $key.'='.$value.'&'; }
rtrim($fields_string,'&');
curl_setopt($curl, CURLOPT_POST ,1);
curl_setopt($curl, CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($curl, CURLOPT_TIMEOUT, 10);
curl_setopt ($curl, CURLOPT_USERAGENT, "User-Agent Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; MAAU)");
// I made a cookie file and it seems to work
$cookiefile = "d:/cookie.txt";
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookiefile);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookiefile);
curl_setopt($curl, CURLOPT_FRESH_CONNECT , 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION ,1);
curl_setopt($curl, CURLOPT_HEADER ,1); // DO NOT RETURN HTTP HEADERS
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$temp=curl_exec($curl);
$info = curl_getinfo($curl);
$html = mb_convert_encoding($temp, 'HTML-ENTITIES', 'utf-8');
echo "ERRCODE: ".curl_error($curl);
echo '<br /><br />';
echo "INFO : ";
print_r($info);
echo '<br /><br />';
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
echo "CODE: ".$httpcode;
echo '<br /><br />';
echo "CODE: ".$httpcode;
echo '<br /><br />';
echo "VARS: ".$vars;
echo '<br /><br />';
//echo $html;
curl_setopt ($curl, CURLOPT_URL, "https://www.*******.******.***");
curl_setopt($curl, CURLOPT_FRESH_CONNECT , 0);
echo "<br /><br /><b>2nd</b><br /><br />";
$temp=curl_exec($curl);
$info = curl_getinfo($curl);
$html = mb_convert_encoding($temp, 'HTML-ENTITIES', 'utf-8');
echo "ERRCODE: ".curl_error($curl);
echo '<br /><br />';
echo "INFO : ";
print_r($info);
echo '<br /><br />';
echo $html;
Can't get that to even start to work. It starts with returning me a 200 OK, instead of 302, and sometimes I also get a 500.
I know the ASPX vars might actually be crucial, but if my browser can generate these vars and send them to the server, can't cURL do the same?
Thanks for any help!
Problem solved.
It was a matter of using the correct headers.
Following the reports from the browser, I went through all steps and the result showed up.
I went through each step by using:
curl_init
curl_setopt()
..
curl_setopt()
curl_exec()
curl_close()
This way I had to manually set each request and go through the settings. It made the code longer, but much easier to understand.
I had suspected that the site relied on some special JavaScript to work, so I was troubled a lot by all the extra JavaScript code, which turned out to be unnecessary.
It was all about being a lot more organized and sending the correct headers.
Moreover, since this was an ASPX site, I had to read and store the __VIEWSTATE and __EVENTVALIDATION values from the last page in each iteration and send them back with the next request. Not doing so was the main reason for the 500 Internal Server Error I used to get all the time.
I used Firebug and LiveHTTPHeaders to work out each step.
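Carrying the ASPX state from one page to the next can be sketched roughly as follows. This is only an illustration under assumptions: extractHiddenFields() is a hypothetical helper name, the sample page and its values are made up, and PHP's bundled DOM extension is assumed to be available. In a real run the HTML would be the body returned by the previous curl_exec() call.

```php
<?php
// Hypothetical helper: collect every hidden <input> from an HTML page.
function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely clean markup, so silence parser warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"]') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Made-up sample page standing in for the previous response body.
$page = '<html><body><form method="post">'
      . '<input type="hidden" name="__VIEWSTATE" value="/wEPDwUL+abc=" />'
      . '<input type="hidden" name="__EVENTVALIDATION" value="/wEWBQKg" />'
      . '<input type="text" name="cod" />'
      . '</form></body></html>';

$fields = extractHiddenFields($page);
$fields['cod'] = '3322'; // visible field filled in by the script

// http_build_query() URL-encodes the raw values exactly once.
$body = http_build_query($fields);
echo $body, "\n";
```

The resulting string could then be passed to CURLOPT_POSTFIELDS for the next request. Reading the values fresh from each response avoids sending stale state, and encoding them in one place avoids double-encoding values that were copied from a browser trace already percent-encoded.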
"Can't get that to even start to work. It starts with returning me a 200 OK, instead of 302, and sometimes I also get a 500."
curl_setopt($curl, CURLOPT_FOLLOWLOCATION ,1);
You have cURL set to follow any 302 redirects. They are followed internally by cURL, so PHP only sees the final response (here, the 200).
Also:
curl_setopt($curl, CURLOPT_HEADER ,1); // DO NOT RETURN HTTP HEADERS
The comment says the opposite of what the code does: setting CURLOPT_HEADER to 1 includes the response headers in the output.
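If both options stay enabled, the returned string contains the headers of every hop followed by the final body, so the intermediate 302 can still be recovered from it. Below is a minimal sketch of that idea. splitRawResponse() is a hypothetical helper, and the raw response is a made-up sample; in real code the header size would come from curl_getinfo($curl, CURLINFO_HEADER_SIZE), which reports the combined size of all header blocks received.

```php
<?php
// Hypothetical helper: separate the header blocks from the body of a raw
// response captured with CURLOPT_HEADER enabled.
function splitRawResponse(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body = substr($raw, $headerSize);

    // Each hop's header block ends with a blank line.
    $blocks = preg_split('/\r\n\r\n/', trim($headerText));
    $statuses = [];
    foreach ($blocks as $block) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $block, $m)) {
            $statuses[] = (int) $m[1];
        }
    }
    return ['statuses' => $statuses, 'body' => $body];
}

// Made-up sample: one redirect followed by the final page.
$raw = "HTTP/1.1 302 Found\r\nLocation: /next.aspx\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
     . "<html>done</html>";
$headerSize = strlen($raw) - strlen("<html>done</html>");

$parts = splitRawResponse($raw, $headerSize);
print_r($parts['statuses']);
echo $parts['body'], "\n";
```

Alternatively, turning CURLOPT_FOLLOWLOCATION off makes the 302 itself the final response, which is closer to what the browser's network trace shows for the first request.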
Before you build the cURL request, review the request fields the form actually sends. An HTTP 500 from an ASPX page usually means that a field it expected was not found in the request.
foreach($postarr as $key=>$value) {
$fields_string .= $key.'='.$value.'&';
echo" $fields_string <br> ";
}
Make sure the fields are not dynamic when you send the request.
Hope this helps.
I used this:
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]);
It makes the cURL request identify itself with the browser name and version of whoever is calling the script.
I don't understand why this code is not working. The code gets a response from cURL and should look for the word yes in that response; if it is found, it displays one text, otherwise another. The code:
<?PHP
// CURL
$ch = curl_init('http://dev.local/phpwhois-4.2.2/example.php?query=domain.ru&output=object');
curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0");
curl_setopt ($ch, CURLOPT_HEADER, false);
$curl = curl_exec($ch);
echo $curl;
curl_close($ch);
if(preg_match('~\s*yes\s*~u', $curl))
echo 'Ok';
else
echo 'Else text';
?>
The error is strange, or rather there is no error as such: if cURL returns the text yes, the check still fails and prints the else text, and if it does not return it, it prints the else text as well. If I put the same text that cURL returns directly into a variable, it works.
This is what the script returns to cURL (for this response it prints the else text):
regrinfo->Array disclaimer->Array 0->By submitting a query to RIPN's
Whois Service 1->you agree to abide by the following terms of use:
2->#3.2 (in Russian)
3-#3.2 (in English).
domain->Array name->hashcode.ru nserver->Array
ns1.nameself.com->81.176.95.18 ns2.nameself.com->88.212.207.45
status->REGISTERED, DELEGATED, VERIFIED created->2010-11-05
expires->2014-11-05 source->TCI registered->yes regyinfo->Array
referrer-> registrar->RUCENTER-REG-RIPN
servers->Array 0->Array server->ru.whois-servers.net
args->hashcode.ru port->43 type->domain rawdata->Array 0->% By
submitting a query to RIPN's Whois Service 1->% you agree to abide by
the following terms of use: 2->%
(in Russian) 3->%
(in English). 4->
5->domain: 6->nserver: . 7->nserver:
. 8->state: REGISTERED, DELEGATED, VERIFIED
9->person: Private Person 10->registrar: REGTIME-REG-RIPN
11->admin-contact: 12->created: 2010.11.05
13->paid-till: 2014.11.05 14->free-date: 2014.12.06 15->source: TCI
16-> 17->Last updated on 2014.07.27 12:31:31 MSK 18->
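You forgot to set the CURLOPT_RETURNTRANSFER option. Without it, curl_exec() prints the response directly and returns true on success, so $curl never contains the text you are matching against.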
<?PHP
// CURL
$ch = curl_init('http://dev.local/phpwhois-4.2.2/example.php?query=domain.ru&output=object');
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$curl = curl_exec($ch);
echo $curl;
curl_close($ch);
if(preg_match('~\s*yes\s*~u', $curl))
echo 'Ok';
else
echo 'Else text';
?>
Take care also about timeouts in the future. Good luck.
I am developing a script that uses PHP cURL to send SMS through http://www.gysms.com/freesms.php
The page stores a PHPSESSID cookie, and a hidden field named token is passed when the form is posted.
I have written a script with two cURL requests. The first request fetches the page and extracts the token value.
Here is the code for that:
<?php
$phone = '9197xxxxxxx';
$msg = 'Hi this is curlpost';
$get_cookie_page = 'http://www.gysms.com/freesms.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $get_cookie_page);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$sabin = curl_exec($ch);
$html=explode('<input type="hidden" name="trigger" value="',$sabin);
$html=explode('"/>',$html[1]);
//store the token value to $html[0]
?>
Curl post is done using the following code:
<?php
$fields = array(
'trigger'=>urlencode($html[0]), //token value
'number'=>urlencode($phone), //phone no
'message'=>urlencode($msg) //message
);
//posting curl request
foreach($fields as $key=>$value) { $fields_string .= $key.'='.$value.'&'; }
rtrim($fields_string,'&');
$url = 'http://www.gysms.com/freesms.php';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
//execute post
$result = curl_exec($ch);
echo $result;
curl_close($ch);
?>
The SMS is not sent with the above code.
If the SMS is sent, the page should show that it was sent to the number.
I don't know where I went wrong. Please help, I am new to PHP.
Finally, this attempt is only for educational purposes.
Here is some code I came up with that worked. Hope it helps. Some explanations and feedback about your code follow.
<?php
$number = '14155556666';
$message = 'This is my text in all its glory.';
$url = 'http://www.gysms.com/freesms.php';
$cookieFile = tempnam(sys_get_temp_dir(), 'SMS'); // passing null as the directory is deprecated since PHP 8.1
$userAgent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0';
if (strlen($message) > 100) {
die('Message length cannot exceed 100 characters.');
}
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // empty user agents probably not accepted
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1); // enable this - they check referer on POST
$html = curl_exec($ch);
// <input type="hidden" name="trigger" value="CXXrtmqVC7KbUnJ22UBodFy1kBj4ign5PsQ3qNR91nH2055307b4xP4"/>
if (!preg_match('/name=.trigger.\s+value=.([^\'"]+)/i', $html, $trigger)) {
die('Failed to locate hidden input value');
}
sleep(5); // without a slight delay, i often would not receive sms
$trigger = $trigger[1];
// build array of post values - all are important
$post = array('number' => $number,
'trigger' => $trigger,
'message' => $message,
'remLen' => 100 - strlen($message),
$trigger => 'Send Message');
// switch request to POST, use http_build_query to encode post data for us
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
$html = curl_exec($ch);
if (strpos($html, '<b>Message sent to</b>') !== false) {
echo "Message sent!";
} else {
echo "<b>Message not sent :(</b><br /><br />";
echo $html;
}
I think you may have had trouble for several reasons:
A User-Agent should be specified in the request; they seem to reject requests that leave it empty.
I used http_build_query to build the POST string (preference).
You were missing two fields in the request: remLen, and the trigger value used as the name of the submit button.
I often would not receive the messages if I didn't sleep a few seconds between getting the trigger value and sending the message.
In most of the cases where I didn't get the message, it still showed "Message sent to phone #" on the screen even though the message never came. Once I combined all the right things (sleep time, user agent, valid post fields) I would see the success message and also receive the text.
I think the most critical thing left out from your code was that on the first request where you grab the trigger value, they also set a cookie (PHPSESSID) that you are required to capture. Without sending that on the POST request it was probably an automatic reject.
To get around this, make sure you capture cookies on the first request as well as subsequent requests. I chose to re-use the same curl handle for both requests. You don't have to do it that way, but you would have to use the same cookie file and cookie jar between requests.
Hope that helps.
I have the following PHP cURL code:
$product_id_edit="Playful Minds (1062)";
$item_description_edit="TEST";
$rank_edit="0";
$price_type_edit="2";
$price_value_edit="473";
$price_previous_value_edit="473";
$active_edit="1";
$platform_edit="ios";
//set POST variables
$url = 'https://www.domain.com/adm_test/phpgen/offline_items.php?operation=insert';
$useragent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0.1) Gecko/20100101 Firefox/8.0.1';
$fields = array(
'product_id_edit'=>urlencode($product_id_edit),
'item_description_edit'=>urlencode($item_description_edit),
'rank_edit'=>urlencode($rank_edit),
'price_type_edit'=>urlencode($price_type_edit),
'price_value_edit'=>urlencode($price_value_edit),
'price_previous_value_edit'=>urlencode($price_previous_value_edit),
'active_edit'=>urlencode($active_edit),
'platform_edit'=>urlencode($platform_edit)
);
$fields_string="";
//url-ify the data for the POST
foreach($fields as $key=>$value) { $fields_string .= $key.'='.$value.'&'; }
rtrim($fields_string,'&');
//open connection
$ch = curl_init();
//set the url, number of POST vars, POST data
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
//add useragent
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch,CURLOPT_POSTFIELDS,$fields_string);
curl_setopt($ch,CURLOPT_POST,count($fields));
//execute post
$result = curl_exec($ch);
if(curl_errno($ch)){
print "" . curl_error($ch);
}else{
//print_r($result);
}
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
//echo "HTTP Response Code: " . curl_error($ch);
echo $httpCode;
//close connection
curl_close($ch);
I print $httpCode and get 200. From what I have read in the manual pages I presume this means OK; however, when I check the site, the POSTed values do not exist.
Does this have something to do with cross-domain restrictions, since I am not posting from the same domain? I'm running it at 127.0.0.1/site/scrpt.php. But how can I get response code 200 if the request was not successful?
I also tried to provoke a 404 by removing part of the request URL, and it did return a 404 (which, I assume, means that cURL is working properly).
Does having "?operation=insert" in the URL https://www.domain.com/adm_test/phpgen/offline_items.php?operation=insert have something to do with it?
Let's presume (though it is not implied) that I'm from another site and want to post values into a form on another website, sort of like a robot. My objective does not involve any evil intentions; it is just that I would otherwise have to enter thousands of lines of information by hand if this is not doable.
Likewise, I don't need a response from the server (but if one is available, that's fine).
The operation should be passed with CURLOPT_POSTFIELDS, along with the other parameters.
Cross-domain issues only arise in the browser. Your code is server-side PHP, so this should not be an issue.
Not sure if this is the solution or the problem is different, but this line:
rtrim($fields_string,'&');
Should be this:
$fields_string = rtrim($fields_string,'&');
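To illustrate the point about rtrim(): it returns the trimmed string and leaves its argument untouched, so the result has to be assigned. The sketch below uses a shortened, illustrative subset of the question's fields and also shows http_build_query(), which encodes and joins the pairs in one call and yields the same string as the corrected manual loop for these inputs.

```php
<?php
// Shortened, illustrative subset of the fields from the question (raw values).
$fields = [
    'product_id_edit'       => 'Playful Minds (1062)',
    'item_description_edit' => 'TEST',
    'price_value_edit'      => '473',
];

// Manual approach from the question: encode each value and append "&".
$manual = '';
foreach ($fields as $key => $value) {
    $manual .= $key . '=' . urlencode($value) . '&';
}

$discarded = $manual;
rtrim($discarded, '&');        // return value ignored: $discarded is unchanged
$manual = rtrim($manual, '&'); // return value assigned: trailing "&" removed

// http_build_query() does the encoding and joining in one call.
$built = http_build_query($fields);

var_dump(substr($discarded, -1) === '&'); // the stray "&" is still there
var_dump($manual === $built);             // both approaches agree
```

Note that http_build_query() expects raw values; passing values that were already run through urlencode() would encode them twice.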
curl_setopt($ch,CURLOPT_POST,TRUE);
CURLOPT_POST is a boolean flag that tells cURL to use POST; it is not a count of the values.
Code 200 indicates that the connection was set up correctly and a response was received from the server, but it does not mean that the requested action was carried out.
Print $result after the request to see the response from the web server.
define('COOKIE', './cookie.txt');
define('MYURL', 'https://register.pandi.or.id/main');
function getUrl($url, $method='', $vars='', $open=false) {
$agents = 'Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16';
$header_array = array(
"Via: 1.1 register.pandi.or.id",
"Keep-Alive: timeout=15,max=100",
);
static $cookie = false;
if (!$cookie) {
$cookie = session_name() . '=' . time();
}
$referer = 'https://register.pandi.or.id/main';
$ch = curl_init();
if ($method == 'post') {
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "$vars");
}
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header_array);
curl_setopt($ch, CURLOPT_USERAGENT, $agents);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 5);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE);
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
$buffer = curl_exec($ch);
if (curl_errno($ch)) {
echo "error " . curl_error($ch);
die;
}
curl_close($ch);
return $buffer;
}
function save_captcha($ch) {
$agents = 'Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16';
$url = "https://register.pandi.or.id/jcaptcha";
static $cookie = false;
if (!$cookie) {
$cookie = session_name() . '=' . time();
}
$ch = curl_init(); // Initialize a CURL session.
curl_setopt($ch, CURLOPT_URL, $url); // Pass URL as parameter.
curl_setopt($ch, CURLOPT_USERAGENT, $agents);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE);
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return stream contents.
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1); // We'll be returning this
$data = curl_exec($ch); // // Grab the jpg and save the contents in the
curl_close($ch); // close curl resource, and free up system resources.
$captcha_tmpfile = './captcha/captcha-' . rand(1000, 10000) . '.jpg';
$fp = fopen($tmpdir . $captcha_tmpfile, 'w');
fwrite($fp, $data);
fclose($fp);
return $captcha_tmpfile;
}
if (isset($_POST['captcha'])) {
$id = "yudohartono";
$pw = "mypassword";
$postfields = "navigation=authenticate&login-type=registrant&username=" . $id . "&password=" . $pw . "&captcha_response=" . $_POST['captcha'] . "press=login";
$url = "https://register.pandi.or.id/main";
$result = getUrl($url, 'post', $postfields);
echo $result;
} else {
$open = getUrl('https://register.pandi.or.id/main', '', '', true);
$captcha = save_captcha($ch);
$fp = fopen($tmpdir . "/cookie12.txt", 'r');
$a = fread($fp, filesize($tmpdir . "/cookie12.txt"));
fclose($fp);
?>
<form action='' method='POST'>
<img src='<?php echo $captcha ?>' />
<input type='text' name='captcha' value=''>
<input type='submit' value='proses'>
</form>
<?php
if (!is_readable('cookie.txt') && !is_writable('cookie.txt')) {
echo "cookie fail to read";
chmod('../pandi/', '777');
}
}
This is cookie.txt:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
register.pandi.or.id FALSE / FALSE 0 JSESSIONID 05CA8241C5B76F70F364CA244E4D1DF4
After I submit the form, it just displays:
HTTP/1.1 200 OK Date: Wed, 27 Apr 2011 07:38:08 GMT Server: Apache-Coyote/1.1 X-Powered-By: Servlet 2.4; Tomcat-5.0.28/JBoss-4.0.0 (build: CVSTag=JBoss_4_0_0 date=200409200418) Content-Length: 0 Via: 1.1 register.pandi.or.id Content-Type: text/plain X-Pad: avoid browser bug
If not that, I get the error "Captcha invalid".
Login to pandi always fails.
What is wrong in my script?
I don't want to break the captcha; I want to display the captcha on my web page and have the user type it in, so users can register dotID domains from my site automatically.
A captcha is intended to differentiate between humans and robots (programs). Seems like you are trying to log in with a program. The captcha seems to do its job :).
I don't see a legal way around it.
It happens because you took your captcha image from the first getUrl call (i.e. the first curl_exec) and processed it, but to submit the captcha you call getUrl again (i.e. another curl_exec), which requests a new page with a new captcha.
So you are submitting the answer to the old captcha against the new one. I had the same problem and resolved it.
A captcha is a dynamic image created by the server when you hit the page. It keeps changing, so you must extract the captcha from the page, read it, and then submit that same page for the login. The captcha changes every time the page is loaded!
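One way to check whether the captcha download and the login POST really belong to the same session is to compare the session cookie in the jar after each request. The sketch below is only an illustration: cookieFromJar() is a hypothetical helper, and the jar text mirrors the cookie.txt shown in the question. Keep in mind that libcurl writes the jar file when the handle is closed, so the file should be read after curl_close().

```php
<?php
// Hypothetical helper: look up one cookie value in Netscape-format jar text.
function cookieFromJar(string $jarText, string $name): ?string
{
    foreach (preg_split('/\R/', $jarText) as $line) {
        // libcurl marks HttpOnly cookies with this prefix; it is not a comment.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        }
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Columns: domain, subdomain flag, path, secure, expiry, name, value
        $parts = explode("\t", $line);
        if (count($parts) === 7 && $parts[5] === $name) {
            return $parts[6];
        }
    }
    return null;
}

// Jar text modelled on the cookie.txt from the question (tab-separated).
$jar = "# Netscape HTTP Cookie File\n"
     . "register.pandi.or.id\tFALSE\t/\tFALSE\t0\tJSESSIONID\t05CA8241C5B76F70F364CA244E4D1DF4\n";

var_dump(cookieFromJar($jar, 'JSESSIONID'));
```

If the value differs between the request that fetched the image and the request that posts the answer, the server is comparing the answer against a different captcha.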
This is possible with a headless browsing solution, e.g. zombie.js or coffee.js on Node. It may also be possible to extract the "image" from the captcha and, using image recognition, "read" the image and convert it to text, which is then posted with the form.
As of today, the only surefire method to "trick" a captcha is to use headless browsing.
Yes, Andro Selva is right. The second request gets a new captcha. The captcha is loaded once by the getUrl function and a second time by the save_captcha function, so these are two different images.
It should work something like this:
Download the captcha image before closing the cURL handle and before posting, and have the script wait until you provide the captcha answer. I would use preg_match. It will require some JavaScript as well.
If the captcha image is generated by JavaScript, you need to execute that JavaScript with the same cookie or token. In this situation, the easier solution is to record the headers with e.g. the LiveHTTPHeaders add-on for Mozilla Firefox.
I do not know how to do it with PHP; you have to get the captcha and find a way to solve it. There are many algorithms that can do it for you, but if you want to use Java, I already adapted the source code from this link to solve captchas, and it works very well for a lot of captcha systems.
So you could try to implement your own captcha solver, which will take a lot of time; try to find an existing implementation for PHP; or, IMHO the best option, use the JDownloader code base.
I am using PHP to try and scrape a page that seems to dynamically load content just milliseconds after the parent page finishes loading.
I am using cURL to fetch the page, and simpleHtmlDom to pull things out of the fetched HTML.
My efforts to traverse the DOM and explode() things out of the html return nothing. My only ideas were that it was loading the content after the parent page was loaded.
Here is my code.
<?
$url = 'http://www.facebook.com/OneAndroidAppaDay';
$scrapeUrl = 'http://www.facebook.com/OneAndroidAppaDay';
include_once('simple_html_dom.php');
require_once("bitly.php");
$userAgent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$scrapeUrl);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
if (!$html) {
echo "<br />cURL error number:" .curl_errno($ch);
echo "<br />cURL error:" . curl_error($ch);
exit;
}
$appBitlyUrl = $html->find('div[class=UIStoryAttachment_Title]',0)->find('a',0)->href; // fail :(
echo 'Bitly Url: ' . $appBitlyUrl;
?>
It's bombing out at line 24 (denoted with the inline comment) with this error:
Fatal error: Call to a member function find() on a non-object in /home/xxxxxxxx/public_html/xxx.xx/xxxx.php on line 24
Is there a way to make it wait a second or two before it snatches the page's html? Or maybe someone has some better insight?
Thanks
Mark
To add a simple delay:
sleep(2); // 2 second delay before continuing
You should really re-read the error message. It doesn't stem from a timing issue.
You get a $html string from cURL, but you cannot invoke simple_html_dom methods such as ->find on a string right away; you have to parse it before traversing. It is also unclear why you are using cURL in the first place. Either just use $dom = str_get_html($html) or try:
$dom = file_get_html('http://www.facebook.com/OneAndroidAppaDay');
$bituurl = $dom->find('div[class=UIStoryAttachment_Title]',0)->...
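The same parse-before-traversing idea can also be sketched with PHP's bundled DOM extension instead of simple_html_dom. This is a swapped-in technique, not the library used above; firstLinkInClass() is a hypothetical helper, and the sample markup and URL are made up. Returning null when nothing matches avoids the "member function on a non-object" fatal error when the element is missing, for example because the content is injected later by JavaScript.

```php
<?php
// Hypothetical helper: return the href of the first link found inside a
// <div> that carries the given class, or null if there is none.
// The class name is placed into the XPath query unescaped, so this sketch
// assumes a simple name without quotes.
function firstLinkInClass(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a[@href]',
        $class
    );
    $links = $xpath->query($query);
    if ($links === false || $links->length === 0) {
        return null;
    }
    return $links->item(0)->getAttribute('href');
}

// Made-up markup standing in for the string returned by curl_exec().
$html = '<html><body><div class="UIStoryAttachment_Title">'
      . '<a href="http://bit.ly/example">App</a></div></body></html>';

var_dump(firstLinkInClass($html, 'UIStoryAttachment_Title'));
var_dump(firstLinkInClass('<p>nothing here</p>', 'UIStoryAttachment_Title'));
```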