This was working until quite recently, and I cannot seem to crack the case.
If you manually visit the URL the script requests, the results are there, but when I request it from code I get nothing back.
You can see in my output test that I am no longer getting any output.
Any ideas?
<?php
//$ticker=urldecode($_GET["ticker"]);
$ticker='HYG~FBT~';
echo $ticker;
$tickerArray=preg_split("/\~/",$ticker);
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "http://www.batstrading.com/market_data/symbol_data/csv/");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$a='';
$output = curl_exec($ch);
echo "<br><br>OUTPUT TEST: ".($output);
$lineCt=0;
$spaceCt=0;
$splitOutput=preg_split("[\n|\r]",$output);
for($ii=0;$ii<sizeof($tickerArray);$ii++){
$i=0;
$matchSplit[$ii]=-1;
while($i<sizeof($splitOutput) && $matchSplit[$ii]==-1){
$splitOutput2=preg_split("/\,/",$splitOutput[$i]);
if($i>0){
if(strcasecmp($splitOutput2[0],strtoupper($tickerArray[$ii]))==0){
$matchSplit[$ii]=$splitOutput[$i]."#";
}
}
$i++;
}
if($matchSplit[$ii]==-1){
echo "notFound#";
}else{
echo $matchSplit[$ii];
}
}
//echo ($output);
curl_close($ch);
?>
I added a user agent to your script and it seems to work fine here:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.batstrading.com/market_data/symbol_data/csv/");
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15); //time out of 15 seconds
$output = curl_exec($ch);
curl_close($ch);
//Then your output parsing code
The output I get:
HYG~FBT~
OUTPUT TEST: Name,Volume,Ask Size,Ask Price,Bid Size,Bid Price,Last Price,Shares Matched,Shares Routed SPY,35641091,0,0.0,0,0.0,134.38,34256509,1384582 BAC,22100508,0,0.0,0,0.0,7.78,20407265,1693243 QQQ,12085707,0,0.0,0,0.0,62.65,11703725,381982 XLF,11642347,0,0.0,0,0.0,14.47,11429581,212766 VXX,9838310,0,0.0,0,0.0,28.2,9525266,313044 EEM,9711498,0,0.0,0,0.0,43.28,9240820,470678 IWM,8272528,0,0.0,0,0.0,81.19,7930349,342179 AAPL,6145951,0,0.0,0,0.0,498.24,4792854,1353097
It is also good practice to close the CURL connection once you are done. I believe that might also play a part in your issues.
If you are still getting issues, check to see if the server the script runs on can access that site.
Update
Upon further investigation, here's what I believe is the root of the problem.
The problem lies with the provider of the CSV file. Perhaps due to some issue on their end, the CSV is sometimes generated containing only the headers. At other times it did contain data.
The data is only available during set hours of the day.
In any case, when the fetched file is empty, the parser prints notFound#, which led us to assume there was an issue with cURL.
So my suggestion is to add a check to the script that the CSV actually contains data rows and is not just a file of headings.
Finally, setting a timeout for cURL should help, as the CSV takes a while to be generated by the provider.
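The header-only check suggested above could look roughly like the sketch below. The helper name csvHasDataRows() is hypothetical and not part of the original script, and it assumes the first non-empty line of the response is always the header row.

```php
<?php
// Hypothetical helper (not part of the original script): returns true only
// when the CSV text has at least one non-empty line after the header row.
function csvHasDataRows(string $csv): bool
{
    // Split on any newline style, then drop blank lines.
    $lines = preg_split('/\r\n|\r|\n/', trim($csv));
    $lines = array_filter($lines, function ($line) {
        return trim($line) !== '';
    });

    // The first remaining line is assumed to be the header;
    // anything beyond it counts as data.
    return count($lines) > 1;
}
```

In the script it would run right after curl_exec(), for example `if ($output === false || !csvHasDataRows($output)) { echo "noData#"; }`, so that an empty feed is reported separately from a ticker that is genuinely missing. The "noData#" marker is likewise only an illustration.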
Related
I wrote PHP code to request a URL through each proxy in a list. The hits are generated successfully and I get output as HTML, but this HTML does not display properly in the browser. I want the exact HTML returned through the proxy. If anybody knows how to do this, please give me some idea. Here is the code I am using:
<?php
$curl = curl_init();
$timeout = 30;
$proxies = file("proxy.txt");
$r="https://www.abcdefgth.com";
// Not more than 2 at a time
for($x=0;$x<2000; $x++){
//resetting the time limit on every iteration ensures the script doesn't get timed out
set_time_limit(30);
//now we will separate proxy address from the port
//$PROXY_URL=$proxies[$getrand[$x]];
echo $proxies[$x];
curl_setopt($curl, CURLOPT_URL,$r);
curl_setopt($curl , CURLOPT_PROXY , preg_replace('/\s+/', '',$proxies[$x]));
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5");
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($curl, CURLOPT_REFERER, "http://google.com/");
$text = curl_exec($curl);
echo "Hit Generated:";
}
?>
A quick look at the documentation of the function you use would have answered your question:
http://php.net/manual/en/function.curl-exec.php states in the "Return Values" section that curl_exec() returns a boolean, unless you have set the CURLOPT_RETURNTRANSFER option, in which case it returns the result as a string on success. You did not set that option in your code.
So try adding
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
followed by some attempt to actually output the result you receive in $text, which you also forgot.
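Put together, one iteration of the loop might look like the sketch below. fetchThroughProxy() is a hypothetical helper name, and the 30-second default simply mirrors the $timeout value from the question.

```php
<?php
// Sketch: fetch $url through one proxy and return the body as a string,
// or false on failure. fetchThroughProxy() is a hypothetical helper name.
function fetchThroughProxy(string $url, string $proxy, int $timeout = 30)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    if ($proxy !== '') {
        curl_setopt($curl, CURLOPT_PROXY, $proxy);
    }
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    // Without this option curl_exec() prints the body and returns true.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // String on success, false on failure.
    return curl_exec($curl);
}

// Usage inside the loop from the question:
// $text = fetchThroughProxy($r, trim($proxies[$x]));
// echo $text === false ? "Request failed" : $text;
```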
I have a repetitive task that I do daily. Log in to a web portal, click a link that pops open a new window, and then click a button to download an Excel spreadsheet. It's a time consuming task that I would like to automate.
I've been doing some research with PHP and cURL, and while it seems like it should be possible, I haven't found any good examples. Has anyone ever done something like this, or do you know of any tools that are better suited for it?
Are you familiar with the basics of HTTP requests? Like, do you know the difference between a POST and a GET request? If what you're doing amounts to nothing more than GET requests, then it's actually super simple and you don't need to use cURL at all. But if "clicking a button" means submitting a POST form, then you will need cURL.
One way to check this is by using a tool such as Live HTTP Headers and watching what requests happen when you click on your links/buttons. It's up to you to figure out which variables need to get passed along with each request and which URLs you need to use.
But assuming that there is at least one POST request, here's a basic script that will post data and get back whatever HTML is returned.
<?php
if ( $ch = curl_init() ) {
$data = 'field1=' . urlencode('somevalue');
$data .= '&field2[]=' . urlencode('someothervalue');
$url = 'http://www.website.com/path/to/post.asp';
$userAgent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
$html = curl_exec($ch);
curl_close($ch);
} else {
$html = false;
}
// write code here to look through $html for
// the link to download your excel file
?>
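As a variation on the script above, the POST body can also be built from an array with http_build_query(), which does the URL encoding for you. The field names are the same placeholders used in the script. Note that the array field is emitted as field2[0] rather than field2[], and whether the receiving server treats the two the same is an assumption you would need to verify.

```php
<?php
// Same placeholder fields as the script above, expressed as an array.
$fields = [
    'field1' => 'somevalue',
    'field2' => ['someothervalue'],
];

// http_build_query() URL-encodes keys and values and joins them with "&".
$data = http_build_query($fields);

echo $data, PHP_EOL; // field1=somevalue&field2%5B0%5D=someothervalue
```

The resulting $data string can be passed to CURLOPT_POSTFIELDS exactly as in the script.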
Try this:
$ch = curl_init();
$csrf_token = $this->getCSRFToken($ch); // your own function that fetches the CSRF token from the website, if you need one
$ch = $this->signIn($ch, $csrf_token); // your own sign-in function; it must return the handle
curl_setopt($ch, CURLOPT_HTTPGET, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // needed so curl_exec() returns the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 300); // in case the file is large
curl_setopt($ch, CURLOPT_URL, "https://your-URL/anything");
$return=curl_exec($ch);
// the important part
$destination ="files.xlsx";
if (file_exists( $destination)) {
unlink( $destination);
}
$file=fopen($destination,"w+");
fputs($file,$return);
if(fclose($file))
{
echo "downloaded";
}
curl_close($ch);
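An alternative to buffering the whole file in $return is to let cURL write the response straight to disk with CURLOPT_FILE. The sketch below assumes $ch is a handle that has already been signed in as in the snippet above; downloadToFile() is a hypothetical helper name.

```php
<?php
// Sketch: stream the response for $url into $destination using an existing,
// already-authenticated handle. downloadToFile() is a hypothetical helper.
function downloadToFile($ch, string $url, string $destination): bool
{
    // Mode 'wb' truncates an existing file, so no unlink() is needed first.
    $file = fopen($destination, 'wb');
    if ($file === false) {
        return false;
    }

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 300);
    curl_setopt($ch, CURLOPT_FILE, $file); // write the body directly to the file

    $ok = curl_exec($ch);
    fclose($file);

    if ($ok === false) {
        unlink($destination); // do not leave a partial file behind
        return false;
    }
    return true;
}

// Usage:
// if (downloadToFile($ch, "https://your-URL/anything", "files.xlsx")) {
//     echo "downloaded";
// }
```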
I have a PHP script which downloads all of the content at a URL via cURL:
<?php
function get_curl_output($link)
{
$channel = curl_init();
curl_setopt($channel, CURLOPT_URL, $link);
curl_setopt($channel, CURLOPT_HEADER, 0);
curl_setopt($channel, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($channel, CURLOPT_CONNECTTIMEOUT, 10000);
curl_setopt($channel, CURLOPT_TIMEOUT, 10000);
curl_setopt($channel, CURLOPT_VERBOSE, true);
curl_setopt($channel, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219');
curl_setopt($channel, CURLOPT_FOLLOWLOCATION, true);
$output = curl_exec($channel);
if(curl_errno($channel))
{
file_put_contents('curlerror.txt', curl_error($channel) . PHP_EOL, FILE_APPEND);
}
curl_close($channel);
return $output;
}
?>
<?php
// Download the content of every URL in $given_url.
// $given_url may be a single URL or an array of URLs.
function downloader($given_url){
foreach((array)$given_url as $index => $link){
$content_stored_here = get_curl_output($link);
// and put that content in a file (the file name pattern is only a placeholder)
if($content_stored_here !== false){
file_put_contents('download_' . $index . '.html', $content_stored_here);
}
}
}
?>
Everything works fine as long as there is no connection loss or IP change. However, my connection randomly gets a new IP address after some hours, as I don't have a static IP address.
I use mod_php in Apache with the WinNT MPM (threaded worker).
Once I get the new IP address, my code stops working but throws no errors.
EDIT:
I wrote the same program in C++ (changing some function names and tweaking compiler and linker settings, of course). The C++ version also stops in the middle of the run once I get a new IP address or lose the connection.
Any insights on this?
You don't need such huge timeouts; when there is no connection, cURL keeps trying to connect for as long as 10000 seconds according to your code. Set it to something more reasonable, for example just 10.
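A minimal sketch of what that could look like follows. timeoutOptions() is a hypothetical helper, and the 60-second overall limit is an assumption; only the value 10 comes from the suggestion above.

```php
<?php
// Sketch of more modest timeouts. The defaults are assumptions:
// 10 seconds to connect, 60 seconds for the whole transfer.
function timeoutOptions(int $connectSeconds = 10, int $totalSeconds = 60): array
{
    return [
        CURLOPT_CONNECTTIMEOUT => $connectSeconds, // max time to establish the connection
        CURLOPT_TIMEOUT        => $totalSeconds,   // max time for the entire request
    ];
}

// Usage inside get_curl_output(), replacing the two 10000-second settings:
// curl_setopt_array($channel, timeoutOptions());
```

With values like these a dead connection surfaces as a cURL error after a few seconds, which the existing curl_errno() check then writes to curlerror.txt, instead of the script appearing to hang.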
The problem is that I get some parts of the content but not the users' reviews. In Firebug I can see the content, but when I check the page source the content is not inside the HTML tags, or the tags are not the same. Here is my code:
<?php
//Headers
include('simple_html_dom.php');
function getPage($page, $redirect = 0, $cookie_file = '')
{
$ch = curl_init();
$headers = array("Content-type: application/json");
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
if($redirect)
{
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
}
curl_setopt($ch, CURLOPT_URL, $page);
if($cookie_file != '') {
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
}
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6');
$return = curl_exec($ch); //Mozilla/4.0 (compatible;)
curl_close($ch);
return $return;
}//EO Fn
//Source
$url = 'http://www.vitals.com/doctor/profile/1982660171/reviews/1982660171';
//Parsing ...
$contents = getPage($url, 1, 'cookies.txt');
$html = str_get_html($contents);
//Output
echo $html->outertext;
?>
Can anyone please help me? What should I do to get the whole page so that I can grab the reviews?
They're just stored as JSON in a <script> block towards the top of the page. Parse it out with RegEx or Simple HTML DOM and run it through json_decode.
var json = {"provider":{"id":"1982660171","display_name":"Stephen R Guy, MD","last_name":"Guy","first_name":"Stephen","middle_name":"Russell","master_name":"Stephen_Guy","degree_types":"MD","familiar_name":"Stephen","years_experience":"27","birth_year":"1956","birth_month":"5","birth_day":"23","gender":"M","is_limited":"false","url_deep":"http:\/\/www.vitals.com\/doctor\/profile\/1982660171\/Stephen_Guy","url_public":"http:\/\/www.vitals.com\/doctors\/Dr_Stephen_Guy.html","status_code":"A","client_ids":"1","quality_indicator_set":[{"type":"quality-indicator\/consumer-feedback","count":"2","suboverall_set":[{"name_short":"Promptness","overall":"3"},{"name_short":"Courteous Staff","overall":"4"},{"name_short":"Bedside Manner","overall":"4"},{"name_short":"Spends Time with Me","overall":"4"},{"name_short":"Follow Up","overall":"4"}],"name":"Consumer Reviews","overall":"4.0","measure_set":[{"feedback_response_id":"1756185","input_source_ids":"{0}","date":"1301544000","value":"4","scale":{"best":"1","worst":"4"},"review":{"type":"review\/consumer","comment":"I will never birth with another dr. Granted that's not saying much as I don't like dr's but I actually find him as valuable as the midwives who I adore. I liked Horlacher but when Kitty left I followed the midwives and then followed again....Dr. Guy is GREAT. I honestly don't know who I'd rather support me at my birth; Margie and Lisa or Dr. Guy. ....I wonder if I can just get all of them.Guy's great. Know what you want. Tell him. Be strong and he'll support you.I give him 10 stars. Oh...my baby's 3 years old now. He's GREAT! 
","date":"1301544000"},"sub_measure":[{"name":"Waiting time during a visit","name_short":"Promptness","value":"3","scale":{"best":"4","worst":"1"}},{"name":"Courtesy and professionalism of office staff ","name_short":"Courteous Staff","value":"4","scale":{"best":"4","worst":"1"}},{"name":"Bedside manner (caring)","name_short":"Bedside Manner","value":"4","scale":{"best":"4","worst":"1"}},{"name":"Spending enough time with me","name_short":"Spends Time with Me","value":"4","scale":{"best":"4","worst":"1"}},{"name":"Following up as needed after my visit","name_short":"Follow Up","value":"4","scale":{"best":"4","worst":"1"}}]},{"feedback_response_id":"420734","input_source_ids":"{76}","link":"http:\/\/local.yahoo.com\/info-15826842-guy-stephen-r-md-university-women-s-health-center-dayton","date":"1142398800","value":"4","scale":{"best":"1","worst":"4"},"review":{"type":"review\/consumer","comment":"Excellent Doctor: I really like going to this office. They are truely down to earth people and talk my \"non-medical\" language. I have been using thier office since 1997 and they have seen me through 2 premature pregnancies!","date":"1142398800"}}],"wait_time":"50"}]}};
But again, make sure you have permissions to do this...
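A rough sketch of the regex route is shown below. It assumes the page keeps the `var json = {...};` assignment at the end of a line or directly before the closing script tag; extractEmbeddedJson() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pull the object assigned to `var json = {...};`
// out of the page source and decode it. Returns null if nothing matches
// or the captured text is not valid JSON.
function extractEmbeddedJson(string $html)
{
    // Lazily capture from the first "{" up to a "}" that is followed by ";"
    // and then either the end of a line or the closing script tag.
    $pattern = '/var\s+json\s*=\s*(\{.*?\})\s*;\s*(?:<\/script>|$)/ms';
    if (!preg_match($pattern, $html, $match)) {
        return null;
    }
    return json_decode($match[1], true);
}

// Usage with the getPage() function from the question:
// $data = extractEmbeddedJson(getPage($url, 1, 'cookies.txt'));
// foreach ($data['provider']['quality_indicator_set'] as $set) {
//     foreach ($set['measure_set'] as $measure) {
//         echo $measure['review']['comment'], "\n";
//     }
// }
```

The key names in the usage comment follow the JSON sample above; if the site changes that structure, the loop would need to change with it.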
I'm struggling to get an array of LS cities. file_get_contents() returns their roadblock page that requires you to select a city, but the dropdown on it is empty. So I thought the list was coming from an AJAX request, but looking at the page I don't see any AJAX requests. Then I tried cURL, thinking that maybe simulating a browser would help, but the code below had no effect.
$ch = curl_init("http://www.URL.com/");
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$result=curl_exec($ch);
var_dump($result);
Does anyone have any ideas on how I can get a solid list of available areas?
I have found out how they populate the list of cities and created some sample code below you can use.
The list of cities is stored as a JSON string in one of their javascript files, and the list is actually populated from a different javascript file. The names of the files appear to be somewhat random, but the root name remains the same.
An example of the JS file with the city JSON is hXXp://a3.ak.lscdn.net/deals/system/javascripts/bingy-81bf24c3431bcffd317457ce1n434ca9.js The script that populates the list is hXXp://a2.ak.lscdn.net/deals/system/javascripts/confirm_city-81bf24c3431bcffd317457ce1n434ca9.js but for us this is inconsequential.
We need to load their home page with a new curl session, look for the unique javascript URL that is the bingy script and fetch that with curl. Then we need to find the JSON and decode it to PHP so we can use it.
Here is the script I came up with that works for me:
<?php
error_reporting(E_ALL); ini_set('display_errors', 1); // debugging
// set up new curl session with options
$ch = curl_init('http://livingsocial.com');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13');
$res = curl_exec($ch); // fetch home page
// regex string to find the bingy javascript file
$matchStr = '/src="(https?:\/\/.*?(?:javascripts)\/bingy-?[^\.]*\.js)"/i';
if (!preg_match($matchStr, $res, $bingyMatch)) {
die('Failed to extract URL of javascript file!');
}
// this js file is now our new url
$url = $bingyMatch[1];
curl_setopt($ch, CURLOPT_URL, $url);
$res = curl_exec($ch); // fetch bingy js
$pos = strpos($res, 'fte_cities'); // search for the fte_cities variable where the list is stored
if ($pos === false) {
die('Failed to locate cities JSON in javascript file!');
}
// find the beginning of the json string, and the end of the line
$startPos = strpos($res, '{', $pos + 1);
$endPos = strpos($res, "\n", $pos + 1);
$json = trim(substr($res, $startPos, $endPos - $startPos)); // snip out the json
if (substr($json, -1) == ';') $json = substr($json, 0, -1); // remove trailing semicolon if present
$places = json_decode($json, true); // decode json to php array
if ($places === null) {
die('Failed to decode JSON string of cities!');
}
// array is structured where each country is a key, and the value is an array of cities
foreach($places as $country => $cities) {
echo "Country: $country<br />\n";
foreach($cities as $city) {
echo ' '
."{$city['name']} - {$city['id']}<br />\n";
}
echo "<br />\n";
}
Some important notes:
If they decide to change the javascript file names, this will fail to work.
If they rename the variable name that holds the cities, this will fail to work.
If they modify the json to span multiple lines, this will not work (this is unlikely because it uses extra bandwidth).
If they change the structure of the json object, this will not work.
In any case, depending on their modifications it may be trivial to get working again, but it is a potential issue. They may also be unlikely to make these logistical changes because it would require modifications to a number of files, and then require more testing.
Hope that helps!
Perhaps a bit late, but you don't need to couple to our JavaScript to obtain the cities list. We have an API for that:
https://sites.google.com/a/hungrymachine.com/livingsocial-api/home/cities