I use curl_multi_exec() to request several websites in parallel. Say, URL1, URL2, and URL3. As soon as one of these websites returns a result, I can process it and then wait for the next response.
Now I need to know, based on the response, which URL the result comes from. I cannot simply check the URL in the response, as there might be redirections. So what is the best way to identify which URL (URL1, URL2, or URL3) the response came from? Can the information from curl_multi_info_read() or curl_getinfo() somehow be used for that? Is there a cURL option I can set to request that?
I also tried storing the cURL handles before requesting the URLs and comparing them with curl_multi_info_read($curlMultiHandle)['handle'], but as this is a resource, it is not easily comparable.
Any ideas?
It is possible to attach custom data to a handle:
curl_setopt($handle, \CURLOPT_PRIVATE, json_encode(['id' => $query_id]));
and then fetch that data back later:
curl_getinfo($handle, \CURLINFO_PRIVATE);
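For completeness, here is a minimal sketch of how the two calls fit together with a multi handle; the $urls array and the ID scheme are example inputs, not part of the original question:

// Tag each handle with its own metadata via CURLOPT_PRIVATE, then read
// the tag back when the transfer completes.
$urls = array('https://example.com/a', 'https://example.com/b');
$mh = curl_multi_init();
foreach ($urls as $id => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PRIVATE, json_encode(array('id' => $id, 'url' => $url)));
    curl_multi_add_handle($mh, $ch);
}
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
    while ($info = curl_multi_info_read($mh)) {
        $ch = $info['handle'];
        $meta = json_decode(curl_getinfo($ch, CURLINFO_PRIVATE), true);
        echo "Request {$meta['id']} ({$meta['url']}) finished\n";
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
} while ($running > 0);
curl_multi_close($mh);

Because the tag travels with the handle itself, it survives any redirects the transfer goes through.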
Suppose you have multiple Image objects for which you need to load the data. You run your requests in parallel and don't know the order of download completion, so you somehow have to identify the concrete Image object when you receive the data. Instead of using URLs (which might change after redirection) as keys in an associative array of Image objects, I recommend the following simple approach.
$mh = curl_multi_init();
$activeHandles = array();
$loadingImages = array();

function loadImage(Image $image) {
    global $mh, $activeHandles, $loadingImages;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $image->getUrl());
    curl_multi_add_handle($mh, $ch);
    // ...
    $loadingImages[] = $image;
    $activeHandles[] = $ch;
}

function retrieveImages() {
    global $mh, $activeHandles, $loadingImages;
    // Somewhere you run curl_multi_exec($mh, $running).
    // Here you get the results.
    while ($result = curl_multi_info_read($mh)) {
        // How to get the data is out of our scope.
        // We are interested in identifying the image object.
        $ch = $result['handle'];
        $idx = array_search($ch, $activeHandles, true); // strict search matches the resource itself
        $image = $loadingImages[$idx];
        if ($result['result'] === CURLE_OK) {
            // Don't forget to free resources!
            unset($activeHandles[$idx]);
            unset($loadingImages[$idx]);
            curl_multi_remove_handle($mh, $ch);
            // ...
        }
    }
}
I have a PHP script that loads this webpage to extract some data from its tables.
The following methods failed to get its table contents:
Using file_get_contents:
$document = file_get_contents("http://www.webpage.com/");
print_r($document);
Using cURL:
$document = curl_init('http://www.webpage.com/');
curl_setopt($document, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($document);
print_r($html);
Using loadHTMLFile:
$document = new DOMDocument();
$document->loadHTMLFile('http://www.webpage.com/');
print_r($document);
I'm not an expert in PHP, and except for the first method, the others are copied from StackOverflow answers.
What am I doing wrong?
And how do they block some content from loading?
Not the answer you're likely to want to hear, but none of the methods you describe will evaluate JavaScript and other browser resources as a normal browser client would. Instead, each of those methods retrieves the contents of only the file you've specified. A quick glance at the site you're targeting clearly shows this table in question being populated as the result of an AJAX call, which none of the methods you've tried are able to evaluate.
You'll need to lean on a library or script that has the capability for this type of emulation, such as laravel/dusk, the PHP bindings for Selenium WebDriver, or something similar.
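For instance, a minimal sketch with the php-webdriver Selenium bindings might look like the following; the Selenium server address and the bare table selector are assumptions you would adapt:

require 'vendor/autoload.php'; // php-webdriver installed via Composer

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\WebDriverBy;

// Connect to a running Selenium server (the address is an assumption).
$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::chrome());
$driver->get('http://www.webpage.com/');
// The page's JavaScript (including the AJAX call) runs in a real browser;
// a real script might also need an explicit wait until the AJAX finishes.
$table = $driver->findElement(WebDriverBy::cssSelector('table'));
echo $table->getText();
$driver->quit();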
This is what I did to scrape data from a webpage using php curl:
// Defining the basic cURL function
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
// Defining the basic scraping function
function scrape_between($data, $start, $end){
$data = stristr($data, $start); // Stripping all data from before $start
$data = substr($data, strlen($start)); // Stripping $start
$stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
$data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
return $data; // Returning the scraped data from the function
}
$target_url = "https://www.somesite.com";
$scraped_website = curl($target_url);
$data_set_1 = scrape_between($scraped_website, "%before%", "%after%");
$data_set_2 = scrape_between($scraped_website, "%before%", "%after%");
The %before% and %after% placeholders stand for text that always shows up on the webpage before and after the data you wish to grab. They could be div tags or some other HTML tags that are unique to the data you wish to grab.
So maybe look into using cURL to imitate the same AJAX request that the site is using (see the sketch below)? When I searched for that, this is what I found:
Mimicking an ajax call with Curl PHP
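Here is an untested sketch of that idea: find the real AJAX endpoint in your browser's network inspector, then request it directly with cURL. The endpoint URL and header below are placeholders, not the site's actual values:

// Call the table's AJAX endpoint directly (the URL here is hypothetical).
$ch = curl_init('http://www.webpage.com/ajax/table-data');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'X-Requested-With: XMLHttpRequest', // many endpoints check for this header
));
$response = curl_exec($ch);
curl_close($ch);
// Assuming the endpoint returns JSON:
print_r(json_decode($response, true));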
I'm programming in PHP.
An article I've found useful so far was mainly about how to cURL through one site with a lot of information, but what I really need is how to cURL across multiple sites with not so much information - just a few lines each, as a matter of fact!
The other part is that the article focuses mainly on storing the data in a txt file on the FTP server, but I have loaded around 900 addresses into MySQL and want to load them from there, and enrich the table with the information stored in the links - which I will provide beneath!
We have some open public libraries with addresses and information about these and an API.
Link to the main site:
The function I would like to use: http://dawa.aws.dk/adresser/autocomplete?q=
SQL Structure:
Data example: http://i.imgur.com/jP1J26U.jpg
For example, this address: Dornen 2 6715 Esbjerg N (called AdrName in the database).
http://dawa.aws.dk/adresser/autocomplete?q=Dornen%202%206715%20Esbjerg%20N
This will give me the following output (which I want to store in the AdrID in the database):
[
    {
        "tekst": "Dornen 2, Tarp, 6715 Esbjerg N",
        "adresse": {
            "id": "0a3f50b8-d085-32b8-e044-0003ba298018",
            "href": "http://dawa.aws.dk/adresser/0a3f50b8-d085-32b8-e044-0003ba298018",
            "vejnavn": "Dornen",
            "husnr": "2",
            "etage": null,
            "dør": null,
            "supplerendebynavn": "Tarp",
            "postnr": "6715",
            "postnrnavn": "Esbjerg N"
        }
    }
]
How do I store it all in a blob, as seen in the SQL structure?
If you want to make a cURL request in PHP, use this method:
function curl_download($Url){
    // is cURL installed yet?
    if (!function_exists('curl_init')){
        die('Sorry cURL is not installed!');
    }
    // OK cool - then let's create a new cURL resource handle
    $ch = curl_init();
    // Now set some options (most are optional)
    // Set URL to download
    curl_setopt($ch, CURLOPT_URL, $Url);
    // Set a referer
    curl_setopt($ch, CURLOPT_REFERER, "http://www.example.org/yay.htm");
    // User agent
    curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
    // Include header in result? (0 = no, 1 = yes)
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Timeout in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    // Download the given URL, and return output
    $output = curl_exec($ch);
    // Close the cURL resource, and free system resources
    curl_close($ch);
    return $output;
}
And then you call it using
print curl_download('http://dawa.aws.dk/adresser/autocomplete?q=Melvej');
Or you can directly convert it to a JSON object:
$jsonString=curl_download('http://dawa.aws.dk/adresser/autocomplete?q=Melvej');
var_dump(json_decode($jsonString));
The data you download is JSON, so you can store it in a VARCHAR column rather than a BLOB.
Also, the site with the API does not seem bothered about the HTTP referrer, user agent, etc., so you can use file_get_contents in place of cURL.
So simply get all the results from your DB, iterate over them, make a call to the API, and update the appropriate row with the correct data:
//get all the rows from your database
$addresses = DB::exec('SELECT * FROM addresses'); //I don't know how you actually access your db; this is just an example
foreach($addresses as $address){
    $searchTerm = $address['AdrName'];
    $addressId = $address['Vid'];
    //download the json
    $apidata = file_get_contents('http://dawa.aws.dk/adresser/autocomplete?q=' . urlencode($searchTerm));
    //save back to db, keyed on the row's id rather than the search term
    //(column names are examples; adjust them to your schema)
    DB::exec('UPDATE addresses SET AdrID=? WHERE Vid=?', [$apidata, $addressId]);
    //if you want to access the data, you can use json_decode:
    $data = json_decode($apidata);
    echo $data[0]->tekst; //outputs Dornen 2, Tarp, 6715 Esbjerg N
}
I want to convert a given postcode to latitude and longitude to integrate into my cart project.
But when I try to grab the latitude and longitude with the Google API, it shows an error like:
"We're sorry... ... but your computer or network may be sending
automated queries. To protect our users, we can't process your request
right now."
What is wrong with my code? My code is shown below.
function getLatLong($code){
$mapsApiKey = 'AIzaSyC1Ky_5LFNl2zq_Ot2Qgf1VJJTgybluYKo';
$query = "http://maps.google.co.uk/maps/geo?q=".urlencode($code)."&output=json&key=".$mapsApiKey;
//---------
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $query);
curl_setopt($ch, CURLOPT_HEADER, 0);
// return the response instead of printing it, so $data receives the JSON
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// grab the URL's response
$data = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
//-----------
//$data = file_get_contents($query);
// if data returned
if($data){
// convert into readable format
$data = json_decode($data);
$long = $data->Placemark[0]->Point->coordinates[0];
$lat = $data->Placemark[0]->Point->coordinates[1];
return array('Latitude'=>$lat,'Longitude'=>$long);
}else{
return false;
}
}
print_r(getLatLong('SW1W 9TQ'));
Use a user agent:
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0');
Also check whether you missed sending any required HTTP request header, and whether you are sending the required parameters (GET or POST) with your request.
By the way, if you are sending too many requests, there is nothing else to do about this error: stop sending requests, or throttle them so they don't upset the server.
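For example, a simple throttling sketch (the postcode list and the 0.5 second delay are arbitrary example values):

$postcodes = array('SW1W 9TQ', 'EC1A 1BB', 'W1A 0AX');
foreach ($postcodes as $code) {
    print_r(getLatLong($code)); // the function from the question
    usleep(500000); // pause 0.5s between requests so the API is not flooded
}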
I am currently trying to fetch some Facebook data, which I then want to access in JavaScript. Specifically, I am trying to access some characteristics of the user's friends.
So I am getting the user's friend list by calling file_get_contents on his Graph API URL.
This provides me with an array of friend ids.
As I need a characteristic from each friend, I am doing:
foreach($dataarray as $friend) {
$friendurl = "https://graph.facebook.com/".$friend->id."?access_token=".$token."";
$fdata = json_decode(file_get_contents($friendurl));
if($fdata->gender == "male") {
array_push($fulldata, $fdata->name);
}
}
Having this piece of code seems to break the JavaScript code, as none of my alert instructions are run.
Also, inserting a break after the if, so that only one file_get_contents call is made, seems to make the code runnable (but I obviously need to go through all of the friends).
How can I solve this?
I would use jQuery or XMLHttpRequest to do the HTTP GET, but somehow I always seem to get back a status code of 0, with an empty response.
Edit:
Here is the JS code:
<script type="text/javascript">
function initialize() {
    alert('Test1');
    <?php
    $fulldata = array();
    $data = $result->data;
    foreach($data as $friend) {
        $friendurl = "https://graph.facebook.com/".$friend->id."?access_token=".$token."";
        //echo("alert(\"".$friendurl."\");");
        $fdata = json_decode(file_get_contents($friendurl));
        if($fdata->hometown->name) {
            array_push($fulldata, $fdata->hometown->name);
        }
    }
    echo ("alert(\"".count($fulldata)."\")");
    ?>
}
</script>
I should've also added that this is being done on a page embedded into facebook using the canvas feature.
Try...
function curl($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($ch);
    curl_close($ch); // close before returning; the original returned first, so this line was never reached
    return $data;
}

foreach($dataarray as $friend){
    $friendurl = "https://graph.facebook.com/".$friend->id."?access_token=".$token."";
    $fdata = json_decode(curl($friendurl));
    if($fdata->gender == "male"){
        array_push($fulldata, $fdata->name);
    }
}
Maybe file_get_contents (FGC) is disabled on your server, but you don't get any notices/warnings. Turn on error reporting to check.
Code from comment:
error_reporting(E_ALL); ini_set("display_errors", 1);
Note that you are doing a cross-domain AJAX call, which is prohibited for security reasons.
You can do the API call on the server and echo the data to the client-side JS, or you can build a PHP proxy that returns the result of the Graph API call (as the proxy is on your own server, they are in the same domain).
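For the proxy route, here is a minimal sketch (proxy.php is a hypothetical filename, and the token handling is simplified for illustration):

<?php
// proxy.php - fetches a friend's Graph API record server-side,
// so the browser only ever talks to your own domain.
$friendId = preg_replace('/[^0-9]/', '', $_GET['id']); // sanitize the input
$token = 'YOUR_ACCESS_TOKEN'; // placeholder; keep the real token server-side
$url = 'https://graph.facebook.com/' . $friendId . '?access_token=' . urlencode($token);
header('Content-Type: application/json');
echo file_get_contents($url);

The client-side JS then calls proxy.php?id=... with jQuery or XMLHttpRequest as usual, since it now lives on the same domain.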
I have an array containing the contents of a MySQL table. I need to put each of these rows into a cURL multi handle so that I can execute them all simultaneously.
Here is the code for the array, in case it helps:
$SQL = mysql_query("SELECT url FROM urls") or die(mysql_error());
while($resultSet = mysql_fetch_array($SQL)){
$urls[]=$resultSet
}
So I need to be able to send data to each URL at the same time. I don't need to get any data back, and in fact I'll be having them time out after two seconds. It only needs to send the data and then close.
My code prior to this was executing them one at a time. Here is that code:
$SQL = mysql_query("SELECT url FROM shells") or die(mysql_error());
while($resultSet = mysql_fetch_array($SQL)){
    $ch = curl_init($resultSet['url'] . $fullcurl); //load the urls and send GET data
    curl_setopt($ch, CURLOPT_TIMEOUT, 2); //Only load it for two seconds (Long enough to send the data)
    curl_exec($ch);
    curl_close($ch);
}
So my question is: How can I load the contents of the array into curl_multi_handle, execute it, and then remove each handle and close the curl_multi_handle?
You still call curl_init and curl_setopt. Then you load the handle into a multi handle, and keep calling curl_multi_exec until it's done. This is based on the documentation at curl_multi_init. Since you're timing out after two seconds and not processing responses, I think you can just sleep for two seconds at a time. curl_multi_select might be better if you actually need to process the responses (see the variant after the code below).
$SQL = mysql_query("SELECT url FROM shells") ;
$mh = curl_multi_init();
$handles = array();
while($resultSet = mysql_fetch_array($SQL)){
//load the urls and send GET data
$ch = curl_init($resultSet['url'] . $fullcurl);
//Only load it for two seconds (Long enough to send the data)
curl_setopt($ch, CURLOPT_TIMEOUT, 2);
curl_multi_add_handle($mh, $ch);
$handles[] = $ch;
}
// Create a status variable so we know when exec is done.
$running = null;
//execute the handles
do {
// Call exec. This call is non-blocking, meaning it works in the background.
curl_multi_exec($mh,$running);
// Sleep while it's executing. You could do other work here, if you have any.
sleep(2);
// Keep going until it's done.
} while ($running > 0);
// For loop to remove (close) the regular handles.
foreach($handles as $ch)
{
// Remove the current array handle.
curl_multi_remove_handle($mh, $ch);
}
// Close the multi handle
curl_multi_close($mh);
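As mentioned above, if you do need to react to responses promptly, a sketch of the curl_multi_select variant would replace the do/while loop with:

do {
    curl_multi_exec($mh, $running);
    if ($running > 0) {
        // Block for up to one second waiting for activity on any handle,
        // instead of sleeping a fixed two seconds.
        curl_multi_select($mh, 1.0);
    }
} while ($running > 0);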
If I were you, I would write a MySQL class and a cURL class; it works very well.
First I would create a method which returns all URLs from a passed MySQL result.
Something like:
public function getUrls($mysql_fetch_array)
{
    foreach($mysql_fetch_array as $result)
    {
        $urls[] = $result["url"];
    }
    return $urls; // the original was missing this return
}
Then you could write a method like curlSend($url, $param):
// Remember, you will have to adapt this; I don't know your full code,
// so this is just one way you could do it.
public function curlSend($url, $param = "")
{
    $ch = curl_init($url . $param); // load the url and send GET data
    curl_setopt($ch, CURLOPT_TIMEOUT, 2); // only load it for two seconds (long enough to send the data)
    curl_exec($ch);
    curl_close($ch);
}
public function send()
{
    // $sql here stands for whatever query fetches your urls; adapt it to your MySQL class.
    $urls = $this->getUrls($this->mysql->result($sql));
    foreach($urls as $url)
    {
        $this->curlSend($url);
    }
}
Now this is how you could do it.
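A hypothetical usage sketch, assuming the methods above live in a class called UrlSender that is constructed with your MySQL wrapper (both names are mine, not from your code):

$sender = new UrlSender($mysql); // hypothetical class and wrapper
$sender->send(); // fires a two-second GET at every URL from the database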