PHP timeout with file_get_html

I have been trying to fetch data from a Wikia website using the simple_html_dom library for PHP. I use the Wikia API to render each page as HTML and extract the data from that, then save the extracted data to a MySQL database. My problem is that I usually pull 300 records, but I get stuck at around 93 records: file_get_html returns null, which causes my find() call to fail. I am not sure why it stops at 93 records, but I have tried various solutions, such as:
ini_set( 'default_socket_timeout', 120 );
set_time_limit( 120 );
I have to access the Wikia page 300 times to get those 300 records, but I mostly manage to get 93 records before file_get_html starts returning null. Any idea how I can tackle this issue?
I have tested cURL as well and have the same issue.
function test($url){
$ch=curl_init();
$timeout=5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$result=curl_exec($ch);
curl_close($ch);
return $result;
}
$baseurl = 'http://xxxx.wikia.com/index.php?';
foreach($resultset_wiki as $name){
// Create DOM from URL or file
$options = array("action"=>"render","title"=>$name['name']);
$baseurl .= http_build_query($options,'','&');
$html = file_get_html($baseurl);
if($html === FALSE) {
echo "issue here";
}
// this code for cURL but commented for testing with file_get_html instead
$a = test($baseurl);
$html = new simple_html_dom();
$html->load($a);
// find div stuff here and mysql data pumping here.
}
$resultset_wiki is an array with the list of titles to fetch from Wikia; it is loaded from the database before performing the search.
In practice I hit this type of error:
Call to a member function find() on a non-object in

I answered my own question: the problem seems to be the URL I was using. I have switched to cURL with POST, posting the action and title parameters instead.
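One thing worth noting in the posted loop: `$baseurl .= http_build_query(...)` appends to `$baseurl` on every iteration, so the URL keeps growing with each title. That may be why requests start failing partway through, though the thread does not confirm it. A minimal sketch of building a fresh URL per title, assuming the same base URL as in the question (the helper name build_render_url and the sample titles are made up for illustration):

```php
<?php
// Hypothetical helper: build the render URL for one title without
// modifying the shared base string.
function build_render_url(string $base, string $title): string
{
    $options = array('action' => 'render', 'title' => $title);
    return $base . http_build_query($options, '', '&');
}

$baseurl = 'http://xxxx.wikia.com/index.php?';
$titles  = array('Foo Bar', 'Baz'); // sample titles, not from the original data

$urls = array();
foreach ($titles as $title) {
    // $baseurl is never reassigned, so every URL has the same length of prefix.
    $urls[] = build_render_url($baseurl, $title);
}
```

With this shape, each iteration requests only its own title, and the result of file_get_html (or cURL) can be checked before calling find() on it.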

Related

Problems to extract data from an external web page in PHP

I have a script that is responsible for extracting names of people from an external web page by passing an ID as a parameter.
Note: The information provided by this external website is public access, everyone can check this data.
This is the code that I created:
function names($ids)
{
$url = 'https://www.exampledomain.com/es/query_data_example?name=&id='.$ids;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER,array("Accept-Language: es-es,es"));
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$html = curl_exec($ch);
$error = curl_error($ch);
curl_close($ch);
preg_match_all('/<tr class="odd"><td><a href="(.*?)">/',$html ,$matches);
// Default to an empty string so $result is always defined.
$result = "";
// count() must wrap only the array; the comparison goes outside the call.
if (count($matches[1]) == 1)
{
$result = $matches[1][0];
$result = str_replace('/es/person/','', $result);
$result = substr($result, 0,-12);
$result = str_replace('-', ' ', $result);
$result = ucwords($result);
}
return $result;
}
Note 2: the URL in the variable $url is an example, not the real URL, but it has exactly the same form as the one I use in my code.
I make the call to the function, and I show the result with an echo:
$info = names('8476756848');
echo $info;
and everything is perfect, I extracted the name of the person to whom that id belongs.
The problem arises when I try to query that function within a for(or while) loop, since I have an array with many ids
$myids = ["2809475460", "2332318975", "2587100534", "2574144252", "2611639906", "2815870980", "0924497817", "2883119946", "2376743158", "2387362041", "2804754226", "2332833975", "258971534", "2574165252", "2619016306", "2887098054", "2449781007", "2008819946", "2763767158", "2399362041", "2832047546", "2331228975", "2965871534", "2574501252", "2809475460", "2332318975", "2587100534", "2574144252", "2611639906", "2815870980", "0924497817", "2883119946", "2376743158", "2387362041", "2804754226", "2332833975", "258971534", "2574165252", "2619016306", "2887098054", "2449781007", "2008819946", "2763767158", "2399362041", "2832047546", "2331228975", "2965871534", "2574501252", "2809475460", "2332318975", "2587100534", "2574144252", "2611639906", "2815870980", "0924497817", "2883119946", "2376743158", "2387362041", "2804754226", "2332833975", "258971534", "2574165252", "2619016306", "2887098054", "2449781007", "2008819946", "2763767158", "2399362041", "2832047546", "2331228975", "2965871534", "2574501252"];
//Note: These data are for example only, they are not the real ids.
$size = count($myids);
for ($i=0; $i < $size; $i++)
{
//sleep(20);
$data = names($myids[$i]);
echo "ID IS: " . $myids[$i] . "<br> THE NAME IS: " . $data . "<br><br>";
}
The result is something like this:
ID IS: 258971534
THE NAME IS:
ID IS: 2883119946
THE NAME IS:
and so on. It shows the whole list of ids, but no names, as if the names function does not work.
If I put only 3 ids in the array and run the for loop again, it gives me the names for those 3 ids. But when the array contains many ids, the function returns no names. It is as if the website rejects or limits multiple requests; I do not know.
I have placed set_time_limit(0) at the beginning of my PHP file to avoid the 30-second execution time limit error, because I thought that was why the function was not working, but it did not help. I also tried placing a sleep(20) inside the loop, before calling the names function, in case I was making too many requests too quickly to the website, but that did not work either.
This script is already in production on a server that I have hired and I have this problem that prevents my script from working properly.
Note: there may be arrays with more than 2000 ids, and I am also preparing a script that will read .txt and .csv files containing more than 10000 ids. It will extract the ids from each file, call the names function for each one, and then save the ids and names in a table in a MySQL database.
Does anyone know why names are not extracted when there are many ids, while with only a few (for example 1 or 10) the names function does work?
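The names function stores curl_error() in $error but never looks at it, and it never checks the HTTP status, so a throttled or failed request is indistinguishable from "no match". A rough sketch of keeping that information and retrying with a growing delay follows. The helper names (should_retry, fetch_with_retry, curl_fetch) are made up, and treating 429 and 5xx as retryable is an assumption about how the remote site behaves, not something the question confirms:

```php
<?php
// Decide whether a response is worth another attempt.
// Assumption: 429 and 5xx statuses, or any cURL error, are transient.
function should_retry(int $status, string $error): bool
{
    return $error !== '' || $status === 429 || $status >= 500;
}

// $fetch is any callable returning
// array('body' => string, 'status' => int, 'error' => string).
function fetch_with_retry(callable $fetch, string $url, int $maxAttempts = 3, int $baseDelay = 2): array
{
    $attempt = 0;
    do {
        $attempt++;
        $response = $fetch($url);
        $retry = should_retry($response['status'], $response['error']);
        if ($retry && $attempt < $maxAttempts) {
            sleep($baseDelay * $attempt); // wait longer after each failure
        }
    } while ($retry && $attempt < $maxAttempts);
    $response['attempts'] = $attempt;
    return $response;
}

// cURL-backed fetcher that keeps the status code and error text
// instead of discarding them.
function curl_fetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    return array(
        'body'   => $body === false ? '' : $body,
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),
    );
}
```

Inside the loop this would be called as fetch_with_retry('curl_fetch', $url), and logging the returned status and error for each id should show whether the site is refusing the requests or the pattern is simply not matching.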

Passing updated value to function (twitter api max_id problems)

I am trying to work with the Twitter search API, I found a php library that does authentication with app-only auth and I added the max_id argument to it, however, I would like to run 450 queries per 15 minutes (as per the rate-limit) and I am not sure about how to pass the max_id. So I run it first with the default 0 value, and then it gets the max_id result from the API's response and runs the function again, but this time with the retrieved max_id value and does this 450 times. I tried a few things, and I can get the max_id result after calling the function, but I don't know how to pass it back and tell it to call the function with the updated value.
<?php
function search_for_a_term($bearer_token, $query, $result_type='mixed', $count='15', $max_id='0'){
$url = "https://api.twitter.com/1.1/search/tweets.json"; // base url
$q = $query; // query term
$formed_url ='?q='.$q; // fully formed url
if($result_type!='mixed'){$formed_url = $formed_url.'&result_type='.$result_type;} // result type - mixed(default), recent, popular
if($count!='15'){$formed_url = $formed_url.'&count='.$count;} // results per page - defaulted to 15
$formed_url = $formed_url.'&include_entities=true'; // makes sure the entities are included
if($max_id!='0'){$formed_url=$formed_url.'&max_id='.$max_id;}
$headers = array(
"GET /1.1/search/tweets.json".$formed_url." HTTP/1.1",
"Host: api.twitter.com",
"User-Agent: jonhurlock Twitter Application-only OAuth App v.1",
"Authorization: Bearer ".$bearer_token."",
);
$ch = curl_init(); // setup a curl
curl_setopt($ch, CURLOPT_URL,$url.$formed_url); // set url to send to
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); // set custom headers
ob_start(); // start ouput buffering
$output = curl_exec ($ch); // execute the curl
$retrievedhtml = ob_get_contents(); // grab the retreived html
ob_end_clean(); //End buffering and clean output
curl_close($ch); // close the curl
$result= json_decode($retrievedhtml, true);
return $result;
}
$results=search_for_a_term("mybearertoken", "mysearchterm");
/* would like to get all kinds of info from here and put it into a mysql database */
$max_id=$results["search_metadata"]["max_id_str"];
print $max_id; //this gives me the max_id for that page
?>
I know that there must be some existing libraries that do this, but I can't use any of them, since none of them has been updated for app-only auth yet.
EDIT: I put a loop in the beginning of the script, to run e.g. 3 times, and then put a print statement to see what happens, but it only prints out the same max_id, doesn't access three different ones.
do{
$result = search_for_a_term("mybearertoken", "searchterm", $max_id);
$max_id = $result["search_metadata"]["max_id_str"];
$i++;
print ' '.$max_id.' ';
}while($i < 3);
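In the loop from the EDIT, $max_id is passed as the third argument, which the function signature binds to $result_type, so the max_id parameter keeps its default of '0'. A minimal sketch of passing it in the fifth position is below. fake_search is a made-up stand-in with the same parameter order as search_for_a_term that returns canned data instead of calling the API. Taking the lowest id on the page minus one as the next max_id is an assumption about how the API pages backwards, and it needs 64-bit PHP for real tweet ids:

```php
<?php
// Made-up stand-in with the same parameter order as search_for_a_term().
// Each "page" holds two tweets with ids at and just below max_id.
function fake_search(string $token, string $query, string $resultType = 'mixed',
                     string $count = '15', string $maxId = '0'): array
{
    $top = ($maxId === '0') ? 1000 : (int) $maxId;
    return array(
        'statuses' => array(
            array('id_str' => (string) $top),
            array('id_str' => (string) ($top - 1)),
        ),
        'search_metadata' => array('max_id_str' => (string) $top),
    );
}

function page_through(callable $search, string $token, string $query, int $pages): array
{
    $maxId = '0';
    $ids = array();
    for ($i = 0; $i < $pages; $i++) {
        // Spell out every positional argument so $maxId lands in the fifth slot.
        $result = $search($token, $query, 'mixed', '15', $maxId);
        if (empty($result['statuses'])) {
            break; // nothing older left to fetch
        }
        foreach ($result['statuses'] as $status) {
            $ids[] = $status['id_str'];
        }
        // Assumption: the next page starts just below the lowest id seen so far.
        $lowest = min(array_map('intval', array_column($result['statuses'], 'id_str')));
        $maxId = (string) ($lowest - 1);
    }
    return $ids;
}
```

Swapping fake_search for the real search_for_a_term and raising $pages would give the repeated calls described in the question, with each call receiving the updated value.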

Cannot retrieve JSON POST via PHP: cURL

I tried implementing the following PHP code to POST JSON via PHP: cURL (SOME FORCE.COM WEBSITE is a tag that signifies the URL that I want to POST):
$url = "<SOME FORCE.COM WEBSITE>";
$data = array
(
'application' =>
array
(
'isTest' => FALSE,
key => value,
key => value,
key => value,
...
)
);
$ch = curl_init($url);
$data_string = json_encode($data);
curl_setopt($ch, CURLOPT_POST, true);
//Send blindly the json-encoded string.
//The server, IMO, expects the body of the HTTP request to be in JSON
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER,
array
(
'Content-Type:application/json',
'Content-Length: ' . strlen($data_string)
)
);
$result = curl_exec($ch);
curl_close($ch);
print_r($result);
echo '<pre>';
echo $_POST;
$jsonStr = file_get_contents('php://input'); //read the HTTP body.
var_dump($jsonStr);
var_dump(json_decode($jsonStr));
echo '</pre>';
The output of the above is the following:
"Your TEST POST is correct, please set the isTest (Boolean) attribute on the application to FALSE to actually apply."
Arraystring(0) ""
NULL
OK, the above confirms that I formatted the JSON data correctly by using json_encode, and the SOME FORCE.COM WEBSITE acknowledges that the value of 'isTest' is FALSE. However, I am not getting anything from "var_dump($jsonStr)" or "var_dump(json_decode($jsonStr))". I decided to just ignore that fact and set 'isTest' to FALSE, assuming that I am not getting any JSON data because I set 'isTest' to TRUE, but chaos ensues when I set 'isTest' to FALSE:
[{"message":"System.EmailException: SendEmail failed. First exception on row 0; first error: REQUIRED_FIELD_MISSING, Missing body, need at least one of html or plainText: []\n\nClass.careers_RestWebService.sendReceiptEmail: line 165, column 1\nClass.careers_RestWebService.postApplication: line 205, column 1","errorCode":"APEX_ERROR"}]
Arraystring(0) ""
NULL
I still do not get any JSON data, and ultimately, the email was unable to be sent. I believe that the issue is resulting from an empty email body because there is nothing coming from "var_dump($jsonStr)" or "var_dump(json_decode($jsonStr))". Can you help me retrieve the JSON POST? I would really appreciate any hints, suggestions, etc. Thanks.
I solved this question on my own. I was not sure if I was doing this correctly or not, but it turns out that my code was perfect. I kept refreshing my website, from where I am POSTing to SOME FORCE.COM WEBSITE. I believe that the people managing SOME FORCE.COM WEBSITE were having issues on their end. Nothing was wrong with what I did. For some reason, I got a code 202 and some gibberish text to go along with it. I would be glad to show the output, but I do not want to POST again for the sake of the people managing SOME FORCE.COM WEBSITE that I am POSTing to. Thank you guys for your help.
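A note on the empty var_dump output, for anyone landing here: php://input holds the body of the request sent to the script that is currently running, not the reply from the remote server. The reply is what curl_exec() returns in $result. Because CURLOPT_HEADER is true in the code above, $result contains the response headers followed by the body. A small sketch of separating the two and decoding the JSON body, where split_response is a made-up helper and the raw response is a shortened sample:

```php
<?php
// Hypothetical helper: split a raw cURL response (headers + body) at the
// header size reported by curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function split_response(string $raw, int $headerSize): array
{
    return array(
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    );
}

// Shortened sample response; a real one comes from curl_exec($ch).
$raw = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n"
     . '[{"errorCode":"APEX_ERROR"}]';

// Stand-in for CURLINFO_HEADER_SIZE: headers end after the blank line.
$headerSize = strpos($raw, "\r\n\r\n") + 4;

$parts = split_response($raw, $headerSize);
$reply = json_decode($parts['body'], true);
```

With a real handle, $headerSize would come from curl_getinfo($ch, CURLINFO_HEADER_SIZE), called before curl_close($ch).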

Multiple Queries in MQL on Freebase

I am trying to get a list of results from Freebase. I have an array of MIDs. Can someone explain how I would structure the query and pass it to the API in PHP?
I'm new to MQL - I can't even seem to get the example to work:
$simplequery = array('id'=>'/topic/en/philip_k_dick', '/film/writer/film'=>array());
$jsonquerystr = json_encode($simplequery);
// The Freebase API requires a query envelope (which allows you to run multiple queries simultaneously) so we need to wrap our original, simplequery structure in two more arrays before we can pass it to the API:
$queryarray = array('q1'=>array('query'=>$simplequery));
$jsonquerystr = json_encode($queryarray);
// To send the JSON formatted MQL query to the Freebase API use cURL:
#run the query
$apiendpoint = "http://api.freebase.com/api/service/mqlread?queries";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$apiendpoint=$jsonquerystr");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$jsonresultstr = curl_exec($ch);
curl_close($ch);
// Decoding the JSON structure back into arrays is performed using json_decode as in:
$resultarray = json_decode($jsonresultstr, true); #true:give us the json struct as an array
// Iterating over the pieces of the resultarray containing films gives us the films Philip K. Dick wrote:
$filmarray = $resultarray["q1"]["result"]["/film/writer/film"];
foreach($filmarray as $film){
print "$film<br>";
}
You're doing everything right. If you weren't, you'd be getting back error messages in your JSON result.
I think what's happened is that the data on Philip K. Dick has been updated to identify him not as the "writer" of films, but as a "film_story_contributor". (He didn't, after all, actually write any of the screenplays.)
Change your simplequery from:
$simplequery = array('id'=>'/topic/en/philip_k_dick', '/film/writer/film'=>array());
To:
$simplequery = array('id'=>'/topic/en/philip_k_dick', '/film/film_story_contributor/film_story_credits'=>array());
You actually can use the Freebase website to drill down into topics to dig up this information, but it's not that easy to find. On the basic Philip K. Dick page (http://www.freebase.com/view/en/philip_k_dick), click the "Edit and Show details" button at the bottom.
The "edit" page (http://www.freebase.com/edit/topic/en/philip_k_dick) shows the Types associated with this topic. The list includes "Film story contributor" but not "writer". Within the Film story contributor block on this page, there's a "detail view" link (http://www.freebase.com/view/en/philip_k_dick/-/film/film_story_contributor/film_story_credits). This is, essentially, what you're trying to replicate with your PHP code.
A similar drill-down on an actual film writer (e.g., Steve Martin), gets you to a property called /film/writer/film (http://www.freebase.com/view/en/steve_martin/-/film/writer/film).
Multiple Queries
You don't say exactly what you're trying to do with an array of MIDs, but firing multiple queries is as simple as adding a q2, q3, etc., all inside the $queryarray. The answers will come back inside the same structure - you can pull them out just like you pull out the q1 data. If you print out your jsonquerystr and jsonresultstr you'll see what's going on.
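As a rough sketch of the q1, q2, q3 idea for an array of MIDs: the helper names are made up, the MIDs are placeholders, and asking for 'name' with each 'id' is only an assumed example of what one might want back. The response shape follows the one used in the code in this thread.

```php
<?php
// Build one envelope holding a query per MID, keyed q1, q2, ...
function build_envelope(array $mids): array
{
    $envelope = array();
    foreach (array_values($mids) as $i => $mid) {
        $envelope['q' . ($i + 1)] = array(
            'query' => array('id' => $mid, 'name' => null),
        );
    }
    return $envelope;
}

// Pull each query's result out of the decoded response, keyed by q-name.
// Non-query entries such as 'code' are skipped.
function extract_results(array $resultarray): array
{
    $results = array();
    foreach ($resultarray as $key => $value) {
        if (is_array($value) && array_key_exists('result', $value)) {
            $results[$key] = $value['result'];
        }
    }
    return $results;
}

$mids = array('/m/placeholder1', '/m/placeholder2'); // placeholders, not real topics
$jsonquerystr = json_encode(build_envelope($mids));
```

$jsonquerystr can then be sent the same way as in the single-query code, and extract_results() applied to the decoded $resultarray.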
I modified the code a bit to fold the answer into the question. Since this helped me I've upvoted each answer, and I thought I would provide a more "compleat" answer, as it were:
$simplequery = array('id'=>'/topic/en/philip_k_dick', '/film/film_story_contributor/film_story_credits'=>array());
$jsonquerystr = json_encode($simplequery);
// The Freebase API requires a query envelope (which allows you to run multiple queries simultaneously) so we need to wrap our original, simplequery structure in two more arrays before we can pass it to the API:
$queryarray = array('q1'=>array('query'=>$simplequery));
$jsonquerystr = json_encode($queryarray);
// To send the JSON formatted MQL query to the Freebase API use cURL:
#run the query
$apiendpoint = "http://api.freebase.com/api/service/mqlread?queries";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$apiendpoint=$jsonquerystr");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$jsonresultstr = curl_exec($ch);
curl_close($ch);
// Decoding the JSON structure back into arrays is performed using json_decode as in:
$resultarray = json_decode($jsonresultstr, true); #true:give us the json struct as an associative array
// Iterating over the pieces of the resultarray containing films gives us the films Philip K. Dick wrote:
if($resultarray['code'] == '/api/status/ok'){
$films = $resultarray['q1']['result']['/film/film_story_contributor/film_story_credits'];
foreach ($films as $film){
print "$film<br>";
}
}

Looping through query links via CURL and merging result arrays in PHP

The following code is supposed to search for a term on twitter, loop through all the result pages and return one big array with the results from each page appended at each step.
foreach($search_terms as $term){
//populate the obj array by going through all pages
//set up connection
$ch = curl_init();
// go through all pages and save in an object array
for($j=1; $j<16;$j++){
$url ='http://search.twitter.com/search.json?q=' . $term .'&rpp=100&page='.$j.'';
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
$var[$j] = curl_exec($ch);
curl_close($ch);
$obj = array_merge((array)$obj,(array)json_decode($var[$j], true));
}
}
It doesn't quite work, though, and I am getting these errors:
curl_setopt(): 3 is not a valid cURL handle resource
curl_exec(): 3 is not a valid cURL handle resource
curl_close(): 3 is not a valid cURL handle resource
...... and this is repeated all the way from 3-> 7...
curl_setopt(): 7 is not a valid cURL handle resource
curl_exec(): 7 is not a valid cURL handle resource
curl_close(): 7 is not a valid cURL handle resource
//set up connection
$ch = curl_init();
// go through all pages and save in an object array
for($j=1; $j<16;$j++){
You need the call to curl_init() inside your loop since you close it at the end of each iteration.
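The other way round also works: keep a single handle per search term and only close it after the page loop. A minimal sketch, assuming the same search URL as in the question (build_page_urls and fetch_pages are made-up helper names, and urlencode() on the term is an addition):

```php
<?php
// Build the per-page URLs; urlencode() keeps spaces or '#' in a term
// from breaking the query string.
function build_page_urls(string $term, int $pages = 15, int $perPage = 100): array
{
    $urls = array();
    for ($page = 1; $page <= $pages; $page++) {
        $urls[] = 'http://search.twitter.com/search.json?q=' . urlencode($term)
                . '&rpp=' . $perPage . '&page=' . $page;
    }
    return $urls;
}

// One handle, reused for every page and released only after the loop.
function fetch_pages(array $urls): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $decoded = array();
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        if ($body !== false) {
            $decoded[] = json_decode($body, true); // one decoded array per page
        }
    }
    curl_close($ch); // closed once, after the last page
    return $decoded;
}
```

Calling fetch_pages(build_page_urls($term)) inside the foreach over $search_terms gives one list of decoded pages per term, which can then be merged as needed.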
