Problems extracting data from an external web page in PHP

I have a script that extracts people's names from an external web page, given an ID as a parameter.
Note: The information provided by this external website is publicly accessible; anyone can look up this data.
This is the code I wrote:
function names($ids)
{
    $url = 'https://www.exampledomain.com/es/query_data_example?name=&id='.$ids;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: es-es,es"));
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $html = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    // Capture the href of the first link in each matching table row.
    preg_match_all('/<tr class="odd"><td><a href="(.*?)">/', $html, $matches);
    // Default to an empty name so $result is always defined.
    $result = "";
    if (count($matches[1]) == 1)
    {
        // Strip the path prefix and the trailing 12 characters, then turn
        // hyphens into spaces and capitalize each word.
        $result = $matches[1][0];
        $result = str_replace('/es/person/', '', $result);
        $result = substr($result, 0, -12);
        $result = str_replace('-', ' ', $result);
        $result = ucwords($result);
    }
    return $result;
}
Note 2: The URL in the variable $url is only an example, not the real one, but it has exactly the same form as the URL I use in my code.
I call the function and show the result with an echo:
$info = names('8476756848');
echo $info;
and everything works perfectly: I get the name of the person that ID belongs to.
The problem arises when I call that function inside a for (or while) loop, since I have an array with many IDs:
$myids = ["2809475460", "2332318975", "2587100534", "2574144252", "2611639906", "2815870980", "0924497817", "2883119946", "2376743158", "2387362041", "2804754226", "2332833975", "258971534", "2574165252", "2619016306", "2887098054", "2449781007", "2008819946", "2763767158", "2399362041", "2832047546", "2331228975", "2965871534", "2574501252", "2809475460", "2332318975", "2587100534", "2574144252", "2611639906", "2815870980", "0924497817", "2883119946", "2376743158", "2387362041", "2804754226", "2332833975", "258971534", "2574165252", "2619016306", "2887098054", "2449781007", "2008819946", "2763767158", "2399362041", "2832047546", "2331228975", "2965871534", "2574501252", "2809475460", "2332318975", "2587100534", "2574144252", "2611639906", "2815870980", "0924497817", "2883119946", "2376743158", "2387362041", "2804754226", "2332833975", "258971534", "2574165252", "2619016306", "2887098054", "2449781007", "2008819946", "2763767158", "2399362041", "2832047546", "2331228975", "2965871534", "2574501252"];
//Note: These data are for example only, they are not the real ids.
$size = count($myids);
for ($i=0; $i < $size; $i++)
{
//sleep(20);
$data = names($myids[$i]);
echo "ID IS: " . $myids[$i] . "<br> THE NAME IS: " . $data . "<br><br>";
}
The result is something like this:
ID IS: 258971534
THE NAME IS:
ID IS: 2883119946
THE NAME IS:
and so on. That is, it shows the IDs, but the names function does not return any names.
It shows the whole list of IDs but not a single name, as if the names function did not work.
If I put only 3 IDs in the array and run the for loop again, it gives me the names for those 3 IDs. But when the array contains many IDs, the function returns no names at all. It is as if the site rejects or limits multiple requests; I do not know.
I placed set_time_limit(0) at the beginning of my PHP file to avoid the 30-second maximum execution time error, because I thought that was why the function was not working, but it did not help. I also tried placing a sleep(20) inside the loop, before calling the names function, in case the script was making too many requests too quickly to that web page, but that did not help either.
This script is already in production on a server that I rent, and this problem prevents it from working properly.
Note: There may be arrays with more than 2000 IDs, and I am even preparing a script that will read .txt and .csv files containing more than 10000 IDs; it will extract the IDs from each file, call the names function, and save the IDs and names in a table in a MySQL database.
Does anyone know why the names are not extracted when there are many IDs, while the names function does work when there are few, for example 1 or 10?
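The names function above stores curl_error() in $error but never looks at it, and it never checks the HTTP status code, so a refused or throttled request is indistinguishable from "no match". A minimal diagnostic sketch follows. The helper names classify_response and fetch_with_diagnostics are hypothetical, and treating 403, 429 and 503 as signs of throttling is an assumption about how the remote site might behave, not something the question confirms.

```php
<?php
// Hypothetical helper: classify one fetch attempt from its cURL error
// number and HTTP status code.
function classify_response(int $curlErrno, int $httpCode): string
{
    if ($curlErrno !== 0) {
        return 'transport_error'; // timeout, DNS failure, TLS problem, ...
    }
    // Assumption: a site that limits requests answers with one of these codes.
    if ($httpCode === 403 || $httpCode === 429 || $httpCode === 503) {
        return 'throttled';
    }
    if ($httpCode >= 200 && $httpCode < 300) {
        return 'ok';
    }
    return 'http_error';
}

// Hypothetical helper: fetch a URL and return the body together with the
// status code, the cURL error text and the verdict from classify_response().
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    $code  = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return [
        'body'       => $body === false ? '' : $body,
        'http_code'  => $code,
        'curl_error' => $error,
        'verdict'    => classify_response($errno, $code),
    ];
}
```

Logging the verdict and http_code for each ID inside the loop should show whether the empty names come from failed requests or from pages that simply do not contain the expected table row.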

Related

PHP Put request (update ECWID e-commerce order with tracking)

I am working with the Ecwid API and am now moving on to updating orders with tracking info and shipping status from our fulfillment site.
The fulfillment operation is going to export an XML file of the order updates.
I first created a basic script to update a product, and this works fine.
// Post Tracking number and change Status to shipped
// trackingNumber : ""
// fulfillmentStatus : "SHIPPED"
$storeID = "";
$myToken = "";
$data = array("trackingNumber" => "9405503699300250719362", "fulfillmentStatus" => "SHIPPED", "orderNumber" => "7074");
$data_string = json_encode($data);
$url = "https://app.ecwid.com/api/v3/".urlencode($storeID)."/orders/".$data['orderNumber']."?token=".$myToken;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json','Content-Length: ' . strlen($data_string)));
$response = curl_exec($ch);
curl_close($ch);
I've also created a script to pull in the XML file and convert it to JSON so I can 'put' the data over to the shopping cart.
<?php
// The file data.xml contains an XML document with a root element
// and at least an element /[root]/title.
if (file_exists('data.xml')) {
$xml = simplexml_load_file('data.xml');
print_r($xml);
} else {
exit('Failed to open data.xml.');
}
$data_string = json_encode($xml);
echo '<br><br>';
echo "<pre>";
print_r($data_string);
?>
Now this is where I am lost: how do I put the two parts together so that the script loops through the XML file (JSON content) with multiple orderNumbers and updates the trackingNumber and fulfillmentStatus of each order?

Vitaly from Ecwid team here.
I see that you want to update orders in your Ecwid store via API from an XML file.
So the whole process is:
1. Get the contents of the XML file.
2. Parse the data in it and find out the total number of orders there.
3. Loop over each order in the file.
4. Make a request to the Ecwid API to update the order on each iteration.
In your second code snippet, I see print_r($data_string); what does it print to the screen?
I imagine the next steps would be:
1. Correctly find the order details in the XML file (order number, tracking number) while in the loop.
2. Make each iteration update a specific order in the store.
For step 1, I suggest saving the data from the XML file in a format that is convenient for you in PHP, e.g. an object or an array.
For example, if it were an array, it would be something like this:
Array = [recordArray 1, recordArray 2, recordArray 3]
recordArray = [ orderNumber, trackingNumber ]
For step 2: each iteration goes through one recordArray in the Array and gets the orderNumber and trackingNumber needed for the request.
The request then uses this data to update an order in your Ecwid store, just as you showed in the code snippet above. However, the values 9405503699300250719362 and 7074 will be dynamic and different on each iteration.
If you have any questions, please feel free to contact me: http://developers.ecwid.com/contact
Thank you.
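The two steps above could be sketched roughly as follows. The XML layout (an orders root with order children holding orderNumber and trackingNumber elements) is an assumption, since the question does not show the file, and the function names records_from_xml and build_update_request are hypothetical. The URL pattern is the one used in the question's own snippet.

```php
<?php
// Step 1: turn the XML into an array of [orderNumber, trackingNumber] records.
// Assumed layout: <orders><order><orderNumber/><trackingNumber/></order>...</orders>
function records_from_xml(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return [];
    }
    $records = [];
    foreach ($xml->order as $order) {
        $records[] = [
            'orderNumber'    => (string) $order->orderNumber,
            'trackingNumber' => (string) $order->trackingNumber,
        ];
    }
    return $records;
}

// Step 2: build the URL and JSON body for one record, following the
// request shown in the question.
function build_update_request(string $storeID, string $token, array $record): array
{
    $url = 'https://app.ecwid.com/api/v3/' . urlencode($storeID)
         . '/orders/' . urlencode($record['orderNumber'])
         . '?token=' . urlencode($token);
    $body = json_encode([
        'trackingNumber'    => $record['trackingNumber'],
        'fulfillmentStatus' => 'SHIPPED',
    ]);
    return ['url' => $url, 'body' => $body];
}

// Example input with made-up order data.
$sample = '<orders>'
        . '<order><orderNumber>7074</orderNumber><trackingNumber>111</trackingNumber></order>'
        . '<order><orderNumber>7075</orderNumber><trackingNumber>222</trackingNumber></order>'
        . '</orders>';

$requests = [];
foreach (records_from_xml($sample) as $record) {
    $requests[] = build_update_request('123', 'abc', $record);
}
```

Each entry in $requests can then be sent with the same cURL PUT call as in the question, using $request['url'] for curl_init() and $request['body'] for CURLOPT_POSTFIELDS.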

php timeout with file_get_html

I have been trying to fetch some data from a Wikia website using the simple_html_dom library for PHP. Basically, I use the Wikia API to render pages as HTML and extract data from there. After extracting, I save the data into a MySQL database. My problem is that I usually pull 300 records, but I get stuck at 93 records, with file_get_html returning null, which causes my find() call to fail. I am not sure why it stops at 93 records, but I have tried various solutions such as
ini_set( 'default_socket_timeout', 120 );
set_time_limit( 120 );
Basically, I have to access the Wikia page 300 times to get those 300 records, but mostly I manage to get 93 records before file_get_html returns null. Any idea how I can tackle this issue?
I have tested cURL as well and have the same issue.
function test($url){
$ch=curl_init();
$timeout=5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$result=curl_exec($ch);
curl_close($ch);
return $result;
}
$baseurl = 'http://xxxx.wikia.com/index.php?';
foreach($resultset_wiki as $name){
// Create DOM from URL or file
$options = array("action"=>"render","title"=>$name['name']);
$baseurl .= http_build_query($options,'','&');
$html = file_get_html($baseurl);
if($html === FALSE) {
echo "issue here";
}
// this code for cURL but commented for testing with file_get_html instead
$a = test($baseurl);
$html = new simple_html_dom();
$html->load($a);
// find div stuff here and mysql data pumping here.
}
$resultset_wiki is an array with the list of titles to fetch from Wikia; it is itself loaded from the database before performing the search.
In practice, I get this type of error:
Call to a member function find() on a non-object in

I answered my own question: the problem seems to be the URL I was using. I have switched to cURL with POST, sending the action and title parameters in the request body instead.
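One thing visible in the question's loop is that $baseurl .= http_build_query(...) appends to the same string on every iteration, so the URL grows by one query string per title. Whether this is what caused the stall at record 93 is an assumption; the answer only says the URL was the problem. A minimal sketch that builds a fresh URL per title (build_render_url is a hypothetical helper name, and the titles are made up):

```php
<?php
// Build the render URL for one title without modifying the base URL.
function build_render_url(string $base, string $title): string
{
    $options = ['action' => 'render', 'title' => $title];
    return $base . http_build_query($options, '', '&');
}

$baseurl = 'http://xxxx.wikia.com/index.php?';
$urls = [];
foreach (['Foo', 'Bar Baz'] as $title) {
    // $baseurl itself stays unchanged between iterations.
    $urls[] = build_render_url($baseurl, $title);
}
```

With this shape, each request URL contains exactly one action and one title parameter, regardless of how many titles came before it.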

How to properly store and lookup areacode from a phone number

How do I store areacodes (npa-nxx) in a database for fast lookup?
Here's the deal: I have a variable that contains a phone number and I need to look in a database for the city attached to that phone number.
The problem is, different countries have different formats.
Canada/USA: +19055551234 (+1 > Country, 905 > Area Code, 555 > City Code)
France: +33512345678 (+33 > Country, 5 > Areacode, 1 > City, Other numbers > Subscriber number)
and so on (infos based on wikipedia)
I created a table called 'npanxx' that contains the list of area codes, city codes and the city attached to each one (with the id of the country and the province/state id):
CountryId  RegionId  PrimaryCity  npa  nxx  fnpanxx
1          11        Acton Vale   450  236  +1450236
I am thinking about the following procedure:
1. Get all country codes from SQL into a PHP array.
2. Go through each entry and check whether it matches the beginning of the phone number.
3. If a match is found:
   a. Remove the beginning of the phone number.
   b. Get all npa-nxx values that belong to that country and put them in a PHP array.
   c. Go through each value of the array to find one that matches the beginning of the remaining number.
   d. If a match is found:
      - Remove the beginning of the phone number.
      - Store the data in different variables like: $country = 'Canada'; $city = 'Acton Vale'...
      - etc.
First mistake (I think): too many database requests (the npanxx table contains 3000 records for just one province in Canada).
Second mistake: I'm pretty sure there's no need to go through each and every npa-nxx code.
Another problem: I'm not sure this procedure will work if the phone number is a French one.
And if there's an entry for, let's say, 336 and another for 3364, it might give the wrong result.
Do you have any idea how I can solve this problem? (I'm not asking for any code, I don't want you to do the work for me, but I would like some clues.)
This is for a personal project to make donations to the Multiple Sclerosis Society of Canada, and I would really like to finish it :)

I would think maybe some kind of set of regexes or other pattern matches could whittle down your options in terms of search. Just some basic way of "guessing" at the possibilities instead of searching all of them.
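One common way to narrow the search without scanning every code is a longest-prefix match: try the longest possible prefix of the number first and shorten it until an entry is found, which also settles the 336 versus 3364 ambiguity in favor of the more specific entry. The sketch below uses an in-memory array as a stand-in for the npanxx table. The function name longest_prefix_match, the 10-digit cap and the two French entries are hypothetical.

```php
<?php
// Look up the longest known prefix of a phone number.
// $prefixes maps a full dialing prefix (digits only, no "+") to a city.
function longest_prefix_match(string $number, array $prefixes, int $maxLen = 10): ?array
{
    $digits = ltrim($number, '+');
    for ($len = min($maxLen, strlen($digits)); $len >= 1; $len--) {
        $candidate = substr($digits, 0, $len);
        if (isset($prefixes[$candidate])) {
            return [
                'prefix' => $candidate,
                'city'   => $prefixes[$candidate],
                'rest'   => substr($digits, $len), // remaining subscriber digits
            ];
        }
    }
    return null; // no known prefix
}

// Stand-in data: the first row comes from the question, the others are made up.
$prefixes = [
    '1450236' => 'Acton Vale',
    '336'     => 'Example A',
    '3364'    => 'Example B',
];

$match = longest_prefix_match('+14502361234', $prefixes);
```

Against a database, the same idea can be expressed as a single query that receives every prefix of the number in an IN list, orders by prefix length descending and keeps the first row, so only one request per phone number is needed.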

Here's a small script I wrote in PHP to return the NPA/NXX as a JSON object in real time from area-codes.com.
It returns some very useful data. It's only for the NANP, so it doesn't do so well trying to discern international calls. For that, I would suggest making a table of all international country codes and the appropriate methods to dial them, internationally.
Additionally, network exchange operators demand an international dial code (like 011 for the USA, or + for cell phones, in general) to figure out if the number is international, and then take the steps, above, to figure out where you're trying to go. You could add this constraint into the input field and be done with it.
If you're trying to just get NPA/NXX information in the North American Numbering Plan, though, this script should be very helpful.
Just an aside: area-codes.com counts online lookups among their free services, and I have found nothing on the site to suggest that this code violates that policy. But this code can be retooled to gather data from other providers, nonetheless.
<?php
// Small script to return and format all data from the NPA/NXX info site www.area-codes.com
// Returns a JSON object.
error_reporting(0);
$npa = $_GET['npa'];
$nxx = $_GET['nxx'];
function parseInput($input) {
$v = new DOMDocument();
$v->formatOutput = true;
$v->preserveWhiteSpace = false;
$v->loadHTML($input);
$list = $v->getElementsByTagName("td");
$e = false;
$dataOut = array();
$p = "";
foreach($list as $objNode) {
if (!$e) {
$p = $objNode->nodeValue;
$p = strtolower($p);
$p = preg_replace("%[+ .:()\/_-]%", "", $p);
$p = str_replace("\xc2\xa0", "", $p);
$p = trim($p);
}
else {
if ($p != "") {
$d = trim($objNode->nodeValue);
if ($d != "") $dataOut[$p] = $d;
}
$p = "";
}
$e = !$e;
}
return $dataOut;
}
function getNPANXX($npa, $nxx) {
$url = "www.area-codes.com/exchange/exchange.asp?npa=$npa&nxx=$nxx";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
curl_setopt($ch, CURLOPT_URL, $url);
$response = curl_exec($ch);
curl_close($ch);
$i = strpos($response, "<h3>AreaCode/Prefix $npa-$nxx Details</h3>");
$i = strpos($response, "<table width=\"100%\" border=\"0\" cellpadding=\"2\" cellspacing=\"0\">", $i);
$e = strpos($response, "</table>", $i);
$scan = substr($response, $i, ($e-$i) + 8);
return parseInput($scan);
}
$result = getNPANXX($npa, $nxx);
if (!isset($result['npaareacode'])) {
$result = array("error" => "invalid");
}
echo json_encode($result);
die;
?>
For the query npanxx.php?npa=202&nxx=520 the JSON outputs as follows:
{
"npaareacode":"202",
"nxxusetype":"WIRELESS",
"nxxprefix":"520",
"nxxintroversion":"11\/16\/2007",
"city":"WASHINGTON",
"state":"DC",
"latitude":"38.901",
"county":"DISTRICT OF COLUMBIA",
"longitude":"-77.0315",
"countypopulation":"0",
"lata":"236",
"zipcode":"20005",
"zipcodecount":"0",
"ratecenter":"WSHNGTNZN1",
"zipcodefreq":"0",
"fips":"11001",
"ocn":"6664",
"observesdst":"Unknown",
"cbsacode":"47900",
"timezone":"Eastern (GMT -05:00)",
"cbsaname":"Washington-Arlington-Alexandria, DC-VA-MD-WV",
"carriercompany":"SPRINT SPECTRUM L.P."
}
For your example npanxx.php?npa=450&nxx=236 the data returned is a little more limited, because it's Canada and Canada doesn't provide all the FIPS and carrier data like the United States does, but the returned data is still quite useful:
{
"npaareacode":"450",
"nxxusetype":"WIRELESS",
"nxxprefix":"236",
"nxxintroversion":"2002-08-04",
"city":"ACTON VALE",
"state":"QC",
"latitude":"45.6523",
"longitude":"-72.5671",
"countypopulation":"51400",
"lata":"850",
"zipcodecount":"0",
"zipcodefreq":"-1",
"observesdst":"Unknown",
"timezone":"Eastern (GMT -05:00)"
}

Passing updated value to function (twitter api max_id problems)

I am trying to work with the Twitter search API. I found a PHP library that does authentication with app-only auth, and I added the max_id argument to it. However, I would like to run 450 queries per 15 minutes (as per the rate limit), and I am not sure how to pass the max_id. The idea is to run it first with the default value of 0, get the max_id from the API's response, run the function again with the retrieved max_id value, and do this 450 times. I tried a few things, and I can get the max_id result after calling the function, but I don't know how to pass it back and call the function again with the updated value.
<?php
function search_for_a_term($bearer_token, $query, $result_type='mixed', $count='15', $max_id='0'){
$url = "https://api.twitter.com/1.1/search/tweets.json"; // base url
$q = $query; // query term
$formed_url ='?q='.$q; // fully formed url
if($result_type!='mixed'){$formed_url = $formed_url.'&result_type='.$result_type;} // result type - mixed(default), recent, popular
if($count!='15'){$formed_url = $formed_url.'&count='.$count;} // results per page - defaulted to 15
$formed_url = $formed_url.'&include_entities=true'; // makes sure the entities are included
if($max_id!='0'){$formed_url=$formed_url.'&max_id='.$max_id;}
$headers = array(
"GET /1.1/search/tweets.json".$formed_url." HTTP/1.1",
"Host: api.twitter.com",
"User-Agent: jonhurlock Twitter Application-only OAuth App v.1",
"Authorization: Bearer ".$bearer_token."",
);
$ch = curl_init(); // setup a curl
curl_setopt($ch, CURLOPT_URL,$url.$formed_url); // set url to send to
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); // set custom headers
ob_start(); // start ouput buffering
$output = curl_exec ($ch); // execute the curl
$retrievedhtml = ob_get_contents(); // grab the retreived html
ob_end_clean(); //End buffering and clean output
curl_close($ch); // close the curl
$result= json_decode($retrievedhtml, true);
return $result;
}
$results=search_for_a_term("mybearertoken", "mysearchterm");
/* would like to get all kinds of info from here and put it into a mysql database */
$max_id=$results["search_metadata"]["max_id_str"];
print $max_id; //this gives me the max_id for that page
?>
I know that there must be some existing libraries that do this, but I can't use any of them, since none have been updated to app-only auth yet.
EDIT: I put a loop at the beginning of the script to run, e.g., 3 times, and added a print statement to see what happens, but it only prints the same max_id each time instead of three different ones.
do{
$result = search_for_a_term("mybearertoken", "searchterm", $max_id);
$max_id = $result["search_metadata"]["max_id_str"];
$i++;
print ' '.$max_id.' ';
}while($i < 3);
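In the loop above, $max_id is passed as the third argument, which according to the function signature is $result_type, so the function's own $max_id keeps its default of '0' on every call. The sketch below shows the positional fix using a stub. The stub fake_search is hypothetical: it has the same parameter order as search_for_a_term but fabricates its response instead of calling Twitter, so it says nothing about which value from the real response is the right one to page with.

```php
<?php
// Hypothetical stub with the same parameter order as search_for_a_term().
// It returns a made-up max_id_str: '1000' on the first call, then one
// lower than whatever max_id it was given.
function fake_search($bearer_token, $query, $result_type = 'mixed', $count = '15', $max_id = '0')
{
    $next = ($max_id === '0') ? '1000' : (string) ((int) $max_id - 1);
    return ['search_metadata' => ['max_id_str' => $next]];
}

$max_id = '0'; // initialize before the loop
$seen = [];
for ($i = 0; $i < 3; $i++) {
    // max_id is the fifth parameter, so the two defaults before it
    // have to be spelled out.
    $result = fake_search('mybearertoken', 'searchterm', 'mixed', '15', $max_id);
    $max_id = $result['search_metadata']['max_id_str'];
    $seen[] = $max_id;
}
```

With the argument in the fifth position, each iteration receives the value returned by the previous one, which is the feedback loop the question is after.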

Multiple Queries in MQL on Freebase

I am trying to get a list of results from Freebase. I have an array of MIDs. Can someone explain how I would structure the query and pass it to the API in PHP?
I'm new to MQL - I can't even seem to get the example to work:
$simplequery = array('id'=>'/topic/en/philip_k_dick', '/film/writer/film'=>array());
$jsonquerystr = json_encode($simplequery);
// The Freebase API requires a query envelope (which allows you to run multiple queries simultaneously) so we need to wrap our original, simplequery structure in two more arrays before we can pass it to the API:
$queryarray = array('q1'=>array('query'=>$simplequery));
$jsonquerystr = json_encode($queryarray);
// To send the JSON formatted MQL query to the Freebase API use cURL:
#run the query
$apiendpoint = "http://api.freebase.com/api/service/mqlread?queries";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$apiendpoint=$jsonquerystr");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$jsonresultstr = curl_exec($ch);
curl_close($ch);
// Decoding the JSON structure back into arrays is performed using json_decode as in:
$resultarray = json_decode($jsonresultstr, true); #true:give us the json struct as an array
// Iterating over the pieces of the resultarray containing films gives us the films Philip K. Dick wrote:
$filmarray = $resultarray["q1"]["result"]["/film/writer/film"];
foreach($filmarray as $film){
print "$film<br>";
}

You're doing everything right. If you weren't, you'd be getting back error messages in your JSON result.
I think what's happened is that the data on Philip K. Dick has been updated to identify him not as the "writer" of films, but as a "film_story_contributor". (He didn't, after all, actually write any of the screenplays.)
Change your simplequery from:
$simplequery = array('id'=>'/topic/en/philip_k_dick', '/film/writer/film'=>array());
To:
$simplequery = array('id'=>'/topic/en/philip_k_dick', '/film/film_story_contributor/film_story_credits'=>array());
You actually can use the Freebase website to drill down into topics to dig up this information, but it's not that easy to find. On the basic Philip K. Dick page (http://www.freebase.com/view/en/philip_k_dick), click the "Edit and Show details" button at the bottom.
The "edit" page (http://www.freebase.com/edit/topic/en/philip_k_dick) shows the Types associated with this topic. The list includes "Film story contributor" but not "writer". Within the Film story contributor block on this page, there's a "detail view" link (http://www.freebase.com/view/en/philip_k_dick/-/film/film_story_contributor/film_story_credits). This is, essentially, what you're trying to replicate with your PHP code.
A similar drill-down on an actual film writer (e.g., Steve Martin), gets you to a property called /film/writer/film (http://www.freebase.com/view/en/steve_martin/-/film/writer/film).
Multiple Queries
You don't say exactly what you're trying to do with an array of MIDs, but firing multiple queries is as simple as adding a q2, q3, etc., all inside the $queryarray. The answers will come back inside the same structure - you can pull them out just like you pull out the q1 data. If you print out your jsonquerystr and jsonresultstr you'll see what's going on.
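A rough sketch of building such an envelope from an array of MIDs follows. The function name build_query_envelope and the MIDs are placeholders, and asking for 'name' => null in each query is an assumption about what you want back. The JSON is URL-encoded before being appended, which the original snippet does not do.

```php
<?php
// Wrap one query per MID as q1, q2, ... inside a single envelope.
function build_query_envelope(array $mids): array
{
    $envelope = [];
    foreach (array_values($mids) as $i => $mid) {
        $envelope['q' . ($i + 1)] = [
            'query' => ['id' => $mid, 'name' => null], // assumed query shape
        ];
    }
    return $envelope;
}

$mids = ['/m/0abc', '/m/0def']; // placeholder MIDs, not real topics
$envelope = build_query_envelope($mids);

// Same endpoint as in the question, with the JSON safely encoded.
$apiendpoint = 'http://api.freebase.com/api/service/mqlread?queries';
$url = $apiendpoint . '=' . urlencode(json_encode($envelope));
```

The results can then be read back by iterating over the same q1, q2, ... keys in the decoded response, in the way the q1 data is read in the question.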

I modified the code a bit to fold the answer into the question. As this helped me, I've upvoted each; I just thought I would provide a more "compleat" answer, as it were:
$simplequery = array('id'=>'/topic/en/philip_k_dick', '/film/film_story_contributor/film_story_credits'=>array());
$jsonquerystr = json_encode($simplequery);
// The Freebase API requires a query envelope (which allows you to run multiple queries simultaneously) so we need to wrap our original, simplequery structure in two more arrays before we can pass it to the API:
$queryarray = array('q1'=>array('query'=>$simplequery));
$jsonquerystr = json_encode($queryarray);
// To send the JSON formatted MQL query to the Freebase API use cURL:
#run the query
$apiendpoint = "http://api.freebase.com/api/service/mqlread?queries";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "$apiendpoint=$jsonquerystr");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$jsonresultstr = curl_exec($ch);
curl_close($ch);
// Decoding the JSON structure back into arrays is performed using json_decode as in:
$resultarray = json_decode($jsonresultstr, true); #true:give us the json struct as an associative array
// Iterating over the pieces of the resultarray containing films gives us the films Philip K. Dick wrote:
if($resultarray['code'] == '/api/status/ok'){
$films = $resultarray['q1']['result']['/film/film_story_contributor/film_story_credits'];
foreach ($films as $film){
print "$film<br>";
}
}
