How to get Wikipedia content section by section using Wikipedia API - PHP

Is there a better way to fetch the text of particular sections from Wikipedia? I have the code below, which skips some sections, but it takes too long to fetch the data I am looking for.
for($i=0;$i>10;$i++){
if($i != 2 || $i != 4){
$url = 'http://en.wikipedia.org/w/api.php?action=parse&page=ramanagara&format=json&prop=text&section='.$i;
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, "TestScript");
$c = curl_exec($ch);
$json = json_decode($c);
$content = $json->{'parse'}->{'text'}->{'*'};
print preg_replace('/<\/?a[^>]*>/','',$content);
}
}

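For starters, the loop condition is wrong: $i starts at 0 and $i>10 is false on the first check, so the body never runs at all. Change it to $i<10. The test inside the loop has a similar problem: $i != 2 || $i != 4 is true for every value of $i, so nothing is skipped; use && instead. Or, if you need only a handful of sections, try: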
foreach (array(1,3,5,6,7) as $i)
//your code
Second, decoding JSON into an associative array like this:
$json = json_decode($c, true);
And referencing it like $json['parse']['text']['*'] is easier to work with, but that's up to you.
And third, you'll find that strip_tags() will likely function faster and more accurately than stripping tags with regular expressions.
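To make the last two points concrete, here is a minimal sketch that runs without a network connection. The JSON string is a hypothetical, shortened stand-in for what action=parse&prop=text returns; only the parse/text/* nesting is taken from the question. Note that strip_tags() removes every tag, whereas the regular expression in the question removed only the anchors, so pass the tags you want to keep as the second argument if that matters.

```php
<?php
// Hypothetical, shortened response in the shape returned by action=parse&prop=text.
$c = '{"parse":{"title":"Ramanagara","text":{"*":"<p>Ramanagara is a <a href=\"/wiki/Town\">town</a> in <b>Karnataka</b>.</p>"}}}';

// Decode into nested associative arrays instead of stdClass objects.
$json = json_decode($c, true);
$content = $json['parse']['text']['*'];

// strip_tags() removes every tag; the second argument lists tags to keep.
$plain = strip_tags($content);
$keepBold = strip_tags($content, '<b>');

echo $plain, "\n";    // Ramanagara is a town in Karnataka.
echo $keepBold, "\n"; // Ramanagara is a town in <b>Karnataka</b>.
```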

Related

How to get a specified row using cUrl PHP

Hey guys, I use cURL to communicate with an external web server, but the response is HTML. I was able to convert it to JSON (more than 4000 rows), but I have no idea how to get the specific row that contains my result. Any idea?
Here is my cURL code:
require_once('getJson.php');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.reputationauthority.org/domain_lookup.php?ip=website.com&Submit.x=9&Submit.y=5&Submit=Search');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$data = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
$data = '<<<EOF'.$data.'EOF';
$json = new GetJson();
header("Content-Type: text/plain");
$res = json_encode($json->html_to_obj($data), JSON_PRETTY_PRINT);
$myArray = json_decode($res,true);
For getJson.php
class GetJson{
function html_to_obj($html) {
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
return $this->element_to_obj($dom->documentElement);
}
function element_to_obj($element) {
if ($element->nodeType == XML_ELEMENT_NODE){
$obj = array( "tag" => $element->tagName );
foreach ($element->attributes as $attribute) {
$obj[$attribute->name] = $attribute->value;
}
foreach ($element->childNodes as $subElement) {
if ($subElement->nodeType == XML_TEXT_NODE) {
$obj["html"] = $subElement->wholeText;
}
else {
$obj["children"][] = $this->element_to_obj($subElement);
}
}
return $obj;
}
}
}
My idea is, instead of walking through the rows to reach line 2175 (doing something like: $data['children'][2]['children'][7]['children'][3]['children'][1]['children'][1]['children'][0]['children'][1]['children'][0]['children'][1]['children'][2]['children'][0]['children'][0]['html'] is not a good idea to me), to go directly to it.
If the HTML being returned has a consistent structure every time, and you just want one particular value from one part of it, you may be able to use regular expressions to parse the HTML and find the part you need. This is an alternative to trying to put the whole thing into an array. I have used this technique before to parse an HTML document and find a specific item. Here's a simple example. You will need to adapt it to your needs, since you haven't specified the exact nature of the data you're seeking. You may need to go down several levels of parsing to find the right bit:
$data = curl_exec($ch);
//Split the output into an array that we can loop through line by line
$array = preg_split('/\n/',$data);
//For each line in the output
foreach ($array as $element)
{
//See if the line contains a hyperlink
if (preg_match("/<a href/", "$element"))
{
// ...do something here, e.g. store the data retrieved, or do more matching to find something within it...
}
}
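As a rough, self-contained illustration of that line-by-line approach, the sketch below scans a hypothetical HTML snippet (standing in for the cURL response) and captures the href and link text from each line that contains an anchor. The markup, the pattern, and the $links array are assumptions for the example; the real page will need its own pattern.

```php
<?php
// Hypothetical HTML standing in for the cURL response in $data.
$data = "<html><body>\n"
      . "<p>Reputation report</p>\n"
      . "<a href=\"/details?id=42\">Score: 87</a>\n"
      . "</body></html>";

$links = array();
// \R matches any line-break sequence, so \n and \r\n input both work.
foreach (preg_split('/\R/', $data) as $line) {
    // Capture the href value and the link text from lines containing an anchor.
    if (preg_match('/<a href="([^"]*)"[^>]*>(.*?)<\/a>/i', $line, $m)) {
        $links[] = array('href' => $m[1], 'text' => $m[2]);
    }
}
print_r($links);
```

This only works when each item of interest sits on a single line and the markup stays stable, which is the same caveat the answer gives.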

How to change values in php?

How can I change the price value (in WordPress), which currently accepts only numeric values? I want it to display text or a number fetched from a URL (a scraping API).
Right now my class_core.php file shows this:
Price Display
========================================================================== */
function PRICE($val){
// RETURN IF NOT NUMERIC
if(!is_numeric($val) && defined('WLT_JOBS') ){ return $val; }
if(isset($GLOBALS['CORE_THEME']['currency'])){
$seperator = "."; $sep = ","; $digs = 2;
if(is_numeric($val)){
$val = number_format($val,$digs, $seperator, $sep);
}
$val = hook_price_filter($val);
// RETURN IF EMPTY
if($val == ""){ return $val; }
// LEFT/RIGHT POSITION
if(isset($GLOBALS['CORE_THEME']['currency']['position']) && $GLOBALS['CORE_THEME']['currency']['position'] == "right"){
if(substr($val,-3) == ".00"){ $val = substr($val,0,-3); }
$val = $val.$GLOBALS['CORE_THEME']['currency']['symbol'];
}else{
$val = $GLOBALS['CORE_THEME']['currency']['symbol'].$val;
}
}
PHP is a dynamically typed scripting language: you don't have to declare a variable's type. You just assign to a name, and the type changes automatically depending on the data you store in it.
If you have a URL that contains some information, like (www.xyz.com/dddddd/ddddd), you can use cURL to obtain a result...
(ref: http://www.jonasjohn.de/snippets/php/curl-example.htm)
function curl_download($Url){
// is cURL installed yet?
if (!function_exists('curl_init')){
die('Sorry cURL is not installed!');
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Url);
curl_setopt($ch, CURLOPT_REFERER, "http://www.example.org/yay.htm");
curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
and then in your code...
$url_for_value = "www.xyz.com/dddddd/ddddd";
// remember to add http colon and two slashes in front of url...
// stackoverflow tools won't let me do that here...
$val = curl_download($url_for_value);
function PRICE($val){
if(!is_numeric($val) && defined('WLT_JOBS') ){
// if not numeric, e.g. $100, extract the first number from the string
if (!preg_match('/\d+(?:\.\d+)?/', $val, $match)) {
// no number found; perform other tests on return info from the CURL function?
return $val;
}
// $match[0] holds the matched number as a string
$val = $match[0];
}
if(isset($GLOBALS['CORE_THEME']['currency'])){ ....
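The number-extraction step can be tried on its own before it goes into the theme. This is a minimal sketch: extract_number() is a hypothetical helper, not part of the theme, and it assumes that commas are thousands separators and the decimal mark is a period.

```php
<?php
// Hypothetical helper (not part of the theme): pull the first number out of a string.
function extract_number($val) {
    if (is_numeric($val)) {
        return $val;
    }
    // Assumption: commas are thousands separators, so "1,250.50" means 1250.50.
    $clean = str_replace(',', '', $val);
    if (preg_match('/\d+(?:\.\d+)?/', $clean, $match)) {
        return $match[0];
    }
    return null; // no digits at all
}

var_dump(extract_number('$100'));         // string(3) "100"
var_dump(extract_number('USD 1,250.50')); // string(7) "1250.50"
var_dump(extract_number('call us'));      // NULL
```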
Note: It's certainly admirable to have a need for a specific function, and then use that need to motivate you to learn new skills. This project assumes a certain experience in HTML, PHP and WordPress. If you don't feel comfortable in that stuff yet, that's okay, we all started knowing nothing.
Here's a possible learning roadmap:
--HTML Learn the organization of a website, elements, and how to create forms, buttons, etc...
--PHP This is a scripting language, runs on a server.
--CSS You will need this for WordPress. (Why? Because we insist on you using a child theme, and that will require to understand how CSS works. )
--JavaScript, although not absolutely required, lots of existing tools use this.
There are a lot of free tutorials on this stuff. I'd probably start at http://html.net/ or somewhere like that. Do all the tutorials.
After that you get to jump into WordPress. Start small, modify a few sites, then grow to writing your own plugins. At that point, I think you should be able to easily create the functionality you are looking for.
If not, it could well be quicker to hire the job out. eLance is your friend.

PHP file_get_contents error, wouldn't populate from an array?

I've been trying to write a simple script in PHP to pull data from an ISBN database site, and for some reason I've had nothing but issues using file_get_contents. I've managed to get something working now, but I would like to see if anyone knows why this wasn't working.
The code below would not populate $page with any information, so the preg_match calls below it failed to find anything. If anyone knows what the hell was stopping this, that would be great.
$links = array ('
http://www.isbndb.com/book/2009_cfa_exam_level_2_schweser_practice_exams_volume_2','
http://www.isbndb.com/book/uniform_investment_adviser_law_exam_series_65','
http://www.isbndb.com/book/waterworks_a02','
http://www.isbndb.com/book/winning_the_toughest_customer_the_essential_guide_to_selling','
http://www.isbndb.com/book/yale_daily_news_guide_to_fellowships_and_grants'
); // array of URLs
foreach ($links as $link)
{
$page = file_get_contents($link);
#print $page;
preg_match("#<h1 itemprop='name'>(.*?)</h1>#is",$page,$title);
preg_match("#<a itemprop='publisher' href='http://isbndb.com/publisher/(.*?)'>(.*?)</a>#is",$page,$publisher);
preg_match("#<span>ISBN10: <span itemprop='isbn'>(.*?)</span>#is",$page,$isbn10);
preg_match("#<span>ISBN13: <span itemprop='isbn'>(.*?)</span>#is",$page,$isbn13);
echo '<tr>
<td>'.$title[1].'</td>
<td>'.$publisher[2].'</td>
<td>'.$isbn10[1].'</td>
<td>'.$isbn13[1].'</td>
</tr>';
#exit();
}
My guess is that your URLs are wrong (not direct). The proper ones have no www. part: if you request any of them and inspect the returned headers, you'll see that you're redirected (HTTP 301) to another URL.
The best way to handle this, in my opinion, is to use cURL and set the CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS options with curl_setopt.
You should also trim your URLs beforehand: as posted, each string in your array starts with a line break, because the opening quote sits at the end of the previous line.
Example here:
$curl = curl_init();
foreach ($links as $link) {
curl_setopt($curl, CURLOPT_URL, trim($link)); // trim() drops stray whitespace around the URL
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_MAXREDIRS, 5); // max 5 redirects
$result = curl_exec($curl);
if (! $result) {
continue; // if $result is empty or false - ignore and continue;
}
// do what you need to do here
}
curl_close($curl);
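The trimming step is easy to check without making any requests. The sketch below reuses two of the URLs from the question in the same array layout, shows that the strings carry leading whitespace, and cleans them with array_map(); nothing here contacts the site.

```php
<?php
// Same layout as in the question: each opening quote is followed by a line break.
$links = array ('
http://www.isbndb.com/book/waterworks_a02','
http://www.isbndb.com/book/yale_daily_news_guide_to_fellowships_and_grants'
);

// True when the raw string differs from its trimmed form, i.e. it has stray whitespace.
var_dump(trim($links[0]) !== $links[0]);

// Strip surrounding whitespace from every URL before requesting it.
$links = array_map('trim', $links);
var_dump($links[0]);
```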

php - xml data parsing

I am working on getting some xml data into a php variable so I can easily call it in my html webpage. I am using this code below:
$ch = curl_init();
$timeout = 5;
$url = "http://www.dictionaryapi.com/api/v1/references/collegiate/xml/define?key=0b03f103-f6a7-4bb1-9136-11ab4e7b5294";
$definition = simplexml_load_file($url);
echo $definition->entry[0]->def;
However my results are: .
I am not sure what I am doing wrong and I have followed the php manual, so I am guessing it is something obvious but I am just not understanding it correctly.
The actual XML results from that link used in cURL are visible by clicking the link below; I did not post them because they are rather long:
http://www.dictionaryapi.com/api/v1/references/sd3/xml/test?key=9d92e6bd-a94b-45c5-9128-bc0f0908103d
<?php
$ch = curl_init();
$timeout = 5;
$url = "http://www.dictionaryapi.com/api/v1/references/collegiate/xml/define?key=0b03f103-f6a7-4bb1-9136-11ab4e7b5294";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch); // you were missing a semicolon
$definition = new SimpleXMLElement($data);
echo '<pre>';
print_r($definition->entry[0]->def);
echo '</pre>';
// this returns the SimpleXML Object
// to get parts, you can do something like this...
foreach($definition->entry[0]->def[0] as $entry) {
echo $entry[0] . "<br />";
}
// which returns
transitive verb
14th century
1 a
:to determine or identify the essential qualities or meaning of
b
:to discover and set forth the meaning of (as a word)
c
:to create on a computer
2 a
:to fix or mark the limits of :
b
:to make distinct, clear, or detailed especially in outline
3
:
intransitive verb
:to make a
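The same traversal can be tried offline. In the sketch below the XML string is a hypothetical, heavily shortened stand-in for the API response; the element names entry, def, date, sn and dt are assumptions based on the output above. It also hints at why the original echo printed almost nothing: casting a SimpleXMLElement to a string yields only that element's own text, not the text of its children.

```php
<?php
// Hypothetical, heavily shortened XML in the general shape of a dictionary entry.
$data = '<entry_list><entry id="define"><def>'
      . '<date>14th century</date>'
      . '<sn>1 a</sn><dt>:to determine the essential qualities of</dt>'
      . '<sn>b</sn><dt>:to set forth the meaning of</dt>'
      . '</def></entry></entry_list>';

$definition = new SimpleXMLElement($data);

// children() walks the child elements; getName() gives each tag name.
$lines = array();
foreach ($definition->entry[0]->def->children() as $child) {
    $lines[] = $child->getName() . ': ' . trim((string) $child);
}
echo implode("\n", $lines), "\n";

// Only the definition texts:
foreach ($definition->entry[0]->def->dt as $dt) {
    echo (string) $dt, "\n";
}
```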

Get Random Url from website

I want to collect a number of links (URLs) from http://public-domain-content.com,
store them in an array, and then randomly select one from the array and display (echo) it.
How can I do that in PHP?
If I understood what you're asking, you can achieve this using file_get_contents().
file_get_contents($url) gives you the page as a string. Split that string on spaces to get the individual words, keep the words that are valid URLs in an array, then choose a random element from the array using array_rand().
However, file_get_contents() cannot always open URLs: some hosts disable allow_url_fopen for security reasons.
You can work around this with the following cURL-based function:
function get_url_contents($url)
{
$crl = curl_init();
$timeout = 5;
curl_setopt ($crl, CURLOPT_URL,$url);
curl_setopt ($crl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
$ret = curl_exec($crl);
curl_close($crl);
return $ret;
}
http://php.net/manual/en/function.curl-setopt.php <--- Explanation about curl
Example code:
$url = "http://www.xxxxx.xxx"; //Set the website you want to get content from
$str = file_get_contents($url); //Get the contents of the website
$built_arr = array(); //This array will hold the valid URLs
$strarr = preg_split('/\s+/', $str); //Split the string into an array on any whitespace
for ($i = 0; $i < count($strarr); $i++) //Start looping through the array
{
$current = $strarr[$i]; //The current element of the array
if (filter_var($current, FILTER_VALIDATE_URL)) //filter_var() returns false if the URL is not valid
{
$built_arr[] = $current; //Add the valid URL to the array
}
else
{
//URL invalid. Do something here
}
}
if ($built_arr) //array_rand() cannot pick from an empty array
{
echo $built_arr[array_rand($built_arr)]; // Display a random element from our built array
}
There is also a more extended version to checking URLs, which you can explore here:
http://forums.digitalpoint.com/showthread.php?t=326016
Good luck.
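Splitting on spaces tends to miss links, because in HTML the URL usually sits inside an href attribute. As an alternative that is not part of the answer above, here is a minimal sketch that reads the href attributes with DOMDocument and picks one at random. The markup is a hypothetical stand-in for the downloaded page.

```php
<?php
// Hypothetical page markup standing in for the downloaded HTML.
$html = '<html><body>'
      . '<a href="http://example.com/a">A</a>'
      . '<a href="/relative/path">B</a>'
      . '<a href="https://example.org/c">C</a>'
      . '</body></html>';

libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect parser warnings quietly
$dom = new DOMDocument();
$dom->loadHTML($html);

$urls = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep absolute URLs only; relative paths fail FILTER_VALIDATE_URL.
    if (filter_var($href, FILTER_VALIDATE_URL)) {
        $urls[] = $href;
    }
}

// array_rand() cannot pick from an empty array, so guard the call.
if ($urls) {
    echo $urls[array_rand($urls)], "\n";
}
```

Relative links are dropped here; resolving them against the page URL would be a separate step.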
