Querying Content from Wikipedia - php

I am trying to fetch the first paragraph of a Wikipedia article using the following script. When I query with multiple words, it doesn't work.
<?php
$query = urlencode($_GET['query']);
$url = "http://en.wikipedia.org/w/api.php?action=parse&page=$query&format=json&prop=text&section=0";
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_USERAGENT, "TestScript"); // required by wikipedia.org server; use YOUR user agent with YOUR contact information. (otherwise your IP might get blocked)
$c = curl_exec($ch);
$json = json_decode($c);
$content = $json->{'parse'}->{'text'}->{'*'}; // get the main text content of the query (it's parsed HTML)
// pattern for first match of a paragraph
$pattern = '#<p>(.*)</p>#Us'; // http://www.phpbuilder.com/board/showthread.php?t=10352690
if(preg_match($pattern, $content, $matches))
{
// print $matches[0]; // content of the first paragraph (including wrapping <p> tag)
$cont = strip_tags($matches[1]); // Content of the first paragraph without the HTML tags.
}
$pattern = '/\[([^\[\]]|(?R))*]|\(([^()]|(?R))*\)/';
echo $my = preg_replace($pattern, '', $cont);
?>
Demo 1: Bangalore
Demo 2: Los Angeles
Is there any way to query Wikipedia and select the first result by default?

You need to url encode your query string before passing it to curl.
<?php $query = urlencode($_GET['query']); ?>
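EDIT: I tried your code, and it worked once I replaced the spaces with the character '+'.
In my test the spaces ended up encoded as '%20', and that did not work. (Note that PHP's urlencode() encodes spaces as '+'; it is rawurlencode() that produces '%20'.)
Try this: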
$query = str_replace(' ', '+', $_GET['query']);
Here is the output I get with Los Angeles and New Delhi
iMac-de-Valentin:so valentin$ php so.php
Los Angeles , officially the City of Los Angeles, often known by its initials L.A., is the most populous city in the U.S. state of California and the second-most populous in the United States, after New York City, with a population at the 2010 United States Census of 3,792,621. It has a land area of 469 square miles , and is located in Southern California.
iMac-de-Valentin:so valentin$ php so.php
New Delhi i/ˈnjuː dɛli/ is the capital of India and seat of the executive, legislative, and judiciary branches of the Government of India. It is also the centre of the Government of the National Capital Territory of Delhi. New Delhi is situated within the metropolis of Delhi and is one of the eleven districts of Delhi National Capital Territory.
iMac-de-Valentin:so valentin$
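For reference, here is a minimal sketch of how PHP's two encoders treat a space, and of building the request URL with http_build_query() rather than string interpolation. The helper name buildParseUrl is made up for illustration; the parameters are the ones from the question's script.

```php
<?php
// Hypothetical helper: builds the same API URL as the question's script.
function buildParseUrl(string $title): string
{
    $params = [
        'action'  => 'parse',
        'page'    => $title,
        'format'  => 'json',
        'prop'    => 'text',
        'section' => 0,
    ];
    // http_build_query() encodes every value; by default spaces become '+'.
    return 'http://en.wikipedia.org/w/api.php?' . http_build_query($params, '', '&');
}

echo buildParseUrl('Los Angeles'), PHP_EOL;
echo urlencode('Los Angeles'), PHP_EOL;    // spaces become '+'
echo rawurlencode('Los Angeles'), PHP_EOL; // spaces become '%20'
```

Letting http_build_query() do the encoding avoids having to remember which encoder was applied to which value.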

Related

How do you uppercase only certain parts of a single string that's in format "Town comma Initials"?

I have a single location field where people can enter whatever they want, but generally they will enter something in the format of "Town, Initials". So for example, these entries...
New york, Ny
columbia, sc
charleston
washington, DC
BISMARCK, ND
would ideally become...
New York, NY
Columbia, SC
Charleston
Washington, DC
Bismarck, ND
Obviously I can use ucfirst() on the string to handle the first character, but these are things I'm not sure how to do (if they can be done at all)...
Capitalizing everything after a comma
Lowercasing everything before the comma (aside from the first character)
Is this easily doable or do I need to use some sort of regex function?
You could simply chop it up and fix it.
<?php
$geo = 'New york, Ny
columbia, sc
charleston
washington, DC
BISMARCK, ND';
$geo = explode(PHP_EOL, $geo);
foreach ($geo as $str) {
// chop
$str = explode(',', $str);
// fix
echo
(!empty($str[0]) ? ucwords(strtolower(trim($str[0]))) : null).
(!empty($str[1]) ? ', '.strtoupper(trim($str[1])) : null).PHP_EOL;
}
https://3v4l.org/ojl2M
That said, you should not trust users to enter the correct format. Instead, find a complete list of all states and offer it as autocomplete suggestions (perhaps something like https://gist.github.com/maxrice/2776900), then validate the input against it.
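A rough sketch of the validation idea, assuming a whitelist of state abbreviations. The list below is deliberately tiny and the function name normalizeLocation is hypothetical; a real list would cover every state.

```php
<?php
// Hypothetical, incomplete whitelist used only for this illustration.
$validStates = ['NY', 'SC', 'DC', 'ND'];

function normalizeLocation(string $input, array $validStates): ?string
{
    // Split into at most two parts: town and (optional) state.
    $parts = array_map('trim', explode(',', $input, 2));
    $town = ucwords(strtolower($parts[0]));
    if ($town === '') {
        return null; // nothing usable was entered
    }
    if (!isset($parts[1]) || $parts[1] === '') {
        return $town; // town only, e.g. "charleston"
    }
    $state = strtoupper($parts[1]);
    // Reject anything that is not on the whitelist.
    return in_array($state, $validStates, true) ? "$town, $state" : null;
}

var_dump(normalizeLocation('New york, Ny', $validStates));
var_dump(normalizeLocation('columbia, xx', $validStates));
```

Returning null for an unknown state lets the caller decide whether to reject the input or ask the user again.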

Parsing Wikipedia API json in PHP

I'm successfully returning json from Wikipedia but am not having any luck grabbing the value I need in PHP (trying to do this in a Drupal site).
Here is the code I'm using, you can substitute $safeurl for this value:
Squantz%20Pond%20State%20Park
<?php
$safeurl = str_replace(' ', '%20', $title);
$json_string = file_get_contents("http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=" . $safeurl);
$parsed_json = json_decode($json_string, true);
$text = $parsed_json->{'extract'};
print "What I need:" . $text;
?>
If I print $json_string out in my HTML I see the following text which contains what I'm going after, the "extract" value. I just can't figure out what $text needs to be to grab that paragraph.
{"query":{"pages":{"1332160":{"pageid":1332160,"ns":0,"title":"Squantz Pond State Park","extract":"Squantz Pond State Park is a state park located 10 miles (16\u00a0km) north of Danbury in the town of New Fairfield, Connecticut. The park offers opportunities for swimming, fishing, hiking and boating.\n"}}}}
You need to change your json_decode to
$parsed_json = json_decode($json_string);
Because you pass true as the second argument, $parsed_json becomes an associative array, and the object syntax no longer works. Remove the true flag
and access the value like this:
$text = $parsed_json->query->pages->{1332160}->extract;
What if 1332160 is not known?
Proceed like this:
foreach($parsed_json->query->pages as $k)
{
echo $k->extract;
}
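If you would rather keep the true flag, the same value can be read with array syntax. This is only a sketch; it uses the sample response from the question as a literal string and takes the first page whatever its id is.

```php
<?php
// Sample response copied from the question.
$json_string = '{"query":{"pages":{"1332160":{"pageid":1332160,"ns":0,"title":"Squantz Pond State Park","extract":"Squantz Pond State Park is a state park located 10 miles (16\u00a0km) north of Danbury in the town of New Fairfield, Connecticut. The park offers opportunities for swimming, fishing, hiking and boating.\n"}}}}';

// With true as the second argument, everything decodes to nested arrays.
$parsed = json_decode($json_string, true);
$pages  = $parsed['query']['pages'] ?? [];

// reset() returns the first element, so the page id does not need to be known.
$first = $pages ? reset($pages) : null;
$text  = $first['extract'] ?? null;

print "What I need: " . $text;
```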

PHP line parsed into separate objects

I have a line of code in my WordPress widget that outputs from an RSS feed:
<?php echo $entry->title ?>
and when displayed it looks like:
$220,000 :: 504 Freemason St, Unit 2B, Norfolk VA, 23510
or
$274,900 :: 1268 Bells Road, Virginia Beach VA, 23454
What is the easiest way to break this up into different objects?
For example, I'd like to have the price, street name, and city state zip in different objects. The problem is that some of the addresses have unit numbers and it's complicating things. Below is an example of how I would like it to work:
<?php echo $entry->price ?>
<?php echo $entry->street ?>
<?php echo $entry->citystatezip ?>
$220,000
504 Freemason St, Unit 2B
Norfolk VA, 23510
or
$274,900
1268 Bells Road
Virginia Beach VA, 23454
Here is a very crude regex that seems able to parse your string. I'm not the best with regexes, but it seems to work.
/^(\$(?:\d{1,3},?)*) :: (\d* [\w\s,\d]*), ([\w\s]* \w{2}, \d{5})$/
Use this with preg_match; the 1st group is the price, the 2nd is the address, and 3rd is the city/state/zip.
Example:
<?php
$ptn = '/^(\$(?:\d{1,3},?)*) :: (\d* [\w\s,\d]*), ([\w\s]* \w{2}, \d{5})$/';
if(preg_match($ptn, $entry->title, $match) === 1){
$price = $match[1];
$street = $match[2];
$citystatezip = $match[3];
}
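As a sketch of how this could fill the properties the question asks for, the pattern above can be wrapped in a function. The function name parseListing and the named groups are additions for illustration; the pattern itself is unchanged apart from the group names.

```php
<?php
// Hypothetical helper built around the pattern above, using named groups.
function parseListing(string $title): ?stdClass
{
    $ptn = '/^(?<price>\$(?:\d{1,3},?)*) :: (?<street>\d* [\w\s,\d]*), (?<citystatezip>[\w\s]* \w{2}, \d{5})$/';
    if (preg_match($ptn, $title, $m) !== 1) {
        return null; // title did not have the expected shape
    }
    $entry = new stdClass();
    $entry->price = $m['price'];
    $entry->street = $m['street'];
    $entry->citystatezip = $m['citystatezip'];
    return $entry;
}

$a = parseListing('$220,000 :: 504 Freemason St, Unit 2B, Norfolk VA, 23510');
$b = parseListing('$274,900 :: 1268 Bells Road, Virginia Beach VA, 23454');
echo $a->street, ' | ', $a->citystatezip, PHP_EOL;
echo $b->street, ' | ', $b->citystatezip, PHP_EOL;
```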
What you need is a regular expression; see http://php.net/manual/en/function.preg-match.php
Use, for example, array explode ( string $delimiter , string $string [, int $limit ] ), which gives you an array of strings if you use the correct delimiter.
The code below will fill your $entry object as required:
$string = '$274,900 :: 1268 Bells Road, Virginia Beach VA, 23454';
$pricePart = explode('::', $string);
$addressPart = explode(',', $pricePart[1]);
$entry = new stdClass();
$entry->price = trim($pricePart[0]);
if ( count($addressPart) == 3 ) {
$entry->street = trim($addressPart[0]);
$entry->citystatezip = trim($addressPart[1]) . ', ' . trim($addressPart[2]);
} else {
$entry->street = trim($addressPart[0]) . ', ' . trim($addressPart[1]);
$entry->citystatezip = trim($addressPart[2]) . ', ' . trim($addressPart[3]);
}
Updated the answer to handle the unit part.
Update: changed the array names. I hate $array... names, even if it's just a mockup.
(Note: this code isn't the prettiest, but it's meant to give you a base to work from. It should be cleaned up and improved a bit.)

PHP Complex String Parse, JSON'able?

So I have the following PHP string:
$output = {"playerId":1178,"percentChange":0.1,"averageDraftPosition":260,"percentOwned":0.1,"mostRecentNews":{"news":"Accardo was called up from Columbus on Monday, the Indians' official Twitter feed reports.","spin":"He'll replace Dan Wheeler on the active roster after carrying a 2.76 ERA over 13 appearances with the Clippers to start the season.","date":"Mon May 14"},"fullName":"Jeremy Accardo"}
What I need is: "Accardo was called up from Columbus on Monday, the Indians' official Twitter feed reports." and "He'll replace Dan Wheeler on the active roster after carrying a 2.76 ERA over 13 appearances with the Clippers to start the season." as substrings. But I can't seem to figure out the best and most elegant way to do that. I tried to JSON_decode the string but I get nothing returned. Any ideas? (I am using PHP)
That's not a string. Try like this:
$output = '{"playerId":1178,"percentChange":0.1,"averageDraftPosition":260,"percentOwned":0.1,"mostRecentNews":{"news":"Accardo was called up from Columbus on Monday, the Indians\' official Twitter feed reports.","spin":"He\'ll replace Dan Wheeler on the active roster after carrying a 2.76 ERA over 13 appearances with the Clippers to start the season.","date":"Mon May 14"},"fullName":"Jeremy Accardo"}';
$object = json_decode($output);
$array = json_decode($output, true);
$string = json_encode($array);
You have a few unescaped quotes in the string, and that is causing the error. Some simple formatting could have saved you the time.
$output = '{
"playerId":1178,
"percentChange":0.1,
"averageDraftPosition":260,
"percentOwned":0.1,
"mostRecentNews": {
"news":"Accardo was called up from Columbus on Monday, the Indians\' official Twitter feed reports.",
"spin":"He\'ll replace Dan Wheeler on the active roster after carrying a 2.76 ERA over 13 appearances with the Clippers to start the season.",
"date":"Mon May 14"
},
"fullName":"Jeremy Accardo"
}';
$json = json_decode($output);
Did you try this?
$output = '{"playerId":1178,"percentChange":0.1,"averageDraftPosition":260,"percentOwned":0.1,"mostRecentNews":{"news":"Accardo was called up from Columbus on Monday, the Indians\' official Twitter feed reports.","spin":"He\'ll replace Dan Wheeler on the active roster after carrying a 2.76 ERA over 13 appearances with the Clippers to start the season.","date":"Mon May 14"},"fullName":"Jeremy Accardo"}';
$array = json_decode($output, true);
echo $array['mostRecentNews']['news'];
echo $array['mostRecentNews']['spin'];
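When json_decode() "returns nothing", it can help to ask PHP why. Below is a minimal sketch that assumes a trimmed copy of the JSON from the question; the helper name decodeOrFail is made up, while json_last_error_msg() is the built-in that reports the reason for a failed decode.

```php
<?php
// Trimmed copy of the JSON from the question, kept in a nowdoc so the
// apostrophes need no escaping.
$output = <<<'JSON'
{"playerId":1178,"mostRecentNews":{"news":"Accardo was called up from Columbus on Monday, the Indians' official Twitter feed reports.","spin":"He'll replace Dan Wheeler on the active roster after carrying a 2.76 ERA over 13 appearances with the Clippers to start the season.","date":"Mon May 14"},"fullName":"Jeremy Accardo"}
JSON;

// Hypothetical helper: decode to an array or explain why decoding failed.
function decodeOrFail(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new RuntimeException('Could not decode JSON: ' . json_last_error_msg());
    }
    return $data;
}

$news = decodeOrFail($output)['mostRecentNews'];
echo $news['news'], PHP_EOL;
echo $news['spin'], PHP_EOL;
```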
The JSON functions work only with UTF-8. Is everything you are using UTF-8 encoded? You also have a syntax error. If you define the JSON variable manually, it could look like this:
<?php
$output=<<<JSONSTR
{"playerId":1178,"percentChange":0.1,"averageDraftPosition":260,"percentOwned":0.1,"mostRecentNews":{"news":"Accardo was called up from Columbus on Monday, the Indians' official Twitter feed reports.","spin":"He'll replace Dan Wheeler on the active roster after carrying a 2.76 ERA over 13 appearances with the Clippers to start the season.","date":"Mon May 14"},"fullName":"Jeremy Accardo"}
JSONSTR;
$variable = json_decode($output);
var_dump($variable);
?>

How do I parse visitors by country info from alexa?

If you search Alexa for any URL, you get detailed traffic information for that site.
I would like to parse the Visitors by Country info from Alexa.
For example, for google.com
the URL is http://www.alexa.com/siteinfo/google.com.
On the Audience tab you can see:
Visitors by Country for Google.com
United States 35.0%
India 8.8%
China 4.1%
Germany 3.4%
United Kingdom 3.2%
Brazil 3.2%
Iran 2.8%
Japan 2.1%
Russia 2.0%
Italy 1.9%
Indonesia 1.7% //etc.
How can I get only this info from alexa.com? I have tried the preg_match function, but it is very difficult in this case.
If you don't want to use DOM and getElementById, which is the most elegant solution in this case, you can try a regexp:
$data = file_get_contents('http://www.alexa.com/siteinfo/google.com');
preg_match_all(
'/<a href="\/topsites\/countries\/(.*)">(.*)<\/a>/mU',
$data,
$result,
PREG_SET_ORDER
);
The DOM solution looks like:
$doc = new DomDocument;
$doc->loadHTMLFile('http://www.alexa.com/siteinfo/google.com');
$data = $doc->getElementById('visitors-by-country');
$my_data = $data->getElementsByTagName('div');
$countries = array();
foreach ($my_data as $node)
{
foreach($node->getElementsByTagName('a') as $href)
{
preg_match('/([0-9\.\%]+)/',$node->nodeValue, $match);
$countries[trim($href->nodeValue)] = $match[0];
}
}
var_dump($countries);
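To try the DOM approach without hitting the network, here is a sketch that runs against an inline HTML fragment. The markup is an assumption, modelled only on what the two snippets above imply (an element with id visitors-by-country, one div per country, a link holding the country name); the real page may differ. It uses DOMXPath instead of getElementById, which is a swap made for this example.

```php
<?php
// Hypothetical markup; the real page structure is not known here.
$html = '<html><body><div id="visitors-by-country">'
      . '<div><a href="/topsites/countries/US">United States</a> 35.0%</div>'
      . '<div><a href="/topsites/countries/IN">India</a> 8.8%</div>'
      . '</div></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$countries = [];
// Visit every link inside the container and read the percentage from its row.
foreach ($xpath->query('//*[@id="visitors-by-country"]//a') as $link) {
    $row = $link->parentNode->textContent;
    if (preg_match('/[0-9.]+%/', $row, $match)) {
        $countries[trim($link->textContent)] = $match[0];
    }
}
var_dump($countries);
```

If the container is missing, the query simply yields no nodes and $countries stays empty, so no null check is needed.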
