Parsing Wikipedia API json in PHP

I'm successfully returning json from Wikipedia but am not having any luck grabbing the value I need in PHP (trying to do this in a Drupal site).
Here is the code I'm using, you can substitute $safeurl for this value:
Squantz%20Pond%20State%20Park
<?php
$safeurl = str_replace(' ', '%20', $title);
$json_string = file_get_contents("http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=" . $safeurl);
$parsed_json = json_decode($json_string, true);
$text = $parsed_json->{'extract'};
print "What I need:" . $text;
?>
If I print $json_string out in my HTML I see the following text which contains what I'm going after, the "extract" value. I just can't figure out what $text needs to be to grab that paragraph.
{"query":{"pages":{"1332160":{"pageid":1332160,"ns":0,"title":"Squantz Pond State Park","extract":"Squantz Pond State Park is a state park located 10 miles (16\u00a0km) north of Danbury in the town of New Fairfield, Connecticut. The park offers opportunities for swimming, fishing, hiking and boating.\n"}}}}

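You need to change your json_decode call to
$parsed_json = json_decode($json_string);
Because you pass true, $parsed_json becomes an associative array, so the object syntax ->{'extract'} does not work on it. Remove the true flag to get objects instead.
Also, extract is not a top-level key: it is nested under query, pages, and the page ID. Access it like this:
$text = $parsed_json->query->pages->{'1332160'}->extract;
What if the page ID (1332160 here) is not known in advance?
Loop over the pages instead:
foreach ($parsed_json->query->pages as $page)
{
    echo $page->extract;
}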
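As an alternative sketch, the true flag can be kept and the result read as an associative array. The JSON below is a shortened copy of the response from the question, hardcoded so the example runs without a network call; the variable names $pages and $first are illustrative and not from the original code.

```php
<?php
// Shortened sample of the API response from the question (hardcoded on purpose).
$json_string = '{"query":{"pages":{"1332160":{"pageid":1332160,"ns":0,'
    . '"title":"Squantz Pond State Park",'
    . '"extract":"Squantz Pond State Park is a state park."}}}}';

// With true, json_decode returns nested associative arrays.
$parsed_json = json_decode($json_string, true);

// The page ID key is not known in advance, so take the first page entry.
$pages = $parsed_json['query']['pages'] ?? [];
$first = reset($pages);   // false if there are no pages

$text = ($first !== false && isset($first['extract'])) ? $first['extract'] : '';
echo "What I need: " . $text . "\n";
```

Whether to decode to objects or arrays is mostly a matter of taste; the array form avoids the ->{'...'} syntax for numeric keys.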

Related

Wrapping String PHP

I have a problem with my code. It creates an image from an external image source and a string, and I use JSON to get the string.
My problem is that when I use the string from the JSON data, the text does not wrap properly, like this:
http://prntscr.com/dbhg4n
$url = 'https://bible-api.com/Psalm100:4-5?translation=kjv';
$JSON = file_get_contents($url);
$data = json_decode($JSON);
$string = $data->text;
But if I declare and set the string directly, I get the output that I want, like this:
http://prntscr.com/dbhg7q
$string = "Enter into his gates with thanksgiving, and into his courts with praise: be thankful unto him, and bless his name. For the Lord is good; his mercy is everlasting; and his truth endureth to all generations.";
I don't think the problem is in the code that wraps the text on my image; I think it is in the JSON data. How can I fix this?

The text contains \n (newline) characters. Just replace them:
$string = preg_replace("/\n/", ' ', $data->text);
or without a regular expression:
$string = str_replace("\n", ' ', $data->text);
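A slightly more defensive variant, as a sketch: collapsing every run of whitespace also covers \r\n line endings and trailing newlines. The sample text is hardcoded in place of $data->text, and wordwrap() is only a stand-in for the image wrapping code, which the question does not show.

```php
<?php
// Sample text with embedded newlines, standing in for $data->text.
$raw = "Enter into his gates with thanksgiving,\nand into his courts with praise:\nbe thankful unto him.\n";

// Collapse every run of whitespace (including \r\n) into one space.
$string = trim(preg_replace('/\s+/', ' ', $raw));

// Stand-in for the image wrapping step, which is not shown in the question.
echo wordwrap($string, 40, "\n", true), "\n";
```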

unicode chars with wikipedia search in PHP

I pass a PHP string to the Wikipedia search page in order to retrieve part of the definition.
Everything works fine, except for Unicode characters, which appear in the \u... form. Here is an example to illustrate; as you can see, the phonetic transcription of the name is not readable:
Henrik Ibsen, Henrik Ibsen \u02c8h\u025bn\u027eik \u02c8ips\u0259n
(Skien, 20 marzo 1828 - Oslo, 23 maggio 1906) è stato uno scrittore,
drammaturgo, poeta e regista teatrale norvegese.
The code I use to get the snippet from Wikipedia is this:
$word = $_GET["word"];
$html = file_get_contents('https://it.wikipedia.org/w/api.php?action=opensearch&search='.$word);
$utf8html = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $html), ENT_NOQUOTES, 'UTF-8');
The last line of my code does not solve the problem.
Do you know how to get a clean text which is entirely readable?

The output of the Wikipedia search API is JSON. Don't try to scrape bits out of it and parse string literal escapes yourself, that way madness lies. Just use a readily available JSON parser.
Also, you need to URL-escape the word when you add it to a query string; otherwise any search for a word containing URL-special characters will fail.
In summary:
$word = $_GET['word'];
$url = 'https://it.wikipedia.org/w/api.php?action=opensearch&search='.urlencode($word);
$response = json_decode(file_get_contents($url));
$matching_titles_array = $response[1];
$matching_summaries_array = $response[2];
$matching_urls = $response[3];
...etc...
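To see why the parser removes the need for manual unescaping, here is a sketch that runs offline. The $body string is a hardcoded sample in the general shape of an opensearch response, not real API output, and the search term passed to urlencode is made up for illustration.

```php
<?php
// Hardcoded sample in the general shape of an opensearch response:
// [search term, titles, summaries, URLs]. Not real API output.
$body = '["ibsen",["Henrik Ibsen"],'
    . '["Henrik Ibsen \u02c8h\u025bn\u027eik \u02c8ips\u0259n"],'
    . '["https://it.wikipedia.org/wiki/Henrik_Ibsen"]]';

$response = json_decode($body);
if (!is_array($response)) {
    exit('Could not decode response: ' . json_last_error_msg());
}

// json_decode has already turned the \uXXXX escapes into UTF-8 characters.
$summary = $response[2][0];
echo $summary, "\n";

// urlencode makes a search term safe to append to a query string.
$query = 'search=' . urlencode('Henrik Ibsen & co');
echo $query, "\n";
```

Checking the decoded value before indexing into it avoids notices when the request fails or returns something unexpected.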

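There are some errors in your regex string: the response contains escapes in the \uXXXX form, not U+XXXX, and a numeric HTML entity needs a trailing semicolon. Try using:
<?php
$str = "Henrik Ibsen, Henrik Ibsen \u02c8h\u025bn\u027eik \u02c8ips\u0259n(Skien, 20 marzo 1828 - Oslo, 23 maggio 1906) è stato uno scrittore, drammaturgo, poeta e regista teatrale norvegese.";
// Turn each literal \uXXXX sequence into the HTML entity &#xXXXX;
$utf8html = preg_replace('#\\\U([0-9A-F]{4})#i', "&#x\\1;", $str);
echo $utf8html;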

Well, the answer posted by bobince is certainly more effective than my previous procedure, which aimed at scraping and pruning bit by bit what I needed. Just to show you how I was doing it, here is my previous code:
$html = file_get_contents('https://it.wikipedia.org/w/api.php?action=opensearch&search='.$s);
$decoded = preg_replace('#\\\U([0-9A-F]{4})#i', "&#x\\1", $html);
$par = array("[", "]");
$def_no_par = str_replace($par, "", $decoded);
$def_no_vir = str_replace("\"\",", "", $def_no_par);
$def_cap = str_replace("\",", "\",<br>", $def_no_vir);
$def_pulita = str_replace("\"", "", $def_cap);
$def_clean = str_replace(".,", ".", $def_pulita);
$definizione = str_replace("$s,", "", $def_clean);
$out = str_replace("\\", "\"", $definizione);
As you can see, removing parts of the output to make it more readable was quite tiresome (and not completely successful).
Using the JSON approach makes everything more linear. Here is my new workaround:
$search = 'https://it.wikipedia.org/w/api.php?action=opensearch&search='.urlencode($s);
$response = json_decode(file_get_contents($search));
$matching_titles_array = $response[1];
$matching_summaries_array = $response[2];
$matching_urls = $response[3];
echo '<h3><div align="center"><font color=" #A3A375">'.$titolo.'</font></div></h3><br><br>';
foreach($response[1] as $t) {
echo '<font color="#5C85D6"><b>'.$t.'</b></font><br><br>';
}
foreach($response[2] as $s) {
echo $s.'<br><br>';
}
foreach($response[3] as $l) {
$link = preg_replace('!(((f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '$1', $l);
echo $link.'<br><br>';
}
The advantage is that now I can manipulate the arrays as I wish.

Pull data into full calendar using php

I am trying to pull event data into my full calendar using php. I have successfully retrieved the data from the database and converted it to json but I have one problem.
Here is the json outputted:
[
{"id":"53","start":"2013-06-06","title":"Assignment2"},
{"id":"52","start":"2013-06-07","title":"Assignment1"},
{"id":"54","start":"2013-06-08","title":"Assignment3"}
]
So when I want to put it into my FullCalendar, I do this:
var class_id = $("#calendar").attr('c_id');
$("#calendar").fullCalendar({
dayClick:function(data){
},
events: '/classes/get_due_dates/'+c_id
});
When I did this, nothing showed up on the calendar, but when I copied and pasted the output and removed the quotes surrounding id, start and title, it worked fine.
Like so:
[
{id:"53",start:"2013-06-06",title:"Assignment2"},
{id:"52",start:"2013-06-07",title:"Assignment1"},
{id:"54",start:"2013-06-08",title:"Assignment3"}
]
Notice that the quotes are removed. So my question is: how do I convert the output that I am getting and remove those quotes so I can display these events on my calendar?
Thanks!

Try this if it is what you are looking for.
<?php
$str = '[
{"id":"53","start":"2013-06-06","title":"Assignment2"},
{"id":"52","start":"2013-06-07","title":"Assignment1"},
{"id":"54","start":"2013-06-08","title":"Assignment3"}
]';
$str = str_replace('{"', '{', $str);
$str = str_replace('":', ':', $str);
$str = str_replace(',"', ',', $str);
echo "<pre>" . $str . "</pre>";
?>
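A small check, offered as a sketch rather than a diagnosis: the quoted form is what a JSON parser accepts, while the stripped form is a JavaScript object literal and is rejected by json_decode. It may also be worth noting that the snippet in the question stores the attribute in class_id but builds the URL from c_id. The $events array below is sample data copied from the question.

```php
<?php
// Output as produced by the server in the question (quoted keys).
$quoted = '[{"id":"53","start":"2013-06-06","title":"Assignment2"}]';

// Same data after stripping the quotes around the keys.
$unquoted = '[{id:"53",start:"2013-06-06",title:"Assignment2"}]';

$ok  = json_decode($quoted, true);    // decodes to an array of events
$bad = json_decode($unquoted, true);  // null: not valid JSON

var_dump(is_array($ok), $bad === null);

// Building the feed with json_encode always yields the quoted form.
$events = [['id' => '53', 'start' => '2013-06-06', 'title' => 'Assignment2']];
echo json_encode($events), "\n";
```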

PHP line parsed into separate objects

I have a line of code in my wordpress widget that outputs from an RSS feed:
<?php echo $entry->title ?>
and when displayed it looks like:
$220,000 :: 504 Freemason St, Unit 2B, Norfolk VA, 23510
or
$274,900 :: 1268 Bells Road, Virginia Beach VA, 23454
What is the easiest way to break this up into different objects?
For example, I'd like to have the price, street name, and city state zip in different objects. The problem is that some of the addresses have unit numbers and it's complicating things. Below is an example of how I would like it to work:
<?php echo $entry->price ?>
<?php echo $entry->street ?>
<?php echo $entry->citystatezip ?>
$220,000
504 Freemason St, Unit 2B
Norfolk VA, 23510
or
$274,900
1268 Bells Road
Virginia Beach VA, 23454

Here is a very crude regex that seems able to parse your string. I'm not the best with regexes, but it seems to work.
/^(\$(?:\d{1,3},?)*) :: (\d* [\w\s,\d]*), ([\w\s]* \w{2}, \d{5})$/
Use this with preg_match; the 1st group is the price, the 2nd is the address, and 3rd is the city/state/zip.
Example:
<?php
$ptn = '/^(\$(?:\d{1,3},?)*) :: (\d* [\w\s,\d]*), ([\w\s]* \w{2}, \d{5})$/';
if(preg_match($ptn, $entry->title, $match) === 1){
$price = $match[1];
$street = $match[2];
$citystatezip = $match[3];
}
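The same pattern can be wrapped in a function and tried against both sample titles from the question. This is a sketch: parse_listing() is a hypothetical helper name, and returning null for titles that do not match is an assumption about how failures should be handled.

```php
<?php
// parse_listing() is a hypothetical helper name, not part of the original answer.
function parse_listing(string $title): ?array
{
    $ptn = '/^(\$(?:\d{1,3},?)*) :: (\d* [\w\s,\d]*), ([\w\s]* \w{2}, \d{5})$/';
    if (preg_match($ptn, $title, $match) !== 1) {
        return null;   // title did not have the expected shape
    }
    return [
        'price'        => $match[1],
        'street'       => $match[2],
        'citystatezip' => $match[3],
    ];
}

$a = parse_listing('$220,000 :: 504 Freemason St, Unit 2B, Norfolk VA, 23510');
$b = parse_listing('$274,900 :: 1268 Bells Road, Virginia Beach VA, 23454');
print_r($a);
print_r($b);
```

The unit number ends up in the street part because the address group is greedy and backtracks only as far as needed for the city, state and zip to match.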

What you need is a regular expression; see http://php.net/manual/en/function.preg-match.php

Use, for example, explode, whose signature is array explode ( string $delimiter , string $string [, int $limit ] ). It gives you an array of strings if you use the correct delimiter.

The code below will fill your $entry object as required:
$string = '$274,900 :: 1268 Bells Road, Virginia Beach VA, 23454';
$pricePart = explode('::', $string);
$addressPart = explode(',', $pricePart[1]);
$entry = new stdClass();
$entry->price = trim($pricePart[0]);
if ( count($addressPart) == 3 ) {
$entry->street = trim($addressPart[0]);
$entry->citystatezip = trim($addressPart[1]) . ', ' . trim($addressPart[2]);
} else {
$entry->street = trim($addressPart[0]) . ', ' . trim($addressPart[1]);
$entry->citystatezip = trim($addressPart[2]) . ', ' . trim($addressPart[3]);
}
Updated the answer to handle the unit part.
Update: changed the array names; I dislike $array-style names, even in a mockup.
(Note: this code isn't the prettiest, but it's meant to give a base to work on. It should be cleaned up and improved a bit.)

Getting a specific subset of a string

I'm getting responses such as:
Si. USC01019181 j2log-aGoWZbUSKJWNYALQQEXG-detail This is my response.
Si. RVC000827503 Si.
How would it be possible to get the "Si." part of the response and ignore the rest/duplicates, bearing in mind that the response sentence can be something other than "Si.". My regex is pretty poor, so I'm not sure how to even start with this one. Note that the response above is posted as-received, complete with newlines (there are none) etc.

You can try
$text = array();
$text[] = 'Si. USC01019181 j2log-aGoWZbUSKJWNYALQQEXG-detail This is my response. ';
$text[] = "Cotton eye joe. USC01019181 j2log-vBaIUwYSOJRZTAPCQISG-detail";
foreach ( $text as $msg ) {
echo strstr($msg, '.', true) , "\n";
}
Output
Si
Cotton eye joe
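If the trailing period should be kept, as in the "Si." from the question, a variant based on strpos and substr could look like the sketch below. first_sentence() is a hypothetical helper name, and returning the whole trimmed message when there is no period is an assumption.

```php
<?php
// first_sentence() is a hypothetical helper name used for illustration.
function first_sentence(string $msg): string
{
    $pos = strpos($msg, '.');
    // No period at all: return the whole message, trimmed.
    if ($pos === false) {
        return trim($msg);
    }
    return substr($msg, 0, $pos + 1);   // include the period itself
}

echo first_sentence('Si. USC01019181 j2log-aGoWZbUSKJWNYALQQEXG-detail This is my response.'), "\n";
echo first_sentence('Si. RVC000827503 Si.'), "\n";
```

Unlike strstr with the before_needle flag, which returns false when the needle is missing, this always returns a string.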
