Hi, is there a way to download the BibTeX entry for something from Google Scholar using PHP without having to download the BibTeX manually one by one? For example, setting a search value like "research" and then downloading the related BibTeX from the links automatically through code.
Any help would be appreciated. I tried to get the HTML page, but as I try to get the page contents the "Import to BibTeX" link disappears on the retrieved page contents.
My code:
<?php
$url = 'http://scholar.google.com/scholar?q=honors+college&hl=en&btnG=Search&as_sdt=1%2C4&as_sdtp=on';
$needle = 'Import into bibtex';
$contents = file_get_contents($url);
echo $contents;
if(strpos($contents, $needle)!== false) {
echo 'found';
} else {
echo 'not found';
}
?>
The short answer is no, you cannot do this.
Google does not provide APIs for Search or Scholar and enforces strict rate limiting. The problem is that each BibTeX entry takes three requests: one for the query, one for the 'import link', and a final one to get the actual BibTeX entry content.
I wrote a script that scrapes Google Scholar results, finds the BibTeX links, and saves the results. However, due to the rate limit it is not viable and will get blocked almost instantly.
Code can be viewed here: https://gist.github.com/Tessmore/11099509 and is free to use, but at your own risk.
As Tessmore said - you can't. But you can make it work by using Google Scholar Organic Results API from SerpApi that bypasses quota limits and blocks from search engines so you don't have to think about how to reduce the chance of being blocked.
Example:
Install google-search-results-php package first via composer:
$ composer require serpapi/google-search-results-php:2.0
Code to integrate and full example in the online IDE:
<?php
ini_set("display_errors", 1);
ini_set("display_startup_errors", 1);
error_reporting(E_ALL);
require __DIR__ . "/vendor/autoload.php";
function getResultIds () {
$result_ids = array();
$params = [
"engine" => "google_scholar", // parsing engine
"q" => "biology" // search query
];
$search = new GoogleSearch(getenv("API_KEY"));
$response = $search->get_json($params);
foreach ($response->organic_results as $result) {
// print_r($result->result_id);
array_push($result_ids, $result->result_id);
}
return $result_ids;
}
function getBibtexData () {
$bibtex_data = array();
foreach (getResultIds() as $result_id) {
$params = [
"engine" => "google_scholar_cite", // parsing engine
"q" => $result_id
];
$search = new GoogleSearch(getenv("API_KEY"));
$response = $search->get_json($params);
foreach ($response->links as $result) {
if ($result->name === "BibTeX") {
array_push($bibtex_data, $result->link);
}
}
}
return $bibtex_data;
}
print_r(json_encode(getBibtexData(), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES));
?>
Output:
[
"https://scholar.googleusercontent.com/scholar.bib?q=info:KNJ0p4CbwgoJ:scholar.google.com/&output=citation&scisdr=CgXjqB_WGAA:AAGBfm0AAAAAYkm8amenawYn_EBidiCQT5QBh0L1KJEX&scisig=AAGBfm0AAAAAYkm8at9X4P3eIWKUCOc6UriCEDKVsQE0&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:6zRLFbcxtREJ:scholar.google.com/&output=citation&scisdr=CgWhqfi6GAA:AAGBfm0AAAAAYkm8bDoIhTlfTkQFCOzYGax54Bst576o&scisig=AAGBfm0AAAAAYkm8bMe_7Nq4e4pB5lg_eR9jmeGrO8ek&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:6Yb0qOX88FMJ:scholar.google.com/&output=citation&scisdr=CgXn_4MdGAA:AAGBfm0AAAAAYkm8bi8ypCZcFDNEQZYZeoSlvx-U1OSk&scisig=AAGBfm0AAAAAYkm8bnFMnwTWGfkfJDCNEx0C4n-aQwql&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:HFdEElNr3IgJ:scholar.google.com/&output=citation&scisdr=CgXKCFpQGAA:AAGBfm0AAAAAYkm8byukcQCl4WHQx-nSNp2pC1gUFSKG&scisig=AAGBfm0AAAAAYkm8b8EReTVkLwtxfth_pjwMyyY3dqts&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:bs-D_MeC14YJ:scholar.google.com/&output=citation&scisdr=CgXEUXwWGAA:AAGBfm0AAAAAYkm8bwwfMNJrffe16EaGypsem9JlmGTi&scisig=AAGBfm0AAAAAYkm8b6nWlPOQL63fXg6dV2U-JQbpyQyS&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:Rn1qFVLRfKwJ:scholar.google.com/&output=citation&scisdr=CgU-HswkGAA:AAGBfm0AAAAAYkm8cHE1YRK23eHV8nzF89Eem-Bsuz72&scisig=AAGBfm0AAAAAYkm8cDEj8ZrzZjAo2bNX-tjYYYJYQZay&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:d8thHtTwq6YJ:scholar.google.com/&output=citation&scisdr=CgXj7oe9GAA:AAGBfm0AAAAAYkm8cTYamCKGKImjdg5MQdgbxUIIHAEY&scisig=AAGBfm0AAAAAYkm8cTcop1ceKzKYvKAKtvlSQ1EdEtSN&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:IUmhOhGaDaEJ:scholar.google.com/&output=citation&scisdr=CgU0qZ2_GAA:AAGBfm0AAAAAYkm8ctCPwoihZkjbNcdEqSnwa0J3jwDy&scisig=AAGBfm0AAAAAYkm8cingBcYnEp8YRqFDFdN-FAEBgDT7&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:PWsf8O5OMQEJ:scholar.google.com/&output=citation&scisdr=CgVBAJxXGAA:AAGBfm0AAAAAYkm8c3CDKQG0Wh_lWsXU_DZxEJkwZz5y&scisig=AAGBfm0AAAAAYkm8c6I-HjAxD1Gy6FLFDRdxH_qU4OBr&scisf=4&ct=citation&cd=-1&hl=en",
"https://scholar.googleusercontent.com/scholar.bib?q=info:yGvgHH8ROuIJ:scholar.google.com/&output=citation&scisdr=CgXFuhOkGAA:AAGBfm0AAAAAYkm8dD0rcSR4LQF8GgTxx865BADtXNDN&scisig=AAGBfm0AAAAAYkm8dIQhodz3rHF9IUdaCSRlhdudACNQ&scisf=4&ct=citation&cd=-1&hl=en"
]
Bibtex data from the first URL:
@article{woese2004new,
title={A new biology for a new century},
author={Woese, Carl R},
journal={Microbiology and molecular biology reviews},
volume={68},
number={2},
pages={173--186},
year={2004},
publisher={Am Soc Microbiol}
}
Disclaimer, I work for SerpApi.
I'm using the ZohoCRM PHP SDK to attempt to pull all Account records from the CRM and manipulate them locally (do some reports). The basic code looks like this, which works fine:
$account_module = ZCRMModule::getInstance('Accounts');
$response = $account_module->getRecords();
$records = $response->getData();
foreach ($records as $record) {
// do stuff
}
The problem is that the $records object only has 200 records (out of about 3000 total). I can't find any docs in the (minimally / poorly documented) SDK documentation showing how to paginate or get bigger result sets, and the Zoho code samples in the dev site don't seem to be using the same SDK for some reason.
Does anyone know how I can paginate through these records?
The getRecords() method seems to accept 2 parameters. This is used within some of their examples. You should be able to use those params to set/control pagination.
$param_map = ["page" => "20", "per_page" => "200"];
$response = $account_module->getRecords($param_map);
@dakdad was right that you can pass the page and per_page values in the param_map. You should also use $response->getInfo()->getMoreRecords() to determine whether you need to keep paginating. Something like this seems to work:
$account_module = ZCRMModule::getInstance('Accounts');
$page = 1;
$has_more = true;
while ($has_more) {
$param_map = ["page" => $page, "per_page" => "200"];
$response = $account_module->getRecords($param_map);
$has_more = $response->getInfo()->getMoreRecords();
$records = $response->getData();
foreach ($records as $record) {
// do stuff
}
$page++;
}
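The loop above can be factored into a small reusable helper. This is only a sketch: `fetchAllPages` and the shape of the `$fetchPage` callable (returning `records` and `more`) are hypothetical names introduced here, not part of the Zoho SDK, and the `$maxPages` guard is an added safety assumption.

```php
<?php
/**
 * Collects every record from a paginated source.
 *
 * $fetchPage is a hypothetical callable: given (int $page, int $perPage)
 * it returns ['records' => array, 'more' => bool].
 */
function fetchAllPages(callable $fetchPage, int $perPage = 200, int $maxPages = 1000): array
{
    $all = [];
    // $maxPages guards against an endless loop if 'more' never turns false.
    for ($page = 1; $page <= $maxPages; $page++) {
        $result = $fetchPage($page, $perPage);
        foreach ($result['records'] as $record) {
            $all[] = $record;
        }
        if (empty($result['more'])) {
            break; // the source reports no further pages
        }
    }
    return $all;
}
```

With the SDK calls shown in the answer above, the callable would wrap getRecords(["page" => $page, "per_page" => $perPage]) and return getData() as 'records' and getInfo()->getMoreRecords() as 'more'.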
The issue is this:
I have a web application that runs on a PHP server. I'd like to build a REST api for it.
I did some research and I figured out that REST api uses HTTP methods (GET, POST...) for certain URI's with an authentication key (not necessarily) and the information is presented back as a HTTP response with the info as XML or JSON (I'd rather JSON).
My question is:
How do I, as the developer of the app, build those URI's? Do I need to write a PHP code at that URI?
How do I build the JSON objects to return as a response?
Here is a very simple example in plain PHP.
There are two files, client.php and api.php. I put both files at the same URL, http://localhost:8888/, so you will have to change the link to your own URL (the files can also live on two different servers).
This is just an example; it's very quick and dirty, and it has been a long time since I've done PHP. But this is the idea of an API.
client.php
<?php
/*** this is the client ***/
if (isset($_GET["action"]) && isset($_GET["id"]) && $_GET["action"] == "get_user") // if the get parameter action is get_user and if the id is set, call the api to get the user information
{
$user_info = file_get_contents('http://localhost:8888/api.php?action=get_user&id=' . $_GET["id"]);
$user_info = json_decode($user_info, true);
// THAT IS VERY QUICK AND DIRTY !!!!!
?>
<table>
<tr>
<td>Name: </td><td> <?php echo $user_info["last_name"] ?></td>
</tr>
<tr>
<td>First Name: </td><td> <?php echo $user_info["first_name"] ?></td>
</tr>
<tr>
<td>Age: </td><td> <?php echo $user_info["age"] ?></td>
</tr>
</table>
Return to the user list
<?php
}
else // else take the user list
{
$user_list = file_get_contents('http://localhost:8888/api.php?action=get_user_list');
$user_list = json_decode($user_list, true);
// THAT IS VERY QUICK AND DIRTY !!!!!
?>
<ul>
<?php foreach ($user_list as $user): ?>
<li>
<a href="<?php echo "http://localhost:8888/client.php?action=get_user&id=" . $user["id"] ?>" alt="<?php echo "user_" . $user["id"] ?>"><?php echo $user["name"] ?></a>
</li>
<?php endforeach; ?>
</ul>
<?php
}
?>
api.php
<?php
// This is the API: depending on the action, it returns the user list or a specific user.
function get_user_by_id($id)
{
$user_info = array();
// make a call in db.
switch ($id){
case 1:
$user_info = array("first_name" => "Marc", "last_name" => "Simon", "age" => 21); // let's say first_name, last_name, age
break;
case 2:
$user_info = array("first_name" => "Frederic", "last_name" => "Zannetie", "age" => 24);
break;
case 3:
$user_info = array("first_name" => "Laure", "last_name" => "Carbonnel", "age" => 45);
break;
}
return $user_info;
}
function get_user_list()
{
$user_list = array(array("id" => 1, "name" => "Simon"), array("id" => 2, "name" => "Zannetie"), array("id" => 3, "name" => "Carbonnel")); // call in db, here I make a list of 3 users.
return $user_list;
}
$possible_url = array("get_user_list", "get_user");
$value = "An error has occurred";
if (isset($_GET["action"]) && in_array($_GET["action"], $possible_url))
{
switch ($_GET["action"])
{
case "get_user_list":
$value = get_user_list();
break;
case "get_user":
if (isset($_GET["id"]))
$value = get_user_by_id($_GET["id"]);
else
$value = "Missing argument";
break;
}
}
exit(json_encode($value));
?>
I didn't make any call to the database for this example, but normally that is what you should do. You should also replace the "file_get_contents" function with "curl".
In 2013, you should use something like Silex or Slim
Silex example:
require_once __DIR__.'/../vendor/autoload.php';
$app = new Silex\Application();
$app->get('/hello/{name}', function($name) use($app) {
return 'Hello '.$app->escape($name);
});
$app->run();
Slim example:
$app = new \Slim\Slim();
$app->get('/hello/:name', function ($name) {
echo "Hello, $name";
});
$app->run();
That is pretty much the same as creating a normal website.
The normal pattern for a PHP website is:
1. The user enters a URL.
2. The server gets the URL, parses it, and executes an action.
3. In this action, you get or generate all the information you need for the page.
4. You create the HTML/PHP page with the info from the action.
5. The server generates a complete HTML page and sends it back to the user.
With an API, you just add a new step between 3 and 4. After step 3, create an array with all the information you need, encode this array as JSON, and exit or return this value.
$info = array("info_1" => 1, "info_2" => "info_2", /* ... */ "info_n" => array(1,2,3));
exit(json_encode($info));
That's all for the API.
On the client side, you can call the API by its URL. If the API only works with GET calls, I think you can simply do this (to check, I normally use curl):
$info = file_get_contents($url);
$info = json_decode($info);
But it's more common to use the curl library to perform GET and POST calls.
You can ask me if you need help with curl.
Once you get the info from the API, you can do steps 4 and 5.
Look at the PHP docs for the JSON functions and file_get_contents.
curl : http://fr.php.net/manual/fr/ref.curl.php
EDIT
No, wait, I don't get it. What do you mean by "php API page"?
The API is only for creating and retrieving your project's data. You NEVER send the HTML result directly (if you're making a website) through an API. You call the API with the URL, the API returns information, and you use this information to create the final result.
For example, you want to write an HTML page that says hello xxx. But to get the name of the user, you have to get the info from the API.
So let's say your API has a function that takes user_id as an argument and returns the name of this user (let's say getUserNameById(user_id)), and you call this function only on a URL like your/api/url/getUser/id.
function getUserNameById($user_id)
{
$userName = null; // placeholder: replace with a call in db to get the user
exit(json_encode($userName)); // maybe return works as well.
}
From the client side you do
$username = file_get_contents('your/api/url/getUser/15'); // You should normally use curl, but this is simpler for the example
// Calling this specific URL triggers getUserNameById(user_id) in the API, which gives you the user name.
<html>
<body>
<p>hello <?php echo $username ?> </p>
</body>
</html>
So the client never accesses the database directly; that is the API's role.
Is that clearer?
(1) How do I ... build those URI's? Do I need to write a PHP code at that URI?
There is no standard for how an API URI scheme should be set up, but it's common to have slash-separated values. For this you can use...
$apiArgArray = explode("/", substr(@$_SERVER['PATH_INFO'], 1));
...to get an array of slash-separated values in the URI after the file name.
Example: Assuming you have an API file api.php in your application somewhere and you do a request for api.php/members/3, then $apiArgArray will be an array containing ['members', '3']. You can then use those values to query your database or do other processing.
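The same splitting can be wrapped in a small function so it is easy to try on sample paths. This is a sketch: `parseApiPath` is a hypothetical name, and returning an empty array when there is no extra path is a design choice made here rather than something the one-liner above does.

```php
<?php
/**
 * Splits the part of the URI after the script name into its
 * slash-separated values, e.g. "/members/3" => ['members', '3'].
 */
function parseApiPath(string $pathInfo): array
{
    $trimmed = trim($pathInfo, '/');
    if ($trimmed === '') {
        return []; // request for api.php itself, no extra path
    }
    return explode('/', $trimmed);
}

// In api.php this would typically be fed from the server variable:
// $apiArgArray = parseApiPath($_SERVER['PATH_INFO'] ?? '');
```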
(2) How do I build the JSON objects to return as a response?
You can take any PHP object and turn it into JSON with json_encode. You'll also want to set the appropriate header.
header('Content-Type: application/json');
$myObject = (object) array( 'property' => 'value' ); // example
echo json_encode($myObject); // outputs JSON text
All this is good for an API that returns JSON, but the next question you should ask is:
(3) How do I make my API RESTful?
For that we'll use $_SERVER['REQUEST_METHOD'] to get the method being used, and then do different things based on that. So the final result is something like...
header('Content-Type: application/json');
$apiArgArray = explode("/", substr(@$_SERVER['PATH_INFO'], 1));
$returnObject = (object) array();
/* Based on the method, use the arguments to figure out
whether you're working with an individual or a collection,
then do your processing, and ultimately set $returnObject */
switch ($_SERVER['REQUEST_METHOD']) {
case 'GET':
// List entire collection or retrieve individual member
break;
case 'PUT':
// Replace entire collection or member
break;
case 'POST':
// Create new member
break;
case 'DELETE':
// Delete collection or member
break;
}
echo json_encode($returnObject);
Sources: https://stackoverflow.com/a/897311/1766230 and http://en.wikipedia.org/wiki/Representational_state_transfer#Applied_to_web_services
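To make the skeleton above concrete, here is a minimal sketch of what the switch could look like when filled in. Everything specific is an assumption for illustration: the `handleMembers` function, the in-memory `$members` array standing in for a database, and the status codes chosen for each case.

```php
<?php
// Minimal sketch of method-based dispatch. $members is a hypothetical
// in-memory collection keyed by id; it stands in for a database table.
function handleMembers(string $method, array $args, array &$members, ?array $payload = null): array
{
    // $args is the path split on "/", e.g. ['members', '3'].
    if (($args[0] ?? '') !== 'members') {
        return ['status' => 404, 'body' => ['error' => 'Unknown resource']];
    }
    $id = isset($args[1]) ? (int) $args[1] : null;
    $notFound = ['status' => 404, 'body' => ['error' => 'Not found']];

    switch ($method) {
        case 'GET': // list the collection or retrieve one member
            if ($id === null) {
                return ['status' => 200, 'body' => array_values($members)];
            }
            return isset($members[$id]) ? ['status' => 200, 'body' => $members[$id]] : $notFound;
        case 'POST': // create a new member
            $newId = $members ? max(array_keys($members)) + 1 : 1;
            $members[$newId] = ['id' => $newId] + (array) $payload;
            return ['status' => 201, 'body' => $members[$newId]];
        case 'PUT': // replace an existing member
            if ($id === null || !isset($members[$id])) {
                return $notFound;
            }
            $members[$id] = ['id' => $id] + (array) $payload;
            return ['status' => 200, 'body' => $members[$id]];
        case 'DELETE': // delete a member
            if ($id === null || !isset($members[$id])) {
                return $notFound;
            }
            unset($members[$id]);
            return ['status' => 204, 'body' => null];
        default:
            return ['status' => 405, 'body' => ['error' => 'Method not allowed']];
    }
}
```

In a real api.php you would call this with $_SERVER['REQUEST_METHOD'], the exploded PATH_INFO, and json_decode(file_get_contents('php://input'), true) as the payload, then pass the status to http_response_code() and echo json_encode() of the body.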
Another framework which has not been mentioned so far is Laravel. It's great for building PHP apps in general, but thanks to its great router it's really comfortable and simple to build rich APIs. It might not be as slim as Slim or Silex, but it gives you a solid structure.
See Aaron Kuzemchak - Simple API Development With Laravel on YouTube and
Laravel 4: A Start at a RESTful API on NetTuts+
I know that this question is accepted and has a bit of age but this might be helpful for some people who still find it relevant. Although the outcome is not a full RESTful API the API Builder mini lib for PHP allows you to easily transform MySQL databases into web accessible JSON APIs.
As simon marc said, the process is much the same as it is for you or me browsing a website. If you are comfortable with the Zend framework, there are some easy-to-follow tutorials that make it quite easy to set things up. The hardest part of building a RESTful API is its design, and making it truly RESTful; think CRUD in database terms.
It could be that you really want an xmlrpc interface or something else similar. What do you want this interface to allow you to do?
--EDIT
Here is where I got started with restful api and Zend Framework.
Zend Framework Example
In short don't use Zend rest server, it's obsolete.
Using the following code I am able to get the logs of calls and SMS's. How do I modify this code to only search between certain dates using PHP?
// Instantiate a new Twilio Rest Client
$client = new Services_Twilio($AccountSid, $AuthToken, $ApiVersion);
// http://www.twilio.com/docs/quickstart...
try {
// Get Recent Calls
foreach ($client->account->calls as $call) {
echo "Call from $call->sid : $call->from to $call->to at $call->start_time of length $call->duration $call->price <br>";
}
}
catch (Exception $e) {
echo 'Error: ' . $e->getMessage();
}
You will want to add a code snippet that looks something like this:
$client = new Services_Twilio('AC123', '123');
foreach ($client->account->calls->getIterator(0, 50, array(
'StartTime>' => '2012-04-01',
'StartTime<' => '2012-05-01'
)) as $call) {
echo "From: {$call->from}\nTo: {$call->to}\nSid: {$call->sid}\n\n";
}
If you want to filter the list, you have to construct the iterator yourself with the getIterator method. There's more documentation here: Filtering Twilio Calls with PHP
Use the search terms StartTime> and StartTime< for this. The first means the call start time is greater than the given date, and the second means the call start time is less than the given date.
To find all calls that started between the 4th and 6th of July 2009, add the search terms
array(
'StartTime>' => '2009-07-04',
'StartTime<' => '2009-07-06'
)
See example 4 on the twilio doc.
Also note you can always ask twilio support. They usually help gladly.
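If the dates come from user input, it may help to build the filter array in one place. The sketch below only produces the array with the StartTime> and StartTime< keys used in the answers above; `callDateFilter` is a hypothetical helper, not part of the Twilio library, and the checks it performs are assumptions about what you would want to reject.

```php
<?php
/**
 * Builds the search-term array for calls started between two dates.
 * Dates are expected as "YYYY-MM-DD" strings.
 */
function callDateFilter(string $from, string $to): array
{
    // The leading "!" resets the time fields so only the date is compared.
    $start = DateTimeImmutable::createFromFormat('!Y-m-d', $from);
    $end = DateTimeImmutable::createFromFormat('!Y-m-d', $to);
    if ($start === false || $end === false) {
        throw new InvalidArgumentException('Dates must look like YYYY-MM-DD');
    }
    if ($start > $end) {
        throw new InvalidArgumentException('The start date must not be after the end date');
    }
    return [
        'StartTime>' => $start->format('Y-m-d'),
        'StartTime<' => $end->format('Y-m-d'),
    ];
}
```

The returned array would then be passed as the third argument to getIterator, as in the snippet above.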
Is there any way I can access the thumbnail picture of any wikipedia page by using an API? I mean the image on the top right side in box. Is there any APIs for that?
You can get the thumbnail of any wikipedia page using prop=pageimages. For example:
http://en.wikipedia.org/w/api.php?action=query&titles=Al-Farabi&prop=pageimages&format=json&pithumbsize=100
And you will get the thumbnail full URL.
http://en.wikipedia.org/w/api.php
Look at prop=images.
It returns an array of image filenames that are used in the parsed page. You then have the option of making another API call to find out the full image URL, e.g.:
action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url
or to calculate the URL via the filename's hash.
Unfortunately, while the array of images returned by prop=images is in the order they are found on the page, the first can not be guaranteed to be the image in the info box because sometimes a page will include an image before the infobox (most of the time icons for metadata about the page: e.g. "this article is locked").
Searching the array of images for the first image that includes the page title is probably the best guess for the infobox image.
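For the "calculate the URL via the filename's hash" option, here is a sketch of how that calculation is usually done: MediaWiki's hashed upload directories use the first one and first two hex characters of the MD5 of the file name (with spaces as underscores). The function name `commonsFileUrl` is hypothetical, and the sketch assumes the file is hosted on Wikimedia Commons; files stored locally on en.wikipedia live under /wikipedia/en/ instead, so the result is a guess that should be verified.

```php
<?php
/**
 * Guesses the upload URL of a file from its name alone.
 * Assumption: the file is hosted on Wikimedia Commons.
 */
function commonsFileUrl(string $filename): string
{
    // Drop a leading "File:" or "Image:" namespace prefix, if present.
    $name = preg_replace('/^(File|Image):/i', '', $filename);
    // MediaWiki stores file names with underscores instead of spaces.
    $name = str_replace(' ', '_', $name);
    $hash = md5($name);
    return sprintf(
        'https://upload.wikimedia.org/wikipedia/commons/%s/%s/%s',
        $hash[0],             // first hex character of the MD5
        substr($hash, 0, 2),  // first two hex characters of the MD5
        rawurlencode($name)   // e.g. "(" becomes "%28"
    );
}
```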
This is a good way to get the main image of a Wikipedia page:
http://en.wikipedia.org/w/api.php?action=query&prop=pageimages&format=json&piprop=original&titles=India
Check out the MediaWiki API example for getting the main picture of a wikipedia page: https://www.mediawiki.org/wiki/API:Page_info_in_search_results.
As others have mentioned, you would use prop=pageimages in your API query.
If you also want the image description, you would use prop=pageimages|pageterms instead in your API query.
You can get the original image using piprop=original. Or you can get a thumbnail image with a specified width/height. For a thumbnail with width/height=600, piprop=thumbnail&pithumbsize=600. If you omit either, the image returned in the API callback will default to a thumbnail with width/height of 50px.
If you are requesting results in JSON format, you should always use formatversion=2 in your API query (i.e., format=json&formatversion=2) because it makes retrieving the image from the query easier.
Original Size Image:
https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=original&titles=Albert Einstein
Thumbnail Size (600px width/height) Image:
https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages|pageterms&piprop=thumbnail&pithumbsize=600&titles=Albert Einstein
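In PHP, the two queries above could be built and read roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the response shape (a list under query.pages, with an "original" or "thumbnail" entry holding a "source" URL) is what formatversion=2 is expected to return, so check it against a real response.

```php
<?php
/**
 * Builds a pageimages query URL. Pass a size for a thumbnail,
 * or null to request the original image.
 */
function buildPageImageUrl(string $title, ?int $thumbSize = null): string
{
    $params = [
        'action' => 'query',
        'format' => 'json',
        'formatversion' => 2,
        'prop' => 'pageimages',
        'titles' => $title,
    ];
    if ($thumbSize === null) {
        $params['piprop'] = 'original';
    } else {
        $params['piprop'] = 'thumbnail';
        $params['pithumbsize'] = $thumbSize;
    }
    // http_build_query takes care of URL-encoding, e.g. the space in a title.
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

/**
 * Reads the image URL out of a decoded response
 * (json_decode($body, true)); returns null if the page has no image.
 */
function extractPageImage(array $response): ?string
{
    $page = $response['query']['pages'][0] ?? null;
    if ($page === null) {
        return null;
    }
    return $page['original']['source'] ?? $page['thumbnail']['source'] ?? null;
}
```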
Way 1: You can try some query like this:
http://en.wikipedia.org/w/api.php?action=opensearch&limit=5&format=xml&search=italy&namespace=0
in the response, you can see the Image tag.
<Item>
<Text xml:space="preserve">Italy national rugby union team</Text>
<Description xml:space="preserve">
The Italy national rugby union team represent the nation of Italy in the sport of rugby union.
</Description>
<Url xml:space="preserve">
http://en.wikipedia.org/wiki/Italy_national_rugby_union_team
</Url>
<Image source="http://upload.wikimedia.org/wikipedia/en/thumb/4/46/Italy_rugby.png/43px-Italy_rugby.png" width="43" height="50"/>
</Item>
Way 2: use the query http://en.wikipedia.org/w/index.php?action=render&title=italy
This gives you the raw HTML, from which you can extract the image using something like PHP Simple HTML DOM Parser:
http://simplehtmldom.sourceforge.net
I don't have time to write the code for you; this is just some advice. Thanks.
I'm sorry for not answering specifically your question about the main image. But here's some code to get a list of all images:
function makeCall($url) {
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
return curl_exec($curl);
}
function wikipediaImageUrls($url) {
$imageUrls = array();
$pathComponents = explode('/', parse_url($url, PHP_URL_PATH));
$pageTitle = array_pop($pathComponents);
$imagesQuery = "http://en.wikipedia.org/w/api.php?action=query&titles={$pageTitle}&prop=images&format=json";
$jsonResponse = makeCall($imagesQuery);
$response = json_decode($jsonResponse, true);
$imagesKey = key($response['query']['pages']);
foreach($response['query']['pages'][$imagesKey]['images'] as $imageArray) {
if($imageArray['title'] != 'File:Commons-logo.svg' && $imageArray['title'] != 'File:P vip.svg') {
$title = str_replace('File:', '', $imageArray['title']);
$title = str_replace(' ', '_', $title);
$imageUrlQuery = "http://en.wikipedia.org/w/api.php?action=query&titles=Image:{$title}&prop=imageinfo&iiprop=url&format=json";
$jsonUrlQuery = makeCall($imageUrlQuery);
$urlResponse = json_decode($jsonUrlQuery, true);
$imageKey = key($urlResponse['query']['pages']);
$imageUrls[] = $urlResponse['query']['pages'][$imageKey]['imageinfo'][0]['url'];
}
}
return $imageUrls;
}
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Saturn_%28mythology%29'));
print_r(wikipediaImageUrls('http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel'));
I got this for http://en.wikipedia.org/wiki/Saturn_%28mythology%29:
Array
(
[0] => http://upload.wikimedia.org/wikipedia/commons/1/10/Arch_of_SeptimiusSeverus.jpg
[1] => http://upload.wikimedia.org/wikipedia/commons/8/81/Ivan_Akimov_Saturn_.jpg
[2] => http://upload.wikimedia.org/wikipedia/commons/d/d7/Lucius_Appuleius_Saturninus.jpg
[3] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Polidoro_da_Caravaggio_-_Saturnus-thumb.jpg
[4] => http://upload.wikimedia.org/wikipedia/commons/b/bd/Porta_Maggiore_Alatri.jpg
[5] => http://upload.wikimedia.org/wikipedia/commons/6/6a/She-wolf_suckles_Romulus_and_Remus.jpg
[6] => http://upload.wikimedia.org/wikipedia/commons/4/45/Throne_of_Saturn_Louvre_Ma1662.jpg
)
And for the second URL (http://en.wikipedia.org/wiki/Hans-Ulrich_Rudel):
Array
(
[0] => http://upload.wikimedia.org/wikipedia/commons/e/e9/BmRKEL.jpg
[1] => http://upload.wikimedia.org/wikipedia/commons/3/3f/BmRKELS.jpg
[2] => http://upload.wikimedia.org/wikipedia/commons/2/2c/Bundesarchiv_Bild_101I-655-5976-04%2C_Russland%2C_Sturzkampfbomber_Junkers_Ju_87_G.jpg
[3] => http://upload.wikimedia.org/wikipedia/commons/6/62/Bundeswehr_Kreuz_Black.svg
[4] => http://upload.wikimedia.org/wikipedia/commons/9/99/Flag_of_German_Reich_%281935%E2%80%931945%29.svg
[5] => http://upload.wikimedia.org/wikipedia/en/6/64/HansUlrichRudel.jpeg
[6] => http://upload.wikimedia.org/wikipedia/commons/8/82/Heinkel_He_111_during_the_Battle_of_Britain.jpg
[7] => http://upload.wikimedia.org/wikipedia/commons/6/66/Regulation_WW_II_Underwing_Balkenkreuz.png
)
Note that the URL changed a bit on the 6th element of the second array. It's what @JosephJaber was warning about in his comment above.
Hope this helps someone.
I have written some code that gets main image (full URL) by Wikipedia article title. It's not perfect, but overall I'm very pleased with the results.
The challenge was that when queried for a specific title, Wikipedia returns multiple image filenames (without path). Furthermore, the secondary search (I used the code varatis posted in this thread - thanks!) returns URLs of all images found based on the image filename that was searched, regardless of the original article title. After all this, we may end up with a generic image irrelevant to the search, so we filter those out. The code iterates over filenames and URLs until it finds (hopefully the best) match... a bit complicated, but it works :)
Note on the generic filter: I've been compiling a list of generic image strings for the isGeneric() function, but the list just keeps growing. I am considering maintaining it as a public list - if there is any interest let me know.
Pre:
protected static $baseurl = "http://en.wikipedia.org/w/api.php";
Main function - get image URL from title:
public static function getImageURL($title)
{
$images = self::getImageFilenameObj($title); // returns JSON object
if (!$images) return '';
foreach ($images as $image)
{
// get object of image URL for given filename
$imgjson = self::getFileURLObj($image->title);
// return first image match
foreach ($imgjson as $img)
{
// get URL for image
$url = $img->imageinfo[0]->url;
// no image found
if (!$url) continue;
// filter generic images
if (self::isGeneric($url)) continue;
// match found
return $url;
}
}
// match not found
return '';
}
== The following functions are called by the main function above ==
Get JSON object (filenames) by title:
public static function getImageFilenameObj($title)
{
try // see if page has images
{
// get image file name
$json = json_decode(
self::retrieveInfo(
self::$baseurl . '?action=query&titles=' .
urlencode($title) . '&prop=images&format=json'
))->query->pages;
/** The foreach is only to get around
* the fact that we don't have the id.
*/
foreach ($json as $id) { return $id->images; }
}
catch(exception $e) // no images
{
return NULL;
}
}
Get JSON object (URLs) by filename:
public static function getFileURLObj($filename)
{
try // resolve URL from filename
{
return json_decode(
self::retrieveInfo(
self::$baseurl . '?action=query&titles=' .
urlencode($filename) . '&prop=imageinfo&iiprop=url&format=json'
))->query->pages;
}
catch(exception $e) // no URLs
{
return NULL;
}
}
Filter out generic images:
public static function isGeneric($url)
{
$generic_strings = array(
'_gray.svg',
'icon',
'Commons-logo.svg',
'Ambox',
'Text_document_with_red_question_mark.svg',
'Question_book-new.svg',
'Canadese_kano',
'Wiki_letter_',
'Edit-clear.svg',
'WPanthroponymy',
'Compass_rose_pale',
'Us-actor.svg',
'voting_box',
'Crystal_',
'transportation_inv',
'arrow.svg',
'Quill_and_ink-US.svg',
'Decrease2.svg',
'Rating-',
'template',
'Nuvola_apps_',
'Mergefrom.svg',
'Portal-',
'Translation_to_',
'/School.svg',
'arrow',
'Symbol_',
'stub',
'Unbalanced_scales.svg',
'-logo.',
'P_vip.svg',
'Books-aj.svg_aj_ashton_01.svg',
'Film',
'/Gnome-',
'cap.svg',
'Missing',
'silhouette',
'Star_empty.svg',
'Music_film_clapperboard.svg',
'IPA_Unicode',
'symbol',
'_highlighting_',
'pictogram',
'Red_pog.svg',
'_medal_with_cup',
'_balloon',
'Feature',
'Aiga_'
);
foreach ($generic_strings as $str)
{
if (stripos($url, $str) !== false) return true;
}
return false;
}
Comments welcome.
Let's take the page http://en.wikipedia.org/wiki/index.html?curid=57570 as an example.
To get the main picture, check out prop=pageprops:
action=query&pageids=57570&prop=pageprops&format=json
Example page data in the result:
{ "pages" : { "57570":{
"pageid":57570,
"ns":0,
"title":"Sachin Tendulkar",
"pageprops" : {
"defaultsort":"Tendulkar,Sachin",
"page_image":"Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg",
"wikibase_item":"Q9488"
}
}
}
}
We get the main picture's file name from this result as
(wikiId).pageprops.page_image = Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg
Now that we have the image file name, we have to make another API call to get the full image URL from the file name, as follows:
action=query&titles=Image:INSERT_EXAMPLE_FILE_NAME_HERE.jpg&prop=imageinfo&iiprop=url
For example:
action=query&titles=Image:Sachin_at_Castrol_Golden_Spanner_Awards_(crop).jpg&prop=imageinfo&iiprop=url
This returns an array of image data containing the URL:
http://upload.wikimedia.org/wikipedia/commons/3/35/Sachin_at_Castrol_Golden_Spanner_Awards_%28crop%29.jpg
There is a way to reliably get the main image for a Wikipedia page: the extension called PageImages.
The PageImages extension collects information about images used on a page.
Its aim is to return the single most appropriate thumbnail associated
with an article, attempting to return only meaningful images, e.g. not
those from maintenance templates, stubs or flag icons. Currently it
uses the first non-meaningless image used in the page.
https://www.mediawiki.org/wiki/Extension:PageImages
Just add the prop pageimages to your API Query:
/w/api.php?action=query&prop=pageimages&titles=Somepage&format=xml
This reliably filters out annoying default images and prevents you from having to filter them yourself! The extension is installed on all the main wikipedia pages...
Like Anuraj mentioned, the pageimages parameter is it. Look at the following url that'll bring about some nifty stuff:
https://en.wikipedia.org/w/api.php?action=query&prop=info|extracts|pageimages|images&inprop=url&exsentences=1&titles=india
Here are some interesting parameters:
The two parameters extracts and exsentences give you a short description you can use (exsentences is the number of sentences you want to include in the excerpt).
The info and inprop=url parameters give you the URL of the page.
The prop property takes multiple values separated by a bar symbol.
And if you add format=json in there, it is even better.
See this related question on an API for Wikipedia. However, I would not know if it is possible to retrieve the thumbnail picture through an API.
You can also consider just parsing the web page to find the image URL, and retrieve the image that way.
Here is my list of XPaths that I have found to work for 95 percent of articles. The main ones are 1, 2, 3 and 4. A lot of articles are not formatted consistently, and these would be edge cases:
You can use a DOM parsing lib to fetch image using the XPath.
static NSString *kWikipediaImageXPath2 = @"//*[@id=\"mw-content-text\"]/div[1]/div/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath3 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/a/img";
static NSString *kWikipediaImageXPath1 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath4 = @"//*[@id=\"mw-content-text\"]/div[2]/table/tr[2]/td/a/img";
static NSString *kWikipediaImageXPath5 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/p/a/img";
static NSString *kWikipediaImageXPath6 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[2]/td/div/div/a/img";
static NSString *kWikipediaImageXPath7 = @"//*[@id=\"mw-content-text\"]/div[1]/table/tr[1]/td/div/div/a/img";
I used an ObjC wrapper around libxml2.2 called Hpple to pull out the image URL. Hope this helps.
You can also use the CocoaPod called SDWebImage.
Code sample (remember to also add import SDWebImage):
import UIKit
import Alamofire
import SwiftyJSON
import SDWebImage

// Assumes this lives in a view controller that defines wikipediaURL
// (for example "https://en.wikipedia.org/w/api.php"), wikiInfoLabel and imageView.
func requestInfo(flowerName: String) {
    let parameters: [String: String] = [
        "format": "json",
        "action": "query",
        "prop": "extracts|pageimages", // pageimages lets us fetch the image path
        "exintro": "",
        "explaintext": "",
        "titles": flowerName,
        "indexpageids": "",
        "redirects": "1",
        "pithumbsize": "500" // image size in px
    ]
    AF.request(wikipediaURL, method: .get, parameters: parameters).responseJSON { response in
        switch response.result {
        case .success(let value):
            print("Got the wikipedia info.")
            // Use the unwrapped value instead of force-unwrapping response.value
            let flowerJSON = JSON(value)
            let pageid = flowerJSON["query"]["pageids"][0].stringValue
            let flowerDescription = flowerJSON["query"]["pages"][pageid]["extract"].stringValue
            let flowerImageURL = flowerJSON["query"]["pages"][pageid]["thumbnail"]["source"].stringValue // image URL
            self.wikiInfoLabel.text = flowerDescription
            self.imageView.sd_setImage(with: URL(string: flowerImageURL)) // imageView updated with the Wikipedia image
        case .failure(let error):
            print(error)
        }
    }
}
I think not, but you can capture the image by parsing the HTML document for its image links.
The issue is this:
I have a web application that runs on a PHP server. I'd like to build a REST api for it.
I did some research and figured out that a REST API uses HTTP methods (GET, POST...) on certain URIs, optionally with an authentication key, and that the information is sent back as an HTTP response with the data as XML or JSON (I'd rather use JSON).
My question is:
How do I, as the developer of the app, build those URIs? Do I need to write PHP code at that URI?
How do I build the JSON objects to return as a response?
Here is a very simple example in plain PHP.
There are 2 files, client.php and api.php. I put both files at the same URL, http://localhost:8888/, so you will have to change the links to your own URL. (The files can also be on two different servers.)
This is just an example; it's very quick and dirty, and it has been a long time since I've done PHP. But this is the idea of an API.
client.php
<?php
/*** this is the client ***/
if (isset($_GET["action"]) && isset($_GET["id"]) && $_GET["action"] == "get_user") // if the get parameter action is get_user and if the id is set, call the api to get the user information
{
$user_info = file_get_contents('http://localhost:8888/api.php?action=get_user&id=' . urlencode($_GET["id"]));
$user_info = json_decode($user_info, true);
// THAT IS VERY QUICK AND DIRTY !!!!!
?>
<table>
<tr>
<td>Name: </td><td> <?php echo $user_info["last_name"] ?></td>
</tr>
<tr>
<td>First Name: </td><td> <?php echo $user_info["first_name"] ?></td>
</tr>
<tr>
<td>Age: </td><td> <?php echo $user_info["age"] ?></td>
</tr>
</table>
<a href="http://localhost:8888/client.php">Return to the user list</a>
<?php
}
else // else take the user list
{
$user_list = file_get_contents('http://localhost:8888/api.php?action=get_user_list');
$user_list = json_decode($user_list, true);
// THAT IS VERY QUICK AND DIRTY !!!!!
?>
<ul>
<?php foreach ($user_list as $user): ?>
<li>
<a href="<?php echo "http://localhost:8888/client.php?action=get_user&id=" . $user["id"] ?>" alt="<?php echo "user_" . $user["id"] ?>"><?php echo $user["name"] ?></a>
</li>
<?php endforeach; ?>
</ul>
<?php
}
?>
api.php
<?php
// This is the API: depending on the action, it returns the user list or a specific user.
function get_user_by_id($id)
{
$user_info = array();
// make a call in db.
switch ($id){
case 1:
$user_info = array("first_name" => "Marc", "last_name" => "Simon", "age" => 21); // let's say first_name, last_name, age
break;
case 2:
$user_info = array("first_name" => "Frederic", "last_name" => "Zannetie", "age" => 24);
break;
case 3:
$user_info = array("first_name" => "Laure", "last_name" => "Carbonnel", "age" => 45);
break;
}
return $user_info;
}
function get_user_list()
{
$user_list = array(array("id" => 1, "name" => "Simon"), array("id" => 2, "name" => "Zannetie"), array("id" => 3, "name" => "Carbonnel")); // call in db, here I make a list of 3 users.
return $user_list;
}
$possible_url = array("get_user_list", "get_user");
$value = "An error has occurred";
if (isset($_GET["action"]) && in_array($_GET["action"], $possible_url))
{
switch ($_GET["action"])
{
case "get_user_list":
$value = get_user_list();
break;
case "get_user":
if (isset($_GET["id"]))
$value = get_user_by_id($_GET["id"]);
else
$value = "Missing argument";
break;
}
}
exit(json_encode($value));
?>
I didn't make any calls to the database for this example, but normally that is what you should do. You should also replace the "file_get_contents" function with "curl".
In 2013, you should use something like Silex or Slim
Silex example:
require_once __DIR__.'/../vendor/autoload.php';
$app = new Silex\Application();
$app->get('/hello/{name}', function($name) use($app) {
return 'Hello '.$app->escape($name);
});
$app->run();
Slim example:
$app = new \Slim\Slim();
$app->get('/hello/:name', function ($name) {
echo "Hello, $name";
});
$app->run();
That is pretty much the same as creating a normal website.
The normal pattern for a PHP website is:
1. The user enters a URL.
2. The server gets the URL, parses it and executes an action.
3. In this action, you get/generate all the information you need for the page.
4. You create the HTML/PHP page with the info from the action.
5. The server generates a complete HTML page and sends it back to the user.
With an API, you just add a new step between 3 and 4. After step 3, create an array with all the information you need. Encode this array as JSON and exit or return this value.
$info = array("info_1" => 1, "info_2" => "info_2", /* ... */ "info_n" => array(1, 2, 3));
exit(json_encode($info));
That is all for the API.
On the client side, you call the API by its URL. If the API only handles GET calls, I think something as simple as this will do (to check, I normally use curl):
$info = file_get_contents($url);
$info = json_decode($info);
But it's more common to use the curl library to perform GET and POST calls.
You can ask me if you need help with curl.
Once you get the info from the API, you can do steps 4 and 5.
Look at the PHP docs for the JSON functions and file_get_contents.
curl : http://fr.php.net/manual/fr/ref.curl.php
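For reference, a curl version of the client call might look roughly like the sketch below. The function names apiGet and decodeApiResponse are made up for this example, and the 10 second timeout is an arbitrary choice; it assumes the API answers with a JSON body, as api.php above does.

```php
<?php
// Decodes a JSON body from the API; throws if the body is not valid JSON.
function decodeApiResponse(string $body)
{
    $data = json_decode($body, true); // true => associative arrays
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }
    return $data;
}

// Performs a GET request with curl and returns the decoded JSON.
function apiGet(string $url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 10,   // seconds; arbitrary value for the sketch
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return decodeApiResponse($body);
}

// Usage in client.php (not executed here):
// $user_list = apiGet('http://localhost:8888/api.php?action=get_user_list');
```

Compared with file_get_contents, this gives you a timeout and an explicit error when the request or the decoding fails.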
EDIT
No, wait, I don't get it. What do you mean by "php API page"?
The API only handles creating and retrieving your project's data. You NEVER send the HTML result directly (if you're making a website) through an API. You call the API with the URL, the API returns information, and you use this information to create the final result.
Example: you want to write an HTML page that says hello xxx. But to get the name of the user, you have to get the info from the API.
So let's say your API has a function that takes user_id as an argument and returns the name of this user (let's say getUserNameById(user_id)), and this function is only called through a URL like your/api/url/getUser/id.
function getUserNameById($user_id)
{
    $userName = null; // replace null with a call in db to get the user
    exit(json_encode($userName)); // maybe return works as well.
}
On the client side you do:
<?php
$username = json_decode(file_get_contents("your/api/url/getUser/15")); // You should normally use curl, but this is simpler for the example
// Requesting this specific URL calls the API and triggers getUserNameById(user_id), which gives you the user name.
?>
<html>
<body>
<p>hello <?php echo $username ?> </p>
</body>
</html>
So the client never accesses the database directly; that is the API's role.
Is that clearer?
(1) How do I ... build those URI's? Do I need to write a PHP code at that URI?
There is no standard for how an API URI scheme should be set up, but it's common to have slash-separated values. For this you can use...
$apiArgArray = explode("/", substr(@$_SERVER['PATH_INFO'], 1));
...to get an array of slash-separated values in the URI after the file name.
Example: Assuming you have an API file api.php in your application somewhere and you do a request for api.php/members/3, then $apiArgArray will be an array containing ['members', '3']. You can then use those values to query your database or do other processing.
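A small variant of the same idea, wrapped in a function so it can be tried without a web server. The name parseApiPath is made up for this sketch, and it handles a missing PATH_INFO explicitly instead of relying on the @ error suppression.

```php
<?php
// Splits a PATH_INFO string such as "/members/3" into its segments.
// Returns an empty array when there is no path after the file name.
function parseApiPath(?string $pathInfo): array
{
    $trimmed = trim((string) $pathInfo, '/'); // drop leading/trailing slashes
    return $trimmed === '' ? [] : explode('/', $trimmed);
}

// In api.php this would be called as:
// $apiArgArray = parseApiPath($_SERVER['PATH_INFO'] ?? null);
```

Unlike the one-liner, a request for api.php with nothing after it yields an empty array rather than an array holding one empty string.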
(2) How do I build the JSON objects to return as a response?
You can take any PHP object and turn it into JSON with json_encode. You'll also want to set the appropriate header.
header('Content-Type: application/json');
$myObject = (object) array( 'property' => 'value' ); // example
echo json_encode($myObject); // outputs JSON text
All this is good for an API that returns JSON, but the next question you should ask is:
(3) How do I make my API RESTful?
For that we'll use $_SERVER['REQUEST_METHOD'] to get the method being used, and then do different things based on that. So the final result is something like...
header('Content-Type: application/json');
$apiArgArray = explode("/", substr(@$_SERVER['PATH_INFO'], 1));
$returnObject = (object) array();
/* Based on the method, use the arguments to figure out
whether you're working with an individual or a collection,
then do your processing, and ultimately set $returnObject */
switch ($_SERVER['REQUEST_METHOD']) {
case 'GET':
// List entire collection or retrieve individual member
break;
case 'PUT':
// Replace entire collection or member
break;
case 'POST':
// Create new member
break;
case 'DELETE':
// Delete collection or member
break;
}
echo json_encode($returnObject);
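To make the skeleton above a bit more concrete, here is one way the switch could be filled in. This is only a sketch under assumptions: handleMembers is a hypothetical name, the $store array stands in for the database, and the status codes are one reasonable choice rather than the only correct one.

```php
<?php
// Hypothetical handler: returns [HTTP status code, data to encode as JSON].
// $store maps ids to member arrays and stands in for a real database.
function handleMembers(string $method, array $args, array &$store, array $input = []): array
{
    $id = $args[1] ?? null; // e.g. ['members', '3'] => '3'
    $notFound = [404, ['error' => 'Not found']];
    switch ($method) {
        case 'GET':
            if ($id === null) {
                return [200, array_values($store)]; // list entire collection
            }
            return isset($store[$id]) ? [200, $store[$id]] : $notFound;
        case 'POST':
            // Create a new member with the next free id
            $newId = $store ? max(array_keys($store)) + 1 : 1;
            $store[$newId] = ['id' => $newId] + $input;
            return [201, $store[$newId]];
        case 'PUT':
            if ($id === null || !isset($store[$id])) {
                return $notFound;
            }
            $store[$id] = ['id' => (int) $id] + $input; // replace the member
            return [200, $store[$id]];
        case 'DELETE':
            if ($id === null || !isset($store[$id])) {
                return $notFound;
            }
            unset($store[$id]);
            return [204, null];
        default:
            return [405, ['error' => 'Method not allowed']];
    }
}

// In api.php (not executed here):
// [$status, $body] = handleMembers($_SERVER['REQUEST_METHOD'], $apiArgArray, $store);
// http_response_code($status);
// header('Content-Type: application/json');
// echo json_encode($body);
```

Returning the status and body instead of echoing them keeps the routing logic testable without a running web server.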
Sources: https://stackoverflow.com/a/897311/1766230 and http://en.wikipedia.org/wiki/Representational_state_transfer#Applied_to_web_services
Another framework which has not been mentioned so far is Laravel. It's great for building PHP apps in general, and thanks to its router it's really comfortable and simple to build rich APIs. It might not be as slim as Slim or Silex, but it gives you a solid structure.
See Aaron Kuzemchak - Simple API Development With Laravel on YouTube and
Laravel 4: A Start at a RESTful API on NetTuts+
I know that this question already has an accepted answer and is a bit old, but this might be helpful for people who still find it relevant. Although the outcome is not a fully RESTful API, the API Builder mini lib for PHP allows you to easily turn MySQL databases into web-accessible JSON APIs.
As Simon Marc said, the process is much the same as it is for you or me browsing a website. If you are comfortable with the Zend Framework, there are some easy-to-follow tutorials that make it quite easy to set things up. The hardest part of building a RESTful API is its design and making it truly RESTful; think CRUD in database terms.
It could be that you really want an xmlrpc interface or something else similar. What do you want this interface to allow you to do?
--EDIT
Here is where I got started with RESTful APIs and Zend Framework.
Zend Framework Example
In short, don't use the Zend REST server; it's obsolete.