Fetching images of places from Wikipedia

I'm using the following code to fetch images from the Wikipedia API, but at the moment it returns random-looking images that merely match the keyword. For example, if I search for "spain", I get random images with "Spain" in their names, but what I need are images of places in Spain, like the ones shown in Wikipedia articles.
Can anyone help me with that?
<form action="" method="get">
<input type="text" name="search">
<input type="submit" value="Search">
</form>
<?php
if (!empty($_GET['search'])) {
    function get_wiki_image($search, $limit) {
        // Disables TLS certificate checks; avoid this outside local testing.
        $streamContext = array(
            "ssl" => array(
                "verify_peer" => false,
                "verify_peer_name" => false,
            ),
        );
        $url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=allimages'
             . '&aifrom=' . urlencode($search) . '&ailimit=' . (int) $limit;
        $context = stream_context_create($streamContext);
        if (FALSE === ($content = @file_get_contents($url, false, $context))) {
            return false;
        }
        $data = json_decode($content, true);
        $ret = array();
        foreach ($data['query']['allimages'] as $img) {
            $ret[] = $img['url'];
        }
        return $ret;
    }
    $search = ucwords($_GET['search']);
    $images = get_wiki_image($search, 500);
    // get_wiki_image() returns false on failure, so fall back to an empty list
    foreach ($images ?: array() as $img) {
        echo "<img src='" . htmlspecialchars($img, ENT_QUOTES) . "' height='50' width='50'>";
    }
}
?>

You can use the PageImages API for this purpose. It generally returns the first image in an article, but depending on Wikipedia's configuration it may return a different image in some cases.
For example, to get the image of the "Barcelona" article, call https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&titles=Barcelona&piprop=original.
If you need the image at a specific size, call https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&titles=Barcelona&pithumbsize=250 instead.
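Here is a minimal sketch of how those two requests might be built and read from PHP. It assumes the default JSON response shape, in which `query.pages` is keyed by page ID and the image URL sits under `original.source` or `thumbnail.source`. The function names `pageimages_url` and `pageimages_source` are hypothetical helpers, not part of any library.

```php
<?php
// Hypothetical helper: build a PageImages request for one article title.
// A positive $thumbSize asks for a scaled thumbnail; otherwise the original.
function pageimages_url(string $title, int $thumbSize = 0): string
{
    $params = [
        'action' => 'query',
        'format' => 'json',
        'prop'   => 'pageimages',
        'titles' => $title,
    ];
    if ($thumbSize > 0) {
        $params['pithumbsize'] = $thumbSize; // scaled thumbnail
    } else {
        $params['piprop'] = 'original';      // full-size original
    }
    // http_build_query() also URL-encodes titles with spaces or accents
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params);
}

// Hypothetical helper: pull the image URL out of the decoded JSON.
// Assumes query.pages is keyed by page ID (the default format).
function pageimages_source(array $data): ?string
{
    foreach ($data['query']['pages'] ?? [] as $page) {
        if (isset($page['original']['source'])) {
            return $page['original']['source'];
        }
        if (isset($page['thumbnail']['source'])) {
            return $page['thumbnail']['source'];
        }
    }
    return null; // the article has no page image
}
```

A call might look like `$src = pageimages_source(json_decode(file_get_contents(pageimages_url('Barcelona', 250)), true) ?? []);`. Keeping the parsing separate from the network call lets you test it against a saved response.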

Related

Fetching image URLs from Wikipedia

I'm using the Wikipedia API to scrape images. It returns data in JSON form, in which an image URL looks like this:
"https://upload.wikimedia.org/wikipedia/en/f/f7/Canada%27s_Aviation_Hall_of_Fame_logo.jpg"
The part of the URL that is the same for all images is "https://upload.wikimedia.org/wikipedia/en/".
The PHP code is as follows:
<form action="" method="get">
<input type="text" name="search">
<input type="submit" value="Search">
</form>
<?php
if(@$_GET['search']){
$api_url="https://en.wikipedia.org/w/api.php?action=query&format=json&list=allimages&aifrom=".ucwords($_GET['search'])."&ailimit=500";
$api_url=str_replace(' ', '%20', $api_url);
$curl=curl_init();
curl_setopt($curl, CURLOPT_URL, $api_url);
curl_setopt($curl,CURLOPT_RETURNTRANSFER, true);
$output=curl_exec($curl);
curl_close($curl);
preg_match_all('!//upload.wikimedia.org/wikipedia/en/!', $output, $data);
echo '<pre>';
foreach ($data[0] as $list) {
echo "<img src='$list'/>";
# code...
}
}
?>
How can I get the rest of each URL correctly?

You need to decode the response with json_decode and read each image's url field instead of matching the raw text with a regex:
function get_wiki_image($search, $limit) {
    // Disables TLS certificate checks; avoid this outside local testing.
    $streamContext = array(
        "ssl" => array(
            "verify_peer" => false,
            "verify_peer_name" => false,
        ),
    );
    $url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=allimages'
         . '&aifrom=' . urlencode($search) . '&ailimit=' . (int) $limit;
    $context = stream_context_create($streamContext);
    if (FALSE === ($content = @file_get_contents($url, false, $context))) {
        return false;
    }
    // Decode the JSON and collect each image's full "url" field
    $data = json_decode($content, true);
    $ret = array();
    foreach ($data['query']['allimages'] as $img) {
        $ret[] = $img['url'];
    }
    return $ret;
}
$search = ucwords($_GET['search']);
$images = get_wiki_image($search, 500);
foreach ($images ?: array() as $img) {
    echo "<img src='" . htmlspecialchars($img, ENT_QUOTES) . "'>";
}
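To see why decoding works where the regex did not, here is a small sketch that parses a `list=allimages` response offline. Each entry in `query.allimages` carries its complete `url`, so nothing has to be reassembled from a common prefix. The helper names are hypothetical. The sketch assumes the `url` property is requested, which `aiprop=url` does explicitly.

```php
<?php
// Hypothetical helper: return the full image URLs from an allimages response.
function allimages_urls(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['query']['allimages'])) {
        return []; // error response or unexpected shape
    }
    $urls = [];
    foreach ($data['query']['allimages'] as $img) {
        if (isset($img['url'])) {
            $urls[] = $img['url'];
        }
    }
    return $urls;
}

// Hypothetical helper: build the request; urlencode() keeps spaces intact.
function allimages_request(string $from, int $limit): string
{
    return 'https://en.wikipedia.org/w/api.php?action=query&format=json'
        . '&list=allimages&aiprop=url&aifrom=' . urlencode($from)
        . '&ailimit=' . $limit;
}
```
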

Sorting PHP by date modified results from glob

I need to go into a directory (named 'People'), get the names of its folders, and then build some HTML with links using the contents of those directories.
For a folder named article-number-one, I display a title, link, thumbnail, and excerpt based on the folder name.
Here is my code. It works except for the ordering, which is alphabetical. I want it ordered by date created, newest on top:
<?php
$files = glob("people/*");
foreach($files as $file)
{
echo '<div class="block"><img src="people/'.basename($file).'/'.basename($file).'-thumbnail.jpg" height="180" width="320" alt="'.basename($file).'"/><br/><br/>';
echo ''.str_replace('-', ' ', basename($file, "")).'<br/><div class="excerpt">';
include 'people/'.basename($file).'/'.basename($file).'-excerpt.txt';
echo '</div><hr></div>';
}
?>
Please help me to order the resulting HTML newest to oldest.

Do this in several steps, so you can see how it proceeds and test each step separately.
// Get an array with all useful file info
$fileInfo = array_map(
function($filename) {
$short = basename($filename);
// Excerpt
$excerpt = $filename . DIRECTORY_SEPARATOR . $short . '-excerpt.txt';
if (is_readable($excerpt)) {
$text = trim(file_get_contents($excerpt));
} else {
$text = 'Excerpt not available';
}
return [
'full' => $filename,
'base' => $short,
'thumb' => $filename . DIRECTORY_SEPARATOR . $short . '.jpg',
'name' => ucwords(str_replace('-', ' ', $short)),
'date' => filectime($filename),
'text' => $text,
];
},
glob("people/*", GLOB_ONLYDIR)
);
To sort the array:
// Now sort the array. Have a function decide when a file is before another
usort($fileInfo, function($info1, $info2) {
if ($info1['date'] < $info2['date']) {
return 1;
}
if ($info1['date'] > $info2['date']) {
return -1;
}
return 0;
});
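On PHP 7 or later, the same newest-first comparison can be written in one line with the spaceship operator `<=>`. Comparing `$b` to `$a` rather than `$a` to `$b` reverses the order. Note that on most Unix filesystems `filectime()` returns the inode change time rather than a true creation time. `filemtime()` gives the modification time, if that is what you want instead. `sort_newest_first` is a hypothetical wrapper around the `$fileInfo` entries built above.

```php
<?php
// Sketch: newest-first ordering with the spaceship operator (PHP 7+).
function sort_newest_first(array $fileInfo): array
{
    usort($fileInfo, function (array $a, array $b): int {
        // $b before $a => descending by timestamp
        return $b['date'] <=> $a['date'];
    });
    return $fileInfo;
}

$example = [
    ['base' => 'old-post', 'date' => 1000],
    ['base' => 'new-post', 'date' => 3000],
    ['base' => 'mid-post', 'date' => 2000],
];
print_r(array_column(sort_newest_first($example), 'base')); // new-post, mid-post, old-post
```
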
Finally, build the HTML in a variable. As a rule, keep echo/print calls to a minimum, because mixing PHP logic and output can become a maintenance nightmare.
// Get the HTML. You could put this in a .html file of its own
// such as 'template.html' and read it with
// $template = trim(file_get_contents('template.html'));
$template = <<<INFO2HTML
<div class="block">
<a href="{base}"><img
src="{thumb}"
height="180" width="320"
alt="{base}"/></a>
<br />
<br />
{name}
<br />
<div class="excerpt">{text}</div>
<hr />
</div>
INFO2HTML;
// Repeat the parsing of the template using each entry
// to populate the variables.
$html = implode('', array_map(
function($info) use($template) {
return preg_replace_callback(
'#{(.*?)}#',
function($keys) use ($info) {
return $info[$keys[1]];
},
$template
);
},
$fileInfo
));
print $html;
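As a swapped-in alternative to the regex callback, `strtr()` with an array of `'{key}' => value` pairs fills every placeholder in a single pass. It also leaves unknown placeholders untouched rather than hitting an undefined index. The HTML escaping is an addition that the answer above does not do. Drop it for fields, such as the excerpt, that are meant to contain raw HTML. `render_template` is a hypothetical name.

```php
<?php
// Sketch: fill {placeholders} with strtr() instead of preg_replace_callback().
function render_template(string $template, array $info): string
{
    $pairs = [];
    foreach ($info as $key => $value) {
        // Escape values so names like "Lewis & Clark" stay valid HTML
        $pairs['{' . $key . '}'] = htmlspecialchars((string) $value, ENT_QUOTES);
    }
    return strtr($template, $pairs);
}

$template = '<div class="block"><a href="{base}">{name}</a> {missing}</div>';
echo render_template($template, ['base' => 'joe-smith', 'name' => 'Joe & Smith']), PHP_EOL;
```

With the answer's data this becomes `$html = implode('', array_map(function ($info) use ($template) { return render_template($template, $info); }, $fileInfo));`.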
Testing
$ mkdir people
$ mkdir people/joe-smith
$ echo "this is the excerpt" > people/joe-smith/joe-smith-excerpt.txt
$ touch people/joe-smith/joe-smith-thumbnail.jpg
$ ls -R people
people:
joe-smith
people/joe-smith:
joe-smith-excerpt.txt
joe-smith-thumbnail.jpg
Sample output:
<div class="block">
<a href="joe-smith"><img
src="people/joe-smith/joe-smith.jpg"
height="180" width="320"
alt="joe-smith"/></a>
<br />
<br />
Joe Smith
<br />
<div class="excerpt">this is the excerpt</div>
<hr /> </div>

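Alternatively, sort the glob() result directly by timestamp, newest first:
usort($files, function ($a, $b) {
    // $files holds paths from glob(), so read the timestamps here; $b - $a => newest first
    return filemtime($b) - filemtime($a);
});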

list of urls in associative array

My string looks like this:
http://localhost/layerthemes/wp-content/uploads/2014/05/46430454_Subscription_XXL-4_mini.jpghttp://localhost/layerthemes/wp-content/uploads/2014/05/Eddy-Need-Remix-mp3-image.jpghttp://localhost/layerthemes/wp-content/uploads/2013/03/static-pages.png
How do I extract each URL into an array like this:
array(
0 => 'http://localhost/layerthemes/wp-content/uploads/2014/05/46430454_Subscription_XXL-4_mini.jpg',
1 => 'http://localhost/layerthemes/wp-content/uploads/2014/05/Eddy-Need-Remix-mp3-image.jpg',
2 => 'http://localhost/layerthemes/wp-content/uploads/2013/03/static-pages.png'
)
This is what I tried, to no avail:
$imgss = 'http://localhost/layerthemes/wp-content/uploads/2014/05/46430454_Subscription_XXL-4_mini.jpghttp://localhost/layerthemes/wp-content/uploads/2014/05/Eddy-Need-Remix-mp3-image.jpghttp://localhost/layerthemes/wp-content/uploads/2013/03/static-pages.png';
preg_match_all(
"#((?:[\w-]+://?|[\w\d]+[.])[^\s()<>]+[.](?:\([\w\d]+\)|(?:[^`!()\[\]{};:'\".,<>?«»“”‘’\s]|(?:[:]\d+)?/?)+))#",
$imgss
);
foreach($imgss as $imgs){
echo '<img src="'.$imgs.'" />';
}
Any help would be appreciated; needless to say, I am very weak in PHP.
Thanks

If the string contains no spaces, you can insert a space before each "http" and then explode on spaces:
$string = 'http://localhost/layerthemes/wp-content/uploads/2014/05/46430454_Subscription_XXL-4_mini.jpghttp://localhost/layerthemes/wp-content/uploads/2014/05/Eddy-Need-Remix-mp3-image.jpghttp://localhost/layerthemes/wp-content/uploads/2013/03/static-pages.png';
$string = str_replace( 'http', ' http', $string );
$array = array_values( array_filter( explode( ' ', $string ) ) ); // array_values() re-indexes from 0
print_r( $array );
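A regex variant of the same idea splits the string just before every `http://` or `https://` using a zero-width lookahead. This way no characters are consumed and no temporary spaces are needed. `split_urls` is a hypothetical helper. One caveat: a URL that itself contains `http://`, for example in a query string, would be split as well.

```php
<?php
// Sketch: split before each "http://" or "https://" with a lookahead.
// PREG_SPLIT_NO_EMPTY drops the empty piece before the first URL.
function split_urls(string $s): array
{
    return preg_split('~(?=https?://)~', $s, -1, PREG_SPLIT_NO_EMPTY);
}

$imgss = 'http://localhost/a.jpghttp://localhost/b.jpghttps://example.com/c.png';
foreach (split_urls($imgss) as $url) {
    echo '<img src="' . htmlspecialchars($url) . '" />', PHP_EOL;
}
```
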

Exploding is fine, but you should probably also validate the submitted links. I've put together the code below. It tells the user that links must be on separate lines or separated by spaces, validates each link, and builds a new array of valid links that you can then work with.
<?php
if ($_SERVER['REQUEST_METHOD'] == 'POST' && !empty($_POST['links'])) {
    $error = array();
    $valid_links = array();
    // Replace \r\n, \n, \r and spaces with a "," delimiter
    // ("\r\n" must come first, or a stray "\r" is left on each link)
    $links = str_replace(array("\r\n", "\n", "\r", " "), ',', $_POST['links']);
    // Explode on "," and drop empty entries (e.g. from a trailing newline)
    $links = array_filter(explode(',', $links));
    // Validate each link
    foreach ($links as $link) {
        // Does the link contain more than one http://?
        if (substr_count($link, 'http://') > 1) {
            $error[] = 'Add each URL on a new line or separate them with a space.';
        } elseif (!filter_var($link, FILTER_VALIDATE_URL)) {
            // The link failed validation
            $error[] = 'Invalid URL, skipping: ' . htmlentities($link);
        } else {
            // Does the link use http or https?
            $scheme = parse_url($link, PHP_URL_SCHEME);
            if ($scheme == 'http' || $scheme == 'https') {
                // All good, add it to the valid links array
                $valid_links[] = $link;
            } else {
                $error[] = 'Invalid URL, skipping: ' . htmlentities($link);
            }
        }
    }
    // Show what's wrong
    if (!empty($error)) {
        echo '<pre>' . print_r($error, true) . '</pre>';
    }
    // Your valid links: do something with them
    if (!empty($valid_links)) {
        echo '<pre>' . print_r($valid_links, true) . '</pre>';
    }
}?>
<form method="POST" action="">
<textarea rows="2" name="links" cols="50"><?php echo (isset($_POST['links']) ? htmlentities($_POST['links']) : null);?></textarea><input type="submit" value="Submit">
</form>
Perhaps it will help.

How about:
$input = "http://localhost/layerthemes/wp-content/uploads/2014/05/46430454_Subscription_XXL-4_mini.jpghttp://localhost/layerthemes/wp-content/uploads/2014/05/Eddy-Need-Remix-mp3-image.jpghttp://localhost/layerthemes/wp-content/uploads/2013/03/static-pages.png";
$exploded = explode("http://", $input);
$result = array();
for ($i = 1; $i < count($exploded); ++$i)
{
$result[$i - 1] = "http://" . $exploded[$i];
}

If you control the entire process, you can send the URLs as a JSON array instead of one concatenated string. Here's an example.
Your form:
<form id="myform" method="POST">
</form>
Your javascript (using jquery):
<script>
var myurls = getUrls();
$('<input>').attr({
type: 'hidden',
name: 'myurls',
value: JSON.stringify(myurls),
}).appendTo('#myform');
// gathers your URLs (however you do this) and returns them as a javascript array
function getUrls() {
// just return this as a placeholder/example
return ["http://localhost/layerthemes/wp-content/uploads/2014/05/46430454_Subscription_XXL-4_mini.jpg", "http://localhost/layerthemes/wp-content/uploads/2014/05/Eddy-Need-Remix-mp3-image.jpg", "http://localhost/layerthemes/wp-content/uploads/2013/03/static-pages.png"];
}
</script>
Your PHP:
$myurls = json_decode($_POST['myurls']);
var_dump($myurls); // should be the array you sent
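Because the posted value comes from the client, it may be worth checking what `json_decode()` returns before using it. This sketch, with a hypothetical `decode_url_list` helper, keeps only strings that pass `FILTER_VALIDATE_URL` and use an http or https scheme. Malformed JSON yields an empty list.

```php
<?php
// Sketch: decode the posted JSON and keep only well-formed http(s) URLs.
function decode_url_list(string $json): array
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded)) {
        return []; // not JSON, or not a JSON array/object
    }
    $urls = [];
    foreach ($decoded as $item) {
        if (!is_string($item) || filter_var($item, FILTER_VALIDATE_URL) === false) {
            continue; // skip non-strings and malformed URLs
        }
        $scheme = parse_url($item, PHP_URL_SCHEME);
        if ($scheme === 'http' || $scheme === 'https') {
            $urls[] = $item;
        }
    }
    return $urls;
}

// In the form handler: $myurls = decode_url_list($_POST['myurls'] ?? '');
```
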
You could also do this with AJAX, or make the form submit automatically.

xpath replace [en-media] elements with [img]

I need to do a find-and-replace in an Evernote XML file. It contains multiple entries like this:
<en-media alt="Evernote Logo" hash="4914ced8925f9adcc1c58ab87813c81f" type="image/png"></en-media>
<en-media alt="Evernote Logo" hash="4914dsd8925f9adcc1c58ab87813c81f" type="image/png"></en-media>
which I want to replace with
<img src="https://sandbox.evernote.com/shard/s1/res/143c8ad0-da92-4271-8410-651b88e8a2f1" height="something" width="something"/>
<img src="https://sandbox.evernote.com/shard/s1/res/143c8233-da92-4271-8410-651b88e8a2f1" height="something" width="something"/>
I'm doing this because Evernote's SDK admittedly lacks an easy way to show both textual and MIME data (images, PDFs, etc.) with a single command (or a few simple ones).
Here is my attempt at the "find"; now I need the "replace":
$content=html_entity_decode($content); //content contains entities such as &mdash;, which make SimpleXML complain
$content=str_replace('&',htmlentities('&'),$content); //encode & -- Lewis & Clark
$x = new SimpleXMLElement($content);
$medias = $x->xpath('//en-media'); //find
[?????] MYCODE_getLink(); //replace
$content=$x->asXML(); //output
$content=htmlentities($content); //put entities back

Below is my solution. It only appends, since I couldn't figure out the replace, and I still need to remove certain XML tags for the final HTML presentation anyway. It works, so I'm posting it here in case it helps you:
$f = new NoteFilter();
$f->notebookGuid="e0a42e90-0297-442f-8157-44a596e5b8b5"; //default
//$f->notebookGuid="b733f6ab-e3b7-443a-8f5a-2bbe77ea1c1e"; //MyStuff
$n = $noteStore->findNotes($authToken, $f, 0, 100); // Fetch up to 100 notes
$total=$n->totalNotes;
if (!empty($n->notes)) {
foreach ($n->notes as $note) {
$fullNote = $noteStore->getNote($authToken, $note->guid, true, false, false, false);
$content = $fullNote->content;
$dom = new DOMDocument;
$dom->loadXml($content);
//list all <en-media>
$medias = $dom->getElementsByTagName('en-media');
foreach ($medias as $media) {
$hash = $media->getAttribute('hash');
$hash = hashXmlToBin($hash); //xml to bin for ByHash method below
$resource=$noteStore->getResourceByHash($authToken, $note->guid,$hash,0,0,0,0);
//get url
$url=resourceUrl($authToken,$resource);
//if image, show inline
$inline=array('image/png','image/jpeg','image/jpg','image/gif');
if (in_array($resource->mime,$inline)) {
$img=$dom->createElement('img');
$img->setAttribute('src', $url);
$img->setAttribute('width', $resource->width);
$img->setAttribute('height', $resource->height);
}else { //show link
$rewrite=array('application/pdf'=>'PDF');
$mime=str_replace('application/','',$resource->mime);
$filename=$resource->attributes->fileName;
$img=$dom->createElement('a',"Download {$filename} ({$mime})");
$img->setAttribute('href', $url);
$img->setAttribute('class', "download-attachement");
}
// append to DOM
$media->appendChild($img);
}//foreach medias
$content=$dom->saveXML();
$out[]=$content;
}//foreach notes
foreach ($out as $val)
print "<hr/>".$val; //each note
}//notes exist
/*
* http://discussion.evernote.com/topic/4521-en-media-hash/
*/
function hashXmlToBin($hash) {
    // The en-media hash attribute is hex text; hex2bin() converts it
    // back into the raw bytes that getResourceByHash() expects.
    return hex2bin($hash);
}
/*
* return a resource url
*/
function resourceUrl($authToken, $resource, $resize = FALSE, $thumbnailSize = 150) {
//build URL
if (!$resize)
$url=EVERNOTE_SERVER."/shard/".$_SESSION['shard']."/res/".$resource->guid; //originals
else
$url=EVERNOTE_SERVER."/shard/".$_SESSION['shard']."/thm/res/".$resource->guid."?size={$thumbnailSize}"; //thumbnail
return $url;
}
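For the missing "replace" step, DOM offers `replaceChild()` on the parent node. Here is a minimal sketch, assuming the note content parses as XML. `replace_en_media` is hypothetical, and `$urlForHash` is a hypothetical callback standing in for the Evernote lookup (`getResourceByHash()` plus `resourceUrl()` in the code above). One detail matters: `getElementsByTagName()` returns a live list, so it is copied to an array first. Otherwise removing nodes while iterating would skip elements.

```php
<?php
// Sketch: swap each <en-media> for an <img> instead of appending inside it.
function replace_en_media(string $xml, callable $urlForHash): string
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    // Copy the live node list before modifying the tree
    $medias = iterator_to_array($dom->getElementsByTagName('en-media'));
    foreach ($medias as $media) {
        $img = $dom->createElement('img');
        $img->setAttribute('src', $urlForHash($media->getAttribute('hash')));
        $img->setAttribute('alt', $media->getAttribute('alt'));
        // Put the <img> where the <en-media> was
        $media->parentNode->replaceChild($img, $media);
    }
    return $dom->saveXML($dom->documentElement);
}
```

In the loop above, the callback could branch on the resource MIME type and return a link for PDFs, as the posted solution does.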

Need help with preg_replace interpreting {variables} with parameters

I want to replace
{youtube}Video_ID_Here{/youtube}
with the embed code for a youtube video.
So far I have
preg_replace('/{youtube}(.*){\/youtube}/iU',...)
and it works just fine.
But now I'd like to be able to interpret parameters such as height and width. Can I use a single regex that works whether or not the tag has parameters? It should be able to interpret all of these:
{youtube height="200px" width="150px" color1="#eee" color2="rgba(0,0,0,0.5)"}Video_ID_Here{/youtube}
{youtube height="200px"}Video_ID_Here{/youtube}
{youtube}Video_ID_Here{/youtube}
{youtube width="150px" showborder="1"}Video_ID_Here{/youtube}

Try this:
function createEmbed($videoID, $params)
{
// $videoID contains the videoID between {youtube}...{/youtube}
// $params is an array of key value pairs such as height => 200px
return 'HTML...'; // embed code
}
if (preg_match_all('/\{youtube(.*?)\}(.+?)\{\/youtube\}/', $string, $matches)) {
foreach ($matches[0] as $index => $youtubeTag) {
$params = array();
// break out the attributes
if (preg_match_all('/\s([a-z0-9]+)="([^"]*)"/', $matches[1][$index], $rawParams)) {
for ($x = 0; $x < count($rawParams[0]); $x++) {
$params[$rawParams[1][$x]] = $rawParams[2][$x];
}
}
// replace {youtube}...{/youtube} with embed code
$string = str_replace($youtubeTag, createEmbed($matches[2][$index], $params), $string);
}
}
This code first matches the {youtube}...{/youtube} tags, then splits the attributes out into an array and passes them (as key/value pairs), along with the video ID, to a function. Just fill in the function definition so that it validates the parameters you want to support and builds the appropriate HTML.
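One possible body for `createEmbed()` is sketched below. It whitelists the supported parameters and emits only values it has checked. The assumptions are that only width and height are supported and that the player is embedded as an iframe pointing at `https://www.youtube.com/embed/<ID>`. Check the markup against YouTube's current embed documentation before relying on it.

```php
<?php
// Sketch of createEmbed(): validate the ID, whitelist params, escape output.
function createEmbed($videoID, $params)
{
    // YouTube IDs use letters, digits, "-" and "_"; reject anything else
    if (!preg_match('/^[A-Za-z0-9_-]+$/', $videoID)) {
        return '';
    }
    $attrs = '';
    foreach (array('width', 'height') as $name) {
        // Accept "150" or "150px" and emit only the number
        if (isset($params[$name]) && preg_match('/^(\d+)(px)?$/', $params[$name], $m)) {
            $attrs .= ' ' . $name . '="' . $m[1] . '"';
        }
    }
    $src = 'https://www.youtube.com/embed/' . $videoID;
    return '<iframe src="' . htmlspecialchars($src, ENT_QUOTES) . '"' . $attrs . ' allowfullscreen></iframe>';
}
```

Unknown keys such as `onload` are simply ignored, which is the point of whitelisting rather than copying attributes through.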

You probably want to use preg_replace_callback, as the replacing can get quite convoluted otherwise.
preg_replace_callback('/{youtube(.*)}(.*){\/youtube}/iU',...)
In your callback, check $match[1] against a pattern such as /(width|showborder|height|color1)="([^"]+)"/i. A simple preg_match_all inside a preg_replace_callback keeps every part tidy and, above all, legible.
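A minimal sketch of that callback approach follows. One pattern finds each tag, and a `preg_match_all()` inside the callback reads the attributes. For brevity only width and height are turned into attributes. The iframe markup and the function name `replace_youtube_tags` are assumptions rather than something the answer specifies.

```php
<?php
// Sketch: preg_replace_callback() with attribute parsing in the callback.
function replace_youtube_tags(string $content): string
{
    return preg_replace_callback(
        '/\{youtube([^}]*)\}(.*?)\{\/youtube\}/i',
        function (array $match): string {
            // $match[1] holds the raw attribute text, $match[2] the video ID
            $attrs = '';
            if (preg_match_all('/(width|height)="([^"]*)"/i', $match[1], $pairs, PREG_SET_ORDER)) {
                foreach ($pairs as $pair) {
                    $value = (int) $pair[2]; // "200px" -> 200; junk -> 0
                    if ($value > 0) {
                        $attrs .= ' ' . strtolower($pair[1]) . '="' . $value . '"';
                    }
                }
            }
            $id = htmlspecialchars(trim($match[2]), ENT_QUOTES);
            return '<iframe src="https://www.youtube.com/embed/' . $id . '"' . $attrs . '></iframe>';
        },
        $content
    );
}

echo replace_youtube_tags('Intro {youtube height="200px"}Video_ID_Here{/youtube} outro'), PHP_EOL;
```
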

I would do it something like this:
preg_match_all("/{youtube(.*?)}(.*?){\/youtube}/is", $content, $matches);
for($i=0;$i<count($matches[0]);$i++)
{
$params = $matches[1][$i];
$youtubeurl = $matches[2][$i];
$paramsout = array();
if(preg_match("/height\s*=\s*('|\")([0-9]+px)('|\")/i", $params, $match))
{
$paramsout[] = "height=\"{$match[2]}\"";
}
//process others
//setup new code
$tagcode = "<object ..." . implode(" ", $paramsout) ."... >"; //I don't know what the code is to display a youtube video
//replace original tag
$content = str_replace($matches[0][$i], $tagcode, $content);
}
You could simply accept any parameters between "{youtube" and "}", but that opens you up to XSS problems. The better approach is to look for a specific set of parameters and validate each one. Don't allow characters like < and > inside your tags, or someone could inject something like do_something_nasty();.

I wouldn't use regular expressions at all, since they are notoriously bad at parsing markup.
Since your input format is so close to HTML/XML in the first place, I'd rely on an XML parser instead:
$tests = array(
'{youtube height="200px" width="150px" color1="#eee" color2="rgba(0,0,0,0.5)"}Video_ID_Here{/youtube}'
, '{youtube height="200px"}Video_ID_Here{/youtube}'
, '{youtube}Video_ID_Here{/youtube}'
, '{youtube width="150px" showborder="1"}Video_ID_Here{/youtube}'
, '{YOUTUBE width="150px" showborder="1"}Video_ID_Here{/youtube}' // deliberately invalid
);
echo '<pre>';
foreach ( $tests as $test )
{
try {
$youtube = SimpleXMLYoutubeElement::fromUserInput( $test );
print_r( $youtube );
}
catch ( Exception $e )
{
echo $e->getMessage() . PHP_EOL;
}
}
echo '</pre>';
class SimpleXMLYoutubeElement extends SimpleXMLElement
{
public static function fromUserInput( $code )
{
$xml = @simplexml_load_string(
str_replace( array( '{', '}' ), array( '<', '>' ), strip_tags( $code ) ), __CLASS__
);
if ( !$xml || 'youtube' != $xml->getName() )
{
throw new Exception( 'Invalid youtube element' );
}
return $xml;
}
public function toEmbedCode()
{
// write code to convert this to proper embed code
}
}
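What `toEmbedCode()` might do is sketched below as a standalone function, so it runs on its own. It uses the same brace-to-angle-bracket trick, parses with SimpleXML, and reads attributes through array access (`$xml['width']`). The name `youtube_tag_to_embed`, the width/height whitelist, and the iframe URL format are all assumptions.

```php
<?php
// Sketch: parse {youtube ...}ID{/youtube} as XML, then build embed markup.
function youtube_tag_to_embed(string $code): ?string
{
    $xml = @simplexml_load_string(
        str_replace(array('{', '}'), array('<', '>'), strip_tags($code))
    );
    if ($xml === false || $xml->getName() !== 'youtube') {
        return null; // not a well-formed {youtube}...{/youtube} element
    }
    $id = trim((string) $xml); // element text is the video ID
    if (!preg_match('/^[A-Za-z0-9_-]+$/', $id)) {
        return null;
    }
    $attrs = '';
    foreach (array('width', 'height') as $name) {
        // Missing attribute -> null -> 0; "150px" -> 150
        $value = (int) (string) $xml[$name];
        if ($value > 0) {
            $attrs .= ' ' . $name . '="' . $value . '"';
        }
    }
    return '<iframe src="https://www.youtube.com/embed/' . $id . '"' . $attrs . '></iframe>';
}
```

Inside the class, the same body could operate on `$this` instead of re-parsing the input.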

We Keep Coding
