Fetching Images URL from wikipedia - php

I'm using the Wikipedia API to fetch images. It returns JSON in which each image URL looks like this:
"https://upload.wikimedia.org/wikipedia/en/f/f7/Canada%27s_Aviation_Hall_of_Fame_logo.jpg"
The part of the URL that is the same for all images is "https://upload.wikimedia.org/wikipedia/en/".
The PHP code is as follows:
<form action="" method="get">
<input type="text" name="search">
<input type="submit" value="Search">
</form>
<?php
if(@$_GET['search']){
$api_url="https://en.wikipedia.org/w/api.php?action=query&format=json&list=allimages&aifrom=".ucwords($_GET['search'])."&ailimit=500";
$api_url=str_replace(' ', '%20', $api_url);
$curl=curl_init();
curl_setopt($curl, CURLOPT_URL, $api_url);
curl_setopt($curl,CURLOPT_RETURNTRANSFER, true);
$output=curl_exec($curl);
curl_close($curl);
preg_match_all('!//upload.wikimedia.org/wikipedia/en/!', $output, $data);
echo '<pre>';
foreach ($data[0] as $list) {
echo "<img src='$list'/>";
# code...
}
}
?>
How can I get the remaining part of the url correctly?

Decode the response with json_decode and read each image's url field, rather than matching the raw JSON with a regular expression:
function get_wiki_image( $search, $limit) {
$streamContext = array(
"ssl" => array(
"verify_peer" => false,
"verify_peer_name" => false,
),
);
// Build the query URL; the search term is URL-encoded so spaces and symbols survive
$url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=allimages';
$url .= '&aifrom=' . urlencode($search) . '&ailimit=' . (int) $limit;
$context = stream_context_create($streamContext);
if(FALSE === ($content = @file_get_contents($url, false,$context)) ) {
return false;
} else {
$data = json_decode($content,true);
$ret = array();
foreach($data['query']['allimages'] as $img) {
$ret[] = $img['url'];
}
return $ret;
}
}
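$search = ucwords($_GET['search'] ?? '');
$images = get_wiki_image($search,500);
if ($images === false) {
$images = array(); // the request failed, so there is nothing to show
}
foreach($images as $img) {
echo "<img src='" . htmlspecialchars($img, ENT_QUOTES) . "'>";
}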
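To see why decoding works where the regular expression did not, here is a minimal offline sketch. The sample payload is made up for illustration and only mimics the shape that the answer's code relies on (query, then allimages, then url); `extract_image_urls` is a hypothetical helper name, not part of any API.

```php
<?php
// Hypothetical helper: pull every "url" field out of an allimages-style response.
function extract_image_urls(string $json): array
{
    $data = json_decode($json, true);
    // json_decode() returns null on invalid JSON; treat that as "no images".
    if (!is_array($data) || !isset($data['query']['allimages'])) {
        return [];
    }
    return array_column($data['query']['allimages'], 'url');
}

// Made-up sample that mimics the response shape; not real API output.
$sample = '{"query":{"allimages":[
    {"name":"A.jpg","url":"https://upload.wikimedia.org/wikipedia/en/a/aa/A.jpg"},
    {"name":"B.jpg","url":"https://upload.wikimedia.org/wikipedia/en/b/bb/B.jpg"}
]}}';

$urls = extract_image_urls($sample);
foreach ($urls as $url) {
    echo htmlspecialchars($url, ENT_QUOTES), "\n";
}
```

The full URL arrives as one field, so there is no "remaining part" to reassemble.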

Related

Fetching Images of places from wikipedia

I'm using the following code to fetch images from the Wikipedia API, but at the moment it gives me random images for the keyword. For example, if I search "spain" I get random images whose names contain the word "spain", but I need images of places in Spain, like the ones shown on Wikipedia.
Can anyone help me with that?
<form action="" method="get">
<input type="text" name="search">
<input type="submit" value="Search">
</form>
<?php
if(@$_GET['search']){
function get_wiki_image( $search, $limit) {
$streamContext = array(
"ssl" => array(
"verify_peer" => false,
"verify_peer_name" => false,
),
);
$url = 'https://en.wikipedia.org/w/api.php?action=query&format=json&list=allimages';
$url .= '&aifrom=' . urlencode($search) . '&ailimit=' . (int) $limit;
$context = stream_context_create($streamContext);
if(FALSE === ($content = @file_get_contents($url, false,$context)) ) {
return false;
} else {
$data = json_decode($content,true);
$ret = array();
foreach($data['query']['allimages'] as $img) {
$ret[] = $img['url'];
}
return $ret;
}
}
$search = ucwords($_GET['search']);
$images = get_wiki_image($search,500);
foreach($images as $img) {
echo "<img src='{$img}' height='50' width='50'>";
}
}
?>
You can use the PageImages API for this purpose. Generally it returns the first image in an article; however, depending on Wikipedia's configuration, it might return a different image in some cases.
To get for example the image of the "Barcelona" article, call https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&titles=Barcelona&piprop=original.
If you need the picture in a certain size, you can also call https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&titles=Barcelona&pithumbsize=250.
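The two calls above can be sketched in PHP as follows. This is a rough sketch, not a definitive implementation: `pageimage_api_url` and `pageimage_source` are hypothetical helper names, `format=json` is added so the response can be decoded, and the sketch assumes the response lists pages under query.pages with the image under "original" or "thumbnail".

```php
<?php
// Hypothetical helper: build a PageImages query URL for one article title.
function pageimage_api_url(string $title, ?int $thumbSize = null): string
{
    $params = [
        'action' => 'query',
        'format' => 'json',          // added so the reply can be json_decode()d
        'prop'   => 'pageimages',
        'titles' => $title,
    ];
    if ($thumbSize === null) {
        $params['piprop'] = 'original';        // full-size image
    } else {
        $params['pithumbsize'] = $thumbSize;   // thumbnail width in pixels
    }
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params, '', '&');
}

// Hypothetical helper: read the image URL from a decoded response.
// Assumes pages sit under query.pages with the image in "original" or "thumbnail".
function pageimage_source(array $data): ?string
{
    foreach ($data['query']['pages'] ?? [] as $page) {
        $image = $page['original'] ?? $page['thumbnail'] ?? null;
        if (isset($image['source'])) {
            return $image['source'];
        }
    }
    return null; // the article has no page image
}

echo pageimage_api_url('Barcelona', 250), "\n";
```

To use it, fetch the URL (for example with file_get_contents), pass the body through json_decode($body, true), and hand the resulting array to pageimage_source.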

PHP Write to text file full contents

First of all, this is my code:
<form method="POST">
<input name="link">
<button type="submit">></button>
</form>
<title>GET IMAGES</title>
<?php
if (!isset($_POST['link'])) exit();
$link = $_POST['link'];
echo '<div id="pin" style="float:center"><textarea class="text" cols="110" rows="50">';
function curl($link)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
$result = curl_exec($ch);
if(curl_errno($ch)){
return FALSE;
}
return $result;
}
$get=array();
//GET TITLE
$get[$i] = curl($link);
if (preg_match_all('/">(.*?)<\/a><\/h1>/', $get[$i], $matches))
foreach ($matches[1] as $title)
$data = "$title\n";
echo $data;
//GET IMAGES
for($i=1;$i<=100;$i++)
{
if ($i == 1) $url = $link;
else $url = "$link?p=$i";
$get[$i] = curl($url);
if (preg_match_all('/<img id="bigImg" src="(.*?)"/', $get[$i], $matches))
{
foreach ($matches[1] as $content) {
$content = str_replace("//img","http://img",$content);
$data = "<img src=\"".$content."\" />";
echo $data."\r\n";
}
}
if (!substr_count($get[$i], '下一页')) break;
}
file_put_contents("1.txt","$data",FILE_APPEND | LOCK_EX);
echo '</textarea>';
?>
When I submit a URL, the textarea shows results like this:
THIS IS A TITLE
<img src="https://img.example.com/1.jpg"/>
<img src="https://img.example.com/2.jpg"/>
<img src="https://img.example.com/3.jpg"/>
But when I write to a text file with file_put_contents and open it to check the result, I only get
<img src="https://img.example.com/3.jpg"/>
Any ideas?
Each assignment with $data = replaces the previous value, so only the last image is left when file_put_contents runs after the loop. Append with $data .= instead:
$data .= "<img src=\"".$content."\" />";
// ^ append
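As a self-contained illustration of the append approach, the following sketch builds the whole string first and writes it once. The title and image URLs are placeholders, and the temporary file stands in for 1.txt so the sketch is safe to run anywhere.

```php
<?php
// Placeholder values for illustration; in the real script these come from the scraped pages.
$title  = 'THIS IS A TITLE';
$images = [
    'https://img.example.com/1.jpg',
    'https://img.example.com/2.jpg',
    'https://img.example.com/3.jpg',
];

$data = $title . "\n";                 // initialise once, before the loop
foreach ($images as $src) {
    $data .= '<img src="' . $src . '"/>' . "\r\n";   // append, do not assign
}

// Temporary file used here instead of 1.txt.
$file = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($file, $data, LOCK_EX);

echo file_get_contents($file);
```

Because $data now holds every line, a single file_put_contents call after the loop is enough.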

Specific Array to CSV file

I have a problem with writing data from an array to a CSV file.
Array
[link1] => HTTP Code
[link2] => HTTP Code
[link3] => HTTP Code
[link4] => HTTP Code
I need to write the data to a CSV file so that no link is repeated.
Unfortunately, I don't know how to write the links one by one (I work in a foreach loop) while also checking that a link has not already been written.
This is my code:
require('simple/simple_html_dom.php');
$xml = simplexml_load_file('https://www.gutscheinpony.de/sitemap.xml');
$fp = fopen('Links2.csv', 'w');
set_time_limit(0);
$links=[];
foreach ($xml->url as $link_url)
{
$url = $link_url->loc;
$data=file_get_html($url);
$data = strip_tags($data,"<a>");
$d = preg_split("/<\/a>/",$data);
foreach ( $d as $k=>$u ){
if( strpos($u, "<a href=") !== FALSE ){
$u = preg_replace("/.*<a\s+href=\"/sm","",$u);
$u = preg_replace("/\".*/","",$u);
if ( strpos($u, "http") !== FALSE) {
$ch = curl_init($u);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if(strpos($u, "https://www.gutscheinpony.de/") !== FALSE )
$u = substr($u, 28);
if($u == "/")
$u = $url;
}
$links[$u] = $http_code;
$wynik = array( array($u, $url , $http_code));
foreach ($wynik as $fields) {
fputcsv($fp, $fields);
}
}
}
}
curl_close($ch);
fclose($fp);
echo 'Send to CSV file successfully completed ... ';
I need to get every link from the .xml, collect the links found on each of those pages, and record each one's HTTP status. That part is done. What I can't work out is the right way to write the data to a CSV file.
I'm counting on your help.
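The code below is essentially your code with a few modifications. It also looked as if "://" was not accepted in an array key. PHP string keys can in fact contain any characters, so the more likely cause is that $url is a SimpleXMLElement object rather than a string; str_replace() converts it to a string as a side effect.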
<?php
require __DIR__ . '/simple/simple_html_dom.php';
$xml = simplexml_load_file('https://www.gutscheinpony.de/sitemap.xml');
$fp = fopen(__DIR__ . '/Links2.csv', 'w');
set_time_limit(0);
$links = [];
$status = false;
foreach ($xml->url as $link_url){
$url = $link_url->loc;
$data = file_get_html($url);
$data = strip_tags($data,"<a>");
$d = preg_split("/<\/a>/",$data);
foreach ( $d as $k=>$u ){
$http_code = 404;
if( strpos($u, "<a href=") !== FALSE ){
$u = preg_replace("/.*<a\s+href=\"/sm","",$u);
$u = preg_replace("/\".*/","",$u);
if ( strpos($u, "http") !== FALSE) {
// JUST GET THE CODE ON EACH ITERATION,
// OPENING THE STREAM & CLOSING IT AGAIN ON EACH ITERATION...
$http_code = getHttpCodeStatus($u);
if(strpos($u, "https://www.gutscheinpony.de/") !== FALSE ){
$u = substr($u, 28);
}
if($u == "/") {
$u = $url;
}
// CAST TO STRING FIRST: $u MAY BE A SimpleXMLElement (COPIED FROM $url), WHICH CANNOT BE AN ARRAY KEY
$links[str_replace("://", "_", (string) $u)] = $http_code;
// RUN THE var_dump(), TO VIEW THE PROCESS AS IT PROGRESSES IF YOU WISH TO
var_dump($links);
$status = fputcsv($fp, array($u, $url , $http_code));
}
}
}
}
fclose($fp);
if($status) {
echo count($links) . ' entries were successfully processed and written to disk as a CSV File... ';
}else{
echo 'It seems like some entries were not successfully written to disk - at least the last entry... ';
}
function getHttpCodeStatus($u){
$ch = curl_init($u);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
return $http_code;
}
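The question also asks that no link be repeated in the CSV. One way to do that, sketched below under assumptions, is to keep an array of links already written and skip any link found in it. `write_unique_links` is a hypothetical helper, the rows are made-up sample data, and php://memory stands in for Links2.csv.

```php
<?php
// Hypothetical helper: write each link to the CSV only the first time it is seen.
// $rows is a list of [link, source page, HTTP code] triples.
function write_unique_links($fp, array $rows): int
{
    $seen = [];
    $written = 0;
    foreach ($rows as [$link, $page, $code]) {
        $key = (string) $link;          // force a string key, even for objects
        if (isset($seen[$key])) {
            continue;                   // already written, skip the repeat
        }
        $seen[$key] = true;
        if (fputcsv($fp, [$key, (string) $page, $code], ',', '"', '\\') !== false) {
            $written++;
        }
    }
    return $written;
}

// Made-up sample rows; the first and third share the same link.
$fp = fopen('php://memory', 'w+');
$count = write_unique_links($fp, [
    ['/gutscheine', 'https://www.example.com/', 200],
    ['/kontakt', 'https://www.example.com/', 404],
    ['/gutscheine', 'https://www.example.com/a', 200],
]);
rewind($fp);
$csv = stream_get_contents($fp);
fclose($fp);

echo $count, " unique links written\n", $csv;
```

Inside the original loops the same check would be a single isset() test on $links before calling fputcsv, which also avoids requesting the HTTP status of a link twice.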

Issues with decoding two JSON feed sources and display with PHP/HTML

I am using two JSON feed sources and PHP to display a real estate property slideshow with agents on a website. The code was working prior to the feed provider making changes to where they store property and agent images. I have made the necessary adjustments for the images, but the feed data is not working now. I have contacted the feed providers about the issue, but they say the problem is on my end. No changes beyond the image URLs were made, so I am unsure where the issue may be. I am new to JSON, so I might be missing something. I have included the full script below. Here are the two JSON feed URLs: http://century21.ca/FeaturedDataHandler.c?DataType=4&EntityType=2&EntityID=2119 and http://century21.ca/FeaturedDataHandler.c?DataType=3&AgentID=27830&RotationType=1. The first URL grabs all of the agents and the second grabs a single agent's properties. The AgentID value is sourced from the JSON feed URL dynamically.
class Core
{
private $base_url;
private $property_image_url;
private $agent_id;
private $request_agent_properties_url;
private $request_all_agents_url;
private function formatJSON($json)
{
$from = array('Props:', 'Success:', 'Address:', ',Price:', 'PicTicks:', ',Image:', 'Link:', 'MissingImage:', 'ShowingCount:', 'ShowcaseHD:', 'ListingStatusCode:', 'Bedrooms:', 'Bathrooms:', 'IsSold:', 'ShowSoldPrice:', 'SqFootage:', 'YearBuilt:', 'Style:', 'PriceTypeDesc:');
$to = array('"Props":', '"Success":', '"Address":', ',"Price":', '"PicTicks":', ',"Image":', '"Link":', '"MissingImage":', '"ShowingCount":', '"ShowcaseHD":', '"ListingStatusCode":', '"Bedrooms":', '"Bathrooms":', '"IsSold":', '"ShowSoldPrice":', '"SqFootage":', '"YearBuilt":', '"Style":', '"PriceTypeDesc":' );
return str_ireplace($from, $to, $json); //returns the clean JSON
}
function __construct($agent=false)
{
$this->base_url = 'http://www.century21.ca';
$this->property_image_url = 'http://images.century21.ca';
$this->agent_id = ($agent ? $agent : false);
$this->request_all_agents_url =
$this->base_url.'/FeaturedDataHandler.c?DataType=4&EntityType=3&EntityID=3454';
$this->request_agent_properties_url =
$this->base_url.'/FeaturedDataHandler.c?DataType=3'.'&AgentID='.$this->agent_id.'&RotationType=1';
}
/**
* getSlides()
*/
function getSlides()
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $this->request_all_agents_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$response = curl_exec($ch);
curl_close($ch);
if (empty($response))
return false;
else
$agents = $this->decode_json_string($response);
// Loop Agents And Look For Requested ID
foreach ($agents as $agent)
{
if (($this->agent_id != false) && (isset($agent['WTLUserID'])) && ($agent['WTLUserID'] != $this->agent_id))
{
continue; // You have specified a
}
$properties = $this->getProperties($agent['WTLUserID']);
$this->print_property_details($properties, $agent);
}
}
/**
* getProperties()
*/
function getProperties($agent_id)
{
$url = $this->base_url.'/FeaturedDataHandler.c?DataType=3'.'&AgentID='.$agent_id.'&RotationType=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$response = curl_exec($ch);
curl_close($ch);
$json = json_decode($response);
if (empty($response))
die('No response 2'); //return false;
else
$json = $this->formatJSON($this->decode_json_string($response));
var_dump($json);
die();
// return $json;
}
/**
* print_property_details()
*/
function print_property_details($properties, $agent, $html='')
{
$BASE_URL = $this->base_url;
$PROPERTY_IMAGE_URL = $this->property_image_url;
foreach ($properties as $property)
{
$img = $property['Image'];
// $img = ($property['Image'] ? $property['Image'] : "some url to a dummy image here")
if($property['ListingStatusCode'] != 'SOLD'){
$address = $property['Address'];
$shortaddr = substr($address, 0, -12);
$html .= "<div class='listings'>";
$html .= "<div class='property-image'>";
$html .= "<img src='". $PROPERTY_IMAGE_URL ."' width='449' height='337' alt='' />";
$html .= "</div>";
$html .= "<div class='property-info'>";
$html .= "<span class='property-price'>". $property['Price'] ."</span>";
$html .= "<span class='property-street'>". $shortaddr ."</span>";
$html .= "</div>";
$html .= "<div class='agency'>";
$html .= "<div class='agent'>";
$html .= "<img src='". $agent['PhotoUrl']. "' class='agent-image' width='320' height='240' />";
$html .= "<span class='agent-name'><b>Agent:</b>". $agent['DisplayName'] ."</span>";
$html .= "</div>";
$html .= "</div>";
$html .= "</div>";
}
}
echo $html;
}
function decode_json_string($json)
{
// Strip out junk
$strip = array("{\"Agents\": [","{Props: ",",Success:true}",",\"Success\":true","\r","\n","[{","}]");
$json = str_replace($strip,"",$json);
// Instantiate array
$json_array = array();
foreach (explode("},{",$json) as $row)
{
/// Remove commas and colons between quotes
if (preg_match_all('/"([^\\"]+)"/', $row, $match)) {
foreach ($match as $m)
{
$row = str_replace($m,str_replace(",","|comma|",$m),$row);
$row = str_replace($m,str_replace(":","|colon|",$m),$row);
}
}
// Instantiate / clear array
$array = array();
foreach (explode(',',$row) as $pair)
{
$var = explode(":",$pair);
// Add commas and colons back
$val = str_replace("|colon|",":",$var[1]);
$val = str_replace("|comma|",",",$val);
$val = trim($val,'"');
$val = trim($val);
$key = trim($var[0]);
$key = trim($key,'{');
$key = trim($key,'}');
$array[$key] = $val;
}
// Add to array
$json_array[] = $array;
}
return $json_array;
}
}
Try this code to fix the JSON:
$url = 'http://century21.ca/FeaturedDataHandler.c?DataType=3&AgentID=27830&RotationType=1';
$invalid_json = file_get_contents($url);
$json = preg_replace("/([{,])([a-zA-Z][^: ]+):/", "$1\"$2\":", $invalid_json);
var_dump($json);
All your keys need to be double-quoted
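To check the idea without hitting the feed, here is a small sketch that applies the same regular expression to a made-up sample written in the feed's unquoted-key style and then decodes it. `quote_json_keys` is a hypothetical helper name and the sample is not real feed data.

```php
<?php
// Hypothetical helper: quote bare keys so json_decode() can parse the text.
function quote_json_keys(string $invalidJson): string
{
    return preg_replace('/([{,])([a-zA-Z][^: ]+):/', '$1"$2":', $invalidJson);
}

// Made-up sample with unquoted keys; not real feed data.
$sample = '{Props: [{Address:"28 Main St",Price:"100"}],Success:true}';

$fixed = quote_json_keys($sample);
$data  = json_decode($fixed, true);

if ($data === null) {
    // json_last_error_msg() explains why decoding failed.
    echo 'Still invalid: ', json_last_error_msg(), "\n";
} else {
    echo $data['Props'][0]['Address'], "\n";
}
```

One caveat: the pattern cannot tell keys from values, so a value containing a comma followed directly by a word and a colon would also be rewritten. Checking json_last_error_msg() after decoding helps catch such cases.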
The JSON at the second URL is not valid, which is why you are not getting results: PHP cannot decode that feed.
I tried to parse it and got this error:
Error: Parse error on line 1:
{Props: [{Address:"28
-^
Expecting 'STRING', '}'
[Image: feed from the first URL]
[Image: feed from the second URL]
As the error for the second feed shows, every key must be wrapped in double quotes, because JSON keys are strings, not constants.
For example, Props should be "Props", and likewise for all the other keys.
EDIT
You need to update your getProperties function and add this one (formatJSON($json)) to your class:
// Update this function, just need to update last line of function
function getProperties($agent_id)
{
$url = $this->base_url.'/FeaturedDataHandler.c?DataType=3'.'&AgentID='.$agent_id.'&RotationType=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$response = curl_exec($ch);
curl_close($ch);
$json = json_decode($response);
if (empty($response))
die('No response 2'); //return false;
else
return $this->formatJSON($this->decode_json_string($response)); //this one only need to be updated.
}
//add this function to class. This will format json
private function formatJSON($json){
$from= array('Props:', 'Success:', 'Address:', ',Price:', 'PicTicks:', ',Image:', 'Link:', 'MissingImage:', 'ShowingCount:', 'ShowcaseHD:', 'ListingStatusCode:', 'Bedrooms:', 'Bathrooms:', 'IsSold:', 'ShowSoldPrice:', 'SqFootage:', 'YearBuilt:', 'Style:', 'PriceTypeDesc:');
$to = array('"Props":', '"Success":', '"Address":', ',"Price":', '"PicTicks":', ',"Image":', '"Link":', '"MissingImage":', '"ShowingCount":', '"ShowcaseHD":', '"ListingStatusCode":', '"Bedrooms":', '"Bathrooms":', '"IsSold":', '"ShowSoldPrice":', '"SqFootage":', '"YearBuilt":', '"Style":', '"PriceTypeDesc":' );
return str_ireplace($from, $to, $json); //returns the clean JSON
}
EDIT
I've tested that function and it works fine; maybe something is wrong with your decode_json_string($json) function.
I took the unclean JSON from the second URL, cleaned it with this function, and pasted the result into a JSON editor to check that it parses.

Sequential cURLs returns the same content despite updating url

I'm trying to write a script to cURL a few pages from a password protected site.
The idea is to scrape information on submitted stock codes from their products database to generate and print out the results (eventually importing directly to my own database, but currently just printing the results on screen).
My function is as follows:
function LookupProduct($ItemCodes) {
//set a temp file name for the login cookie
$tmp_fname = "tmp/".md5(date('D F d')).".cookie";
$tmp_fname = realpath($tmp_fname);
//reset/declare the functions output
$return = '';
// build post data from form
$fields = array(
'UserName' => urlencode("username"),
'Password' => urlencode("password"),
);
$fieldString='';
foreach($fields as $key=>$value) {
$fieldString .= $key.'='.$value.'&';
}
rtrim($fieldString, '&');
//initialise the curl session
$ch = curl_init();
//set options for curl login
$loginurl = "https://suppliers-website/login/";
curl_setopt($ch,CURLOPT_URL, $loginurl);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch,CURLOPT_COOKIESESSION, true);
curl_setopt($ch,CURLOPT_POST, count($fields));
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch,CURLOPT_POSTFIELDS, $fieldString);
curl_setopt($ch,CURLOPT_COOKIEJAR, $tmp_fname);
curl_setopt($ch,CURLOPT_COOKIEFILE, $tmp_fname);
curl_setopt($ch,CURLOPT_RETURNTRANSFER, true);
//do the actual login, generate cookie
$result = curl_exec($ch);
//build array of codes to lookup
$codes=explode(",", $ItemCodes);
//lookup each code in the array
foreach($codes as $code) {
//set the product page to curl
$lookupUrl = "https://suppliers-website/product/".$code;
curl_setopt($ch,CURLOPT_URL, $lookupUrl);
//load product page html into $lookupcontent
unset($lookupcontent);
$lookupcontent = curl_exec($ch);
//if we have a valid page, then go ahead and pluck the data
if (strlen($lookupcontent) < 100) {
echo "<li>Error logging in: <blockquote>".$lookupcontent."</blockquote></li>";
} else {
//load product page html into a DOM
unset($dom);
unset($xpath);
$dom = new DOMDocument;
$dom->loadHTML($lookupcontent);
$xpath = new DOMXPath($dom);
//find the image src
unset($imgnames);
foreach($dom->getElementsByTagName('a') as $node) {
if (strpos($node->getAttribute('href'),'StockLoRes') !== false) {
$imgnames = explode("=", $node->getAttribute('href'));
$imgname = $imgnames[1];
$filelocation = $node->getAttribute('href');
}
}
//set the image to curl
$imglink = "https://suppliers-website/login/".$filelocation;
curl_setopt($ch,CURLOPT_URL,$imglink);
//curl the image
unset($curlimage);
$curlimage = curl_exec($ch);
//save the image locally
unset($saveimage);
$saveimage = fopen('tmp/'.$imgname, 'w');
fwrite($saveimage, $curlimage);
fclose($saveimage);
// find the product description
unset($results);
$classname = 'ItemDetails_Description';
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
$description = $results->item(0)->nodeValue;
$description = strip_tags($description);
$description = str_replace("•", "", $description);
}
//find the price
unset($pricearray);
foreach($dom->getElementsByTagName('div') as $node) {
if (strpos($node->nodeValue,'£') !== false) {
$pricearray[] = $node->nodeValue;
}
}
$pricearray=array_reverse($pricearray);
$price = $pricearray[0];
$price = str_replace("£", "", $price);
//find the title
unset($results);
$classname = 'ItemDetails_ItemName';
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
$title = $results->item(0)->nodeValue;
}
//find the publisher
unset($results);
$classname = 'ItemDetails_Publisher';
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
$publisher = $results->item(0)->nodeValue;
}
}
//add all the values to the data to be returned
$return .= '<div style="border:1px solid grey;margin:20px;float:left;">';
$return .= "<a href='tmp/".$imgname."'>";
$return .= "<img src='tmp/".$imgname."' width='100' align='left' /></a>";
$return .= "<h1>" .$title ."</h1>";
$return .= "<h3>" .$publisher ."</h3>";
$return .= "<h2>£" .$price ."</h2>";
$return .= "<h4>" .$description."</h4>";
$return .= '</div><br clear="all" />';
}
//echo out the data
echo $return;
//close connection
curl_close($ch);
}
I am using the following to trigger it:
if(isset($_POST['ItemCodes'])) {
$code=$_POST['ItemCodes'];
$code=str_replace("\n\r", ",", $code);
$code=str_replace("\r", ",", $code);
echo "ItemCodes: ".$code;
echo LookupProduct($code);
}
The script can successfully log in, save a cookie, and get info from a page, but if I try to request multiple pages the script fails to work as intended, instead returning 3 instances of the same product. Did I fail to reset a variable somewhere? I've tried unsetting everything but I still just get the same product three times, as if my function only works once.
