I want to parse some data (game names) from an external page :
https://www.mol.com/Product/GamesHive
using this code :
<?php
$url = 'https://www.mol.com/Product/GamesHive';
$content = file_get_contents($url);
$first_step = explode('<div class="col-xs-4">', $content);
$second_step = explode('</div>', $first_step[1]);
echo $second_step[0];
?>
However some data is lost. The original page has 384 items and my page only has 169 items. What's the problem?
Is this what you are trying to achieve?
<?php
$curl = curl_init();
curl_setopt_array($curl,
array(
CURLOPT_URL => 'https://www.mol.com/Product/GamesHive',
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_SSL_VERIFYHOST => 0,
CURLOPT_SSL_VERIFYPEER => FALSE
)
);
$response = curl_exec($curl);
curl_close($curl);
$arr = simplexml_import_dom(#DOMDocument::loadHTML($response))
->xpath('//div[#class="caption"]');
foreach ($arr as $val)
{
echo $val->h5->a . '<br />';
}
?>
Related
I'm using wikipedia api to scrape images from api its returning data in json form in which the image Url is like this
"https://upload.wikimedia.org/wikipedia/en/f/f7/Canada%27s_Aviation_Hall_of_Fame_logo.jpg"
the Url that is same for all images is "https://upload.wikimedia.org/wikipedia/en/"
The Php code is as follows:
<form action="" method="get">
<input type="text" name="search">
<input type="submit" value="Search">
</form>
<?php
if(#$_GET['search']){
$api_url="https://en.wikipedia.org/w/api.php?action=query&format=json&list=allimages&aifrom=".ucwords($_GET['search'])."&ailimit=500";
$api_url=str_replace('', '%20', $api_url);
$curl=curl_init();
curl_setopt($curl, CURLOPT_URL, $api_url);
curl_setopt($curl,CURLOPT_RETURNTRANSFER, true);
$output=curl_exec($curl);
curl_close($curl);
preg_match_all('!//upload.wikimedia.org/wikipedia/en/!', $output, $data);
echo '<pre>';
foreach ($data[0] as $list) {
echo "<img src='$list'/>";
# code...
}
}
?>
How can I get the remaining part of the url correctly?
You need to decode it using json_decode and get the url image link
function get_wiki_image( $search, $limit) {
$streamContext = array(
"ssl" => array(
"verify_peer" => false,
"verify_peer_name" => false,
),
);
$url = 'https://en.wikipedia.org/w/';
$url .= '/api.php?action=query&format=json&list=allimages&aifrom=' . $search . '&ailimit=' . $limit;
$context = stream_context_create($streamContext);
if(FALSE === ($content = #file_get_contents($url, false,$context)) ) {
return false;
} else {
$data = json_decode($content,true);
$ret = array();
foreach($data['query']['allimages'] as $img) {
$ret[] = $img['url'];
}
return $ret;
}
}
$search = ucwords($_GET['search']);
$images = get_wiki_image($search,500);
foreach($images as $img) {
echo "<img src='{$img}'>";
}
I have a simple script that until yesterday had worked fine for 2 years. Im just taking a XML feed from a WP site and formatting it to be displayed on a different website. Here is the code:
<?php
function download_page($path){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$path);
curl_setopt($ch, CURLOPT_FAILONERROR,1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
$retValue = curl_exec($ch);
curl_close($ch);
return $retValue;
}
$sXML = download_page('https://example.com/tradeblog/feed/atom/');
$oXML = new SimpleXMLElement($sXML);
$items = $oXML->entry;
$i = 0;
foreach($items as $item) {
$title = $item->title;
$link = $item->link;
echo '<li>';
foreach($link as $links) {
$loc = $links['href'];
$href = str_replace("/feed/atom/", "", $loc);
echo "<a href=\"$href\" target=\"_blank\">";
}
echo $title;
echo "</a>";;
echo "</li>";
if(++$i == 3) break;
}
?>
I can echo out $sXML and it will display the entire XML contents as expected. When I try and echo $oXML I get the 500 error. Any use of $oXML causes the 500. What changed? Is there a different / better way to do this using PHP?
It seems your xml source is not exactly a xml. I tried to validate it using w3 scholl validator and it throws an error. Tried here too, and got the same error.
Not sure why, but this worked
<?php
$rss = new DOMDocument();
$rss->load('https://example.com/tradeblog/feed/rss2/');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
$item = array (
'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
);
array_push($feed, $item);
}
$limit = 3;
for($x=0;$x<$limit;$x++) {
$title = str_replace(' & ', ' & ', $feed[$x]['title']);
$link = $feed[$x]['link'];
echo '<li>'.$title.'</li>';
}
?>
I am trying to upgrade a friend's Wordpress site to the latest versions of Wordpress and PHP.
All is working fine except for a scrolling news ticker he uses on his homepage that errors out with "Illegal string offset 'date'", and no news is shown.
This is the script:
<?php
$xmlOption = get_option('xmlFeed');
if (!isset($xmlOption)) {
$buildURL = "https://wordpress.org/news/feed/";
$request = curl_init();
curl_setopt($request, CURLOPT_URL, $buildURL);
curl_setopt($request, CURLOPT_HEADER, false);
curl_setopt($request, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($request);
curl_close($request);
$xml = new SimpleXMLElement($result);
$channel = $xml->channel;
delete_option('xmlFeed');
$otion = array(
'xml' => $channel,
'date' => date('y-m-d')
);
add_option('xmlFeed', $option);
}
if ($xmlOption['date'] == date('y-m-d')) {
$channel = $xmlOption['xml'];
} else {
$buildURL = "https://wordpress.org/news/feed/";
$request = curl_init();
curl_setopt($request, CURLOPT_URL, $buildURL);
curl_setopt($request, CURLOPT_HEADER, false);
curl_setopt($request, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($request);
curl_close($request);
$xml = new SimpleXMLElement($result);
$channel = $xml->channel;
delete_option('xmlFeed');
$otion = array(
'xml' => $channel,
'date' => date('y-m-d')
);
add_option('xmlFeed', $option);
}
$i = 0;
while ($i <= 5) {
echo "<li><a href='" . $channel->item->$i->link . "' target='_blank'>" . $channel->item->$i->title . "</a></li>";
$i++;
}
?>
I noticed the use of $otion two times, which i thought was maybe a typo. But when i changed that to $option the rest of the page was not parsed, so I guess that isn't the problem.
As I am not a coder and i pulled my hairs out for 2 nights now.
Time to get some help before i have none left.
Anyone can help me with this one?
It is not a real answer to my question, but I found another script that, with some small changes, works perfectly. So I'm happy.
<?php
$rss = new DOMDocument();
$rss->load('http://wordpress.org/news/feed/');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
$item = array (
'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
);
array_push($feed, $item);
}
$limit = 5;
for($x=0;$x<$limit;$x++) {
$title = str_replace(' & ', ' & ', $feed[$x]['title']);
$link = $feed[$x]['link'];
echo '<li>'.$title.'</li>';
}
?>
It is smaller and cleaner.
Thanks for your help #Marcus
i am using DOMDocument() to include RSS feed in my code. However i get this error:
URL file-access is disabled in the server configuration
and thats because my server doesnt allow me either to modify the php.ini file or to set allow_url_fopen to ON.
Is there a workaround for this? This is my full code:
<?php
$rss = new DOMDocument();
$rss->load('rss.php');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
$item = array (
'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
);
array_push($feed, $item);
}
$limit = 5;
echo '<table>';
for($x=0;$x<$limit;$x++) {
$title = str_replace(' & ', ' & ', $feed[$x]['title']);
$link = $feed[$x]['link'];
echo <<<EOF
<tr>
<td><b>$title</b></td>
</tr>
EOF;
}
echo '</table>';
?>
Thank you.
Okay, i solved it myself.
<?php
$k = 'rss.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $k);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$rss = curl_exec($ch);
curl_close($ch);
$xml = simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);
$feed = array();
foreach($xml->channel->item as $item){
$item = array (
'title' => $item->title,
'desc' => $item->description,
'link' => $item->link,
'date' => $item->pubDate,
);
array_push($feed, $item);
}
$limit = 5;
echo '<table>';
for($x=0;$x<$limit;$x++) {
$title = str_replace(' & ', ' & ', $feed[$x]['title']);
$link = $feed[$x]['link'];
echo <<<EOF
<tr>
<td><b>$title</b></td>
</tr>
EOF;
}
echo '</table>';
?>
Use cURL commands. You really should be using this for server to server interactions rather than trying to pass URL's to constructors anyways.
Here is cURL documentation - http://us1.php.net/curl
I also have a simple cURL based REST client you can feel free to use - https://github.com/mikecbrant/php-rest-client
Basically, all you are looking to do is use cURL to retrieve the remote content instead of trying to open it directly using fopen wrapper. Once you retrieve the content then you pass it in to DOMDocument.
I have this line in my code:
$item=$page->items;
$page is an array, and its content is as it should be but $item is Null. do you have any idea what could be the reason?
UPDATE:
<?php
search( "Steve Jobs" );
// Submit query to Google Web Search
function search( $query )
{
$url = "https://www.googleapis.com/customsearch/v1?key=YOUR_KEY&cx=YOUR_SEARCHE_NGINE_ID&q=" . urlencode( $query ) . "&callback=handleResponse&prettyprint=true&num=10";
$result = get_web_page( $url );
// Exception handling
if ( $result['http_code'] == 403 )
echo "... error: daily limit exceeded ...";
if ( $result['errno'] != 0 )
echo "... error: bad url, timeout, redirect loop ...";
if ( $result['http_code'] != 200 )
echo "... error: no page, no permissions, no service ...";
// Get and parse JSON output
$page = $result['content'];
$page = str_replace("// API callback\nhandleResponse(", "", $page);
$page = str_replace(");", "", $page);
$page=json_decode($page, true);
// Print results
$items=$page->items;
var_dump($items);
for ($i = 0; $i < sizeof($items); $i++)
{
$item = $items[$i];
print("<font size=\"3\">" . "" . $item->htmlTitle . "</font><br>");
print("<font size=\"2\" color=\"black\">" . $item->htmlSnippet . "</font><br>");
print("<font size=\"2\" color=\"green\">" . $item->displayLink . "</font><br>"); print("<br>");
}
}
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
// curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
?>
ANSWER:
this part should be changed since $page is an array.
$items = $page['items'];
foreach ($items as $item)
{
$item = (object) $item;
print("<font size=\"3\">" . "" . $item->htmlTitle . "</font><br>");
print("<font size=\"2\" color=\"black\">" . $item->htmlSnippet . "</font><br>");
print("<font size=\"2\" color=\"green\">" . $item->displayLink . "</font><br>"); print("<br>");
}
Simple question, simple answer:
$item is NULL because $page->items is NULL.
More details:
You're treating $page as an object, but you write it's an array.
Suggestion:
Treat $page as an array instead.
Further information:
The PHP manual has information how to treat a value like an array or like an object. Additionally if you enable warnings on your development box, you will be notified about undefined properties which are always NULL.
Update:
For your concrete code, with a slight modification this should work, just change your for loop a bit:
foreach ($items as $item)
{
$item = (object) $item;
print("<font size=\"3\">" . "" . $item->htmlTitle . "</font><br>");
print("<font size=\"2\" color=\"black\">" . $item->htmlSnippet . "</font><br>");
print("<font size=\"2\" color=\"green\">" . $item->displayLink . "</font><br>"); print("<br>");
}
$page is an array, and its content is as it should be
If $page is an array, you should access it like
$item=$page['items'];
If you provide more info you would get more helpful answer.
EDIT
First, Please make sure you have $page not empty at all.
What you have is $page=json_decode($page, true); that return an associative array and you should use the way I posted above.
If you don't want the array to be returned you should change your json_decode to return object by not passing second argument as true.
$page=json_decode($page); then you are okay with the code you have right now.
If $page is array you do this $page['items']. If it is JSON or XML you can also access ait the way you are doing. like $page->items;