Extract content of meta element in php? [closed] - php

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I am totally new to PHP development and I would like to extract the contents of a meta tag.
I have this code that allows me to extract the contents of the #squad element.
// Pull in PHP Simple HTML DOM Parser
include("simplehtmldom/simple_html_dom.php");
// Settings on top
$sitesToCheck = array(
// id is the page ID for selector
array("url" => "http://www.arsenal.com/first-team/players", "selector" => "#squad"),
array("url" => "http://www.liverpoolfc.tv/news", "selector" => "ul[style='height:400px;']")
);
$savePath = "cachedPages/";
$emailContent = "";
// For every page to check...
foreach($sitesToCheck as $site) {
$url = $site["url"];
// Calculate the cachedPage name, set oldContent = "";
$fileName = md5($url);
$oldContent = "";
// Get the URL's current page content
$html = file_get_html($url);
// Find content by querying with a selector, just like a selector engine!
foreach($html->find($site["selector"]) as $element) {
$currentContent = $element->plaintext;
}
// If a cached file exists
if(file_exists($savePath.$fileName)) {
// Retrieve the old content
$oldContent = file_get_contents($savePath.$fileName);
}
// If different, notify!
if($oldContent && $currentContent != $oldContent) {
// Build simple email content
$emailContent = "Hey, the following page has changed!\n\n".$url."\n\n";
}
// Save new content
file_put_contents($savePath.$fileName,$currentContent);
}
// Send the email if there's content!
if($emailContent) {
// Sendmail!
mail("me@myself.name", "Sites Have Changed!", $emailContent, "From: alerts@myself.name");
// Debug
echo $emailContent;
}
But I want to change this code to get the number of comments.
Here is the meta tag from which I want to extract just the number of comments:
<meta item="desc" content="Comments:645">
Is that clear enough? If not, please ask and I will add details.
Thanks for your help.

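There are two ways to do this. You could use the native PHP function get_meta_tags(), like so:
$tags = get_meta_tags('http://yoursite.com');
$comments = $tags['desc']; // the full content string, e.g. "Comments:645"
Be aware that get_meta_tags() only picks up meta tags that have a name attribute and stops parsing at </head>, so this works for <meta name="desc" ...> in the head but not for the item="desc" tag exactly as posted.
Or you could use a regular expression, but get_meta_tags() is more practical when it applies.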
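Whichever way the content string is obtained, the number still has to be separated from its label. A minimal sketch, assuming the value always looks like "Label:123"; the helper name extractCount is made up for illustration and is not part of PHP:

```php
<?php
// Hypothetical helper: return the trailing integer of a "Label:123" string,
// or null when the string does not end in ":<digits>".
function extractCount(string $content): ?int
{
    if (preg_match('/:\s*(\d+)\s*$/', $content, $m)) {
        return (int) $m[1];
    }
    return null;
}

var_dump(extractCount('Comments:645')); // int(645)
```

Returning null instead of 0 for a non-matching string keeps "no count found" distinguishable from a genuine count of zero.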

What you are looking for is probably screen scraping.
This is the process where a program written in a language like PHP, Python or Ruby loads a web page into memory and uses various selectors to grab content from it.
Screen scraping is mostly used on websites that offer a lot of interesting data but have no JSON or XML API.
While searching for this I came across the following post:
PHP equivalent of PyQuery or Nokogiri?
This article explains more about screen scraping for the web:
http://en.wikipedia.org/wiki/Web_scraping

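Look into using DOMDocument:
// $htmlPage must hold the page's HTML as a string, e.g. from file_get_contents($url)
$dom = new DOMDocument;
@$dom->loadHTML($htmlPage); // @ silences warnings from imperfect real-world markup
$metas = $dom->getElementsByTagName('meta');
$ar = array();
foreach ($metas as $meta) {
    // The tag in the question uses item="..."; fall back to it when name is absent
    $name = $meta->getAttribute('name') ?: $meta->getAttribute('item');
    $value = $meta->getAttribute('content');
    $ar[$name] = $value;
}
print_r($ar); // print array meta-values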
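If only the item-style tags are of interest, an XPath query can select them directly, wherever they sit in the document. This is a sketch under the assumption that the page looks like the inline sample below; the sample markup and variable names are illustrative, not taken from a real site:

```php
<?php
// Sample page (assumed for illustration): the meta tags sit in the body.
$htmlPage = '<html><head><title>Demo</title></head><body>'
    . '<meta item="desc" content="Comments:645">'
    . '<meta item="rates" content="Rates:112">'
    . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($htmlPage);
libxml_clear_errors();

// Select every <meta> that carries an item attribute, in head or body.
$xpath = new DOMXPath($dom);
$values = array();
foreach ($xpath->query('//meta[@item]') as $meta) {
    $values[$meta->getAttribute('item')] = $meta->getAttribute('content');
}

print_r($values);
```

The resulting array is keyed by the item attribute, so $values['desc'] holds the comments string and $values['rates'] the rates string.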

Related

Replace meta tag instead selector in array for parse in php?

I'm a newbie in PHP development and I would like to parse the contents of some meta tags, but I don't know how to proceed.
I have this code that allows me to parse the contents of the elements matched by the #squad and ul[style='height:400px;'] selectors.
// Pull in PHP Simple HTML DOM Parser
include("simplehtmldom/simple_html_dom.php");
// Settings on top
$sitesToCheck = array(
// id is the page ID for selector
array("url" => "http://www.arsenal.com/first-team/players", "selector" => "#squad"),
array("url" => "http://www.liverpoolfc.tv/news", "selector" => "ul[style='height:400px;']")
);
$savePath = "cachedPages/";
$emailContent = "";
// For every page to check...
foreach($sitesToCheck as $site) {
$url = $site["url"];
// Calculate the cachedPage name, set oldContent = "";
$fileName = md5($url);
$oldContent = "";
// Get the URL's current page content
$html = file_get_html($url);
// Find content by querying with a selector, just like a selector engine!
foreach($html->find($site["selector"]) as $element) {
$currentContent = $element->plaintext;
}
// If a cached file exists
if(file_exists($savePath.$fileName)) {
// Retrieve the old content
$oldContent = file_get_contents($savePath.$fileName);
}
// If different, notify!
if($oldContent && $currentContent != $oldContent) {
// Build simple email content
$emailContent = "Hey, the following page has changed!\n\n".$url."\n\n";
}
// Save new content
file_put_contents($savePath.$fileName,$currentContent);
}
// Send the email if there's content!
if($emailContent) {
// Sendmail!
mail("me@myself.name", "Sites Have Changed!", $emailContent, "From: alerts@myself.name");
// Debug
echo $emailContent;
}
The HTML DOM Parser is available here:
http://sourceforge.net/projects/simplehtmldom/files/
My problem is that I want to change this code to get the number of comments and the number of rates from the content of the meta tags. And don't forget, I'm a complete newbie!
Here are the meta tags from which I want to extract just the numbers:
<meta item="desc" content="Comments:645">
<meta item="rates" content="Rates:112">
I have been tearing my hair out over this for several days without success.
Please help me!
One very important detail: the meta tags that I want to extract are not in the header of the document, but in the body!
Is that clear enough? If you need more information, ask me (within the limits of my small newbie skills ;-))

How to get Magento Store Details in a PHP Array by Store ID? [closed]

I want to create a function which takes a store ID and returns a PHP array containing the store details, such as store name, store code, logo, banner, etc.
You can get the store details like this:
$store = Mage::getModel('core/store')->load($storeId);
$code = $store->getCode();
$name = $store->getName();
You can do this to see what data you can get from the store object:
var_dump($store->getData());
The logo and other settings have to be read from the config section:
$logo = Mage::getStoreConfig('design/header/logo_src', $storeId);
This way you can get all the information from the config. You just need the correct path. To find it, look at the name of the input field under System > Configuration, together with the section name, and build the path from them.
Let's analyse the logo. You can find it in the Design tab, and the URL looks like this: 'admin/system_config/edit/section/design'. So the first part of the path is the section name, design.
The field name is groups[header][fields][logo_src][value]. Just remove groups, [fields] and [value] and you get the rest of the path: header/logo_src.
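The path-building rule described above can be written down as a small string transformation. This is only a sketch of that rule in plain PHP; configPathFromField is a hypothetical helper, not a Magento function, and it assumes the field name always has the groups[...][fields][...][value] shape:

```php
<?php
// Hypothetical helper: turn a section name plus an admin form field name
// into a config path suitable for Mage::getStoreConfig().
function configPathFromField(string $section, string $fieldName): ?string
{
    // Capture the group and field names, dropping "groups", "[fields]" and "[value]".
    $pattern = '/^groups\[(\w+)\]\[fields\]\[(\w+)\]\[value\]$/';
    if (!preg_match($pattern, $fieldName, $m)) {
        return null; // not in the expected shape
    }
    return $section . '/' . $m[1] . '/' . $m[2];
}

echo configPathFromField('design', 'groups[header][fields][logo_src][value]'), "\n";
// prints design/header/logo_src
```

The returned string is what would be passed as the first argument to Mage::getStoreConfig().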
Try this, it should work:
public function get_storedetails($store) {
$res = array();
try {
$res["store"] = Mage::app()->getStore($store);
Mage::app()->setCurrentStore($store);
$res["storeid"] = Mage::app()->getStore($store)->getStoreId();
$res["storecode"] = Mage::app()->getStore($store)->getCode();
$res["storewebid"] = Mage::app()->getStore($store)->getWebsiteId();
$res["storename"] = Mage::app()->getStore($store)->getName();
$res["storeactive"] = Mage::app()->getStore($store)->getIsActive();
$res["rooturl"] = Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_WEB);
$res["storeurl"] = Mage::helper('core/url')->getHomeUrl();
$res["storelogo_alt"] = Mage::getStoreConfig('design/header/logo_alt');
$res["storefrontname"] = Mage::app()->getStore($store)->getFrontendName(); //getLogoSrc()
$res["current_url"] = Mage::helper('core/url')->getCurrentUrl();
$res["media_url1"] = Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_LINK);
$res["media_url2"] = Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_MEDIA);
$res["skin_url"] = Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_SKIN);
$res["js_url"] = Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_JS);
$res["storelogo"] = Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_SKIN).'frontend/default/default/'.Mage::getStoreConfig('design/header/logo_src');
$res["storeadminname"] = Mage::getStoreConfig('trans_email/ident_sales/name');
$res["storeemail"] = Mage::getStoreConfig('trans_email/ident_sales/email');
}
catch(Exception $ex) {
echo $ex;
}
return $res;
}

Get ul li a string values and store them in a variable or array php [closed]

I'm trying to store the string values of list items on my website in a variable or array in PHP so that I can do some conditional checks with them. I'm having a bit of difficulty getting the list items' string values using PHP. Can anybody help?
This is the markup:
<div class="coursesListed">
<ul>
<li><h3>Item one</h3></li>
<li><h3>item two</h3></li>
<li><h3>Item three</h3></li>
</ul>
</div>
What I ideally want is either a variable or an array that holds the values "Item one", "Item two", "Item three".
Try this:
$html = '<div class="coursesListed">
<ul>
<li><h3>Item one</h3></li>
<li><h3>item two</h3></li>
<li><h3>Item three</h3></li>
</ul>
</div>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$liList = $doc->getElementsByTagName('li');
$liValues = array();
foreach ($liList as $li) {
$liValues[] = $li->nodeValue;
}
var_dump($liValues);
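If the page contains other lists as well, the lookup can be narrowed to the coursesListed container, and the collected values can then be used in a conditional check. A minimal sketch, assuming the markup from the question; the variable names are illustrative:

```php
<?php
$html = '<div class="coursesListed"><ul>'
    . '<li><h3>Item one</h3></li>'
    . '<li><h3>item two</h3></li>'
    . '<li><h3>Item three</h3></li>'
    . '</ul></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Only look at <li> elements inside the coursesListed container.
$xpath = new DOMXPath($doc);
$items = array();
foreach ($xpath->query('//div[@class="coursesListed"]//li') as $li) {
    $items[] = trim($li->textContent);
}

// Example of a conditional check on the collected values.
if (in_array('Item one', $items, true)) {
    echo "Item one is listed\n";
}
```

Note that the comparison is case-sensitive, so 'item two' matches the second entry but 'Item two' does not.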
You will need to parse the HTML code to get the text out. A DOM parser can be used for this purpose.
$DOM = new DOMDocument;
$DOM->loadHTML($str); // $str is your HTML code as a string
// get all h3 elements and collect their text
$items = $DOM->getElementsByTagName('h3');
$values = array();
foreach ($items as $item) {
    $values[] = $item->nodeValue;
}
It might be easier to parse it in JavaScript (perhaps using jQuery), and then send it to your PHP with some AJAX.
// Javascript/jQuery
var array = [];
$("h3").each(function() {
array.push($(this).html());
});
var message = JSON.stringify(array);
$.post('test.php', {data: message}, function(data) {
document.write(data); // "success"
});
Then in PHP:
<?php
$data = $_POST['data'];
// convert json into array
$array = json_decode($data);
// do stuff with your data
// then send back whatever you need
echo "success";
?>

Edit child node XML with DOM in PHP [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
I'm trying to implement functionality to edit a XML-based news feed from a PHP-powered web app. However, it doesn't seem to ever save.
The XML file I'm working with is as such:
<?xml version="1.0" standalone="yes"?>
<issues>
<issue>
<issue_id>1</issue_id>
<issue_name>Don't double my rates!</issue_name>
<issue_body>Congress is on the verge of letting student rates double a week from today. Swing by the UC Lawn at 5:00 this Thursday to reach out to our Representatives and tell them: #DontDoubleMyRates!</issue_body></issue>
<issue>
<issue_id>2</issue_id>
<issue_name>Proposed Senate Budget</issue_name>
<issue_body>College Democrats are baffled by the proposed senate budget. This is our state, we must make our opinions heard! #NCGOPBudget #StopCuts</issue_body></issue>
<issue>
<issue_id>3</issue_id>
<issue_name>Voter Suppression Law Invalidated!</issue_name>
<issue_body>Join us in applauding the US Supreme Court for invalidating Arizona's voter-suppression law requiring that voters present proof of citizenship before voting!</issue_body></issue>
<issue>
<issue_id>4</issue_id>
<issue_name>Here's an actual article I found interesting</issue_name>
<issue_body>Actually, not really beacause I really didn't want to google for some arbitrary article to help test this out so here's a bunch of filler text to hopefully emulate at least the by-line of an article pertaining to the democratic party organization here on campus.</issue_body>
</issue>
</issues>
Here is the relevant php script that tries to edit the pre-existing node:
<?php
$newName = $_POST['name'];
$newBody = $_POST['body'];
$issue_id = $_POST['edit'];
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->load('issues.xml');
$xpath = new DOMXPath($dom);
$query = '/issues/issue';
foreach($xpath->query($query) as $issue) {
$id = $issue->parentNode->getElementsByTagName("issue_id");
if($id->item($issue_id)->nodeValue = $issue_id) {
$name = $issue->parentNode->getElementsByTagName("issue_name");
$body = $issue->parentNode->getElementsByTagName("issue_body");
$name->item($issue_id-1)->nodeValue = '$newName';
$body->item($issue_id-1)->nodeValue = '$newBody';
break;
}
}
$dom->save("issues.xml");
?>
Here is the referring page which iterates through the child nodes until it finds the previously selected node's ID and then displays its info in a table.
<?php
$issue_id = $_POST['edit'];
$issueArray = array(
'id' =>$_POST['id'],
'issue_name' => $_POST['issue_name'],
'issue_body' => $_POST['issue_body'],
);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->load('issues.xml');
$xpath = new DOMXPath($dom);
$query = '/issues/issue';
$i = 0;
echo "<body><form action='saveChanges.php' method='post'><table border='1'><tr><th>ID</th><th>Name</th><th>Body</th></tr>";
foreach($xpath->query($query) as $issue) {
$eventI = $issue->parentNode->getElementsByTagName("issue_id");
if($eventI->item($issue_id)->nodeValue = $issue_id) {
$eventN = $issue->parentNode->getElementsByTagName("issue_name");
$eventP = $issue->parentNode->getElementsByTagName("issue_body");
print "<tr><td>'".$eventI->item($issue_id-1)->nodeValue."'></td><td>'".$eventN->item($issue_id-1)->nodeValue."'></td><td>'".$eventP->item($issue_id-1)->nodeValue."'</td></tr>";
print "<tr><td></td><th>New Name</td><th>New Body</td></tr>";
print "<tr><td></td><td><input type='text' name='name'size='50'</input></td><td><input type='text' name='body' size='200'</input></td></tr>";
print "<tr><td><input type='hidden' name='id' value='$issue_id'/></td><th><input type='submit' action='saveChanges.php' name='edit' method='post' value='Confirm Edit'/></th><th></th>";
break;
}
}
print "</table></body>";
?>
I'm not that great at PHP, and even worse at parsing XML. Any help to get this going in the right direction would be great!
There are all sorts of problems in the code that is manipulating the DOM. Just looking at the contents of the for loop, you start with this:
$id = $issue->parentNode->getElementsByTagName("issue_id");
In the line above, you have taken the $issue that you enumerated in the for loop and then referenced its parent, which is the same for every issue, thus making the enumeration irrelevant.
You're then getting all issue_id elements in that tree, with which you do this:
if($id->item($issue_id)->nodeValue = $issue_id) {
Here you are using the $issue_id as an index, which assumes that an issue_id of 3 (for example) would always be the third issue, which probably isn't true.
Also a single = is an assignment, not a comparison, which I'm sure was not your intention.
The $name and $body lookups are the same:
$name = $issue->parentNode->getElementsByTagName("issue_name");
$body = $issue->parentNode->getElementsByTagName("issue_body");
Again you're ignoring the $issue that has been enumerated and are working from the parent node, and then just getting all the child elements that match issue_name and issue_body.
And again you're using the $issue_id as an index:
$name->item($issue_id-1)->nodeValue = '$newName';
$body->item($issue_id-1)->nodeValue = '$newBody';
This time, though, you're using $issue_id-1 - was there a reason for that?
Also when you use single quotes for a string in php, that doesn't expand the variable, so the name will always be set to the literal string $newName rather than value of that variable. You should either use double quotes, or better still, just assign the value directly.
This is more like what I would expect the code to look like:
foreach($xpath->query($query) as $issue) {
$id = $issue->getElementsByTagName("issue_id")->item(0);
if($id->nodeValue == $issue_id) {
$name = $issue->getElementsByTagName("issue_name")->item(0);
$body = $issue->getElementsByTagName("issue_body")->item(0);
$name->nodeValue = $newName;
$body->nodeValue = $newBody;
break;
}
}
The rest of your code has more of the same problems, but hopefully that will point you in the right direction.
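For reference, the same edit can be done by letting XPath find the matching issue directly instead of looping. This is a sketch rather than a drop-in replacement: updateIssue is a hypothetical helper, the XML is a shortened inline sample, and it assumes PHP 7.1 or later. It replaces the text via createTextNode so that characters such as & are escaped when saving.

```php
<?php
// Hypothetical helper: update the name and body of the issue with the given ID.
// Returns false if no such issue (or one of its child elements) exists.
function updateIssue(DOMDocument $dom, int $issueId, string $newName, string $newBody): bool
{
    $xpath = new DOMXPath($dom);
    // $issueId is an int, so it is safe to place in the expression.
    $matches = $xpath->query('/issues/issue[issue_id=' . $issueId . ']');
    if ($matches === false || $matches->length === 0) {
        return false;
    }
    $issue = $matches->item(0);
    foreach (array('issue_name' => $newName, 'issue_body' => $newBody) as $tag => $text) {
        $node = $issue->getElementsByTagName($tag)->item(0);
        if ($node === null) {
            return false;
        }
        // Drop the old content, then add the new text as a properly escaped text node.
        while ($node->firstChild) {
            $node->removeChild($node->firstChild);
        }
        $node->appendChild($dom->createTextNode($text));
    }
    return true;
}

$dom = new DOMDocument();
$dom->loadXML('<issues>'
    . '<issue><issue_id>1</issue_id><issue_name>A</issue_name><issue_body>a</issue_body></issue>'
    . '<issue><issue_id>2</issue_id><issue_name>B</issue_name><issue_body>b</issue_body></issue>'
    . '</issues>');

$updated = updateIssue($dom, 2, 'Budget & cuts', 'New body');
echo $dom->saveXML();
```

With a real file you would call $dom->load('issues.xml') before and $dom->save('issues.xml') after, and check that the web server user has write permission on the file.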

Best way to parse RSS/Atom feeds with PHP [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I'm currently using Magpie RSS but it sometimes falls over when the RSS or Atom feed isn't well formed. Are there any other options for parsing RSS and Atom feeds with PHP?
I've always used the SimpleXML functions built in to PHP to parse XML documents. It's one of the few generic parsers out there that has an intuitive structure to it, which makes it extremely easy to build a meaningful class for something specific like an RSS feed. Additionally, it will detect XML warnings and errors, and upon finding any you could simply run the source through something like HTML Tidy (as ceejayoz mentioned) to clean it up and attempt it again.
Consider this very rough, simple class using SimpleXML:
class BlogPost
{
var $date;
var $ts;
var $link;
var $title;
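var $text;
var $summary; // filled in by BlogFeed; declared so it is not a dynamic property (deprecated since PHP 8.2)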
}
class BlogFeed
{
var $posts = array();
function __construct($file_or_url)
{
$file_or_url = $this->resolveFile($file_or_url);
if (!($x = simplexml_load_file($file_or_url)))
return;
foreach ($x->channel->item as $item)
{
$post = new BlogPost();
$post->date = (string) $item->pubDate;
$post->ts = strtotime($item->pubDate);
$post->link = (string) $item->link;
$post->title = (string) $item->title;
$post->text = (string) $item->description;
// Create summary as a shortened body and remove images,
// extraneous line breaks, etc.
$post->summary = $this->summarizeText($post->text);
$this->posts[] = $post;
}
}
private function resolveFile($file_or_url) {
if (!preg_match('|^https?:|', $file_or_url))
$feed_uri = $_SERVER['DOCUMENT_ROOT'] .'/shared/xml/'. $file_or_url;
else
$feed_uri = $file_or_url;
return $feed_uri;
}
private function summarizeText($summary) {
$summary = strip_tags($summary);
// Truncate summary line to 100 characters
$max_len = 100;
if (strlen($summary) > $max_len)
$summary = substr($summary, 0, $max_len) . '...';
return $summary;
}
}
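The point about detecting XML errors can be sketched as follows. This is an illustration only: parseFeedItems is a hypothetical function, it assumes an RSS 2.0 layout (channel/item), and it simply gives up on a malformed feed rather than trying to repair it.

```php
<?php
// Hypothetical helper: return the feed items as arrays, or null if the
// XML is not well-formed. Assumes an RSS 2.0 channel/item structure.
function parseFeedItems(string $xml): ?array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    $errors = libxml_get_errors(); // LibXMLError objects; could be logged
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false) {
        return null; // not well-formed; $errors says why
    }

    $items = array();
    foreach ($feed->channel->item as $item) {
        $items[] = array(
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        );
    }
    return $items;
}

$good = '<?xml version="1.0"?><rss version="2.0"><channel><title>T</title>'
    . '<item><title>First</title><link>http://example.com/1</link></item>'
    . '</channel></rss>';
$bad = '<rss><channel><item><title>Broken</item>';

var_dump(parseFeedItems($good) !== null); // bool(true)
var_dump(parseFeedItems($bad));           // NULL
```

A null result is the place where a cleanup step such as HTML Tidy could be tried before parsing again.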
With four lines, I can import an RSS feed into an array:
$feed = implode(file('http://yourdomains.com/feed.rss'));
$xml = simplexml_load_string($feed);
$json = json_encode($xml);
$array = json_decode($json,TRUE);
For a more complex solution:
$feed = new DOMDocument();
$feed->load('file.rss');
$json = array();
$json['title'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$json['description'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
$json['link'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('link')->item(0)->firstChild->nodeValue;
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
$json['item'] = array();
$i = 0;
foreach($items as $key => $item) {
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
$pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
$guid = $item->getElementsByTagName('guid')->item(0)->firstChild->nodeValue;
$json['item'][$key]['title'] = $title;
$json['item'][$key]['description'] = $description;
$json['item'][$key]['pubdate'] = $pubDate;
$json['item'][$key]['guid'] = $guid;
}
echo json_encode($json);
Your other options include:
SimplePie
Last RSS
PHP Universal Feed Parser
I would like to introduce a simple script to parse RSS:
$i = 0; // counter
$url = "http://www.banki.ru/xml/news.rss"; // url to parse
$rss = simplexml_load_file($url); // XML parser
// RSS items loop
print '<h2><img style="vertical-align: middle;" src="'.$rss->channel->image->url.'" /> '.$rss->channel->title.'</h2>'; // channel title + img with src
foreach($rss->channel->item as $item) {
if ($i < 10) { // parse only 10 items
print ''.$item->title.'<br />';
}
$i++;
}
If a feed isn't well-formed XML, you're supposed to reject it, no exceptions. You're entitled to call the feed creator a bozo.
Otherwise you're paving the way to the same mess that HTML ended up in.
The HTML Tidy library is able to fix some malformed XML files. Running your feeds through that before passing them on to the parser may help.
I use SimplePie to parse a Google Reader feed and it works pretty well and has a decent feature set.
Of course, I haven't tested it with non-well-formed RSS / Atom feeds so I don't know how it copes with those, I'm assuming Google's are fairly standards compliant! :)
Personally I use BNC Advanced Feed Parser. I like its template system, which is very easy to use.
The PHP RSS reader - http://www.scriptol.com/rss/rss-reader.php - is a complete but simple parser used by thousands of users...
Another great free parser - http://bncscripts.com/free-php-rss-parser/
It's very light (only 3 KB) and simple to use!
