How can I use Zend_Dom_Query to get meta data - php

Using Zend_Dom_Query, I would like to retrieve meta data out of an HTML string.
To retrieve links you can simply query like so:
$results = $dom->query('a'); //Where $dom is a Zend_Dom_Query object built from your HTML string
Unfortunately this doesn't seem to work with meta:
$results = $dom->query('meta'); //doesn't work
How can I retrieve meta data and then filter by its 'property' attribute?
An example of what I'm looking for:
public function meta($dom)
{
$results = $dom->query('meta'); //This is not a correct query (does anyone have an alternative?)
$links = array();
foreach ($results as $index => $result) {
if ($result->getAttribute('property') == 'title') { //find <meta property="title"
echo $result->getAttribute('content') . '<br />'; //echo the content attribute of the title
}
}
return $results;
}
This code would work once the query is correct. However, I would like to take it a step further and query directly for <meta property="title" content="This is the Title" /> instead of retrieving all meta tags and looping to find the right one.
Any help with either getting all meta data using Zend_Dom_Query or (more importantly) querying for only the meta tags where property == title would be appreciated.
Thanks

meta is a valid CSS type selector, but if $dom->query('meta') returns nothing for your document, you can use the $dom->queryXpath($xPathQuery) method instead of $dom->query().
Maybe something like:
$dom->queryXpath('/html/head');
I'm not sure of the exact query string to use, but this is the idea.
See the Zend_Dom_Query "Theory of Operation" documentation.
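For the query the asker actually wants, an XPath expression along the lines of //meta[@property="title"] should do it, and that string is what you would pass to $dom->queryXpath(). The sketch below is a stand-in rather than Zend code: it uses PHP's built-in DOMDocument and DOMXPath (which Zend_Dom_Query uses under the hood) so it runs without Zend Framework, and metaContentByProperty() is a hypothetical helper name.

```php
<?php
// Sketch: select <meta property="..."> elements directly with XPath.
// Uses PHP's DOM extension instead of Zend_Dom_Query; the XPath string is
// what you would pass to $dom->queryXpath().
// metaContentByProperty() is a hypothetical helper name.
function metaContentByProperty(string $html, string $property): array
{
    // Keep the XPath literal valid: this sketch assumes no double quotes.
    if (strpos($property, '"') !== false) {
        throw new InvalidArgumentException('property must not contain "');
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence HTML parse warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $contents = [];
    foreach ($xpath->query('//meta[@property="' . $property . '"]') as $meta) {
        $contents[] = $meta->getAttribute('content');
    }
    return $contents;
}

$html = '<html><head>'
      . '<meta property="title" content="This is the Title" />'
      . '<meta name="description" content="A page" />'
      . '</head><body></body></html>';

print_r(metaContentByProperty($html, 'title'));
```

Only matching meta elements come back, so there is no need to loop over every meta tag and test the property attribute by hand.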

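If you have the URL, try PHP's built-in get_meta_tags(). Note that it only parses meta tags that have a name attribute, so it won't pick up property="title":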
$metatagarray = get_meta_tags($url);
if (!empty($metatagarray["keywords"]))
$metakey = $metatagarray["keywords"];
if (!empty($metatagarray["description"]))
$metadesc = $metatagarray["description"];
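To see the name-only behaviour, this sketch writes a small made-up page to a temporary file (get_meta_tags() accepts a local file name as well as a URL) and reads it back. The keys come from the name attribute, while the property="title" tag is left out of the result.

```php
<?php
// Sketch: get_meta_tags() on a local file, showing that only
// <meta name="..."> tags end up in the result.
// The page content below is a made-up example.
$page = '<html><head>'
      . '<meta name="keywords" content="php, dom">'
      . '<meta name="description" content="Demo page">'
      . '<meta property="title" content="This is the Title">'
      . '</head><body></body></html>';

$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $page);
$tags = get_meta_tags($file); // a URL works here too
unlink($file);

$metakey  = isset($tags['keywords']) ? $tags['keywords'] : '';
$metadesc = isset($tags['description']) ? $tags['description'] : '';

echo $metakey, "\n";
echo $metadesc, "\n";
var_dump(isset($tags['title'])); // property= tags are not parsed
```

For property-based tags such as property="title", an XPath query like the one in the previous answer is the better fit.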

Related

Parsing XML weather forecast data using PHP and simpleXML

I am trying to parse the following XML weather forecast data using PHP's SimpleXML extension:
http://www.pakenhamweather.info/test/IDV10753.xml
I have tried a number of times to parse this XML, but to no avail. The following is an example:
$product = simplexml_load_file("http://www.pakenhamweather.info/test/IDV10753.xml");
// element names containing a hyphen must be written as ->{'forecast-period'}
foreach ($product->forecast->area[2]->{'forecast-period'}[0] as $blah) {
printf(
'<p>Forecast Icon: %s</p><p>Precipitation Range: %s</p>',
$blah->element["forecast_icon_code"],
$blah->element["precipitation_range"]
);
}
For the purposes of this example I have simplified things by selecting 'area[2]' and 'forecast-period[0]', but ideally I would like to select the 'area' by its 'description' attribute and the 'forecast-period' by its 'index' attribute.
Any help would be greatly appreciated. Thank you in advance.
You can try the following code to select the area and the forecast-period by their attributes.
$product = simplexml_load_file("http://www.pakenhamweather.info/test/IDV10753.xml");
$forecasts = $product->forecast->xpath("//area[@description='Mildura']/forecast-period[@index='1']");
foreach ($forecasts as $forecast) {
$forecast_icon_code = $forecast->xpath("element[@type='forecast_icon_code']")[0];
$precipitation_range = $forecast->xpath("element[@type='precipitation_range']")[0];
echo $forecast_icon_code . "===" . $precipitation_range;
}
Just replace the attribute values with the ones you want to pick.
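The same XPath can be tried offline. This sketch runs it against a trimmed, hypothetical sample shaped like the feed in the question (the real feed has more attributes and elements, and the element names here are taken from the answer, not checked against the live file). forecastFor() is a hypothetical helper name; it also guards against missing matches, since indexing [0] on an empty xpath() result would fail.

```php
<?php
// Sketch: select a forecast period by the area's description attribute
// and the period's index attribute, using SimpleXML's xpath().
// The XML is a trimmed, hypothetical sample shaped like the question's feed.
$sample = <<<XML
<product>
  <forecast>
    <area description="Mildura" type="location">
      <forecast-period index="0">
        <element type="forecast_icon_code">1</element>
        <element type="precipitation_range">0 to 2 mm</element>
      </forecast-period>
      <forecast-period index="1">
        <element type="forecast_icon_code">3</element>
        <element type="precipitation_range">1 to 5 mm</element>
      </forecast-period>
    </area>
  </forecast>
</product>
XML;

function forecastFor(SimpleXMLElement $product, string $area, int $index): ?array
{
    // Attribute tests in XPath use "@". Assumes $area contains no single quote.
    $periods = $product->xpath(
        "//area[@description='$area']/forecast-period[@index='$index']"
    );
    if (!$periods) {
        return null; // no matching area/period
    }
    $period = $periods[0];
    // Relative paths are evaluated from the forecast-period element.
    $icon = $period->xpath("element[@type='forecast_icon_code']");
    $rain = $period->xpath("element[@type='precipitation_range']");
    return [
        'icon' => $icon ? (string) $icon[0] : null,
        'precipitation' => $rain ? (string) $rain[0] : null,
    ];
}

$product = simplexml_load_string($sample);
print_r(forecastFor($product, 'Mildura', 1));
```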

Get all HTML list element using Simple HTML Dom

Currently I am working on a project which requires me to parse some data from another website, and I'm having some issues (note: I am very new to PHP).
Here's the code I am using:
$dl = $html2->find('ol.tracklist',0);
print $dl = $dl->outertext;
The above code returns the data we're trying to get, but the output is extremely messy.
However, when I put this in a foreach loop, it only returns one of the a elements.
foreach($html2->find('ol.tracklist') as $li)
{
$title = $li->find('a',0);
print $title;
}
What can I do so that it returns all of the a href elements from the example code above?
NOTE: I am using simple_html_dom.php for this.
Based on the markup, point directly at the list items, then get each item's anchor:
foreach ($html2->find('ol.tracklist li') as $li) {
$anchor = $li->find('ul li a', 0); // null if this item has no nested link
if ($anchor) {
echo $anchor->href; // and other attributes
}
}
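The original markup was only linked, so this sketch uses a hypothetical tracklist and PHP's DOM extension in place of simple_html_dom. It shows the idea behind the answer: select the anchors inside every li of the list instead of taking only the first match with find('a', 0), which is why the question's loop printed a single link. tracklistLinks() is a hypothetical helper name.

```php
<?php
// Sketch: collect every link inside <ol class="tracklist">, not just the first.
// Uses PHP's DOM extension instead of simple_html_dom; markup is hypothetical.
function tracklistLinks(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Class test that still matches when the element has several classes
    $query = "//ol[contains(concat(' ', normalize-space(@class), ' '), ' tracklist ')]//li//a";
    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

$html = '<ol class="tracklist">'
      . '<li><span>Artist A</span> <a href="/track/1">One</a></li>'
      . '<li><span>Artist B</span> <a href="/track/2">Two</a></li>'
      . '</ol>';
print_r(tracklistLinks($html));
```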

How to fix the blank spaces with Simple HTML DOM parser?

I found a PHP snippet that I modified for my needs; it uses Simple HTML DOM to parse data from a web page.
Here is part of my code:
$html = new simple_html_dom();
$html->load_file($page);
$items = $html->find('ul[class=history_list]');
foreach($items as $post) {
$items = $post->find('li[class=sub_pop_headline]');
$title=$items->plaintext;
foreach($items as $post) {
# remember comments count as nodes
$items_data = strip_tags($post);
echo $query = "INSERT INTO `mysqltable` ( `entry_id`,`domain_id`, `keyword`, `data`) VALUES ('',1, '$items_data', 'a:1:{s:11:\"{%KEYWORD%}\";s:11:\"$items_data\";}')";
$query_submit = mysqli_query($conn,$query);
The data are fetched (that part works), but they are inserted into the SQL table with a lot of blank space.
Here is what the entry should look like (columns keyword and data):
my fetched title | a:1:{s:11:"{%KEYWORD%}";s:9:"my fetched title";}
but my code gives this as output:
my fetched title| a:1:{s:11:"{%KEYWORD%}";s:9:"my fetched title";}
As you can see there is a lot of extra space, so this is not correct.
Thank you very much for your help, I'm not really a coder...
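Sounds like all you need to do is trim() the text before it goes into the query. $items is the array returned by find(), so trim each item's text inside the inner loop:
$items_data = trim(strip_tags($post));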
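Two further details in the question's INSERT may matter. trim() only strips whitespace at the ends, so runs of newlines or tabs inside the scraped text survive, and the hand-written s:11: in the data column has to equal the byte length of the value or unserialize() will fail on it. A sketch, with hypothetical helper names cleanText() and keywordData(), that collapses inner whitespace and lets serialize() compute the length. When you insert the values, passing them through a mysqli prepared statement is safer than interpolating them into the SQL string.

```php
<?php
// Sketch: clean scraped text before storing it. trim() removes leading and
// trailing whitespace; preg_replace() first collapses any run of whitespace
// (newlines, tabs, spaces) inside the text to a single space.
function cleanText(string $raw): string
{
    return trim(preg_replace('/\s+/', ' ', $raw));
}

// Build the serialized "data" column with serialize() instead of a
// hand-written string, so the s:N: length always matches the value.
function keywordData(string $keyword): string
{
    return serialize(['{%KEYWORD%}' => $keyword]);
}

$keyword = cleanText("\n\t   my fetched title   \n");
echo $keyword, "\n";              // my fetched title
echo keywordData($keyword), "\n"; // a:1:{s:11:"{%KEYWORD%}";s:16:"my fetched title";}
```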

Parsing a single XML section using a query string with PHP

I'm trying to retrieve a single section from an XML file using PHP.
Here's the code I'm using:
<?php
$articles = simplexml_load_file('articles.xml');
foreach ($articles as $articlecontent)
{
$title = $articlecontent->title;
$content = $articlecontent->content;
$author = $articlecontent->author;
$date = $articlecontent['date'];
echo "<h1>",$title,"</h1>\n",
"<p><i>",$date,"</i></p>\n",
"<p>",$content,"</p>\n",
"<p>",$author,"</p>\n",
"<p><time>",$date,"</time></p>\n"
;
}
?>
This shows all the sections in the XML file, but how can I show just one of them, selected with a query string?
Example: articles.php?title=section-one will retrieve the XML section with the title "section-one"
You get the title from the query string by getting it from PHP's superglobal $_GET array, like this:
$title = false;
if (isset($_GET['title']))
{
$title = $_GET['title'];
}
Then the simplest approach is to compare it with each article's title inside your foreach loop.
You get the title from the URL with the $_GET array. To get the required article by title, I think you could use XPath, which avoids looping over every article to find it:
$title = $_GET['title'];
// sanitize $title here: it is pasted into the XPath expression
$articles = simplexml_load_file('articles.xml');
//search for article by title
//assuming that the xml root tag is articles, the article tag is article, and we search by an attribute named title
$foundArticles = $articles->xpath('/articles/article[@title="' . $title . '"]');
//query depends on xml structure
// now you have an array of articles whose title matches the one from the url
foreach($foundArticles as $singleArticle)
{
//do printing here
}
This code is not tested, but the principle should work.
You can also use $_REQUEST to receive the query string variable (in our case 'title') from the URL. Below is a slightly corrected version of @user1597483's code with some more advanced updates.
$articles = simplexml_load_file('articles.xml');
$title = "";
if(isset($_REQUEST['title'])) //receiving Query String from URL
$title = $_REQUEST['title'];
if(count($articles)):
//Exact match: returns only the article whose title is exactly 'Section-one'
//$result = $articles->xpath("//article[title='Section-one']");
//Contains match: returns every article whose title contains the query string value, ignoring case
$result = $articles->xpath("//article[contains(translate(title, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),'".strtolower($title)."')]");
foreach ($result as $articleItem):
$title=$articleItem->title;
$content=$articleItem->content;
$author=$articleItem->author;
$date=$articleItem['date']; // date is an attribute in the question's XML
echo "<h1>",$title,"</h1>\n";
echo "<p><i>",$date,"</i></p>\n";
echo "<p>",$content,"</p>\n";
echo "<p>",$author,"</p>\n";
echo "<p><time>",$date,"</time></p>\n";
endforeach;
endif;
In the code above I have shown two kinds of filter: an exact match that returns one specific article, and a wildcard-style match that uses the contains() function. The translate() function converts the value of the title element to lowercase, and strtolower() does the same to the query string value, so the comparison ignores upper/lower case and every article whose title contains the value is returned.
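Both XPath answers paste the query string value straight into a quoted XPath literal, so a title containing a quote breaks the query or changes its meaning. A sketch of one way around that, under the assumption (taken from the question's loop) that title is a child element and date an attribute: xpathLiteral() and articlesByTitle() are hypothetical helpers, and the sample XML is made up.

```php
<?php
// Sketch: select one article by the ?title= query string value.
// Sample XML is hypothetical: <title> is a child element, date an attribute.
$sample = <<<XML
<articles>
  <article date="2012-01-01">
    <title>section-one</title>
    <content>First body</content>
    <author>Ann</author>
  </article>
  <article date="2012-02-01">
    <title>section-two</title>
    <content>Second body</content>
    <author>Bob</author>
  </article>
</articles>
XML;

// Quote any string as an XPath 1.0 literal so user input cannot close the
// quotes early. XPath 1.0 has no escape character, so a string with both
// quote types is split on ' and rebuilt with concat().
function xpathLiteral(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    return "concat('" . str_replace("'", "', \"'\", '", $s) . "')";
}

// Exact, case-sensitive match on the <title> child element.
function articlesByTitle(SimpleXMLElement $articles, string $title): array
{
    $found = $articles->xpath('/articles/article[title=' . xpathLiteral($title) . ']');
    return $found ?: []; // xpath() can return false on error
}

$articles = simplexml_load_string($sample);
// Fall back to a fixed title when run from the command line (no $_GET).
$wanted = isset($_GET['title']) ? (string) $_GET['title'] : 'section-one';

foreach (articlesByTitle($articles, $wanted) as $article) {
    echo "<h1>", htmlspecialchars((string) $article->title), "</h1>\n",
         "<p><time>", htmlspecialchars((string) $article['date']), "</time></p>\n",
         "<p>", htmlspecialchars((string) $article->content), "</p>\n",
         "<p>", htmlspecialchars((string) $article->author), "</p>\n";
}
```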
You can achieve that by adding a condition to the foreach loop that checks whether the title filter is active and doesn't match. If so, skip that entry with the continue keyword:
$titleFilter = trim(isset($_GET['title']) ? $_GET['title'] : '');
...
foreach ($articles as $articlecontent)
{
$title = (string) $articlecontent->title; // cast so !== compares two strings
if (strlen($titleFilter) and $title !== $titleFilter) {
continue;
}
...
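The cast in the filter matters because $articlecontent->title is a SimpleXMLElement object, not a string, and !== also compares types. This small sketch with a one-article sample shows the difference.

```php
<?php
// Sketch: comparing a SimpleXMLElement to a string with !== is always true,
// because the operands have different types. Cast to string first.
$article = simplexml_load_string('<article><title>section-one</title></article>');
$filter = 'section-one';

var_dump($article->title !== $filter);          // bool(true): object vs string
var_dump((string) $article->title !== $filter); // bool(false): same strings
```

Without the cast, every entry would be skipped whenever a title filter is set.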

Simple HTML DOM getting all attributes from a tag

Sort of a two-part question, but maybe one answers the other. I'm trying to get some information out of markup like this:
<div id="foo">
<div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text</a></div>
<div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text</a></div>
</div>
Here is what I'm using now.
$articles = array();
$html=file_get_html('http://foo.bar');
foreach($html->find('div[class=bar] a') as $a){
$articles[] = array($a->href,$a->innertext);
}
This works perfectly to grab the href and the inner text from the first div class. I tried adding $a->data1 to the foreach, but that didn't work.
How do I grab those data attributes at the same time as the href and innertext?
Also, is there a good way to get both classes with one statement? I assume I could build the find off the id and grab all the div information.
Thanks
To grab all those attributes, first inspect the parsed element, like this:
foreach($html->find('div[class=bar] a') as $a){
var_dump($a->attr);
}
...and see whether those attributes exist. data1 and data2 aren't standard HTML attributes, so maybe the parser discards them.
If they exist, you can read them like this:
foreach($html->find('div[class=bar] a') as $a){
$article = array($a->href, $a->innertext);
if (isset($a->attr['data1'])) {
$article['data1'] = $a->attr['data1'];
}
if (isset($a->attr['data2'])) {
$article['data2'] = $a->attr['data2'];
}
//...
$articles[] = $article;
}
To get both classes you can use a multiple selector, separated by a comma:
foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){
...
I know this question is old, but the OP asked how to get all the attributes in one statement. I just did this for a project I'm working on.
You can get all the attributes of an element with the getAllAttributes() method, which returns them as an array; the same array is available as the element's attr property.
In the example below I am grabbing all links, but you can use this with any element. NOTE: This also works with data- attributes, so an attribute called data-url is accessible as $e->attr['data-url'].
In your case the attributes you're looking for will be $e->attr['data1'] and $e->attr['data2']. Hope this helps someone, if not the OP.
Get all Attributes
$html = file_get_html('somefile.html');
foreach ($html->find('a') as $e) { //used a tag here, but use whatever you want
$e->getAllAttributes();
//testing that it worked
print_r($e->attr);
}
Check this code, which collects the data-filter-string attribute of every link:
<?php
$html = file_get_html('somefile.html');
$filters = array();
foreach ($html->find('a') as $e) {
$filters[] = $e->getAttribute('data-filter-string');
}
?>
$data1 = $html->find('.bar > a', 0)->attr['data1'];
$data2 = $html->find('.bar > a', 0)->attr['data2'];
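For comparison, the same task with PHP's DOM extension rather than Simple HTML DOM: every attribute of each link is copied into an array, and XPath's | union operator plays the role of the comma-separated selector list. linkAttributes() is a hypothetical helper, and the markup is the question's sample with its divs closed.

```php
<?php
// Sketch with PHP's DOM extension (not Simple HTML DOM): collect the text
// and every attribute of the links in div.bar and div.bar2.
function linkAttributes(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence HTML parse warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // "|" is XPath's union operator, the counterpart of a CSS "," selector list
    $links = $xpath->query("//div[@class='bar']/a | //div[@class='bar2']/a");

    $result = [];
    foreach ($links as $a) {
        $attrs = [];
        foreach ($a->attributes as $attr) { // DOMAttr objects
            $attrs[$attr->name] = $attr->value;
        }
        $result[] = ['text' => $a->textContent, 'attrs' => $attrs];
    }
    return $result;
}

$html = '<div id="foo">'
      . '<div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text</a></div>'
      . '<div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text</a></div>'
      . '</div>';
print_r(linkAttributes($html));
```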
