Can't get data-title, just data-slug - php

HTML
<article class="movie-summary" data-slug="slug-goes-here" data-title="This is a Title">
...
...
</article>
PHP
$html = file_get_html('example.com');
foreach ($html->find('article') as $data) {
    $property = 'data-title';
    echo $data->$property;
}
Hey all, I want to get the data-title attribute from every article on a particular site. When I use data-slug I get data back, yet when I use data-title I get nothing (the code above was put together with the help of this post).

If you look at the actual HTML code you are trying to parse (the link provided in the comments), you can see that it is not valid:
<article class="movie-summary hero" data-slug="aiyaary-hindi"data-title="Aiyaary">
...
</article>
That is, there is no space between the data-slug and data-title attributes. To fix this, I suggest adding the necessary spaces, like so:
function placeNeccessarySpaces($contents) {
    return preg_replace('/"data-title/', '" data-title', $contents);
}
This is similar to this answer. Then:
$contents = placeNeccessarySpaces(file_get_contents('http://example.com'));
$html = str_get_html($contents);
foreach ($html->find('article') as $data) {
    $property = 'data-title';
    echo $data->$property;
}
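As a side note, simple_html_dom nodes also expose hasAttribute() and getAttribute(), which read a bit more explicitly than the magic property access used above. A minimal sketch, assuming the missing spaces have already been patched as shown:
$contents = placeNeccessarySpaces(file_get_contents('http://example.com'));
$html = str_get_html($contents);
foreach ($html->find('article') as $data) {
    // hasAttribute()/getAttribute() are part of simple_html_dom's node API
    if ($data->hasAttribute('data-title')) {
        echo $data->getAttribute('data-title');
    }
}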

This is working fine for me; verified result:
<?php
include 'simple_html_dom.php';
$html = str_get_html('<article class="movie-summary" data-slug="slug-goes-here" data-title="This is a Title"></article>');
foreach ($html->find('article') as $data) {
    $property = 'data-title';
    echo $data->$property;
}
?>
I got the file 'simple_html_dom.php' from https://sourceforge.net/projects/simplehtmldom/files/
output:
This is a Title

Related

Fetch content of all div with same class using PHP Simple HTML DOM Parser

I am new to HTML DOM parsing with PHP. There is one page that has several divs with different content but the same class. When I try to fetch the content, I only get the content of one div. Is it possible to get the content of all the divs that share the same class? Please have a look at my code:
<?php
include(__DIR__."/simple_html_dom.php");
$html = file_get_html('http://campaignstudio.in/');
echo $x = $html->find('h2[class="section-heading"]',1)->outertext;
?>
In your example code, you have
echo $x = $html->find('h2[class="section-heading"]',1)->outertext;
As you are calling find() with a second parameter of 1, this will only return the single element at index 1 (the second match). If instead you find all of them, you can do whatever you need with them...
$list = $html->find('h2[class="section-heading"]');
foreach ($list as $item) {
    echo $item->outertext . PHP_EOL;
}
The full code I've just tested is...
include(__DIR__ . "/simple_html_dom.php");
$html = file_get_html('http://campaignstudio.in/');
$list = $html->find('h2[class="section-heading"]');
foreach ($list as $item) {
    echo $item->outertext . PHP_EOL;
}
which gives the output...
<h2 class="section-heading text-white">We've got what you need!</h2>
<h2 class="section-heading">At Your Service</h2>
<h2 class="section-heading">Let's Get In Touch!</h2>

Parsing HTML Table Data from XML with PHP

I am somewhat new with PHP, but can't really wrap my head around what I am doing wrong here given my situation.
Problem: I am trying to get the href of a certain HTML element within a string of characters inside an XML object/element via Reddit (if you visit this page, it would be the actual link of the video - not the reddit link but the external youtube link or whatever - nothing else).
Here is my code so far (code updated):
Update: Loop-mania! Got all of the hrefs, but am now trying to store them inside a global array to access a random one outside of this function.
function getXMLFeed() {
    echo "<h2>Reddit Items</h2><hr><br><br>";
    //$feedURL = file_get_contents('https://www.reddit.com/r/videos/.xml?limit=200');
    $feedURL = 'https://www.reddit.com/r/videos/.xml?limit=200';
    $xml = simplexml_load_file($feedURL);
    //define each xml entry from reddit as an item
    foreach ($xml->entry as $item) {
        foreach ($item->content as $content) {
            $newContent = (string)$content;
            $html = str_get_html($newContent);
            foreach ($html->find('table') as $table) {
                $links = $table->find('span', 0);
                //echo $links;
                foreach ($links->find('a') as $link) {
                    echo $link->href;
                }
            }
        }
    }
}
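As for storing the hrefs for later use, one option is to collect them into a local array and return it instead of relying on a global; roughly like this (an untested sketch, the function name getVideoLinks is just illustrative):
function getVideoLinks($feedURL) {
    $found = [];
    $xml = simplexml_load_file($feedURL);
    foreach ($xml->entry as $item) {
        foreach ($item->content as $content) {
            $html = str_get_html((string)$content);
            // grab every anchor inside the content tables, as in the loop above
            foreach ($html->find('table a') as $link) {
                $found[] = $link->href;
            }
        }
    }
    return $found;
}

$links = getVideoLinks('https://www.reddit.com/r/videos/.xml?limit=200');
if (!empty($links)) {
    echo $links[array_rand($links)]; // pick a random link outside the function
}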
XML Code:
http://pasted.co/0bcf49e8
I've also included JSON if it can be done this way; I just preferred XML:
http://pasted.co/f02180db
That is pretty much all of the code. Though, here is another piece I tried to use with DOMDocument (scrapped it).
foreach ($item->content as $content) {
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    $xpath = new DOMXPath($dom);
    $classname = "/html/body/table[1]/tbody/tr/td[2]/span[1]/a";
    foreach ($dom->getElementsByTagName('table') as $node) {
        echo $dom->saveHtml($node), PHP_EOL;
        //$originalURL = $node->getAttribute('href');
    }
    //$html = $dom->saveHTML();
}
I can parse the table fine, but when it comes to getting certain element's values (nothing has an ID or class), I can only seem to get ALL anchor tags or ALL table rows, etc.
Can anyone point me in the right direction? Let me know if there is anything else I can add here. Thanks!
Added HTML:
I am specifically trying to extract <span>[link]</span> from each table/item.
http://pastebin.com/QXa2i6qz
The following code can extract all the YouTube links from each entry's content.
function extract_youtube_link($xml) {
    $entries = $xml['entry'];
    $videos = [];
    foreach ($entries as $entry) {
        $content = html_entity_decode($entry['content']);
        preg_match_all('/<span><a href="(.*)">\[link\]/', $content, $matches);
        if (!empty($matches[1][0])) {
            $videos[] = array(
                'entry_title' => $entry['title'],
                'author' => preg_replace('/\/(.*)\//', '', $entry['author']['name']),
                'author_reddit_url' => $entry['author']['uri'],
                'video_url' => $matches[1][0]
            );
        }
    }
    return $videos;
}
$xml = simplexml_load_file('reddit.xml');
$xml = json_decode(json_encode($xml), true);
$videos = extract_youtube_link($xml);
foreach ($videos as $video) {
    echo "<p>Entry Title: {$video['entry_title']}</p>";
    echo "<p>Author: {$video['author']}</p>";
    echo "<p>Author URL: {$video['author_reddit_url']}</p>";
    echo "<p>Video URL: {$video['video_url']}</p>";
    echo "<br><br>";
}
The code returns a multidimensional array; each element contains entry_title, author, author_reddit_url and video_url. Hope it helps you!
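For illustration, the returned array has roughly this shape (the values below are placeholders, not real feed data):
$videos = array(
    array(
        'entry_title'       => 'Some video title',
        'author'            => 'some_redditor',
        'author_reddit_url' => 'https://www.reddit.com/user/some_redditor',
        'video_url'         => 'https://www.youtube.com/watch?v=XXXXXXXXXXX'
    ),
    // ... one sub-array per entry that contained a [link] anchor
);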
If you're looking for a specific element you don't need to parse the whole thing. One way of doing it could be to use the DOMXPath class and query the XML directly. The documentation should guide you through:
http://php.net/manual/es/class.domxpath.php
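For example, a rough sketch of that approach against one entry's decoded HTML content could look like this ($entryContent here is assumed to be a single <content> string from the feed; adjust the query to the real markup):
$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about reddit's markup
$dom->loadHTML(html_entity_decode($entryContent));
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// select the <a> whose text is "[link]" and read its href
foreach ($xpath->query('//span/a[text()="[link]"]') as $a) {
    echo $a->getAttribute('href'), PHP_EOL;
}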

Import data from xml

I'm trying to get data from an XML file using SimpleXML. So far I can get all the data except the images. How can I access a single image name?
<?php
$ur = "http://services2.jupix.co.uk/api/get_properties.php?clientID=35871cc1b6d9ec6237aaaf94aa0e0836&passphrase=cvYG9f";
$xml = simplexml_load_file($ur);
foreach ($xml->property as $property):
    var_dump($property->images->image);
    echo '<img src="' . $property->images->image . '">'; // this is not displaying
endforeach;
?>
My code output is shown below (originally posted as an image). How can I display image number 1?
public 1 => string 'http://media2.jupix.co.uk/v3/clients/657/properties/1356/IMG_1356_9_large.jpg' (length=77)
I think SimpleXMLElement::xpath can do what you are looking for:
You can give this a try:
<?php
$ur = "http://services2.jupix.co.uk/api/get_properties.php?clientID=35871cc1b6d9ec6237aaaf94aa0e0836&passphrase=cvYG9f";
$xml = simplexml_load_file($ur);
$image = $xml->xpath('//property/images/image[@modified="2014-07-23 14:22:05"]')[1]->__toString();
var_dump($image);
Or you can loop through all the images and check for the name that you are looking for:
$images = $xml->xpath('//property/images/image');
foreach ($images as $image) {
    $url = $image->__toString();
    if (false !== strpos($url, "_9_large.jpg")) {
        var_dump($url);
    }
}
If you want to get the second image of each /images section, then you could do it like this:
$images = $xml->xpath('//property/images');
foreach ($images as $image) {
    // assign first: isset() cannot be used directly on the result of a function call
    $children = $image->children();
    if (isset($children[1])) {
        var_dump($children[1]->__toString());
    }
}
Thanks guys, I found a solution to my problem.
Looking back at the question, it seems I did not phrase it properly.
All I wanted was to display the images within that section. XPath was not necessary, but I have learned from it. Here is my solution; if you can improve it, you are most welcome.
$url ="http://services2.jupix.co.uk/api/get_properties.php?clientID=35871cc1b6d9ec6237aaaf94aa0e0836&passphrase=cvYG9f";
$xml = simplexml_load_file($url);
foreach ($xml->property as $property):
?>
<li>
<h3> <?php echo $property->addressStreet;?> </h3>
<?php
$imgCount = count($property->images->image);
for ($i=0; $i < $imgCount; $i++) { ?>
<img src="<?php echo $property->images->image[$i];?>">
<?php } ?>
<p><?php echo limit_text($property->fullDescription,30);?></p>
<h4>£ <?php echo $property->price;?> </h4>
</li>
<?php endforeach; ?>
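A small possible simplification: SimpleXML lets you foreach over the repeated image nodes directly, so the counter isn't strictly needed (same output, just a different loop style):
<?php foreach ($property->images->image as $image) { ?>
    <img src="<?php echo $image; ?>">
<?php } ?>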

how to get href from within element using php and simple html dom

I have an html page that looks a bit like this
xxxx
google!
<div class="big-div">
<a href="http://www.url.com/123" title="123">
<div class="little-div">xxx</div></a>
<a href="http://www.url.com/456" title="456">
<div class="little-div">xxx</div></a>
</div>
xxxx
I am trying to pull all of the hrefs out of the big-div. I can get all the hrefs out of the whole page by using code like the following.
$links = $html->find('a');
foreach ($links as $link)
{
    echo $link->href . '<br>';
}
But how do I get only the hrefs within the div "big-div"?
Edit:
I think I got it. For those that care:
foreach ($html->find('div[class=big-div]') as $element) {
    $links = $element->find('a');
    foreach ($links as $link) {
        echo $link->href . '<br>';
    }
}
The documentation is useful:
$html->find(".big-div")->find('a');
And then proceed to get the href and whatever other attributes you are interested in.
Edit: The above would be the general idea. I've never used Simple HTML DOM, so perhaps you need to tweak the syntax somewhat. Try:
foreach ($html->find('.big-div') as $bigDiv) {
    foreach ($bigDiv->find('a') as $link) { // find() returns an array of nodes
        echo $link->href . '<br>';
    }
}
or perhaps:
$bigDivs = $html->find('.big-div');
foreach ($bigDivs as $div) {
    foreach ($div->find('a') as $link) {
        echo $link->href . '<br>';
    }
}
Quick flip - put this in your foreach
$image = $html->find('.big-div')->href;

php simple html dom parser doesn't return anything

Why won't my script return the div with the id of "pp-featured"?
<?php
# create and load the HTML
include('lib/simple_html_dom.php');
$html = new simple_html_dom();
$html->load("http://maps.google.com/maps/place?cid=6703996311168776503&q=hills+garage&hl=en&view=feature&mcsrc=google_reviews&num=20&start=0&ved=0CFUQtQU&sa=X&ei=sCq_Tr3mJZToygTOmuCGCg");
$ret = $html->find('div[id=pp-featured]');
# output it!
echo $ret->save();
?>
This gets me on my way. Thanks for your help.
<?php
include_once 'lib/simple_html_dom.php';
$url = "http://maps.google.com/maps/place?cid=6703996311168776503&q=hills+garage&hl=en&view=feature&mcsrc=google_reviews&num=20&start=0&ved=0CFUQtQU&sa=X&ei=sCq_Tr3mJZToygTOmuCGCg";
$html = file_get_html($url);
$ret = $html->find('div[id=pp-reviews]');
foreach ($ret as $story)
    echo $story;
?>
The library always returns an array because it may be possible that more than one item matches the selector.
If you expect only one, you should check to ensure the page you're analyzing is behaving as expected.
Suggested solution:
<?php
include_once 'lib/simple_html_dom.php';
$url = "http://maps.google.com/maps/place?cid=6703996311168776503&q=hills+garage&hl=en&view=feature&mcsrc=google_reviews&num=20&start=0&ved=0CFUQtQU&sa=X&ei=sCq_Tr3mJZToygTOmuCGCg";
$html = file_get_html($url);
$ret = $html->find('div[id=pp-reviews]');
if (count($ret) == 1) {
    echo $ret[0]->outertext; // individual nodes have no save(); print the node's HTML instead
} else {
    echo "Something went wrong";
}
