So let's say I have:
<?php
$template = '<img src="{image}" editable="all image_all" />';
$template .= '<div>';
$template .= '<img src="{image}" editable="yes" />';
$template .= '</div>';
?>
Now what I would like is to make the script go through all the elements whose src is {image} and check whether any of them have the
editable="all"
attribute.
If so: get the second editable attribute e.g.
image_all
And include that into the src.
This task can be simplified with a library suggested in the comments, Simple HTML DOM Parser:
It is as easy as this:
$images = array(); // collects the img elements whose src is {image}
$html = str_get_html("..."); // parse the markup; requires simple_html_dom.php
foreach ($html->find('img') as $element) {
    if ($element->src == '{image}') {
        // add to the collection
        $images[] = $element;
    }
    // The editable attribute can be compared the same way as above.
}
If you want to get the second value of the editable attribute and save it in an array such as $src, try this code:
$content=new DOMDocument();
$content->loadHTML($template);
$elements=simplexml_import_dom($content);
$images=$elements->xpath('//img');
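$src = array();
foreach ($images as $img) {
// "all image_all" starts with "all "; dropping those 4 characters leaves the second value
if (preg_match('/^all /i', (string) $img['editable']))
$src[] = substr((string) $img['editable'], 4);
}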
print_r($src);
will output:
Array ( [0] => image_all )
Try this,
include('simple_html_dom.php');
$html = str_get_html('<div><img src="{image}" editable="all image_all" /><img src="{image}" editable="yes" /></div>');
$second_args= array();
foreach($html->find('img[src="{image}"]') as $element){
$editables = explode(' ',$element->editable);
if($editables[0] === "all"){
$second_args[] = $editables[1];
}
}
print_r($second_args);
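For comparison, the same lookup can probably be done with PHP's bundled DOM extension, without a third-party library. This is only a sketch under a few assumptions: the helper name collectEditableTargets is made up for illustration, the template is wrapped in a single <div> as in the answer above, and the editable values are assumed to be separated by whitespace.

```php
<?php
// Hypothetical helper (not from the original answers): returns the second
// token of every editable="all ..." attribute on <img src="{image}"> elements.
function collectEditableTargets(string $template): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($template);
    libxml_clear_errors();

    $found = array();
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//img[@src="{image}"]') as $img) {
        // "all image_all" becomes array("all", "image_all")
        $tokens = preg_split('/\s+/', trim($img->getAttribute('editable')));
        if ($tokens[0] === 'all' && isset($tokens[1])) {
            $found[] = $tokens[1];
        }
    }
    return $found;
}

$template = '<div><img src="{image}" editable="all image_all" /><img src="{image}" editable="yes" /></div>';
$result = collectEditableTargets($template);
print_r($result);
```

To actually write the value into the src, calling $img->setAttribute('src', $tokens[1]) inside the if block and then saving the document with saveHTML() should work.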
I have looked for this answer to this question on here but I can't seem to find anything which is relevant to this particular issue.
I am currently using simpleXML to parse an RSS feed, in order to return thumbnail images by going through the nodes to parse "media:thumbnail". I have managed to do this and return all thumbnail URLs, so I know that I am getting to the right content, like so:
<?php
$url = "http://feeds.bbci.co.uk/news/rss.xml?edition=uk";
$xml = simplexml_load_file($url);
foreach($xml->channel->item as $item) {
$media = $item->children('media', 'http://search.yahoo.com/mrss/');
foreach($media->thumbnail as $thumb) {
echo $thumb->attributes()->url;
}
}
?>
This echoes all the image URLs. But when I store the URL in a variable and try to echo it later as the img src, it only returns one image rather than all of them:
<?php
$url = "http://feeds.bbci.co.uk/news/rss.xml?edition=uk";
$xml = simplexml_load_file($url);
foreach($xml->channel->item as $item) {
$media = $item->children('media', 'http://search.yahoo.com/mrss/');
foreach($media->thumbnail as $thumb) {
$image = $thumb->attributes()->url;
}
}
?>
<div><img src = <?php echo $image; ?> /></div>
How can I echo all of the URLs into individual images? Thanks for looking.
Since you're getting and expecting multiple image urls, might as well store them inside an array:
$images_container = array();
foreach($xml->channel->item as $item) {
$media = $item->children('media', 'http://search.yahoo.com/mrss/');
foreach($media->thumbnail as $thumb) {
$image = $thumb->attributes()->url;
$images_container[] = (string) $image;
}
}
echo '<pre>', print_r($images_container, 1), '</pre>';
Now of course, if you want to process that array of image URL strings, just loop over the container:
<?php foreach($images_container as $url): ?>
<div><img src="<?php echo $url; ?>" alt="" /></div>
<?php endforeach; ?>
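Since the URLs come from an external feed, it is probably worth escaping them before printing them into markup. A minimal sketch, assuming a hypothetical helper named renderThumbnails and a made-up sample URL:

```php
<?php
// Hypothetical helper: builds one <div><img></div> per URL, escaping each
// URL so that characters such as & or " cannot break the attribute.
function renderThumbnails(array $urls): string
{
    $out = '';
    foreach ($urls as $url) {
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $out .= '<div><img src="' . $safe . '" alt="" /></div>' . "\n";
    }
    return $out;
}

// Sample URL invented for illustration.
$markup = renderThumbnails(array('http://example.com/a.jpg?x=1&y=2'));
echo $markup;
```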
Try xpath.
$url = "http://feeds.bbci.co.uk/news/rss.xml?edition=uk";
$xml = simplexml_load_file($url);
$xml->registerXPathNamespace( 'media', 'http://search.yahoo.com/mrss/' );
// get only thumbnails of specified width
$xpath = $xml->xpath( '//media:thumbnail[@url and @width=144]' );
/**
* The above xpath will get only thumbnails of width 144
*/
foreach( $xpath as $node ) {
echo '<div><img src="' . $node['url'] . '" /></div>';
}
Hope that helps.
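To try the XPath filter without hitting the network, something like the following should do. The feed fragment here is invented for illustration and only mimics the shape of a media:thumbnail entry; it is not taken from the BBC feed.

```php
<?php
// Made-up feed fragment with two thumbnails of different widths.
$rss = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item><media:thumbnail width="66" height="49" url="http://example.com/small.jpg"/></item>
    <item><media:thumbnail width="144" height="81" url="http://example.com/large.jpg"/></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);
$xml->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');

// @url and @width address attributes; only the 144-wide thumbnail matches.
$urls = array();
foreach ($xml->xpath('//media:thumbnail[@url and @width=144]') as $node) {
    $urls[] = (string) $node['url'];
}
print_r($urls);
```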
I have code like this:
<div class="x-ic test"><div class="abc">Test</div><table>.......</table><p>....</p><div style="margin:5px;"></div></div>
I tried a few patterns but I didn't get the result I want.
I need to get the result below:
<div class="abc">Test</div><table>.......</table><p>....</p><div style="margin:5px;"></div>
Use DOMDocument - I've written the code into functions in case you want to perform similar operations again:
function DOMinnerHTML(DOMNode $element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$innerHTML .= $element->ownerDocument->saveHTML($child);
}
return $innerHTML;
}
function getElContentsByTagClass($html,$tag,$class)
{
$doc = new DOMDocument();
$doc->loadHTML($html);//Turn the $html string into a DOM document
$els = $doc->getElementsByTagName($tag); //Find the elements matching our tag name ("div" in this example)
foreach($els as $el)
{
//for each element, get the class, and if it matches return its contents
$classAttr = $el->getAttribute("class");
//preg_quote() keeps characters in $class from being read as regex syntax
if(preg_match('#\b'.preg_quote($class, '#').'\b#',$classAttr) > 0) return DOMinnerHTML($el);
}
}
//Calling it:
$html = '<div class="x-ic test"><div class="abc">Test</div><table>.......</table><p>....</p><div style="margin:5px;"></div></div>';
$ret = getElContentsByTagClass($html,'div','x-ic test');//<div class="abc">Test</div><table>.......</table><p>....</p><div style="margin:5px;"></div>
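A variation on the same idea is to let XPath do the class matching. The sketch below is not the answer's method: innerHtmlByClass is a hypothetical helper, it matches a single class token (such as x-ic) rather than the whole attribute string, and it assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper: inner HTML of the first element carrying the given
// class token, or null when nothing matches.
function innerHtmlByClass(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Pad the class attribute with spaces so that only whole tokens match.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $node = $xpath->query($query)->item(0);
    if ($node === null) {
        return null;
    }

    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$html = '<div class="x-ic test"><div class="abc">Test</div><table>.......</table><p>....</p><div style="margin:5px;"></div></div>';
$inner = innerHtmlByClass($html, 'x-ic');
echo $inner;
```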
To achieve this with regex:
<?php
$html = '<div class="x-ic test"><div class="abc">Test</div><table>.......</table><p>....</p><div style="margin:5px;"></div></div>';
$split = preg_replace('#^<div class="x-ic test">|</div>$#', '' , $html);
var_dump($split);
?>
The easy and advised way to manipulate HTML is to use the DOM. I personally use PHP Simple HTML DOM Parser for that.
But since I'm practicing regex, here's an answer using regex :)
$text = '<div class="x-ic test"><div class="abc">Test</div><table>.......</table><p>....</p><div style="margin:5px;"></div></div>';
$text = preg_replace('/^(.*)(<div class="abc">.*)(<\/div>)$/', '$2', $text);
// OUTPUT:
<div class="abc">Test</div><table>.......</table><p>....</p><div style="margin:5px;"></div>
Cheers :)
I am trying to write a preg_replace that will clean all tag properties of the allowed tags, and all tags that do not exist in the allowed list.
Basic example- this:
<p style="some styling here">Test<div class="button">Button Text</div></p>
would turn out to be:
<p>test</p>
I have this working well, except for img tags and a (href) tags. I need to leave the properties of the img and a tags untouched, and possibly others. I was not sure if there was a way to set two allow lists?
1) One list for what tags are allowed to stay after being cleaned
2) One list for the tags that are allowed but left alone?
3) The rest are deleted.
Here is the script I am working on:
$string = '<p style="width: 250px;">This is some text<div class="button">This is the button</div><br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and this is a link</p>';
$output = strip_tags($string, '<p><b><br><img><a>');
$output = preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i", '<$1$2$3$4$5>', $output);
echo $output;
This script should clean the $string to be:
<p>This is some text<br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and this is a link</p>
http://ideone.com/aoOOUN
This function will strip an element of disallowed sub elements, clean its "stripped" sub elements, and leave the rest (recursively).
function clean($element, $allowed, $stripped){
if(!is_array($allowed) || ! is_array($stripped)) return;
if(!$element)return;
$toDelete = array();
foreach($element->childNodes as $child){
if(!isset($child->tagName))continue;
$n = $child->tagName;
if ($n && !in_array($n, $allowed) && !in_array($n, $stripped)){
$toDelete[] = $child;
continue;
}
if($n && in_array($n, $stripped)){
$attr = array();
foreach($child->attributes as $a)
$attr[] = $a->nodeName;
foreach($attr as $a)
$child->removeAttribute($a);
}
clean($child, $allowed, $stripped);
}
foreach ($toDelete as $del)
$element->removeChild($del);
}
This is the code to clean your string:
$xhtml = '<p style="width: 250px;">This is some text<div class="button">This is the button</div><br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and this is a link</p>';
$dom = new DOMDocument();
$dom->loadHTML($xhtml);
$body = $dom->getElementsByTagName('body')->item(0);
clean($body, array('img', 'a'), array('p', 'br', 'b'));
echo preg_replace('#^.*?<body>(.*?)</body>.*$#s', '$1', $dom->saveHTML($body));
You should check out the Documentation for PHP's DOM classes
I want to be able to extract only the src of the second image in an html file. I am using the PHP DOM parser:
foreach($html->find('img[src]') as $element)
$src = $element->getAttribute('src');
echo $src;
However, I am getting the src of the last image in the page, instead of the one I am looking for.
Can I display only a specific src outside of the foreach loop?
Your loop is missing braces ({}), so it is equivalent to
foreach($html->find('img[src]') as $element) {
$src = $element->getAttribute('src');
}
echo $src;
so, the echo gets the $src after the last iteration of your loop, which is the last element.
Using the example from their website, I'd go with this (braces are key here):
$count = 1;
foreach($html->find('img') as $element) {
if ($count == 2) {
echo $element->src;
break;
}
$count += 1;
}
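If only one image is needed, the loop can be skipped by indexing into the result. The sketch below swaps in PHP's built-in DOMDocument rather than the Simple HTML DOM library used in the question, and the markup is a made-up sample; item(1) is the second element because the index is zero-based.

```php
<?php
// Sample markup invented for illustration.
$markup = '<p><img src="first.png"><img src="second.png"><img src="third.png"></p>';

$doc = new DOMDocument();
$doc->loadHTML($markup);

// item() is zero-based, so index 1 is the second <img>; it returns null
// when the page has fewer than two images.
$second = $doc->getElementsByTagName('img')->item(1);
$src = $second !== null ? $second->getAttribute('src') : null;
echo $src;
```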
Thanks for taking the time to read my post... I'm trying to extract some information from my website using Simple HTML Dom...
I have it reading from the HTML source ok, now I'm just trying to extract the information that I need. I have a feeling I'm going about this in the wrong way... Here's my script...
<?php
include_once('simple_html_dom.php');
// create doctype
$dom = new DOMDocument("1.0");
// display document in browser as plain text
// for readability purposes
//header("Content-Type: text/plain");
// create root element
$xmlProducts = $dom->createElement("products");
$dom->appendChild($xmlProducts);
$html = file_get_html('http://myshop.com/small_houses.html');
$html .= file_get_html('http://myshop.com/medium_houses.html');
$html .= file_get_html('http://myshop.com/large_houses.html');
//Define my variable for later
$product['image'] = '';
$product['title'] = '';
$product['description'] = '';
foreach($html->find('img') as $src){
if (strpos($src->src,"http://myshop.com") === false) {
$src->src = "http://myshop.com/$src->src";
}
$product['image'] = $src->src;
}
foreach($html->find('p[class*=imAlign_left]') as $description){
$product['description'] = $description->innertext;
}
foreach($html->find('span[class*=fc3]') as $title){
$product['title'] = $title->innertext;
}
echo $product['img'];
echo $product['description'];
echo $product['title'];
?>
I put echo statements at the end for the sake of testing, but I'm not getting anything. Any pointers would be a great help!
Thanks
Charles
file_get_html() returns an HTML DOM object, and you cannot concatenate objects. Although these objects have __toString methods, when they are concatenated the result is more than likely corrupted in some way. Try the following:
<?php
include_once('simple_html_dom.php');
// create doctype
$dom = new DOMDocument("1.0");
// display document in browser as plain text
// for readability purposes
//header("Content-Type: text/plain");
// create root element
$xmlProducts = $dom->createElement("products");
$dom->appendChild($xmlProducts);
$pages = array(
'http://myshop.com/small_houses.html',
'http://myshop.com/medium_houses.html',
'http://myshop.com/large_houses.html'
);
foreach($pages as $page)
{
$product = array();
$source = file_get_html($page);
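foreach($source->find('img') as $src)
{
// Prefix relative paths with the shop URL; keep already-absolute URLs as they are
if (strpos($src->src,"http://myshop.com") === false)
{
$src->src = "http://myshop.com/$src->src";
}
$product['image'] = $src->src;
}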
foreach($source->find('p[class*=imAlign_left]') as $description)
{
$product['description'] = $description->innertext;
}
foreach($source->find('span[class*=fc3]') as $title)
{
$product['title'] = $title->innertext;
}
//debug purposes!
echo "Current Page: " . $page . "\n";
print_r($product);
echo "\n\n\n"; //Clear separator
}
?>
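The URL prefixing could also be pulled out into a small function. This is a sketch, not part of the answer above: absoluteUrl is a hypothetical name, and unlike the strpos check it treats any src that starts with http:// or https:// as already absolute.

```php
<?php
// Hypothetical helper: returns $src unchanged when it is already an absolute
// http(s) URL, otherwise joins it onto $base with exactly one slash.
function absoluteUrl(string $src, string $base = 'http://myshop.com'): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }
    return rtrim($base, '/') . '/' . ltrim($src, '/');
}

echo absoluteUrl('images/house.jpg'), "\n";
echo absoluteUrl('http://myshop.com/images/house.jpg'), "\n";
```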