I have an RSS feed from which I'm trying to extract data through SimplePie (in WordPress).
I need to extract the content tag. It works with <?php echo $item->get_content(); ?>, but that outputs all of this (this is just one entry; the others have the same structure):
<table><tr valign="top">
<td width="67">
<a href="http://www.anobii.com/books/Lapproccio_sistemico_al_governo_dellimpresa/9788813230944/014c5c45a7ddaab1ec/" style="border: 1px solid #333333">
<img src="http://image.anobii.com/anobi/image_book.php?type=3&item_id=014c5c45a7ddaab1ec&time=0">
</a>
</td><td style="margin-left: 10px;padding-left: 10px">[person name] put "[title]" onto shelf<br/></td></tr></table>
What I need, though, is just the value of the src attribute (the image URL). How can I extract only that?
You can do it using DOMDocument (the best way):
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed markup
$imgs = $doc->getElementsByTagName('img');
$res = $imgs->item(0)->getAttribute('src');
print_r($res);
With a regex (the bad way):
if (preg_match('~\bsrc\s*=\s*["\']\K[^"\']*+~i', $html, $match))
print_r($match);
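The DOMDocument approach above can be wrapped in a small helper that also copes with entries that have no image. This is only a sketch: first_img_src() is a hypothetical name, not part of SimplePie or WordPress, and the sample URL at the bottom is made up.

```php
<?php
// Hypothetical helper: returns the src of the first <img> in $html, or null if there is none.
function first_img_src(string $html): ?string
{
    // loadHTML() rejects an empty string on PHP 8, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $doc->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }
    return $img->getAttribute('src');
}

// Made-up sample markup with the same shape as one feed entry.
$sample = '<table><tr><td><a href="#"><img src="http://example.com/cover.jpg"></a></td></tr></table>';
echo first_img_src($sample), PHP_EOL;
```

Inside the SimplePie loop you would presumably call it as first_img_src($item->get_content()) and skip entries where it returns null.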
I have HTML like the sample below, and I'm using PHP.
<table style="...">
<tbody>
<tr> <img id="foo" src="foo"/></tr>
</tbody>
</table>
<p> ....</p>
<table style="...">
<tbody>
<tr> <img id="bar" src="bar"/></tr>
</tbody>
</table>
I'm a PHP beginner.
I want to find a specific table, for example the one whose img src or id equals foo or bar,
but my patterns select both tables.
Here are my regexes:
1. Find tables that have an img tag:
/<table.*?>.*?<img *.*?<\/table>/
-> selects both tables
2. Add the img src:
<table.*?<img.+(src=.*?foo).*?<\/table>
-> selects everything, from the first tag to the last tag
3. So I tried not to include </table> between the ... tags:
<table.*?(?!<\/table>).*?<img.+(src=.*?foo).*?<\/table>
-> same result
I don't know what is wrong!
I solved it using preg_match_all(), but I still want to know how to do it with preg_match().
Any ideas?
Thanks!
This job is much better suited to PHP's DOMDocument and DOMXPath classes. In this case we use an XPath expression to search for a table that has a descendant img whose src attribute equals 'foo' or 'bar':
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$footable = $xpath->query("//table[descendant::img[@src='foo']]");
echo $footable->item(0)->C14N() . "\n";
$bartable = $xpath->query("//table[descendant::img[@src='bar']]");
echo $bartable->item(0)->C14N() . "\n";
Output:
<table style="..."><tbody><tr><img id="foo" src="foo"></img></tr></tbody></table>
<table style="..."><tbody><tr><img id="bar" src="bar"></img></tr></tbody></table>
Demo on 3v4l.org
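Since the question asks for a match on either src or id, the two conditions can be combined in one predicate with the XPath or operator. A minimal sketch, assuming the img elements sit inside td cells (the td wrappers are my addition to keep the sample markup valid):

```php
<?php
$html = <<<'HTML'
<table style="..."><tbody><tr><td><img id="foo" src="foo"/></td></tr></tbody></table>
<p>....</p>
<table style="..."><tbody><tr><td><img id="bar" src="bar"/></td></tr></tbody></table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Tables containing an img whose src OR id equals 'foo'.
$tables = $xpath->query("//table[.//img[@src='foo' or @id='foo']]");

$matchedIds = [];
foreach ($tables as $table) {
    // Query relative to the matched table to read its image.
    $img = $xpath->query('.//img', $table)->item(0);
    $matchedIds[] = $img->getAttribute('id');
}
print_r($matchedIds);
```

Only the first table is selected here, because the predicate is evaluated per table rather than across the whole string as the regex was.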
I have the Summernote WYSIWYG plugin. Whenever I add an image, it converts the image into
<img data-filename="Untitled-1.png" src="" style="width: 645px;">
Now all I want is to detect the first such tag, get its src value, and store it in the db to show it as a featured image.
For example, if there are two img data-filename tags
<img data-filename="Untitled-1.png" src="" style="width: 645px;">
<img data-filename="Untitled-2.png" src="" style="width: 645px;">
I want to get the src value of Untitled-1.png only, not Untitled-2.png.
Here is what I've tried:
preg_match('/(<img .*?>)/', $go, $img_tag);
$feature = $img_tag[0];
Use DOMDocument and DOMXPath to easily target what you want using the HTML structure:
$content = <<<'EOD'
<img data-filename="Untitled-1.png" src="" style="width: 645px;">
<img data-filename="Untitled-2.png" src="" style="width: 645px;">
EOD;
$dom = new DOMDocument;
$dom->loadHTML($content);
$xp = new DOMXPath($dom);
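$result = $xp->evaluate('string(//img[@data-filename]/@src)');
# //img             : an img node anywhere in the DOM tree
# [@data-filename]  : predicate, the node must have a data-filename attribute
# /@src             : the src attribute of that node
# string(...) returns the value of the first matching node only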
if (!empty($result))
echo $result, PHP_EOL;
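If you also need the file name that goes with the featured image, you can select the first matching element itself instead of only its src string. A sketch under assumptions: the data URIs below are placeholders I made up (the src values in the question are empty), and $featured is a hypothetical variable name.

```php
<?php
$content = <<<'EOD'
<img data-filename="Untitled-1.png" src="data:image/png;base64,AAAA" style="width: 645px;">
<img data-filename="Untitled-2.png" src="data:image/png;base64,BBBB" style="width: 645px;">
EOD;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($content);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// (...)[1] takes the first img with a data-filename attribute in document order.
$first = $xp->query('(//img[@data-filename])[1]')->item(0);

$featured = null;
if ($first instanceof DOMElement) {
    $featured = [
        'filename' => $first->getAttribute('data-filename'),
        'src'      => $first->getAttribute('src'),
    ];
}
print_r($featured);
```

$featured stays null when the content contains no such image, so check it before writing to the db.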
Hi, I am trying to wrap images containing a specific class (pinthis in this example) in a span to which I will add info for schema. This is a basic example and I will need to inject other schema info as well. To get me started, though, can anyone help me get from my existing code to my example output? I need to update multiple pages dynamically, and some of the content will come via PHP from WordPress taxonomies and other data, so I would prefer to do it in PHP if possible.
<p>
<a class="fancybox" rel="gallery1" href="image.jpg">
<img src="img.jpg" alt="alt text" width="1000" height="1000" class="various classes including ... pinthis">
</a>
</p>
Which I would like to become...
<p>
<a class="fancybox" rel="gallery1" href="image.jpg">
<span itemscope itemtype="http://schema.org/ImageObject">
<img src="img.jpg" alt="alt text" width="1000" height="1000" class="various classes including ... pinthis">
</span>
</a>
</p>
I think if someone could point me in the right direction and give me a push start that would give me enough to carry on from there
Many thanks.
Using PHP DOMDocument, you could do something like this:
$html = '<p><a class="fancybox" rel="gallery1" href="image.jpg"><img src="img.jpg" alt="alt text" width="1000" height="1000" class="various classes pinthis"></a></p>';
// Create a DOMDocument and load the HTML.
$dom = new DOMDocument();
$dom->loadHTML($html);
// Create the span wrapper.
$span = $dom->createElement('span');
$span->setAttribute('itemscope', '');
$span->setAttribute('itemtype', 'http://schema.org/ImageObject');
// Get all the images.
$images = $dom->getElementsByTagName('img');
// Loop the images.
foreach ($images as $image) {
// Only affect those with the pinthis class.
if (strpos($image->getAttribute('class'), 'pinthis') !== false) {
// Clone the span if we need to use it often.
$span_clone = $span->cloneNode();
// Replace the image tag with the span tag.
$image->parentNode->replaceChild($span_clone, $image);
// Add the image tag as a child of the new span tag.
$span_clone->appendChild($image);
}
}
// Get your HTML with saveHTML()
$html = $dom->saveHTML();
echo $html;
Just modify the code to suit your specific needs, for example if you need to change the span tag's attributes or the class you search for. You might even want to make a function where you can specify your class and span attributes.
Documentation to DOMDocument: http://php.net/manual/en/class.domdocument.php
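Such a function could look roughly like the sketch below. The name wrap_images_with_class() and its parameters are hypothetical. It differs from the code above in two ways: it compares whole class tokens instead of using strpos(), and it copies the node list into an array before moving nodes around.

```php
<?php
// Hypothetical helper: wraps every <img> carrying $class in a <$tag> element
// that has the given attributes, and returns the resulting HTML document.
function wrap_images_with_class(string $html, string $class, array $attributes, string $tag = 'span'): string
{
    // loadHTML() rejects an empty string on PHP 8, so return it unchanged.
    if (trim($html) === '') {
        return $html;
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array so that moving nodes cannot disturb the loop.
    $images = iterator_to_array($dom->getElementsByTagName('img'));

    foreach ($images as $image) {
        // Compare whole class tokens so that "pinthis" does not match "pinthis-not".
        $classes = preg_split('/\s+/', trim($image->getAttribute('class')));
        if (!in_array($class, $classes, true)) {
            continue;
        }

        $wrapper = $dom->createElement($tag);
        foreach ($attributes as $name => $value) {
            $wrapper->setAttribute($name, $value);
        }

        // Put the wrapper where the image was, then move the image into it.
        $image->parentNode->replaceChild($wrapper, $image);
        $wrapper->appendChild($image);
    }

    return $dom->saveHTML();
}

$out = wrap_images_with_class(
    '<p><a href="image.jpg"><img src="img.jpg" class="various pinthis"></a><img src="other.jpg" class="plain"></p>',
    'pinthis',
    ['itemscope' => '', 'itemtype' => 'http://schema.org/ImageObject']
);
echo $out;
```

Note that saveHTML() returns a full document (doctype, html and body included), the same as in the answer above.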
Use wrapAll:
check whether the image has the required class
if the image has the class, wrap it with the desired <span></span>
Try it this way:
if ($('img.classes').hasClass('pinthis')){
$('img.classes').wrapAll('<span itemscope itemtype="http://schema.org/ImageObject"></span>');
}
Fiddle Demo
helpful thread : jquery, wrap elements inside a div
I have the following code snippet, which essentially parses my blog site and stores some information as variables:
global $articles;
$items = $html->find('div[class=blogpost]');
foreach($items as $post) {
$articles[] = array($post->children(0)->innertext,
$post->children(1)->first_child()->outertext);
}
foreach($articles as $item) {
echo $item[0];
echo $item[1];
echo "<br>";
}
The above code outputs as follows:
Title of blog post 1 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/cool_news" id="963" target="_blank" >Click here for news</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Title of blog post 2 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/neato" id="963" target="_blank" >Click here for neato</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
Title of blog post 3 <script type="text/javascript">execute_function(3,'')</script><a href="http://www.example.com/lame" id="963" target="_blank" >Click here for lame</a> <img src="/news.gif" width="12" height="12" title="validated" /><span class="title">
with $item[0] containing "Title of blog post X" and $item[1] containing the rest.
What I want to do is parse $item[1] and retain only the URL contained within it as a separate variable. Perhaps I am not phrasing my question correctly, but I cannot find anything that can help me figure this out.
Can anyone help me?
If you were to parse $item[1] into whatever DOM crawler object you were using for $html, you could use the following XPath
$item[1]->find('//a[0]/@href');
which will return
href="http://www.example.com/cool_news"
Then extract the url however you want, with PHP or refine the XPath query. Not sure what the XPath would be to get the value, perhaps someone might be able to expand on that one.
EDIT: Seeing as you are using Simple DOM Parser, try the following
$blogItemHtml = new simple_html_dom();
$blogItemHtml->load($item[1]);
$anchors = $blogItemHtml->find('a');
echo $anchors[0]->href; // "http://www.example.com/cool_news"
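On the open question of an XPath that returns the value itself: with PHP's built-in DOMXPath (rather than Simple HTML DOM), wrapping the attribute path in string() should do it. A sketch, assuming $fragment holds what the question calls $item[1]; note that XPath positions are 1-based, so the first anchor is [1].

```php
<?php
// Assumed to be the content of $item[1] from the question.
$fragment = '<script type="text/javascript">execute_function(3,\'\')</script>'
    . '<a href="http://www.example.com/cool_news" id="963" target="_blank" >Click here for news</a> '
    . '<img src="/news.gif" width="12" height="12" title="validated" />';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($fragment);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// string() converts the first matching attribute node to its value;
// it yields an empty string when there is no anchor at all.
$url = $xpath->evaluate('string((//a)[1]/@href)');

echo $url, PHP_EOL;
```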
I fetch a value from the db like:
<p><img alt="" src="images/1.jpg" style="width: 2450px; height: 1054px;" /></p>
and want to get only src="images/1.jpg", but I don't know how. Please guide me.
If you need the source, use a DOM Parser:
// Construct a new DOMDocument with your fragment
$domDoc = new DOMDocument;
$domDoc->loadHTML( '<p><img src="images/1.jpg" style="width: 2450px;" /></p>' );
// Locate the first image in the document
$img = $domDoc->getElementsByTagName( "img" )->item( 0 );
// Echo its src value
echo $img->attributes->getNamedItem( "src" )->nodeValue;
Results: http://codepad.org/oMXGK9Iu
Ideally you would ensure the image element exists before accessing item #0. Likewise, you would ensure the attribute exists before just leaping out and grabbing it.
Further reading: http://www.php.net/manual/en/class.domdocument.php
If you just want to grab that particular portion of the text, you could use a simple regular expression:
// Prep our html
$html = '<p><img src="images/1.jpg" style="width: 2450px;" /></p>';
// Look for the source string
preg_match( '/src=\".*?\"/', $html, $matches );
// If we found it, spit it out.
echo $matches ? $matches[0] : "No source";
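The pattern above returns the whole src="..." pair. If you want only the path, a capture group does it. This is a sketch for simple, trusted markup like the sample, not a general HTML parser; the pattern is my own and assumes the attribute value is quoted.

```php
<?php
// Prep our html
$html = '<p><img alt="" src="images/1.jpg" style="width: 2450px; height: 1054px;" /></p>';

$src = null;
// Group 1 remembers the opening quote so the same quote must close the value;
// group 2 is the attribute value itself.
if (preg_match('/<img\b[^>]*\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $m)) {
    $src = $m[2];
}

echo $src ?? 'No source', PHP_EOL;
```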
If alt="" is always empty and the style is always width: 2450px; height: 1054px;, you could use:
<?php
$str = '<p><img alt="" src="images/1.jpg" style="width: 2450px; height: 1054px;" /></p>';
$str = str_replace('<p><img alt="" src="','', $str);
$str = str_replace('" style="width: 2450px; height: 1054px;" /></p>','',$str);
echo $str; //Outputs: images/1.jpg
?>