We load the HTML from a URL and then create a DOMDocument:
libxml_use_internal_errors(true); // collect libxml parse errors instead of emitting warnings
$oHtml = new DOMDocument();
if (!$oHtml->loadHTML($this->getHtml($aData['href']))) {
return false;
}
The next step is to remove the Fancybox (or other popup) links. In our case the image markup is:
<a onclick="return hs.expand(this)" href="http://domain.com/uploads/09072014106.jpg">
<img title="Some title" alt="Some title" src="http://domain.com/uploads/thumbs/09072014106.jpg">
</a>
We then run our method on it:
$this->clearPopUpLink($oHtml); // delete the parent <a> tag
The method:
private function clearPopUpLink($oHtml)
{
$aLink = $oHtml->getElementsByTagName('a');
if (!$aLink->length) {
return false;
}
for ($k = 0; $k < $aLink->length; $k++) {
$oLink = $aLink->item($k);
if (strpos($oLink->getAttribute('onclick'), 'return hs.expand(this)') !== false) {
// <a onclick="return hs.expand(this)" href="http://domain.com/uploads/posts/2014-07/1405107411_09072014106.jpg">
// <img title="Some title" alt="Some title" src="http://domain.com/uploads/posts/2014-07/thumbs/1405107411_09072014106.jpg">
// </a>
$oImg = $oLink->firstChild;
$oImg->setAttribute('src', $oLink->getAttribute('href')); // set img proper src
// $oLink->parentNode->removeChild($oLink);
// $oLink->parentNode->replaceChild($oImg, $oLink);
$oLink->parentNode->insertBefore($oImg); // replacing!?!?!?!
// echo $oHtml->ownerDocument->saveHtml($oImg);
}
}
}
Now the questions. This code works, BUT I don't understand WHY. Why, once clearPopUpLink() has processed all the images, does the document no longer contain the old markup with the <a> tags? When I first started investigating, I tried ->insertBefore() followed by ->removeChild(): the first adds the edited image BEFORE the current one (the one wrapped in <a>), and the second deletes the old node (the <a> with its image). BUT it didn't work: it only handled every second link (each first one was processed correctly).
So, a simple question: what is the right way to do this? I don't think the clearPopUpLink code is correct enough. Please suggest your solutions.
Hmm, I would use the trusty XPath for this and make sure the anchor gets removed; the code you've shown doesn't exactly make that obvious (I haven't tested it).
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a[contains(@onclick, "return hs.expand(this)")]/img') as $img) {
$anchor = $img->parentNode;
$anchor->parentNode->insertBefore($img, $anchor); // take image out
$anchor->parentNode->removeChild($anchor); // remove empty anchor
}
echo $doc->saveHTML();
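On the "why": getElementsByTagName() returns a live DOMNodeList, so when an <a> is removed inside a for loop over that list, the remaining items shift down by one index and the loop counter steps over the next one. That would explain only every second link being handled. Here is a minimal sketch of one way around it, assuming made-up sample markup (the two links below are hypothetical, not taken from the real page): copy the list into a plain array first, then move each image out and remove its anchor.

```php
<?php
// Hypothetical sample markup, modelled on the snippet in the question.
$html = '<div>'
      . '<a onclick="return hs.expand(this)" href="http://domain.com/uploads/1.jpg">'
      . '<img src="http://domain.com/uploads/thumbs/1.jpg"></a>'
      . '<a onclick="return hs.expand(this)" href="http://domain.com/uploads/2.jpg">'
      . '<img src="http://domain.com/uploads/thumbs/2.jpg"></a>'
      . '</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

// Snapshot the live list so that removing nodes cannot shift the iteration.
$links = iterator_to_array($doc->getElementsByTagName('a'));

foreach ($links as $link) {
    if (strpos($link->getAttribute('onclick'), 'hs.expand') === false) {
        continue; // not a popup link
    }
    $img = $link->getElementsByTagName('img')->item(0);
    if ($img === null) {
        continue; // popup link without an image
    }
    $img->setAttribute('src', $link->getAttribute('href')); // point at the full-size image
    $link->parentNode->insertBefore($img, $link);           // take the image out of the anchor
    $link->parentNode->removeChild($link);                  // remove the now-empty anchor
}

$result = $doc->saveHTML();
echo $result;
```

Iterating the original list backwards (from length - 1 down to 0) should avoid the skipping as well; the array copy is just easier to read.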
Related
I'm not sure why the code below is not working. It takes the else branch of the if statement, which says that no IMG tags were found on the page, but I'm sure they are there. Any advice or guidance would be appreciated.
// This variable will contain all the HTML source code of the sample page
$htmlContent = file_get_contents('https://www.instagram.com/ken_flavius/');
var_dump($htmlContent);
// We'll add all the images in this array
$images = [];
// Instantiate a new object of class DOMDocument
$doc = new DOMDocument();
// Load the HTML doc into the object
$doc->loadHTML($htmlContent);
// Get all the IMG tags in the document
$elements = $doc->getElementsByTagName('img');
// If we get at least one result
if($elements->length > 0)
{
// Loop on all of the IMG tags
foreach($elements as $element)
{
// Get the attribute SRC of the IMG tag (this is the link of the image)
$src = $element->getAttribute('src');
if (strlen($src) > 0) {
// Add the link to the array containing all the links
array_push($images, $src);
}
}
//show all links
echo '<pre>'."\r\n";
print_r($images);
echo '</pre>'."\r\n";
} else {
// No result, it means that there were no IMG tags
echo 'no img tag found in the HTML source provided!';
}
Edited it to show the exact example that I'm using.
$url="http://example.com";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
echo $tag->getAttribute('src');
}
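A variant of the same idea, as a sketch: it collects libxml parse warnings with libxml_use_internal_errors() rather than hiding them, and skips images without a src. The $html string is a hypothetical stand-in for the downloaded page. Note that if the raw HTML contains no <img> elements at all (for example because the page builds its markup with JavaScript after loading), the list will be empty however it is parsed, so the var_dump of the downloaded source is the first thing to check.

```php
<?php
// Hypothetical markup standing in for the result of file_get_contents($url).
$html = '<html><body><img src="/a.png"><img alt="no src"><img src="/b.png"></body></html>';

$previous = libxml_use_internal_errors(true); // buffer parse warnings
$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);
libxml_clear_errors();                         // discard the buffered warnings
libxml_use_internal_errors($previous);         // restore the earlier setting

$images = [];
if ($loaded) {
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');      // empty string when the attribute is missing
        if ($src !== '') {
            $images[] = $src;
        }
    }
}
print_r($images);
```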
I want to fetch the <a> link within the div glance_details, but I can't make it work. Don't worry about the includes, the URL and so on; that is all correct.
$redirect = $url;
$html3 = file_get_html($redirect);
foreach($html3->find('div.glance_details') as $element3) {
$html3->find('a',0)->outertext;
}
with
$redirect = $url;
$html3 = file_get_html($redirect);
foreach($html3->find('div.glance_details') as $element3) {
$knaoss = $element3->plaintext;
echo $knaoss;
}
I can fetch the plain text content of the div, but what I want is the anchor (a) that will be within the div.
This is similar to what I receive in $knaoss if I remove the ->plaintext:
<div class="glance_details">
<a href="http://www.example.com/">
<img src="http://www.example.com/img.png">
</a>
"This is a description of the example"
</div>
Though all I want from it is:
http://www.example.com/
I must delete this answer because it no longer matches the OP's requirements now that he has posted the requested HTML code.
The solution was simple. Only had to change:
$redirect = $url;
$html3 = file_get_html($redirect);
foreach($html3->find('div.glance_details') as $element3) {
$knaoss = $element3->plaintext;
}
to
$redirect = $url;
$html3 = file_get_html($redirect);
foreach($html3->find('div.glance_details > a') as $element3) {
$knaoss = $element3->href;
}
to find the href within div.glance_details. The problem was that I used words like "url" and "link" instead of href, and therefore could not make it work.
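For anyone without the Simple HTML DOM library, roughly the same lookup can be sketched with PHP's built-in DOMDocument and DOMXPath. The markup is the sample from the question (with the description as plain text); the variable names are my own.

```php
<?php
// Sample markup from the question.
$html = '<div class="glance_details">'
      . '<a href="http://www.example.com/"><img src="http://www.example.com/img.png"></a>'
      . 'This is a description of the example'
      . '</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Direct <a> children of any <div> whose class list contains "glance_details",
// i.e. the XPath counterpart of the CSS selector "div.glance_details > a".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " glance_details ")]/a';

$hrefs = [];
foreach ($xpath->query($query) as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```

The concat/normalize-space wrapper is there so that a div with several classes still matches, while a class such as glance_details_old does not.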
I didn't expect this to be difficult but so far have failed at implementing it.
Simply I want to add a class name to the link tag that wraps the img tag when you insert an image into a post using the media library.
I want to turn this
<a href="..."><img src="http://..." alt="..." width="780" height="490" class="alignnone size-full wp-image-12" /></a>
Into this
<a href="..." class="..."><img src="http://..." alt="..." width="780" height="490" class="alignnone size-full wp-image-12" /></a>
I can do it simply enough using jQuery, but I'd much prefer to use a WordPress filter or custom function, so that the output in the post includes the class name.
Thanks msbodetti, the solution is as follows.
function add_colorbox_class_to_image_links($html, $attachment_id, $attachment) {
$linkptrn = "/<a[^>]*>/";
$found = preg_match($linkptrn, $html, $a_elem);
// If no link, do nothing
if($found <= 0) return $html;
$a_elem = $a_elem[0];
// Check to see if the link is to an uploaded image
$is_attachment_link = strstr($a_elem, "wp-content/uploads/");
// If link is to external resource, do nothing
if($is_attachment_link === FALSE) return $html;
if(strstr($a_elem, "class=\"") !== FALSE){
// If link already has class defined inject it to attribute
$a_elem_new = str_replace("class=\"", "class=\"fancylink ", $a_elem);
$html = str_replace($a_elem, $a_elem_new, $html);
}else{ // If no class defined, just add class attribute
$html = str_replace("<a ", "<a class=\"fancylink \" data-fancybox-group=\"gallery\" ", $html);
}
return $html;
}
add_filter('image_send_to_editor', 'add_colorbox_class_to_image_links', 10, 3);
I have the following code that replaces all <img> tags on a page and adds the nCode image resizer to them. The code is as follows:
function ncode_the_content($content) {
return preg_replace("/<img([^`|>]*)>/im", "<img onload=\"NcodeImageResizer.createOn(this);\"$1>", $content);
}
What I need is for any image with the class "noresize" to be skipped by the preg_replace.
I have only managed to get it so that if there is the "noresize" class anywhere on the page it stops resizing all images instead of just the one with the correct class.
Any suggestions?
UPDATE:
Am I even remotely in the right ballpark with this?
function ncode_the_content($content) {
//Load the HTML page
$html = file_get_contents($content);
//Parse it. Here we use loadHTML as a static method
//to parse the HTML and create the DOM object in one go.
@$dom = DOMDocument::loadHTML($html);
//Init the XPath object
$xpath = new DOMXpath($dom);
//Query the DOM
$linksnoresize = $xpath->query( 'img[@class = "noresize"]' );
$links = $xpath->query( 'img[]' );
//Display the results as in the previous example
foreach($links as $link){
echo $link->getAttribute('onload'), 'NcodeImageResizer.createOn(this);';
}
foreach($linksnoresize as $link){
echo $link->getAttribute('onload'), '';
}
}
Here's some untested code:
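// loadHTML() is an instance method, so create the document first
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
$images = $dom->getElementsByTagName("img");
foreach ($images as $image) {
    // Only add the handler when the class attribute does not contain "noresize"
    if (!strstr($image->getAttribute("class"), "noresize")) {
        $image->setAttribute("onload", "NcodeImageResizer.createOn(this);");
    }
}
// saveHTML() serialises the whole document, including the <html>/<body> wrapper that loadHTML() adds
$content = $dom->saveHTML();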
But, if it were me, I would eschew any such inline event handler and instead just find the appropriate elements with Javascript.
I ended up just using pure CSS and adding a <div> around the images I didn't want to be resized. I forced the width and height of that div back to auto and then removed the warning message that was displayed above them. Seems to work fine. Thanks for your help :)
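For completeness, the regex approach from the question can also be made to skip individual images by switching to preg_replace_callback(). This is only a sketch: the function name ncode_skip_noresize and the sample markup are made up for illustration, and the class check is deliberately simple (it would also treat a class like "noresize-x" as a match).

```php
<?php
// Hypothetical replacement for ncode_the_content(): decides per <img> tag.
function ncode_skip_noresize(string $content): string
{
    $result = preg_replace_callback(
        '/<img\b([^>]*)>/i',
        function (array $m): string {
            // Leave the tag untouched when its class attribute contains "noresize".
            if (preg_match('/\bclass\s*=\s*(["\'])[^"\']*\bnoresize\b[^"\']*\1/i', $m[1])) {
                return $m[0];
            }
            return '<img onload="NcodeImageResizer.createOn(this);"' . $m[1] . '>';
        },
        $content
    );

    return $result ?? $content; // preg_replace_callback() returns null on a regex error
}

$in  = '<p><img class="noresize" src="a.png"><img src="b.png" class="big"></p>';
$out = ncode_skip_noresize($in);
echo $out;
```

The callback sees one tag at a time, so a "noresize" image no longer switches the handler off for every other image on the page.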
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = preg_match('/<img src=\"(.*?)\">/', $img);
echo $src;
I want to get the src value from the img tag and maybe the alt value
I'm assuming you always get the img HTML in the form shown in the question.
The regular expression you provided expects the closing of the img tag immediately after the src attribute, but the string also contains an alt attribute, so the pattern has to allow for that as well.
/<img src=\"(.*?)\".*\/>/
And if you want to capture alt as well, the regular expression becomes:
/<img src=\"(.*?)\"\s*alt=\"(.*?)\"\/>/
Also, you are only checking whether the pattern matched or not. To get the matches themselves, pass a third parameter to preg_match, which will be filled with them.
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = preg_match('/<img src=\"(.*?)\"\s*alt=\"(.*?)\"\/>/', $img, $results);
var_dump($results);
Note: the regex given above is not very generic. If you can provide the img strings that will actually occur, I can provide a more robust regex.
function scrapeImage($text) {
$pattern = '/src=[\'"]?([^\'" >]+)[\'" >]/';
if (!preg_match($pattern, $text, $link)) {
return null; // no src attribute found
}
$link = $link[1];
$link = urldecode($link);
return $link;
}
Tested code:
$input = '<img src="http://www.site.com/file.png">';
preg_match('/<img.*?src=["\'](?<url>.*?)["\'].*?>/', $input, $output);
echo $output['url']; // http://www.site.com/file.png
How to extract img src, title and alt from html using php?
See the first answer on this post.
You are going to use preg_match, just in a slightly different way.
Try this code:
<?php
$doc = new DOMDocument();
$doc->loadHTML('<img src="" />');
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
?>
Also you could use this library: SimpleHtmlDom
<?php
$html = new simple_html_dom();
$html->load('<html><body><img src="image/profile.jpg" alt="profile image" /></body></html>');
$imgs = $html->find('img');
foreach($imgs as $img)
print($img->src);
?>
preg_match('/<img src=("|\')([^"\']+)\1[^>]*>/', $img, $matches); // $matches[2] holds the src value
You already have good enough responses above, but here is one more piece of code (more universal):
function retrieve_img_src($img) {
if (preg_match('/<img(\s+?)([^>]*?)src=(\"|\')([^>\\3]*?)\\3([^>]*?)>/is', $img, $m) && isset($m[4]))
return $m[4];
return false;
}
You can use jQuery to get the src and alt attributes.
Include jQuery in the header:
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js">
</script>
//HTML
//to get src and alt attributes
<script type='text/javascript'>
// src and alt attributes of the images with ids imgId1 and imgId2
var src1= $('#imgId1').attr('src');
var alt1= $('#imgId1').attr('alt');
var src2= $('#imgId2').attr('src');
var alt2= $('#imgId2').attr('alt');
</script>