How can I go from this
<img style="width:200px; height:300px; ...
to this
<img width="200" height="300" ...
with PHP?
An easy way is to leave the HTML untouched.
Find a way to target those images in your CSS and adapt the display there:
div.myimages img {
max-width:320px;
height: auto!important;
}
It's gonna use some RegEx:
//use SimpleXMLElement to parse the attributes
$img = new SimpleXMLElement($imgStr);
$pattern = '~([a-z-]+)\s*:\s*([^;$]+)~si';
//convert the value of the style attribute into an associative array
if (preg_match_all($pattern, $img['style'], $match)) {
$style[] = array_combine(
array_map('strtolower', $match[1]),
array_map(function($val) { return trim($val, '"\' '); }, $match[2])
);
}
// add the width and height attributes and get the new html
$img->addAttribute('width', $style[0]['width']);
$img->addAttribute('height', $style[0]['height']);
$newImgStr = $img->asXML();
// remove the width and height rules from the style attribute
$newImgStr = preg_replace('/width:([^;$]+);/', '', $newImgStr);
$newImgStr = preg_replace('/height:([^;$]+);/', '', $newImgStr);
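The style-parsing step above can be isolated into a small helper. This is a minimal sketch, not the answer's exact code: the function name `parseInlineStyle` is hypothetical, and it assumes simple declarations with no semicolons inside the values.

```php
<?php
// Hypothetical helper: turn "width:200px; height: 300px;" into
// ['width' => '200px', 'height' => '300px'].
function parseInlineStyle(string $style): array
{
    $rules = [];
    // Each declaration is "property: value"; declarations are separated by ";".
    if (preg_match_all('~([a-z-]+)\s*:\s*([^;]+)~i', $style, $m, PREG_SET_ORDER)) {
        foreach ($m as $rule) {
            $rules[strtolower($rule[1])] = trim($rule[2]);
        }
    }
    return $rules;
}

$rules = parseInlineStyle('width:200px; height: 300px;');
// Casting to int drops the unit, which suits HTML width/height attributes.
echo (int) $rules['width'], 'x', (int) $rules['height'], "\n";
```

The returned array can then feed `addAttribute('width', ...)` and `addAttribute('height', ...)` as in the answer.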
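If the styles are already applied to the images when the page loads, the same result can be achieved with jQuery:
$(function () {
// run once the DOM is ready, for every image on the page
$('img').each(function () {
// css() returns strings such as "200px"; parseInt() strips the unit
$(this).attr('width', parseInt($(this).css('width'), 10));
$(this).attr('height', parseInt($(this).css('height'), 10));
});
});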
Hope it will serve your purposes.
Related
I'm reading a webpage using PHP DOM/XPath and I've managed to get the text I need, but now I'm trying to get the src of the main image but I can't get it.
Also to complicate things, the source is different to the inspector.
Here is the source:
<div id="bg">
<img src="https://example.com/image.jpg" alt=""/>
</div>
And here is the element in the inspector:
<div class="media-player" id="media-player-0" style="width: 320px; height: 320px; background: url("https://example.com/image.jpg") center center / cover no-repeat rgb(208, 208, 208);" currentmouseover="16">
I've tried:
$img = $xpath->evaluate('substring-before(substring-after(//div[@id=\'bg\']/img, "\')")');
and
$img = $xpath->evaluate('substring-before(substring-after(//div[@class=\'media-player\']/@style, "background: url(\'"), "\')")');
but get nothing from either.
Here is my complete code:
$html = file_get_contents($externalurl);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$allChildNodesFromDiv = $xpath->query('//h1[@class="artist"]');
$releasetitle = $allChildNodesFromDiv->item(0)->textContent;
echo "</br>Title: " . $releasetitle;
$img = $xpath->evaluate('substring-before(substring-after(//div[@class=\'media-player\']/@style, "background: url(\'"), "\')")');
echo $image;
$img = $xpath->evaluate('substring-before(substring-after(//div[@id=\'bg\']/img, "\')")');
echo $image;
This is not something I would normally suggest, but since the content you are after is set by JavaScript and the value sits inside a <script> tag, a regex may be an easy way to extract it. From your comment...
Ah yes, it appears in: poster :
'https://284fc2d5f6f33a52cd9f-ce476c3c56a27f320262daffab84f1af.ssl.cf3.rackcdn.com/artwork_5e74a44e1e004_CHAMPDL879D_5e74a44e4672b.jpg'
So this code looks for the value of poster : '...',.
$html = file_get_contents($externalurl);
// capture everything between the quotes that follow "poster :"
if (preg_match("/poster\s*:\s*'([^']*)'/", $html, $matches)) {
echo $matches[1];
}
This can be prone to changes in the html, but it may work for now.
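If the image is present in the raw HTML as in the question's `<div id="bg">` snippet, selecting the `src` attribute directly with XPath may be enough. A minimal sketch, assuming the markup matches that snippet (the sample string below is copied from it, not fetched from the real page):

```php
<?php
// Sample markup copied from the question's "source" snippet.
$html = '<div id="bg"><img src="https://example.com/image.jpg" alt=""/></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// string() returns the attribute value, or '' when nothing matches.
$src = $xpath->evaluate('string(//div[@id="bg"]/img/@src)');
echo $src, "\n";
```

Selecting `@src` avoids the `substring-before`/`substring-after` juggling, which is only needed when the URL is embedded inside a `style` value.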
I'm using the following code to remove the sizes added by Wordpress to medias' filenames.
function replace_content($content) {
$content = preg_replace('/-([^-]*(\d+)x(\d+)\.((?:png|jpeg|jpg|gif|bmp)))"/', '.${4}"', $content);
return $content;
}
add_filter('the_content','replace_content');
How can I change the regex so that it applies only to the href attribute value?
The following regex with the preg_replace() function
$replaced_content = preg_replace( '#<img[^>]*?src[\s]?=[\s]?[\'"]?([^\'">]*?(https|http|\/\/)[^\'">]*?(png|jpeg|jpg|gif|bmp)[^\'" >]*?)[\'" ][^>]*?>#',
'<img src="$1">', $content );
cleans this awful img tag
<img ttl='Ren src = https://cdn.wpbeginner.com/wp-content/uploads/2015/01/rename-on-save.png' alt="Rena width=520" height="344" wp-image-25391">
to this clean and nice code
<img src="https://cdn.wpbeginner.com/wp-content/uploads/2015/01/rename-on-save.png">
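For the original question about limiting the size-suffix removal to `href` values, one option is to anchor the pattern on `href="`. This is a sketch under assumptions: the function name `strip_size_from_href` is hypothetical, hrefs are double-quoted, and the suffix always has the form `-WIDTHxHEIGHT` right before the extension.

```php
<?php
// Hypothetical variant of replace_content(): only href="..." values are rewritten.
function strip_size_from_href(string $content): string
{
    return preg_replace(
        // $1 = href up to the size suffix, $2 = the file extension
        '/(href="[^"]*?)-\d+x\d+(\.(?:png|jpe?g|gif|bmp))"/i',
        '$1$2"',
        $content
    );
}

$html = '<a href="/uploads/photo-300x200.jpg"><img src="/uploads/photo-300x200.jpg"></a>';
echo strip_size_from_href($html), "\n";
```

Because `[^"]*?` cannot cross a quote, the match stays inside a single `href` value and `src` attributes are left alone.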
I have an editor on my site that automatically saves images in this format:
<img alt="image-alt" src="image-path" style="width: Xpx; height: Ypx;" title="image-title" />
This tag is saved to a static .html file and then displayed on my site with readfile().
Before saving it to the static .html file, I want to change the structure to this new format:
<img alt="image-alt" src="image-path" width="Xpx" height="Ypx" title="image-title" />
In fact, I want to change how "width" and "height" are written in the static HTML file.
I'm using PHP and can run any function on the HTML string before passing it to fwrite().
Thanks.
I started off thinking this'd be quite easy using preg_replace_callback, but it turned into a bit of a monster. I'm sure it could easily be improved with a bit of refactoring:
<?php
// Some HTML test data with two images. For one image I've thrown in some extra styles, just
// to complicate things
$content= '<img alt="image-alt-2" src="image-path" style="width: 20px; height: 15px; border: 1px solid red;" title="image-title" />
<p>Some other tags. These shouldn\'t be changed<br />Etc.</p>
<img alt="image-alt-2" src="image-path-2" style="width: 35px; height: 30px;" title="another-image-title" />
<p>This last image only has a width and no height</p>
<img alt="image-alt-3" src="image-path-3" style="width:35px;" title="another-image-title" />';
$content= preg_replace_callback('/<img ((?:[a-z]+="[^"]*"\s*)+)\/>/i', 'replaceWidthHeight', $content);
var_dump($content);
function replaceWidthHeight($matches) {
// $matches[1] contains all the image attributes; split
// those out so we can loop through them
$submatches= array();
$count= preg_match_all('/\s*([a-z]+)="([^"]*)"/i', $matches[1], $submatches, PREG_SET_ORDER);
$result= '<img ';
for($ndx=0;$ndx<sizeof($submatches);$ndx++) {
if ($submatches[$ndx][1]=='style') {
// Found the style attribute ...
$width= ''; // Temporary storage for width and height if we find them
$height= '';
$result.= ' style="';
// explode() replaces split(), which was removed in PHP 7
$styles= explode(';', $submatches[$ndx][2]);
foreach($styles as $style) {
$style= trim($style); // remove unwanted spaces
if (strlen($style)>6 && substr($style, 0, 6)=='width:') {
$width= trim(substr($style, 6));
}
elseif (strlen($style)>7 && substr($style, 0, 7)=='height:') {
$height= trim(substr($style, 7));
}
elseif ($style !== '') { // Some other style - pass it through, keeping the separator
$result.= $style.';';
}
}
$result.= '"';
if (!empty($width)) $result.= " width=\"$width\"";
if (!empty($height)) $result.= " height=\"$height\"";
}
else {
// Something else, just pass it through
$result.= $submatches[$ndx][0];
}
}
return $result.'/>';
}
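The same conversion can also be done with DOMDocument instead of nested regexes. The following is a sketch, not the answer's method: the function name `styleSizeToAttributes` is hypothetical, it assumes pixel values, and it drops the `px` unit because HTML `width`/`height` attributes take plain pixel numbers.

```php
<?php
// Hypothetical helper: move width/height from each <img> style attribute
// into width/height attributes, keeping any other style rules.
function styleSizeToAttributes(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings quiet
    // A wrapper <div> gives the fragment a single root element; the flags
    // ask libxml not to add <html>/<body> or a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        $kept = [];
        foreach (explode(';', $img->getAttribute('style')) as $rule) {
            $parts = explode(':', $rule, 2);
            $prop  = strtolower(trim($parts[0]));
            $value = trim($parts[1] ?? '');
            if ($prop === '') {
                continue;               // empty piece after a trailing ";"
            }
            if ($prop === 'width' || $prop === 'height') {
                $img->setAttribute($prop, (string) (int) $value); // "20px" -> "20"
            } else {
                $kept[] = "$prop: $value";
            }
        }
        if ($kept) {
            $img->setAttribute('style', implode('; ', $kept) . ';');
        } else {
            $img->removeAttribute('style');
        }
    }

    // Serialize the wrapper's children, leaving the wrapper itself out.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo styleSizeToAttributes('<img alt="a" src="p.png" style="width: 20px; height: 15px; border: 1px solid red;" title="t" />'), "\n";
```

Letting the parser handle attribute boundaries means the attribute order and quoting style of the input no longer matter.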
My html content looks like this:
<div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/>
It is one unbroken long line with no newlines separating each img element with no indentation whatsoever.
The php code I use is as follows:
/**
*
* Take in html content as string and find all the <script src="yada.js" ... >
* and add $prepend to the src values except when there is http: or https:
*
* @param $html String The html content
* @param $prepend String The prepend we expect in front of all the href in css tags
* @return String The new $html content after find and replace.
*
*/
protected static function _prependAttrForTags($html, $prepend, $tag) {
if ($tag == 'css') {
$element = 'link';
$attr = 'href';
}
else if ($tag == 'js') {
$element = 'script';
$attr = 'src';
}
else if ($tag == 'img') {
$element = 'img';
$attr = 'src';
}
else {
// wrong tag so return unchanged
return $html;
}
// this checks for all the "yada.*"
$html = preg_replace('/(<'.$element.'\b.+'.$attr.'=")(?!http)([^"]*)(".*>)/', '$1'.$prepend.'$2$3$4', $html);
// this checks for all the 'yada.*'
$html = preg_replace('/(<'.$element.'\b.+'.$attr.'='."'".')(?!http)([^"]*)('."'".'.*>)/', '$1'.$prepend.'$2$3$4', $html);
return $html;
}
}
I want my function to work regardless how badly formed the img element is.
It must work regardless the position of the src attribute.
The only thing it is supposed to do is to prepend the src value with something.
Also note that this preg_replace will not happen if the src value starts with http.
Right now, my code works only if my content is:
<div class="preload">
<img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"></img>
<img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u15_line.png" width="1" height="1"/>
As you probably can guess, it successfully does it but only for the first img element because it goes to the next line and there is no / at the end of the opening img tag.
Please advise how to improve my function.
UPDATE:
I used DOMDocument and it worked a treat!
After prepending the src values, I need to replace each element with a PHP code snippet.
So original:
<img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/>
After using DOMDocument and adding my prepend string:
<img src="prepended/PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1" />
Now I need to replace the whole thing with:
<?php echo $this->Html->img('prepended/PRODUCTPAGE_files/read_icon_u12_normal.png', array('width'=>'1', 'height'=>'1')); ?>
Can I still use DOMDocument, or do I need to use preg_replace?
DOMDocument was built to parse HTML no matter how messed up it is, so rather than building your own HTML parser, why not use it?
With a combination of DomDocument and XPath you can do it like this:
<?php
$html = <<<HTML
<script src="test"/><link href="test"/><div class="preload"><img src="PRODUCTPAGE_files/like_icon_u10_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1"/><img src="PRODUCTPAGE_files/line_u14_line.png" width="1" height="1"/><img width="1" height="1" src="httpPRODUCTPAGE_files/line_u14_line.png"/>
HTML;
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$searchTags = $xpath->query('//img | //link | //script');
$length = $searchTags->length;
for ($i = 0; $i < $length; $i++) {
$element = $searchTags->item($i);
if ($element->tagName == 'link')
$attr = 'href';
else
$attr = 'src';
$src = $element->getAttribute($attr);
if (!startsWith($src, 'http'))
{
$element->setAttribute($attr, "whatever" . $src);
}
}
// this small function will check the start of a string
// with a given term, in your case http or http://
function startsWith($haystack, $needle)
{
return !strncmp($haystack, $needle, strlen($needle));
}
$result = $doc->saveHTML();
echo $result;
If your HTML is messed up (missing closing tags, etc.), you can set the following before @$doc->loadHTML($html);:
$doc->recover = true;
$doc->strictErrorChecking = false;
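Instead of suppressing warnings on the loadHTML call, libxml can be told to buffer them so they can be inspected or discarded. A minimal sketch with a deliberately sloppy sample string (the markup is made up for illustration):

```php
<?php
// Sample input with an unclosed <p>, for illustration only.
$html = '<div><img src="a.png"><p>unclosed paragraph</div>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // buffer warnings instead of emitting them
$doc->loadHTML($html);
$errors = libxml_get_errors();                // array of LibXMLError objects
libxml_clear_errors();
libxml_use_internal_errors($previous);        // restore the earlier setting

echo count($errors), " parse message(s)\n";
echo $doc->getElementsByTagName('img')->length, " image(s) found\n";
```

This keeps genuine PHP errors visible while still tolerating malformed markup.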
If you want the output formatted, you can set this before @$doc->loadHTML($html);:
$doc->formatOutput = true;
With XPath, we are only capturing the data you need to edit so we don't worry about other elements.
Keep in mind that if your HTML is missing tags such as doctype, html, head, or body, they will be added automatically; if they are already present, nothing else should change.
However, if you want to remove them, you can use the following instead of just $doc->saveHTML();:
$result = preg_replace('~<(?:!DOCTYPE|/?(?:html|head|body))[^>]*>\s*~i', '', $doc->saveHTML());
If you want to replace the element with a newly created element in its place, you can use this:
$newElement = $doc->createElement($element->tagName, '');
$newElement->setAttribute($attr, "prepended/" . $src);
$myArrayWithAttributes = array ('width' => '1', 'height' => '1');
foreach ($myArrayWithAttributes as $attribute=>$value)
$newElement->setAttribute($attribute, $value);
$element->parentNode->replaceChild($newElement, $element);
Or, to replace it with a PHP snippet, by creating a fragment:
$frag = $doc->createDocumentFragment();
$frag->appendXML('<?php echo $this->Html->img("prepended/PRODUCTPAGE_files/read_icon_u12_normal.png", array("width"=>"1", "height"=>"1")); ?>');
$element->parentNode->replaceChild($frag, $element);
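Rather than hard-coding the snippet, the string passed to appendXML() could be built from the element's own attributes. A sketch, assuming the helper call has the shape shown in the question; the function name `imgToHelperCall` is hypothetical:

```php
<?php
// Hypothetical helper: build the helper call from the question out of an
// <img> element's attributes (src first, everything else as options).
function imgToHelperCall(DOMElement $img): string
{
    $options = [];
    foreach ($img->attributes as $attr) {
        if ($attr->name !== 'src') {
            // var_export() quotes and escapes the strings as PHP literals.
            $options[] = var_export($attr->name, true) . ' => ' . var_export($attr->value, true);
        }
    }
    return sprintf(
        '<?php echo $this->Html->img(%s, array(%s)); ?>',
        var_export($img->getAttribute('src'), true),
        implode(', ', $options)
    );
}

$doc = new DOMDocument();
$doc->loadHTML('<img src="prepended/PRODUCTPAGE_files/read_icon_u12_normal.png" width="1" height="1">');
$img = $doc->getElementsByTagName('img')->item(0);
echo imgToHelperCall($img), "\n";
```

The returned string can then be passed to `$frag->appendXML(...)` inside the loop, one fragment per image.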
You can format the HTML with tidy:
$tidy = tidy_parse_string($result, array(
'indent' => TRUE,
'output-xhtml' => TRUE,
'indent-spaces' => 4
));
$tidy->cleanRepair();
echo $tidy;
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = preg_match('/<img src=\"(.*?)\">/', $img);
echo $src;
I want to get the src value from the img tag and maybe the alt value
This assumes you always get the img HTML exactly as shown in the question.
The regular expression you provided says that the img tag closes right after the src attribute, but the string also contains an alt attribute, so you need to account for it:
/<img src=\"(.*?)\".*\/>/
And if you want to capture alt as well, the regular expression becomes:
/<img src=\"(.*?)\"\s*alt=\"(.*?)\"\/>/
Also, you are only checking whether it matched. To get the matches, pass a third parameter to preg_match, which will be filled with them:
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = preg_match('/<img src=\"(.*?)\"\s*alt=\"(.*?)\"\/>/', $img, $results);
var_dump($results);
Note: the regex above is not very generic. If you can provide the img strings that will actually occur, I can give a more robust regex.
function scrapeImage($text) {
$pattern = '/src=[\'"]?([^\'" >]+)[\'" >]/';
preg_match($pattern, $text, $link);
$link = $link[1];
$link = urldecode($link);
return $link;
}
Tested code:
$input = '<img src="http://www.site.com/file.png">';
preg_match('/<img.*?src=["\'](?<url>.*?)["\'].*?>/', $input, $output);
echo $output['url']; // http://www.site.com/file.png
How to extract img src, title and alt from html using php?
See the first answer on this post.
You are going to use preg_match, just in a slightly different way.
Try this code:
<?php
$doc = new DOMDocument();
$doc->loadHTML('<img src="" />');
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
?>
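The same loop extends to the alt attribute the question also asks about. A minimal sketch using the question's own sample string:

```php
<?php
// Sample tag copied from the question.
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';

$doc = new DOMDocument();
$doc->loadHTML($img);

$found = [];
foreach ($doc->getElementsByTagName('img') as $tag) {
    // getAttribute() returns '' when the attribute is missing.
    $found[] = [
        'src' => $tag->getAttribute('src'),
        'alt' => $tag->getAttribute('alt'),
    ];
}
print_r($found);
```

Unlike the regex versions, this does not depend on the order of the attributes or on which quote character is used.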
Also you could use this library: SimpleHtmlDom
<?php
$html = new simple_html_dom();
$html->load('<html><body><img src="image/profile.jpg" alt="profile image" /></body></html>');
$imgs = $html->find('img');
foreach($imgs as $img)
print($img->src);
?>
preg_match('/<img src=("|\')([^"\']+)\1[^>]*>/', $img, $matches); // $matches[2] holds the src value
You already have good enough answers above, but here is another, more universal one:
function retrieve_img_src($img) {
if (preg_match('/<img(\s+?)([^>]*?)src=(\"|\')([^>\\3]*?)\\3([^>]*?)>/is', $img, $m) && isset($m[4]))
return $m[4];
return false;
}
You can use jQuery to get the src and alt attributes.
Include jQuery in the header:
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js">
</script>
//HTML
//to get src and alt attributes
<script type='text/javascript'>
// src and alt attributes of the images with ids imgId1 and imgId2
var src1= $('#imgId1').attr('src');
var alt1= $('#imgId1').attr('alt');
var src2= $('#imgId2').attr('src');
var alt2= $('#imgId2').attr('alt');
</script>