I want to modify the contents of an HTML file with PHP.
I am applying styles to img tags, and I need to check whether a tag already has a style attribute; if it does, I want to replace it with my own.
$pos = strpos($theData, "src=\"".$src."\" style=");
if (!$pos){
$theData = str_replace("src=\"".$src."\"", "src=\"".$src."\" style=\"width:".$width."px\"", $theData);
}
else{
$theData = preg_replace("src=\"".$src."\" style=/\"[^\"]+\"/", "src=\"".$src."\" style=\"width: ".$width."px\"", $theData);
}
$theData is the HTML source code I receive.
If no style attribute is found, my own style is inserted successfully, but when a style attribute is already defined, my regex doesn't work.
I want to replace the existing style attribute, including everything inside it, with my new style attribute.
How should my regex look?
Instead of using regex for this, you should use a DOM parser.
Example using DOMDocument:
<?php
$html = '<img src="http://example.com/image.jpg" width=""/><img src="http://example.com/image.jpg"/>';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />'.$html);
$dom->formatOutput = true;
foreach ($dom->getElementsByTagName('img') as $item)
{
//Remove the width attribute if it's there
$item->removeAttribute('width');
//Get the style attribute if it's there
$style = $item->getAttribute('style');
//Set the style, keeping any existing style after it; 123px could be your $width var
$item->setAttribute('style','width:123px;'.$style);
}
//remove the unwanted doctype, html, head and body tags
$ret = preg_replace('~<(?:!DOCTYPE|/?(?:html|body|head))[^>]*>\s*~i', '', $dom->saveHTML());
echo trim(str_replace('<meta http-equiv="Content-Type" content="text/html;charset=utf-8">','',$ret));
//<img src="http://example.com/image.jpg" style="width:123px;">
//<img src="http://example.com/image.jpg" style="width:123px;">
?>
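The DOM answer applies the width to every img. A minimal sketch of the same approach, restricted to the question's $src and overwriting (rather than extending) any existing style, might look like this. The setImgWidth() helper name, the wrapper div and the sample markup are assumptions for illustration:

```php
<?php
// Sketch: overwrite the style of only the <img> tags whose src equals $src.
// setImgWidth(), the wrapper <div> and the sample markup are illustrative assumptions.
function setImgWidth(string $html, string $src, int $width): string
{
    libxml_use_internal_errors(true);   // collect libxml warnings instead of printing them
    $dom = new DOMDocument();
    // One wrapper element gives LIBXML_HTML_NOIMPLIED a single root node.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('src') === $src) {
            // Replaces an existing style attribute, or adds one if it is missing.
            $img->setAttribute('style', 'width:' . $width . 'px');
        }
    }

    // Serialize the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo setImgWidth('<img src="/a.png" style="border:0"><img src="/b.png">', '/a.png', 10), "\n";
```

Because setAttribute() on an existing attribute simply replaces its value, no separate "style already present" branch is needed, which is the check the question was doing with strpos().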
Here is a regex variant that solves this problem:
<?php
$theData = "<img src=\"/image.png\" style=\"lol\">";
$src = "/image.png";
$width = 10;
//you must escape potential special characters in $src,
//before using it in regexp
$regexp_src = preg_quote($src, "/");
$theData = preg_replace(
'/src="'. $regexp_src .'" style=".*?"/i',
'src="'. $src .'" style="width: '. $width . 'px;"',
$theData);
print $theData;
prints:
<img src="/image.png" style="width: 10px;">
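If the style attribute, when present, always comes right after src (as the question's strpos() check assumes), the question's two branches can probably be folded into one preg_replace() with an optional group. This is a sketch under that assumption; it also assumes double-quoted attributes and a $src without $ or \ characters, since $src is reused unescaped in the replacement string:

```php
<?php
// Sketch: one preg_replace for both cases (style present or not).
// (?:\s+style="[^"]*")? swallows an existing style attribute that directly
// follows src, and matches nothing when there is none.
$theData = '<img src="/image.png" style="lol"><img src="/image.png">';
$src = '/image.png';
$width = 10;

$pattern = '/src="' . preg_quote($src, '/') . '"(?:\s+style="[^"]*")?/i';
$theData = preg_replace($pattern, 'src="' . $src . '" style="width: ' . $width . 'px;"', $theData);

echo $theData, "\n";
```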
Regex:
(<[^>]*)style\s*=\s*('|")[^\2]*?\2([^>]*>)
Replace with (this strips the existing style attribute and keeps the rest of the tag):
$1$3
Example:
http://rubular.com/r/28tCIMHs50
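Calling that pattern from PHP might look like the sketch below. One deliberate change: [^\2]*? is written as .*? with the s modifier, because inside a character class \2 is read as an octal escape, not as a backreference to the quote captured in group 2. The sample tag is made up:

```php
<?php
// Sketch: strip a style attribute with the answer's pattern and $1$3.
// [^\2]*? is rewritten as .*? (plus the s modifier), since \2 inside [...]
// is an octal escape rather than a backreference to the quote in group 2.
$html = '<img src="/image.png" style="lol" alt="x">';
$pattern = '/(<[^>]*)style\s*=\s*(\'|").*?\2([^>]*>)/s';
$stripped = preg_replace($pattern, '$1$3', $html);
echo $stripped, "\n";
```

The result keeps a double space where the attribute used to be, which browsers ignore. To put your own style in its place, a replacement such as ${1}style="width: 10px;"${3} can be used instead of $1$3.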
Search for:
<img([^>]*)style="([^"]*)"
and replace with:
<img\1style="attribute1: value1; attribute2: value2;"
http://regex101.com/r/zP2tV9
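From PHP, the search/replace pair (with the * quantifiers on [^>] and [^"]) could be used as in this sketch; the group reference is written as ${1} to keep it unambiguous, and "width: 10px;" is only a placeholder for your own style value:

```php
<?php
// Sketch: replace whatever is in an img's style attribute with a new value.
// The new style "width: 10px;" and the sample tag are placeholders.
$html = '<img src="/a.png" style="color: red" alt="x">';
$result = preg_replace(
    '/<img([^>]*)style="([^"]*)"/',
    '<img${1}style="width: 10px;"',   // group 1 keeps the attributes before style
    $html
);
echo $result, "\n";
```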
How do I remove all img tags from this PHP variable? I have a $text variable like this:
$text = '<p>test test test </p><p><img src="http://static.adzerk.net/Advertisers/f380ecc42410414693b467ac7a97901b.png" style="width: 728px;"><br></p><p>test test</p><p><img src="http://static.adzerk.net/Advertisers/f380ecc42410414693b467ac7a97901b.png" style="width: 728px;"><br></p>';
I want to remove all img tags from $text using PHP. How can I do that?
You can do it with a regex. PHP's preg_replace() replaces text that matches a pattern with another string; the code below replaces every img tag with an empty string.
$text = preg_replace("/<img[^>]+>/", "", $text);
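A regex like that works on simple input; for less predictable markup, a DOM-based sketch is an alternative. It walks the live node list backwards while removing nodes. The removeImages() name and the wrapper div (which gives LIBXML_HTML_NOIMPLIED a single root) are assumptions of this sketch:

```php
<?php
// Sketch: remove every <img> with DOMDocument instead of a regex.
// removeImages() and the wrapper <div> are illustrative assumptions.
function removeImages(string $html): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // getElementsByTagName() returns a live list, so walk it backwards
    // while removing nodes from the tree.
    $images = $dom->getElementsByTagName('img');
    for ($i = $images->length - 1; $i >= 0; $i--) {
        $img = $images->item($i);
        $img->parentNode->removeChild($img);
    }

    // Serialize the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeImages('<p>test<img src="a.png" style="width: 728px;"><br></p><p>test</p>'), "\n";
```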
If you meant to extract attributes instead, try:
$url="reffile.html";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed HTML
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
echo $tag->getAttribute('src');
}
Try this:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)");
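The string(...) XPath expression above returns only the first matching src. A sketch that collects every src by querying the attribute nodes themselves (the sample HTML is made up):

```php
<?php
// Sketch: collect the src of every <img>, not just the first one.
$html = '<p><img src="a.png"><img src="b.png" alt="b"></p>';
$doc = new DOMDocument();
@$doc->loadHTML($html);                 // @ hides libxml warnings on sloppy HTML
$xpath = new DOMXPath($doc);
$srcs = [];
foreach ($xpath->query('//img/@src') as $attr) {   // each item is a DOMAttr
    $srcs[] = $attr->value;
}
print_r($srcs);
```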
In an HTML block such as this:
<p>Hello: <img src="hello/foo.png" /></p>
I need to transform the image's src URL into a Laravel storage_path link. I'm using PHP's DOMDocument to transform the URL like so:
$link = $a->getAttribute('src');
$pat = '#(\w+\.\w+)+(?!.*(\w+)(\.\w+)+)#';
$matches = [];
preg_match($pat, $link, $matches);
$newStr = "{{ storage_path('app/' . " . $matches[0] . ") }}";
$a->setAttribute('src', $newStr);
The problem is that the output is src="%7B%7B%20storage_path('app/'%20.%20foo.png)%20%7D%7D"
How can I keep the special characters in the src attribute?
You can use something like:
$html = '<p>Hello: <img src="hello/foo.png" /></p>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$img = $dom->getElementsByTagName('img');
$img->item(0)->setAttribute('src', '{{ storage_path(\'app/\' . foo.png) }}');
#loadHTML causes a !DOCTYPE tag to be added, so remove it:
$dom->removeChild($dom->firstChild);
#it also wraps the code in <html><body></body></html>, so remove that:
$dom->replaceChild($dom->firstChild->firstChild->firstChild, $dom->firstChild);
$newImage = urldecode($dom->saveHTML());
//<p>Hello: <img src="{{ storage_path('app/' . foo.png) }}"></p>
Note:
To output the img src the way you want, you'll need to use urldecode().
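An alternative sketch, not taken from the answer above: libxml's HTML serializer URI-escapes src and href values, which is where the %7B%7B... output comes from, while saveXML() writes attribute values without URI-escaping. Serializing with saveXML() therefore avoids the urldecode() pass, which would also turn any literal + in the page into a space. The sketch also quotes the file name so the generated Blade expression is valid PHP (the question's version leaves foo.png unquoted); using basename() to get the file name is an assumption:

```php
<?php
// Sketch: keep {{ ... }} intact in src by serializing with saveXML().
$html = '<p>Hello: <img src="hello/foo.png" /></p>';
$dom = new DOMDocument();
$dom->loadHTML($html);

$img = $dom->getElementsByTagName('img')->item(0);
$file = basename($img->getAttribute('src'));   // "foo.png"
// Quoting the file name keeps the Blade expression valid PHP.
$img->setAttribute('src', "{{ storage_path('app/" . $file . "') }}");

$p = $dom->getElementsByTagName('p')->item(0);
$out = $dom->saveXML($p);   // XML serializer: no URI-escaping of src
echo $out, "\n";
```

saveXML() writes void elements in XML style (<img .../>), which browsers and Blade accept, but it is worth checking that the rest of your markup survives XML serialization.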
So I've been at this for quite a while, and the best I've got is wrapping the image in a link, with a span after the image tag:
<a href="">
<img src="">
<span></span>
</a>
But what I want is:
<a href="">
<span></span>
<img src="">
</a>
I tried all kinds of variations and positions of
$img->parentNode->appendChild($dom->createElement('span'), $img);
and using insertBefore() in all kinds of places in my code, and I'm completely out of ideas since I'm fairly new to PHP DOM. My source:
foreach($dom->getElementsByTagName('img') as $img)
{
$fancyHref = $dom->createElement('a');
$clone = $fancyHref->cloneNode();
$img->parentNode->replaceChild($clone, $img);
$clone->appendChild($img);
$img->parentNode->appendChild($dom->createElement('span'));
};
Update:
To clarify my goal: I have an img tag in the HTML. After it goes through PHP DOM, I want the img tag wrapped in an a tag, with a span tag before the img tag:
Before
<img src="" />
After
<a href="">
<span class=""></span>
<img src="" />
</a>
My current code for doing this (without the span):
foreach($dom->getElementsByTagName('img') as $img)
{
$src = $img->getAttribute('src');
$filename = substr(strrchr($src , '/') ,1);
$filename = preg_replace('/^[.]*/', '', $filename);
$filename = explode('.', $filename);
$filename = $filename[0];
if($this->imagesTitles[$this->currentLanguage][$filename] !== '')
{
$img->setAttribute('title', $this->imagesTitles[$this->currentLanguage][$filename]);
$img->setAttribute('alt', $this->imagesTitles[$this->currentLanguage][$filename]);
}
else
{
$img->removeAttribute('title');
$img->removeAttribute('alt');
}
$classes = explode(' ', $img->getAttribute('class'));
if(!in_array('no-enlarge', $classes))
{
$fancyHref = $dom->createElement('a');
$span = $dom->createElement('span');
$span->setAttribute('class', 'magnifier');
$fancyHref->setAttribute('class', 'enlarge');
$fancyHref->setAttribute('rel', 'enlarge');
$fancyHref->setAttribute('href', $img->getAttribute('src'));
if($img->getAttribute('title') !== '')
{
$fancyHref->setAttribute('title', $img->getAttribute('title'));
$fancyHref->setAttribute('alt', $img->getAttribute('title'));
}
$clone = $fancyHref->cloneNode();
$img->parentNode->replaceChild($clone, $img);
$clone->appendChild($img);
$img->parentNode->insertBefore($span, $img);
}
$img->setAttribute('class', trim(str_replace('no-enlarge', '', $img->getAttribute('class'))));
if($img->getAttribute('class') === '')
{
$img->removeAttribute('class');
}
}
You can use the DOMNode->insertBefore() method (docs). It has the form $parentNode->insertBefore( $nodeToBeInserted, $nodeToInsertBefore ). Your code would be:
$img->parentNode->insertBefore( $dom->createElement('span'), $img );
DOMNode->appendChild() (docs) takes only one argument (the node to insert), and that node is always inserted after the last child node.
Edit:
I've now tested my code: if you replace $img->parentNode->appendChild($dom->createElement('span')); with my line in your test case, you end up with the correct format. You are, however, manipulating elements in a very confusing way. Besides that, when I test your updated code, I either end up with your desired format or with no span element at all...
The element you want to replace is the image, because it is the only element originally in the document, so that is the element you should clone. Your current code avoids the hierarchy error, but it copies all the changed nodes instead of just the conflicting element, which wastes memory and time. Once you copy the conflicting element, it is easy: append the elements to your a in the order you want them to appear, and when you append the image, append the clone instead. Do not manipulate $img; if you need to change the image, change the clone. Then simply replace $img with the element you built ($fancyHref in your case).
$html = '<img src="">';
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('img') as $img) {
$fancyHref = $dom->createElement('a');
$clone = $img->cloneNode();
$span = $dom->createElement( 'span' );
#... do whatever you need to do to this span, the clone and the a element ...
#... when you are done, you can simply append all elements you have been manipulating ...
$fancyHref->appendChild( $span );
$fancyHref->appendChild( $clone );
$img->parentNode->replaceChild( $fancyHref, $img );
};
echo $dom->saveHTML();
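The same idea extended to the question's class names and several images, as a sketch. Two choices here are assumptions rather than part of the answer: the live DOMNodeList is first copied with iterator_to_array() so inserting nodes while looping cannot shift it, and the original img is moved into the a (replaceChild() detaches it) instead of being cloned. The sample markup is made up:

```php
<?php
// Sketch: wrap each <img> in <a class="enlarge"> with <span class="magnifier">
// placed before the image. Class names follow the question; markup is made up.
$dom = new DOMDocument();
@$dom->loadHTML('<div><img src="a.png"><img src="b.png"></div>');

// Snapshot the live DOMNodeList before changing the tree.
$images = iterator_to_array($dom->getElementsByTagName('img'));

foreach ($images as $img) {
    $link = $dom->createElement('a');
    $link->setAttribute('class', 'enlarge');
    $link->setAttribute('href', $img->getAttribute('src'));

    $span = $dom->createElement('span');
    $span->setAttribute('class', 'magnifier');

    $img->parentNode->replaceChild($link, $img); // <a> takes the image's place
    $link->appendChild($span);                   // span first...
    $link->appendChild($img);                    // ...then the original image
}

$out = $dom->saveHTML($dom->getElementsByTagName('div')->item(0));
echo $out, "\n";
```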
Ummm. I know this may not qualify as an answer, but don't you already have it? I mean... I tried your code:
$dom = new DOMDocument;
$dom->loadHTMLFile("data.html");
foreach($dom->getElementsByTagName('img') as $img){
$src = $img->getAttribute('src');
$filename = substr(strrchr($src , '/') ,1);
$filename = preg_replace('/^[.]*/', '', $filename);
$filename = explode('.', $filename);
$filename = $filename[0];
$classes = explode(' ', $img->getAttribute('class'));
if(!in_array('no-enlarge', $classes))
{
$fancyHref = $dom->createElement('a');
$span = $dom->createElement('span');
$span->setAttribute('class', 'magnifier');
$fancyHref->setAttribute('class', 'enlarge');
$fancyHref->setAttribute('rel', 'enlarge');
$fancyHref->setAttribute('href', $img->getAttribute('src'));
if($img->getAttribute('title') !== '')
{
$fancyHref->setAttribute('title', $img->getAttribute('title'));
$fancyHref->setAttribute('alt', $img->getAttribute('title'));
}
$clone = $fancyHref->cloneNode();
$img->parentNode->replaceChild($clone, $img);
$clone->appendChild($img);
$img->parentNode->insertBefore($span, $img);
}
$img->setAttribute('class', trim(str_replace('no-enlarge', '', $img->getAttribute('class'))));
if($img->getAttribute('class') === ''){
$img->removeAttribute('class');
}
}
echo "<pre>" . htmlentities($dom->saveHTML()) . "</pre>";
With this (data.html) as source data:
<!doctype html>
<html>
<head>
</head>
<body>
<img src="http://lorempixel.com/g/400/200/" alt="alts" title="tits">
</body>
</html>
And the result is this:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<a class="enlarge" rel="enlarge" href="http://lorempixel.com/g/400/200/" title="tits" alt="tits">
<span class="magnifier">
</span>
<img src="http://lorempixel.com/g/400/200/" alt="alts" title="tits"></a>
</body>
</html>
So... isn't that what you wanted? :D
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = preg_match('/<img src=\"(.*?)\">/', $img);
echo $src;
I want to get the src value from the img tag, and maybe the alt value too.
Assuming you always get the img HTML as shown in the question:
The regular expression you provided expects the img tag to close right after the src attribute, but the string also contains an alt attribute, so you need to account for it:
/<img src=\"(.*?)\".*\/>/
And if you also want to capture alt, use this regular expression:
/<img src=\"(.*?)\"\s*alt=\"(.*?)\"\/>/
Also, you are only checking whether the pattern matched. To get the matches, pass a third parameter to preg_match, which will be filled with them.
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = preg_match('/<img src=\"(.*?)\"\s*alt=\"(.*?)\"\/>/', $img, $results);
var_dump($results);
Note: the regex above is not very generic. If you can provide the img strings that will occur, I can give you a more robust regex.
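Since attribute order in real HTML varies, a somewhat more general sketch (still regex-based, and still assuming quoted attribute values) looks up one attribute at a time; the imgAttr() helper is hypothetical:

```php
<?php
// Sketch: read a single attribute from an <img> tag regardless of attribute order.
// imgAttr() is a hypothetical helper; it assumes quoted attribute values.
function imgAttr(string $tag, string $name): ?string
{
    // \s before the name avoids matching e.g. data-src when asking for src.
    $pattern = '/\s' . preg_quote($name, '/') . '\s*=\s*(["\'])(.*?)\1/i';
    return preg_match($pattern, $tag, $m) ? $m[2] : null;
}

$img = '<img alt="some-img-alt" src="http://some-img-link"/>';
echo imgAttr($img, 'src'), "\n";
echo imgAttr($img, 'alt'), "\n";
```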
function scrapeImage($text) {
$pattern = '/src=[\'"]?([^\'" >]+)[\'" >]/';
if (!preg_match($pattern, $text, $link)) {
return false; // no src attribute found
}
return urldecode($link[1]);
}
Tested code:
$input = '<img src= "http://www.site.com/file.png" >';
preg_match('/<img.*?src=\s*["\'](?<url>.*?)["\'].*?>/', $input, $output);
echo $output['url']; // output: http://www.site.com/file.png
How to extract img src, title and alt from html using php?
See the first answer on this post.
You are going to use preg_match, just in a slightly different way.
Try this code:
<?php
$doc = new DOMDocument();
$doc->loadHTML('<img src="" />');
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
?>
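To get alt as well, which the question also asks for, the same loop can read both attributes; a sketch using the question's sample tag:

```php
<?php
// Sketch: collect src and alt of every <img> with DOMDocument.
$html = '<img src="http://some-img-link" alt="some-img-alt"/>';
$doc = new DOMDocument();
@$doc->loadHTML($html);
$images = [];
foreach ($doc->getElementsByTagName('img') as $tag) {
    $images[] = [
        'src' => $tag->getAttribute('src'),
        'alt' => $tag->getAttribute('alt'),  // '' if the attribute is missing
    ];
}
print_r($images);
```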
You could also use this library: SimpleHtmlDom
<?php
include 'simple_html_dom.php'; // path to the library file; adjust to your setup
$html = new simple_html_dom();
$html->load('<html><body><img src="image/profile.jpg" alt="profile image" /></body></html>');
$imgs = $html->find('img');
foreach($imgs as $img)
print($img->src);
?>
preg_match('/<img src=("|\')([^"\']+)\1[^>]*>/', $img, $m); // $m[2] holds the src
You already have good enough answers above, but here is another, more universal one:
function retrieve_img_src($img) {
if (preg_match('/<img(\s+?)([^>]*?)src=(\"|\')([^>\\3]*?)\\3([^>]*?)>/is', $img, $m) && isset($m[4]))
return $m[4];
return false;
}
You can use jQuery to get the src and alt attributes.
Include jQuery in the header:
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js">
</script>
//HTML
//to get src and alt attributes
<script type='text/javascript'>
// src and alt attributes of the images with ids imgId1 and imgId2
var src1= $('#imgId1').attr('src');
var alt1= $('#imgId1').attr('alt');
var src2= $('#imgId2').attr('src');
var alt2= $('#imgId2').attr('alt');
</script>
I'm trying to find a regular expression that would allow me to replace the SRC attribute in an image. Here is what I have:
function getURL($matches) {
global $rootURL;
return $rootURL . "?type=image&URL=" . base64_encode($matches['1']);
}
$contents = preg_replace_callback("/<img[^>]*src *= *[\"']?([^\"']*)/i", 'getURL', $contents);
For the most part this works well, except that everything before the src=" attribute is removed when $contents is echoed to the screen. SRC itself is updated properly, and all of the attributes after the updated image URL are output.
I am not interested in using a DOM or XML parsing library, since this is such a small application.
How can I fix the regex so that only the value for SRC is updated?
Thank you for your time!
Use a lazy star instead of a greedy one.
This may be your problem (the greedy [^>]*):
/<img[^>]*src *= *[\"']?([^\"']*)/
Change it to:
/<img[^>]*?src *= *[\"']?([^\"']*)/
This way, [^>]*? matches as few characters as possible instead of as many as possible.
Add another capture group and prepend it to the return value:
function getURL($matches) {
global $rootURL;
return $matches[1] . $rootURL . "?type=image&URL=" . base64_encode($matches['2']);
}
$contents = preg_replace_callback("/(<img[^>]*src *= *[\"']?)([^\"']*)/i", 'getURL', $contents);
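A self-contained sketch of that capture-group fix, swapping the global $rootURL for a closure with use (a design choice, not part of the answer); the root URL and markup are placeholders:

```php
<?php
// Sketch: keep everything before the src value by returning group 1 first.
// $rootURL and the sample markup are placeholders.
$rootURL = 'https://example.com/';
$contents = '<img class="x" src="a.png" alt="y">';

$contents = preg_replace_callback(
    '/(<img[^>]*src *= *["\']?)([^"\']*)/i',
    function (array $m) use ($rootURL): string {
        // $m[1] is "<img ... src=" plus the opening quote; $m[2] is the old URL.
        return $m[1] . $rootURL . '?type=image&URL=' . base64_encode($m[2]);
    },
    $contents
);
echo $contents, "\n";
```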
I am not interested in using a DOM or XML parsing library, since this is such a small application.
Nevertheless, that is the correct approach regardless of your application's size.
Remember: when you modify elements with DOMDocument, iterate in reverse to avoid unexpected oddities, particularly if you remove anything.
Here's a working example using DOMDocument. It's more complicated than a regex, but not terribly difficult, and it's a lot more flexible and robust for any other tweaking that may be required.
function inner_html($node) {
$innerHTML = "";
foreach ($node->childNodes as $child) {
$innerHTML .= $node->ownerDocument->saveHTML($child);
}
return $innerHTML;
}
function replace_src($html) {
$rootURL = 'https://example.com';
$dom = new DOMDocument();
if (mb_detect_encoding($html, 'UTF-8', true) == 'UTF-8') {
$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
}
$dom->loadHTML('<body>' . $html . '</body>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
for ($els = $dom->getElementsByTagName('img'), $i = $els->length - 1; $i >= 0; $i--) {
$src = $els->item($i)->getAttribute('src');
$els->item($i)->setAttribute('src', $rootURL . '?type=image&URL=' . $src);
}
return inner_html($dom->documentElement);
}
$html = '
<div>
<img src="test123">
<img src="test456">
</div>
';
echo replace_src($html);
OUTPUT:
<div>
<img src="https://example.com?type=image&URL=test123">
<img src="https://example.com?type=image&URL=test456">
</div>
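You can also allow for whitespace.
Use this (regex101 adds the g flag to show every match; PHP does not accept g, so use preg_match_all() instead):
/<\s*img[^>]*?src\s*=\s*(["'])([^"']+)\1[^>]*?>/iu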
https://regex101.com/r/jmMoio/1
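PHP's preg_* functions reject the JavaScript-style g modifier, so in PHP the pattern is used with preg_match_all() to get every match. A sketch with made-up markup:

```php
<?php
// Sketch: the pattern above without the g flag, used with preg_match_all().
$html = '<img src="a.png"><IMG alt="x" SRC = \'b.png\' >';
preg_match_all('/<\s*img[^>]*?src\s*=\s*(["\'])([^"\']+)\1[^>]*?>/iu', $html, $m);
print_r($m[2]);   // group 2 holds the src values
```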