Not able to get image src using regex

I am using the regex below to wrap an <a> element around an image tag, but it's not working. I took this code from "Add link around img tags with regexp".
preg_replace('#(<img[^>]+ src="([^"]*)" alt="[^"]*" />)#', '<a href="$2" ...>$1</a>', $str)
However, if I use the code below without src, it works.
preg_replace('#(<img[^>]+ alt="[^"]*" />)#', '<a href="" ...>$1</a>', $str)
Is there any reason why I am not able to get the src from the image tag?
My image tag is <img src="" alt="">

A better way to do something like this is to use PHP's DOMDocument class, because it does not depend on how people write their HTML (e.g. putting the alt attribute before the src attribute). Something like this would work for your case:
$html = '<div id="x"><img src="/images/xyz" alt="xyz" /><p>hello world!</p></div>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
$images = $xpath->query('//img');
foreach ($images as $img) {
    // create a new anchor element
    $a = $doc->createElement('a');
    // copy the img src attribute to the a href attribute
    $a->setAttribute('href', $img->getAttribute('src'));
    // put the <a> where the image was inside its parent
    $img->parentNode->replaceChild($a, $img);
    // make the image a child of the <a> element
    $a->appendChild($img);
}
echo $doc->saveHTML();
Output:
<div id="x"><a href="/images/xyz"><img src="/images/xyz" alt="xyz"></a><p>hello world!</p></div>
Demo on 3v4l.org
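As a supplementary sketch (not part of the original answer), this is why the question's pattern finds nothing and what a more tolerant regex could look like. `[^>]+ src=` needs at least one character between `<img` and the space before `src`, and the pattern also requires the tag to end in ` />`; `<img src="" alt="">` satisfies neither. The helper name `wrapImagesInLinks()` is hypothetical, and the sketch assumes a double-quoted src with no `>` inside any attribute value, which is exactly the kind of assumption DOMDocument avoids.

```php
<?php
// Hypothetical helper: wrap every <img> tag in an <a> that links to its src.
// Assumes src is double-quoted; attribute order and a trailing "/>" do not matter.
function wrapImagesInLinks(string $html): string
{
    // Requiring whitespace before "src" keeps attributes such as data-src from matching.
    return preg_replace(
        '#<img\b[^>]*?\ssrc="([^"]*)"[^>]*>#i',
        '<a href="$1">$0</a>',
        $html
    );
}

echo wrapImagesInLinks('<img src="/images/xyz" alt="xyz">'), PHP_EOL;
// <a href="/images/xyz"><img src="/images/xyz" alt="xyz"></a>
```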

Related

Add specific class to img tags if missing

I am using Nette PHP (the framework shouldn't matter), and I'm trying to replace parts of the HTML with different markup: if an img tag already has class=, it should become class="image-responsive followed by the existing classes, and if not, it should get a new attribute class="image-responsive".
I'm getting that HTML as a string, which will be saved in the database.
This is my current code. It can find the strings, but I need help with replacing those parts of the HTML.
public static function ImageAddClass($string)
{
    // Match Img with class="$1 (group 1 here)"
    $regex_img = '/(<img)([^>]*[^>]*)(\/>)/mi';
    $regex_imgClass = '/(<img[^>]* )(class=\")([^\"]*\"[^>]*>)/mi';
    $html = $string;
    if (preg_match_all($regex_img, $html, $matches)) {
        for ($x = 0; $x < count($matches[0]); $x++) {
            bdump($matches[0]);
            bdump($matches[0][$x]);
            bdump($x);
            if (preg_match($regex_imgClass, $matches[0][$x])) {
                $html = preg_replace($regex_imgClass, '$1class="image-responsiveO $3', $html);
            } else if (preg_match($regex_img, $matches[0][$x])) {
                $html = preg_replace($regex_img, '$1 class="image-responsiveN" $2$3', $html);
            }
        }
        return $html;
    }
}

To cover all scenarios where an img tag might have no class attribute, an orphaned class attribute, a blank class attribute, a class attribute with one or more other words, or a class attribute that already contains image-responsive, I prefer to use XPath to filter the elements.
Not only is parsing HTML with a legitimate DOM parser like DOMDocument more robust/reliable than regex, the accompanying XPath syntax is highly intuitive.
Pay close attention to how the XPath query pads both the haystack (the class attribute) and the needle (image-responsive) with spaces to ensure whole-word matching.
Any images that are iterated will have the desired value added to the element's class attribute.
Code: (Demo)
$html = <<<HTML
<div>
<img src="">
<img src="" class>
<img src="" class="image-responsive">
<img src="" class="">
<img src="image-responsive" class="classy">
<img src="" class="image-responsiveness">
<span class='NOT-responsive'></span>
<img src="" class = "foo image-responsive">
</div>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img[not(contains(concat(" ", @class, " "), " image-responsive "))]') as $img) {
    $img->setAttribute('class', ltrim($img->getAttribute('class') . ' image-responsive'));
}
echo $dom->saveHTML();
Output:
<div>
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="image-responsive" class="classy image-responsive">
<img src="" class="image-responsiveness image-responsive">
<span class="NOT-responsive"></span>
<img src="" class="foo image-responsive">
</div>
Related content:
Replace empty alt in wordpress post content with filter
Xpath syntax for "and not contains"
Parsing HTML with PHP To Add Class Names
How can I match on an attribute that contains a certain string?
As a slight variation, you can access all img tags without XPath, then use preg_match() calls to determine which tags should receive the new class. The word boundary character \b is not useful in this case because class names may contain non-word characters.
Code: (Demo)
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('img') as $img) {
    $class = $img->getAttribute('class');
    if (!preg_match('/(?:^| )image-responsive(?: |$)/', $class)) {
        $img->setAttribute('class', ltrim("$class image-responsive"));
    }
}
echo $dom->saveHTML();
// same output as first snippet
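A small comparison can make the point about \b concrete. In this sketch the helper names and the class name `image-responsive-lg` are hypothetical: because `-` is a non-word character, `\b` reports a match inside the longer class name, while the space-padded pattern from the snippet above does not.

```php
<?php
// Illustration only: a \b-based check versus the space-padded check used above.
function hasClassWordBoundary(string $class, string $name): bool
{
    // "\b" treats "-" as a boundary, so hyphenated class names give false positives.
    return preg_match('/\b' . preg_quote($name, '/') . '\b/', $class) === 1;
}

function hasClassSpacePadded(string $class, string $name): bool
{
    // Only a space or the start/end of the string may surround the class name.
    return preg_match('/(?:^| )' . preg_quote($name, '/') . '(?: |$)/', $class) === 1;
}

var_dump(hasClassWordBoundary('image-responsive-lg', 'image-responsive')); // bool(true)
var_dump(hasClassSpacePadded('image-responsive-lg', 'image-responsive'));  // bool(false)
var_dump(hasClassSpacePadded('foo image-responsive', 'image-responsive')); // bool(true)
```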

How to replace img src and link href in a document with a mustache expression?

I'm trying to replace the src and href values with a slightly modified version using regex.
A simple example:
//Find:
<img src="icons/google-icon.svg" >
//Replace to:
<img src="{{asset('icons/google-icon.svg')}}" >
//Find:
<link href="css/style.css">
//Replace to:
<link href="{{asset('css/style.css')}}">
/** etc... */
Now this is my regex:
//Find:
src\s*=\s*"(.+?)"
//Replace to:
src="{{ asset('$1') }}"
It actually works very well, but only for src, not for both href and src. I also want to exclude any value that already contains {{asset.
Any ideas? Thanks in advance.

You can use an alternation to match src or href, and then a negative lookahead to assert that the src/href doesn't start with {{asset:
((?:src|href)\s*=\s*")((?!{{\s*asset)[^"]+)
Demo on regex101
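For completeness, a minimal sketch of applying that pattern with PHP's preg_replace(). The answer does not give a replacement string, so `$1{{asset('$2')}}` below is an assumption modelled on the desired output in the question, and `wrapWithAsset()` is a hypothetical name. The braces in the lookahead are escaped here for clarity.

```php
<?php
// Sketch: the answer's pattern applied with preg_replace().
// The replacement string is an assumption based on the question's desired output.
function wrapWithAsset(string $html): string
{
    // Group 1: 'src="' or 'href="' (optional spaces around "=").
    // Group 2: the value, unless it already starts with "{{asset" or "{{ asset".
    return preg_replace(
        '/((?:src|href)\s*=\s*")((?!\{\{\s*asset)[^"]+)/',
        '$1{{asset(\'$2\')}}',
        $html
    );
}

echo wrapWithAsset('<img src="icons/google-icon.svg" >'), PHP_EOL;
// <img src="{{asset('icons/google-icon.svg')}}" >
```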
This will also change href attributes inside <a> tags or elsewhere. If that is an issue, use a DOMDocument solution instead. Note that if your HTML is not just a snippet then you don't need to add the div tag around it in the call to loadHTML and the last line should be changed to echo substr($doc->saveXML(), 38);.
$html = <<<EOT
//Find:
<img src="icons/google-icon.svg" >
//Replace to:
<img src="{{asset('icons/google-icon.svg')}}" >
//Find:
<link href="css/style.css">
//Replace to:
<link href="{{asset('css/style.css')}}">
/** etc... */
<a href="http://www.example.com">
EOT;
$doc = new DOMDocument();
$doc->loadHTML("<div>$html</div>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//img') as $img) {
    $src = $img->getAttribute('src');
    if (preg_match('/^(?!{{\s*asset).*$/', $src, $m)) {
        $img->setAttribute('src', "{{asset('" . $m[0] . "')}}");
    }
}
foreach ($xpath->query('//link') as $link) {
    $href = $link->getAttribute('href');
    if (preg_match('/^(?!{{\s*asset).*$/', $href, $m)) {
        $link->setAttribute('href', "{{asset('" . $m[0] . "')}}");
    }
}
// strip XML header and added <div> tag
echo substr($doc->saveXML(), 44, -6);
Output:
//Find:
<img src="{{asset('icons/google-icon.svg')}}"/>
//Replace to:
<img src="{{asset('icons/google-icon.svg')}}"/>
//Find:
<link href="{{asset('css/style.css')}}"/>
//Replace to:
<link href="{{asset('css/style.css')}}"/>
/** etc... */
<a href="http://www.example.com"/>
Demo on 3v4l.org

Nick is correct that this can (and should) be done with DOMDocument.
Also worth mentioning is a buggy side effect of using saveHTML() to access the mutated document: curly braces added to the attribute strings get encoded. To work around this, use saveXML() and just trim away the XML declaration that is prepended to the document.
I am wrapping your sample tags in a parent tag so that DOMDocument can function normally and not mangle your document structure. This might be an unnecessary precaution for your project.
My snippet directly targets the qualifying attributes using XPath and replaces their values without any regex. The pipe (|) in my XPath expression is the union operator, which effectively means "or": it targets the img tags' src attribute OR the link tags' href attribute.
Code: (Demo)
$html = <<<HTML
<div>
<img src="icons/example.svg">
a link
<link href="css/example.css">
<iframe src="http://www.example.com/default.htm"></iframe>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img/@src | //link/@href') as $attr) {
    $attr->value = "{{asset('" . $attr->value . "')}}";
}
echo substr($dom->saveXML(), 38); // remove the auto-generated xml tag from the start
Output:
<div>
<img src="{{asset('icons/example.svg')}}"/>
a link
<link href="{{asset('css/example.css')}}"/>
<iframe src="http://www.example.com/default.htm"/>
</div>
Whoops, I just saw the last request in your question.
not() and starts-with() are applied to both tags to disqualify elements that have already been converted to mustache code.
New xpath expression: (Demo)
//img[not(starts-with(@src,"{{asset"))]/@src | //link[not(starts-with(@href,"{{asset"))]/@href
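Putting the pieces together, a sketch of the answer's loop using the refined expression. The function name `assetify()` is hypothetical. One design choice differs from the answer: passing the root element to saveXML() serialises only that node, so no XML declaration has to be trimmed with substr(). Note that starts-with() matches `{{asset` literally, so a value written as `{{ asset` would still be wrapped.

```php
<?php
// Sketch: the answer's DOMDocument loop combined with the refined XPath expression.
function assetify(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);
    // Union of img/@src and link/@href, skipping values already wrapped.
    $query = '//img[not(starts-with(@src, "{{asset"))]/@src'
        . ' | //link[not(starts-with(@href, "{{asset"))]/@href';
    foreach ($xpath->query($query) as $attr) {
        // Rewrite the attribute value in place.
        $attr->value = "{{asset('" . $attr->value . "')}}";
    }
    // Serialise only the root element: braces stay unencoded and no declaration is emitted.
    return $dom->saveXML($dom->documentElement);
}

echo assetify('<div><img src="a.svg"><img src="{{asset(\'b.svg\')}}"></div>'), PHP_EOL;
```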

appendXML stripping out img element

I need to insert an image with a div element in the middle of an article. The page is generated using PHP from a CRM. I have a routine to count the characters for all the paragraph tags, and insert the HTML after the paragraph that has the 120th character. I am using appendXML and it works, until I try to insert an image element.
When I put the <img> element in, it is stripped out. I understand that appendXML expects XML; however, I am closing the <img> tag, which I understood would help.
Is there a way to use appendXML and not strip out the img elements?
$mcustomHTML = "<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></img></div>";
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);
// read all <p> tags and count the text until reach character 120
// then add the custom html into current node
$pTags = $doc->getElementsByTagName('p');
foreach($pTags as $tag) {
    $characterCounter += strlen($tag->nodeValue);
    if($characterCounter > 120) {
        // this is the desired node, so put html code here
        $template = $doc->createDocumentFragment();
        $template->appendXML($mcustomHTML);
        $tag->appendChild($template);
        break;
    }
}
return $doc->saveHTML();

This should work for you. It uses a temporary DOM document to convert the HTML string that you have into something workable. Then we import the contents of the temporary document into the main one. Once it's imported we can simply append it like any other node.
<?php
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></div>';
$customDoc = new DOMDocument();
$customDoc->loadHTML($mcustomHTML, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$doc = new DOMDocument();
$doc->loadHTML($content);
$customImport = $doc->importNode($customDoc->documentElement, true);
// read all <p> tags and count the text until character 120 is reached,
// then add the custom html into the current node
$characterCounter = 0;
$pTags = $doc->getElementsByTagName('p');
foreach ($pTags as $tag) {
    $characterCounter += strlen($tag->nodeValue);
    if ($characterCounter > 120) {
        // this is the desired node, so put html code here
        $tag->appendChild($customImport);
        break;
    }
}
return $doc->saveHTML();
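As an alternative sketch (not from the answer above): appendXML() parses its argument as XML, and in the question's string the image is closed twice (`/>` followed by `</img>`), which is not well-formed XML. Assuming the snippet can be written as well-formed XML, appendXML() should keep the `<img>`, and checking its return value makes a parse failure visible instead of silent. The function name and the `$limit` parameter are hypothetical.

```php
<?php
// Hypothetical helper: append a well-formed XML snippet to the paragraph in which
// the running character count first exceeds $limit.
function insertAfterCharacters(string $content, string $snippet, int $limit = 120): string
{
    $doc = new DOMDocument();
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);
    $characterCounter = 0;
    foreach ($doc->getElementsByTagName('p') as $tag) {
        $characterCounter += strlen($tag->nodeValue);
        if ($characterCounter > $limit) {
            $fragment = $doc->createDocumentFragment();
            // appendXML() returns false when the snippet is not well-formed XML.
            if (!$fragment->appendXML($snippet)) {
                throw new InvalidArgumentException('Snippet is not well-formed XML');
            }
            $tag->appendChild($fragment);
            break;
        }
    }
    return $doc->saveHTML();
}
```

Usage: `insertAfterCharacters($content, '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></div>')`, where the img is closed only once.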

Regex extract image links

I am reading HTML content that contains image tags such as
<img onclick="document.location='http://abc.com'" src="http://a.com/e.jpg" onload="javascript:if(this.width>250) this.width=250">
or
<img src="http://a.com/e.jpg" onclick="document.location='http://abc.com'" onload="javascript:if(this.width>250) this.width=250" />
I tried to reformat these tags to become
<img src="http://a.com/e.jpg" />
However, I have not been successful. The code I have tried so far is:
$image=preg_replace('/<img(.*?)(\/)?>/','',$image);
Can anyone help?

Here's a version using DOMDocument that removes all attributes from <img> tags except for the src attribute. Note that doing a loadHTML and saveHTML with DOMDocument can alter other html as well, especially if that html is malformed. So be careful - test and see if the results are acceptable.
<?php
$html = <<<ENDHTML
<!doctype html>
<html><body>
<img onclick="..." src="http://a.com/e.jpg" onload="...">
<div><p>
<img src="http://a.com/e.jpg" onclick="..." onload="..." />
</p></div>
</body></html>
ENDHTML;
$dom = new DOMDocument;
if (!$dom->loadHTML($html)) {
    throw new Exception('could not load html');
}
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img') as $img) {
    // unfortunately, cannot removeAttribute() directly inside
    // the loop, as this breaks the attributes iterator.
    $remove = array();
    foreach ($img->attributes as $attr) {
        if (strcasecmp($attr->name, 'src') != 0) {
            $remove[] = $attr->name;
        }
    }
    foreach ($remove as $attr) {
        $img->removeAttribute($attr);
    }
}
echo $dom->saveHTML();

Match one piece at a time, then concatenate the strings. I am unsure which language you are using, so I'll explain in pseudocode:
1. Find <img with a regex and place the match in a string variable.
2. Find src="..." with src=".*?" and place the match in a string variable.
3. Find the closing /> with \/> and place the match in a string variable.
4. Concatenate the variables together.
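The steps above could be sketched in PHP roughly as follows. The language is an assumption (the answer is pseudocode), the function name is hypothetical, and steps 3 and 4 collapse into building the new tag directly. The tag pattern skips over quoted attribute values, because the sample onload handler contains a `>` (`this.width>250`) that a plain `[^>]*` would stop at.

```php
<?php
// Hypothetical helper: rebuild every <img> tag so that only its src attribute remains.
function keepOnlySrc(string $html): string
{
    // Step 1: find each <img ...> tag; quoted values may contain ">" (e.g. this.width>250).
    $tagPattern = '/<img\b(?:[^>"\']|"[^"]*"|\'[^\']*\')*>/i';

    return preg_replace_callback($tagPattern, function (array $tag): string {
        // Step 2: find src="..." inside the matched tag.
        if (!preg_match('/\ssrc="([^"]*)"/i', $tag[0], $src)) {
            return $tag[0]; // no double-quoted src: leave the tag untouched
        }
        // Steps 3 and 4: concatenate the pieces into a new self-closing tag.
        return '<img src="' . $src[1] . '" />';
    }, $html);
}

echo keepOnlySrc('<img src="http://a.com/e.jpg" onclick="document.location=\'http://abc.com\'" onload="javascript:if(this.width>250) this.width=250" />'), PHP_EOL;
// <img src="http://a.com/e.jpg" />
```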

How to get img tag value inside a specific div and specific anchor tag using regular expression

I am new to regular expressions. I have tried a lot to get the image tag's value inside an anchor tag in my HTML.
This is my HTML:
<div class="smallSku" id="ctl00_ContentPlaceHolder1_smallImages">
<a title="" name="http://www.playg.in/productImages/med/PNC000051_PNC000051.jpg" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051.jpg" onclick="return showPic(this)" onmouseover="return showPic(this)">
<img border="0" alt="" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051.jpg"></a> <a title="PNC000051_PNC000051_1.jpg" name="http://www.playg.in/productImages/med/PNC000051_PNC000051_1.jpg" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051_1.jpg" onclick="return showPic(this)" onmouseover="return showPic(this)">
<img border="0" alt="PNC000051_PNC000051_1.jpg" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051_1.jpg"></a>
</div>
I want to return only the src value of the image tag. I tried this pattern with preg_match_all():
"#<div[\s\S]class="smallSku"[\s\S]id="ctl00_ContentPlaceHolder1_smallImages"\><a title=\"\" name="[\w\W]" href="[\w\W]" onclick=\"[\w\W]" onmouseover="[\w\W]"\><img[\s\S]src="(.*)"[\s\S]></a><\/div>#"
Please help. I have tried this many times, and I also tried the approach from "Match image tag not nested in an anchor tag using regular expression".

Regular expression is not the right tool for parsing HTML. See this FAQ: How to parse and process HTML/XML?
Here is an example of how to get the src attribute using your markup:
$doc = new DOMDocument();
$doc->loadHTML($your_html_string);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//div[@class="smallSku"]/a/img/@src') as $attr) {
    $src = $attr->value;
    print $src . PHP_EOL;
}
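A variation on the snippet above, sketched as a reusable function that returns the src values as an array. Selecting the div by its id (taken from the question's markup) instead of its class is a choice made here, not something the answer requires, and `thumbnailSources()` is a hypothetical name. libxml_use_internal_errors() keeps parser warnings about real-world markup out of the output.

```php
<?php
// Hypothetical helper: return the src of every <img> inside an <a> that is a
// direct child of the thumbnail <div> (id taken from the question's markup).
function thumbnailSources(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $sources = [];
    foreach ($xpath->query('//div[@id="ctl00_ContentPlaceHolder1_smallImages"]/a/img/@src') as $attr) {
        $sources[] = $attr->value;
    }
    return $sources;
}
```

Usage: `print_r(thumbnailSources($your_html_string));`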

Try this, sunith:
$content = file_get_contents('your url');
preg_match_all("|<div class='items'>.*</div>|", $content, $arr, PREG_PATTERN_ORDER);
preg_match_all("/src='([^']+)'/", $arr[0][0], $arrr, PREG_PATTERN_ORDER);
echo '<pre>';
print_r($arrr);
