Parse HTML tags - php

I fetch a value from the database like this:
<p><img alt="" src="images/1.jpg" style="width: 2450px; height: 1054px;" /></p>
and I want to get only src="images/1.jpg", but I don't know how. Please guide me.

If you need the source, use a DOM Parser:
// Construct a new DOMDocument with your fragment
$domDoc = new DOMDocument;
$domDoc->loadHTML( '<p><img src="images/1.jpg" style="width: 2450px;" /></p>' );
// Locate the first image in the document
$img = $domDoc->getElementsByTagName( "img" )->item( 0 );
// Echo its src value
echo $img->attributes->getNamedItem( "src" )->nodeValue;
Results: http://codepad.org/oMXGK9Iu
Ideally you would ensure the image elements exist before accessing item 0. Likewise, you would ensure the attributes exist before reading them.
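A defensive variant of the DOM approach might look like the sketch below. It checks that an <img> exists and that it has a src attribute before reading it, returning null otherwise. The helper name firstImageSrc() is hypothetical, and silencing libxml warnings for imperfect fragments is an assumption about what you want.

```php
<?php
// Hypothetical helper: returns the src of the first <img>, or null when there
// is no <img> or the first one has no src attribute.
function firstImageSrc(string $html): ?string
{
    $domDoc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them, then restore.
    $previous = libxml_use_internal_errors(true);
    $domDoc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $domDoc->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null; // no image, or an image without a src attribute
    }
    return $img->getAttribute('src');
}

$html = '<p><img alt="" src="images/1.jpg" style="width: 2450px; height: 1054px;" /></p>';
echo firstImageSrc($html) ?? 'No source', PHP_EOL;
```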
Further reading: http://www.php.net/manual/en/class.domdocument.php
If you just want to grab that particular portion of the text, you could use a simple regular expression:
// Prep our html
$html = '<p><img src="images/1.jpg" style="width: 2450px;" /></p>';
// Look for the source string
preg_match( '/src=\".*?\"/', $html, $matches );
// If we found it, spit it out.
echo $matches ? $matches[0] : "No source";
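If you want the path itself rather than the whole src="..." string, a capturing group can hold it. This sketch assumes src is written with double quotes; [^"]* stops at the closing quote.

```php
<?php
$html = '<p><img src="images/1.jpg" style="width: 2450px;" /></p>';

// $matches[0] is the whole src="..." text, $matches[1] is only the path.
if (preg_match('/src="([^"]*)"/', $html, $matches)) {
    $src = $matches[1];
} else {
    $src = null;
}
echo $src ?? 'No source', PHP_EOL;
```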

If alt="" is always empty and style is always "width: 2450px; height: 1054px;", you could use:
<?php
$str = '<p><img alt="" src="images/1.jpg" style="width: 2450px; height: 1054px;" /></p>';
$str = str_replace('<p><img alt="" src="','', $str);
$str = str_replace('" style="width: 2450px; height: 1054px;" /></p>','',$str);
echo $str; //Outputs: images/1.jpg
?>
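The same string-function idea can be made independent of the exact alt and style values by cutting out whatever sits between the first src=" and the next double quote. This is a sketch: srcBetweenQuotes() is a hypothetical helper, and it assumes no other attribute ending in src=" (such as data-src) appears first.

```php
<?php
// Hypothetical helper: text between the first src=" and the following quote.
function srcBetweenQuotes(string $str): ?string
{
    $marker = 'src="';
    $start = strpos($str, $marker);
    if ($start === false) {
        return null; // no src attribute at all
    }
    $start += strlen($marker);
    $end = strpos($str, '"', $start);
    if ($end === false) {
        return null; // unterminated attribute value
    }
    return substr($str, $start, $end - $start);
}

$str = '<p><img alt="" src="images/1.jpg" style="width: 2450px; height: 1054px;" /></p>';
echo srcBetweenQuotes($str), PHP_EOL;
```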

Related

How to detect data:image tag

I have the Summernote WYSIWYG plugin. Whenever I add an image, it converts the image into
<img data-filename="Untitled-1.png" src="" style="width: 645px;">
All I want is to detect the first such tag, get its src value, and store it in the database to show as a featured image.
For example, if there are two img tags with data-filename:
<img data-filename="Untitled-1.png" src="" style="width: 645px;">
<img data-filename="Untitled-2.png" src="" style="width: 645px;">
I want to get the src value of Untitled-1.png only, not Untitled-2.png.
Here is what I've tried:
preg_match('/(<img .*?>)/', $go, $img_tag);
$feature = $img_tag[0];
Use DOMDocument and DOMXPath to easily target what you want using the HTML structure:
$content = <<<'EOD'
<img data-filename="Untitled-1.png" src="" style="width: 645px;">
<img data-filename="Untitled-2.png" src="" style="width: 645px;">
EOD;
$dom = new DOMDocument;
$dom->loadHTML($content);
$xp = new DOMXPath($dom);
$result = $xp->evaluate('string(//img[@data-filename]/@src)');
# //img             img nodes anywhere in the DOM tree
# [@data-filename]  predicate: the img must have a data-filename attribute
# /@src             its src attribute (string() returns the first match's value)
if (!empty($result))
echo $result, PHP_EOL;
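Since the title asks about data:image tags, the XPath can also require the src to be an inline data: URI. This sketch assumes Summernote fills src with a data:image/... value; the base64 payloads below are placeholders, not real image data.

```php
<?php
$content = <<<'EOD'
<img data-filename="Untitled-1.png" src="data:image/png;base64,AAAA" style="width: 645px;">
<img data-filename="Untitled-2.png" src="data:image/png;base64,BBBB" style="width: 645px;">
EOD;

$dom = new DOMDocument;
$dom->loadHTML($content);
$xp = new DOMXPath($dom);

// (...)[1] keeps only the first matching img in document order;
// starts-with() restricts the match to inline data:image sources.
$nodes = $xp->query("(//img[@data-filename and starts-with(@src, 'data:image')])[1]/@src");
$feature = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;

echo $feature ?? 'No data:image found', PHP_EOL;
```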

Regular Expression to ignore a link text

I have the following code:
<p> <img src="spas01.jpg" alt="" width="630" height="480"></p>
<p style="text-align: right;">Spas</p>
<p>My Site Content [...]</p>
I need a regular expression that gets only "My Site Content [...]".
So I need to ignore the first image (and possibly other images) and links.
Try This:
Use (?<=<p>)([^><]+)(?=</p>) or <p>\K([^><]+)(?=</p>)
Update
$re = "#<p>\\K([^><]+)(?=</p>)#m";
$str = "<p> <img src=\"spas01.jpg\" alt=\"\" width=\"630\" height=\"480\"></p>\n<p style=\"text-align: right;\">Spas</p>\n<p>My Site Content [...]</p>";
preg_match_all($re, $str, $matches);
Demo
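A self-contained run of the \K pattern from the update, with the lookbehind form for comparison, might look like this. The m modifier is omitted because the pattern uses no ^ or $ anchors.

```php
<?php
$str = "<p> <img src=\"spas01.jpg\" alt=\"\" width=\"630\" height=\"480\"></p>\n"
     . "<p style=\"text-align: right;\">Spas</p>\n"
     . "<p>My Site Content [...]</p>";

// \K drops "<p>" from the reported match; the lookahead requires "</p>" next.
preg_match_all('#<p>\K([^><]+)(?=</p>)#', $str, $matches);

// The fixed-length lookbehind form gives the same captures here.
preg_match_all('#(?<=<p>)([^><]+)(?=</p>)#', $str, $matches2);

print_r($matches[1]);
```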
With DOMDocument and DOMXPath:
$html = <<<'EOD'
<p> <img src="spas01.jpg" alt="" width="630" height="480"></p>
<p style="text-align: right;">Spas</p>
<p>My Site Content [...]</p>
EOD;
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$query = '//p//text()[not(ancestor::a)]';
$textNodes = $xp->query($query);
foreach ($textNodes as $textNode) {
echo $textNode->nodeValue . PHP_EOL;
}
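The query above also returns the whitespace inside the first paragraph and the "Spas" caption. If, as an assumption, the wanted paragraphs are the ones without any attributes, the XPath can be narrowed so that only "My Site Content [...]" remains:

```php
<?php
$html = <<<'EOD'
<p> <img src="spas01.jpg" alt="" width="630" height="480"></p>
<p style="text-align: right;">Spas</p>
<p>My Site Content [...]</p>
EOD;

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// p[not(@*)]           : paragraphs with no attributes (skips the styled caption)
// [not(ancestor::a)]   : ignore link text, as in the query above
// [normalize-space()]  : drop whitespace-only text nodes
$texts = [];
foreach ($xp->query('//p[not(@*)]//text()[not(ancestor::a)][normalize-space()]') as $textNode) {
    $texts[] = trim($textNode->nodeValue);
}
echo implode(PHP_EOL, $texts), PHP_EOL;
```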

preg-replace image width, height and style

My image looks like this:
<img alt="" width="146" height="109" src="http://url.to/src.jpg" style="float:left" />
but I can't figure out how to turn it, with preg_replace or preg_replace_callback, into this:
<img alt="" src="http://url.to/src.jpg" style="width:146;height:109;float:left">
This works for height and width, but I can't get the existing style value "float:left" added:
$html='<img alt="" width="146" height="109" src="http://url.to/src.jpg" style="float:left" />';
$pattern = ('/<img[^>]*width="(\d+)"\s+height="(\d+)">/');
preg_match($pattern, $html, $matches);
$style = "<img style=\"width:".$matches[1]."px; height:".$matches[2]."px;\"";
$html = preg_replace($pattern, $style, $html);
The result of this is:
<img alt="" style="width:146;height:109" src="http://url.to/src.jpg" style="float:left">
which doesn't work because of the duplicate style attribute.
Try the following regular expression
<?php
$html='<img alt="" width="146" height="109" src="http://placehold.it/140x200" style="float:left" />';
$pattern = '/(<img.*)width="(\d+)" height="(\d+)" (.*style=")(.*)" \/(>)/';
$style = '$1$4width:$2px;height:$3px;$5"$6';
$html = preg_replace($pattern, $style, $html);
echo $html; //view source of page to see the code change
?>
Note the use of parentheses '(' ')' to create capturing groups, which can later be referenced as $1, $2, etc. Go to regex101.com to try out the regular expression.
The code above produces the following (the self-closing slash is dropped, which does not matter in HTML):
<img alt="" src="http://placehold.it/140x200" style="width:146px;height:109px;float:left">
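For comparison, a DOM-based sketch (not the regex answer's method) moves the width and height attributes into style while keeping the existing style value. It assumes plain pixel values such as "146" and appends the px unit CSS needs.

```php
<?php
$html = '<img alt="" width="146" height="109" src="http://url.to/src.jpg" style="float:left" />';

$dom = new DOMDocument;
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('img') as $img) {
    $css = '';
    foreach (['width', 'height'] as $attr) {
        if ($img->hasAttribute($attr)) {
            // Assumes a bare number; turns width="146" into "width:146px;".
            $css .= $attr . ':' . $img->getAttribute($attr) . 'px;';
            $img->removeAttribute($attr);
        }
    }
    if ($css !== '') {
        // Prepend the sizes to whatever style the tag already had.
        $img->setAttribute('style', $css . $img->getAttribute('style'));
    }
    echo $dom->saveHTML($img), PHP_EOL;
}
```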

Make all img tags' src attributes contain absolute paths

I am trying to get/replace the image source link from the page.
Some pages have images with src='image/abc.png', so my regex fails.
What I want to do is append the relative path to the main URL if an absolute path is not given.
For example, if src='image/abc.png' and the main URL is http://example.com,
then it should be transformed to http://example.com/image/abc.png
Note: some users may enter the URL as http://example.com/, so if I append as above, the result is
http://example.com//image/abc.png, which is wrong.
Can someone give me correct directions to form the exact absolute path of image?
My code:
<?php
function get_logo($html, $url) {
if (preg_match_all('/\bhttps?:\/\/\S+(?:png|jpg)\b/', $html, $matches)) {
echo "First:";
return $matches[0][0];
} else {
if (preg_match_all('~\b((\w+ps?://)?\S+(png|jpg))b~im', $html, $matches)) {
echo "Second: ";
echo $matches[0][0];
return url_to_absolute($url, $matches[0][0]);
//return $matches[0][0];
} else
return null;
}
}
Definitely don't use regex for this. A combination of DOMDocument and XPath makes quick work of the task, and the syntax is fairly intuitive. If the src attribute of any <img> tag does not start with your predefined domain, trim any leading forward slashes from the src value and prepend the domain to form the absolute path.
Code: (Demo)
$html = <<<HTML
<div>
<img src="image/abc.png" alt="test" width="50" height="50">
<img src="http://example.com/image/abc.png" alt="test" width="50" height="50">
<img src="/image/abc.png" alt="test" width="50" height="50">
<iframe src="image/abc.png" alt="test" width="50" height="50"></iframe>
</div>
HTML;
$base = "http://example.com/";
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//img[not(starts-with(@src, '$base'))]") as $node) {
$node->setAttribute('src', $base . ltrim($node->getAttribute('src'), '/'));
}
echo $dom->saveHTML();
Output:
<div>
<img src="http://example.com/image/abc.png" alt="test" width="50" height="50">
<img src="http://example.com/image/abc.png" alt="test" width="50" height="50">
<img src="http://example.com/image/abc.png" alt="test" width="50" height="50">
<iframe src="image/abc.png" alt="test" width="50" height="50"></iframe>
</div>
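The approach above also prefixes images hosted on other domains (for example https:// CDN URLs), and it relies on $base ending with a slash. A more defensive sketch skips any src that already has a scheme or is protocol-relative, and normalises the base so both http://example.com and http://example.com/ work. absolutizeImgSrc() is a hypothetical helper, and like the answer it treats every relative path as relative to the site root rather than to the current page's directory.

```php
<?php
// Hypothetical helper: make relative <img> src values absolute against $base.
function absolutizeImgSrc(string $html, string $base): string
{
    // Accept the base with or without a trailing slash.
    $base = rtrim($base, '/') . '/';

    $dom = new DOMDocument;
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Leave empty, scheme-bearing (http:, https:, data:, ...) and "//host" URLs alone.
        if ($src === '' || substr($src, 0, 2) === '//' || parse_url($src, PHP_URL_SCHEME) !== null) {
            continue;
        }
        $img->setAttribute('src', $base . ltrim($src, '/'));
    }
    return $dom->saveHTML();
}

$html = '<div><img src="image/abc.png"><img src="/image/abc.png">'
      . '<img src="https://cdn.example.org/x.png"><img src="//cdn.example.org/y.png"></div>';
echo absolutizeImgSrc($html, 'http://example.com');
```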

Strip tags, but keep the first one

How can I keep, for example, the first img tag but strip all the others
(from an HTML string)?
Example:
<p>
some text
<img src="aimage.jpg" alt="desc" width="320" height="200" />
<img src="aimagethatneedstoberemoved.jpg" ... />
</p>
So it should become just:
<p>
some text
<img src="aimage.jpg" alt="desc" width="320" height="200" />
</p>
The function in this example keeps the first N IMG tags and removes all the other <img>s.
// Function to keep first $nrimg IMG tags in $str, and strip all the other <img>s
// From: http://coursesweb.net/php-mysql/
function keepNrImgs($nrimg, $str) {
// gets an array with all <img> tags from $str
if(preg_match_all('/(\<img[^\>]+\>)/i', $str, $mt)) {
// gets array with the <img>s that must be stripped ($nrimg+), and removes them
$remove_img = array_slice($mt[1], $nrimg);
$str = str_ireplace($remove_img, '', $str);
}
return $str;
}
// Test, keeps the first two IMG tags in $str
$str = 'First img: <img src="img1.jpg" alt="img 1" width="30" />, second image: <img src="img_2.jpg" alt="img 2" width="30">, another Img tag <img src="img3.jpg" alt="img 3" width="30" />, etc.';
$str = keepNrImgs(2, $str);
echo $str;
/* Output:
First img: <img src="img1.jpg" alt="img 1" width="30" />, second image: <img src="img_2.jpg" alt="img 2" width="30">, another Img tag , etc.
*/
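Because str_ireplace() removes every occurrence of each matched string, a later <img> that is identical to a kept one would also remove the kept copy. A DOM-based sketch that removes elements by position avoids that; keepFirstImages() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: keep the first $keep <img> elements, remove the rest.
function keepFirstImages(string $html, int $keep): string
{
    $dom = new DOMDocument;
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Copy the live node list into an array before removing anything.
    $imgs = iterator_to_array($dom->getElementsByTagName('img'));
    foreach (array_slice($imgs, $keep) as $img) {
        $img->parentNode->removeChild($img);
    }
    return $dom->saveHTML();
}

$html = '<p>some text <img src="aimage.jpg" alt="desc" width="320" height="200" /> '
      . '<img src="aimagethatneedstoberemoved.jpg" /></p>';
echo keepFirstImages($html, 1);
```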
You might be able to accomplish this with a complex regex, but I would suggest preg_replace_callback instead, particularly if you are on PHP 5.3+, where an anonymous function can hold the tracking logic: http://www.php.net/manual/en/function.preg-replace-callback.php
$tagTracking = array();
// ~ is the delimiter, so the / in "/>" needs no escaping
$str = preg_replace_callback('~<[^<]+?(>|/>)~', function ($match) use (&$tagTracking) {
    // your code to track tags here, and apply as you desire.
    $tagTracking[] = $match[0];
    return $match[0]; // return the tag unchanged, or '' to strip it
}, $str);
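A complete version of the callback idea for this question could use a counter captured by reference to decide which <img> tags survive. This sketch keeps only the first one and assumes no attribute value contains a > character.

```php
<?php
$str = "<p>\nsome text\n<img src=\"aimage.jpg\" alt=\"desc\" width=\"320\" height=\"200\" />\n"
     . "<img src=\"aimagethatneedstoberemoved.jpg\" />\n</p>";

$keep = 1; // how many <img> tags to keep
$seen = 0; // how many <img> tags the callback has visited so far

$result = preg_replace_callback('/<img\b[^>]*>/i', function ($match) use (&$seen, $keep) {
    $seen++;
    // Keep the tag while under the limit, otherwise replace it with nothing.
    return $seen <= $keep ? $match[0] : '';
}, $str);

echo $result;
```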
