I have an img tag in my text and I want to get the file name from its src attribute.
So I use this code:
preg_match_all("|\/img\/(.*)\/>|U", $article_header, $matches, PREG_PATTERN_ORDER);
echo "match=".$matches[1][0]."<br/>";
Doing so, I get this as a result:
match=500.JPG\" alt=\"\" width=\"500\" height=\"360\"
So in this case I use "\/>", which matches the end of the tag.
But I want only the file name, "500.JPG", so I need to match up to the "\" instead. But when I do that,
preg_match_all("|\/img\/(.*)\\|U", $article_header, $matches, PREG_PATTERN_ORDER);
I get no matches :(
Please help
With the help of yes123, I did this:
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
$img_src = array();
foreach ($imgs as $img) {
// Store the img src
$img_src[] = $img->getAttribute('src');
echo $img_src[0];
}
which gives me this
\"sources/public/users/qqqqqq/articles/2011-06-11/7/img/500.JPG\"
But I still want only 500.JPG from this.
So what is the right regexp?
To match a literal backslash in a regex you have to double-escape it: once for the PHP string and once for the regex engine. That means 4 backslashes in a double-quoted PHP string to match a single backslash: \\\\
preg_match_all("|/img/(.*)\\\\|U", ...);
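A minimal sketch of the two escaping layers described above. The sample $text is hypothetical, modelled on the escaped markup in the question; the pattern is the one from this answer.

```php
<?php
// Hypothetical input: the attribute quotes are backslash-escaped, as in the question.
// In a single-quoted PHP string, \" stays as a backslash followed by a quote.
$text = 'foo <img src=\"a/img/500.JPG\" alt=\"\" width=\"500\" /> foo';

// Layer 1: PHP turns "\\\\" into two characters: \\
// Layer 2: the regex engine reads \\ as one literal backslash.
echo strlen("\\\\"), "\n"; // 2

// The U modifier makes (.*) ungreedy, so it stops at the first backslash.
$pattern = "|/img/(.*)\\\\|U";
preg_match($pattern, $text, $m);
echo $m[1], "\n"; // 500.JPG
```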
preg_match_all('/<img[^>]*src="([^"]+)".*>/Uis', $article_header, $matches);
You can't parse HTML with regex.
Use DOMDocument
// HTML already parsed into $dom
$imgs = $dom->getElementsByTagName('img');
$img_src = array();
foreach ($imgs as $img) {
// Store the img src
$img_src[] = $img->getAttribute('src');
}
Don't forget you can always search Google or Stack Overflow before opening a question.
Try something like this; I have just tested it:
$article_header = 'foo <img src=\\"sources/public/users/qqqqqq/articles/2011-06-11/7/img/500.JPG\\" /> foo';
preg_match_all('|<img[^>]+?src="[^"]*?([^/"]+?)"|', stripslashes($article_header), $matches, PREG_PATTERN_ORDER);
echo "match=".$matches[1][0]."<br/>";
It seems that your $article_header contains backslash-escaped quotes (that was a bit confusing), so I added a stripslashes() call.
Use the PHP function pathinfo():
http://php.net/manual/en/function.pathinfo.php
pathinfo($img_src[0]);
Result:
Array
(
[dirname] => sources/public/users/qqqqqq/articles/2011-06-11/7/img
[basename] => 500.JPG
[extension] => JPG
[filename] => 500
)
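Note that the src value shown in the question still carries the escaped quotes. A minimal sketch, assuming the value looks exactly like the one printed there, that cleans it up before asking pathinfo() for just the base name ($src and $clean are illustrative names):

```php
<?php
// Value as printed in the question, including the backslash-escaped quotes.
$src = '\"sources/public/users/qqqqqq/articles/2011-06-11/7/img/500.JPG\"';

// stripslashes() turns \" into ", then trim() drops the surrounding quotes.
$clean = trim(stripslashes($src), '"');

// With PATHINFO_BASENAME, pathinfo() returns a string instead of an array.
$name = pathinfo($clean, PATHINFO_BASENAME);
echo $name, "\n"; // 500.JPG
```

basename($clean) would return the same string here.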
If I have a string like so:
Hi this is a photo of me <img src='myself.jpg' alt='pic of me' />. Another pic of me <img src='abc.jpg'/>
How can I turn that into:
Hi this is a photo of me (myself). Another pic of me (image)
Basically I want to remove all images from a string and replace each one with its alt text if it has any. If it doesn't, it should just say 'image'.
I'd use a DOM parser instead of regex. Here's how:
Load the HTML string using loadHTML()
Use getElementsByTagName() to get all the images
Loop through them and check if the image has an alt attribute.
If the image has an alt attribute, set $replacement to its value wrapped in parentheses.
If the image doesn't have an alt attribute, set $replacement to (image).
Use replaceChild() to replace the image node with a newly created text node.
Code:
$html = <<<HTML
Hi this is a photo of me <img src='myself.jpg' alt='pic of me' />
another pic of me <img src='abc.jpg'/>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
$i = $images->length - 1;
while ($i > -1) {
$node = $images->item($i);
if ($node->hasAttribute('alt')) {
$replacement = '('.$node->getAttribute('alt').')';
}
else {
$replacement = '(image)';
}
$text = $dom->createTextNode($replacement."\n");
$node->parentNode->replaceChild($text, $node);
$i--;
}
echo strip_tags($dom->saveHTML());
Output:
Hi this is a photo of me (pic of me)
another pic of me (image)
Something like this should work:
preg_match_all('/\<img[^\>]*\>/', $yourString, $matches);
// $matches[0] holds the full <img ...> tags that were found
foreach ($matches[0] as $match)
{
$replacement = 'image';
if (preg_match('/alt=\'([^\']+)\'/', $match, $matches2))
$replacement = $matches2[1];
$yourString = str_replace($match, '('.$replacement.')', $yourString);
}
What it does: finds all img tags and collects them in the $matches array, then loops through them looking for an alt value. If one exists, the img tag is replaced with (ALT VALUE); otherwise it is replaced with (image).
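The same idea can also be written as a single pass with preg_replace_callback(). This is only a sketch, using the sample string from the question; the alt pattern (single or double quotes, empty alt treated as missing) is an assumption rather than part of the answer above.

```php
<?php
$yourString = "Hi this is a photo of me <img src='myself.jpg' alt='pic of me' />. "
            . "Another pic of me <img src='abc.jpg'/>";

$result = preg_replace_callback('/<img[^>]*>/i', function (array $m) {
    // Look for alt='...' or alt="..." inside the matched tag.
    if (preg_match('/\balt=([\'"])(.*?)\1/i', $m[0], $alt) && $alt[2] !== '') {
        return '(' . $alt[2] . ')';
    }
    // No alt attribute (or an empty one): use the generic placeholder.
    return '(image)';
}, $yourString);

echo $result, "\n";
// Hi this is a photo of me (pic of me). Another pic of me (image)
```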
I'm trying to find ALL images in my blog posts with regex. The code below returns images IF the markup is clean and the src attribute comes right after the img tag name. However, I also have images with other attributes such as height and width, and my regex does not pick those up. Any ideas?
The following code returns images that look like this:
<img src="blah_blah_blah.jpg">
But not images that look like this:
<img width="290" height="290" src="blah_blah_blah.jpg">
Here is my code:
$pattern = '/<img\s+src="([^"]+)"[^>]+>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
Use DOM or another parser for this, don't try to parse HTML with regular expressions.
$html = <<<DATA
<img width="290" height="290" src="blah.jpg">
<img src="blah_blah_blah.jpg">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//img');
foreach ($imgs as $img) {
echo $img->getAttribute('src') . "\n";
}
Output
blah.jpg
blah_blah_blah.jpg
Ever think of using the DOM object instead of regex?
$doc = new DOMDocument();
$doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
It is better to use a parser, but here is a way to do it with regex:
$pattern = '/<img\s.*?src="([^"]+)"/i';
The problem is that you only allow whitespace (\s+) between <img and src, so any other attribute in between breaks the match. Try this instead:
$pattern = '/<img\s+[^>]*?src="([^"]+)"[^>]*>/i';
preg_match($pattern, $data, $matches);
echo $matches[1];
Try this:
$pattern = '/<img\s.*?src=["\']([^"\']+)["\']/i';
This accepts single or double quotes and allows the src attribute at any position in the tag.
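A quick check of that pattern against both shapes of tag, as a sketch only; the sample $data string is made up for illustration.

```php
<?php
// Hypothetical sample: one tag with src first and double quotes,
// one with other attributes first and single quotes around src.
$data = '<img src="a.jpg"> <img width="290" height="290" src=\'b.jpg\'>';

$pattern = '/<img\s.*?src=["\']([^"\']+)["\']/i';

// preg_match_all() collects every match; preg_match() would stop at the first.
preg_match_all($pattern, $data, $matches);

print_r($matches[1]); // the captured src values: a.jpg and b.jpg
```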
I am using preg_replace to delete certain <img> tags from $content:
$content=preg_replace('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/','',$content);
When I then display the content using the WordPress the_content function, the <img> tags have indeed been removed from $content.
Beforehand, I'd like to capture these images so I can place them elsewhere in the template. I am using the same regex pattern with preg_match_all:
preg_match_all('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/', $content, $matches);
But I can't get my images:
preg_match_all('/(?!<img.+?id="img_menu".*?\/>)(?!<img.+?id="featured_img".*?\/>)<img.+?\/>/', $content, $matches);
print_r($matches);
Array ( [0] => Array ( ) )
Assuming (hopefully) that you are using PHP 5, this is a task for DOMDocument and XPath. Regex will mostly work on HTML elements, but consider the following example:
<img alt=">" src="/path.jpg" />
Regex will fail here. Since there aren't many guarantees in programming, take the guarantee that XPath will find exactly what you want, at a performance cost. To code it:
$doc = new DOMDocument();
$doc->loadHTML('<span><img src="com.png" /><img src="com2.png" /></span>');
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//span/img');
$html = '';
foreach($imgs as $img){
$html .= $doc->saveXML($img);
}
Now you have all the img elements in $html. Use str_replace() to remove them from $content, and from there you can have a drink and be pleased that XPath with HTML elements is painless, just a little slower.
P.S. I couldn't be bothered to work through your regex; I just think XPath is better in your situation.
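For the filter in the question (skip the images whose id is img_menu or featured_img), the exclusion can go into the XPath expression itself. The following is a sketch under assumptions: the sample $content markup and the $removed array are hypothetical, and the nodes are removed from the DOM directly instead of through str_replace().

```php
<?php
// Hypothetical content: one image to keep, two to pull out.
$content = '<p><img id="featured_img" src="keep.png" />'
         . '<img src="a.png" /><img src="b.png" /></p>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// Select every img except the two ids named in the question.
$query = '//img[not(@id="img_menu") and not(@id="featured_img")]';

$removed = array();
// Copy the node list to an array first so removing nodes cannot disturb the loop.
foreach (iterator_to_array($xpath->query($query)) as $img) {
    $removed[] = $doc->saveHTML($img);   // markup of this image, for reuse elsewhere
    $img->parentNode->removeChild($img); // drop it from the document
}

print_r($removed);
```

Keep in mind that loadHTML() wraps a fragment in html and body elements, so the remaining content has to be serialized from the body's children rather than from the whole document.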
In the end I used preg_replace_callback:
$content2 = get_the_content();
$removed_imgs = array();
$content2 = preg_replace_callback('#(?!<img.+?id="featured_img".*?\/>)(<img.+? />)#',function($r) {
global $removed_imgs;
$removed_imgs[] = $r[1];
return '';
},$content2);
foreach($removed_imgs as $img){
echo $img;
}
I've got a large number of webpages stored in an MySQL database.
Most of these pages contain at least one (and occasionally two) entries like this...
<a href="http://first-url-which-always-ends-with-a-slash/">
<img src="http://second-different-url-which-always-ends-with.jpg" />
</a>
I'd like to set up a small PHP loop to go through all the entries, replacing the first URL with a copy of the second URL for that entry.
How can I use preg to:
find the second url from the image tag
replace the first url in the a tag, with a copy of the second url
Is this possible?
See this question:
PHP preg match / replace?
See also: http://php.net/manual/en/function.preg-replace.php
$qp = qp($html);
foreach ($qp->find("img") as $img) {
$img->attr("title", $img->attr("alt"));
}
print $qp->writeHTML();
Though in this simple case it might be feasible to resort to a regex:
preg_replace('#(<img\s[^>]*)(\balt=)("[^"]+")#', '$1$2$3 title=$3', $h);
(It would make more sense to use preg_replace_callback to ensure no title= attribute is present yet.)
You can do the following:
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;
$source = "<a href=\"http://first-url-which-always-ends-with-a-slash/\">
<img src=\"http://second-different-url-which-always-ends-with.jpg\" />
</a>";
$dom->loadHTML($source);
$tags = $dom->getElementsByTagName('a');
foreach ($tags as $tag) {
// Look only at images nested inside this link
$img = $tag->getElementsByTagName('img')->item(0);
if ($img !== null) {
// Copy the image URL into the link's href
$tag->setAttribute('href', $img->getAttribute('src'));
echo $tag->getAttribute('href');
}
}
Thanks for the suggestions; I can see how they are better than using preg.
Even so, I finally solved my own question like this:
$result = mysql_query($select);
while ($frow = mysql_fetch_array($result)) {
$page_content = $frow['page_content'];
preg_match("#<img\s+src\s*=\s*([\"']+http://[^\"']*\.jpg[\"']+)#i", $page_content, $matches1);
print_r($matches1);
$imageURL = $matches1[1] ;
preg_match("#<a\s+(?:[^\"'>]+|\"[^\"]*\"|'[^']*')*href\s*=\s*(\"http://[^\"]+/\"|'http://[^']+/')#i", $page_content, $matches2);
print_r( $matches2 );
$linkURL = $matches2[1] ;
$finalpage=str_replace($linkURL, $imageURL, $page_content) ;
}
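For markup as regular as the sample in the question, the two lookups and the str_replace() could be folded into one preg_replace() call. This is only a sketch: it assumes double-quoted attributes and an img that directly follows the opening a tag, and $page and $pattern are illustrative names.

```php
<?php
// Hypothetical page fragment in the shape described in the question.
$page = '<a href="http://first-url/">' . "\n"
      . '<img src="http://second-url.jpg" />' . "\n"
      . '</a>';

// $1 = everything up to the href value, $2 = from the closing quote of href
// to the opening quote of src, $3 = the image URL, $4 = the closing quote of src.
$pattern = '#(<a\s[^>]*?href=")[^"]*("[^>]*>\s*<img\s[^>]*?src=")([^"]+)(")#i';

// Rebuild the match with the image URL ($3) in place of the old href value.
$result = preg_replace($pattern, '$1$3$2$3$4', $page);

echo $result, "\n";
```

Anything less regular than this (single quotes, attributes containing ">", other elements between the link and the image) is better handled with the DOM approach shown earlier.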
I have an array of HTML in different formats:
[amp;src]=>image, anotherone [posthtml]=>image2, anothertwo [nbsp;image3
How can I extract the images and text with a common preg_match() so that I get the correct image src and text from the HTML? If that is not possible with preg_match(), is there another way to do it?
If anyone knows how to fix this, please reply.
I'd appreciate any help.
The recommended way is to use DOM:
$dom = new DOMDocument;
$dom->loadHTML($HTML);
$images = $dom->getElementsByTagName('img');
foreach ($images as $im) {
// attributes is a property (a DOMNamedNodeMap), not a method
$attrs = $im->attributes;
$src = $attrs->getNamedItem('src')->nodeValue;
}
Using Regular expression:
preg_match_all("/<img .*?(?=src)src=\"([^\"]+)\"/si", $html, $m);
print_r($m);