I have a question about preg_replace. I have the following HTML in WordPress:
<img width="256" height="256" src="http://localhost/wp-content/uploads/2015/08/spiderman-avatar.png" class="attachment-post-thumbnail wp-post-image" alt="spiderman-avatar">
I change it to the following:
<img src="" data-breakpoint="http://localhost/wp-content/uploads/2015/08/" data-img="theme-{folder}.jpg" class="srcbox" alt="spiderman-avatar">
with the following preg_replace:
$html = preg_replace(
'/src="(https?:\/\/.+\/)(.+\-)([0-9]+)(.jpg|.jpeg|.png|.gif)"/',
'src="" data-breakpoint="$1" data-img="$2{folder}$4"', // Replace and split src attribute into two new attributes
preg_replace(
'/(width|height)="[0-9]*"/',
'', // Remove width and height attributes
preg_replace(
'/<img ?([^>]*)class="([^"]*)"?/',
'<img $1 class="$2 srcbox"', // Add class srcbox to class attribute
$html
)
)
);
I have the feeling I have written some seriously slow code, and that it can be done in a single preg_replace.
Chris85 mentioned the HTML parser, so I found this and got this so far:
http://nimishprabhu.com/top-10-best-usage-examples-php-simple-html-dom-parser.html
include('simple_html_dom.php');
$html = file_get_html($html);
From here I COULD loop through all images and change the attributes. But how do I put the new element back where it came from?
You would be better off using DOM
http://php.net/manual/de/domdocument.loadhtml.php
and extract the attributes with it.
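As a rough sketch of that suggestion: DOM nodes are edited in place, so nothing has to be "put back" after changing an attribute. The function name rewriteThumbnails, the temporary wrapper div, and the assumed filename convention (name, hyphen, number, extension, mirroring the question's regex) are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: the name and the filename convention are assumptions.
function rewriteThumbnails(string $html): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom = new DOMDocument();
    // Wrap the fragment so its nodes can be found again after parsing.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first div in document order.
    $wrapper = $dom->getElementsByTagName('div')->item(0);

    foreach (iterator_to_array($dom->getElementsByTagName('img')) as $img) {
        $src = $img->getAttribute('src');
        // Split "http://host/dir/name-256.png" into directory, name stem and extension.
        if (!preg_match('~^(https?://.+/)(.+-)[0-9]+(\.(?:jpe?g|png|gif))$~', $src, $m)) {
            continue; // leave images that do not follow the convention untouched
        }
        $img->removeAttribute('width');
        $img->removeAttribute('height');
        $img->setAttribute('src', '');
        $img->setAttribute('data-breakpoint', $m[1]);
        $img->setAttribute('data-img', $m[2] . '{folder}' . $m[3]);
        $img->setAttribute('class', trim($img->getAttribute('class') . ' srcbox'));
    }

    // Serialize only the wrapper's children, so no doctype/html/body is emitted.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = '<p>Intro</p><img width="256" height="256" '
      . 'src="http://localhost/wp-content/uploads/2015/08/theme-256.png" '
      . 'class="wp-post-image" alt="spiderman-avatar">';
echo rewriteThumbnails($html);
```

Because each img element is changed where it sits in the tree, the surrounding markup is preserved automatically when the fragment is serialized again.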
I have a bit of a situation. The site I am working on has two sections: the mobile site and the main site. Both fetch content from the same database table. It's a blog site. When admins create content that contains images using the text editor (CKEditor), a style attribute is attached to the resulting img tag, so the output looks like this:
<img alt="some content" src="some location" style="width:520px; height:600px;" />
This works great on the main site, but on the mobile site the images are poorly scaled and stretched.
I have a thumbnailing script that could address that, but I want a way to get the src attribute before the page loads and a way to remove the style attribute.
I did this using regex:
$str=$blog_post_column_from_database
$pattern=array ('#\<img alt="(.*?)" src="(.*)" style="(.*?)" /> #' );
$replacement=array ( '<img src="$my_thumbnailer_here.php?src=\\2" width="100%" />' );
$a=(string)$str; //converts text to string to avoid code lines from executing
return preg_replace($pattern,$replacement,$a);
What am I doing wrong? Regex is not my strong point. Thanks.
...as already suggested in the comments, you'll be better off using PHP's DOMDocument:
Something like this should do the trick:
example: http://3v4l.org/Gv4dp
//get new domdoc instance
$dom=new DOMDocument();
//load your html
$dom->loadHTML($your_html);
//get all images
$imgs = $dom->getElementsByTagName("img");
//iterate over those
foreach($imgs as $img){
//remove style attribute
$img->removeAttribute('style');
//prefix src attribute with scriptname
$img->setAttribute( 'src' , 'thumbnail.php?img=' . $img->getAttribute('src') );
}
//output modified html
echo $dom->saveHTML();
You might want to remove the doctype declaration and the <html> and <body> elements, which are created when saving the document as HTML, by replacing the last line with:
echo preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), '', $dom->saveHTML()));
see removing doctype while saving domdocument
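An alternative to stripping the wrapper with string functions is to serialize only the children of the body element, since saveHTML() accepts an optional node argument. The helper name innerBodyHtml is an assumption made for this sketch.

```php
<?php
// Hypothetical helper: returns the markup inside <body>, without doctype/html/body.
function innerBodyHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes just that node and its descendants.
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>Hello</p><img alt="x" src="a.png" style="width:520px">');
foreach ($dom->getElementsByTagName('img') as $img) {
    $img->removeAttribute('style');
}
echo innerBodyHtml($dom);
```

This avoids depending on the exact text of the generated doctype and wrapper tags.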
Try the following regexp:
$pattern=array ('#<img alt="(.*?)" src="(.*)" style="(.*?)" />#' );
The backslash at the beginning and the space at the end have been removed.
To make this work reliably, you should first find all img tags and then change them.
Your regexp will not work when the alt attribute is missing or when the attributes are in a different order.
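If regex is used anyway, the two-step idea can be sketched like this: match each img tag as a whole first, then look for src inside that tag, so attribute order and a missing alt no longer matter. The constant THUMBNAILER and the function name thumbnailImages are placeholders invented for this example; the real script name comes from the question's own setup.

```php
<?php
// Hypothetical script name standing in for the question's thumbnailer.
const THUMBNAILER = 'thumbnailer.php';

function thumbnailImages(string $html): string
{
    return preg_replace_callback(
        '#<img\b[^>]*>#i',                 // step 1: one whole <img ...> tag
        function (array $tag): string {
            // step 2: find src anywhere inside that tag
            if (!preg_match('#\ssrc="([^"]*)"#i', $tag[0], $m)) {
                return $tag[0];            // no src: leave the tag as it is
            }
            $src = htmlspecialchars(THUMBNAILER . '?src=' . rawurlencode($m[1]));
            return '<img src="' . $src . '" width="100%" />';
        },
        $html
    );
}

echo thumbnailImages('<img style="width:520px; height:600px;" src="pics/a.jpg" alt="A" />');
```

This still assumes double-quoted attribute values; a DOM parser remains the more robust choice for arbitrary markup.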
I am new to regular expressions. I have tried a lot to get the src value of an image tag inside an anchor tag in HTML.
This is my HTML:
<div class="smallSku" id="ctl00_ContentPlaceHolder1_smallImages">
<a title="" name="http://www.playg.in/productImages/med/PNC000051_PNC000051.jpg" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051.jpg" onclick="return showPic(this)" onmouseover="return showPic(this)">
<img border="0" alt="" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051.jpg"></a> <a title="PNC000051_PNC000051_1.jpg" name="http://www.playg.in/productImages/med/PNC000051_PNC000051_1.jpg" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051_1.jpg" onclick="return showPic(this)" onmouseover="return showPic(this)">
<img border="0" alt="PNC000051_PNC000051_1.jpg" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051_1.jpg"></a>
</div>
I want to return only the src value of each image tag. I tried a matching pattern in preg_match_all(), and the pattern was:
"#<div[\s\S]class="smallSku"[\s\S]id="ctl00_ContentPlaceHolder1_smallImages"\><a title=\"\" name="[\w\W]" href="[\w\W]" onclick=\"[\w\W]" onmouseover="[\w\W]"\><img[\s\S]src="(.*)"[\s\S]></a><\/div>#"
Please help. I have tried many times, and I also tried the approach from this question: Match image tag not nested in an anchor tag using regular expression
Regular expressions are not the right tool for parsing HTML. See this FAQ: How to parse and process HTML/XML?
Here is an example of how to get the src attribute using your example:
$doc = new DOMDocument();
$doc->loadHTML($your_html_string);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//div[@class="smallSku"]/a/img/@src') as $attr) {
$src = $attr->value;
print $src;
}
Try this, sunith:
$content = file_get_contents('your url');
preg_match_all("|<div class='items'>.*</div>|", $content, $arr, PREG_PATTERN_ORDER);
preg_match_all("/src='([^']+)'/", $arr[0][0], $arrr, PREG_PATTERN_ORDER);
echo '<pre>';
print_r($arrr);
I'm trying to figure out how to replace the title portion of an image (title="Title is here") in PHP, but I cannot get it to work, so could someone please help?
The title could be literally anything, so I need to find title="{anything here}" and replace that (as below).
I'm trying to use preg_replace(), but if there is a better way, I'm open to suggestions.
I've tried several different variations, but I think this is not too far off the mark -
$pattern = '#^title="([a-zA-Z0-9])"$#';
$replacement = 'title="Visit the '.$service['title'].' page';
$service_image = preg_replace($pattern, $replacement, $service_image);
<?php
$html = '<img src="whatever.jpg" title="Anything">';
$dom = new DOMDocument;
$dom->loadHTML($html);
$img = $dom->getElementsByTagName("img")->item(0);
/** @var $img DOMElement Now, $img contains the DOM node representing the image. */
$img->setAttribute("title", "Whatever you want here!");
/* Export the image alone (if not used like this,
* you'd get a complete HTML document including head and body).
*
* This ensures you only get the image.
*/
echo $dom->saveXML($img);
No regex for HTML please. This will work for you.
Use this snippet :
$tag = '<img title="My Old Title" src="localhost" alt="this is the alt"/>';
echo preg_replace('/(title)=("[^"]*")/i','title="My New Title"',$tag);
// <img title="My New Title" src="localhost" alt="this is the alt"/>
I have some text with images within it. I want to replace specific images within the text with something else.
For example, the text contains a YouTube img URL that I want to replace with the actual video link:
<img class="mceItem" src="http://img.youtube.com/vi/1MsVzAkmds0/default.jpg" alt="1MsVzAkmds0">
and replace it with the YouTube iframe code:
<iframe title="'.$id.'" class="youtube-player" type="text/html" width="576" height="400" src="http://www.youtube.com/embed/'.$id.'" frameborder="0"></iframe>
My function looks like this:
function replacelink($link) {
$find= ("/<img src=[^>]+\>/i");
$replace = youtube("\\2");
return preg_replace($find,$replace);
}
What do I need to change in the regex to do the above?
Your regex is looking for <img src=, but there is a class attribute between img and src. Using $find= '/<img.*src=[^>]+>/i'; corrects the problem; however, this illustrates why you shouldn’t use regex to parse HTML.
You wrote:
I have some text with images within it.
If the text you’re referring to is actually HTML, then there are better alternatives to using regex for this.
Update
I believe this is what you’re looking for.
<?php
function replacelink($text) {
$replace = '<iframe title="$2" class="youtube-player" type="text/html" width="576" height="400" src="http://www.youtube.com/embed/$2" frameborder="0"></iframe>';
$find = '/(<img.*?alt="([\da-z]+)".*?>)/i';
return preg_replace($find, $replace, $text);
}
$imagestr = '<img class="mceItem" src="http://img.youtube.com/vi/1MsVzAkmds0/default.jpg" alt="1MsVzAkmds0">';
echo replacelink($imagestr);
?>
There’s no need for a separate youtube() function.
If you want to replace more than one image, nothing else is needed: preg_replace() already replaces every match in the text (PHP has no preg_replace_all() function).
The following regex would get all the images with a specific URL. I'm not sure if this is what you wanted.
<img [^>]*?src="url"[^>]*?>
The previous answer would fail if there were more than one image.
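A small sketch of the multi-image case, assuming the thumbnails always use the img.youtube.com/vi/ID/ URL form from the question: because [^>]* can never run past the end of a tag, each img is matched on its own, and a single preg_replace() call handles every occurrence. The function name embedYoutube and the second video ID are made up for the demonstration.

```php
<?php
// Hypothetical function name; the pattern takes the video ID from the src URL.
function embedYoutube(string $text): string
{
    // [^>]* stays inside one tag, so several images on one line are matched separately.
    $find = '~<img\b[^>]*\ssrc="https?://img\.youtube\.com/vi/([\w-]+)/[^"]*"[^>]*>~i';
    $replace = '<iframe title="$1" class="youtube-player" type="text/html" '
             . 'width="576" height="400" src="http://www.youtube.com/embed/$1" '
             . 'frameborder="0"></iframe>';
    return preg_replace($find, $replace, $text);
}

$text = '<img src="x.png"> '
      . '<img class="mceItem" src="http://img.youtube.com/vi/1MsVzAkmds0/default.jpg" alt="1MsVzAkmds0"> '
      . '<img class="mceItem" src="http://img.youtube.com/vi/abc-DEF_123/default.jpg" alt="abc-DEF_123">';
echo embedYoutube($text);
```

The non-YouTube image is left alone, and both thumbnails are turned into iframes in one pass.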
I'm trying to use preg_replace to filter member comments. To filter script and img tags. If the src is from my site, allow it with its tags; if it is from another site, just show the src.
Regular expression:
<(\w+).+src=[\x22|'](?![^\x22']+mysite\.com[^\x22']+)([^\x22']+)[\x22|'].*>(?:</\1>)?
Using:
preg_replace($pattern, $2, $comment);
Comment :
Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
<img src="http://www.mysite.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://www.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://fakesite.com/blah/blah/image.jpg"></img>
Which one is your favorite?
Wanted Outcome:
Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
http://www.mysite.fakesite.com/blah/blah/image.jpg (notice that it's just url, because it's not from my site)
http://www.fakesite.com/blah/blah/image.jpg
http://fakesite.com/blah/blah/image.jpg
Which one is your favorite?
Anyone see anything wrong?
I'm trying to use preg_replace to filter member comments. To filter script and img tags.
HTML Purifier is going to be the best tool for this purpose, though you want a whitelist of acceptable tags and attributes, not a blacklist of specific harmful tags.
The biggest thing wrong I can see is trying to use regex to modify HTML.
You should use DOMDocument.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($content);
// Copy the live node list into an array first: removing nodes while iterating the list itself skips elements
foreach(iterator_to_array($dom->getElementsByTagName('img')) as $element) {
if ( ! $element->hasAttribute('src')) {
continue;
}
$src = $element->getAttribute('src');
$elementHost = parse_url($src, PHP_URL_HOST);
$thisHost = $_SERVER['SERVER_NAME'];
if ($elementHost != $thisHost) {
$element->parentNode->insertBefore($dom->createTextNode($src), $element);
$element->parentNode->removeChild($element);
}
}
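The exact host comparison above would reject subdomain.mysite.com, which the question wants to keep. One possible refinement, sketched here under the assumption that any subdomain of the site should count as "mine", is a suffix check on the parsed host. The function name isOwnHost is hypothetical.

```php
<?php
// Hypothetical helper: true when $url points at $siteDomain or one of its subdomains.
function isOwnHost(string $url, string $siteDomain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // relative or malformed URL: treated as foreign in this sketch
    }
    $host = strtolower($host);
    $siteDomain = strtolower($siteDomain);
    $suffix = '.' . $siteDomain;

    // Either the bare domain, or a name that ends in ".mysite.com".
    return $host === $siteDomain
        || substr($host, -strlen($suffix)) === $suffix;
}

var_dump(isOwnHost('http://subdomain.mysite.com/blah/blah/image.jpg', 'mysite.com'));
var_dump(isOwnHost('http://www.mysite.fakesite.com/blah/blah/image.jpg', 'mysite.com'));
```

Checking the parsed host rather than searching the whole URL is what keeps www.mysite.fakesite.com from slipping through.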
You should use the i and m modifiers:
#<(\w+).+src=[\x22|'](?![^\x22']+mysite\.com[^\x22']+)([^\x22']+)[\x22|'].*>(?:</\1>)?#im