Replace the title of an HTML image - php

I'm trying to figure out how to replace the title attribute of an image (title="Title is here") in PHP, but I cannot get it to work. Could someone please help?
The title could be literally anything, so I need to find title="{anything here}" and replace it (as below).
I'm trying to use preg_replace(), but if there is a better way, I'm open to suggestions.
I've tried several different variations, but I think this is not too far off the mark -
$pattern = '#^title="([a-zA-Z0-9])"$#';
$replacement = 'title="Visit the '.$service['title'].' page';
$service_image = preg_replace($pattern, $replacement, $service_image);

<?php
$html = '<img src="whatever.jpg" title="Anything">';
$dom = new DOMDocument;
$dom->loadHTML($html);
$img = $dom->getElementsByTagName("img")->item(0);
/** @var DOMElement $img Now $img contains the DOM node representing the image. */
$img->setAttribute("title", "Whatever you want here!");
/* Export the image alone (if not used like this,
* you'd get a complete HTML document including head and body).
*
* This ensures you only get the image.
*/
echo $dom->saveXML($img);
No regex for HTML please. This will work for you.

Use this snippet :
$tag = '<img title="My Old Title" src="localhost" alt="this is the alt"/>';
echo preg_replace('/(title)=("[^"]*")/i','title="My New Title"',$tag);
// <img title="My New Title" src="localhost" alt="this is the alt"/>
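If the new title comes from data (as with $service['title'] in the question), the regex approach probably needs two extra precautions: the value should be HTML-escaped, and it should not be passed as a plain replacement string, because preg_replace() interprets sequences such as $1 and \1 there. A minimal sketch, assuming the attribute is double-quoted and preceded by whitespace; replace_title() is a hypothetical helper name, not part of any library:

```php
<?php
// Hypothetical helper: swaps the value of the first title="..." attribute
// in a tag string. Assumes a double-quoted attribute preceded by whitespace.
function replace_title(string $tag, string $newTitle): string
{
    // Escape quotes and angle brackets so the new value cannot break the tag.
    $escaped = htmlspecialchars($newTitle, ENT_QUOTES, 'UTF-8');

    // A callback is used so "$1" or "\1" inside the title is never treated
    // as a backreference. The lookbehind avoids matching e.g. data-title.
    return preg_replace_callback(
        '/(?<=\s)title\s*=\s*"[^"]*"/i',
        function () use ($escaped) {
            return 'title="' . $escaped . '"';
        },
        $tag,
        1 // only the first title attribute is replaced
    );
}

echo replace_title('<img title="Old" src="a.jpg"/>', 'Visit the "Web" page');
```

With the question's variables this would be called as replace_title($service_image, 'Visit the ' . $service['title'] . ' page').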

Related

preg_replace for images in PHP

I have a question about preg_replace. I have the following HTML in WordPress:
<img width="256" height="256" src="http://localhost/wp-content/uploads/2015/08/spiderman-avatar.png" class="attachment-post-thumbnail wp-post-image" alt="spiderman-avatar">
I change it to the following:
<img src="" data-breakpoint="http://localhost/wp-content/uploads/2015/08/" data-img="theme-{folder}.jpg" class="srcbox" alt="spiderman-avatar">
with the following preg_replace:
$html = preg_replace(
'/src="(https?:\/\/.+\/)(.+\-)([0-9]+)(.jpg|.jpeg|.png|.gif)"/',
'src="" data-breakpoint="$1" data-img="$2{folder}$4"', // Replace and split src attribute into two new attributes
preg_replace(
'/(width|height)="[0-9]*"/',
'', // Remove width and height attributes
preg_replace(
'/<img ?([^>]*)class="([^"]*)"?/',
'<img $1 class="$2 srcbox"', // Add class srcbox to class attribute
$html
)
)
);
I have the feeling I have written some seriously slow code, and that it could be done in a single preg_replace.
Chris85 mentioned the HTML parser, so I found this and got this so far:
http://nimishprabhu.com/top-10-best-usage-examples-php-simple-html-dom-parser.html
include('simple_html_dom.php');
$html = file_get_html($html);
From here I could loop through all the images and change their attributes. But how do I put the new element back where it came from?
You would be better off using DOM
http://php.net/manual/de/domdocument.loadhtml.php
and extracting the attributes with it.
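As a rough sketch of what the DOM suggestion could look like for the three changes in the question: rewrite_images() is a hypothetical helper, the src pattern follows the asker's own regex (so it only matches file names ending in a hyphen and a number), and the input is assumed to be an ASCII or Latin-1 HTML fragment. Because the attributes are changed on the existing nodes, nothing has to be put back anywhere; the modified document is simply serialized again.

```php
<?php
// Sketch only: rewrite_images() is a hypothetical helper, not a WordPress API.
function rewrite_images(string $html): string
{
    if (trim($html) === '') {
        return $html; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom->loadHTML($html);                        // assumes ASCII/Latin-1 input
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $img) {
        // Remove the fixed dimensions.
        $img->removeAttribute('width');
        $img->removeAttribute('height');

        // Append srcbox to whatever classes are already present.
        $img->setAttribute('class', trim($img->getAttribute('class') . ' srcbox'));

        // Split src into a base URL and a file-name template.
        $pattern = '#^(https?://.+/)(.+-)[0-9]+(\.(?:jpe?g|png|gif))$#';
        if (preg_match($pattern, $img->getAttribute('src'), $m)) {
            $img->setAttribute('src', '');
            $img->setAttribute('data-breakpoint', $m[1]);
            $img->setAttribute('data-img', $m[2] . '{folder}' . $m[3]);
        }
    }

    // Serialize only the children of <body>, so no doctype/html/body wrapper
    // is included in the result.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```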

How to chain in phpquery (almost everything can be a chain)

Good day everyone,
I'm very new to phpQuery, and this is my first post here on Stack Overflow, because I can't find the correct syntax for phpQuery chaining. I hope someone knows what I've been looking for.
I only want to remove a certain div inside a div.
<div id = "content">
<p>The text that i want to display</p>
<div class="node-links">Stuff i want to remove</div>
</div>
These few lines of code work perfectly:
pq('div.node-links')->remove();
$text = pq('div#content');
print $text; //output: The text that i want to display
But when I tried
$text = pq('div#content')->removeClass('div.node-links'); //or
$text = pq('div#content')->remove('div.node-links');
//output: The text that i want to display (+) Stuff i want to remove
Can someone tell me why the second block of code is not working?
Thanks!
The first line of code will only work if you're trying to remove the class from div.node-links; it won't remove the node.
If you are trying to remove the class, you need to change it from:
$text = pq('div#content')->removeClass('div.node-links');
// to
$text = pq('div#content')->find('.node-links')->removeClass('node-links')->end();
which will output:
<div id="content">
<p>The text that i want to display</p>
<div>Stuff i want to remove</div>
</div>
As for the second line of code, I'm not exactly sure why it is not working. It seems like you're not selecting .node-links, but I was able to get the desired result using these:
// $markup = file_get_contents('test.html');
// $doc = phpQuery::newDocumentHTML($markup);
$text = $doc->find('div#content')->children()->remove('.node-links')->end();
// or
$text = pq('div#content')->find('.node-links')->remove()->end();
// or
$text = pq('div#content > *')->remove('.node-links')->parent();
Hope that helps
Since remove() does not take any parameter, you can do:
$text = pq('div#content div.node-links')->remove();
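For comparison only, the same removal can be sketched without phpQuery, using PHP's built-in DOMDocument and DOMXPath. This is a swapped-in technique, not phpQuery's API; remove_node_links() is a hypothetical helper, and the markup is assumed to be the corrected snippet from the question.

```php
<?php
// Sketch using the built-in DOM extension instead of phpQuery.
function remove_node_links(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Select every div with the class node-links inside div#content.
    $query = '//div[@id="content"]'
           . '//div[contains(concat(" ", normalize-space(@class), " "), " node-links ")]';

    // XPath results are not live, so nodes can be removed while iterating.
    foreach ($xpath->query($query) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Return only the content div, not the whole generated document.
    $content = $xpath->query('//div[@id="content"]')->item(0);
    return $content === null ? '' : $dom->saveHTML($content);
}

echo remove_node_links(
    '<div id="content"><p>The text that i want to display</p>'
    . '<div class="node-links">Stuff i want to remove</div></div>'
);
```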

Failing Regex Syntax for html in PHP

I have a bit of a situation. The site I am working on has two sections, the mobile site and the main site. They both fetch content from the same db/table. It's a blog site. When admins create content that has images using the text editor (CKEditor), a style attribute is attached to the resulting img tag, so the output looks like this:
<img alt="some content" src="some location" style="width:520px; height:600px;" />
This works great on the main site, but on the mobile site the images are poorly scaled and stretched.
I have a thumbnailing script that could address that, but I want a way to get the src attribute before the page loads and a way to remove the style attribute.
I did this using regex:
$str=$blog_post_column_from_database;
$pattern=array ('#\<img alt="(.*?)" src="(.*)" style="(.*?)" /> #' );
$replacement=array ( '<img src="$my_thumbnailer_here.php?src=\\2" width="100%" />' );
$a=(string)$str; //converts text to string to avoid code lines from executing
return preg_replace($pattern,$replacement,$a);
Please, what am I doing wrong? Regex is not my strong point. Thanks.
...as already suggested in the comments, you'll be better off using PHP's DOMDocument.
Something like this should do the trick:
example: http://3v4l.org/Gv4dp
//get new domdoc instance
$dom=new DOMDocument();
//load your html
$dom->loadHTML($your_html);
//get all images
$imgs = $dom->getElementsByTagName("img");
//iterate over those
foreach($imgs as $img){
//remove style attribute
$img->removeAttribute('style');
//prefix src attribute with scriptname
$img->setAttribute( 'src' , 'thumbnail.php?img=' . $img->getAttribute('src') );
}
//output modified html
echo $dom->saveHTML();
You might want to remove the doctype and the <html> and <body> elements that are created when saving the document as HTML, by replacing the last line with:
echo preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), '', $dom->saveHTML()));
see removing doctype while saving domdocument
Try this regexp:
$pattern=array ('#<img alt="(.*?)" src="(.*)" style="(.*?)" />#' );
The backslash at the beginning and the space at the end have been removed.
To make this work correctly, you should first find all the img tags and then change each one.
Your regexp will not work when the alt attribute is missing or when the attributes are in a different order.
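The idea of finding every img tag first and then changing each one can be sketched with preg_replace_callback(), which makes the attribute order irrelevant. This is only an illustration under stated assumptions: thumbnail_images() is a hypothetical helper, thumbnail.php?img= is borrowed from the DOMDocument answer, and attributes are assumed to be double-quoted.

```php
<?php
// Sketch: process each <img> tag on its own, independent of attribute order.
function thumbnail_images(string $html): string
{
    return preg_replace_callback('/<img\b[^>]*>/i', function (array $match): string {
        $tag = $match[0];

        // Drop the style attribute wherever it appears in the tag.
        $tag = preg_replace('/\sstyle\s*=\s*"[^"]*"/i', '', $tag);

        // Route the src through the (assumed) thumbnailer script.
        return preg_replace_callback(
            '/(\ssrc\s*=\s*")([^"]*)"/i',
            function (array $src): string {
                return $src[1] . 'thumbnail.php?img=' . rawurlencode($src[2]) . '"';
            },
            $tag
        );
    }, $html);
}

echo thumbnail_images('<img alt="a" src="pic.jpg" style="width:520px; height:600px;" />');
// <img alt="a" src="thumbnail.php?img=pic.jpg" />
```

A tag without alt, or with style before src, goes through the same two steps, which is the case the original single pattern could not handle.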

Getting the first image in string with php

I'm trying to get the first image from each of my posts. The code below works great if I only have one image. But if I have more than one, it gives me an image, but not always the first.
I really only want the first image. A lot of times the second image is a next button.
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $texthtml, $matches);
$first_img = $matches [1] [0];
Now I can take $first_img and put it in front of the short description:
<img alt="Sara" title="Sara" src="<?php echo $first_img;?>"/>
If you only need the first source tag, preg_match should do instead of preg_match_all, does this work for you?
<?php
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i', $texthtml, $image);
echo $image['src'];
?>
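A likely reason the original pattern sometimes returns a later image is the greedy .+ after <img: when two images sit on the same line, it runs to the end of the line and backtracks to the last src. The sketch below demonstrates this on a made-up one-line input and shows a bounded alternative; the file names are placeholders.

```php
<?php
// Two images on ONE line, which is where the greedy pattern goes wrong.
$html = '<img alt="a" src="first.jpg"/><img alt="b" src="second.jpg"/>';

// Greedy: ".+" runs to the end of the line and backtracks to the LAST src.
preg_match('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $html, $greedy);

// Bounded: "[^>]*?" cannot leave the first tag, so the first src is captured.
preg_match('/<img\b[^>]*?\ssrc=[\'"]([^\'"]+)[\'"]/i', $html, $bounded);

echo $greedy[1], "\n";  // second.jpg
echo $bounded[1], "\n"; // first.jpg
```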
Don't use regex to parse html.
Use an HTML-parsing library/class such as phpQuery:
require 'phpQuery-onefile.php';
$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sarahehe" title="Saraxd" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
$pq = phpQuery::newDocumentHTML($texthtml);
$img = $pq->find('img:first');
$src = $img->attr('src');
echo "<img alt='foo' title='baa' src='{$src}'>";
Download: http://code.google.com/p/phpquery/
After testing an answer from here: Using regular expressions to extract the first image source from html codes? I got better results, with fewer broken image links, than with the answer provided here.
While regular expressions are good for a large variety of tasks, I find they usually fall short when parsing the HTML DOM. The problem with HTML is that the structure of a document is so variable that it is hard to extract a tag accurately (and by accurately I mean a 100% success rate with no false positives).
For more consistent results, use this library, http://simplehtmldom.sourceforge.net/, which allows you to manipulate HTML.
An example is provided in the response in the first link I posted.
function get_first_image($html){
    require_once('SimpleHTML.class.php');
    $post_html = str_get_html($html);
    // find('img', 0) returns the first <img> element, or null if there is none
    $first_img = $post_html->find('img', 0);
    if($first_img !== null) {
        return $first_img->src;
    }
    return null;
}
Enjoy
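If adding a third-party library is not an option, roughly the same lookup can be sketched with PHP's built-in DOMDocument. first_image_src() is a hypothetical helper name, and the input is assumed to be an HTML fragment like $texthtml above.

```php
<?php
// Sketch: return the src of the first <img> in an HTML fragment, or null.
function first_image_src(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) is the first <img> in document order, or null if there is none.
    $img = $dom->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }
    return $img->getAttribute('src');
}

$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

echo first_image_src($texthtml); // 475993565.jpg
```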

Help with Regex expression?

I'm trying to use preg_replace to filter member comments, specifically script and img tags. If the src is from my site, allow the tag; if it is from another site, just show the src.
Regex Expression:
<(\w+).+src=[\x22|'](?![^\x22']+mysite\.com[^\x22']+)([^\x22']+)[\x22|'].*>(?:</\1>)?
Using:
preg_replace($pattern, '$2', $comment);
Comment :
Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
<img src="http://www.mysite.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://www.fakesite.com/blah/blah/image.jpg"></img>
<img src="http://fakesite.com/blah/blah/image.jpg"></img>
Which one is your favorite?
Wanted Outcome:
Hi look at this!
<img src="http://www.mysite.com/blah/blah/image.jpg"></img>
<img src="http://mysite.com/blah/blah/image.jpg"></img>
<img src="http://subdomain.mysite.com/blah/blah/image.jpg"/>
http://www.mysite.fakesite.com/blah/blah/image.jpg (notice that it's just url, because it's not from my site)
http://www.fakesite.com/blah/blah/image.jpg
http://fakesite.com/blah/blah/image.jpg
Which one is your favorite?
Anyone see anything wrong?
I'm trying to use preg_replace to filter member comments. To filter script and img tags.
HTML Purifier is going to be the best tool for this purpose, though you want a whitelist of acceptable tags and attributes, not a blacklist of specific harmful tags.
The biggest thing wrong I can see is trying to use regex to modify HTML.
You should use DOMDocument.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($content);
// Copy the live node list to an array so that removing nodes does not skip elements.
foreach (iterator_to_array($dom->getElementsByTagName('img')) as $element) {
    if ( ! $element->hasAttribute('src')) {
        continue;
    }
    $src = $element->getAttribute('src');
    $elementHost = parse_url($src, PHP_URL_HOST);
    $thisHost = $_SERVER['SERVER_NAME'];
    if ($elementHost != $thisHost) {
        // Replace the foreign image with its URL as plain text.
        $element->parentNode->insertBefore($dom->createTextNode($src), $element);
        $element->parentNode->removeChild($element);
    }
}
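Note that comparing the host for strict equality with SERVER_NAME would treat subdomain.mysite.com as a foreign host, while the wanted outcome in the question keeps it. A possible host check that accepts the domain and its subdomains, but not look-alikes such as www.mysite.fakesite.com, could look like the sketch below; is_own_host() is a hypothetical helper and mysite.com is the placeholder domain from the question.

```php
<?php
// Sketch: true if the URL's host is $ownDomain or one of its subdomains.
function is_own_host(string $url, string $ownDomain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host component, or an unparsable URL
    }

    $host = strtolower($host);
    $suffix = '.' . strtolower($ownDomain);

    // Either an exact match, or the host ends with ".<domain>".
    return $host === strtolower($ownDomain)
        || substr($host, -strlen($suffix)) === $suffix;
}

var_dump(is_own_host('http://subdomain.mysite.com/blah/blah/image.jpg', 'mysite.com'));     // bool(true)
var_dump(is_own_host('http://www.mysite.fakesite.com/blah/blah/image.jpg', 'mysite.com')); // bool(false)
```

In the loop above, the condition would then become something like if (!is_own_host($src, 'mysite.com')).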
You should use the i and m modifiers:
#<(\w+).+src=[\x22|'](?![^\x22']+mysite\.com[^\x22']+)([^\x22']+)[\x22|'].*>(?:</\1>)?#im
