Failing Regex Syntax for html in PHP


I have a bit of a situation. The site I am working on has two sections: the mobile site and the main site. They both fetch content from the same database table. It's a blog site. When admins create content containing images with the text editor (CKEditor), a style attribute is attached to the resulting img tag, so the output looks like this:
<img alt="some content" src="some location" style="width:520px; height:600px;" />
This works great on the main site, but on the mobile site the images are poorly scaled and stretched.
I have a thumbnailing script that could address that, but I need a way to get the src attribute before the page loads and a way to remove the style attribute.
I tried to do this using regex:
$str=$blog_post_column_from_database;
$pattern=array ('#\<img alt="(.*?)" src="(.*)" style="(.*?)" /> #' );
$replacement=array ( '<img src="$my_thumbnailer_here.php?src=\\2" width="100%" />' );
$a=(string)$str; //converts text to string to avoid code lines from executing
return preg_replace($pattern,$replacement,$a);
What am I doing wrong? Regex is not my strong point. Thanks.

As already suggested in the comments, you'll be better off using PHP's DOMDocument.
Something like this should do the trick:
Example: http://3v4l.org/Gv4dp
//get new domdoc instance
$dom=new DOMDocument();
//load your html
$dom->loadHTML($your_html);
//get all images
$imgs = $dom->getElementsByTagName("img");
//iterate over those
foreach($imgs as $img){
//remove style attribute
$img->removeAttribute('style');
//prefix src attribute with scriptname
$img->setAttribute( 'src' , 'thumbnail.php?img=' . $img->getAttribute('src') );
}
//output modified html
echo $dom->saveHTML();
You might want to remove the doctype, <html> and <body> elements that are added when the document is saved as HTML. To do so, replace the last line with:
echo preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), '', $dom->saveHTML()));
See: removing doctype while saving domdocument
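As an alternative to stripping the wrapper tags with string functions, the fragment can be exported by serialising only the children of <body>. The following is a minimal sketch, not the answer author's code: the function name rewriteImagesForMobile() is hypothetical, thumbnail.php?img= is taken from the example above, and the urlencode() call assumes the thumbnail script reads the image path from a query parameter.

```php
<?php
// Hypothetical helper: strips inline styles from <img> tags and routes
// their src through a thumbnail script, returning only the HTML fragment.
function rewriteImagesForMobile(string $html, string $thumbnailer = 'thumbnail.php?img='): string
{
    $dom = new DOMDocument();
    // Blog content is rarely perfectly valid HTML; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $img) {
        $img->removeAttribute('style');
        // urlencode() is an assumption about how the thumbnail script reads its input.
        $img->setAttribute('src', $thumbnailer . urlencode($img->getAttribute('src')));
    }

    // Serialise only the children of <body>, so no doctype/html/body wrapper is emitted.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

echo rewriteImagesForMobile('<p>Hello</p><img alt="some content" src="pic.jpg" style="width:520px; height:600px;" />');
```

Note that loadHTML() may misread non-ASCII text unless the input declares its encoding, so content with special characters may need an encoding hint before parsing.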

Try the following regexp:
$pattern=array ('#<img alt="(.*?)" src="(.*)" style="(.*?)" />#' );
The backslash at the beginning and the space at the end have been removed.
For this to work reliably, you should first find all img tags and then change each one.
Your regexp will not work when the alt attribute is missing or when the attributes are in a different order.
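The "find every img tag first, then change it" idea can be sketched with preg_replace_callback(), which makes the result independent of attribute order. This is only an illustration under assumptions: the function name mobileImages() and the thumbnail.php?src= prefix are placeholders, and only double-quoted src attributes are handled.

```php
<?php
// Hypothetical helper: step 1 matches each <img ...> tag as a whole,
// step 2 pulls the src out of that tag wherever it appears.
function mobileImages(string $html, string $thumbnailer = 'thumbnail.php?src='): string
{
    return preg_replace_callback('#<img\b[^>]*>#i', function (array $tag) use ($thumbnailer): string {
        // \s before src avoids matching attributes such as data-src.
        if (!preg_match('#\ssrc\s*=\s*"([^"]*)"#i', $tag[0], $src)) {
            return $tag[0]; // no double-quoted src found: leave the tag untouched
        }
        // All other attributes (style, alt, ...) are dropped, as in the question's replacement.
        return '<img src="' . $thumbnailer . $src[1] . '" width="100%" />';
    }, $html);
}

echo mobileImages('<p><img style="width:520px;" src="a.png" alt="x" /></p>');
// <p><img src="thumbnail.php?src=a.png" width="100%" /></p>
```

A regex approach like this still breaks on unusual markup (single-quoted attributes, a > inside an attribute value), which is why the DOM-based answer is generally the safer choice.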

Related

Edit iframe content using PHP, and preg_replace()

I need to load a third-party widget onto my website. The only way they distribute it is by means of a clumsy old <iframe>.
I don't have much choice so what I do is get an iframe html code, using a proxy page on my website like so:
$iframe = file_get_contents('http://example.com/page_with_iframe_html.php');
Then I have to remove some specific parts in iframe like this:
$iframe = preg_replace('~<div class="someclass">[\s\S]*<\/div>~ix', '', $iframe);
In this way I intend to remove the unwanted section. In the end I simply output the iframe like so:
echo ($iframe);
The iframe gets output alright, however the unwanted section is still there. The regex itself was tested using regex101, but it doesn't work.

Try it this way; I hope it helps. Here I use sample HTML and remove the div with the given class name: first I load the document, then I query for the node and remove it from its parent.
<?php
ini_set('display_errors', 1);
//sample HTML content
$string1='<html>'
. '<body>'
. '<div>This is div 1</div>'
. '<div class="someclass"> <span class="hot-line-text"> hotline: </span> <a id="hot-line-tel" class="hot-line-link" href="tel:0000" target="_parent"> <button class="hot-line-button"></button> <span class="hot-line-number">0000</span> </a> </div>'
. '</body>'
. '</html>';
$object= new DOMDocument();
$object->loadHTML($string1);
$xpathObj= new DOMXPath($object);
$result=$xpathObj->query('//div[@class="someclass"]');
foreach($result as $node)
{
$node->parentNode->removeChild($node);
}
echo $object->saveHTML();
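For reference, one likely reason the regex in the question never matches is the x (extended) modifier: with it, PCRE ignores unescaped whitespace in the pattern, so the space between div and class is dropped and the pattern effectively looks for <divclass="someclass">. An online tester will only reproduce this if the same flags are set there. A small sketch with made-up sample markup:

```php
<?php
// Made-up sample markup standing in for the fetched iframe HTML.
$iframe = '<p>keep</p><div class="someclass">unwanted</div><p>also keep</p>';

// With x, the literal space in the pattern is ignored, so nothing matches.
$withX = preg_replace('~<div class="someclass">[\s\S]*?</div>~ix', '', $iframe);

// Without x (and with a lazy quantifier), the div is removed.
$withoutX = preg_replace('~<div class="someclass">[\s\S]*?</div>~i', '', $iframe);

var_dump($withX === $iframe); // bool(true): the input is unchanged
echo $withoutX, "\n";         // <p>keep</p><p>also keep</p>
```

Even the corrected pattern stops at the first </div>, so it would cut a div that contains nested divs in the wrong place; the XPath approach above does not have that problem.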

scraping images from url using php

I am trying to make a page that lets me grab and save images from another link. Here is what I want on my page:
A text box (to enter the URL I want to get images from).
A save dialog box to specify the path to save the images to.
However, I want to save images only from that URL, and only from inside a specific element.
For example, my code would say: go to example.com and grab all images inside the element with class="images".
Note: not all images on the page, just those inside the element.
Whether the element contains 3 images or 50 or 100, I don't care.
Here's what I tried, which worked, using PHP:
<?php
$html = file_get_contents('http://www.tgo-tv.net');
preg_match_all( '|<img.*?src=[\'"](.*?)[\'"].*?>|i',$html, $matches );
echo $matches[ 1 ][ 0 ];
?>
This gets the image name and path, but what I am trying to build is a save dialog box, and the code must save each image directly into that path instead of echoing it out.
I hope that makes sense.
Edit 2
It's OK not to have a save dialog box; the save path can be specified in the code.

If you want something generic, you can use:
<?php
$the_site = "http://somesite.com";
$the_tag = "div"; #
$the_class = "images";
$html = file_get_contents($the_site);
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//'.$the_tag.'[contains(@class,"'.$the_class.'")]/img') as $item) {
$img_src = $item->getAttribute('src');
print $img_src."\n";
}
Usage:
Change the site, the tag (which can be a div, span, a, etc.) and the class name.
For example, change the values to:
$the_site = "https://stackoverflow.com/questions/23674744/what-is-the-equivalent-of-python-any-and-all-functions-in-javascript";
$the_tag = "div"; #
$the_class = "gravatar-wrapper-32";
Output:
https://www.gravatar.com/avatar/67d8ca039ee1ffd5c6db0d29aeb4b168?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24da669dda96b6f17a802bdb7f6d429f?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24780fb6df85a943c7aea0402c843737?s=32&d=identicon&r=PG
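The answer above prints the image URLs; the question also asks for saving them to a path set in code. The following sketch adds that step under some assumptions: findImageSources() and saveImages() are hypothetical names, the src values are assumed to be absolute URLs (or local paths), and downloading with copy() assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: returns the src of every <img> found at any depth
// inside elements with the given tag name and class.
function findImageSources(string $html, string $tag, string $class): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match the class as a whole word instead of as a substring.
    $query = sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]//img[@src]',
        $tag,
        $class
    );
    $sources = [];
    foreach ($xpath->query($query) as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Hypothetical helper: copies each source into $dir and returns the paths written.
function saveImages(array $urls, string $dir): array
{
    $saved = [];
    foreach ($urls as $url) {
        // Keep only the file name from the URL path.
        $name = basename((string) parse_url($url, PHP_URL_PATH));
        if ($name === '') {
            continue;
        }
        $target = rtrim($dir, '/') . '/' . $name;
        if (@copy($url, $target)) {
            $saved[] = $target;
        }
    }
    return $saved;
}

$html = '<div class="images"><img src="/a.png"><p><img src="b.jpg"></p></div><img src="outside.gif">';
print_r(findImageSources($html, 'div', 'images'));
```

In real use the HTML would come from file_get_contents($the_site), and relative src values such as /a.png would first have to be resolved against the page URL before being passed to saveImages().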

Maybe you should try the HTML DOM Parser for PHP (simple_html_dom). I found this tool recently and, to be honest, it works pretty well. It has jQuery-like selectors, as you can see on the site. I suggest you take a look and try something like:
<?php
require_once("./simple_html_dom.php");
// Load the page to search; file_get_html() comes from simple_html_dom, the URL is a placeholder
$html = file_get_html("http://example.com");
// Start from the root (<html></html>) and find the parent tag you want to search in (here: all divs)
foreach ($html->find("div") as $div)
{
    // Search for img tags inside each div that was found
    foreach ($div->find("img") as $img)
    {
        // Output the src attribute (for <img src="www.example.com/cat.png"> you get www.example.com/cat.png)
        echo $img->src . "<br>";
    }
}
?>
I hope this helps, more or less.

preg_replace for images in PHP

I have a question about preg_replace. I have the following HTML in WordPress:
<img width="256" height="256" src="http://localhost/wp-content/uploads/2015/08/spiderman-avatar.png" class="attachment-post-thumbnail wp-post-image" alt="spiderman-avatar">
I change it to the following:
<img src="" data-breakpoint="http://localhost/wp-content/uploads/2015/08/" data-img="theme-{folder}.jpg" class="srcbox" alt="spiderman-avatar">
with the following preg_replace:
$html = preg_replace(
'/src="(https?:\/\/.+\/)(.+\-)([0-9]+)(.jpg|.jpeg|.png|.gif)"/',
'src="" data-breakpoint="$1" data-img="$2{folder}$4"', // Replace and split src attribute into two new attributes
preg_replace(
'/(width|height)="[0-9]*"/',
'', // Remove width and height attributes
preg_replace(
'/<img ?([^>]*)class="([^"]*)"?/',
'<img $1 class="$2 srcbox"', // Add class srcbox to class attribute
$html
)
)
);
I have the feeling I have written some serious slow code, and it can be done in a single preg_replace.
Chris85 mentioned the HTML parser, so I found this and got this so far:
http://nimishprabhu.com/top-10-best-usage-examples-php-simple-html-dom-parser.html
include('simple_html_dom.php');
$html = file_get_html($html);
From here I COULD loop through all images and change the attributes. But how do I put the new element back where it came from?

You would be better off using DOM
http://php.net/manual/de/domdocument.loadhtml.php
and extracting the attributes with it.
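A rough sketch of what that could look like with DOMDocument follows. The elements are modified in place, so there is no need to put a new element back. The function name convertToSrcbox() is hypothetical, and the sample src (theme-320.jpg) is an assumption chosen to fit the question's pattern, which expects a number before the file extension.

```php
<?php
// Hypothetical helper: applies the question's three replacements to each <img> in place.
function convertToSrcbox(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Same idea as the question's pattern: directory, name prefix, size number, extension.
    $pattern = '~^(https?://.+/)(.+-)[0-9]+(\.(?:jpe?g|png|gif))$~';
    foreach ($dom->getElementsByTagName('img') as $img) {
        if (!preg_match($pattern, $img->getAttribute('src'), $m)) {
            continue; // leave images that do not follow the naming scheme untouched
        }
        $img->setAttribute('src', '');
        $img->setAttribute('data-breakpoint', $m[1]);
        $img->setAttribute('data-img', $m[2] . '{folder}' . $m[3]);
        $img->removeAttribute('width');
        $img->removeAttribute('height');
        $img->setAttribute('class', trim($img->getAttribute('class') . ' srcbox'));
    }

    // Return only the body content, without the doctype/html/body wrapper.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

echo convertToSrcbox('<img width="256" height="256" src="http://localhost/wp-content/uploads/2015/08/theme-320.jpg" class="attachment-post-thumbnail wp-post-image" alt="spiderman-avatar">');
```

This parses the HTML once instead of running three nested preg_replace() calls over it.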

How to remove all 'alt' attribute from all the <img> tags from HTML file in PHP? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Remove style attribute from HTML tags
The current image looks like this:
<img src="images/sample.jpg" alt="xyz"/>
Now I want to remove all such alt attributes from all the img tags in the HTML file; the PHP code itself should remove every occurrence of the alt attribute.
The output should look like this:
<img src="images/sample.jpg" />
How can this be done with PHP?
Thanks in advance.

Use DOMDocument for HTML parsing/manipulation. The example below reads a HTML file, removes the alt attribute from all img tags, then prints out the HTML.
$dom = new DOMDocument();
$dom->loadHTMLFile('file.html');
foreach($dom->getElementsByTagName('img') as $image)
{
$image->removeAttribute('alt');
}
echo $dom->saveHTML(); // print the modified HTML

Read your file; you can use file_get_contents() for that:
$fileContent = file_get_contents('filename.html');
// [^"]* stops at the closing quote; a greedy (.*) would also swallow any attributes that follow on the same line
$fileContent = preg_replace('/\salt="[^"]*"/i', '', $fileContent);
file_put_contents('filename.html', $fileContent);
Make sure your file is writable.

First, you need to get hold of the document source you want to modify. It's not clear whether you want to edit HTML files on your server, the HTML output generated by a request, or something else.
In this answer I'm going to skip over how you obtain the HTML. It could be file_get_contents('filename.html'); or some magic with output buffering.
Since you don't want to parse HTML with regular expressions, you need to use a parser.
Since the alt attribute is required for the HTML to be valid, if you want to "remove" it you have to set it to an empty string.
This should work:
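$doc = new DOMDocument();
$doc->loadHTML($myhtml); // loadHTML() is an instance method; calling it statically fails in PHP 8
$images = $doc->getElementsByTagName('img');
foreach($images as $img) {
$img->setAttribute('alt', ''); // use the loop variable $img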
}
$myhtml = $doc->saveHTML();

For valid xHTML it should have the alt attribute.
Something like this would work:
$xml = new SimpleXMLElement($doc); // $doc is the html document.
foreach ($xml->xpath('//img') as $img_tag) {
if (isset($img_tag->attributes()->alt)) {
unset($img_tag->attributes()->alt);
}
}
$new_doc = $xml->asXML();

Replace the title of an HTML image

I'm trying to figure out how to replace the title portion of an image (title="Title is here") in PHP, but I cannot get it to work, so could someone please help?
The title could be literally anything, so I need to find title="{anything here}" and replace that (as below).
I'm trying to use preg_replace(), but if there is a better way, I'm open to suggestions.
I've tried several different variations, but I think this is not too far off the mark:
$pattern = '#^title="([a-zA-Z0-9])"$#';
$replacement = 'title="Visit the '.$service['title'].' page';
$service_image = preg_replace($pattern, $replacement, $service_image);

<?php
$html = '<img src="whatever.jpg" title="Anything">';
$dom = new DOMDocument;
$dom->loadHTML($html);
$img = $dom->getElementsByTagName("img")->item(0);
/** @var DOMElement $img Now $img contains the DOM node representing the image. */
$img->setAttribute("title", "Whatever you want here!");
/* Export the image alone (if not used like this,
* you'd get a complete HTML document including head and body).
*
* This ensures you only get the image.
*/
echo $dom->saveXML($img);
No regex for HTML please. This will work for you.

Use this snippet :
$tag = '<img title="My Old Title" src="localhost" alt="this is the alt"/>';
echo preg_replace('/(title)=("[^"]*")/i','title="My New Title"',$tag);
// <img title="My New Title" src="localhost" alt="this is the alt"/>
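Applied to the question's own variables, the pattern in the question fails for two reasons: the ^ and $ anchors require the whole string to be nothing but the title attribute, and [a-zA-Z0-9] matches exactly one character. The sketch below is one possible fix, with assumptions stated: replaceImageTitle() is a hypothetical name, the sample tag and title are invented, and a callback is used so that characters such as $ in the new title are not treated as backreferences.

```php
<?php
// Hypothetical helper: replaces the first title="..." attribute in an img tag.
function replaceImageTitle(string $imgTag, string $newTitle): string
{
    // Escape the title so quotes or ampersands cannot break the attribute.
    $attribute = ' title="' . htmlspecialchars($newTitle, ENT_QUOTES) . '"';
    // \s before title avoids matching attributes such as data-title.
    return preg_replace_callback(
        '#\stitle="[^"]*"#i',
        function () use ($attribute): string {
            return $attribute;
        },
        $imgTag,
        1
    );
}

$service = ['title' => 'Plumbing & Heating']; // invented sample data
$service_image = '<img src="plumbing.jpg" title="Old title" alt="Plumbing">';
echo replaceImageTitle($service_image, 'Visit the ' . $service['title'] . ' page');
// <img src="plumbing.jpg" title="Visit the Plumbing &amp; Heating page" alt="Plumbing">
```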
