php dom get image src path encode


I use the code below to extract image src paths, but there is a problem when the image filename contains special characters (e.g. ~ DQBTZ_UC(G#STWO_1R2U_Q4.gif): the output turns out like this: ~ 6Z6W4%255BO29FQ%255BA4YN_%255BFR9%2529M.gif
How can I fix this issue? Sorry for my poor English.
function _get_imagepath($content){
    $doc = new DOMDocument();
    $doc->loadHTML($content);
    $imagepaths=array();
    $imageTags = $doc->getElementsByTagName('img');
    $folder=file_directory_path();
    foreach($imageTags as $tag) {
        $imagepaths[]=$tag->getAttribute('src');
    }
    if(!empty($imagepaths)){
        return $imagepaths;
    }else{
        return FALSE;
    }
}

It seems your filenames are URL-encoded. Take a look at http://php.net/manual/en/function.urldecode.php
For example:
foreach($imageTags as $tag) {
    $imagepaths[]=urldecode($tag->getAttribute('src'));
}

You get the encoded URL.
You want to use urldecode:
Decodes any %## encoding in the given string. Plus symbols ('+') are
decoded to a space character.
urldecode() in PHP Manual

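Replace
return $imagepaths;
with
return array_map('urldecode', $imagepaths);
to decode your image URLs. Note that urldecode() expects a string, so it has to be applied to each element of the array rather than to the array itself.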
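
One detail worth noting: the sample output in the question contains %255B, and %25 is itself the encoding of "%", so the path appears to have been encoded twice. A single decoding pass would then still leave %5B behind. The following is only a sketch under that assumption; decode_image_path() and its $maxPasses parameter are hypothetical names, not part of PHP. It uses rawurldecode() so that a literal "+" in a filename is not turned into a space.

```php
<?php
// Hypothetical helper: decode a path repeatedly until it stops changing,
// on the assumption that it may have been percent-encoded more than once.
function decode_image_path(string $path, int $maxPasses = 3): string
{
    for ($i = 0; $i < $maxPasses; $i++) {
        $decoded = rawurldecode($path);
        if ($decoded === $path) {
            break; // nothing left to decode
        }
        $path = $decoded;
    }
    return $path;
}

$encoded = '6Z6W4%255BO29FQ%255BA4YN_%255BFR9%2529M.gif';
echo decode_image_path($encoded), "\n"; // 6Z6W4[O29FQ[A4YN_[FR9)M.gif
```

Repeated decoding would corrupt a filename that legitimately contains a sequence such as %41, so this is only appropriate when the double encoding is known to happen.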

Related

PHP decoding square brackets href attr to html file

Saving an HTML file encodes the square brackets.
// My string
$teaserTest = "<a href='[CLICK_URL]'><strong>testgerr</strong></a>";
// Calling the save function
saveFile($teaserTest);
// Save function
function saveFile($stringToAdd){
    $doc = new DOMDocument();
    $doc->formatOutput = true;
    $doc->loadHTML('<html><head><title>Test</title></head><body>'.$stringToAdd.'</body></html>');
    $doc->saveHTMLFile("Campaigns/test.html");
}
The file results in <a href="%5BCLICK_URL%5D">
I'm trying to keep the "[" decoded.

Square brackets [ ] are reserved characters in URLs, as specified in the relevant RFC. They matter, for example, for IPv6 addresses: http://[::1]/example/
That is why encoding them is a good thing. If you need a special placeholder, use a different pattern for it.
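
If the bracketed placeholder has to survive as-is, one possible workaround is to serialize to a string with saveHTML() instead of saveHTMLFile(), restore the brackets, and write the file yourself. This is a sketch only: render_with_placeholders() is a hypothetical helper name, and the blanket str_replace() assumes that no legitimately encoded %5B or %5D appears elsewhere in the document.

```php
<?php
// Hypothetical helper: build the page, then undo the percent-encoding
// of square brackets in the serialized markup.
function render_with_placeholders(string $fragment): string
{
    $doc = new DOMDocument();
    $doc->formatOutput = true;
    $doc->loadHTML('<html><head><title>Test</title></head><body>' . $fragment . '</body></html>');

    $html = $doc->saveHTML();

    // Assumption: every %5B / %5D in the output came from a placeholder.
    return str_replace(['%5B', '%5D'], ['[', ']'], $html);
}

$teaserTest = "<a href='[CLICK_URL]'><strong>testgerr</strong></a>";
$html = render_with_placeholders($teaserTest);

// Writing the result is left commented out so the sketch has no side effects:
// file_put_contents('Campaigns/test.html', $html);
```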

php regular expression to remove unwanted code

The editor I am using is adding extraneous coding that I would like to remove via php before writing to the database.
The code looks like this:
<img style="width: 250px;" src="files/school-big.jpg" data-cke-saved-src="files/school-big.jpg" alt="">
<img style="width: 250px;" src="files/firemen.jpg" data-cke-saved-src="files/firemen.jpg" alt="">
What I need to get rid of is the data-cke-saved-src="files/image-name". My understanding of regex is somewhere below weak so how would I build a regex to grab the image name without grabbing the end of the line or the rest of the content?
Thank you kindly,

Try this:
$data = preg_replace('#\s(data-cke-saved-src)="[^"]+"#', '', $data);
Or do it in jQuery before going into PHP with this:
$('img').removeAttr('data-cke-saved-src')

Try adding and using this function:
/*
*I am assuming you get all the data in a single variable.
*/
function remove_data_cke($text) {
    // Get all data-cke-saved-src="..." attributes from the html.
    $result = array();
    preg_match_all('|data-cke-saved-src="[^"]*"|U', $text, $result);
    // Replace all occurrences with an empty string.
    foreach($result[0] as $data_cke) {
        $text = str_replace($data_cke, '', $text);
    }
    return $text;
}

You can use DOM to easily remove the attribute:
$doc = new DOMDocument;
$doc->loadHTML($html); // load the HTML data
foreach ($doc->getElementsByTagName('img') as $img) {
    $img->removeAttribute('data-cke-saved-src');
}
$html = $doc->saveHTML(); // serialize the cleaned document back to a string

replace img src with php

I would like to take a block of code stored in a variable and replace the src of any image tags in there without disturbing the rest of the code block.
For example : the block of code might read :
<img src="image1.jpg">
I would like to change that to (using PHP) :
<img src="altimage.jpg">
I am currently using a solution I found using the PHP DOM module to change the image tag but the function returns just the changed img tag HTML without the rest of the HTML.
The function I am calling is as follows :
function replace_img_src($original_img_tag, $new_src_url) {
    $doc = new DOMDocument();
    $doc->loadHTML($original_img_tag);
    $tags = $doc->getElementsByTagName('img');
    if(count($tags) > 0)
    {
        $tag = $tags->item(0);
        $tag->setAttribute('src', $new_src_url);
        return $doc->saveXML($tag);
    }
    return false;
}
This, of course, just returns the changed img tag but strips the other HTML (such as the A tag) - I am passing the entire block of code to the function.
(BTW - It's good for me to have the false return for no image tags as well).
What am I missing here please ?
Many thanks in advance for any help.

You need to use return $doc->saveXML(); instead of return $doc->saveXML($tag);. See the documentation of saveXML:
saveXML ([ DOMNode $node [, int $options ]] )
node: Use this parameter to output only a specific node without XML declaration rather than the entire document.
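
Calling saveXML() without a node serializes the whole document, including the html and body elements that loadHTML() builds around a fragment. If only the original block is wanted back, one option is to serialize the children of body one by one. The sketch below assumes exactly that; replace_first_img_src() is a hypothetical name, and it checks the node list's length property rather than relying on count().

```php
<?php
// Hypothetical variant of the question's function: change the first
// <img> src and return the whole fragment, or false if there is no image.
function replace_first_img_src(string $html, string $newSrc)
{
    $doc = new DOMDocument();

    // Collect parser warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = $doc->getElementsByTagName('img');
    if ($images->length === 0) {
        return false;
    }
    $images->item(0)->setAttribute('src', $newSrc);

    // Serialize only what sits inside <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$block = '<a href="page.html"><img src="image1.jpg"></a><p>Caption</p>';
echo replace_first_img_src($block, 'altimage.jpg'), "\n";
```

The surrounding A tag and any following markup are kept, and the false return for a block without images is preserved.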

Strip directory structure in HTML

I have a PHP application that reads in a bit of HTML. In this HTML there may be an img tag. What I want to do is strip the directory structure from the src of the image tag e.g.
<img src="dir1/dir2/dir3/image1.jpg">
to
<img src="image1.jpg">
Anyone have any pointers?
Thanks,
Mark

As a suggestion, rather than using regex, you may be better off using something like the SimpleXML class to traverse the HTML, that way you'd be able to find the img tags and their src attribute then change it easily. Rather than having to try and parse a whole document with regex. After you've done that you'd be able to just explode the string using the "/" delimiter and use the last value of the exploded array as the src attribute.
PHP.net's SimpleXML Manual: http://php.net/manual/en/book.simplexml.php

This is a tutorial on how to change all links in an HTML document: Scraping Links From HTML.
With a slight modification of the example, this could do it:
<?php
require('FluentDOM/FluentDOM.php');
$html = '<img src="dir1/dir2/dir3/image1.jpg">';
$fd = FluentDOM($html, 'html')->find('//img[@src]')->each(
    function ($node) {
        $item = FluentDOM($node);
        // Keep only the file name of the src attribute.
        $item->attr('src', basename($item->attr('src')));
    }
);
$fd->contentType = 'xml';
header('Content-type: text/xml');
echo $fd;
?>

If you want to try this with regexp this could work:
$subject = "dir1/dir2/dir3/image1.jpg";
$pattern = '/^.*\//';
$result = preg_replace($pattern, '', $subject);
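
The same idea can also be sketched with PHP's built-in DOMDocument and basename(), without any third-party library. This is a minimal illustration using the question's sample tag, and it assumes the src values are plain paths; a src with a query string containing slashes would need extra handling.

```php
<?php
$html = '<img src="dir1/dir2/dir3/image1.jpg">';

$doc = new DOMDocument();
$doc->loadHTML($html);

foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        // basename() drops everything up to and including the last slash.
        $img->setAttribute('src', basename($img->getAttribute('src')));
    }
}

// Serialize just the first image element to inspect the result.
$first = $doc->getElementsByTagName('img')->item(0);
echo $doc->saveHTML($first), "\n";
```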

Extract Image Sources from text in PHP - preg_match_all required

I have a little issue as my preg_match_all is not running properly.
what I want to do is extract the src parameter of all the images in the post_content from the wordpress which is a string - not a complete html document/DOM (thus cannot use a document parser function)
I am currently using the below code which is unfortunately too untidy and works for only 1 image src, where I want all image sources from that string
preg_match_all( '/src="([^"]*)"/', $search->post_content, $matches);
if ( isset( $matches ) )
{
    foreach ($matches as $match)
    {
        if(strpos($match[0], "src")!==false)
        {
            $res = explode("\"", $match[0]);
            echo $res[1];
        }
    }
}
can someone please help here...

Using regular expressions to parse an HTML document can be very error prone. Like in your case where not only IMG elements have an SRC attribute (in fact, that doesn’t even need to be an HTML attribute at all). Besides that, it also might be possible that the attribute value is not enclosed in double quote.
It is better to use an HTML DOM parser like PHP's DOMDocument and its methods:
$doc = new DOMDocument();
$doc->loadHTML($search->post_content);
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        echo $img->getAttribute('src');
    }
}

You can use a DOM parser with HTML strings, it is not necessary to have a complete HTML document. http://simplehtmldom.sourceforge.net/
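
As for why the original regex attempt prints only one source: with its default flags, preg_match_all() groups results by pattern part, so $matches[0] holds all full matches and $matches[1] holds all captured src values. Looping over $matches therefore visits two arrays, not one entry per image. If a regex has to be used anyway, a sketch along these lines reads the captures directly; the sample $content string is made up for illustration, and the pattern still assumes double-quoted attribute values.

```php
<?php
// Made-up sample content standing in for $search->post_content.
$content = '<p><img src="a.jpg" alt=""> text <img class="x" src="b.png"></p>';

// Restrict the match to <img> tags and capture the src value.
preg_match_all('/<img\b[^>]*\ssrc="([^"]*)"/i', $content, $matches);

// $matches[1] is the list of captured values, one per matched image.
$sources = $matches[1];

foreach ($sources as $src) {
    echo $src, "\n";
}
```

A DOM parser remains the more robust choice, since this pattern misses single-quoted or unquoted attribute values.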
