I am using Simple HTML DOM to process some HTML that contains img tags, and I want to change the src of some of them. Specifically, I want to change URLs in the given text that contain https://localhost.com to https://i0.wp.com/localhost.com
Example:
$data='<p><img class="alignnone wp-image-36109 size-full" src="https://localhost.com/wp-content/uploads/2014/10/WjhzQlNRaDJrYUUx_o_using-freedom-for-unlimited-in-app-purchases-android-.jpg" alt="WjhzQlNRaDJrYUUx_o_using-freedom-for-unlimited-in-app-purchases-android-" width="480" height="360"/></p>
';
I used the code below to find the images whose src starts with https://localhost.com, but how can I change the value?
$html->find('img[src^=https://localhost.com/]');
Result from simple html dom:
It returns the matching elements, but I want to replace the src value as described above.
I have also tried this regex:
echo preg_replace("/(<img.*src=)[\"'](.*)[\"']/m",'\1"https://i0.wp.com/\2\"',$data);
but it gives me output like this:
<p><img class="alignnone wp-image-36109 size-full" src="https://i0.wp.com/https://localhost.com/wp-content/uploads/2014/10/WjhzQlNRaDJrYUUx_o_using-freedom-for-unlimited-in-app-purchases-android-.jpg" alt="WjhzQlNRaDJrYUUx_o_using-freedom-for-unlimited-in-app-purchases-android-" width="480" height="360\"/></p>
Every src now gets the https://i0.wp.com/ prefix, but this is not the result I want from the regex:
Result from regex:
https://i0.wp.com/https://i0.wp.com/localhost.com/wp-content/uploads/2016/03/sd.png?resize=300%2C300
Want to get result:
https://i0.wp.com/i0.wp.com/localhost.com/wp-content/uploads/2016/03/sd.png?resize=300%2C300
Can someone give me a clue how to get this? If possible, please also give an answer that uses regex; that would be helpful for me. I hope the problem is clear; please comment below if you need more information.
If anyone wants to downvote this question, go ahead, but please comment below to explain why. I am not an expert PHP developer; I just learn from my mistakes.
There you go (this uses the far superior DOMDocument library with xpath and regular expressions):
<?php
$data='<p><img class="alignnone wp-image-36109 size-full" src="https://localhost.com/wp-content/uploads/2014/10/WjhzQlNRaDJrYUUx_o_using-freedom-for-unlimited-in-app-purchases-android-.jpg" alt="WjhzQlNRaDJrYUUx_o_using-freedom-for-unlimited-in-app-purchases-android-" width="480" height="360"/></p>';
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
# filters images
$needle = 'https://localhost.com';
$images = $xpath->query("//img[starts-with(@src, '$needle')]");
# split on positive lookahead
$regex = '~(?=localhost\.com)~';
foreach ($images as $image) {
    $parts = preg_split($regex, $image->getAttribute("src"));
    $newtarget = $parts[0] . "i0.wp.com/i0.wp.com/" . $parts[1];
    $image->setAttribute("src", $newtarget);
}
# just to show the result
echo $dom->saveHTML();
?>
And see a demo on ideone.com.
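Since the question also asks for a regex version, here is a minimal sketch of one. It assumes the goal stated at the top of the question (turning https://localhost.com/... into https://i0.wp.com/localhost.com/...) and that every src value is quoted; the sample markup is a shortened, hypothetical version of the asker's input.

```php
<?php
// Hypothetical sample input, shortened from the question.
$data = '<p><img class="size-full" src="https://localhost.com/wp-content/uploads/a.jpg" alt="a" width="480" height="360"/></p>';

// Group 1: everything from "<img" up to and including the opening quote of src.
// Group 2: the host and the slash after it, which are kept in the new URL.
// Only the scheme is consumed and replaced, so other attributes stay untouched.
$pattern = '~(<img\b[^>]*?\bsrc=["\'])https?://(localhost\.com/)~i';
$result = preg_replace($pattern, '$1https://i0.wp.com/$2', $data);

echo $result, PHP_EOL;
```

Anchoring the match on the scheme and host, rather than capturing the whole quoted value, avoids the greedy `.*` that let the original pattern run into the later attributes.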
Related
This is my Regex to fetch all tags with class:
preg_match_all('/<\s*\w*\s*class\s*=\s*"?\s*([\w\s%#\/\.;:_-]*)\s*"?.*?>/',$file,$matches);
It matches all tags with class like <a class="abc">
The problem is that if a tag contains an extra attribute before class, this regex fails to match it.
E.g.: <a id="fig_3_1" class="figure-contents">
I want to get <a class="figure-contents"> and ignore fig_3_1.
Any idea how to skip over it?
<\s*\w*.*?\s*class\s*=\s*"?\s*([\w\s%#\/\.;:_-]*)\s*"?.*?>
This will probably work,
but you would be better off using simple_html_dom.
Take a look at this amazing SO post and reconsider.
You will most likely be better off using an HTML parser instead. You can do so using the DOM model.
A simple sample of how it can be used is below.
$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $image->setAttribute('src', 'http://example.com/' . $image->getAttribute('src'));
}
$html = $dom->saveHTML();
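For the class question specifically, a parser-based sketch could look like the following. The sample markup is made up for illustration; the XPath expression //*[@class] selects every element that has a class attribute, no matter which attributes come before it.

```php
<?php
// Hypothetical sample markup, including the tag from the question.
$html = '<div><a id="fig_3_1" class="figure-contents">x</a><p class="note">y</p><span>z</span></div>';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$classes = [];
// Attribute order inside the tag does not matter to the parser.
foreach ($xpath->query('//*[@class]') as $element) {
    $classes[] = [$element->nodeName, $element->getAttribute('class')];
}

print_r($classes);
```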
<div style="display:none">250</div>.<div style="display:none">145</div>
I'd want:
<div style="display:none">250</div>#.#<div style="display:none">145</div>
or like this:
<div style="display:none">111</div>125<div style="display:none">110</div>
where I'd want
<div style="display:none">111</div>#125#<div style="display:none">110</div>
I'd like a preg_replace that puts those hash signs around the number, so I assume the regex would look something like this:
"<\/div>[.]|<\/div>\d{1,3}"
The value can be a number of 1 to 3 digits, or it can be a dot.
Anyhow, I don't know how to make preg_replace wrap the value on both sides:
"<\/div>[.]|<\/div>\d{1,3}" replace: $0#
This inserts the hash only after the value.
EDIT
I cannot use an HTML parser, because I cannot find one that does not treat styles / classes as plain text, and I need those values attached to determine whether the element is visible or not :(
And yes, it is driving me insane, but I am almost done :)
You really should not be trying to parse HTML with regex. There are only a couple of people I know who can do it. And even if you were one of them, regex still would not be the right tool for the job. Use PHP's DOMDocument, optionally with DOMXPath.
With xpath:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$textNode = $xpath->query('//text()')->item(1);
$textNode->parentNode->replaceChild($dom->createTextNode('#' . $textNode->textContent . '#'), $textNode);
echo htmlspecialchars($dom->saveHTML());
http://codepad.viper-7.com/KLTLDA
With childnodes:
$dom = new DOMDocument();
$dom->loadHTML($html);
$body = $dom->getElementsByTagName('body')->item(0);
$textNode = $body->childNodes->item(1);
$textNode->parentNode->replaceChild($dom->createTextNode('#' . $textNode->textContent . '#'), $textNode);
echo htmlspecialchars($dom->saveHTML());
http://codepad.viper-7.com/Ii4vPb
In your case,
preg_replace("~</div\s*>(\.|\d{1,3})<div~i", '</div>#$1#<div', $string);
That's assuming no spaces between the divs and the content, and nothing otherwise weird is between.
Note that regex is very brittle, and would fail silently on even the slightest change in HTML.
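To show the pattern above at work, here is a small sketch that runs it over both sample strings from the question. The $samples and $results names are only for illustration.

```php
<?php
// The two inputs from the question.
$samples = [
    '<div style="display:none">250</div>.<div style="display:none">145</div>',
    '<div style="display:none">111</div>125<div style="display:none">110</div>',
];

$results = [];
foreach ($samples as $sample) {
    // $1 is the dot or the 1 to 3 digit number between the two divs.
    $results[] = preg_replace('~</div\s*>(\.|\d{1,3})<div~i', '</div>#$1#<div', $sample);
}

echo implode(PHP_EOL, $results), PHP_EOL;
```

Capturing the value in a group and re-emitting it as #$1# is what wraps it on both sides; the original attempt used $0#, which can only append after the whole match.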
I want to find a URL in HTML code with PHP or JS.
E.g. I have this text:
<description>
<![CDATA[<p>
<img" src="http://2010.pcnews.am/images/stories/2011/internet/chinese-computer-user-smoke.jpg" border="0" align="left" "/>
Երեկ Պեկինի ինտերնետ-սրճարաններից մեկում մահացել է 33-ամյա մի չինացի, ով 27 օր շարունակ անցկացրել էր համակարգչի առաջ: Հաղորդում է չինական «Ցյանլունվան» պարբերականը:</p>
<p>Աշխատանք չունեցող չինացին մեկ ամիս շարունակ չի լքել ինտերնետ-սրճարանը ՝ այդ ամբողջ ընթացքում սնվելով արագ պատրաստվող մակարոնով:</p>
<p />
Նույնիսկ ամանորյա տոները նա անցկացրել է համակարգչի առաջ. Պեկինի բնակիչները նշում են Նոր տարին Լուսնային օրացույցով՝ փետրվարի 3-8-ը: Մահվան պատճառները չեն հաղորդվում:
]]>
</description>
I want to extract only "http://2010.pcnews.am/images/stories/2011/internet/chinese-computer-user-smoke.jpg".
Thanks in advance
This is a rather complicated task and while regex may seem easier, it is far too problematic. The following code will go through an XML file (called some.xml, but you’ll obviously need to change that) and gather the image sources into an array, $images.
$images = array();
$doc = new DOMDocument();
$doc->load('some.xml');
$descriptions = $doc->getElementsByTagName("description");
foreach ($descriptions as $description) {
    foreach ($description->childNodes as $child) {
        if ($child->nodeType == XML_CDATA_SECTION_NODE) {
            $html = new DOMDocument();
            // @ suppresses the warnings raised by malformed HTML inside the CDATA block
            @$html->loadHTML($child->textContent);
            $imgs = $html->getElementsByTagName('img');
            foreach ($imgs as $img) {
                $images[] = $img->getAttribute('src');
            }
        }
    }
}
I tested it against the XML you supplied and got the following result:
Array
(
[0] => http://2010.pcnews.am/images/stories/2011/internet/chinese-computer-user-smoke.jpg
)
I put it into an array in case there is more than one description with images.
You can use javascript or jQuery to get the image's src attribute.
document.getElementsByTagName("img")[x].src
Use a regex to capture the content between src=" and the next ".
In PHP it could be done like this:
<?php
$txt = 'text here <img src="http://domain.com/something.png" border="0" align="left" "/> more
test and <em>html</em> around here
<p> thats it </p>';
preg_match('/src="([^"]*)"/', $txt, $matches);
var_dump($matches[1]);
?>
Regular expressions are brittle for text parsing and do not take advantage of the document's inherent structure. Using RegEx to find stuff in a marked up document is generally a poor practice.
Use PHP's built in DOMNode and DOMXPath instead.
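As a rough sketch of that suggestion, the following collects every img src with DOMXPath. The input string is a shortened, hypothetical version of the sample used in the regex answer above.

```php
<?php
// Hypothetical input, adapted from the regex example.
$txt = 'text here <img src="http://domain.com/something.png" border="0" align="left" /> more <em>html</em>';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($txt);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$sources = [];
// //img/@src selects the src attribute node of every img element.
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->value;
}

print_r($sources);
```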
I have a bunch of text with HTML in it. For every link found in this text, I want to add rel="nofollow", but only if the title attribute is not present.
For example if a link looks like this:
<a href="test.html">test</a>
I want it to look like:
<a rel="nofollow" href="test.html">test</a>
But if the link looks like this:
<a title="test title" href="test.html">test</a>
I don't want to add the rel="nofollow" attribute to that. How can I do that in PHP?
EDIT:
I'm sorry I didn't mention this, but I am using PHP4. Yes, I know, but I'm stuck with PHP4.
Quite simply with DOMDocument:
$dom = new DOMDocument;
$dom->loadHTML($yourHTML);
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
if (!$link->hasAttribute('title')) {
$link->setAttribute('rel', 'nofollow');
}
}
$yourHTML = $dom->saveHTML();
This is far more stable and reliable than mucking about with regex.
First use preg_match to check whether a title attribute is present.
$str = '<a href="test.html">test</a>';
if(!preg_match('/title=/', $str))
{
$str = str_replace('href=', 'rel="nofollow" href=', $str);
}
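The snippet above handles a single link held in $str. For a text with several links, one possible sketch is to run the same check per tag with preg_replace_callback. The add_nofollow name is hypothetical, and the closure used here needs PHP 5.3 or later, so on PHP4 the callback would have to be passed as a named function instead.

```php
<?php
// Hypothetical helper: adds rel="nofollow" to every <a> tag without a title attribute.
function add_nofollow($html)
{
    return preg_replace_callback(
        '~<a\b[^>]*>~i',
        function ($m) {
            // Leave the tag alone when it already has a title attribute.
            if (preg_match('~\btitle\s*=~i', $m[0])) {
                return $m[0];
            }
            // Otherwise insert rel="nofollow" right after the tag name.
            return preg_replace('~^<a\b~i', '<a rel="nofollow"', $m[0]);
        },
        $html
    );
}

$html = '<a href="test.html">test</a> <a title="test title" href="test.html">test</a>';
$out = add_nofollow($html);
echo $out, PHP_EOL;
```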
I created this function:
<?php
function target_links( $html )
{
$pattern = "/<(a)([^>]+)>/i";
$replacement = "<\\1 target=\"_blank\"\\2>";
$new_str = preg_replace($pattern,$replacement,str_replace('target="_blank"','',$html));
return $new_str;
}
?>
The goal is to add a target="_blank" to all the link tags.
Now my problem is that I need to skip all link tags where the href attribute contains a specific word, but I can't seem to find the proper combination. Can you guys help me?
I'm not too sure about the "not failing because of broken HTML" part, but if you can get DOMDocument to accept your HTML, try something like:
<?php
$dom = new DOMDocument();
$dom->loadHtml('<html>
some link
some link
</html>');
$xpath = new DOMXpath($dom);
foreach ($xpath->query('//a[not(contains(@href, "protected"))]') as $node) {
$node->setAttribute('target', '_blank');
}
header('Content-Type: text/html; charset="UTF-8"');
echo $dom->saveHtml();
A regex solution can look like this:
<(a)(?!.*?href="[^"]*SPECIFICWORD)([^>]+)>
A negative lookahead (?!.*?href="[^"]*SPECIFICWORD) is used to check if the "SPECIFICWORD" is within the href attribute, if yes the regex does not match.
See here online on Regexr
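Plugged into the asker's function, that idea could look roughly like the sketch below. It is a variation and not the exact pattern above: the lookahead uses [^>]* instead of .*? so it cannot look past the end of the current tag, the $word parameter is an assumption added for illustration, and the step that strips existing target attributes is left out to keep the example short.

```php
<?php
// Sketch: add target="_blank" to links whose href does not contain $word.
function target_links($html, $word)
{
    // The negative lookahead rejects the tag if its href contains the word.
    // [^>]* keeps the lookahead inside the current tag.
    $pattern = '~<a\b(?![^>]*\bhref="[^"]*' . preg_quote($word, '~') . ')([^>]+)>~i';
    return preg_replace($pattern, '<a target="_blank"$1>', $html);
}

// Hypothetical input: the second link should be skipped.
$html = '<a href="http://example.com/page">a</a> <a href="http://example.com/protected/x">b</a>';
$out = target_links($html, 'protected');
echo $out, PHP_EOL;
```

Passing the word through preg_quote keeps characters such as dots or slashes in it from being read as regex syntax.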