I would like to replace the link location (the href of each anchor tag) in a page as follows.
Sample Input:
text text text <a href='http://test1.com/'> click </a> text text
other text <a class='links' href="gallery.html" title='Look at the gallery'> Gallery</a>
more text
Sample Output
text text text <a href='http://example.com/p.php?q=http://test1.com/'> click </a> text text
other text <a class='links' href="http://example.com/p.php?q=gallery.html" title='Look at the gallery'> Gallery</a>
more text
I hope I have made it clear. I am trying to do it with PHP and regex. Could you please point me in the right direction?
Thank you
Sadi
Don't use regular expressions for parsing HTML.
Do use PHP's built-in DOM parser (DOMDocument). It works quite well on the page for this very question, and the example answers the question to boot:
<?php
libxml_use_internal_errors(true); // ignore malformed HTML
$xml = new DOMDocument();
$xml->loadHTMLFile("http://stackoverflow.com/questions/3099187/replace-links-location-href");
foreach ($xml->getElementsByTagName('a') as $link) {
    // prepend the redirect URL to whatever the link pointed at before
    $link->setAttribute('href', "http://www.google.com/?q=" . $link->getAttribute('href'));
}
echo $xml->saveHTML(); // output to browser, save to file, etc.
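The same idea can be applied to an HTML string instead of a remote page. The following is only a minimal sketch: the helper name prefixLinks and the wrapper div are assumptions made for illustration, and the call to urlencode() assumes that p.php expects its q parameter to be URL-encoded (the sample output in the question shows it unencoded). It also assumes ASCII input, since loadHTML() does not default to UTF-8.

```php
<?php
// Hypothetical helper: prefix the href of every link in an HTML fragment.
function prefixLinks(string $html, string $prefix): string
{
    libxml_use_internal_errors(true); // tolerate malformed markup
    $doc = new DOMDocument();
    // Wrap the fragment in a div so it can be extracted again afterwards.
    $doc->loadHTML('<div>' . $html . '</div>');

    foreach ($doc->getElementsByTagName('a') as $link) {
        if ($link->hasAttribute('href')) {
            // urlencode() is an assumption about what the target script expects.
            $link->setAttribute('href', $prefix . urlencode($link->getAttribute('href')));
        }
    }

    // The wrapper is the first div in document order; serialize only its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    libxml_clear_errors();
    return $out;
}

echo prefixLinks(
    "text text text <a href='http://test1.com/'> click </a> text text",
    'http://example.com/p.php?q='
);
```

Links without an href attribute (named anchors, for example) are left untouched.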
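Try using str_replace(). Note that this only matches href attributes written with double quotes, so single-quoted ones are left unchanged: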
$string = 'your text';
$newstring = str_replace ('href="', 'href="http://example.com/p.php?q=', $string);
Related
I have the following string:
$linkString="The Following is a link to google <a class='links' href='http://google.com'>
http://google.com
</a>
";
In this string the link text of the HTML link is on a new line. I want to remove, or maybe replace, the whole link (both its HTML tag and the link text) in the string, so I tried the following:
<?php
$linkString="The Following is a link to google <a class='links' href='http://google.com'>
http://google.com
</a>
";
//Remove link tag:
echo preg_replace('/<[^>]*>/','',$linkString);
However, the above example prints out:
The Following is a link to google
http://google.com
This is an online DEMO: http://codepad.org/whw81bwa
I would like a regex that is able to remove the whole link (tag and link text).
Instead of using regex, make effective use of DOM to do this for you.
$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML data; @ suppresses warnings about malformed markup
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a') as $tag) {
    $tag->parentNode->removeChild($tag);
}
echo $doc->saveHTML();
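Since the question also mentions replacing the link rather than just removing it, here is a hedged sketch of that variant using the same DOM approach. The placeholder text 'A Hidden Link' and the wrapper div are assumptions for illustration; the result is read back as plain text, which is only appropriate when the string contains no other markup worth keeping.

```php
<?php
$linkString = "The Following is a link to google <a class='links' href='http://google.com'>\nhttp://google.com\n</a>\n";

libxml_use_internal_errors(true); // tolerate malformed markup
$doc = new DOMDocument();
// Wrap the fragment so its content can be located again after parsing.
$doc->loadHTML('<div>' . $linkString . '</div>');

$xpath = new DOMXPath($doc);
// query() returns a static list, so nodes can be replaced while iterating.
foreach ($xpath->query('//a') as $a) {
    $a->parentNode->replaceChild($doc->createTextNode('A Hidden Link'), $a);
}

// Read the wrapper's text content and trim the trailing newline.
$result = trim($doc->getElementsByTagName('div')->item(0)->textContent);
echo $result;
```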
The following regex solves the issue. The s modifier is needed so that the dot also matches the newlines around the link text:
/(?is)<a([^>]+)>(.+?)<\/a>/
So,
<?php
$linkString="The Following is a link to google <a class='links' href='http://google.com'>
http://google.com
</a>
";
//Remove link tag:
echo preg_replace('/(?is)<a([^>]+)>(.+?)<\/a>/','A Hidden Link',$linkString);
summary of my code:
foreach($html->find('a') as $element) {
..
To get the inner text I use this:
$element->innertext
Is there any way to echo only the anchor text using Simple HTML DOM? I am trying to crawl about 10k links, but in some cases it also prints whatever markup is inside the <a> tag: div code, image code, etc.
If the <a> tag is a standard (simple) one like:
Anchor Text
then $element->innertext will be "Anchor Text".
BUT
if the case is like this:
1 <div id=whatever>Anchor Text</div>
or
2 <img src="whatever" />
then my $element->innertext will be:
Result1 <div id=whatever>Anchor Text</div>
Result2 <img src="whatever" />
Is there any way to print ONLY the text, or should I write my own custom conditions for each case (div, img, etc.)?
It's as simple as strip_tags($element->innertext);
The result will be an empty string if the anchor is an image.
Use Plaintext
strip_tags($element->plaintext)
$mbHtml = mb_convert_encoding($element->innertext, 'HTML-ENTITIES', 'utf-8');
$mbHtml = mb_eregi_replace('<(div|option|ul|li|table|tr|td|th|input|select|textarea|form)', ' <\\1', $mbHtml );
I tried using preg_match_all to get all the contents between a given HTML tag, but it produces an empty result, and I'm not good at PHP.
Is there a way to get the contents between tags? Like this:
<span class="st"> EVERYTHING IN HERE INCLUDING TAGS<B></B><EM></EM><DIV></DIV>&+++ TEXT </span>
preg_match is not very good at HTML parsing, especially in your case, which is a bit more complex.
Instead, use an HTML parser and obtain the elements you're looking for. The following is a simple example that selects the first span element. The selection can be made more specific, for example by also checking the class attribute; this is just to give you some pointers to start with:
$html = '<span class="st"> EVERYTHING IN HERE INCLUDING TAGS<B></B><EM></EM><DIV></DIV>&+++ TEXT </span>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$span = $doc->getElementsByTagName('span')->item(0);
echo $doc->saveHTML($span);
Output:
<span class="st"> EVERYTHING IN HERE INCLUDING TAGS<b></b><em></em><div></div>&amp;+++ TEXT </span>
If you look closely, you can see that even HTML errors have been fixed on the fly: the bare & in &+++ was not valid HTML and has been escaped as &amp;.
If you only need the inner HTML, you need to iterate over the children of the span element:
foreach($span->childNodes as $child)
{
echo $doc->saveHTML($child);
}
Which gives you:
EVERYTHING IN HERE INCLUDING TAGS<b></b><em></em><div></div>&amp;+++ TEXT
I hope this is helpful.
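To follow up on selecting by class rather than taking the first span: the sketch below uses DOMXPath for that. The sample markup and the variable names are made up for illustration, and the XPath expression is one common way to match a single class token; it is not the only one.

```php
<?php
// Illustrative input: two spans, only one of which has the class "st".
$html = '<p>intro</p><span class="other">skip</span><span class="st">keep <b>this</b></span>';

libxml_use_internal_errors(true); // tolerate malformed markup
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Match span elements whose class attribute contains the token "st".
$query = '//span[contains(concat(" ", normalize-space(@class), " "), " st ")]';

$inner = '';
foreach ($xpath->query($query) as $span) {
    // Serialize each child to collect the inner HTML of the matched span.
    foreach ($span->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}
echo $inner;
```

Matching on a padded class string avoids false hits on classes such as "stale" that merely start with "st".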
Try this with preg_match
$str = "<span class=\"st\"> EVERYTHING IN HERE INCLUDING TAGS<B></B><EM></EM><DIV></DIV>&+++ TEXT </span>";
preg_match("/<span class=\"st\">(.*?)<\/span>/is", $str, $matches); // (.*?) captures lazily; s lets the dot match newlines
print_r($matches);
Here's what I am trying to accomplish. The CMS editor of our Magento webshop has a button to insert a <!-- pagebreak --> tag. I would like to use this to create "read more" functionality. My idea is to search for this tag and replace it.
I want to search inside <p> tags, and I want people to be able to use this tag as often as they want.
Suppose this is my original HTML:
<p>This is my example text, but<!-- pagebreak --> this should be readable after 'click more'<!-- pagebreak --> with even more click more possible</p>
I would like to convert it to something like one of the following. I think the first one is the easiest to accomplish, maybe by doing a preg_replace in a while loop? The second one is probably cleaner HTML (less nesting).
<p>This is my example text, but <a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-1'> this should be readable after 'click more'<a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-2'> with even more click more possible</div></div></p>
or
<p>This is my example text, but <a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-1'> this should be readable after 'click more'<a href="#" onClick='#'>read more</a></div><div class='hiddenreadmore' id='hiddenreadmore-2'> with even more click more possible</div></p>
So I came up with this, but I think there should be a way to do it with one replace.
$pattern = '#\<p\>(.+?)\<\!-- pagebreak --\>(.+?)\<\/p\>#s';
$count = true;
while ($count) {
$text = preg_replace($pattern, '<p>$1 read more<div class="hidden">$2</div></p>', $text, -1, $count);
}
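If you don't need to check whether the marker is inside a <p> tag, you can use str_replace. Note that str_replace does not understand $1/$2 backreferences, so the replacement has to be a literal string, and the closing </div> tags still have to be added separately:
$text = str_replace("<!-- pagebreak -->", ' read more<div class="hidden">', $text, $count);
It's a lot lighter on the system.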
I guess this would do the job:
$pattern = '#\<p>(.*?)\<!-- pagebreak -->(.*?)\</p>#s';
$text = "<p>some test <!-- pagebreak --> hidden content</p> second test <p>lolo <!-- pagebreak --> more hidden content</p>";
echo preg_replace($pattern, '<p>$1 read more<div class="hidden">$2</div></p>', $text, -1, $count);
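The pattern above handles one marker per paragraph. For several markers per paragraph, one possible approach is to split each paragraph on the marker inside a preg_replace_callback. This is only a sketch producing the flatter second layout from the question; the function name addReadMore is hypothetical, and the link markup is a placeholder for whatever click handler is actually used.

```php
<?php
// Hypothetical helper: turn every pagebreak marker inside a <p> into a
// "read more" link followed by a hidden block with a running id.
function addReadMore(string $text): string
{
    $n = 0; // running counter shared across all paragraphs
    return preg_replace_callback(
        '#<p>(.*?)</p>#s',
        function (array $m) use (&$n): string {
            $link  = ' <a href="#">read more</a>';
            $parts = explode('<!-- pagebreak -->', $m[1]);
            $last  = count($parts) - 1;
            $out   = '';
            foreach ($parts as $i => $part) {
                // Every part except the last ends with a link to the next one.
                $body = $part . ($i < $last ? $link : '');
                if ($i === 0) {
                    $out .= $body; // the first part stays visible
                } else {
                    $n++;
                    $out .= '<div class="hiddenreadmore" id="hiddenreadmore-' . $n . '">'
                          . $body . '</div>';
                }
            }
            return '<p>' . $out . '</p>';
        },
        $text
    );
}

echo addReadMore('<p>A<!-- pagebreak -->B<!-- pagebreak -->C</p>');
```

Paragraphs without a marker pass through unchanged, and the ids keep counting up across paragraphs so they stay unique on the page.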
Well I have a html text string in a variable:
$html = "<h1>title</h1><h2>subtitle 1</h2> <h2>subtitle 2</h2>";
I want to add an anchor to each subtitle, named after the subtitle text, then print the HTML code to the browser and also get the subtitles as an array.
I think this can be done with regex. Please help.
I think this will do the trick for you:
$pattern = "|<h2>(.*)</h2>|U";
preg_match_all($pattern,$html,$matches);
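foreach ($matches[1] as $match) {
    // replace the whole heading so the same text elsewhere in $html is not touched
    $html = str_replace("<h2>".$match."</h2>", "<h2><a name='".$match."' />".$match."</h2>", $html);
}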
$array_of_elements = $matches[1];
Just make sure that $html has the existing html before this code starts. Then it will have an <a name='foo' /> added after this completes, and $array_of_elements will have the array of matching text values.
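As an alternative, the matching and the replacing can be done in a single pass with preg_replace_callback. This is a sketch under assumptions: the variable $titles is a name chosen here, and htmlspecialchars() is added on the assumption that subtitle text might contain quotes that would otherwise break the attribute.

```php
<?php
$html = "<h1>title</h1><h2>subtitle 1</h2> <h2>subtitle 2</h2>";

$titles = []; // collects the subtitle texts in document order
$html = preg_replace_callback(
    '|<h2>(.*?)</h2>|s',
    function (array $m) use (&$titles): string {
        $titles[] = $m[1];
        // Escape the text before using it as an attribute value.
        $name = htmlspecialchars($m[1], ENT_QUOTES);
        return '<h2><a name="' . $name . '"></a>' . $m[1] . '</h2>';
    },
    $html
);

echo $html;
print_r($titles);
```

Because each heading is rewritten where it is matched, identical subtitle text appearing elsewhere in the document is not affected.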