Replace pagebreak tag with regular expression - php

Here's what I am trying to accomplish. The CMS editor of our Magento webshop has a button to insert a <!-- pagebreak --> tag. I would like to use this tag to create a "read more" feature. My idea is to search for the tag and replace it.
I want to search inside <p> tags, and I want people to be able to use this tag as often as they want.
Suppose this is my original HTML:
<p>This is my example text, but<!-- pagebreak --> this should be readable after 'click more'<!-- pagebreak --> with even more click more possible</p>
I would like to convert it to something like one of the two variants below. I think the first one is the easiest to accomplish, maybe by doing a preg_replace in a while loop? The second one is probably cleaner/better HTML (less nesting).
<p>This is my example text, but <a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-1'> this should be readable after 'click more'<a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-2'> with even more click more possible</div></div></p>
or
<p>This is my example text, but <a href="#" onClick='#'>read more</a><div class='hiddenreadmore' id='hiddenreadmore-1'> this should be readable after 'click more'<a href="#" onClick='#'>read more</a></div><div class='hiddenreadmore' id='hiddenreadmore-2'> with even more click more possible</div></p>
So I came up with this, but I think there should be a way to do it with one replace.
$pattern = '#\<p\>(.+?)\<\!-- pagebreak --\>(.+?)\<\/p\>#s';
$count = true;
while ($count) {
    $text = preg_replace($pattern, '<p>$1 read more<div class="hidden">$2</div></p>', $text, -1, $count);
}

Well, if you don't need to check whether it's inside a <p> tag, you can use something like this:
// str_replace() has no $1/$2 backreferences: swap only the marker, then close the $count opened <div>s yourself
$text = str_replace('<!-- pagebreak -->', ' read more<div class="hidden">', $text, $count);
It's a lot lighter on the system.

I guess this would do the job:
$pattern = '#\<p>(.*?)\<!-- pagebreak -->(.*?)\</p>#s';
$text = "<p>some test <!-- pagebreak --> hidden content</p> second test <p>lolo <!-- pagebreak --> more hidden content</p>";
echo preg_replace($pattern, '<p>$1 read more<div class="hidden">$2</div></p>', $text, -1, $count);
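
A single pass is also possible with preg_replace_callback(). The following is only a sketch under some assumptions: addReadMore() is a hypothetical helper name, the paragraphs are plain <p> tags without attributes, and the placeholder link markup is made up for illustration. It produces the second (less nested) structure from the question and numbers the hidden blocks across the whole text.

```php
<?php
// Hypothetical helper (not from the original thread): turns every pagebreak
// marker inside a plain <p>...</p> into a "read more" link plus a hidden <div>.
function addReadMore(string $html): string
{
    $counter = 0; // running number for the hiddenreadmore-N ids
    $link = '<a href="#">read more</a>'; // placeholder link, adjust as needed

    return preg_replace_callback(
        '#<p>(.*?)</p>#s',
        function (array $m) use (&$counter, $link): string {
            $segments = explode('<!-- pagebreak -->', $m[1]);
            $last = count($segments) - 1;
            $out = '';
            foreach ($segments as $i => $segment) {
                // Every segment except the last one ends with a "read more" link
                $body = $segment . ($i < $last ? $link : '');
                if ($i === 0) {
                    $out .= $body; // text before the first marker stays visible
                } else {
                    $counter++;
                    $out .= '<div class="hiddenreadmore" id="hiddenreadmore-' . $counter . '">'
                        . $body . '</div>';
                }
            }
            return '<p>' . $out . '</p>';
        },
        $html
    );
}

$text = '<p>This is my example text, but<!-- pagebreak --> hidden part<!-- pagebreak --> more</p>';
echo addReadMore($text);
```

Paragraphs without a marker pass through unchanged, so the function can be run over the whole CMS output.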

Related

How to extract HTML element from a source file

I need to use PHP to replace an HTML section, identified by a tag id, in source code that is a combination of HTML and PHP. If it were pure HTML, a DOM parser could be used; if there were no DIV inside a DIV, I could imagine using preg_match. This is what I am trying to do. I have code (loaded into a string) like:
<div>
<img >
</div>
<? include(); ?>
<div id="mydiv">
<div>
<div>
<img >
</div>
</div>
</div>
and my task is to replace the content of the "mydiv" DIV with new content, e.g.
<div id="newdiv">
some text
</div>
so the string will look like this after the change:
<div>
<img >
</div>
<? include(); ?>
<div id="mydiv">
<div id="newdiv">
some text
</div>
</div>
I have already tried:
1) parsing the code using DOMdocument's loadHTML => it produces a lot of errors in case PHP code is included.
2) I played around a bit with regexes like preg_match_all('/<div id="myid"([^<]*)<\/div>/', $src, $matches), which fails in case more child divs are included.
The best approach I have found so far is:
1) find id="mydiv" string
2) search for '<' and '>' chars and count them like '<'=1 and '>'=-1 (not exactly, but it gives the idea)
3) once I get sum == 0 I should be at the position of the closing tag, so I know which portion of the string I should exchange
This is quite a "heavy" solution, and it can stop working when the code is different (e.g. the on-page PHP code contains these characters as well, instead of just a simple "include"). So I am looking for a better solution.
You could try something like this:
$file = 'filename.php';
$content = file_get_contents($file);
$array_one = explode( '<div id="mydiv">' , $content );
$my_div_content = explode("</div>" , $array_one[1] )[0];
Or use preg_match like you said (the lazy match stops at the first </div>, so this only works when there are no nested divs):
preg_match('/<div id="mydiv"(.*?)<\/div>/s', $content, $matches);
Yes, there is. First you need to use a function that will get the content of the file. Let's call the file homepage.php:
$homepageString = file_get_contents('homepage.php');
Now you have a string with all the content. Next, use the preg_replace() function to take out the part of the code that you want removed. Note that the pattern below only strips the id="mydiv" attribute text itself:
$newHomepageString = preg_replace('/id="mydiv"/',"", $homepageString);
Now you overwrite the existing homepage.php file with the new source code:
file_put_contents("homepage.php", $newHomepageString);
Let me know if it worked for you! :)
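
The counting idea from the question can also be applied to <div> tags only, instead of to every '<' and '>'. This is a minimal sketch, not a robust parser: replaceDivContent() is a hypothetical helper, it assumes the id is written exactly as id="..." with double quotes, and it assumes the embedded PHP code does not itself contain "<div" or "</div" text.

```php
<?php
// Hypothetical helper: replaces the inner content of the <div> with the given
// id by tracking the nesting depth of opening and closing div tags.
// Returns null if the div is not found or the divs are unbalanced.
function replaceDivContent(string $src, string $id, string $newContent): ?string
{
    $open = '/<div\b[^>]*\sid="' . preg_quote($id, '/') . '"[^>]*>/i';
    if (!preg_match($open, $src, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $innerStart = $m[0][1] + strlen($m[0][0]); // first byte after the opening tag
    $depth = 1;
    $offset = $innerStart;

    // Walk over every following <div ...> or </div> and adjust the depth
    while (preg_match('/<(\/?)div\b[^>]*>/i', $src, $t, PREG_OFFSET_CAPTURE, $offset)) {
        $depth += ($t[1][0] === '/') ? -1 : 1;
        if ($depth === 0) {
            // $t[0][1] is where the matching closing </div> starts
            return substr($src, 0, $innerStart) . $newContent . substr($src, $t[0][1]);
        }
        $offset = $t[0][1] + strlen($t[0][0]);
    }
    return null;
}

$src = "<div>\n<img >\n</div>\n<? include(); ?>\n<div id=\"mydiv\">\n<div>\n<div>\n<img >\n</div>\n</div>\n</div>";
echo replaceDivContent($src, 'mydiv', "\n<div id=\"newdiv\">\nsome text\n</div>\n");
```

Because only div tags are counted, a '<' or '>' inside the PHP code no longer throws the count off, although a literal "<div" inside a PHP string still would.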

Simple HTML DOM, how to echo only the text from anchor text

summary of my code:
foreach($html->find('a') as $element) {
..
For the inner text I use this:
$element->innertext
Is there any way to echo only the anchor text using Simple HTML DOM? I am crawling about 10k links, but in some cases it also prints whatever is inside the <a> tag: div code, image code, etc.
If the <a> tag is a simple one like:
Anchor Text
then $element->innertext will be "Anchor Text".
BUT
if the case is like this:
1 <div id=whatever>Anchor Text</div>
or
2 <img src="whatever" />
my $element->innertext will be:
Result1 <div id=whatever>Anchor Text</div>
Result2 <img src="whatever" />
Is there any way to print ONLY the text, or should I write my own custom conditions for each case: div, img, etc.?
It's as simple as strip_tags($element->innertext);
The result will be an empty string if the anchor is an image.
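
To illustrate on the two cases from the question, here is a small sketch that uses plain strings in place of $element->innertext, so it runs without the Simple HTML DOM library. anchorText() is a hypothetical wrapper name; the trim() call is an addition to drop stray whitespace.

```php
<?php
// Hypothetical wrapper: keeps only the text of an anchor's inner HTML.
function anchorText(string $innerHtml): string
{
    return trim(strip_tags($innerHtml));
}

// Stand-ins for $element->innertext in the two problem cases
$samples = [
    '<div id=whatever>Anchor Text</div>',
    '<img src="whatever" />',
];

foreach ($samples as $inner) {
    $text = anchorText($inner);
    // Image-only anchors yield an empty string and can simply be skipped
    echo $text === '' ? "(no text)\n" : $text . "\n";
}
```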
Use Plaintext
strip_tags($element->plaintext)
$mbHtml = mb_convert_encoding($element->innertext, 'HTML-ENTITIES', 'utf-8');
$mbHtml = mb_eregi_replace('<(div|option|ul|li|table|tr|td|th|input|select|textarea|form)', ' <\\1', $mbHtml );

HTML manipulation: Match the first X number of HTML tags and move them

Let's say I have code like this:
<img src="001">
<img src="002">
<p>Some content here.</p>
<img src="003">
What I want to do now is to match the first two images (001 and 002) and store that part of the code in variable. I don't want to do anything with third image.
I used something like preg_match_all('/<img .*>/', $result); but it obviously matched all the images, not just those that appear at the top of the code. How can I modify that regular expression to select just the images at the top of the code?
Here is what I want to do. I have an <h2> tag with the title in one variable and the code above in a second one. I want to move the first X images before the <h2> tag OR insert that <h2> tag after the first X images. All of that in back-end PHP. It would be fun to do it with CSS, but flexbox is not here yet.
You need to divide the problem to solve it. There are two main parts here:
1) Division of the HTML into top and bottom parts.
2) Doing the DOMDocument manipulation on (both?) HTML strings.
Let's just do that:
The first part is actually quite simple. Let's say all line separators are "\n" and the empty line is actually an empty line "\n\n". Then this is a simple string operation:
list($top, $bottom) = explode("\n\n", $html, 2);
This solves the first part already. The top HTML is in $top, and the rest, which we do not need to care much about, is stored in $bottom.
Let's go on with the second part.
With simple DOMDocument operations you can now for example get a list of all images:
$topDoc = new DOMDocument();
$topDoc->loadHTML($top);
$topImages = $topDoc->getElementsByTagName('img');
The only thing you need to do now is to remove each image from its parent:
$image->parentNode->removeChild($image);
And then insert it before the <h2> element:
$anchor = $topDoc->getElementsByTagName('h2')->item(0);
$anchor->parentNode->insertBefore($image, $anchor);
And you're fine. Full code example:
$html = <<<HTML
<h2>Title here</h2>
<img src="001">
<p>Some content here. (for testing purposes)</p>
<img src="002">

<h2>Second Title here (for testing purposes)</h2>
<p>Some content here.</p>
<img src="003">
HTML;
// Split at the first empty line: $top is the part to rearrange, $bottom is the rest
list($top, $bottom) = explode("\n\n", $html, 2);
$topDoc = new DOMDocument();
$topDoc->loadHTML($top);
$topImages = $topDoc->getElementsByTagName('img');
$anchor = $topDoc->getElementsByTagName('h2')->item(0);
foreach ($topImages as $image) {
    $image->parentNode->removeChild($image);
    $anchor->parentNode->insertBefore($image, $anchor);
}
// loadHTML() wraps the fragment in <html><body>, so print only the body's children
foreach ($topDoc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    echo $topDoc->saveHTML($child);
}
echo $bottom;
Output:
<img src="001"><img src="002"><h2>Title here</h2>
<p>Some content here. (for testing purposes)</p>
<h2>Second Title here (for testing purposes)</h2>
<p>Some content here.</p>
<img src="003">
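
For the regex side of the question (matching only the images at the top), a pattern anchored at the start of the string is one option. The sketch below assumes the HTML starts directly with the images, separated by whitespace only; splitLeadingImages() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: splits $html into the run of <img> tags at the very
// top and everything that follows.
function splitLeadingImages(string $html): array
{
    // ^ anchors the match at the start, so images further down are ignored
    if (preg_match('/^(?:\s*<img\b[^>]*>)+/i', $html, $m)) {
        return [$m[0], substr($html, strlen($m[0]))];
    }
    return ['', $html]; // no image at the top
}

$html = "<img src=\"001\">\n<img src=\"002\">\n<p>Some content here.</p>\n<img src=\"003\">";
$title = '<h2>Title here</h2>';

list($images, $rest) = splitLeadingImages($html);
// Insert the title right after the leading images
echo $images . "\n" . $title . $rest;
```

To limit the match to the first X images, the + quantifier could be swapped for a bounded one such as {1,2}.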

create anchors in a page with the content of <h2></h2> in PHP

Well I have a html text string in a variable:
$html = "<h1>title</h1><h2>subtitle 1</h2> <h2>subtitle 2</h2>";
I want to create an anchor in each subtitle, named after the subtitle text, then print the HTML to the browser and also get the subtitles as an array.
I think this calls for a regex. Please help.
I think this will do the trick for you:
$pattern = "|<h2>(.*)</h2>|U";
preg_match_all($pattern, $html, $matches);
foreach ($matches[1] as $match) {
    // Replace the whole heading so the same text elsewhere in $html is left alone
    $html = str_replace("<h2>".$match."</h2>", "<h2><a name='".$match."'></a>".$match."</h2>", $html);
}
$array_of_elements = $matches[1];
Just make sure that $html holds the existing HTML before this code runs. Afterwards each subtitle will have an <a name='foo'></a> in front of its text, and $array_of_elements will hold the array of matching text values.
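
The same result can be had in one pass with preg_replace_callback(), which also allows escaping the anchor name. This is a sketch only: addHeadingAnchors() is a hypothetical helper, and it assumes the headings are plain <h2> tags without attributes.

```php
<?php
// Hypothetical helper: puts a named anchor inside every <h2> and collects
// the subtitle texts in $titles (passed by reference).
function addHeadingAnchors(string $html, array &$titles): string
{
    $titles = [];
    return preg_replace_callback(
        '|<h2>(.*?)</h2>|s',
        function (array $m) use (&$titles): string {
            $titles[] = $m[1];
            // Escape the text so quotes in a subtitle cannot break the attribute
            $name = htmlspecialchars(strip_tags($m[1]), ENT_QUOTES);
            return '<h2><a name="' . $name . '"></a>' . $m[1] . '</h2>';
        },
        $html
    );
}

$html = "<h1>title</h1><h2>subtitle 1</h2> <h2>subtitle 2</h2>";
$subtitles = [];
echo addHeadingAnchors($html, $subtitles);
```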

Replace Links Location (href='...')

I would like to replace the link location (of anchor tag) of a page as follows.
Sample Input:
text text text <a href='http://test1.com/'> click </a> text text
other text <a class='links' href="gallery.html" title='Look at the gallery'> Gallery</a>
more text
Sample Output
text text text <a href='http://example.com/p.php?q=http://test1.com/'> click </a> text text
other text <a class='links' href="http://example.com/p.php?q=gallery.html" title='Look at the gallery'> Gallery</a>
more text
I hope I have made it clear. I am trying to do this with PHP and regex. Could you please point me in the right direction?
Thank you
Sadi
Don't use regular expressions for parsing HTML.
Do use PHP's built-in XML parsing engine. It works quite well on your question (and answers the question to boot):
<?php
libxml_use_internal_errors(true); // ignore malformed HTML
$xml = new DOMDocument();
$xml->loadHTMLFile("http://stackoverflow.com/questions/3099187/replace-links-location-href");
foreach ($xml->getElementsByTagName('a') as $link) {
    $link->setAttribute('href', "http://www.google.com/?q=" . $link->getAttribute('href'));
}
echo $xml->saveHTML(); // output to browser, save to file, etc.
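Try using str_replace(). The sample input uses both quote styles, so replace both:
$string = 'your text';
$newstring = str_replace (array('href="', "href='"), array('href="http://example.com/p.php?q=', "href='http://example.com/p.php?q="), $string);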
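
Applied to the sample input from the question, the DOM approach could look like the sketch below. prefixLinks() is a hypothetical helper name. loadHTML() wraps the fragment in a full document, and saveHTML() returns that whole document with attributes in double quotes, so the output will not be byte-for-byte identical to the sample output.

```php
<?php
// Hypothetical helper: prepends $prefix to the href of every <a> in $html.
function prefixLinks(string $html, string $prefix): string
{
    libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $link) {
        $link->setAttribute('href', $prefix . $link->getAttribute('href'));
    }
    return $doc->saveHTML();
}

$input = "text text text <a href='http://test1.com/'> click </a> text text\n"
    . "other text <a class='links' href=\"gallery.html\" title='Look at the gallery'> Gallery</a>\n"
    . "more text";

echo prefixLinks($input, 'http://example.com/p.php?q=');
```

If p.php reads the target from the query string, passing the original href through urlencode() before concatenating is probably advisable; it is left out here to match the sample output in the question.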
