Adding an id attribute to paragraph elements - php

Suppose that a variable contains HTML markup like this:
<p> paragraph 1 </p>
<p> paragraph 2 </p>
...
How can I turn it into something like this:
<p id="1" data-pic="someStaticText"> paragraph 1 </p>
<p id="2" data-pic="someStaticText"> paragraph 2 </p>
...
Of course, the markup is not composed solely of paragraph elements.

Well, I figured it out while doing some more research:
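$html_string = preg_replace_callback(
    // \b keeps <pre>, <param>, etc. from matching; $m[1] holds any existing attributes
    '~<p\b([^>]*)>~i',
    function ($m) {
        static $id = 0; // persists across callback invocations
        $id++;
        // Note the space before data-pic; $m[1] already starts with a space if non-empty
        return '<p id="p' . $id . '" data-pic="someStaticText"' . $m[1] . '>';
    },
    $html_string
);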

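You could use a while loop, with each iteration replacing the first <p> in the string. Note that this only handles bare <p> tags without attributes.
$id = 1;
while (strpos($html_string, '<p>') !== false) {
    // str_replace() has no limit argument (its 4th parameter is a by-reference count),
    // so preg_replace() with a limit of 1 is used to replace only the first <p>.
    $html_string = preg_replace('/<p>/', '<p id="' . $id . '" data-pic="someStaticText">', $html_string, 1);
    $id++;
}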
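Since the markup is not just paragraphs, a DOM-based variant may be more robust than either string approach. This is a minimal sketch, assuming PHP's DOM extension is enabled and the input is ASCII (UTF-8 input would need an encoding hint for loadHTML); the helper name addParagraphIds is hypothetical and not from the answers above.

```php
<?php
// Hypothetical helper (not from the original answers): number every <p> via the DOM.
function addParagraphIds(string $html, string $pic = 'someStaticText'): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);            // collect parse warnings quietly
    $dom->loadHTML('<div>' . $html . '</div>');  // wrapper div gives the fragment one root
    libxml_clear_errors();

    $id = 0;
    foreach ($dom->getElementsByTagName('p') as $p) {
        $p->setAttribute('id', (string) ++$id);
        $p->setAttribute('data-pic', $pic);
    }

    // Serialize only the wrapper's children, dropping the html/body/div scaffolding.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo addParagraphIds('<p>paragraph 1</p><pre>code</pre><p class="k">paragraph 2</p>'), "\n";
```

Existing attributes on a paragraph are kept, and elements such as <pre> are left alone because only real p elements are visited.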

Related

How to apply highlight_string to form content inside <pre> tags using PHP

I have content that is posted from a WYSIWYG form via Ajax, where code is placed between <pre> tags:
stuf before content <pre class="brush: php; light: true; collapse: true; fontsize: 500; first-line: 1; ">
<?php echo "Jesus is the way the truth and the life and Jesus is Lord and He is God"; ?> </pre> stuf after content <pre class="brush: php; light: true; collapse: true; fontsize: 500; first-line: 1; ">
<?php $jesusis='Lord'; echo "Another content code"; ?> </pre>
Once the content is posted, and before displaying it, I want to use the PHP function highlight_string on the backend to highlight everything between the <pre> tags, whether or not classes have been added to the tags as in the example above.
When the content is posted, the WYSIWYG editor converts it to HTML entities, so it becomes:
<p> stuf before content <br /> </p> <pre><code><span style="color: #000000"> <br /> &lt;?php echo "Jesus is the way the truth and the life and Jesus is Lord and He is God"; ?&gt; <br /></span> </code></pre> <p> <br /> stuf after content <br /> </p> <pre><code><span style="color: #000000"> <br /> &lt;?php $jesusis="Lord"; echo "Another content code"; ?&gt; <br /></span> </code></pre> <p> <br /> </p>
So on the backend I basically want to replace <pre>...content...</pre> with <pre>highlight_string('content')</pre>.
To achieve that, I first tried to retrieve every match between the <pre> tags:
preg_match('~<pre.*?>(.*?)</pre>~i', $content, $matches, PREG_OFFSET_CAPTURE);
//Count number of matches
$compter_matches = count($matches);
//if we have at least one match
if($compter_matches>0)
{
###############################
for($i=0; $i<$compter_matches; $i++)
{
$a_remplacer = $matches[$i];
$replacement = '<pre>'.highlight_string($a_remplacer, true).'</pre>';
$note = preg_replace('~<pre.*?>(.*?)</pre>~i', $replacement, $content);
}
##########################################
}
But when I do that I get:
int(2) Notice: Array to string conversion.
How can I apply highlight_string to the content between <pre> and </pre> using PHP?
First of all, your (.*?) will not match anything that spans multiple lines, because . does not match the newline character unless you specify the s flag.
Try:
<?php
$content = '<p>
stuf before content
<br />
</p>
<pre class="brush: applescript; fontsize: 100; first-line: 1; ">
<?php echo "Jesus is the way the truth and the life and Jesus is Lord and He is God"; ?>
</pre>
<p>
<br />
stuf after content
<br />
</p>
<pre class="brush: applescript; fontsize: 100; first-line: 1; ">
<?php $jesusis="Lord"; echo "Another content code"; ?>
</pre>
<p>
<br />
</p>';
$text = preg_replace_callback(
'~<pre.*?>(.*?)</pre>~is',
function($matches) {
return('<pre>'.highlight_string(html_entity_decode($matches[1]), true).'</pre>');
},
$content
);
echo $text;
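The effect of the s flag described above can be checked in isolation. This is a small sketch with a made-up two-line $content value, not the asker's data:

```php
<?php
$content = "<pre>line one\nline two</pre>";

// Without the s flag, "." stops at the newline, so the pattern finds nothing.
$withoutS = preg_match('~<pre.*?>(.*?)</pre>~i', $content, $m1);

// With the s flag, "." also matches newlines and the whole block is captured.
$withS = preg_match('~<pre.*?>(.*?)</pre>~is', $content, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```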
Hmm,
What about this:
<?php
$content = '<pre>sadfasdfsadf</pre>';
$matches = array();
// preg_match_all with PREG_SET_ORDER: each $match holds [0] => full tag, [1] => inner content
preg_match_all('~<pre.*?>(.*?)</pre>~is', $content, $matches, PREG_SET_ORDER);
$note = $content;
foreach ($matches as $match) {
    $replacement = '<pre>' . highlight_string($match[1], true) . '</pre>';
    // Replace in $note (not $content) so earlier replacements are not discarded
    $note = str_replace($match[0], $replacement, $note);
}
echo $note;
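The "Array to string conversion" notice in the question most likely comes from PREG_OFFSET_CAPTURE: with that flag every entry of $matches is itself an array of matched text and byte offset, and preg_match returns only the first match. A short sketch with a made-up input shows the shape:

```php
<?php
$content = '<pre>abc</pre>';
preg_match('~<pre.*?>(.*?)</pre>~i', $content, $matches, PREG_OFFSET_CAPTURE);

// Each entry is [matched text, byte offset], not a plain string.
// $matches[0] is the whole tag, $matches[1] is the first capture group.
var_dump($matches[1][0]);  // string(3) "abc"
var_dump($matches[1][1]);  // int(5)
var_dump(count($matches)); // int(2): full match plus one group, not two <pre> blocks
```

Passing $matches[$i] straight to highlight_string therefore passes an array where a string is expected.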

Get contents of div up to a certain point

I'm grabbing all the paragraph tags using the PHP Simple HTML DOM Parser with the following code:
// Product Description
$html = file_get_html('http://domain.local/index.html');
$contents = strip_tags($html->find('div[class=product-details] p'));
How can I grab the paragraphs only up to the first ul?
<p>
Paragraph 1
</p>
<p>
Paragraph 2
</p>
<p>
Paragraph 3
</p>
<ul>
<li>
List item 1
</li>
<li>
List item 2
</li>
</ul>
<blockquote>
Quote 1
</blockquote>
<blockquote>
Quote 2
</blockquote>
<blockquote>
Quote 3
</blockquote>
<p>
Paragraph 4
</p>
<p>
Paragraph 5
</p>
You can use the following code to meet the stated requirement:
<?php
$html = file_get_html('http://domain.local/index.html');
$detailTags = $html->find('div[class=product-details] *');
$contents = "";
foreach ($detailTags as $detailTag){
// Stop at the first <ul>; comparing the tag name also catches a <ul> that has attributes.
if ($detailTag->tag === 'ul') {
break;
}
$contents .= strip_tags($detailTag);
}
// contents will contain the output required.
echo $contents;
?>
Output:
Paragraph 1 Paragraph 2 Paragraph 3
EDIT: Nandal's code will work for you because it will not force you to change the library.
If you don't want to depend on a third-party library, you can use PHP's DOMDocument class, which requires the DOM extension to be enabled.
The code below prints the paragraphs until it hits any other tag:
<?php
$html = new DOMDocument();
$html->loadHTML("<html><body><p>Paragraph 1</p><p> Paragraph 2</p><p> Paragraph 3</p><ul> <li> List item 1 </li> <li> List item 2 </li> </ul><blockquote> Quote 1</blockquote><blockquote> Quote 2</blockquote><blockquote> Quote 3</blockquote><p> Paragraph 4</p><p> Paragraph 5</p></body></html>");
$xpath = new DOMXPath($html);
$nodes = $xpath->query('/html/body//*');
foreach($nodes as $node) {
if($node->nodeName != "p") {
break;
}
print $node -> nodeValue . "\n";
}
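If the goal is strictly "paragraphs before the first ul", the stop condition can also be pushed into the XPath expression itself. This is a sketch under the assumption that the paragraphs are direct children of the product-details div; the inline $markup is a shortened stand-in for the real page:

```php
<?php
$markup = '<div class="product-details"><p>Paragraph 1</p><p>Paragraph 2</p><p>Paragraph 3</p>'
        . '<ul><li>List item 1</li></ul><blockquote>Quote 1</blockquote><p>Paragraph 4</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// Select <p> children of the div that have no <ul> sibling before them.
$nodes = $xpath->query('//div[@class="product-details"]/p[not(preceding-sibling::ul)]');

$paragraphs = [];
foreach ($nodes as $node) {
    $paragraphs[] = trim($node->nodeValue);
}
echo implode(' ', $paragraphs), "\n"; // Paragraph 1 Paragraph 2 Paragraph 3
```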

Replace <div> tag with <p> tag using php

<div style = "text-align:left;" class="ref"> Text </div>
I want to replace <div> with <p> without losing attributes.
Any help is appreciated.
Try This:
$str = '<div style = "text-align:left;" class="ref"> Text </div>';
$newstr = preg_replace('/<div [^<]*?class="([^<]*?ref.*?)">(.*?)<\/div>/','<p class="$1">$2</p>',$str);
echo $newstr;
Output: <p class="ref"> Text </p>
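Note that the pattern above keeps only the class attribute, so the style attribute is dropped. A possible variant that carries over the whole attribute string is sketched below; it assumes the div contains no nested div, which a regex cannot handle reliably.

```php
<?php
$str = '<div style = "text-align:left;" class="ref"> Text </div>';

// $1 captures the full attribute string, $2 the inner content; both are re-emitted inside <p>.
// Assumption: no nested <div> inside the matched element.
$newstr = preg_replace('~<div\b([^>]*)>(.*?)</div>~is', '<p$1>$2</p>', $str);

echo $newstr, "\n"; // <p style = "text-align:left;" class="ref"> Text </p>
```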

php - Simple HTML dom - elements between other elements

I'm trying to write a PHP script that crawls a website and stores some elements in a database.
Here is my problem. A web page is written like this:
<h2>The title 1</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>
<h2>The title 2</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>
<p class="one_class"> Some different text </p>
<p> Some other interesting text </p>
<h2>The title 3</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>
I want to get only the h2 elements and the p elements with interesting text, not the p elements with class="one_class".
I tried this PHP code:
<?php
$numberP = 0;
foreach($html->find('p') as $p)
{
$pIsOneClass = PIsOneClass($html, $p);
if($pIsOneClass == false)
{
echo $p->outertext;
$h2 = $html->find("h2", $numberP);
echo $h2->outertext;
$numberP++;
}
}
?>
The function PIsOneClass($html, $p) is:
<?php
function PIsOneClass($html, $p)
{
foreach($html->find("p.one_class") as $p_one_class)
{
if($p == $p_one_class)
{
return true;
}
}
return false;
}
?>
It doesn't work. I understand why, but I don't know how to resolve it.
How can I express "I want every p without a class that sits between two h2 elements"?
Thanks a lot!
This task is easier with XPath, since you're scraping more than one element and you want to keep the source in order. You can use PHP's DOM library, which includes DOMXPath, to find and filter the elements you want:
$html = '<h2>The title 1</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>
<h2>The title 2</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>
<p class="one_class"> Some different text </p>
<p> Some other interesting text </p>
<h2>The title 3</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>';
# create a new DOM document and load the html
$dom = new DOMDocument;
$dom->loadHTML($html);
# create a new DOMXPath object
$xp = new DOMXPath($dom);
# search for all h2 elements and all p elements that do not have the class 'one_class'
$interest = $xp->query('//h2 | //p[not(@class="one_class")]');
# iterate through the node list of search results (h2 and p elements), printing out node
# names and values
foreach ($interest as $i) {
echo "node " . $i->nodeName . ", value: " . $i->nodeValue . PHP_EOL;
}
Output:
node h2, value: The title 1
node p, value: Some interesting text
node h2, value: The title 2
node p, value: Some interesting text
node p, value: Some other interesting text
node h2, value: The title 3
node p, value: Some interesting text
As you can see, the source text stays in order, and it's easy to eliminate the nodes you don't want.
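Because the results come back in document order, they can also be grouped under their headings, which is closer to the "p between two h2" wording of the question. This is a sketch using a shortened copy of the sample markup; the $sections array is a hypothetical structure, and p[not(@class)] is used here to select paragraphs with no class at all.

```php
<?php
$html = '<h2>The title 1</h2><p class="one_class"> Some text </p><p> Some interesting text </p>'
      . '<h2>The title 2</h2><p class="one_class"> Some text </p><p> Some interesting text </p>'
      . '<p class="one_class"> Some different text </p><p> Some other interesting text </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$sections = [];   // heading text => list of paragraph texts
$current = null;  // heading currently being filled
foreach ($xp->query('//h2 | //p[not(@class)]') as $node) {
    if ($node->nodeName === 'h2') {
        $current = trim($node->nodeValue);
        $sections[$current] = [];
    } elseif ($current !== null) {
        $sections[$current][] = trim($node->nodeValue);
    }
}
print_r($sections);
```

Keying by heading text assumes the titles are unique; a list of title and paragraph pairs would be safer otherwise.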
From the Simple HTML DOM manual (so a selector such as p[!class] should match the paragraphs without a class):
[attribute=value]
Matches elements that have the specified attribute with a certain value.
or
[!attribute]
Matches elements that don't have the specified attribute.

PHP preg_match_all + str_replace

I need to find a way to replace all the <p> within all the <blockquote> before the <hr />.
Here's a sample html:
<p>2012/01/03</p>
<blockquote>
<h4>File name</h4>
<p>Good Game</p>
</blockquote>
<blockquote><p>Laurie Ipsumam</p></blockquote>
<h4>Some title</h4>
<hr />
<p>Lorem Ipsum</p>
<blockquote><p>Laurel Ipsucandescent</p></blockquote>
Here's what I have so far:
$pieces = explode("<hr", $theHTML, 2);
$blocks = preg_match_all('/<blockquote>(.*?)<\/blockquote>/s', $pieces[0], $blockmatch);
if ($blocks) {
$t1=$blockmatch[1];
for ($j=0;$j<$blocks;$j++) {
$paragraphs = preg_match_all('/<p>/', $t1[$j], $paragraphmatch);
if ($paragraphs) {
$t2=$paragraphmatch[0];
for ($k=0;$k<$paragraphs;$k++) {
$t1[$j]=str_replace($t2[$k],'<p class=\"whatever\">',$t1[$j]);
}
}
}
}
I think I'm really close, but I don't know how to put back together the html that I just pieced out and modified.
You could try SimpleXML or, better, DOMDocument (http://www.php.net/manual/en/class.domdocument.php); make sure the markup is valid HTML first. Use it to find the nodes you are looking for and replace them. XPath (http://w3schools.com/xpath/xpath_syntax.asp) can help with locating the nodes.
Edit 1:
Take a look at the answer to this question:
RegEx match open tags except XHTML self-contained tags
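The DOMDocument and XPath suggestion above could look roughly like the following. This is a sketch rather than a drop-in solution: it assumes the DOM extension is enabled, uses the question's sample markup collapsed onto fewer lines, and serializes the result from the body element that loadHTML adds.

```php
<?php
$theHTML = '<p>2012/01/03</p><blockquote><h4>File name</h4><p>Good Game</p></blockquote>'
         . '<blockquote><p>Laurie Ipsumam</p></blockquote><h4>Some title</h4><hr />'
         . '<p>Lorem Ipsum</p><blockquote><p>Laurel Ipsucandescent</p></blockquote>';

$dom = new DOMDocument();
$dom->loadHTML($theHTML);
$xp = new DOMXPath($dom);

// Every <p> inside a <blockquote> that has no <hr> anywhere before it.
foreach ($xp->query('//blockquote[not(preceding::hr)]//p') as $p) {
    $p->setAttribute('class', 'whatever');
}

// Re-serialize the children of <body>, so no manual reassembly is needed.
$result = '';
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result, "\n";
```

The serializer writes the rule as <hr> rather than <hr />, which is equivalent in HTML.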
// Limit of 2: split only at the first <hr so any later <hr stays in $string[1]
$string = explode('<hr', $string, 2);
$string[0] = preg_replace('/<blockquote>(.*)<p>(.*)<\/p>(.*)<\/blockquote>/sU', '<blockquote>\1<p class="whatever">\2</p>\3</blockquote>', $string[0]);
$string = $string[0] . '<hr' . $string[1];
Output:
<p>2012/01/03</p>
<blockquote>
<h4>File name</h4>
<p class="whatever">Good Game</p>
</blockquote>
<blockquote><p class="whatever">Laurie Ipsumam</p></blockquote>
<h4>Some title</h4>
<hr />
<p>Lorem Ipsum</p>
<blockquote><p>Laurel Ipsucandescent</p></blockquote>
