Block elements inside "a" tag (htmlpurifier) - php

How can I prevent HTML Purifier from breaking this code?
<a target="_blank" href="http://www.example" class="link_normal opacity">
<blockquote url="http://www.example" class="big">
<div class="center">
<div class="table_cell"><img src="/images/imagenes_urls/4241.jpg" class="img_url"></div>
</div>
<p style="color:#3b5998" class="b">Desde los 80 hasta 2015, así ha sido la impresionante evolución de los móviles</p>
<p>¿Cómo olvidarse de aquellos enormes objetos a los que llamábamos teléfonos móviles? Muchos de vosotros los recordar...</p>
<span class="dominio">andro4all.com</span>
</blockquote></a>
When I run it through HTML Purifier, it turns into something like this:
<blockquote class="big">
<a class="link_normal opacity" href="http://www.example"></a>
<div class="center">
<a class="link_normal opacity" href="http://www.example"></a>
<div class="table_cell">
<a class="link_normal opacity" href="http://www.example">
<img alt="4241.jpg" class="img_url" src="/images/imagenes_urls/4241.jpg">
</a>
</div>
<a class="link_normal opacity" href="http://www.example"></a>
</div>
<a class="link_normal opacity" href="http://www.example"></a>
<p class="b" style="color:#3b5998;">
<a class="link_normal opacity" href="http://www.example">Desde los 80 hasta 2015, así ha sido la impresionante evolución de los móviles</a>
</p></blockquote>

You can use a trick: add a JavaScript onclick handler to your main tag, like this:
onclick="document.location='http://www.example'"
Then set the cursor style to pointer so that it looks like a normal link:
style="cursor:pointer"
The result looks like this:
<blockquote url="http://www.example" class="big" onclick="document.location='http://www.example'" style="cursor:pointer">
<div class="center">
<div class="table_cell"><img src="/images/imagenes_urls/4241.jpg" class="img_url"></div>
</div>
<p style="color:#3b5998" class="b">Desde los 80 hasta 2015, así ha sido la impresionante evolución de los móviles</p>
<p>¿Cómo olvidarse de aquellos enormes objetos a los que llamábamos teléfonos móviles? Muchos de vosotros los recordar...</p>
<span class="dominio">andro4all.com</span>
</blockquote>

In HTML4 it is not valid to wrap a block-level element such as blockquote in the inline a element.
You can find more information about the blockquote element here: http://www.w3.org/html/wg/drafts/html/master/grouping-content.html#the-blockquote-element

After experimentation, I just discovered a slick answer to this specific question. You can customize HTMLPurifier's behavior, as described at http://htmlpurifier.org/docs/enduser-customize.html.
Specifically, you can overwrite the default behavior of the 'a' anchor element by defining the element again, like this:
include_once('HTMLPurifier.auto.php');
$config = HTMLPurifier_Config::createDefault();
$def = $config->getHTMLDefinition(true);
// here is the magic method that overwrites the default anchor definition
$def->addElement(
'a', // element name
'Inline', // the type of element: 'Block','Inline', or false, if it's a special case
'Flow', // what type of child elements are permitted: 'Empty', 'Inline', or 'Flow', which includes block elements like div
'Common' // permitted attributes
);
$purifier = new HTMLPurifier($config);
// $dirty_html is the html you want cleaned
echo $purifier->purify($dirty_html);
Now you can include block elements inside anchor tags, as allowed in the HTML5 specification (http://dev.w3.org/html5/markup/a.html).
For a more robust HTML5 solution, Christoffer Bubach offers a detailed HTMLPurifier configuration to allow newer HTML5 tags, though it doesn't redefine the anchor tag to permit block elements inside it. See his answer to this SO question: HTML filter that is HTML5 compliant

Related

PHP xpath query not working

I've got the following HTML code:
<html><body><h1> <span class="mw-headline" id="Discussie_over_Titel">Discussie over Titel</span></h1>
<div class="comments">
<div class="comment new">
<div class="newcommenttext">Klik op de button om een nieuwe opmerking te maken over Titel</div><div class="buttons">Nieuwe opmerking</div>
<div class="clear"></div>
</div>
<div class="comments"><div>
</div><div class="commentBlock"><div class="comment" id="WikiSysop-"><div class="poster">Door <span class="usernamehighlight">WikiSysop</span> op <span class="extradata">Type: Suggestie</span>
</div>
<div class="buttons">Reageer op deze opmerking<span class="collapse">Collapse</span></div><div class="content">Wat is dit nou weer?
</div>
</div>
</div><div class="commentBlock"><div class="comment" id="WikiSysop-"><div class="poster">Door <span class="usernamehighlight">WikiSysop</span> op <span class="extradata">Type: Suggestie</span>
</div>
<div class="buttons">Reageer op deze opmerking<span class="collapse">Collapse</span></div><div class="content">Mega mooi!
</div>
</div></div></div></div> <div class="comments">
<div class="comment new"><div class="newcommenttext">
Klik op de button om een nieuwe opmerking te maken over Titel</div><div class="buttons">Nieuwe opmerking</div>
</div>
<p><br />
</p><p><br />
</p>
Klik hier om terug te keren naar Titel.</div>
</body></html>
To fetch all the comments I simply create a new dom parser:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);
$xpathResult = $xpath->query("//div[@class='comments']//div[@class='comments']");
But somehow the XPath query always returns 0 results, even when I use //body. Does anybody know why?
The XPath query you are using only selects nodes with class comments that are nested (at any depth) inside another node with class comments.
With the HTML snippet you provided, it returns 1 node.
You duplicated the selector. The correct usage is:
$xpath->query("//div[@class='comments']"); // returns 3
There is no need for the duplicate entry:
$xpathResult = $xpath->query("//div[@class='comments']");
# 3 elements found
See a working demo on ideone.com.
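To make the difference between the two queries concrete, here is a minimal sketch using a cut-down stand-in for the markup in the question (the sample HTML is an assumption, not the full page). It also shows a token-based class match, which is a common alternative when an element carries more than one class.

```php
<?php
// Minimal stand-in for the page structure in the question (not the full markup).
$text = <<<HTML
<html><body>
<div class="comments">
  <div class="comment new">first</div>
  <div class="comments"><div class="comment">nested</div></div>
</div>
<div class="comments"><div class="comment new">last</div></div>
</body></html>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($text);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Every div whose class attribute is exactly "comments".
$all = $xpath->query("//div[@class='comments']");

// Only "comments" divs nested inside another "comments" div.
$nested = $xpath->query("//div[@class='comments']//div[@class='comments']");

// Token match: would also match class="comments foo", but not class="comment new".
$byToken = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' comments ')]"
);

echo $all->length, ' ', $nested->length, ' ', $byToken->length, PHP_EOL; // 3 1 3
```

If the query still returns nothing on the real page, checking `$dom->saveHTML()` to see what the parser actually built is usually a good first step.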

Elements In Array Being Ignored

For some strange reason 2 of the elements in my arrays are being ignored. They are...
ETIQUETAS XTRA MINI PARA OBJETOS and ETIQUETAS TERMOAHDESIVAS CLASICAS.
If the product page title is included in the array it renders an info box on the product page below the product image. This is true for all products in the array except the 2 mentioned above.
Below are the two array variables, followed by the code that renders the info boxes. Any help is greatly appreciated. By the way, this is WooCommerce on WordPress 4.0.
<?php
//for every label product.
$infoBox1Array = array('Pack Duo','Etiquetas Grandes','Etiquetas Navideñas','Pack Trio','ETIQUETAS PARA CUMPLEAÑOS SUPER PERSONALIZADA','LETREROS PARA CASAS','ETIQUETAS PARA CUMPLEAÑOS','PACK ZAPATOS FORMAS DE PIE','Pack Zapatos','Etiquetas Para Alergias Personalizadas','PACK FULL COLOR','Membretes','Etiquetas Termoadhesivas Mini (fondo blanco)','ETIQUETAS TERMOADHESIVAS FULL COLOR','ETIQUETAS TERMOAHDESIVAS CLASICAS','Saca & Pega','Stickers para Carros','ETIQUETAS PARA ALERGIAS','Etiquetas Kosher','Etiquetas Cocina','Etiquetas Minis','Pack mix 3','Pack mix 2','Pack mix 1','Etiquetas Redondas','Etiquetas Medianas','ETIQUETAS XTRA MINI PARA OBJETOS','Pack Xpress','Pack Guarderia','Pack Regreso al Cole');
//for every product that can be personalized.
$infoBox2Array = array('Pack Duo','Etiquetas Grandes','Etiquetas Navideñas','Pack Trio','ETIQUETAS PARA CUMPLEAÑOS SUPER PERSONALIZADA','LETREROS PARA CASAS','ETIQUETAS PARA CUMPLEAÑOS','PACK ZAPATOS FORMAS DE PIE','Pack Zapatos','Etiquetas Para Alergias Personalizadas','PACK FULL COLOR','Membretes','Etiquetas Termoadhesivas Mini (fondo blanco)','ETIQUETAS TERMOADHESIVAS FULL COLOR','ETIQUETAS TERMOAHDESIVAS CLASICAS','Saca & Pega','Etiquetas Minis','Pack mix 3','Pack mix 2','Pack mix 1','Etiquetas Redondas','Etiquetas Medianas','ETIQUETAS XTRA MINI PARA OBJETOS','Pack Xpress','Pack Guarderia','Pack Regreso al Cole');
?>
<div class="row">
<div>
<?php
include(TEMPLATEPATH . '/info-box-arrays.php');
if(is_single( $infoBox1Array )) {
echo '
<div style="background-color: #ffffff; border: solid 1px #CCCCCC;padding: 7px"><p style="text-align: justify">
El tamaño de la etiqueta y de letra son aproximados.<br/>
Los colores pueden variar de acuerdo a la configuración de su pantalla.
</p></div>
<div style="clear: both"> </div>';
}
if(is_single( $infoBox2Array )) {
echo '
<div style="background-color: #ffffff; border: solid 1px #CCCCCC;padding: 7px"><p style="text-align: justify">
<strong><span style="color: #ff0000;">Nota Importante:</span></strong> Los datos que escribe son los que serán procesados en su pedido.<br/>
Considerar acentos y mayusculas.
</p></div>';
}
?>
</div>
</div>
I replicated your code and created a post titled ETIQUETAS TERMOAHDESIVAS CLASICAS. It works fine, so I suspect there is nothing wrong with your array.
I think there is another reason is_single is returning false: either the post title differs slightly from the array entry, or you are trying to use it on a page rather than a post.
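One way to test the "slightly different title" theory outside WordPress is to compare titles after normalizing them. As far as I know, is_single() matches titles exactly, so case, stray spaces, or a one-letter difference would all cause a miss. Note that the array spells the entry "TERMOAHDESIVAS" while its other entries use "TERMOADHESIVAS". The sketch below is plain PHP; normalizeTitle and titleInList are hypothetical helpers, not WordPress functions.

```php
<?php
// Hypothetical helper: trim, collapse whitespace, and lowercase a title.
function normalizeTitle(string $title): string
{
    $collapsed = preg_replace('/\s+/u', ' ', trim($title));
    // Prefer the multibyte-aware function when the mbstring extension is loaded.
    return function_exists('mb_strtolower')
        ? mb_strtolower($collapsed, 'UTF-8')
        : strtolower($collapsed);
}

// Hypothetical helper: true if $title equals any list entry after normalization.
function titleInList(string $title, array $list): bool
{
    return in_array(normalizeTitle($title), array_map('normalizeTitle', $list), true);
}

$infoBoxArray = array('Pack Duo', 'ETIQUETAS TERMOAHDESIVAS CLASICAS');

// Spelled as in the array: matches regardless of case or surrounding spaces.
$sameSpelling = titleInList(' etiquetas termoahdesivas clasicas ', $infoBoxArray);

// Spelled "TERMOADHESIVAS": no match, because one letter pair is swapped.
$otherSpelling = titleInList('ETIQUETAS TERMOADHESIVAS CLASICAS', $infoBoxArray);

var_dump($sameSpelling, $otherSpelling);
```

On the real site, printing the current product's title (for example with get_the_title()) next to the array entry should show whether the two strings really are identical.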

PHP XPATH parse HTML how to get inner text BEFORE another nested tag

I've been using XPath to parse HTML with no problems until I ran into the markup below.
I usually read the textContent property once I have the td from an XPath query, but here I need only the text before the <img> tag.
<td class="rowdispari">
ZONA NON SERVITA QUOTIDIANAMENTE-PROSSIMA CONSEGNA
<img onmouseover="caricaTool()" src="template/img/infoTip.png" width="17">
<div class="bottom" id='tooool'>
<div class="contenuto">
<div class="top">
<font class="testobold"><font class='testoblubold'>ZONA NON SERVITA QUOTIDIANAMENTE - PROSSIMA CONSEGNA </font><br>La località di destinazione non è tra quelle servite quotidianamente da SDA. La consegna avverrà al più presto possibile, compatibilmente con le operazioni logistiche.</font>
<p> <br><u>Chiudi</u>
</div>
</div>
</div>
</td>
You can probably use:
//td[@class="rowdispari"][img[@src="template/img/infoTip.png"]]/text()[1]
or:
//td[@class="rowdispari"]/text()[following-sibling::img[@src="template/img/infoTip.png"]][1]
Assuming that you already have an XPath expression that selects the outer <td> element, you can simply append /text()[1] to it to get the first text node that is a direct child of that <td> element:
path_to_td_here/text()[1]
A more concrete example:
//td[@class='rowdispari']/text()[1]
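Here is a small sketch of that last expression in PHP, run against a shortened copy of the cell (the sample markup is trimmed for illustration). The first text node includes the surrounding line breaks, so it is trimmed before use.

```php
<?php
// Trimmed-down version of the cell from the question.
$html = <<<HTML
<table><tr><td class="rowdispari">
ZONA NON SERVITA QUOTIDIANAMENTE-PROSSIMA CONSEGNA
<img src="template/img/infoTip.png" width="17">
<div class="bottom">Tooltip text that should be ignored</div>
</td></tr></table>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// First text node that is a direct child of the td, i.e. the text before <img>.
$nodes = $xpath->query("//td[@class='rowdispari']/text()[1]");

$label = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : '';
echo $label, PHP_EOL;
```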

how to scrape the external url using php simple html dom parser

<div id="lyrics">
<img />
<span id="line_31" class="line line-s">En medio de este tropico mortal</span>
<br>
<span id="line_32" class="line line-s">Roots and creation, come again!</span>
<br>
<span id="line_33" class="line line-s">So mi guardian, mi guardian mi lift up di plan</span>
<span id="line_34" class="line line-s">Now everybody a go' do dis one</span>
<span id="line_35" class="line line-s">Like in down di Caribbean</span>
<span id="line_36" class="line line-s">San Andrés, Providence Island</span>
<br>
</div>
Here I have a div that contains multiple span tags with br tags between them. I want to scrape the span text and the br tags exactly as they are. How can I do this with PHP Simple HTML DOM Parser?
Thanks for any help.
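Let's say the HTML above is saved in a file called "index.html" (file_get_html also accepts a URL).
include_once 'simple_html_dom.php'; // the Simple HTML DOM library file
$html = file_get_html("index.html");
// find() returns an array unless an index is passed; 0 selects the first match
$element = $html->find('div#lyrics', 0);
$result = $element->innertext;
You may want to consult the manual: http://simplehtmldom.sourceforge.net/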
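If you would rather not depend on an external library, the same idea can be sketched with PHP's built-in DOM extension. This is an alternative approach, not the Simple HTML DOM API, and the sample markup is a shortened copy of the snippet in the question. It keeps only the span and br children of the div and serializes them back to HTML.

```php
<?php
// Shortened copy of the markup from the question.
$html = <<<HTML
<div id="lyrics">
<img />
<span id="line_31" class="line line-s">En medio de este tropico mortal</span>
<br>
<span id="line_32" class="line line-s">Roots and creation, come again!</span>
<br>
</div>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$lyrics = $xpath->query("//div[@id='lyrics']")->item(0);

$result = '';
if ($lyrics !== null) {
    foreach ($lyrics->childNodes as $child) {
        // Keep only <span> and <br> elements, serialized as HTML.
        if ($child instanceof DOMElement
            && in_array($child->nodeName, array('span', 'br'), true)) {
            $result .= $dom->saveHTML($child) . "\n";
        }
    }
}
echo $result;
```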

[php]remove html code from string

In the variable $this->item->text I have this string:
<!-- JoomlaWorks "Disqus Comment System for Joomla!" Plugin (v2.2) starts here -->
<div class="itp-fshare-floating" id="itp-fshare" style="position:fixed; top:30px !important; left:50px !important;"></div><p>Lorem Ipsum è un testo segnaposto utilizzato nel settore della tipografia e della stampa. Lorem Ipsum è considerato il testo segnaposto standard sin dal sedicesimo secolo, quando un anonimo tipografo prese una cassetta di caratteri e li assemblòdei fogli di caratteri trasferibili “Letraset”,</p>
<p style="text-align: center;"><span class="easy_img_caption" style="display:inline-block;line-height:0.5;vertical-align:top;background-color:#F2F2F2;text-align:left;width:150px;float:left;margin:0px 10px;"><img src="/joomla/plugins/content/imagesresizecache/8428e9c26f1d8498ece730c0aa6aa023.jpeg" border="0" alt="1" title="1" style="width:150px; height:120px; ;margin:0;" /><span class="easy_img_caption_inner" style="display:inline-block;line-height:normal;color:#000000;font-size:8pt;font-weight:normal;font-style:normal;padding:4px 8px;margin:0px;">1</span></span></p>
<p>che contenevano passaggi del Lorem Ipsum, e più recentemente da software di impaginazione come Aldus PageMaker</p>
<!-- Disqus comments counter and anchor link -->
<a class="jwDisqusListingCounterLink" href="http://clouderize.it/joomla/index.php?option=com_content&view=article&id=8:recensione&catid=3:recensione-dei-servizi-di-cloud-computing&Itemid=4#disqus_thread" title="Add a comment">
Add a comment</a>
<!-- JoomlaWorks "Disqus Comment System for Joomla!" Plugin (v2.2) ends here -->
<div class="cp_tags">
<span class="cp_tag_label">Tags: </span><span class="cp_tag cp_tag_6">Recensioni
</span> </div>
So with this code I extract
<span class="easy_img_caption......></span>
Code (I am using this library called phpQuery http://goo.gl/rSu3k):
include_once('includes/phpQuery.php');
$doc = phpQuery::newDocument($this->item->text);
$extraction=pq('.easy_img_caption:eq(0)')->htmlOuter();
echo"<textarea>".$extraction."</textarea>";
So my question is:
How can I remove $extraction string from $this->item->text?
Thank you.
I'll assume phpQuery is some library aiding with dom-parsing in php?
Anyway, to accomplish this, you don't exactly need this external library. It can easily be accomplished with a regular expression replace:
$text = preg_replace('/<span.*?class="[^"]*?easy_img_caption[^"]*?".*?>.*?<\/span>/s', '', $this->item->text);
echo "<textarea>" . $text . "</textarea>";
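One caveat with the pattern above: the lazy match stops at the first </span>, and the caption span in the question contains a nested span, so a stray closing tag is left behind. A DOM-based removal avoids that. The following is a minimal sketch using PHP's built-in DOM extension rather than phpQuery; the sample string and the wrapper id are assumptions made for illustration.

```php
<?php
// Shortened stand-in for $this->item->text from the question.
$text = '<p>Intro</p>'
      . '<p><span class="easy_img_caption"><img src="a.jpeg" alt="1">'
      . '<span class="easy_img_caption_inner">1</span></span></p>'
      . '<p>Outro</p>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
// Wrap the fragment in a known container so it can be read back out afterwards.
$dom->loadHTML('<div id="wrapper">' . $text . '</div>');
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$wrapper = $xpath->query("//div[@id='wrapper']")->item(0);

// First span whose class list contains the token "easy_img_caption".
$target = $xpath->query(
    "(//span[contains(concat(' ', normalize-space(@class), ' '), ' easy_img_caption ')])[1]"
)->item(0);

if ($target !== null) {
    $target->parentNode->removeChild($target); // nested spans go with it
}

// Serialize the wrapper's children back into an HTML string.
$clean = '';
if ($wrapper !== null) {
    foreach ($wrapper->childNodes as $child) {
        $clean .= $dom->saveHTML($child);
    }
}
echo $clean, PHP_EOL;
```

Keep in mind that loadHTML does not assume UTF-8 by default, so text with accented characters like the Italian sample in the question may need an explicit encoding hint before parsing.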
