Applying a CSS whitelist to HTML in PHP [closed] - php

Closed. This question is off-topic.
Closed 11 years ago.
Let's say I have the following $string:
<span style='text-decoration:underline; display:none;'>Some text</span>
I only want to allow the text-decoration style property, so I want a PHP function like this:
$string = stripStyles($string, array("text-decoration"));
It would work like strip_tags(), but take an array of allowed CSS properties instead of allowed tags. $string would then be:
<span style='text-decoration:underline;'>Some text</span>
I am using CakePHP, so if this can be done with its Sanitize class, all the better.

This is tricky, but you should be able to do it with DOMDocument. This should get you started, but it's likely to require some serious tweaking.
// Load your HTML string
$dom = new DOMDocument();
$dom->loadHTML($your_html_string);
// Get all the <span> tags
$spans = $dom->getElementsByTagName("span");
// Loop over the span tags
foreach ($spans as $span) {
    // If the span has a style attribute that contains "text-decoration:",
    // replace the attribute's contents with only the text-decoration declaration.
    if ($style = $span->getAttribute("style")) {
        if (preg_match('/text-decoration:([^;]*);/i', $style)) {
            $span->setAttribute("style", preg_replace('/^(.*)text-decoration:([^;]*);(.*)$/i', 'text-decoration:$2;', $style));
        }
        // Otherwise, remove the style attribute
        else {
            $span->removeAttribute("style");
        }
    }
}
$output = $dom->saveHTML();
It may be more robust to parse the style attribute by explode()ing it on ";":
// This replaces the inner contents of the foreach ($spans as $span) loop above,
// instead of the preg_replace()
$style = $span->getAttribute("style");
$styles = explode(";", $style);
$replaced_style = FALSE;
foreach ($styles as $s) {
    if (preg_match('/text-decoration/i', $s)) {
        $span->setAttribute("style", trim($s) . ";");
        $replaced_style = TRUE;
    }
}
// If a text-decoration wasn't found, remove the style attribute
if (!$replaced_style) {
    $span->removeAttribute("style");
}
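Putting the explode() idea together, here is a minimal sketch of the stripStyles($string, $allowed) function the question asks for. The name and signature come from the question; this is a plain-PHP helper, not a CakePHP or Sanitize API. It handles every element with a style attribute, not only <span>, and returns the fragment without the <html>/<body> wrapper that loadHTML() adds. Assumptions: the input is ASCII or ISO-8859-1 (UTF-8 may need extra encoding handling with DOMDocument::loadHTML()), and no value contains a literal ";" (e.g. inside url(...)). Whitelisting property names also does not validate the values.

```php
<?php
// Hypothetical helper: keep only whitelisted CSS properties in style attributes.
function stripStyles(string $html, array $allowed): string
{
    $allowed = array_map('strtolower', $allowed);

    // Wrap the fragment in a <div> so only its children are serialized later.
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence markup warnings
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Every element that carries a style attribute.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@style]') as $element) {
        $kept = [];
        foreach (explode(';', $element->getAttribute('style')) as $declaration) {
            // Split "property: value" on the first colon only.
            $parts = explode(':', $declaration, 2);
            if (count($parts) !== 2) {
                continue; // empty or malformed declaration
            }
            $property = strtolower(trim($parts[0]));
            if (in_array($property, $allowed, true)) {
                $kept[] = $property . ':' . trim($parts[1]) . ';';
            }
        }
        if ($kept) {
            $element->setAttribute('style', implode(' ', $kept));
        } else {
            $element->removeAttribute('style');
        }
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $output = '';
    foreach ($wrapper->childNodes as $child) {
        $output .= $dom->saveHTML($child);
    }
    return $output;
}

$string = "<span style='text-decoration:underline; display:none;'>Some text</span>";
echo stripStyles($string, array('text-decoration'));
```

Note that DOMDocument re-serializes the markup, so attribute quotes may change from single to double quotes.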

Related

Get the HREF from a link with a specific class [closed]

Closed. This question needs to be more focused.
Closed 2 years ago.
<a data-track='' _sp= class=s-item__link href=get_this_href>...</a>
In the link above, data-track contains some JSON data. The _sp= value can contain letters, numbers and a period (.). The class is s-item__link.
I need the value get_this_href; I can go from there.
This is the regex I tried, but I'm stuck:
<a\b(?=[^>]* class="[^"]*(?<=[" ])s-item__link[" ])(?=[^>]* href="([^"]*))
Here is an example: https://regex101.com/r/rVPeUI/1
$link = ""; // URL I'm scraping
$html = file_get_html($link);
// find() is part of simple_html_dom.php; each li item is an $item.
foreach ($html->find('li.s-item ') as $item) {
    // $item contains a fair number of nested divs with spans and links.
}
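Rather than using a regex, it's better to parse the HTML with DOMDocument:
// $html must be the page's HTML as a string (e.g. from file_get_contents($link)),
// not a simple_html_dom object
$doc = new DOMDocument();
libxml_use_internal_errors(true); // real pages often produce parser warnings
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = "//a[@class='s-item__link']";
$entries = $xpath->query($query);
foreach ($entries as $entry) {
    echo "HREF " . $entry->getAttribute("href");
}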
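The query //a[@class='s-item__link'] only matches when the class attribute is exactly that string. A common XPath idiom for matching one class token among several is sketched below; hrefsByClass() and the sample markup are made up for illustration, and the class name is assumed not to contain quotes since it is interpolated into the query.

```php
<?php
// Hypothetical helper: hrefs of all <a> elements whose class list contains $class.
function hrefsByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the normalized class attribute with spaces so only whole tokens match.
    $query = sprintf(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' %s ')][@href]",
        $class
    );
    $hrefs = [];
    foreach ($xpath->query($query) as $link) {
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

$html = '<ul><li class="s-item"><a class="s-item__link extra" href="/item/1">One</a></li>'
      . '<li class="s-item"><a class="s-item__linkish" href="/item/2">Two</a></li></ul>';
print_r(hrefsByClass($html, 's-item__link'));
```

The padding with spaces is what keeps s-item__linkish from matching s-item__link.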

how to insert in x amount of paragraph with str_replace () [closed]

Closed. This question needs to be more focused.
Closed 3 years ago.
I'm trying to insert an element with str_replace(), targeting a paragraph by its position in the content, for example:
<?php
$result_information = "<p>parrafo 1</p> <p>parrafo 2</p> <p>parrafo 3</p>";
$result_information1 = str_replace("<p>[1]", "<p>cambio", $result_information);
echo $result_information1;
?>
I tried using <p>[1], but unfortunately it doesn't work. Is there any way to get the first paragraph and replace it?
I would create an array from $result_information with preg_split() and then replace the elements you want to change (here the first and the third).
<?php
$result_information = "<p>parrafo 1</p> <p>parrafo 2</p> <p>parrafo 3</p> <p>parrafo 4</p>";
$result_information = preg_replace("/<\/p>(.*?)<p>/", "</p><p>", $result_information); # remove the spaces between paragraphs
$array = preg_split("/<p>(.*?)<\/p>/", $result_information, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$array[0] = "cambio"; # first paragraph
$array[2] = "cambio"; # third paragraph
$result_information1 = "<p>" . implode("</p><p>", $array) . "</p>";
echo $result_information1;
?>
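If you want to stay with regexes but avoid rebuilding the string, a counter inside preg_replace_callback() can target one paragraph by position. This is a sketch under the assumption of simple, non-nested <p>...</p> markup without attributes; replaceParagraph() is a hypothetical name.

```php
<?php
// Hypothetical helper: replace the contents of the Nth <p> (0-based).
function replaceParagraph(string $html, int $index, string $text): string
{
    $counter = 0;
    return preg_replace_callback(
        '/<p>(.*?)<\/p>/s',
        function (array $m) use (&$counter, $index, $text) {
            // Only the paragraph at position $index is changed; others are kept as-is.
            $replacement = ($counter === $index) ? '<p>' . $text . '</p>' : $m[0];
            $counter++;
            return $replacement;
        },
        $html
    );
}

$result_information = "<p>parrafo 1</p> <p>parrafo 2</p> <p>parrafo 3</p>";
echo replaceParagraph($result_information, 0, "cambio");
// <p>cambio</p> <p>parrafo 2</p> <p>parrafo 3</p>
```

Unlike the preg_split() version, this keeps the whitespace between paragraphs intact.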
IMHO it is best to deal with this using DOMDocument. As always, this is more complicated than using replace/regexes, but it's usually worth the effort because it handles the content as HTML rather than as plain text.
The main code for processing document fragments is heavily based on https://stackoverflow.com/a/29499398/1213708; all I have done is add the ability to refer to the paragraphs the way you want.
The part that deals with the p tags is just:
$pTags = $doc->getElementsByTagName("p");
$pTags[1]->textContent = "cambio";
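First, get a list of the p tags. This returns a DOMNodeList that you can index like an array, as in the second line of code. Indexing starts at 0, so $pTags[1] is the second paragraph; use $pTags[0] for the first.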
$result_information = "<p>parrafo 1</p> <p>parrafo 2</p> <p>parrafo 3</p>";
$doc = new DOMDocument();
// Wrap the fragment in a <div> so it can be pulled out again afterwards
$doc->loadHTML("<div>$result_information</div>");
$pTags = $doc->getElementsByTagName("p");
$pTags[1]->textContent = "cambio";
// Detach the wrapper <div> from the document
$container = $doc->getElementsByTagName('div')->item(0);
$container = $container->parentNode->removeChild($container);
// Remove what is left (the doctype and the implied <html>/<body>)
while ($doc->firstChild) {
    $doc->removeChild($doc->firstChild);
}
// Move the fragment's nodes back in as top-level nodes
while ($container->firstChild) {
    $doc->appendChild($container->firstChild);
}
echo $doc->saveHTML();
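The same DOM approach can be wrapped in a function that also guards against a missing paragraph, since DOMNodeList::item() returns null when the index is out of range. This sketch serializes the wrapper's children with saveHTML($node) instead of rebuilding the document; setParagraphText() is a hypothetical name, and the exact whitespace between paragraphs in the output may depend on libxml's HTML parser.

```php
<?php
// Hypothetical helper: set the text of the Nth <p> (0-based) in an HTML fragment.
function setParagraphText(string $fragment, int $index, string $text): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $p = $doc->getElementsByTagName('p')->item($index); // null when out of range
    if ($p === null) {
        return $fragment; // nothing to replace
    }
    $p->textContent = $text;

    // Serialize only the children of the wrapper <div> (first <div> in the document).
    $output = '';
    foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
        $output .= $doc->saveHTML($child);
    }
    return $output;
}

echo setParagraphText("<p>parrafo 1</p> <p>parrafo 2</p> <p>parrafo 3</p>", 0, "cambio");
```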

Get content from remote html page [closed]

Closed. This question needs to be more focused.
Closed 6 years ago.
I'm looking for a way to get specific content from a remote web page.
The content I want is inside JavaScript variables like these:
var Example1 = 0;
var Example2 = 14;
The variable names stay the same, and the values are always numbers.
Thank you
Find the scripts in the HTML source with DOMDocument, then find the variable declarations with a regex:
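// $output must hold the remote page's HTML (e.g. fetched with file_get_contents())
$DOM = new DOMDocument();
libxml_use_internal_errors(true); // real-world pages often trigger parser warnings
$DOM->loadHTML($output);
$res = [];
$scripts = $DOM->getElementsByTagName('script');
$lnt = $scripts->length;
for ($i = 0; $i < $lnt; $i++) {
    // Capture "var name = 123;" declarations: $m[1] holds the names, $m[2] the values
    preg_match_all('/var\s+(\w+)\s*=\s*(\d+)\s*;/', $DOM->saveHTML($scripts->item($i)), $m);
    $res = array_merge($res, array_combine($m[1], $m[2]));
}
print_r($res);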
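Since the variable names are known in advance, a variant can look up only those names and return integers. This is a sketch, not the answer's exact code: jsNumberVars() is a hypothetical name, fetching the page is left out (the HTML is assumed to already be in a string), and it reads each script's text with textContent rather than saveHTML().

```php
<?php
// Hypothetical helper: numeric values of named "var X = 123;" declarations.
function jsNumberVars(string $html, array $names): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ($dom->getElementsByTagName('script') as $script) {
        foreach ($names as $name) {
            // preg_quote() keeps the variable name from being read as regex syntax.
            $pattern = '/\bvar\s+' . preg_quote($name, '/') . '\s*=\s*(\d+)\s*;/';
            if (preg_match($pattern, $script->textContent, $m)) {
                $values[$name] = (int) $m[1];
            }
        }
    }
    return $values;
}

$html = '<html><head><script>var Example1 = 0; var Example2 = 14;</script></head><body></body></html>';
print_r(jsNumberVars($html, ['Example1', 'Example2']));
```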

Assign text between tags to a variable [closed]

Closed. This question needs details or clarity.
Closed 8 years ago.
Is it possible to get the text between <p></p> tags and store it in a variable?
For example, from <p>blabla</p> I would like to get the text "blabla" and store it in a PHP variable, so that the variable holds the text like this:
<?php $test = "blabla"; ?>
Try:
$html = "<p>blabla</p>";
$dom = new DOMDocument;
$dom->loadHTML($html); // loadXML() would fail on HTML that is not well-formed XML
$arr = $dom->getElementsByTagName('p');
foreach ($arr as $value) {
echo $value->nodeValue; // result => blabla
}
There are many other methods you can use depending on your needs, so take a look at the documentation for
DOMDocument
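Alternatively, you can use this regex-based function:
function getTextBetweenTags($string, $tagname)
{
    // Lazily match the content between the first <tagname> and </tagname>;
    // the s modifier lets it span line breaks. Tags with attributes are not matched.
    $tag = preg_quote($tagname, "/");
    $pattern = "/<$tag>(.*?)<\/$tag>/s";
    if (preg_match($pattern, $string, $matches)) {
        return $matches[1];
    }
    return null; // no such tag
}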
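For comparison, a DOMDocument-based counterpart to getTextBetweenTags() is sketched below. getTextOfFirstTag() is a hypothetical name; it returns the text of the first matching element, or null if there is none, and unlike the regex it copes with attributes and nested tags.

```php
<?php
// Hypothetical helper: text content of the first <$tagname> element, or null.
function getTextOfFirstTag(string $html, string $tagname): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $dom->getElementsByTagName($tagname)->item(0);
    return $node === null ? null : $node->textContent;
}

$test = getTextOfFirstTag('<p class="intro">bla<b>bla</b></p>', 'p');
echo $test; // blabla
```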

php replace of keys inside a string [closed]

Closed. This question needs details or clarity.
Closed 8 years ago.
I have an array of keys and a medium/long string.
I need to replace at most 2 of the keys found in this text with the same keys wrapped in a link.
Thanks.
Example:
$aKeys = array();
$aKeys[] = "beautiful";
$aKeys[] = "text";
$aKeys[] = "awesome";
...
$aLink = array();
$aLink[] = "http://www.domain1.com";
$aLink[] = "http://www.domain2.com";
$myText = "This is my beautiful awesome text";
It should become "This is my <a href='http://www.domain1.com'>beautiful</a> awesome <a href='http://www.domain2.com'>text</a>";
I don't really understand what you need, but you can do something like this:
$aText = explode(" ", $myText);
$iUsedDomain = 0;
foreach($aText as $sWord){
if(in_array($sWord, $aKeys) and $iUsedDomain < 2){
echo "<a href='".$aLink[$iUsedDomain++]."'>".$sWord."</a> ";
}
else{ echo $sWord." "; }
}
Alternatively, you could use a snippet like this. I recommend updating this code to use a clean class instead of stuff like global; I only used global here to show how you could solve this with less code.
// 2 is the number of allowed replacements
echo preg_replace_callback('!('.implode('|', $aKeys).')!', 'yourCallbackFunction', $myText, 2);
function yourCallbackFunction($matches)
{
    // Get the link array defined outside of this function (NOT recommended)
    global $aLink;
    // Buffer the URL
    $url = $aLink[0];
    // Do this to reset the indexes of your array
    unset($aLink[0]);
    $aLink = array_merge($aLink);
    // Do the replacement: wrap the matched key in a link
    return "<a href='".$url."'>".$matches[1]."</a>";
}
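Following the advice to avoid global, one option is to capture the link list in a closure. This is a sketch with a hypothetical linkKeys() function; it also adds \b word boundaries (so "text" does not match inside "context") and preg_quote() on the keys, which the snippet above does not do. Like both snippets above, it links the first two keys found in the text, which for the sample sentence are "beautiful" and "awesome".

```php
<?php
// Hypothetical helper: wrap at most $limit keyword matches in links, no globals.
function linkKeys(string $text, array $keys, array $links, int $limit = 2): string
{
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keys);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/';

    return preg_replace_callback(
        $pattern,
        function (array $m) use (&$links) {
            // Take the next unused link; leave the word alone if none are left.
            $url = array_shift($links);
            if ($url === null) {
                return $m[0];
            }
            return "<a href='" . htmlspecialchars($url, ENT_QUOTES) . "'>" . $m[1] . '</a>';
        },
        $text,
        $limit
    );
}

$aKeys = array("beautiful", "text", "awesome");
$aLink = array("http://www.domain1.com", "http://www.domain2.com");
echo linkKeys("This is my beautiful awesome text", $aKeys, $aLink);
// This is my <a href='http://www.domain1.com'>beautiful</a> <a href='http://www.domain2.com'>awesome</a> text
```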
