open file and remove first tag - php

This would be a HTML file:
<li class="msgln">hello</li><li class="msgln">hi</li><li class="msgln">hey</li>
And php script:
$fp = fopen("file.html", 'a');
....
fclose($fp);
How to remove first <li class="msgln">hello</li>?
Content in <li> is dynamically changed

This works even if the first li contains other nested li elements:
<?php
$doc = new DOMDocument();
$doc->loadHTML('<li class="msgln">hello</li><li class="msgln">hi</li><li class="msgln">hey</li>');
// loadHTML wraps the fragment in <html><body>, so the li elements end up as children of <body>.
$root = $doc->documentElement;
// Remove the first li in the document.
$li = $doc->getElementsByTagName('li')->item(0);
$li->parentNode->removeChild($li);
// Serialize what is left inside <body>.
$html = '';
foreach ($root->childNodes->item(0)->childNodes as $child) {
    $html .= $doc->saveXML($child);
}
echo $html;
?>
Using a regex for this may give unexpected results.
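
Since the question starts from a file on disk, the same DOM approach could be wrapped in a function roughly like this. It is only a sketch: the name removeFirstLi is made up for illustration, and it assumes the fragment is plain ASCII/Latin-1 (what loadHTML expects by default) and small enough to read in one go.

```php
<?php
// Hypothetical helper (not from the answers above): drops the first <li> of a fragment.
function removeFirstLi(string $fragment): string
{
    if (trim($fragment) === '') {
        return $fragment; // nothing to parse
    }
    $doc = new DOMDocument();
    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $li = $doc->getElementsByTagName('li')->item(0);
    if ($li === null) {
        return $fragment; // no <li> found, leave the input untouched
    }
    $li->parentNode->removeChild($li);

    // loadHTML wraps the fragment in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $fragment;
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveXML($child);
    }
    return $html;
}

echo removeFirstLi('<li class="msgln">hello</li><li class="msgln">hi</li><li class="msgln">hey</li>'), "\n";

// To apply it to the file from the question (read, modify, write back):
// file_put_contents('file.html', removeFirstLi(file_get_contents('file.html')), LOCK_EX);
```

Mode 'a' in the question's fopen call is write-only and appends, which is why the sketch reads the whole file and rewrites it instead.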

You can use preg_replace to achieve this:
$html = file_get_contents('file.html');
$html = preg_replace('#^<li[^>]*>[^<]+</li>#i', '', $html);

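If the content of the file is exactly as described, you could use strip_tags() like this (note that it strips every tag and keeps only the text):
$fp = fopen("file.html", 'r'); // 'r', because a file opened in append mode ('a') cannot be read
$content = fread($fp, filesize("file.html")); // fread() requires a length
$content = strip_tags($content);
fclose($fp);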
Alternatively you could use regular expressions but this would be slower.

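$fp = fopen("file.html", 'r'); // 'r', because a file opened in append mode ('a') cannot be read
$content = fread($fp, filesize("file.html")); // fread() requires a length
// The limit of 1 removes only the first <li>...</li> match.
$text = preg_replace("/<li.+?>.+?<\/li>/is", "", $content, 1);
fclose($fp);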

Try this (without regex):
// $string contains the file's contents
$string = '<li class="msgln">hello</li><li class="msgln">hi</li><li class="msgln">hey</li>';
$tag = '</li>';
$lis = explode($tag, $string);
if(count($lis) > 0) {
unset($lis[0]);
$string = implode($tag, $lis);
}

Related

How to remove all kind of non-breaking spaces with php

I am saving a string from Html file into my database.
I fail to get the string trimmed and clean of whitespaces.
I created this simplified function to summarize the problem and what I've tried so far.
<?php
function get_content($html)
{
$dom = new DOMDocument();
$dom->loadHTML($html);
$div = $dom->getElementById('whitespace');
$content = $div->textContent;
# Goal: trim leading, trailing, and non-breaking space
$content = str_replace('&nbsp;','',$content);
$content = str_replace('U+00A0','',$content);
$content = str_replace('\u00a0','',$content);
$content = str_replace('\xa0','',$content);
$content = str_replace(chr(160),'',$content);
$content = trim($content);
return $content;
}
file_put_contents(
'trim.output',
get_content('<div id="whitespace"> TuffToTrim</div>'
));
?>
The output is:
      TuffToTrim
While I'd like it to be:
TuffToTrim
I'm kind of desperate at this point :) Any ideas?
Instead of
$content = str_replace('&nbsp;','',$content);
$content = str_replace('U+00A0','',$content);
$content = str_replace('\u00a0','',$content);
$content = str_replace('\xa0','',$content);
$content = str_replace(chr(160),'',$content);
$content = trim($content);
You should use (note that this removes every whitespace character, including spaces inside the text):
$content = preg_replace('/[\s]+/mu', '', $content);
It should be converted to HTML entities first. Then you should be able to replace the characters.
$content = htmlentities($content, ENT_QUOTES, 'UTF-8'); // passing null as the flags is deprecated in newer PHP versions
$content = str_replace("&nbsp;", "", $content);
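
Some background on why the attempts in the question do nothing: textContent returns UTF-8, where the non-breaking space U+00A0 is the two bytes 0xC2 0xA0. chr(160) is the single byte 0xA0, and '\xa0' in single quotes is a literal backslash followed by "xa0", so neither matches. If only the leading and trailing whitespace should go, something along these lines might work. It is a sketch; trimNbsp is a made-up name and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: trims ASCII whitespace and U+00A0 from both ends of a UTF-8 string.
function trimNbsp(string $text): string
{
    // \x{00A0} is the non-breaking space; the u modifier makes PCRE read the subject as UTF-8.
    $result = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
    // preg_replace returns null on failure (for example invalid UTF-8); fall back to the input.
    return $result ?? $text;
}

$nbsp = "\u{00A0}"; // PHP 7+ escape, the same bytes as "\xC2\xA0"
var_dump(trimNbsp($nbsp . ' ' . $nbsp . 'Tuff To Trim' . $nbsp) === 'Tuff To Trim'); // bool(true)
```

Unlike removing every whitespace character, this keeps the spaces inside the text.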

Move part of string inside the anchor tag

I have a string that is generated via a function.
$string = function();
It generates something like:
$string = '<ul><li><a href="">Test</a>(10)</li>';
My question is, how do I move the (10) part to the end of the anchor tag, so we have:
$string = '<ul><li><a href="">Test (10)</a></li>';
I want to do this to all anchor tags in the list items.
What's the appropriate PHP approach?
Just use the str_replace function on your string variable, like below:
if (strpos($string, "</a>") !== false) {
    $string = str_replace('</a>', ' ', $string);
    // output: <ul><li><a href="">Test (10)</li>
    echo $string = str_replace('</li>', '</a></li>', $string);
    // output: <ul><li><a href="">Test (10)</a></li>
}
Using the HTML DOM you can change or modify element values.
For example:
<?php
$string = '<ul><li><a href="">Test</a>(10)</li></ul>';
echo $string; // Original string.
$dom_document = new DOMDocument();
$dom_document->loadHTML($string);
$new_string = "";
foreach ($dom_document->getElementsByTagName('ul') as $ul) { // For each ul.
    $new_string .= "<ul>";
    foreach ($ul->childNodes as $li) { // For each li.
        $new_string .= "<li>";
        $i = 0;
        $href = ''; // Default in case no anchor is found.
        foreach ($li->childNodes as $a) {
            // Only element nodes carry attributes; text nodes such as "(10)" do not.
            if ($a instanceof DOMElement && $a->hasAttribute('href')) {
                $href = $a->getAttribute('href');
            }
            if ($i === 0) {
                $new_string .= '<a href="' . $href . '">';
            }
            $new_string .= $a->nodeValue;
            $i++;
        }
        $new_string .= "</a>";
        $new_string .= "</li>";
    }
    $new_string .= "</ul>";
}
echo $new_string; // Newly generated string.
?>
See Fiddle.
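
Another option is to move the text node instead of rebuilding the markup by hand. A minimal sketch, assuming the generated string looks like <ul><li><a href="">Test</a>(10)</li></ul> (the anchor markup is inferred from the comments in the str_replace answer) and that each li starts with its anchor. The function name is made up for illustration.

```php
<?php
// Hypothetical helper: appends the text that directly follows the leading <a> of each <li> to that anchor.
function moveTrailingTextIntoAnchor(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('li') as $li) {
        $anchor = $li->firstChild;
        $tail = $anchor !== null ? $anchor->nextSibling : null;
        if ($anchor instanceof DOMElement && $anchor->tagName === 'a' && $tail instanceof DOMText) {
            // Re-create the text inside the anchor, separated by one space, then drop the original.
            $anchor->appendChild($doc->createTextNode(' ' . trim($tail->nodeValue)));
            $li->removeChild($tail);
        }
    }

    // Serialize only the list, not the <html><body> wrapper that loadHTML adds.
    $ul = $doc->getElementsByTagName('ul')->item(0);
    return $ul === null ? $html : $doc->saveHTML($ul);
}

echo moveTrailingTextIntoAnchor('<ul><li><a href="">Test</a>(10)</li></ul>'), "\n";
```

List items that do not match the expected shape are left as they are.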

Removing images from paragraph tags

I have the following code which pulls out the blockquote and puts my WordPress post content in <p> tags.
<?php
$content = preg_replace('/<blockquote>(.*?)<\/blockquote>/', '', get_the_content());
$content = wpautop($content); // Add paragraph-tags
$content = str_replace('<p></p>', '', $content); // remove empty paragraphs
echo $content;
?>
However, it puts the images in <p> tags, which I don't want.
Here is some code that should do it (not tested).
<?php
$content = preg_replace('/<blockquote>(.*?)<\/blockquote>/', '', get_the_content());
$content = wpautop($content); // Add paragraph-tags
$content = str_replace('<p></p>', '', $content); // remove empty paragraphs
$content = preg_replace('/<p>\s*(<a .*>)?\s*(<img .* \/>)\s*(<\/a>)?\s*<\/p>/iU', '\1\2\3', $content); // remove paragraphs around img tags
echo $content;
?>
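On the line after the str_replace you could use this DOMDocument method (note that it removes the img elements themselves, and that saveHTML() returns a full document including the doctype, html and body wrappers):
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // set before loading; assigning it after the document is parsed does nothing
$dom->loadHTML($content);
$images = $dom->getElementsByTagName('img');
// Copy the nodes first: the node list is live and shrinks as nodes are removed.
$removeList = array();
foreach ($images as $domElement) {
    $removeList[] = $domElement;
}
foreach ($removeList as $toRemove) {
    $toRemove->parentNode->removeChild($toRemove);
}
$content = $dom->saveHTML();
(PS: this also gives you a non-preg_replace method, not that it really matters.)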
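
If the goal is to keep the images and only drop the surrounding paragraph, the DOM route could look roughly like this. It is a sketch under a few assumptions: the function name is made up, the input is ASCII/Latin-1 (UTF-8 content would need an explicit encoding hint for loadHTML), and a paragraph counts as "image only" when it contains an img and no visible text.

```php
<?php
// Hypothetical helper: unwraps <p> elements whose only real content is an image.
function unwrapImageParagraphs(string $content): string
{
    if (trim($content) === '') {
        return $content; // nothing to parse
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the live node list before changing the tree.
    $paragraphs = iterator_to_array($dom->getElementsByTagName('p'));
    foreach ($paragraphs as $p) {
        $hasImage = $p->getElementsByTagName('img')->length > 0;
        $hasText = trim($p->textContent) !== '';
        if ($hasImage && !$hasText) {
            // Move the children (img, or a > img) in front of the paragraph, then drop it.
            while ($p->firstChild !== null) {
                $p->parentNode->insertBefore($p->firstChild, $p);
            }
            $p->parentNode->removeChild($p);
        }
    }

    // Serialize only the body's children to avoid the doctype/html/body wrapper.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $content;
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

echo unwrapImageParagraphs('<p>Intro text</p><p><a href="big.png"><img src="small.png" /></a></p>'), "\n";
```

In the WordPress snippet from the question this would be called on $content after wpautop().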

parsing script isnt putting out

<?
$file = "http://www.google.com";
$doc = new DOMDocument();
echo @$doc->loadHTML(file_get_contents($file));
$element = $doc->getElementsbyTagName('span');
echo trim($element->item(0)->nodeValue);
echo trim($element->item(0)->textContent);
if (!is_null($element)) {
$content = $element->nodeValue;
if (empty($content)) {
$content = $element->textContent;
}
echo $content . "\n";
}
?>
I am trying to test this script and am wondering why I can't parse Google. If you look at the page source, hit Ctrl+F and type in span, there is obviously a span tag. Why isn't it giving me results?
<?php
$file = 'http://www.google.com';
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents($file));
$element = $doc->getElementsByTagName('span');
if (0 != $element->length)
{
$content = trim($element->item(0)->nodeValue);
if (empty($content))
{
$content = trim($element->item(0)->textContent);
}
echo $content . "\n";
}
?>
Not 100% sure, but doesn't allow_url_fopen need to be enabled in php.ini for this to work?
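
One detail worth spelling out: getElementsByTagName() returns a DOMNodeList, never null, so the is_null() check in the question always passes, and nodeValue has to be read from an item of the list rather than from the list itself. A minimal sketch of the lookup, kept separate from the download so it can be tried without network access (firstSpanText is a made-up name):

```php
<?php
// Hypothetical helper: returns the trimmed text of the first <span>, or null if there is none.
function firstSpanText(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect the warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $spans = $doc->getElementsByTagName('span');
    if ($spans->length === 0) {
        return null;
    }
    return trim($spans->item(0)->textContent);
}

echo firstSpanText('<div><span> Hello </span><span>second</span></div>'), "\n"; // prints "Hello"

// Fetching a live page needs allow_url_fopen to be enabled in php.ini:
// if (ini_get('allow_url_fopen')) {
//     $html = @file_get_contents('http://www.google.com');
//     if ($html !== false) {
//         var_dump(firstSpanText($html));
//     }
// }
```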

Why am I having issues with variable scope in an external file function?

Within a function on my parent file, I am calling a function from an external php file. Here is my (simplified) code:
Parent file:
include "HelperFiles/htmlify.php";
function funcName(){
$description = "some sample text";
$description = htmlify($description, "code");
echo $description;
};
funcName();
htmlify.php file with called function:
$text = "";
function htmlify($text, $format){
if (is_array($_POST)) {
$html = ($_POST['text']);
} else {
$html = $text;
};
$html = str_replace("‘", "'", $html); //Stripping out stubborn MSWord curly quotes
$html = str_replace("’", "'", $html);
$html = str_replace("”", '"', $html);
$html = str_replace("“", '"', $html);
$html = str_replace("–", "-", $html);
$html = str_replace("…", "...", $html);
if ($format == "code"){
$html = str_replace(chr(149), "•",$html);
$html = str_replace(chr(150), "—",$html);
$html = str_replace(chr(151), "—",$html);
$html = str_replace(chr(153), "™",$html);
$html = str_replace(chr(169), "©",$html);
$html = str_replace(chr(174), "®",$html);
$trans = get_html_translation_table(HTML_ENTITIES);
$html = strtr($html, $trans);
$html = nl2br($html);
$html = str_replace("<br />", "<br>",$html);
$html = preg_replace ( "/(\s*<br>)/", "\n<br>", $html ); // seperate lines for each <br>
//$text = str_replace ( "&amp;#", "&#", $text );
//return htmlspecialchars(stripslashes($text), ENT_QUOTES, "UTF-8");
return htmlspecialchars($html, ENT_QUOTES, "UTF-8");
}
else if ($format == "clean"){
return $html;
}
};
I'm getting the following error:
Notice: Undefined index: text in C:_Localhost_Tools\HelperFiles\htmlify.php on line 25
I've tried declaring the $text variable inside and outside of scope in multiple places but can not seem to get around this error (warning). Any help would be greatly appreciated! Thanks.
Replace
if (is_array($_POST)) {
with
if (isset($_POST['text'])) {
and you should not get the notice anymore. $_POST is always an array, even when the request carries no form data, so the original check always passes and $_POST['text'] is read whether or not it exists.
However, I would recommend removing this check altogether. The function parameter should always be used; anything else is confusing.
You can also remove the first line in htmlify.php, since it does basically nothing.

The error message reads "undefined index", not "undefined variable". Look at all the places where you access an array with text as the key; $_POST['text'] seems to be your best bet. Nothing suggests that you're actually dealing with $_POST data, AFAIK.
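
To see the difference between the two checks in isolation, a small sketch may help. The helper textOrFallback is hypothetical and only illustrates keeping the $_POST lookup at the call site, so that htmlify() itself works purely on its parameter.

```php
<?php
// Simulate a request without form data (on the command line $_POST is empty anyway).
$_POST = [];

var_dump(is_array($_POST));      // bool(true): the superglobal is always an array
var_dump(isset($_POST['text'])); // bool(false): the key itself is missing

// Hypothetical helper: use the posted text if present, otherwise the fallback.
function textOrFallback(array $post, string $fallback): string
{
    return isset($post['text']) ? (string) $post['text'] : $fallback;
}

echo textOrFallback($_POST, 'some sample text'), "\n";        // prints "some sample text"
echo textOrFallback(['text' => 'posted'], 'some sample text'), "\n"; // prints "posted"
```

With something like this, the caller decides where the text comes from and the function never touches $_POST.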
