How to add line breaks in an XML file using PHP?

I am outputting MySQL data as an XML file using PHP.
I want to add line breaks inside the content. If I insert a line break, the line-break tag itself shows up in the content. It seems we have to use HTML tags, but I don't want them to be visible in the XML content.
At the same time, I also want to remove the empty p tags, such as:
<![CDATA[ <p> </p>]]>
This is the code I have written to generate the XML.
Please help me solve these problems.
header("Content-Type: application/xml; charset=utf-8");
date_default_timezone_set("Asia/Calcutta");
$this->view->data=$this->CallModel('posts')->GalleryAndContent();
$xml = '
';
$xml.='Thehansindia
http://www.thehansindia.com
Newspaper with a difference';
foreach($this->view->data as $values)
{
$output=strip_tags($values['text_data'],"");
$output = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $output);
$output = preg_replace('/(<[^>]+) class=".*?"/i', '$1', $output);
$output = preg_replace('/style=(["\'])[^\1]*?\1/i', '', $output, -1);
$output = preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i", '<$1$2>', $output);
$output = str_replace(array("",""), array("",""), $output);
$output = str_replace(array("",""), array("",""), $output);
//$xml.="<CONTENT>"."<![CDATA[".$output."]]>"."</CONTENT>
$xml.= '<item>';
$dom = new DOMDocument;
@$dom->loadHTML($output);
$xml.="<CONTENT>";
foreach ($dom->getElementsByTagName('p') as $tag){
//$tag->nodeValue=str_replace("<![CDATA[ <p> </p> ]]>","",$tag->nodeValue);
if(!empty($tag->nodeValue)){
//$tag->nodeValue=str_replace("<![CDATA[ <p>& & &</p> ]]>","",$tag->nodeValue);
$xml.="<![CDATA["."<p>".stripslashes($tag->nodeValue)."</p>"."]]>";
}
}
$xml.="</CONTENT>";
$xml.= ' </item>';
}

Example:
// Next, replace all newlines with the numeric character reference:
$xml = str_replace("\n", "&#10;", $xml);
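For the empty p tags, one option is to build the element with DOMDocument instead of concatenating strings. The following is only a minimal sketch under some assumptions: buildContentXml is a made-up helper name, the input is ASCII or entity-encoded HTML, and each non-empty paragraph is wrapped in its own CDATA section as in the question's code. Paragraphs that contain only whitespace or non-breaking spaces are skipped.

```php
<?php
// Hypothetical helper (not from the original post): turns an HTML fragment
// into one <CONTENT> element and skips paragraphs that are effectively empty.
function buildContentXml(string $html): string
{
    $src = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy HTML
    $src->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $out = new DOMDocument('1.0', 'UTF-8');
    $content = $out->appendChild($out->createElement('CONTENT'));

    foreach ($src->getElementsByTagName('p') as $p) {
        // Trim ordinary whitespace and U+00A0 (what &nbsp; decodes to).
        $text = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $p->textContent);
        if ($text === '') {
            continue; // drop empty <p> </p> blocks
        }
        $content->appendChild($out->createCDATASection('<p>' . $text . '</p>'));
    }

    // Serialize only the <CONTENT> element, without the XML declaration.
    return $out->saveXML($content);
}

echo buildContentXml('<p>First line</p><p> </p><p>Second line</p>'), "\n";
```

The returned string can be appended to $xml in place of the manual CDATA concatenation.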

Related

What is the XPATH query to extract contents of a class from a div on a webpage in php?

I have written the following code, but it just returns empty data:
$code="CS225";
$url="https://cs.illinois.edu/courses/profile/{$code}";
echo $url;
$html = file_get_contents($url);
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$pokemon_doc->loadHTML($html);
libxml_clear_errors();
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query("//div[@id='extCoursesDescription']");
if($pokemon_row->length > 0){
foreach($pokemon_row as $row){
echo $row->nodeValue . "<br/>";
}
}
}
The website that I am trying to scrape is: https://cs.illinois.edu/courses/profile/CS225
The course content is not part of the initial page source; it seems to be loaded by a script when the page loads. If you go through the source, you get to this:
<script type='text/javascript' src='//ws.engr.illinois.edu/courses/item.asp?n=3&course=CS225'></script>
From this you can follow the URL http://ws.engr.illinois.edu/courses/item.asp?n=3&course=CS225, which gives you the actual content you're after. So use this URL rather than the original one and you should be able to extract the information from there.
Note that this content is all wrapped in document.write() calls.
Update:
To remove the document.write() wrappers, a simple way is to just process the content:
$html = file_get_contents($url);
$html = str_replace(["document.write('","');"], "", $html);
$html = str_replace('\"', '"', $html);
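To show how the two steps fit together, here is a small sketch that runs on a hard-coded sample instead of fetching the URL. The sample string and the unwrapDocumentWrite helper are made up for illustration; the real response may be structured differently, so treat the id used in the XPath query as an assumption carried over from the question.

```php
<?php
// Made-up sample of what the script endpoint might return.
$js = <<<'JS'
document.write('<div id=\"extCoursesDescription\">Data Structures</div>');
JS;

// Hypothetical helper: strips the document.write('...'); wrapper and
// un-escapes the quotes, exactly as in the two str_replace() calls above.
function unwrapDocumentWrite(string $js): string
{
    $html = str_replace(["document.write('", "');"], '', $js);
    return str_replace('\"', '"', $html);
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about the bare fragment
$doc->loadHTML(unwrapDocumentWrite($js));
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Attribute tests in XPath use @, e.g. @id.
foreach ($xpath->query("//div[@id='extCoursesDescription']") as $div) {
    echo trim($div->nodeValue), "\n";
}
```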

PHP DOMDocument node.Value Replacement

I have 3 p tags in email.php
$output='<p>Hey Jim</p>';
$output.='<p>We appreciate you are looking at using our services!</p>';
$output.='<p>Thanks Again</p>';
I want to be able to replace the text within those p tags on the fly from test.php with the text from newp1, newp2, and newp3.
$newp1 = "Hello Mark";
$newp2 = "We have scheduled your pick-up for tomorrow morning.";
$newp3 = "Any questions gives us a call.";
$url = 'email.php';
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('p');
foreach($nodes as $item ){
echo $item->nodeValue.'<br>';
}
I am currently echoing them to see them, but have no clue on how to actually replace them.
No DOMDocument is required in this case.
In email.php you can use placeholders, something like this:
$output='<p>##msg1##</p>';
$output.='<p>##msg2##</p>';
$output.='<p>##msg3##</p>';
and in test.php:
$html = str_replace("##msg1##", $newp1, $html);
$html = str_replace("##msg2##", $newp2, $html);
$html = str_replace("##msg3##", $newp3, $html);
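If you would rather stay with DOMDocument, as in the question, the paragraphs can be replaced by position. This is only a sketch: replaceParagraphs is a hypothetical helper, it assumes the replacement texts are plain text without & or < characters, and it returns only the p elements (anything else in the markup is dropped).

```php
<?php
// Hypothetical helper: replaces the text of the n-th <p> with $texts[n]
// and returns the serialized paragraphs.
function replaceParagraphs(string $html, array $texts): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // the input is a fragment, not a full page
    $doc->loadHTML($html);
    libxml_clear_errors();

    $out = '';
    $i = 0;
    foreach ($doc->getElementsByTagName('p') as $p) {
        if (isset($texts[$i])) {
            $p->nodeValue = $texts[$i]; // overwrite the paragraph text
        }
        $out .= $doc->saveHTML($p);     // serialize just this element
        $i++;
    }
    return $out;
}

$html = '<p>Hey Jim</p>'
      . '<p>We appreciate you are looking at using our services!</p>'
      . '<p>Thanks Again</p>';

echo replaceParagraphs($html, [
    'Hello Mark',
    'We have scheduled your pick-up for tomorrow morning.',
    'Any questions, give us a call.',
]);
```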

php : parse html : extract script tags from body and inject before </body>?

I don't care what the library is, but I need a way to extract <script> elements from the <body> of a page (as a string). I then want to insert the extracted <script>s just before </body>.
Ideally, I'd like to split the <script>s into 2 types:
1) External (those that have the src attribute)
2) Embedded (those with code between <script> and </script>)
So far I've tried with phpDOM, Simple HTML DOM and Ganon.
I've had no luck with any of them (I can find links and remove/print them - but fail with scripts every time!).
Alternative to
https://stackoverflow.com/questions/23414887/php-simple-html-dom-strip-scripts-and-append-to-bottom-of-body
(Sorry to repost, but it's been 24 Hours of trying and failing, using alternative libs, failing more etc.).
Based on the lovely RegEx answer from @alreadycoded.com, I managed to botch together the following:
$output = "<html><head></head><body><!-- Your stuff --></body></html>";
$content = '';
$js = '';
// 1) Grab <body>
preg_match_all('#(<body[^>]*>.*?<\/body>)#ims', $output, $body);
$content = implode('',$body[0]);
// 2) Find <script>s in <body>
preg_match_all('#<script(.*?)<\/script>#is', $content, $matches);
foreach ($matches[0] as $value) {
$js .= '<!-- Moved from [body] --> '.$value;
}
// 3) Remove <script>s from <body>
$content2 = preg_replace('#<script(.*?)<\/script>#is', '<!-- Moved to [/body] -->', $content);
// 4) Add <script>s to bottom of <body>
$content2 = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content2);
// 5) Replace <body> with new <body>
$output = str_replace($content, $content2, $output);
This does the job, and isn't that slow (a fraction of a second).
It's a shame none of the DOM stuff was working (or I wasn't up to wading through naffed objects and manipulating them).
To select all script nodes with a src attribute:
$xpathWithSrc = '//script[@src]';
To select all script nodes with content:
$xpathWithBody = '//script[string-length(text()) > 1]';
Basic usage (replace the query with your actual XPath query):
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
foreach($xpath->query('//body//script[string-length(text()) > 1]') as $queryResult) {
// access the element here. Documentation:
// http://www.php.net/manual/de/class.domelement.php
}
$js = "";
$content = file_get_contents("http://website.com");
preg_match_all('#<script(.*?)</script>#is', $content, $matches);
foreach ($matches[0] as $value) {
$js .= $value;
}
$content = preg_replace('#<script(.*?)</script>#is', '', $content);
echo $content = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content);
If you're really looking for an easy lib for this, I can recommend this one:
$dom = str_get_html($html);
$scripts = $dom->find('script')->remove;
$dom->find('body', 0)->after($scripts);
echo $dom;
There's really no easier way to do things like this in PHP.
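For completeness, the same move can also be sketched with the built-in DOM extension only. The helper name moveScriptsToBodyEnd is made up, and the sketch assumes the input is a full HTML document that libxml can parse. It keeps the scripts in their original order; $script->hasAttribute('src') could be used inside the loop to tell external scripts from embedded ones.

```php
<?php
// Hypothetical helper: moves every <script> found inside <body>
// to the end of <body>, keeping the original order.
function moveScriptsToBodyEnd(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // ignore warnings from imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html; // nothing to do without a body
    }

    $xpath = new DOMXPath($doc);
    // Take a snapshot of the matches before moving any nodes.
    $scripts = iterator_to_array($xpath->query('//body//script'));
    foreach ($scripts as $script) {
        // appendChild() on a node that is already in the tree moves it.
        $body->appendChild($script);
    }

    return $doc->saveHTML();
}

echo moveScriptsToBodyEnd(
    '<html><head></head><body><script src="a.js"></script><p>Hi</p>'
    . '<script>var x = 1;</script></body></html>'
);
```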

php dom change nodeValue in anchor

I am trying to change nodeValue and save it to a variable (or print it).
$html = '<html><body>
<a href="a.html">some a</a>
<a href="b.html">some b</a>
</body></html>';
libxml_use_internal_errors(true); // ignore malformed HTML
$xml = new DOMDocument();
$xml->loadHTML($html);
foreach($xml->getElementsByTagName('a') as $link) {
$link->nodeValue = $link->nodeValue . ' --- ' . $link->getAttribute('href');
}
print_r($html);
should print
<html><body>
<a href="a.html">some a --- a.html</a>
<a href="b.html">some b --- b.html</a>
</body></html>
but it won't. What am I doing wrong?
You're not actually changing $html; you are changing your DOMDocument variable $xml. Instead of
print_r($html);
you need:
echo $xml->saveHTML();
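Put together, a corrected version of the script might look like the sketch below. The two links with href values a.html and b.html are assumed from the expected output in the question; the variable names are my own.

```php
<?php
$html = '<html><body>
<a href="a.html">some a</a>
<a href="b.html">some b</a>
</body></html>';

libxml_use_internal_errors(true); // ignore malformed HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

foreach ($doc->getElementsByTagName('a') as $link) {
    // The change happens inside the DOMDocument, not in $html.
    $link->nodeValue = $link->nodeValue . ' --- ' . $link->getAttribute('href');
}

// Serialize the modified document back into a string.
$result = $doc->saveHTML();
echo $result;
```

Note that saveHTML() returns the whole document, so the result also carries the doctype that loadHTML() adds.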

How can I use php to remove tags with empty text node?

For instance,
<div class="box"></div> remove
<a href="#"></a> remove
<p></p> remove
<span style="..."></span> remove
But I want to keep the tag with text node like this,
<a href="#">link</a> keep
Edit:
I want to remove something messy like this too,
<p><strong></strong></p>
<p><strong></strong></p>
<p><strong></strong></p>
I tested both regex below,
$content = preg_replace('!<(.*?)[^>]*>\s*</\1>!','',$content);
$content = preg_replace('%<(.*?)[^>]*>\\s*</\\1>%', '', $content);
But they leave something like this,
<p><strong></strong></p>
<p><strong></strong></p>
<p><strong></strong></p>
One way could be:
$dom = new DOMDocument();
$dom->loadHtml(
'<p><strong>test</strong></p>
<p><strong></strong></p>
<p><strong></strong></p>'
);
$xpath = new DOMXPath($dom);
while(($nodeList = $xpath->query('//*[not(text()) and not(node())]')) && $nodeList->length > 0) {
foreach ($nodeList as $node) {
$node->parentNode->removeChild($node);
}
}
echo $dom->saveHtml();
Probably you'll have to change that a bit for your needs.
You should buffer the PHP output, then parse that output with some regex, like this:
// start buffering output
ob_start();
// do some output
echo '<div id="non-empty">I am not empty</div><a class="empty"></a>';
// at this point you want to output the contents to the client:
// fetch the buffer and end buffering without flushing, so only the cleaned version is sent
$contents = ob_get_clean();
// replace empty html tags
$contents = preg_replace('%<(.*?)[^>]*>\\s*</\\1>%', '', $contents);
// echo the sanitized contents
echo $contents;
Let me know if this helps :)
You could do a regex replace like:
$updated="";
while($updated != $original) {
$updated = $original;
$original = preg_replace('!<(.*?)[^>]*>\s*</\1>!','',$updated);
}
Putting it in a while loop should fix it.
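Wrapped into a function, the loop could look like the sketch below. removeEmptyTags is a made-up name, and the pattern is a slightly stricter variant of the one above (it only accepts a real tag name in the first group). Being regex-based, it is only suitable for simple markup.

```php
<?php
// Hypothetical helper: repeatedly strips elements that contain nothing but
// whitespace until a pass changes nothing, so nested empties such as
// <p><strong></strong></p> disappear completely.
function removeEmptyTags(string $html): string
{
    do {
        $before = $html;
        $html = preg_replace('!<([a-z][a-z0-9]*)\b[^>]*>\s*</\1>!i', '', $before);
    } while ($html !== $before);

    return $html;
}

echo removeEmptyTags(
    '<p><strong></strong></p><a href="#">link</a><div class="box"> </div>'
), "\n";
```

The first pass removes the inner strong and the div, and the second pass removes the p that has become empty; the link is kept because it has text.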
