I am using PHP Simple HTML DOM Parser to get data from another site. First I get the URLs of my trades on that site, and then I send another request to each trade URL to get its comments. I want to build an array of comments so I can sort them later. Why can't I create the array?
It looks like this:
include_once('simple_html_dom.php');
$result = array();
$html = file_get_html('http://csgolounge.com/profile?id='.$steamid);
foreach ($html->find('div.tradepoll') as $trade)
{
    $tradeid = $trade->find('.tradeheader')[0]->find('a')[0]->href;
    $html = file_get_html('http://csgolounge.com/'.$tradeid);
    foreach ($html->find('div.message') as $message)
    {
        if (!$message->find('p', 0))
        {
            $left = $message->find('.msgleft')[0];
            $right = $message->find('.msgright')[0];
            // information about the comment
            $time = trim(strip_tags_content($left->innertext));
            $text = $left->find('.msgtxt')[0];
            $result[$time]['time'] = $time;
            $result[$time]['text'] = $text;
        }
    }
}
echo json_encode($result);
If I echo $time or $text I always get the data successfully.
I found what the problem was.
The Simple HTML DOM Parser does not clean up the memory used by the DOM each time file_get_html or str_get_html is called, so it needs to be done explicitly each time you have finished with the current DOM.
So I added $html->clear(); at the end of the loop.
Credits: electrictoolbox.com
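For reference, a minimal sketch of the loop with that cleanup in place (I use a separate variable, $tradeHtml, for the inner DOM so the outer one is not overwritten; that adjustment is mine, not part of the original code):
foreach ($html->find('div.tradepoll') as $trade) {
    $tradeid = $trade->find('.tradeheader')[0]->find('a')[0]->href;
    $tradeHtml = file_get_html('http://csgolounge.com/' . $tradeid);

    // ... collect $time and $text into $result as above ...

    // Free the memory held by this trade's DOM before the next iteration.
    $tradeHtml->clear();
    unset($tradeHtml);
}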
I want to extract the content of the script element with the ID __NEXT_DATA__ on this page using PHP Simple HTML DOM. The code I wrote is this:
foreach ($html_base->getElementsByTagName('script') as $element) {
    if (isset($element->id)) {
        $id = $element->id;
        if ($id == "__NEXT_DATA__") {
            $f = $element->nodeValue;
            echo $f;
            break;
        }
    }
}
but unfortunately it gives me the following error:
Undefined property: DOMElement::$id
You can check the Simple HTML DOM documentation, but here is my suggestion:
$html = file_get_html("url");
$script = $html->find("script[id=__NEXT_DATA__]", 0)->innertext;
The second parameter, 0, is the index into the search results; because there is only one script with this ID, you can take the first result.
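If you'd rather stay with the DOMDocument-style code from the question: the error happens because DOMElement has no $id property; attributes are read with getAttribute(). A minimal sketch of that variant:
foreach ($html_base->getElementsByTagName('script') as $element) {
    // DOMElement exposes attributes via getAttribute(), not as object properties.
    if ($element->getAttribute('id') === '__NEXT_DATA__') {
        echo $element->nodeValue;
        break;
    }
}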
I'm trying to parse RSS feeds from some media outlets. My script works for most of them. The problem is that I need to aggregate all of them, even though some are malformed.
I can't manage to get the description from these two feeds. How could I proceed anyway?
Here is my script:
<?php
function RSS_items($url) {
    $i = 0;
    $doc = new DOMDocument();
    $doc->load($url);
    $channels = $doc->getElementsByTagName('channel');
    foreach ($channels as $channel) {
        $items = $channel->getElementsByTagName('item');
        foreach ($items as $item) {
            $i++;
            $y[$i]['title'] = $item->getElementsByTagName('title')->item(0)->firstChild->textContent;
            $y[$i]['link'] = $item->getElementsByTagName('link')->item(0)->firstChild->textContent;
            $y[$i]['updated'] = $item->getElementsByTagName('pubDate')->item(0)->firstChild->textContent;
            $y[$i]['description'] = $item->getElementsByTagName('description')->item(0)->firstChild->textContent;
        }
    }
    echo '<pre>';
    print_r($y);
    echo '</pre>';
}

// the two malformed feeds
RSS_items('http://www.lefigaro.fr/rss/figaro_actualites-a-la-une.xml');
RSS_items('https://francais.rt.com/rss');
?>
The problem in your code is the use of the firstChild property, which selects the element's first child. In the target XML, the description tag does not have the child node you expect, so firstChild selects nothing. Remove it from the code; the result should look like this:
$item->getElementsByTagName('description')->item(0)->textContent;
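If some items omit the description entirely, item(0) returns null and the call above would fail, so a small defensive guard (my addition, not part of the original answer) can help:
$descNode = $item->getElementsByTagName('description')->item(0);
$y[$i]['description'] = $descNode !== null ? trim($descNode->textContent) : '';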
I load an HTML page with PHP DOMDocument:
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);
I search the page for all "a" elements, and if one matches my condition I need to replace the whole link, for example <a href="...">My link is beautiful</a>, with just its text: My link is beautiful.
Here is my loop:
$liens = $div->getElementsByTagName('a');
foreach ($liens as $lien) {
    if ($lien->hasAttribute('href')) {
        if (preg_match("/metz2/i", $lien->getAttribute('href'))) {
            // HERE I NEED TO REPLACE THE <a> ELEMENT WITH ITS TEXT
        }
        $cpt++;
    }
}
Do you have any ideas? Suggestions? Thanks :)
Every time I need to manipulate the DOM with PHP, I use a library called PHP Simple HTML DOM Parser (link here).
It's very easy to use, something like this might work for you:
// Create DOM from URL or file
$html = file_get_html('http://www.page.com/');
// Find all links
foreach ($html->find('a') as $element) {
    // Do your custom logic here if you need it; this example takes the inner contents of the a-tag and writes them back in place of the whole tag.
    $inner = $element->innertext;
    $element->outertext = $inner;
}
//To echo modified html again:
echo $html;
Could be done with preg_replace as well:
$sText = '<a href="https://stackoverflow.com">Stackoverflow</a>';
$sText = preg_replace( '/<a.*>(.*)<\/a>/', '$1', $sText );
echo $sText;
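Since the question already uses DOMDocument, here is a rough sketch of doing the replacement directly in the DOM tree: each matching <a> node is swapped for a plain text node holding its link text. iterator_to_array is used because replacing nodes while iterating a live DOMNodeList can skip elements.
$liens = $div->getElementsByTagName('a');
foreach (iterator_to_array($liens) as $lien) {
    if ($lien->hasAttribute('href') && preg_match('/metz2/i', $lien->getAttribute('href'))) {
        // Replace the whole <a> element with a text node containing its text.
        $text = $doc->createTextNode($lien->textContent);
        $lien->parentNode->replaceChild($text, $lien);
    }
}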
I have a PHP file set up to pull in ONE XML data feed. What I would like to do is load up to 4 feeds into it and, if possible, make it select a random item too. Then parse that into a jQuery News Ticker.
My current PHP is as follows...
<?php
$feed = new DOMDocument();
$feed->load('/feed');
$json = array();
$json['title'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$json['description'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
$json['link'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('link')->item(0)->firstChild->nodeValue;
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
$json['item'] = array();
$i = 0;
foreach ($items as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
    $description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
    $pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
    $guid = $item->getElementsByTagName('guid')->item(0)->firstChild->nodeValue;
    // Store all fields under the same index, then move to the next item.
    $json['item'][$i]['title'] = $title;
    $json['item'][$i]['description'] = $description;
    $json['item'][$i]['pubdate'] = $pubDate;
    $json['item'][$i]['guid'] = $guid;
    $i++;
    echo '<li class="news-item">'.$title.'</li>';
}
//echo json_encode($json);
?>
How can I modify this to load more than one feed into the file?
Thanks in advance
The simplest approach is to wrap another loop around the code you have. It's not the cleanest way, but it will probably suffice for the purpose.
In general, IMO, it's always beneficial to learn the basics of the language first, e.g. the PHP manual on foreach.
This is roughly what the loop needs to look like:
$my_feeds = array("http://.....", "http://.....", "http://.....");

foreach ($my_feeds as $my_feed)
{
    // This is where your code starts
    $feed = new DOMDocument();
    $feed->load($my_feed); // <-- notice the variable
    $json = array();
    // ... and the rest of the code
}
This will walk through all the URLs in $my_feeds, open each RSS source, fetch all the items from it, and output them.
If I'm reading your question right, what you may want to do is turn your code into a function, which you would then run inside a foreach loop for each URL (the URLs could be stored in an array or another data structure).
Edit: If you don't know much about functions, this tutorial section might help you. http://devzone.zend.com/9/php-101-part-6-functionally-yours/
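For instance, a rough sketch of that function-based approach, including the random item mentioned in the question (the feed URLs are placeholders, and each <item> is assumed to contain <title> and <description>):
function fetch_feed_items($url) {
    $feed = new DOMDocument();
    $feed->load($url);
    $items = array();
    foreach ($feed->getElementsByTagName('item') as $item) {
        $items[] = array(
            'title'       => $item->getElementsByTagName('title')->item(0)->nodeValue,
            'description' => $item->getElementsByTagName('description')->item(0)->nodeValue,
        );
    }
    return $items;
}

$all_items = array();
foreach (array('http://example.com/feed1.xml', 'http://example.com/feed2.xml') as $url) {
    $all_items = array_merge($all_items, fetch_feed_items($url));
}

// Pick one random item for the ticker.
$random = $all_items[array_rand($all_items)];
echo '<li class="news-item">' . $random['title'] . '</li>';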
I need to read in and parse data from a third-party website which sends XML data. All of this needs to be done server side.
What is the best way to do this using PHP?
You can obtain the remote XML data with, e.g.
$xmldata = file_get_contents("http://www.example.com/xmldata");
or with curl. Then use SimpleXML, DOM, whatever.
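For example, a minimal SimpleXML sketch (the URL is a placeholder and the feed is assumed to be RSS-shaped):
$xmldata = file_get_contents('http://www.example.com/xmldata');
$xml = simplexml_load_string($xmldata);
// Walk the items of an RSS-style document and print their titles.
foreach ($xml->channel->item as $item) {
    echo (string) $item->title, "\n";
}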
A good way of parsing XML is often to use an XML Pull Parsing (XPP) library; PHP's implementation of this is called XMLReader.
http://php.net/manual/en/class.xmlreader.php
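A minimal XMLReader sketch (the URL is a placeholder): it streams through the document instead of loading it all into memory, which helps with large inputs:
$reader = new XMLReader();
$reader->open('http://www.example.com/xmldata');
while ($reader->read()) {
    // Print the text of every <title> element as it is encountered.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        echo $reader->readString(), "\n";
    }
}
$reader->close();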
I would suggest you use DOMDocument (a built-in PHP class).
A simple example of its power is the following code:
/***********************************************************************************************
 Takes the RSS news feed found at $url and prints it as HTML code.
 Each news item is rendered in a <div class="rss"> block in the order: date + title + description.
***********************************************************************************************/
function Render($url, $max_feeds = 1000)
{
    $doc = new DOMDocument();
    if (@$doc->load($url, LIBXML_NOCDATA | LIBXML_NOBLANKS))
    {
        $feed_count = 0;
        $items = $doc->getElementsByTagName("item");
        //echo $items->length; //DEBUG
        foreach ($items as $item)
        {
            if ($feed_count > $max_feeds)
                break;
            // Unfortunately, inside an <item> node the elements are not always in the same order, therefore getElementsByTagName has to be called for each one.
            // WARNING: using iconv instead of utf8_decode because the latter did not convert some characters properly, such as the apostrophe 0x19 in the techsport.it feeds.
            $title = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("title")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
            $description = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("description")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
            $link = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("link")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
            // The pubDate tag is not mandatory in RSS [RSS2 spec: http://cyber.law.harvard.edu/rss/rss.html]
            $pub_date = $item->getElementsByTagName("pubDate");
            $date_html = "";
            // play with the date here if you want
            echo "<div class='rss'>\n<p class='title'><a href='" . $link . "'>" . $title . "</a></p>\n<p class='description'>" . $description . "</p>\n</div>\n\n";
            $feed_count++;
        }
    }
    else
    {
        echo "<div class='rss'>Service not available.</div>";
    }
}
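Usage is then a single call per feed, e.g. (the URL is a placeholder):
Render('http://www.example.com/rss.xml', 10);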
I have been using simpleXML for a while.