Store and display scraped content from xml file - php

Following up on my previous question, I scraped the content and displayed it in an HTML page using the code below:
<?php
include_once('simple_html_dom.php');
$target_url="http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$html = new simple_html_dom();
$html->load_file($target_url);
?>
<html>
<body>
<div style="margin:auto;width:900px">
<?php
foreach ($html->find('div[class=zg_itemWrapper]') as $post) {
    echo $post;
}
?>
</div>
</body>
</html>
I want to store the same data in an XML file and display 10 items each time the page is scrolled down, using jQuery's window.scroll() function.
My question is: how do I store this scraped data in an XML file for displaying (instead of using a database or a similar means of storage)? I couldn't find a proper solution for this, and I'm new to using XML this way. An example implementation would really help.
Thank you

You need to create a page that accepts a page parameter and prints only the posts for page n. Then you can use the Infinite AJAX Scroll jQuery plugin.
UPDATE: the plugin's documentation shows how to wire up the scrolling; on the server side you just need a PHP script containing the foreach loop that outputs the posts for page n.
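A rough sketch of the server side is below (untested; the file name items.xml, the element names, and page.php are just example choices). The first script writes the scraped blocks into an XML file; the second reads ten of them back per request, so the scroll handler can ask for page.php?page=2, page.php?page=3, and so on.
<?php
// scrape.php - store each scraped block as an <item> in items.xml
include_once('simple_html_dom.php');

$target_url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$html = new simple_html_dom();
$html->load_file($target_url);

$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('items'));

foreach ($html->find('div[class=zg_itemWrapper]') as $post) {
    $item = $root->appendChild($doc->createElement('item'));
    // store the block's raw HTML as text; DOM escapes it safely for XML
    $item->appendChild($doc->createTextNode($post->outertext));
}
$doc->save('items.xml');
?>
<?php
// page.php - return 10 items per request, e.g. page.php?page=2 gives items 11-20
$perPage = 10;
$page    = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$offset  = ($page - 1) * $perPage;

$items = simplexml_load_file('items.xml')->xpath('/items/item');
foreach (array_slice($items, $offset, $perPage) as $item) {
    echo (string)$item;   // the stored HTML block
}
?>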

Related

Form to pull page elements using file_get_contents and getElementsByClassName in PHP

I'm attempting to create a page where I input a URL and the PHP code uses it to pull page elements from another website to be displayed on my blog post. I haven't even made it as far as the form; right now I just need to understand how to get this code to work so that it displays the page elements within the div with the class "products-grid first odd".
<?php
$homepage = file_get_contents('website');
$dochtml = new DOMDocument();
$dochtml->loadHTML($strhtml);
$dochtml->getElementsByClassName('products-grid first odd');
echo ????
?>
The PHP DOMDocument object does not appear to have the method getElementsByClassName().
Instead, I think you would have to call getElementsByTagName(), then loop through the returned DOMElements, call getAttribute('class') on each, and check until you find the right one.
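A sketch of that approach (untested; the URL is a placeholder for whatever address your form supplies):
<?php
// find divs by class with DOMDocument (it has no getElementsByClassName())
$homepage = file_get_contents('http://example.com/'); // placeholder URL
$dochtml  = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings from imperfect HTML
$dochtml->loadHTML($homepage);

foreach ($dochtml->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'products-grid first odd') {
        // print the matching div, markup included
        echo $dochtml->saveHTML($div);
    }
}
?>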

simple html php dom parser a way to gather information from all pages?

I wanted to know if it's possible to make this code search every page of that website so it pulls every image src from all pages. Currently it only pulls the image src values from that one page. I tried using a while loop, but it only repeats the same results from the main page over and over. Any help would be great.
<?php
include_once('simple_html_dom.php');
//show errors
ini_set('display_errors', true);
error_reporting(E_ALL);
$html = file_get_html('http://betatv.net/');
$result=($html);
while($html = ($result)) {
// find the show img and echo it out
foreach($html->find('.entry-content') as $cover_img)
foreach($cover_img->find('img') as $cover_img_link)
//echo the images src
echo $cover_img_link->src .'<br>';
echo '<br>';
}
// clean up memory
$html->clear();
unset($html);
?>
Proof that I own betatv.net: I added a link to this question on the front page.
Here is a nice example of a page crawler:
How do I make a simple crawler in PHP?
You just need to run your piece of code for each link it finds.
Also, if you own this page, I bet there is a better way to find all the images than crawling it from the frontend.
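A rough sketch of that idea with simple_html_dom (untested; the internal-link filter, the 50-page cap, and the selectors are assumptions you would adapt to the site):
<?php
// crawl pages linked from the front page and collect every image src
include_once('simple_html_dom.php');

$queue   = array('http://betatv.net/');
$visited = array();

while ($queue && count($visited) < 50) {      // hard cap so the loop ends
    $url = array_shift($queue);
    if (isset($visited[$url])) {
        continue;
    }
    $visited[$url] = true;

    $html = file_get_html($url);
    if (!$html) {
        continue;
    }

    // same extraction as in the question, run once per crawled page
    foreach ($html->find('.entry-content img') as $img) {
        echo $img->src . '<br>';
    }

    // queue internal links for later iterations
    foreach ($html->find('a') as $link) {
        if (strpos($link->href, 'http://betatv.net/') === 0) {
            $queue[] = $link->href;
        }
    }

    $html->clear();
    unset($html);
}
?>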

setting data via Jquery/Javascript in a ob_get_contents

I might not be clear with my question title, but here is the code:
<?php
$filename = 'myfile.htm';
ob_start();
?>
<div id='test'>my original value</div>
<?php
$htmlcontent = ob_get_contents();
file_put_contents($filename, $htmlcontent);
ob_end_clean();
So this code will eventually create a new file containing the text 'my original value'.
Is it possible to alter the div's value through JavaScript/jQuery before it is written to the file?
Why am I doing this? Because I will eventually be adding a jQuery graph library and want to save its output to the file,
later using wkhtmltopdf to generate a PDF version of that HTML page.
No; you'll have to display the page along with all of the JavaScript you want to use. Then you create a form to gather the contents of the page (after it has been manipulated by your graph library) and post it back to PHP, where it can be saved to a file.
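A minimal sketch of the receiving side, assuming the browser posts the manipulated markup in a field named html (both the field name and save.php are made-up example names):
<?php
// save.php - store the markup posted back from the browser
if (isset($_POST['html'])) {
    file_put_contents('myfile.htm', $_POST['html']);
    echo 'saved';
}
?>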
Hmm, well, you can try one thing. I don't know what the content of myfile.htm looks like, but you can try to load it with something like DOMDocument, use the loadHTMLFile method, and getElementById.
So:
<div id="test1">value</div>
could be retrieved with
// load the saved file, change the div's value, and write it back
$dom = new DOMDocument();
$dom->loadHTMLFile('myfile.htm');
$dom->getElementById('test1')->nodeValue = 'my new value';
$dom->saveHTMLFile('myfile.htm');
Then execute a $.post to a script that 'manipulates' the existing myfile.htm like this and overwrites it.
Cheers

If a div has class "one-two-three" then

I'm currently working on a website, and I need to use PHP to select a div with a certain class. For example:
<body>
<div class="header"></div>
<div class="other-class"></div>
<div class="one-two-three"></div>
<div class="footer"></div>
</body>
<?php
if(div.hasClass('one-two-three'))
{
//function code here...
}
?>
But keep in mind that I want to use PHP and not jQuery for this...
If you want to manipulate the DOM prior to sending it to the client, then the DOMDocument object offers what you need. It even has methods similar to the ones you might already know from JS.
$htmlString = '<html>...</html>'; // either generated, or loaded from a file
$dom = new DOMDocument;
$dom->loadHTML($htmlString);      // or loadHTMLFile
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->hasAttribute('class') && strstr($div->getAttribute('class'), 'one-two-three')) {
        // $div is the <div class='one-two-three'> element
        $div->appendChild($dom->createTextNode('...')); // check the docs to see what else you can do
    }
}
Here's an overview of the methods at your disposal
You will probably fill the div with content using PHP, so you can process your content first and then print it into the div. If you need to do something to the div's contents when the div is already in the browser, you will need JavaScript, or an Ajax call to a PHP script on the backend that does the processing and returns the response.
Use the Simple HTML DOM library for this. You can select elements with selectors like you do in jQuery:
http://simplehtmldom.sourceforge.net
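For example, a quick sketch (assuming $htmlString holds markup like the snippet above):
<?php
include_once('simple_html_dom.php');

$html = str_get_html($htmlString);            // or file_get_html('page.html')
foreach ($html->find('div.one-two-three') as $div) {
    // function code here, e.g. read or rewrite $div->innertext
    echo $div->outertext;
}
?>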

Parsing HTML that has javascript with PHP

I'm trying to parse an HTML page where the majority of the content is generated by JavaScript. When I use the Chrome development tools I can see that the div I'm trying to grab the content from has the class doodle-image. However, when I either view the page source or try to grab it with PHP:
<?php
include_once('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file('http://www.google.com/doodles/finder/2012/All%20doodles');
$doodles = $html->find('.doodle-image');
echo $html;
?>
It returns the frame of the page but contains none of the divs or content. How can I grab the full content of the page?
That's because the element is empty when your PHP client fetches it: Google populates the list of doodles with JavaScript from a JSON object. The page fetches that JSON with an Ajax request, and you can probably request the same URL yourself.
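A rough sketch of that approach (the endpoint URL is a placeholder; copy the actual request URL from the Network tab in Chrome's developer tools, and the available keys depend on what that response really contains):
<?php
// fetch the JSON feed the doodle finder loads via Ajax, instead of the empty HTML shell
$endpoint = 'https://www.google.com/doodles/...'; // placeholder - use the real Ajax URL
$doodles  = json_decode(file_get_contents($endpoint), true);

foreach ((array)$doodles as $doodle) {
    // dump each entry to see which keys (title, image URL, ...) are available
    print_r($doodle);
}
?>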
