How to scrape a javascript site using PHP, CURL [duplicate] - php

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How do I render javascript from another site, inside a PHP application?
This is the site: http://www.oferta.pl/strona_v2/gazeta_v2/ . The site is built entirely with JavaScript. I want to scrape it using PHP and cURL. Currently I use DOMXPath. In the left menu there are some categories to select, but I see no 'form' there. How can I use cURL to submit that selection and scrape the output page?
So far I have only used file_get_contents(). It doesn't get the whole page. How can I proceed?
N.B.: I found this example, http://www.html-form-guide.com/php-form/php-form-submit.html , which has a 'form'. But the site I specified has no 'form'.

You cannot scrape it easily. It's possible, but it's very hard.
Simulate the HTTP requests with cURL. Check every request the page makes via Ajax and try to reproduce it.
Simulate the JavaScript execution (this part is almost impossible). Some requests contain values that are generated by JavaScript, so you need to generate them in PHP. If the site implements a complicated algorithm in JS, you can invoke the V8 JavaScript engine.
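As a rough sketch of the first step, the snippet below replays a single Ajax call with cURL. The endpoint URL, the POST field names, and both function names are hypothetical; the real values have to be read from the browser's network inspector while clicking a category. It assumes the cURL extension is installed.

```php
<?php
// Hypothetical helper: collects the cURL options for replaying one Ajax call.
function buildAjaxRequestOptions(string $url, array $postFields, string $referer): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($postFields),
        CURLOPT_REFERER        => $referer,
        // Many Ajax endpoints check this header; whether this site does is an assumption.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
        CURLOPT_TIMEOUT        => 15,
    ];
}

// Sends the request and returns the raw response body.
function replayAjaxRequest(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (placeholder URL and field name):
// $options = buildAjaxRequestOptions('http://example.com/ajax/list', ['category' => '12'], 'http://example.com/');
// $html = replayAjaxRequest($options);
```

The response can then be fed to DOMXPath as before, or to json_decode() if the endpoint returns JSON.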

How to access the HTML table from PHP [duplicate]

This question already has answers here:
What is the difference between client-side and server-side programming?
(3 answers)
Closed 8 years ago.
I have a page with a JavaScript script that adds a row to a table. On the same page there is a PHP script that should read the table and add it to the database. How do I read the table using PHP?
What you're thinking of will not directly work.
JavaScript is executed on the client side only, whereas PHP is executed on the server side only, before the data is sent to the client.
PHP cannot directly read the table.
What you can do is use Ajax to send the data from the user's browser to a PHP script that will use it.
You'll find more information here.
PHP is server-side programming: the PHP code serves data to the client, and PHP can't read from the client.
If you add a row using JavaScript, PHP has no way of getting that information.
You'll need to use Ajax to solve your specific problem.
You cannot "read the table" with PHP. You will have to send the data of the table to the server (and read it with PHP) using JavaScript or CGI.
A well-known technique to do this is called Ajax.
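A minimal sketch of the server side of that Ajax approach, assuming the browser serialises the table rows as a JSON array and POSTs it to a PHP script. The field names `name` and `qty` and the function name are made up for illustration; a real table would have its own columns.

```php
<?php
// Hypothetical helper: turns the JSON body sent by the browser into clean rows.
// Assumes the payload looks like [{"name": "...", "qty": "..."}, ...].
function parseTableRows(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        throw new InvalidArgumentException('Expected a JSON array of rows');
    }
    $rows = [];
    foreach ($data as $row) {
        if (!is_array($row) || !isset($row['name'], $row['qty'])) {
            continue; // skip malformed rows
        }
        $rows[] = ['name' => trim((string) $row['name']), 'qty' => (int) $row['qty']];
    }
    return $rows;
}

// In the script the Ajax call posts to, the raw request body is on php://input:
// $rows = parseTableRows(file_get_contents('php://input'));
// Each row can then be written to the database with a prepared statement.
```

Validating on the server matters here because anything sent from the browser can be altered by the user.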

Web scraper that handles JavaScript [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Make a JavaScript-aware Crawler
I'm trying to figure out what to use as the basis for a PHP based web scraper that can handle pages that render using JavaScript. Many web site scrape attempts (at least the ones I handle) now fail unless the JS in those pages is executed. The pages are not built to gracefully fall back to no-script implementations. This includes those that make heavy use of AJAX.
Would anyone have suggestions for where to start with the development of a web scraper that can handle modern and heavily JavaScript dependent web pages?
Something that can be used by PHP would be best.
It's possible to use a web browser engine in headless mode to load the page and analyze the DOM. Some googling pointed me at http://phantomjs.org/
For sites with heavy Ajax usage, just call the same URLs the page does, and build your content from those responses rather than from the page itself.
For sites that rely heavily on document.write or a framework equivalent, you could probably strip whitespace or match tags or the relevant content using a simple regex, again requesting the responsible script rather than the page that loads it ...
You could use Selenium which is a browser automation tool and then use one of the PHP bindings here, here, or here so you can automate Selenium from PHP.
You would have to have a JavaScript engine in PHP. Or some headless Webkit on the command line. And even then it would get hugely complicated. So the short answer would be: No, sorry, you can't do that.
PHP supports the V8 engine (through the V8Js extension), so I guess you could pass the JavaScript over to V8. Not a pretty thing to do though; I would use something other than straight PHP for this.
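To illustrate the headless-browser idea from PHP, here is a sketch that shells out to a headless Chromium build instead of PhantomJS and captures the rendered DOM. This is a swapped-in tool, not what the answers above used: the binary name `chromium` is an assumption (it varies between systems), and both function names are hypothetical.

```php
<?php
// Hypothetical helper: builds the shell command that asks a headless browser
// to load $url, run its JavaScript, and print the resulting DOM as HTML.
// The binary name is an assumption; adjust it for the local installation.
function buildDumpDomCommand(string $url, string $browserBinary = 'chromium'): string
{
    return escapeshellcmd($browserBinary)
        . ' --headless --disable-gpu --dump-dom '
        . escapeshellarg($url); // quote the URL so '&' and '?' are not interpreted by the shell
}

// Runs the command and returns the rendered HTML.
function renderPage(string $url): string
{
    $html = shell_exec(buildDumpDomCommand($url));
    if (!is_string($html) || $html === '') {
        throw new RuntimeException("Headless browser produced no output for $url");
    }
    return $html;
}

// Usage (requires the browser to be installed on the server):
// $html = renderPage('http://example.com/');
```

The returned HTML already reflects whatever the page's scripts rendered on load, so it can be parsed with DOMDocument like any static page. Content that only appears after user interaction still needs a driver such as Selenium.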

Count Hyperlinks of a Website [duplicate]

This question already exists:
Closed 11 years ago.
Possible Duplicate:
How to parse HTML with PHP?
I want to write a PHP program that counts all the hyperlinks on a website that the user enters.
How can I do this? Is there a library or something similar that I can use to parse the HTML and analyze it for hyperlinks?
Thanks for your help.
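Like this:
<?php
// Fetch the page and count occurrences of "<a href=" (case-insensitive).
// This is only a rough count: it misses anchors whose first attribute is not href.
$site = file_get_contents("someurl");
$links = substr_count(strtolower($site), "<a href=");
print "There are {$links} links in that page.";
?>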
Well, we won't be able to give you a definitive answer, only pointers. I once built a search engine in PHP, and the principle is the same:
First of all, write your script as a console script. A web script is not really appropriate, though that is a matter of taste.
You need to understand how to work with sockets in PHP and make requests; look at the PHP network functions at: http://www.php.net/manual/ref.network.php
You will need to become versed in HTTP requests: learn how to make your own GET/POST requests and split the headers from the returned content.
The last part will be easy with a regular expression: just preg_match the content for "#()*#i" (that expression might be wrong, I didn't test it at all, OK?)
Loop over the list of found hrefs, compare them to the hrefs already visited (remember to take wildcard GET parameters into account) and then repeat the process to load all the pages of the site.
It IS HARD WORK... good luck
You may have to use cURL to fetch the contents of the web page. Store that in a variable, then parse it for hyperlinks. You might need a regular expression for that.
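As an alternative to string matching or regular expressions, the links can be counted with PHP's bundled DOM parser. The sketch below assumes the DOM extension is available and that only `<a>` elements with an `href` attribute count as hyperlinks; the function name is made up for illustration.

```php
<?php
// Hypothetical helper: counts <a> elements that carry an href attribute.
function countLinks(string $html): int
{
    if (trim($html) === '') {
        return 0; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $count = 0;
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $count++;
        }
    }
    return $count;
}

// Usage with a fetched page (the URL is a placeholder):
// echo countLinks(file_get_contents('http://example.com/'));
```

Unlike a substring search, this handles attribute order and upper-case tags, and it skips named anchors that have no href. Links added later by JavaScript are still not seen.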

Assign JavaScript variable to Smarty variable [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
how to assign a javascript variable to a smarty variable
I want to assign a textfield value to Smarty variable using JavaScript and call Smarty function but I could not do it.
Please guide me.
You cannot do that in an easy way. PHP, and by extension Smarty, is parsed and run on the server side before the browser gets the data. JavaScript is run on the client side when the browser parses the HTML, CSS and JavaScript that the server sent.
You will have to make a new HTTP request in some way, sending the new data. You can do that by reloading the entire web page and sending stuff in the querystring (after ? in the URL), or slightly more advanced by doing an Ajax call from your JS code and make JS do the changes of the page that you desire. The latter is more complex and requires some knowledge of Javascript, but the page does not need to be reloaded in its entirety.
Did you mean something like this?
<script>
var foo_bar = "{$foo.bar|escape:javascript}";
</script>
Note that, as mentioned above, the value is computed server-side.
UPD. I get it now: you wanted to pass the value the other way around. No, that's not possible.

using anchors #value in URL, can I call and use this value in php? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Can PHP read the hash portion of the URL?
Hi,
I want to jump to the part of my web page where a div called bla bla bla is located.
Using this URL, http://www.mysite.com/mypage#28 , I get there.
But I also need to process that number in PHP. Does the # work like a ? ($_GET) as well?
How do I do that otherwise?
Thanks
I agree with middaparka: a hash value is a client-side value and cannot be read on the server side, as in Facebook's dynamic URL loading technique.
So you can read its value in a JavaScript function and call that function when the page loads to do what you need.
In theory you can use the parse_url function to obtain this, via the PHP_URL_FRAGMENT option. However, in practice I'm pretty sure that there's no guarantee that the browser will pass this information to the server. (i.e.: It won't show up in $_SERVER['REQUEST_URI'], so there's no way to pass this information into parse_url in the first place.)
As such, you'd need to obtain this client-side via JavaScript and forward it to the server. To do this you can use window.location.hash.
e.g.: <script>alert('The hash value is: '+window.location.hash);</script>
There is a jQuery plugin, "hashchange", with which you can send an Ajax request every time the hash changes. You can even bookmark it.
http://benalman.com/projects/jquery-hashchange-plugin/
Check out the demo at http://benalman.com/code/projects/jquery-hashchange/examples/hashchange/
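To make the parse_url point concrete: the function does extract the fragment when PHP is handed a complete URL string, but the browser never sends that part to the server, so the value has to arrive some other way. The sketch below assumes the client forwards `window.location.hash` in a query parameter named `section`; that parameter name and both function names are made up for illustration.

```php
<?php
// parse_url() can extract the fragment, but only from a full URL string
// that PHP already has; the browser never sends the fragment to the server.
function fragmentFromUrl(string $url): ?string
{
    $fragment = parse_url($url, PHP_URL_FRAGMENT);
    return is_string($fragment) ? $fragment : null;
}

// Hypothetical helper: reads a hash that client-side JavaScript forwarded
// in the query string, e.g. /mypage?section=%2328, and validates it.
function sectionIdFromRequest(array $query): ?int
{
    // window.location.hash includes the leading '#', so strip it first.
    $raw = ltrim((string) ($query['section'] ?? ''), '#');
    return preg_match('/^\d+$/', $raw) === 1 ? (int) $raw : null;
}

// Usage inside the receiving script:
// $sectionId = sectionIdFromRequest($_GET);
```

The forwarded value is validated as digits only because it comes from the client and cannot be trusted.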
