Scrape data from AJAXREQUEST - php

I would like to grab data from a website that uses an AJAX request to load new data from the server into a DIV.
When I click the button on the website that loads the new data, I can see that the browser makes only one POST request, with the following POST string:
AJAXREQUEST=_viewRoot&j_id376=j_id376&javax.faces.ViewState=j_id3&j_id376%3Aj_id382=j_id376%3Aj_id382&valueChanged=false&AJAX%3AEVENTS_COUNT=1&
When I do the above post request using php curl I don't get any useful data.
Does someone know how to grab data for this kind of request?
UPDATE1:
This is what I use in php:
$ckfile = tempnam(sys_get_temp_dir(), 'cookies'); // file that stores cookies between requests
$post_string = 'AJAXREQUEST=_viewRoot&j_id376=j_id376&javax.faces.ViewState=j_id3&j_id376%3Aj_id382=j_id376%3Aj_id382&valueChanged=false&AJAX%3AEVENTS_COUNT=1&';
$ch = curl_init('http://www.website.com');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
$output = curl_exec($ch);
I don't get any results, and no errors or messages either.

Your problem probably isn't with your PHP code; it's more likely with what you are actually sending to the server. I'm assuming website.com is a placeholder for whatever service you are trying to interact with. Since you haven't said where you're sending the request or what you're getting back, my guess is that what you're posting is simply being ignored because it is invalid, incomplete, or needs further POST/GET requests around it. Another possibility is that you're attempting to POST to a service that requires an authenticated session, which you have not established (the POST variables you listed could include some sort of token that identifies the session).
I would recommend that you first test your code on a simpler, controlled test case. Set up a basic web form that returns true or something similar when you POST a value to it, and make sure your POST code works against that first.
Then, using a debugging tool such as LiveHTTPHeaders or Firebug, record the entire POST/GET interaction with the server. It might be a good idea to first try to "replay" this interaction with a debugging tool to prove that your methodology works. Once you know exactly what you need to do at a high level, repeat the process in your PHP code.
There is not much other advice anyone can give you with the information you have given us.
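As a rough illustration of the "replay the whole interaction" advice, here is a minimal sketch. It assumes the site is a JavaServer Faces application (the javax.faces.ViewState parameter hints at that) and that the view state is delivered in a hidden input on the page; both are guesses. The helper names extract_view_state() and replay_ajax_post() are made up for this example.

```php
<?php
// Hypothetical helper: pull the JSF view state out of the page HTML.
// Assumes the token sits in a hidden <input name="javax.faces.ViewState">.
function extract_view_state(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('name') === 'javax.faces.ViewState') {
            return $input->getAttribute('value');
        }
    }
    return null;
}

// Hypothetical helper: GET the page first, then replay the AJAX POST on the
// same handle so the session cookie and a fresh view state are sent along.
function replay_ajax_post(string $url, array $fields)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // An empty string enables curl's in-memory cookie engine for this handle.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');

    $page = curl_exec($ch); // step 1: plain GET, lets the server start a session
    if ($page === false) {
        curl_close($ch);
        return false;
    }

    $viewState = extract_view_state($page);
    if ($viewState !== null) {
        $fields['javax.faces.ViewState'] = $viewState;
    }

    // Step 2: the POST the browser would have sent after the click.
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
```

You would call replay_ajax_post() with the page URL and the remaining POST fields from the question (AJAXREQUEST, j_id376 and so on). Whether that is enough depends on what the recorded browser traffic actually shows.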

Related

Complicated: 4 consecutive curls with XML in between

I have a tricky question about how to deal with consecutive curls in PHP. I have this incredible data flow:
example.com/one.php posts data via curl to another.com/two.php
another.com/two.php posts data via curl to another.com/three.php
another.com/three.php responds with XML (or JSON) back to another.com/two.php
another.com/two.php transforms the XML into a PHP array and then into a query string that I post via curl back to the origin, example.com/one.php
It works. If you are wondering why I have this insane data flow, it's because another.com/three.php is a file obfuscated with ionCube. I can't edit it, but I have to add some checks before I can send data to it. Don't waste time trying to figure out an alternative way to do this, because there isn't one (trust me).
On example.com/one.php there's a form where users fill in data. When they press "Submit" they remain on this page while I "silently" run steps 1->2->3->4 to get their response (the $_POST of step 4), which I can then save into example.com/log.txt. Again, it works.
Now my question is: how can I display the $_POST response (the same one I get in the log.txt file) in example.com/one.php? This is what I have in step 4:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'example.com/one.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $query_string);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$xml = curl_exec($ch);
The $_POST arrives on example.com/one.php, but of course the users and this curl request are on two different levels. I tried playing with file_get_contents(), sleep() and CURLOPT_RETURNTRANSFER with no success. What's the answer to this question? I would go for sessions or ob_start(), but I'm not sure about it.

Using CURL - Post and redirect help

I've been banging my head against a wall for a few hours now - and it's probably something really obvious I've missed!
I'm trying to connect to a payment service provider (PSP) using CURL, post data and follow the post so the user actually ends up on the PSP's site.
Using the following:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://psp.com/theirpage');
curl_setopt($ch, CURLOPT_REFERER, "http://mysite.com/mypage");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$params);
curl_setopt($ch, CURLOPT_POST, 1);
$result=curl_exec($ch);
curl_close($ch);
This successfully connects, verifies the data I've passed, but instead of redirecting the user to the PSP, it just loads the HTML on my site. Safe mode is off, and open_basedir is blank.
What am I doing wrong?
cURL follows the redirect internally, and that has no effect on the user viewing your curl script. Keep in mind that the payment was made by your server, NOT the user's computer, so expecting the session to work for the user is incorrect. cURL 'is the browser'.
If you just want a redirect after the payment is made via cURL, you will have to do it via header() or with some JS such as window.location.
The curl request is being made from your server, and as such your server is receiving the response page. There's no way to initiate the request from the server and have the client receive the response. Either return the HTML to the user from your site (as you're doing), or make the request from the client's browser using JavaScript. Hope that helps.
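One common way to "make the request from the client's browser" is to send the user a small page whose form posts itself to the PSP. The sketch below is only an illustration of that idea: build_redirect_form() is a made-up helper, and it assumes the PSP accepts an ordinary form POST from the browser, which you would need to confirm with them.

```php
<?php
// Hypothetical helper: build a page that makes the *browser* POST to the PSP,
// so the user ends up on the PSP's site with their own session.
function build_redirect_form(string $action, array $params): string
{
    $inputs = '';
    foreach ($params as $name => $value) {
        // Escape names and values so they cannot break out of the attributes.
        $inputs .= sprintf(
            '<input type="hidden" name="%s" value="%s">' . "\n",
            htmlspecialchars((string) $name, ENT_QUOTES),
            htmlspecialchars((string) $value, ENT_QUOTES)
        );
    }

    return '<!DOCTYPE html><html><body onload="document.forms[0].submit()">' . "\n"
        . sprintf('<form method="post" action="%s">' . "\n", htmlspecialchars($action, ENT_QUOTES))
        . $inputs
        // Fallback for users without JavaScript.
        . '<noscript><button type="submit">Continue to payment</button></noscript>' . "\n"
        . '</form></body></html>';
}
```

In the script from the question you would then echo build_redirect_form('https://psp.com/theirpage', $params) instead of calling curl_exec(), assuming $params is an array of field names and values.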

Trying to AVOID an ASP.NET session using cURL

I'm using a web-service from a provider who is being a little too helpful in anticipating my needs. They have given me a HTML snippet to paste on my website, for users to click on to trigger their services. I'd prefer to script this process, so I've got a php script which posts a cURL request to the same url, as appropriate. However, this provider is keeping tabs on my session, and interprets each new request as an update of the first one, rather than each being a unique request.
I've contacted the provider regarding my issue, and they've gone so far as to inform me that their system is working as intended, and that it's impossible for me to avoid using the same ASP.NET session for each subsequent cURL request. While my favored option would be to switch to a different vendor, that doesn't appear to be an option right now. Is there a reliable way to get a new ASP.NET session with each cURL request?
I've tried the following set of CURLOPT's, to no avail:
//initialize curl
$ch = curl_init($url);
//build a string out of the post_vars
$post_str = http_build_query($post_vars);
//set the necessary curl options
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_str);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "UZ_".uniqid());
curl_setopt($ch, CURLOPT_REFERER, CURRENT_SITE_URL."index.php?newsession=".uniqid());
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Pragma: no-cache", "Cache-Control: no-cache"));
//execute the call to the backend script, retrieve the results
$xmlstr = curl_exec($ch);
If cURL isn't helping much, why not try other ways to call the service from your script, such as PHP's file() or file_get_contents()?
If you do not see any difference at all, then the service provider might be using your IP to track your requests. Try going through a proxy as a test.
A normal ASP.NET session is tracked by a cookie called ASP.NET_SessionId. This cookie is sent in the response to your first request. So as long as your curl requests don't send this cookie back, your requests will have no connection to each other. Use curl's -c option, which writes the cookies it receives to a file, to see what cookies are flying between you and them. If you confirm that a normal ASP.NET session is being used here, overriding this cookie with a cookie file should work.
It is quite poor for a service to use sessions (HTTP has much cleaner ways of maintaining state, which REST exploits), so I wouldn't completely rule out the vendor switch option.
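To check the cookie theory from PHP, something along these lines might help. It is a sketch under the assumption that the provider really does key the session on a cookie. Both helper names are invented for the example: post_without_session() creates a brand-new handle with no cookie file or jar on every call, and cookie_names_from_headers() reports which cookies the server tried to set.

```php
<?php
// Hypothetical helper: list the cookie names set in a raw response header block.
function cookie_names_from_headers(string $rawHeaders): array
{
    $names = [];
    foreach (preg_split('/\r\n|\n/', $rawHeaders) as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            $pair = trim(substr($line, strlen('Set-Cookie:')));
            // Everything before the first "=" is the cookie name.
            $names[] = explode('=', $pair, 2)[0];
        }
    }
    return $names;
}

// Hypothetical helper: one new handle per call and no cookie storage at all,
// so nothing from an earlier response can be sent back to the server.
function post_without_session(string $url, array $postVars)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true); // include response headers in the output
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postVars));

    $raw = curl_exec($ch);
    if ($raw === false) {
        curl_close($ch);
        return false;
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);

    return [
        'cookies' => cookie_names_from_headers(substr($raw, 0, $headerSize)),
        'body'    => substr($raw, $headerSize),
    ];
}
```

If every call reports a fresh ASP.NET_SessionId and the provider still links the requests together, the tracking is probably based on something other than that cookie, such as the IP address or a field in the POST data.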
Well, given the options you are using, it seems you have covered your bases. Can you find out how their sessions are set up?
If you know how they set up a session, i.e. what they key it on (IP or whatnot), then you can figure out a workaround. Another option is trying to set the cookies in a different cookie file:
CURLOPT_COOKIEFILE - The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
But if all they do is check cookies, your current code should work. If you can figure out what the cookie's name is, you can pass a custom blank cookie with the request to see if that works. But if you can get information out of them on how their sessions work, that would be best.
Use these two lines to handle the session:
curl_setopt($ch, CURLOPT_COOKIEJAR, "path/to/cookies.txt"); // cookies.txt should be writable
curl_setopt($ch, CURLOPT_COOKIEFILE, "path/to/cookies.txt");

Is this the best way to make an API request using PHP CURL?

I have a site that has a simple API which can be used via http. I wish to make use of the API and submit data about 1000-1500 times at one time. Here is their API: http://api.jum.name/
I have constructed the URL to make a submission but now I am wondering what is the best way to make these 1000-1500 API GET requests? Here is the PHP CURL implementation I was thinking of:
$add = 'http://www.mysite.com/3rdparty/API/api.php?fn=post&username=test&password=tester&url=http://google.com&category=21&title=story a&content=content text&tags=Season,news';
$ch = curl_init(); // create the handle before setting options on it
curl_setopt($ch, CURLOPT_URL, $add);
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'files/cookie.txt');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$postdata = curl_exec($ch);
Should I close the CURL connection every time I make a submission? Can I rewrite the above in a better way that will make these 1000-1500 submissions quicker?
Thanks all
If you have access to PHP 5.2+, I would highly recommend PHP's curl_multi functions.
This allows you to process several curl requests in parallel, which in this case would definitely come in handy.
Related documentation : http://us3.php.net/manual/en/ref.curl.php
An example usage : http://www.somacon.com/p537.php
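For reference, a parallel fetch with curl_multi could look roughly like this. It is a minimal sketch rather than a tuned implementation: fetch_all() is a name made up here, error handling is left out, and the 30-second timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper: fetch a batch of URLs in parallel and return the
// response bodies, keyed the same way as the input array.
function fetch_all(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            // Wait for activity on any handle instead of spinning.
            curl_multi_select($multi, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

With 1000-1500 URLs it is probably kinder to the API to split the list with array_chunk() and pass, say, 20 URLs to fetch_all() at a time; that batch size is a guess, not a recommendation from the API's documentation.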
PHP's curl, by default, reuses a connection for multiple calls to curl_exec().
So in this case, just reuse the curl handle you got from curl_init(); as long as the host stays the same between calls to curl_exec(), the connection is kept alive and reused.
Do not close the handle and do not set CURLOPT_FORBID_REUSE.
Also see here:
Persistent/keepalive HTTP with the PHP Curl library?
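Reusing one handle for all submissions might look like the sketch below. The helper names build_submit_url() and submit_all() are invented for the example. The parameters are passed through http_build_query() so that values with spaces or commas, like the title and tags in the question, are URL-encoded.

```php
<?php
// Hypothetical helper: build one submission URL with every value URL-encoded.
function build_submit_url(string $base, array $params): string
{
    return $base . '?' . http_build_query($params);
}

// Hypothetical helper: send every submission over a single reused handle so
// curl can keep the connection to the API host open between requests.
function submit_all(string $base, array $submissions): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $responses = [];
    foreach ($submissions as $key => $params) {
        // Only the URL changes between requests; the handle stays the same.
        curl_setopt($ch, CURLOPT_URL, build_submit_url($base, $params));
        $responses[$key] = curl_exec($ch);
    }

    curl_close($ch); // close once, after the last submission
    return $responses;
}
```

Each element of $submissions would be one array of API parameters (fn, username, title and so on), and a false entry in the result marks a request that failed.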

PHP / Curl: HEAD Request takes a long time on some sites

I have simple code that does a head request for a URL and then prints the response headers. I've noticed that on some sites, this can take a long time to complete.
For example, requesting http://www.arstechnica.com takes about two minutes. I've tried the same request using another web site that does the same basic task, and it comes back immediately. So there must be something I have set incorrectly that's causing this delay.
Here's the code I have:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
$content = curl_exec ($ch);
curl_close ($ch);
Here's a link to the web site that does the same function: http://www.seoconsultants.com/tools/headers.asp
The code above, at least on my server, takes two minutes to retrieve www.arstechnica.com, but the service at the link above returns it right away.
What am I missing?
Try simplifying it a little bit:
print htmlentities(file_get_contents("http://www.arstechnica.com"));
The above outputs instantly on my webserver. If it doesn't on yours, there's a good chance your web host has some kind of setting in place to throttle this kind of request.
EDIT:
Since the above happens instantly for you, try setting this curl setting on your original code:
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
Using the tool you posted, I noticed that http://www.arstechnica.com has a 301 header sent for any request sent to it. It is possible that cURL is getting this and not following the new Location specified to it, thus causing your script to hang.
SECOND EDIT:
Curiously enough, trying the same code you have above was making my webserver hang too. I replaced this code:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
With this:
curl_setopt($ch, CURLOPT_NOBODY, true);
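This is the way the manual recommends doing a HEAD request, and it made it work instantly. CURLOPT_CUSTOMREQUEST only changes the method string that is sent; cURL still waits for a response body that a HEAD response never contains, which would explain the hang.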
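Putting the pieces of this answer together, a HEAD request helper could look something like the following sketch. head_request() and parse_header_block() are names made up for the example, and the timeouts are arbitrary.

```php
<?php
// Hypothetical helper: turn a raw header block into [lowercased name => value].
function parse_header_block(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n/', trim($raw)) as $line) {
        if (strpos($line, ':') === false) {
            continue; // status line or blank separator line
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

// Hypothetical helper: issue a real HEAD request and return the headers.
function head_request(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);  // headers appear in the output
    curl_setopt($ch, CURLOPT_NOBODY, true);  // makes curl send HEAD and expect no body
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);   // overall cap so a slow site cannot hang the script

    $raw = curl_exec($ch);
    curl_close($ch);

    return $raw === false ? false : parse_header_block($raw);
}
```

Redirects are not followed here, so a 301 comes back with its location header intact; add CURLOPT_FOLLOWLOCATION if you want the headers of the final page instead.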
You have to remember that HEAD is only a suggestion to the web server. For HEAD to do the right thing, it often takes some explicit effort on the part of the admins. If you HEAD a static file, Apache (or whatever your webserver is) will often step in and do the right thing. If you HEAD a dynamic page, the default for most setups is to execute the GET path, collect all the results, and just send back the headers without the content. If that application is in a 3 (or more) tier setup, that call could potentially be very expensive and needless in a HEAD context. For instance, in a Java servlet, by default doHead() just calls doGet(). To do something a little smarter, the application developer would have to explicitly implement doHead() (and more often than not, they will not).
I encountered an app from a Fortune 100 company that is used for downloading several hundred megabytes of pricing information. We'd check for updates to that data by executing HEAD requests fairly regularly until the modified date changed. It turns out that each request would actually make back-end calls to generate this list, which involved gigabytes of data on their back end and transferring it between several internal servers. They weren't terribly happy with us, but once we explained the use case they quickly came up with an alternative solution. If they had implemented HEAD, rather than relying on their web server to fake it, it would not have been an issue.
If my memory doesn't fail me, doing a HEAD request in CURL changes the HTTP protocol version to 1.0 (which is slow and probably the guilty part here). Try changing that to:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1); // ADD THIS
$content = curl_exec ($ch);
curl_close ($ch);
I used the below function to find out the redirected URL.
$head = get_headers($url, 1);
The second argument makes it return an array with keys. For example, the line below gives the Location value.
$head["Location"]
http://php.net/manual/en/function.get-headers.php
This:
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
I wasn't trying to get headers.
I was just trying to make a page load of some data not take 2 minutes, similar to what is described above.
That magical little option has dropped it down to 2 seconds.
