Create new etherpad using PHP and CURL - php

I'm trying to write a simple PHP script which automatically sets up new etherpads (see http://etherpad.com/).
They don't have an API (yet) for creating new pads so I'm trying to figure if I can do things another way.
After playing around a bit, I found that if you request a not-yet-created pad by appending a random string to the etherpad.com URL, the site comes back with a form asking whether you want to create a new etherpad at that address. If you submit that form, a new pad is created at that URL.
My thought then was I could just create a PHP script using CURL that would duplicate that form and trick etherpad into creating a new pad at whatever URL I give it. I wrote the script but so far I can't get it working. Can someone tell me what I'm doing wrong?
First, here's the HTML form on the etherpad creation page:
`
<p><tt id="padurl">http://etherpad.com/lsdjfsljfa-fdj-lsdf</tt></p>
<br/>
<p>There is no EtherPad document here. Would you like to create one?</p>
<input type="hidden" value="lsdjfsljfa-fdj-lsdf" name="padId"/>
<input type="submit" value="Create Pad" id="createPad"/>
`
Then here's my code which tries to submit the form using CURL
$ch = curl_init();
//set POST variables
$url = "http://etherpad.com/ep/pad/create?padId=ldjfal-djfa-ldkfjal";
$fields = array(
'padId'=>urlencode("ldjfal-djfa-ldkfjal"),
);
$useragent="Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)";
// set user agent
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
//url-ify the data for the POST
foreach($fields as $key=>$value) { $fields_string .= $key.'='.$value; }
print_r($fields_string);
//open connection
$ch = curl_init();
//set the url, number of POST vars, POST data
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST,count($fields));
curl_setopt($ch,CURLOPT_POSTFIELDS,$fields_string);
//execute post
$result = curl_exec($ch);
print_r($result);
//close connection
curl_close($ch);
When I run the script, PHP reports back that everything executed correctly but etherpad doesn't create my pad. Any clues what's going on?

I have not investigated this specific site but I guess there are some important headers which are missing. Here is a very general approach that is applicable for nearly any website:
Use a network sniffer such as Wireshark to capture all connections. Then compare the POST fields the browser sends with yours.
An even easier way is to use Netcat. Just save the page to disk, change the form-URL to http://localhost:3333/ and run
$ nc -l -p 3333
Now open the local HTML file and fill in the fields appropriately. Immediately you will see all headers that would have been transmitted to the host.
(There are also extensions for Mozilla Firefox but in general they just slow down the browser without providing much benefit.)
Also read what I have posted on "To auto fill a text area using php curl", as it might help with your implementation in PHP.
By the way, you are sending the parameter "padId" via GET and POST. That is not necessary. Check what the Etherpad-form actually uses and stick with it.
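The loop in the question also concatenates the pairs without an `&` separator and appends to a variable that was never initialised. A minimal sketch of building the body with PHP's built-in http_build_query() instead; the helper name buildPostBody is only illustrative and not part of the original script:

```php
<?php
// Hypothetical helper: turn an array of form fields into an
// application/x-www-form-urlencoded request body.
function buildPostBody(array $fields): string
{
    // http_build_query() URL-encodes keys and values and joins the pairs;
    // the separator is passed explicitly so it does not depend on php.ini.
    return http_build_query($fields, '', '&');
}

$body = buildPostBody(['padId' => 'ldjfal-djfa-ldkfjal']);
echo $body, "\n"; // padId=ldjfal-djfa-ldkfjal
```

The resulting string can be passed to CURLOPT_POSTFIELDS as-is, so the values should not be run through urlencode() a second time.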

My guess is that you're missing the cookies and/or the referrer. It may be checking the referrer to ensure people aren't creating pads without confirmation.
Wireshark will help, but try adding those to your cURL request first and see if it works.

Here's the answer a friend helped me come up with:
They're apparently doing some cookie validation, that's why your script isn't working. You can find this out by loading the new pad creation prompt page, clearing your cookies, and then reloading the page. It won't work. Tricky, but effective for most casual bots.
Here's a script that gets around the limitation. Just insert your desired $padId and away you go.
<?php
$padId = 'asdfjklsjfgadslkjflskj';
$ch = curl_init();
# for debugging
curl_setopt($ch, CURLOPT_HEADER, true);
# parse cookies and follow all redirects
curl_setopt($ch, CURLOPT_COOKIEFILE, '/dev/null');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
# first, send a GET request to pick up a cookie
curl_setopt($ch, CURLOPT_URL, 'http://etherpad.com/' . urlencode($padId));
$result = curl_exec($ch);
echo $result;
# next, post to actually create the etherpad
curl_setopt($ch, CURLOPT_URL, 'http://etherpad.com/ep/pad/create');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'padId=' . urlencode($padId));
$result = curl_exec($ch);
echo $result;
curl_close($ch);

To create a file directly from HTML or TEXT
Use the setText or setHTML API endpoint. http://etherpad.org/doc/v1.5.0/#index_sethtml_padid_html
To easily do this use the Etherpad PHP Client https://github.com/TomNomNom/etherpad-lite-client
To post from a file
This feature is provided by an Etherpad plugin. To enable it...
Install the Etherpad ep_post_data plugin by typing npm install ep_post_data on your Etherpad instance.
At your client machine CLI type: curl -X POST -d @yourfile.here http://youretherpad/post
Replace yourfile.here with your file.
Replace the URL with the Etherpad instance you want to post to.
Source: http://blog.etherpad.org/2014/12/17/post-to-etherpad-with-this-simple-plugin/
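For the setText / setHTML route, the request can also be assembled by hand. The sketch below only builds the URL and POST body; the instance URL, the API key value, the pad name and API version "1" are assumptions for illustration, so check your own instance's API documentation for the version and authentication it supports.

```php
<?php
// Sketch only: builds the URL and POST body for an Etherpad HTTP API call.
// Assumed (not from the post): base URL, API key value, API version "1".
function etherpadApiRequest(string $baseUrl, string $function, array $params,
                            string $apiKey, string $apiVersion = '1'): array
{
    $url  = rtrim($baseUrl, '/') . '/api/' . $apiVersion . '/' . rawurlencode($function);
    // The API key is sent as the "apikey" parameter alongside the call's own parameters.
    $body = http_build_query(['apikey' => $apiKey] + $params, '', '&');
    return ['url' => $url, 'body' => $body];
}

// Sends a prepared request; returns the raw response string, or false on failure.
function sendEtherpadRequest(array $request)
{
    $ch = curl_init($request['url']);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $request['body']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch); // the handle is freed when it goes out of scope
}

$request = etherpadApiRequest(
    'http://youretherpad',          // hypothetical instance
    'setText',
    ['padID' => 'demo-pad', 'text' => 'Hello pad'],
    'YOUR_API_KEY'                  // placeholder, not a real key
);
echo $request['url'], "\n";
echo $request['body'], "\n";
```

Calling sendEtherpadRequest($request) would perform the actual POST; it is left out of the example so the block runs without a live Etherpad instance.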

Related

Crawl website with asynchronous content behind a login with PHP

I have a website in my local network. It is hidden behind a login. I want my PHP code to log in to this website and copy its content. The content isn't rendered right away; it is loaded only after 1-3 seconds.
I already figured out how to log in and copy the page via cURL. But that shows only what is rendered right away, while the content I'm aiming for is added after those 1-3 seconds.
<?php
$url = "http://#192.168.1.101/cgi-bin/minerStatus.cgi";
$username = 'User';
$password = 'Password';
$ch = curl_init($url);
curl_setopt($ch,CURLOPT_HTTPHEADER,array('User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0'));
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
if(curl_errno($ch)){
//If an error occurred, throw an Exception.
throw new Exception(curl_error($ch));
}
echo $response;
?>
The output is a set of empty tables. I'm expecting them to be filled with the data that shows up a bit later on the website.
The problem is that curl simply makes an HTTP-request and returns the response body to you. The table on the target page is probably populated asynchronously using JavaScript. You have two options here:
Find out what resources are requested and use curl to get them directly. To do this, open the page in your browser and check the developer tools for outgoing AJAX requests. Once you have figured out which file is actually loaded there, simply request that instead of your $url.
Use an emulated / headless browser to execute JavaScript. If for any reason the first option does not work for you, you could use a headless browser to simulate a real user navigating the site. This allows for full JavaScript capabilities. For PHP there is the great Symfony/Panther library, which uses Facebook's WebDriver under the hood and works really well. It will be more work than the first solution, so try that first.

PHP Redirect a file from another server to end user

I want to let the user pass a file name in the URL to choose which file to download from a remote server, e.g. /download.php?url=fvr_anim_foxintro_V4_01.jpg
<?php
$url = $_GET['url'];
header("Location: http://fvr.homestead.com/files/animation/" . $url);
?>
The above is purely an example I grabbed from google images. The problem is I do not want the end user to be allowed to see where the file is originally coming from so it would need to get the file download to the server and the server passes it along to the end user. Is there a method of doing this?
I find many examples for files hosted on the server but no examples for serving files hosted on a remote server. In other words I would be passing them along. The files would be quite large (up to 100MB)
Thanks in advance!
You can use cURL for this:
<?php
$url = "http://share.meebo.com/content/katy_perry/wallpapers/3.jpg";
$ch = curl_init();
$timeout = 0;
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
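// Return the (binary) response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3; kept for very old versions
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);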
$image = curl_exec($ch);
curl_close($ch);
// output to browser
header("Content-type: image/jpeg");
echo $image;
?>
Source: http://forums.phpfreaks.com/topic/120308-solved-curl-get-image/
Of course, this example is just for an image (as you've suggested) but you can use cURL for all kinds of remote data retrieval via HTTP GET, POST, PUT, DELETE, etc. Search around the web for "php curl" and you'll find an endless supply of information.
The ideal solution would be to use PHP's cURL Library, but if you're using shared hosting keep in mind this library may be disabled.
Assuming you can use cURL, you simply send the Content-Type header with the appropriate MIME type using header() and then echo the result of curl_exec().
To get a basic idea of how to use the cURL library, look at the example under the curl_init() function.
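Since the files can be up to 100 MB and the file name comes straight from the query string, a variant that restricts the input to a bare file name and streams the response in chunks may be a better fit than buffering everything in a variable. This is a sketch under assumptions: the base URL is a placeholder, and the names remoteUrlFor and streamRemoteFile are made up for the example.

```php
<?php
// Hypothetical base URL; the real origin stays hidden from the end user.
const REMOTE_BASE = 'http://files.example.com/animation/';

// Reduce user input to a bare file name so it cannot point elsewhere
// (no "../", no other directories), then build the remote URL.
function remoteUrlFor(string $requested): string
{
    $name = basename(str_replace('\\', '/', $requested));
    if ($name === '' || $name === '.' || $name === '..') {
        throw new InvalidArgumentException('Invalid file name');
    }
    return REMOTE_BASE . rawurlencode($name);
}

// Stream the remote file to the client chunk by chunk instead of
// holding the whole download in memory.
function streamRemoteFile(string $url, string $downloadName): bool
{
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="'
        . str_replace('"', '', $downloadName) . '"');

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) {
        echo $chunk;
        flush();
        return strlen($chunk); // tell cURL the whole chunk was consumed
    });
    return curl_exec($ch) !== false;
}

// In download.php this would be used roughly as:
//   streamRemoteFile(remoteUrlFor($_GET['url']), basename($_GET['url']));
echo remoteUrlFor('fvr_anim_foxintro_V4_01.jpg'), "\n";
```

A generic application/octet-stream type is used here; if the file types are known, sending the specific MIME type is preferable.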

CURL php code returns http_code=0 and empty reply from server message

I have the same code running on multiple sites/servers. 2 days ago the code started returning http_code = 0 and the error message "empty reply from server" on one of the servers.
Can anyone shed any light as to why a particular server would be working one day, then not working the next? I have submitted a ticket to the ISP explaining the issue but they cannot seem to find what is wrong (yet).
I guess the question really is, what would/could change on a server to stop this from working?
What is interesting, though, is that the URL I am referencing doesn't get touched on the server returning the error. If I change the URL to point to something that doesn't exist, the same error is returned. So it appears that all CURL POST requests are being rejected by the server. I currently have other CURL scripts hitting these problem sites that are still working, but they do not have POST options in them.
The issue is definitely related to CURL POST requests on this server, and they are being rejected pretty much immediately.
On the server in question I have 15+ separate accounts and every one of them returns the same result, so I don't think it's anything I have changed: I know I haven't made any wholesale changes to ALL the sites at the time this issue arose. Of the 6 other sites I have hosted elsewhere, everything is still working fine with exactly the same code.
I have tried various combinations/changes to options from posts I have read, but nothing has really made a difference: the working sites still work and the non-working sites still don't.
function sendWSRequest($url, $xml) {
// $headers[] = 'Content-Type: application/xml; charset=utf-8';
$headers[] = 'Content-Type: text/xml; charset=utf-8';
$headers[] = 'Content-Length: ' . strlen($xml);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, true);
// curl_setopt($ch, CURLINFO_HEADER_OUT, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
// curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$result = curl_exec($ch);
if($result===false) {
print 'error with curl - '.curl_error($ch).'<br />';
}
$info = curl_getinfo($ch);
curl_close($ch);
return $result;
}
Any help would be greatly appreciated.
EDIT
To summarise based on further investigations, when the script errors, nothing registers in the server access logs. So it appears that CURL requests containing POST options are being rejected before access is granted/logged...
Cheers
Greg J
I know this is an old thread, but I found a solution that may save someone else a headache:
I just began encountering this exact problem with a web site hosted at GoDaddy which was working until recently. To investigate the problem I created an HTML page with a form containing the same fields being submitted in the POST data via cURL.
The browser-submitted HTML form worked while the cURL POST resulted in the Empty reply from server error. So I examined the difference between the headers submitted by the browser and those submitted by cURL using the PHP apache_request_headers() function on my development system where both the cURL and browser submissions worked.
As soon as I added the "User-Agent" header submitted by my browser to the cURL POST, the problem site worked as expected instead of returning an empty reply:
CURLOPT_HTTPHEADER =>
array("User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0")
I did not experiment with other/simpler User-Agent headers since this quick fix solved my problem.
According to the PHP manual, the POST data should be URL-encoded:
CURLOPT_POSTFIELDS The full data to post in a HTTP "POST" operation.
[...] This parameter can either be
passed as a urlencoded string like 'para1=val1&para2=val2&...' or as
an array with the field name as key and field data as value. If value
is an array, the Content-Type header will be set to
multipart/form-data. As of PHP 5.2.0, value must be an array if files
are passed to this option with the @ prefix. As of PHP 5.5.0, the @
prefix is deprecated and files can be sent using CURLFile.
So you might try with
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'xml=' . urlencode($xml));
and see what happens. Or, in any case, start with an empty or very simple field to see whether it at least arrives at the destination server.
Update
I've checked this setup on a test machine and it works. The problem is then likely not to be PHP or cURL side at all, at this point. Can you request a list of software/hardware updates on that machine and network in the last days?
Otherwise, I'd try to capture outgoing traffic so as to determine whether the request leaves the server (and the problem is in between, e.g. a misconfigured firewall: hence my inclusion of "hardware" in the change list), or doesn't leave the server at all. In this latter case the culprits could be:
updates to cURL library
updates to PHP cURL module and/or PHP binaries
updates to "software" firewall rules
updates to ancillary network libraries (unlikely; they should be HTTP agnostic and not differentiate a POST from, say, a GET or HEAD)
OK, as it turns out, a rather reluctant host recompiled Apache2 and PHP which has resolved the issue.
The host claims (their opening statement to my support ticket) that no updates to either Apache2 or PHP had been performed around the time the issue occurred.
The behavior was such that the server wasn't even acknowledging a CURL request that contained the POST options. The target URL was never reached.
Thank you so much to all who provided their advice. Particularly Isemi who has gone to great lengths to find a resolution.

Scrape data from AJAXREQUEST

I would like to grab data from a website that uses an ajax request to load new data from the server into a DIV.
When I click on the button of the website, that will load new data into the website, I can see that the browser does only 1 POST request with the following post string:
AJAXREQUEST=_viewRoot&j_id376=j_id376&javax.faces.ViewState=j_id3&j_id376%3Aj_id382=j_id376%3Aj_id382&valueChanged=false&AJAX%3AEVENTS_COUNT=1&
When I do the above post request using php curl I don't get any useful data.
Does someone know how to grab data for this kind of request?
UPDATE1:
This is what I use in php:
$ch = curl_init ('http://www.website.com');
$post_string = 'AJAXREQUEST=_viewRoot&j_id376=j_id376&javax.faces.ViewState=j_id3&j_id376%3Aj_id382=j_id376%3Aj_id382&valueChanged=false&AJAX%3AEVENTS_COUNT=1&';
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
$output = curl_exec ($ch);
I don't get any results, also no errors or messages.
Your problem probably isn't with your PHP code; it's more likely with what you are actually sending to the server. I'm assuming you listed website.com as a placeholder for whatever service you are trying to interact with, but since you haven't said where you're sending the request or what you're getting back, my guess is that what you're posting is simply being ignored because it is invalid, incomplete, or requires further POST/GET requests. Another possibility is that you're attempting to POST to a service that requires an authenticated session (the POST variables you listed could include some sort of token to identify the session) which you have not established.
I would recommend that you first test your code on a simpler "controlled test case". Setup a basic web form that returns true or something when you POST a value to it. Test your code with the simpler case first to make sure your POST code works.
Then using a debugging tool such as LiveHTTPHeaders or Firebug record the entire POST/GET request interaction with the server. It might be a good idea to first try to "replay" this interaction with a debugging tool to prove that your methodology works. Then once you know exactly what you need to do from a high level, repeat this process in your PHP code.
There is not much other advice anyone can give you with the information you have given us.
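As a small aid for that comparison, the captured POST string can be decoded into its individual fields so it is easier to see which values might be tied to a session (the javax.faces.ViewState entry is a likely candidate, though that is an assumption about this particular site). The helper name decodeFormBody is made up for this sketch:

```php
<?php
// Split an application/x-www-form-urlencoded body into decoded pairs.
// parse_str() is avoided here because it rewrites dots in field names
// (javax.faces.ViewState would become javax_faces_ViewState).
function decodeFormBody(string $body): array
{
    $fields = [];
    foreach (explode('&', $body) as $pair) {
        if ($pair === '') {
            continue; // skip the empty piece after a trailing "&"
        }
        $parts = explode('=', $pair, 2);
        $fields[urldecode($parts[0])] = urldecode($parts[1] ?? '');
    }
    return $fields;
}

$captured = 'AJAXREQUEST=_viewRoot&j_id376=j_id376&javax.faces.ViewState=j_id3'
          . '&j_id376%3Aj_id382=j_id376%3Aj_id382&valueChanged=false&AJAX%3AEVENTS_COUNT=1&';
print_r(decodeFormBody($captured));
```

Running the same function on the body your own script sends and comparing the two arrays shows quickly whether a field is missing or stale.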

PHP curl - posting asp.net viewstate value

I have the following code to login into an external site application (asp.net app) from a local site login form (written in php):
<?php
$curl_connection = curl_init('www.external.com/login.aspx');
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT,
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
// Post data array
$post_data['LoginControl$UserName'] = 'ExampleUName';
$post_data['LoginControl$Password'] = 'ExamplePWord';
// Add form fields into an array to get ready to post
foreach ($post_data as $key => $value)
{
$post_items[] = $key . '=' . $value;
}
$post_string = implode ('&', $post_items);
// Tell cURL which string to post
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);
// Execute and post
$result = curl_exec($curl_connection);
?>
I get directed to the login form of the external site instead of being directed to the application logged in. I think the problem is that I need to pass the viewstate values through, but i'm not sure how to go about doing that?
I don't have control over the external application. But we want users to be able to login to the application through our website, to maintain branding etc.
I've posted a couple of other threads recently about the use of php cURL, but I'm at the stage now where I think the viewstate is the problem ...
Thanks, Mark.
This seems to be a real problem when trying to scrape the asp.net pages.
The pages contain a hidden field named "__VIEWSTATE" which contains a base64 encoded set of values containing some or all of the page state when the page was sent. It usually also contains the SHA1 of the viewstate.
What this means is that your post must contain everything in the __VIEWSTATE or it will fail.
I have been able to post a simple login page that has only 2 fields but not a more complex page in which the author has chosen to put the entire page state in the viewstate.
As yet I have not been able to come up with a solution.
Change:
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
To:
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
You also need to set up a cookie file, take a look at CURLOPT_COOKIEFILE
CURLOPT_COOKIEFILE:
The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
CURLOPT_COOKIE:
The contents of the "Cookie: " header to be used in the HTTP request. Note that multiple cookies are separated with a semicolon followed by a space (e.g., "fruit=apple; colour=red")
CURLOPT_COOKIEJAR:
The name of a file to save all internal cookies to when the connection closes.
@see http://www.php.net/manual/en/function.curl-setopt.php
curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookiefile.txt');
curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookiefile.txt');
Don't expect it to work without encoding the __VIEWSTATE string in php using
rawurlencode($viewstate);
I've encountered the same problem recently, so I'll leave my approach here in case someone else stumbles on this thread looking for an answer too.
I solved this by preceding every POST request with a GET request to the same URL, and scraping all the input fields from the response to that GET into an array of key-value pairs. Then I replaced some values in that array (login field values, for example), and sent the whole thing back in the subsequent POST. This way my POST request contained all the valid __VIEWSTATE, __EVENTVALIDATION and yada-yada data generated for that particular URL too.
This way the site allowed me to log in and visit subdomains normally.
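The scraping step of that approach could look roughly like the sketch below. It assumes PHP's DOM extension is available; the function name extractInputFields and the sample HTML are made up for illustration, and the field names are taken from the question.

```php
<?php
// Collect name => value for every <input> in a page, hidden ones included
// (e.g. __VIEWSTATE and __EVENTVALIDATION on ASP.NET pages).
function extractInputFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Stand-in for the body returned by the preceding GET request.
$html = '<form><input type="hidden" name="__VIEWSTATE" value="/wEPDw+abc=="/>'
      . '<input type="text" name="LoginControl$UserName" value=""/>'
      . '<input type="password" name="LoginControl$Password"/></form>';

$fields = extractInputFields($html);
$fields['LoginControl$UserName'] = 'ExampleUName';
$fields['LoginControl$Password'] = 'ExamplePWord';

// PHP_QUERY_RFC3986 encodes like rawurlencode(), so "+", "/" and "="
// inside the base64 viewstate are percent-encoded.
echo http_build_query($fields, '', '&', PHP_QUERY_RFC3986), "\n";
```

The resulting string would go into CURLOPT_POSTFIELDS for the follow-up POST, sent on the same cURL handle (or with the same cookie file) as the GET. Forms that rely on select or textarea elements would need those collected as well.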
