POSTing a file to PHP from Python - php

I have a GAE PHP script that accepts a POSTed message consisting of $_POST['version_name'], $_POST['version_comments'] and $_FILES['userfile']['tmp_name'][0].
It runs a file_get_contents against $_FILES['userfile']['tmp_name'][0] and stores the binary away in a CloudSQL DB.
This is the end point for a PHP-driven form, so users can upload new versions (with names / comments) through a friendly GUI from their browser. It works fine.
Now I want to be able to use the same handler as the end point for a Python script. I've written this:
import requests
r = requests.post('http://handler_url_here/',
                  data={'version_name': "foo", 'version_comments': "bar"},
                  files={'userfile': open('version_archive.tar.gz', 'rb')})
version_archive.tar.gz is a non-empty file, but file_get_contents($_FILES['userfile']['tmp_name'][0]) is returning null. Uploading files is a bit tricky with GAE, so I'd prefer to not change the listener - is there some way I can make Python send its payload in the same format the listener is expecting?
$_POST['version_name'] and $_POST['version_comments'] are working as expected.

I'd start by looking at the middle-man, which in this case is the HTTP request. Keep in mind, your Python script isn't posting directly to PHP; it's making an HTTP POST request, which is then getting interpreted by PHP into the $_POST variables and whatnot.
Figure out a way to "capture" or "dump" the HTTP request that Python is sending so you can inspect its contents. (You can find a number of free tools that help you do this in various ways. Reading the HTTP request should be pretty self-explanatory if you're familiar with working with $_GET and $_POST variables in PHP.) Then send a supposedly identical request from PHP, capture the HTTP request, and determine how and why they're different.
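One difference worth looking for when you compare the two dumps is the name of the file field. As far as I know, PHP only makes $_FILES['userfile']['tmp_name'] an array (so that index [0] exists) when the form field is named userfile[]; with a plain userfile field, tmp_name is a single string. That the browser form uses userfile[] is an assumption here, since the form's HTML isn't shown. Below is a minimal sketch that builds a multipart/form-data body by hand with the standard library so you can see what ends up on the wire for each field name. The encode_multipart helper and the two placeholder bytes standing in for the archive are made up for illustration.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body by hand so it can be inspected.

    fields: dict of form field name -> str value
    files:  dict of form field name -> (filename, bytes content)
    Returns (content_type_header, body_bytes).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(value.encode())
    for name, (filename, content) in files.items():
        lines.append(f"--{boundary}".encode())
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode()
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)


# Placeholder bytes stand in for the real archive contents.
fields = {"version_name": "foo", "version_comments": "bar"}
archive = ("version_archive.tar.gz", b"\x1f\x8b")
_, plain_body = encode_multipart(fields, {"userfile": archive})
_, array_body = encode_multipart(fields, {"userfile[]": archive})

# Only the second body carries the array-style field name.
print(plain_body.decode("latin-1"))
print(array_body.decode("latin-1"))
```

If the dump of the browser's request does show name="userfile[]", then passing files={'userfile[]': open('version_archive.tar.gz', 'rb')} to requests.post is probably all that needs to change on the Python side. With requests you can also inspect the body without sending anything, via requests.Request('POST', url, data=..., files=...).prepare().body.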
Good luck!

Related

Parsing raw POST XML into a text field

I need to store XML data sent over HTTP POST to my server. In the log files I see that the data is successfully sent to my server. But I have no idea how to get the data.
I tried to catch them with the php://input stream like in the code below. The problem I see is that php://input is just read when the file containing the code is called.
$xml = file_get_contents("php://input");
$var_str = var_export($xml, true);
file_put_contents('api-test/test.txt', $var_str);
Is there any way to set some kind of listener/watcher to the php://input stream? Maybe PHP is the wrong technology to realize this. Is there some other way like AJAX?
"The problem I see is that php://input is just read when the file containing the code is called."
Yes.
That's how PHP (in a server-side programming context) works:
1. The client makes an HTTP request to a URL.
2. The server receives the HTTP request and determines that the URL is handled by a particular PHP program (typically by matching the path component of the URL to a directory and file name, unless the Front Controller Pattern is being used).
3. The PHP program is executed and has access to data from the request.
4. The server sends the output of the PHP program back.
"Is there any way to set some kind of listener/watcher to the php://input stream?"
You get a new stream every time a request is made. So the typical way to watch it is to put a PHP script at the URL that the request is being made to.
Then make sure each request is made to the same URL.
(If you need to support requests being made to different URLs, then look into the Front Controller Pattern).
"Maybe PHP is the wrong technology to realize this."
It's a perfectly acceptable technology for handling HTTP requests.
"Is there some other way like AJAX?"
Ajax is a buzzword meaning "Make an HTTP request with JavaScript". Since you are receiving the requests and not making them, Ajax isn't helpful.
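To make the "new stream per request" point concrete, here is the same lifecycle sketched in Python instead of PHP, purely as an illustration. The handler runs once for every incoming POST and reads that request's own body, which is the role php://input plays in the question. The names read_body and XmlPostHandler are made up for this sketch, and the sample XML is invented.

```python
import io
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler


def read_body(headers, stream):
    """Read exactly Content-Length bytes from this request's input stream."""
    length = int(headers.get("Content-Length", 0))
    return stream.read(length)


class XmlPostHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: invoked once per incoming POST, like a PHP script."""

    def do_POST(self):
        raw = read_body(self.headers, self.rfile)
        try:
            root_tag = ET.fromstring(raw).tag
            status, reply = 200, f"received <{root_tag}>"
        except ET.ParseError:
            status, reply = 400, "body is not well-formed XML"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(reply.encode())


# The helper can be exercised without any network connection.
sample = b"<order><id>1</id></order>"
body = read_body({"Content-Length": str(len(sample))}, io.BytesIO(sample))
print(ET.fromstring(body).tag)
```

To actually serve requests you would hand the class to http.server.HTTPServer and call serve_forever(). There is no separate "watcher" to set up: the server invokes the handler for every request, just as the web server invokes the PHP script.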

Scraping dictionary using curl

For academic reasons I need to scrape a North Korean dictionary (having already informed myself about the copyright-related issues), which should actually be quite simple: the website is generated by a PHP script that just uses ascending numbers in the URL for each dictionary entry. The first entry is:
uriminzokkiri.com/uri_foreign/dic/index.php?page=1
and the last entry is located at:
uriminzokkiri.com/uri_foreign/dic/index.php?page=313372
So I'd assume the easiest way to do this is to write a simple shell script that increments the entry number in a loop and checks whether each page was downloaded successfully. Since the connection is not good, it should retry each download until it succeeds (also trivial).
But when I tried to download a single entry page to test this, it failed. The site uses session cookies, so I first saved the corresponding cookie to a file using the "-c" parameter and then invoked curl with the "-v" (verbose) and "-b" (read cookies from file) parameters, which produced the following output:
curl output
These are the request and response headers as being shown by Firebug:
Request/Response headers
I also tried to pass all these request headers using the "-H" parameter, but this didn't work either.
Someone started coding a Python-based scraper for this dictionary, but if this can be done with a simple bash script, then that looks like overkill to me.
Does anyone know why the approach I tried so far doesn't work and how this could be achieved?
Many thanks in advance and kind regards
You could add some more HTTP headers, such as:
Origin: the domain of the original site you are scraping.
User-Agent: a string identifying your client; you can look up a typical browser value online.
Otherwise, you can copy the request as a curl command from your browser's developer tools and then convert it to PHP code. Tools that automate this exist online.
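For what it's worth, the loop described in the question (ascending page numbers, session cookies, extra headers, retry until success) can be sketched as follows. This is Python rather than bash, and only a sketch under assumptions: the http:// scheme is a guess because the URLs in the question have no scheme, the User-Agent value is whatever you pass in, and make_opener, fetch_with_retry and scrape are made-up names. It has not been run against the site, so it does not explain why the curl attempt failed.

```python
import time
import urllib.request
from http.cookiejar import CookieJar

# Assumption: plain http; the question gives the URL without a scheme.
BASE_URL = "http://uriminzokkiri.com/uri_foreign/dic/index.php?page={}"


def make_opener(user_agent):
    """Opener that keeps session cookies and sends a browser-like User-Agent."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_with_retry(fetch, url, attempts=5, delay=1.0):
    """Call fetch(url) until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:  # urllib.error.URLError and socket errors are OSErrors
            if attempt == attempts:
                raise
            time.sleep(delay)


def scrape(first, last, opener):
    """Yield (page_number, raw_bytes) for each dictionary entry in the range."""
    def fetch(url):
        with opener.open(url, timeout=30) as response:
            return response.read()

    for page in range(first, last + 1):
        yield page, fetch_with_retry(fetch, BASE_URL.format(page))
```

Usage would be along the lines of: for page, html in scrape(1, 313372, make_opener("...")): save html to a file. The cookie jar plays the role of curl's -c/-b pair, and addheaders the role of -H.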

Send JSON from php processing to public coldfusion page

I am modifying someone's existing ColdFusion web app. I am adding PHP processing pages to do various tasks. Up to this point I have just been calling the PHP pages and interacting with the web app by passing variables via the URL.
Current usage:
public.cfm calls processing.php?id=69
Then processing will do what it has to, then ultimately:
header("Location: $publichome?id=$id&importantstuff=$stuff");
exit();
And the web app will pick up where it has to. But now one of my scripts has to send a JSON object back instead of simple variables, and I don't know how to get this done. I tried doing a POST with cURL, but that didn't work because I need the public-facing ColdFusion page to take over, and cURL returns to the PHP script (I know I can echo the body of the cURL result, but this keeps me on the PHP script's domain, which I don't want). Is there a way to do the above Location header redirect and send an object along with it? That's what I need: the PHP script to stop and the ColdFusion page to be served up with the object to work with.
Do I have to create some sort of JSON service in PHP that the ColdFusion page will call to retrieve the result? I can also modify the ColdFusion page any way I want.
You should be able to pass the JSON string as a URL variable, just like you are passing simpler strings, through the Location header. You will need to JSON-serialize the object in PHP if you haven't done that yet; the built-in json_encode function does this, and the result should be passed through urlencode before it goes into the URL.
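As a rough illustration of that round trip (in Python, since the two steps are the same in any language): serialize the object to JSON, percent-encode it into the query string, then decode and parse it on the receiving side. The payload and the example.org URL are made up for this sketch. In PHP the encoding side would be json_encode plus urlencode, and on the ColdFusion side DeserializeJSON does the parsing.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical payload; note the '&' and quotes that must survive the URL.
payload = {"id": 69, "items": ["a", "b"], "note": "x & y"}

# Sending side: serialize, then percent-encode into the redirect URL.
query = urlencode({"id": payload["id"], "data": json.dumps(payload)})
location = "https://example.org/public.cfm?" + query

# Receiving side: decode the query string, then parse the JSON text.
received = parse_qs(urlsplit(location).query)
restored = json.loads(received["data"][0])

print(location)
print(restored == payload)
```

Keep in mind that browsers and servers limit URL length, so this only suits small objects.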
edit
Based on more information, I now have a better suggestion:
From CF, make a <cfhttp> request to your PHP code, passing whatever parameters are necessary to PHP as arguments in cfhttp. From PHP, simply "output" the JSON as the response body. After the cfhttp call returns to CF, you'll have access to the JSON via the cfhttp.fileContent variable, which you can then run through DeserializeJSON to get back a real object. Here is some sample CF code:
<cfhttp url="processing.php?id=69" method="get"></cfhttp>
<cfset importantStuff=DeserializeJSON(cfhttp.fileContent)>

How is http post method implemented?

I want to know what happens behind the scenes of an HTTP POST request.
I.e., the browser sends an HTTP POST request to a server-side script (in PHP, for example).
How does PHP's $_POST variable get the values from the client?
Could someone explain in detail or point to a guide?
The HTTP protocol(*) specifies how the browser should send the request.
HTTP basically consists of a set of headers in plain text, one per line, followed by a blank line and then the data being transmitted (the body). Inside the HTTP request, POST data is actually formatted pretty much the same as GET data; it's just in a different part of the request: the body rather than the URL.
You can use tools like Firebug or Fiddler to see exactly how the headers and data are formatted for incoming and outgoing HTTP requests. It's actually all quite simple to read, so you should be able to work it out just by looking.
Once it gets to the server, the PHP interpreter is responsible for translating the raw HTTP request data into its standard $_GET, $_POST, etc variables. This is something that PHP does for you.
Other languages (eg Perl) do not have this functionality built in, so a Perl programmer would have to have code in their program to parse the incoming request data into useful variables. Fortunately, even Perl has a standard library which can be included that does the job, so even Perl programmers don't generally have to write the code themselves any more.
The way PHP, and any other language, does it is simply string manipulation. As I said, the HTTP data is plain text and is received in simple string format, so for a standard form post it's just a case of splitting it on ampersand and equals sign characters and URL-decoding the pieces.
As PHP does it all behind the scenes, you probably don't need to worry about the exact mechanisms it uses, but the PHP source code is available if you really want to find out.
I said it's all in plain text. HTTPS, of course, is encrypted. However by the time PHP gets hold of it, the Apache server has already done the decryption, so as far as PHP is concerned it's still plain text.
(*) Before anyone pulls me up on it, yes, I know that saying "HTTP protocol" is a redundancy, like "ATM machine" or "PIN number".
The browser encodes the data according to the content-type of the form, then transmits it as the body of a POST request. PHP then picks it up and populates $_POST with the names and values (performing special handling when the name includes the characters [ and ] or .).
I'd suggest getting a capturing proxy (e.g. Fiddler) or a network capture tool (e.g. Wireshark) and watching your own browsing traffic for a while; it will give you a nice view of the issue.
Other than that, POST is rather similar to GET, except that the data is sent in the body of the request instead of the URL, and there are two ways to encode it (multipart/form-data in addition to the URL encoding, application/x-www-form-urlencoded, that's shared with GET).
Well, let's illustrate step by step, starting with a page containing a [form action="foo.php" method="post"]
Once you click submit (or hit enter), the browser triggers an event named "submit". This event can be caught for processing with JavaScript/DOM, and this is what most sites do for validation or Ajax routines.
If those routines do not stop the flow with a return false, the browser continues to process the POST request (this process is the same as making a POST with the XMLHttpRequest object).
The browser first checks the method, action and content encoding, then reads the input values to determine the size of the data it will send, and encodes it.
Finally it sends something like this (raw values):
POST /foo.php HTTP/1.1
Host: example.org
Content-Type: application/x-www-form-urlencoded
Content-Length: 7
foo=bar
This is a POST request. Note that the body does not have to arrive all at once; it may come in several pieces, and both browser and server expect this. When a server receives a POST request, it keeps reading from the connection until the amount of content received matches the declared Content-Length.
Now the other side. The server receives the request, reads the content, parses it (foo = bar; xxx = baz), and makes it available in its environment for that specific request, so you can read it with PHP, Python, Java...
That's it. Ah, note that you can pass both GET and POST variables in the same request!
Using a [form action="foo.php?someVar=123&anotherVar=TRUE" method="post"]
Will make the browser send the request as
POST /foo.php?someVar=123&anotherVar=TRUE HTTP/1.1
Host: example.org
Content-Type: application/x-www-form-urlencoded
Content-Length: 7
foo=bar
And the server, when parsing this request, will make the following variables available:
GET[someVar] = 123
GET[anotherVar] = TRUE
POST[foo] = bar
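The parsing step on the server side can be sketched in a few lines. This is only an illustration of the idea, not how PHP actually implements it: it handles just the application/x-www-form-urlencoded case, parse_request is a made-up name, and the raw request is the example from above with the blank line between headers and body written out explicitly.

```python
from urllib.parse import urlsplit, parse_qs

RAW_REQUEST = (
    b"POST /foo.php?someVar=123&anotherVar=TRUE HTTP/1.1\r\n"
    b"Host: example.org\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 7\r\n"
    b"\r\n"
    b"foo=bar"
)


def parse_request(raw):
    """Split a raw HTTP request into method, GET variables and POST variables."""
    # Headers and body are separated by the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, _version = request_line.split(" ")

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # GET variables come from the query string of the request target.
    get_vars = parse_qs(urlsplit(target).query)

    # POST variables come from the body, limited to Content-Length bytes.
    post_vars = {}
    if headers.get("content-type") == "application/x-www-form-urlencoded":
        length = int(headers.get("content-length", 0))
        post_vars = parse_qs(body[:length].decode("ascii"))
    return method, get_vars, post_vars


method, get_vars, post_vars = parse_request(RAW_REQUEST)
print(method)     # POST
print(get_vars)   # {'someVar': ['123'], 'anotherVar': ['TRUE']}
print(post_vars)  # {'foo': ['bar']}
```

Unlike PHP, parse_qs returns a list of values for every name, which is why each value above is wrapped in a list.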

emulating LiveHTTPheader in server side script or javascript?

I ran into this problem when scraping sites that make heavy use of JavaScript to obfuscate their data.
For example,
<a href="javascript:void(0)" onClick="grabData(23)">VIEW DETAILS</a>
This href attribute reveals no information about the actual URL. You'd have to manually examine the grabData() JavaScript function to get a clue.
OR
The old-school way is to manually open the Live HTTP Headers add-on for Firefox and monitor the POST parameters, which reveals the actual URL being POSTed to.
So I'm wondering, is there a way to capture the POST parameters in a server-side script or JavaScript, as Live HTTP Headers does, for the outgoing and incoming POST parameters? This would make even the most JavaScript-obfuscated web pages easily scrapable.
thanks.
I'm not sure I understand the question but...
In PHP, incoming POST parameters are stored in the $_POST array, you can display them with print_r($_POST);.
