How to download information from a website - php

I'm trying to automatically download information from a website based on a few parameters. Essentially, I want to specify the parameters of a search and have a function automatically navigate to the appropriate page and download the file. Note that all of the files are Excel files, usually .csv.
Here's the website: http://comtrade.un.org/db/
NOTE: The website's address changes depending on the search. For instance, if you search trade from the United States to Iran (the rest of the parameters unspecified), the result is:
http://comtrade.un.org/db/dqBasicQueryResults.aspx?px=HS&cc=TOTAL&r=364&p=842&rg=1&y=2010,2009,2008,2007,2006&so=8
More on this here:
http://unstats.un.org/unsd/tradekb/Knowledgebase/Data-Extraction-Using-Comtrade-Web-Service
Look under the web service methods and parameters.
Two questions:
1) How can I do this?
2) What is the best language to do this in?

There is no "best language". You can do this in any language with HTTP access: PHP, Java, RoR, Perl, Python...
The page you linked says they offer a REST service for accessing the data as XML.
In PHP, you would first download the file using the appropriate URL:
$xml = file_get_contents("http://comtrade.un.org/ws/...");
Then use PHP's XML functions (e.g. SimpleXML) to parse the file.
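A minimal sketch of those two steps, assuming the web service returns well-formed XML. The endpoint is deliberately left out (the answer elides it as ws/...), and the element and attribute names in the usage comment (row, year, value) are hypothetical placeholders, not the real Comtrade schema.

```php
<?php
// Minimal sketch: download an XML document over HTTP and parse it with SimpleXML.
// The actual Comtrade endpoint is not reproduced here; pass in whatever URL
// the web-service documentation specifies.

function parseXmlOrFail(string $xml): SimpleXMLElement
{
    // Collect libxml errors instead of printing warnings.
    libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        $errors = libxml_get_errors();
        libxml_clear_errors();
        $reason = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new RuntimeException("Invalid XML: $reason");
    }
    return $doc;
}

function fetchXml(string $url): SimpleXMLElement
{
    $body = file_get_contents($url); // requires allow_url_fopen
    if ($body === false) {
        throw new RuntimeException("Could not download $url");
    }
    return parseXmlOrFail($body);
}

// Usage (hypothetical element/attribute names):
// $doc = fetchXml($serviceUrl);
// foreach ($doc->row as $row) {
//     echo $row['year'], ': ', $row['value'], PHP_EOL;
// }
```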
I'm not sure about their data-usage license; you might not be legally permitted to download data from there automatically.
UPDATE
You cannot directly download the files found in the search results (e.g. through PHP), so you HAVE to use the REST access, and some parts of it are accessible only if the UN allows you to use them. If you try to download the "Excel" (in fact CSV) files directly, you will end up with an error like this: http://comtrade.un.org/db/dqBasicQueryResultsd.aspx?action=csv&px=HS&cc=TOTAL&r=364&p=842&rg=1&y=2010,2009,2008,2007,2006&so=8. You could spoof the HTTP_REFERER value, but you would break the terms of service.

In PHP, use file_get_contents("http://............");
Plug in whatever URL and GET parameters you want, and you instantly have the data, in this case the CSV, which you can then process.
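A sketch of plugging in GET parameters and processing the CSV, assuming the URL actually returns CSV text (the answer above notes that the site may refuse direct CSV downloads). The base URL and parameter names are copied from the example URL in the question; buildQueryUrl() and parseCsv() are illustrative helpers, not part of any Comtrade API.

```php
<?php
// Sketch: build a query URL from named parameters and parse a CSV response.
// Parameter names (px, cc, r, p, rg, y, so) come from the question's example URL;
// their meanings are defined by Comtrade, not by this code.

function buildQueryUrl(string $baseUrl, array $params): string
{
    // http_build_query() URL-encodes each value (e.g. "," becomes "%2C").
    return $baseUrl . '?' . http_build_query($params);
}

function parseCsv(string $csv): array
{
    // fgetcsv() on an in-memory stream handles quoted fields containing commas or newlines.
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $csv);
    rewind($fh);

    // Escape character passed explicitly ('' disables it; needs PHP 7.4+).
    $header = fgetcsv($fh, 0, ',', '"', '');
    $rows = [];
    while ($header !== false && ($fields = fgetcsv($fh, 0, ',', '"', '')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        if (count($fields) !== count($header)) {
            throw new UnexpectedValueException('Row has ' . count($fields) . ' fields, header has ' . count($header));
        }
        $rows[] = array_combine($header, $fields); // column name => value
    }
    fclose($fh);
    return $rows;
}

// Usage:
// $url = buildQueryUrl('http://comtrade.un.org/db/dqBasicQueryResults.aspx', ['px' => 'HS', 'r' => 364]);
// $rows = parseCsv(file_get_contents($url));
```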

Related

Could I know what has been changed lately in the file?

I'm doing some development, and my company uses Excel to edit a database (I don't agree, but hey... hate the game, not the player! :D). To avoid some encoding problems, I'm aiming to put the file on Google Drive, because it's very unlikely to change this sort of thing the way Microsoft insists on changing it in every d**m version.
What I'm trying to do is receive a file from Google Drive containing only the fields that were recently modified (or receive the entire file but somehow know what the latest changes were), put it in a variable, and manipulate it. I don't need to edit the Excel file itself; that will happen in Google Drive's interface. I only need to read it.
NOTE 1: I was using this doc (https://developers.google.com/drive/manage-changes), but it appears the API only says whether or not the file has changed. It would be very nice to know what has changed in the file. That would save me from overloading the database (otherwise I'll have to rewrite the ENTIRE DATABASE).
It's not possible to retrieve field-level changes, but you can retrieve a change history by listing changes.
https://developers.google.com/drive/v2/reference/changes
You probably want to look at Revisions.
Google Drive keeps file revisions, and you can get a list of changes using this API.
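Since the API does not report field-level changes, one workaround is to download two revisions of the sheet (assuming you can export each one, e.g. as CSV), parse them into rows, and compare them locally. The sketch below does only the comparison; fetching the revisions is left to the Drive API linked above, and the key column "id" used in the test is a hypothetical name.

```php
<?php
// Sketch: compare two snapshots of a sheet (e.g. two downloaded revisions, each
// already parsed into rows keyed by column name) and report what changed.
// Assumes every row has a unique value in the $key column.

function diffSnapshots(array $oldRows, array $newRows, string $key): array
{
    // Index rows by their key so rows can be matched regardless of order.
    $index = function (array $rows) use ($key): array {
        $byKey = [];
        foreach ($rows as $row) {
            $byKey[$row[$key]] = $row;
        }
        return $byKey;
    };
    $old = $index($oldRows);
    $new = $index($newRows);

    $diff = ['added' => [], 'removed' => [], 'changed' => []];
    foreach ($new as $id => $row) {
        if (!array_key_exists($id, $old)) {
            $diff['added'][$id] = $row;
            continue;
        }
        // Record [old value, new value] for each field of the new row that differs
        // (columns that exist only in the old row are not reported).
        foreach ($row as $field => $value) {
            $before = $old[$id][$field] ?? null;
            if ($before !== $value) {
                $diff['changed'][$id][$field] = [$before, $value];
            }
        }
    }
    foreach ($old as $id => $row) {
        if (!array_key_exists($id, $new)) {
            $diff['removed'][$id] = $row;
        }
    }
    return $diff;
}
```

Only the rows listed under "added", "removed" and "changed" would then need to be written to the database.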

Identify a file that contains a particular string in PHP/SQL site

Using Chrome's Inspect Element feature, I have identified a string of text that needs to be changed to lower case.
Although the string appears on every page of the site, I am not sure which file to edit.
The website is a CMS based on PHP and SQL; I am not very familiar with these technologies.
I have searched through the files manually and cannot find the string.
Is there a way to search for and identify the file I need using, for example, the Inspect Element feature in browsers or an FTP tool such as FileZilla?
Check whether your CMS has a layout page of any kind. If it does, then most probably, either in that file or in the footer include file, you will find either the JavaScript for Google Analytics or a JS include file for it.
Try doing a site-wide search for 'UA-34035531-1' (your Google Analytics tracking ID) and see if it returns anything. If you find it, what you need should be two lines below it.
People usually do not put analytics code in the database, so there is a better chance you will find it in one of the files, most probably one that is included or embedded in a layout file of some sort, since it needs to appear on every page of the site.
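If you can download a copy of the site over FTP, a short PHP script can do that site-wide search for you. This is a sketch: findFilesContaining() is a hypothetical helper, and the list of file extensions it checks is an assumption you may need to widen if the CMS uses other template extensions.

```php
<?php
// Sketch: list every file under $root whose contents contain $needle, like a
// recursive grep. Intended for a local copy of the site (e.g. downloaded via FTP).
// The default extension list is an assumption; widen it for other template types.

function findFilesContaining(string $root, string $needle, array $extensions = ['php', 'html', 'htm', 'js', 'tpl', 'inc']): array
{
    $matches = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Only look at regular files with one of the given extensions.
        if (!$file->isFile() || !in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }
        $contents = file_get_contents($file->getPathname());
        if ($contents !== false && strpos($contents, $needle) !== false) {
            $matches[] = $file->getPathname();
        }
    }
    sort($matches);
    return $matches;
}

// Usage: print_r(findFilesContaining('/path/to/site-copy', 'UA-34035531-1'));
```

Keep in mind that a string stored in the database will not show up in a file search like this.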

How to use function call in external PHP url

I understand that within the same folder I can use include() to pull in an external PHP file, but now I would like to call a function in another PHP file located at another URL.
For example, my live website (liveexample.com/table.php) has a drop-down list and a table, but no data.
My other PHP file (dataexample.com/data.php) connects to the database and extracts the data, but it is on another server.
I need the data from [dataexample.com/data.php] to be delivered to [liveexample.com/table.php], where a loop draws the table with the data on the [liveexample.com/table.php] page.
Does anyone have an idea how to design this, i.e. deliver data from one server to another by using a function call in PHP?
Or is there a better solution for delivering data between two different servers, such as turning the record set into an array and sending it to [liveexample.com/table.php]?
Please give me some advice. Much appreciated!
I think a SOAP web service would be perfect for what you want, but if possible, just copy the same code you have from the separate server.
If you make [dataexample.com/data.php] output your data as XML, you can use it as a web service. That means you can fetch that XML output (by sending a request to the data URL) and then parse it to load the data. This way, you can use the service any way you want: the way you described, or via AJAX, Flash, etc.
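A sketch of that XML-over-HTTP idea, assuming data.php already has its database rows as an array of associative arrays. rowsToXml() and xmlToRows() are hypothetical helper names, and the rows/row element layout is an arbitrary choice for illustration.

```php
<?php
// Sketch: data.php turns rows into XML; table.php turns the XML back into rows.
// Row keys must be valid XML element names (e.g. "name", not "0" or "unit price").

function rowsToXml(array $rows): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('rows'));
    foreach ($rows as $row) {
        $rowElement = $root->appendChild($doc->createElement('row'));
        foreach ($row as $field => $value) {
            $cell = $doc->createElement((string) $field);
            // Text nodes are escaped (&, <, >) when saveXML() serializes them.
            $cell->appendChild($doc->createTextNode((string) $value));
            $rowElement->appendChild($cell);
        }
    }
    return $doc->saveXML();
}

function xmlToRows(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new RuntimeException('Response is not valid XML');
    }
    $rows = [];
    foreach ($doc->row as $row) {
        $fields = [];
        foreach ($row->children() as $name => $value) {
            $fields[$name] = (string) $value; // element name => text content
        }
        $rows[] = $fields;
    }
    return $rows;
}
```

On dataexample.com, data.php would send header('Content-Type: application/xml; charset=utf-8') and echo rowsToXml($rows); on liveexample.com, table.php would call xmlToRows(file_get_contents('http://dataexample.com/data.php')) (this needs allow_url_fopen) and loop over the result to draw the table. If both ends are PHP, json_encode()/json_decode() would do the same job with less code.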
So here are a few topics worth looking into:
using PHP for web services: http://wso2.org/library/3032
parsing XML data: http://www.w3schools.com/php/php_xml_simplexml.asp
I hope this gives you a good idea of how to accomplish what you want, because there are a few options you can go with. Like Cristopher said, SOAP is one of them.
Have a great day.

REST: Using PUT to update with a file upload

I'm coding an API and got stuck on the UPDATE part of things. From what I've read about REST the update operation should be exposed by using HTTP PUT.
OK, PUT just gives me a stream of data. At least in PHP, decoding this data is my responsibility. So how do I mix string data and a file upload using PUT? I know I can do it with POST, but I'm trying to do it the RESTful way.
Should I use multipart/form-data, and is that portable for PUT (i.e. is it easy to send this kind of request from different languages)? I'm trying to figure out the proper way to do this. Again, if I use multipart/form-data, I'm responsible for the parsing, so there might be some errors or performance degradation. Can you suggest a parser, if multipart/... is the way to do what I'm asking?
Thanks
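On the parser question: PHP populates $_POST and $_FILES only for POST requests (PHP 8.4 added request_parse_body() for other methods), so a PUT handler on older versions has to read php://input and split the multipart body itself, or use a library. The following is a minimal sketch of that, not a full multipart/form-data implementation: it keeps the whole body in memory and ignores nested multipart parts and Content-Transfer-Encoding. The function names are hypothetical.

```php
<?php
// Minimal multipart/form-data parser sketch for bodies read from php://input.
// Limitations: whole body in memory, no nested multipart, no
// Content-Transfer-Encoding handling, duplicate field names overwrite each other.

function boundaryFromContentType(string $contentType): ?string
{
    // The boundary may be quoted or bare: boundary="abc" or boundary=abc.
    if (preg_match('/boundary=(?:"([^"]+)"|([^;\s]+))/i', $contentType, $m)) {
        return $m[1] !== '' ? $m[1] : $m[2];
    }
    return null;
}

function parseMultipart(string $body, string $boundary): array
{
    $fields = [];
    $files = [];
    $segments = explode('--' . $boundary, $body);
    array_shift($segments); // discard the preamble before the first delimiter

    foreach ($segments as $segment) {
        if (strncmp($segment, '--', 2) === 0) {
            break; // "--boundary--" marks the end of the body
        }
        // Drop the CRLF after the delimiter and the CRLF that precedes the next one.
        if (substr($segment, 0, 2) === "\r\n") {
            $segment = substr($segment, 2);
        }
        if (substr($segment, -2) === "\r\n") {
            $segment = substr($segment, 0, -2);
        }
        // Headers and content are separated by an empty line.
        $split = strpos($segment, "\r\n\r\n");
        if ($split === false) {
            continue;
        }
        $headers = [];
        foreach (explode("\r\n", substr($segment, 0, $split)) as $line) {
            [$name, $value] = array_pad(explode(':', $line, 2), 2, '');
            $headers[strtolower(trim($name))] = trim($value);
        }
        $content = substr($segment, $split + 4);

        $disposition = $headers['content-disposition'] ?? '';
        if (!preg_match('/\bname="([^"]*)"/i', $disposition, $nameMatch)) {
            continue;
        }
        if (preg_match('/\bfilename="([^"]*)"/i', $disposition, $fileMatch)) {
            $files[$nameMatch[1]] = [
                'filename' => $fileMatch[1],
                'type' => $headers['content-type'] ?? 'application/octet-stream',
                'content' => $content,
            ];
        } else {
            $fields[$nameMatch[1]] = $content;
        }
    }
    return ['fields' => $fields, 'files' => $files];
}

// Usage in a PUT handler:
// $boundary = boundaryFromContentType($_SERVER['CONTENT_TYPE'] ?? '');
// $parts = $boundary === null ? null : parseMultipart(file_get_contents('php://input'), $boundary);
```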
The general rule is that PUT is idempotent.
Calling PUT /user/{userId}/files/foo.txt twice ends up in the same state: the second call simply overwrites foo.txt. You are 'setting' things.
Calling POST /user/{userId}/files twice would create two different files. You are 'adding' things.
Therefore, I would use PUT if you want to write to a dedicated target. What kind of files do you want to upload? E.g. if it is a picture upload, I would use POST (and the response would give you the target URL). If you are designing a kind of file storage for a user, I would use PUT, because users most likely want to write (set) to a certain location (as they would on an ordinary file system).
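The difference can be sketched with two hypothetical storage functions: putFile() writes to a client-chosen name and overwrites whatever is there, so repeating the call leaves the same state, while postFile() lets the server pick a new name on every call. The directory layout mirrors the /user/{userId}/files URLs above but is only an assumption; in a real handler the body would come from file_get_contents('php://input').

```php
<?php
// Sketch: PUT stores the body at a client-chosen name (idempotent "set"),
// POST stores it under a server-chosen name (each call "adds" a file).
// The $storageDir/{userId}/{file} layout is a hypothetical example.

function safeName(string $name): string
{
    // basename() drops directory parts such as "../"; "." and ".." are rejected outright.
    $name = basename($name);
    if ($name === '' || $name === '.' || $name === '..') {
        throw new InvalidArgumentException('Invalid name');
    }
    return $name;
}

function userDir(string $storageDir, string $userId): string
{
    $dir = $storageDir . '/' . safeName($userId);
    if (!is_dir($dir)) {
        mkdir($dir, 0775, true);
    }
    return $dir;
}

function putFile(string $storageDir, string $userId, string $fileName, string $body): string
{
    $path = userDir($storageDir, $userId) . '/' . safeName($fileName);
    file_put_contents($path, $body); // overwrites: repeating the request leaves the same state
    return $path;
}

function postFile(string $storageDir, string $userId, string $body): string
{
    // The server picks a fresh random name, so two identical requests create two files.
    $path = userDir($storageDir, $userId) . '/' . bin2hex(random_bytes(8));
    file_put_contents($path, $body);
    return $path; // e.g. reported back to the client in a Location header
}
```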
Maybe you have more details/requirements for a concrete case?
What kind of data are you attempting to PUT? Remember that PUT is a directed publishing method. The client sends data to the server and essentially says "PUT this file into /home/sites/.../myfile.txt".
It's useful when you're publishing data to a site and creating a new page, but not so useful for a standard file upload form ("Upload an avatar image here!"). You don't want to allow potentially malicious users to specify where an uploaded file should go.
That's when you use POST, which translates into "here's a file, it's called myfile.txt, do what you want with it".

Best Practice: Legitimate Cross-Site Scripting

While cross-site scripting is generally regarded as negative, I've run into several situations where it's necessary.
I was recently working within the confines of a very limiting content management system. I needed to include database code within the page, but the hosting server didn't have anything usable available. I set up a couple of bare-bones scripts on my own server, originally thinking that I could use AJAX to import the contents of my scripts directly into the template of the CMS (thus retaining dynamic images, menu items, CSS, etc.). I was wrong.
Due to the limitations of XMLHttpRequest objects, it's not possible to grab content from a different domain. So I thought of an iFrame: even though I'm not a fan of frames, I thought that I could create a frame that matched the width and height of the content so that it would appear native. Again, I was blocked by cross-site scripting "protections." While I could indeed load a remote file into the iFrame, I couldn't execute JavaScript to resize it from either the host page or the loaded page.
In this particular scenario, I wasn't able to point a subdomain to my server. I also couldn't create a script on the CMS server that could proxy content from my server, so my last thought was to use a remote JavaScript.
A remote JavaScript works. It breaks when the user has JavaScript disabled, which is a downside; but it works. The "problem" I was having with using a remote JavaScript was that I had to use the JS function document.write() to output any content. Any output that isn't JS causes script errors. In addition to using document.write() for every line, you also have to ensure that the content is escaped - or else you end up with more script errors.
My solution was as follows:
My script received a GET parameter ("page"), looked for the file ({$page}.php), and read its contents into a variable. However, I had to use awkward buffering techniques to actually execute the included scripts (for things like database interaction), then strip all line-break characters (\n) from the final content and escape all required characters. The end result is that my original script (which outputs JavaScript) accesses seemingly "standard" scripts on my server and converts their standard output to JavaScript for display within the CMS template.
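A sketch of the same approach with less manual escaping, assuming the pages live in one directory on your server: output buffering captures what an included page prints, and json_encode() turns the whole result into a valid JavaScript string literal, so a single document.write() call is enough. The allowlist check matters, because building an include path directly from the "page" GET parameter would let anyone include arbitrary files. remoteScriptFor() and the page names are hypothetical.

```php
<?php
// Sketch: execute a local PHP page, capture its HTML, and wrap it in one
// document.write() call that a remote script tag can load.

function renderPage(string $file): string
{
    ob_start();
    try {
        include $file; // runs the page, including any database code it contains
        return (string) ob_get_contents();
    } finally {
        ob_end_clean(); // always discard the buffer, even if the page throws
    }
}

function htmlToDocumentWrite(string $html): string
{
    // json_encode() yields a valid JavaScript string literal: quotes, backslashes,
    // newlines and non-ASCII characters are escaped, replacing the manual
    // newline stripping and escaping.
    return 'document.write(' . json_encode($html, JSON_THROW_ON_ERROR) . ');';
}

function remoteScriptFor(array $query, array $allowedPages, string $pagesDir): ?string
{
    $page = $query['page'] ?? '';
    // Only known page names: never build an include path from raw user input.
    if (!in_array($page, $allowedPages, true)) {
        return null;
    }
    return htmlToDocumentWrite(renderPage($pagesDir . '/' . $page . '.php'));
}

// Entry point (hypothetical page names and directory):
// $js = remoteScriptFor($_GET, ['news', 'events'], __DIR__ . '/pages');
// if ($js === null) { http_response_code(404); exit; }
// header('Content-Type: application/javascript; charset=utf-8');
// echo $js;
```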
While this solution works, it seems like there may be a better way to accomplish the same thing. What is the best way to make cross-site scripting work specifically for the purpose of including content from a completely different domain?
You've got three choices:
Create a server-side proxy script.
Create a remote script to read in remote dynamic HTML. Use a library like jQuery to make this easier; you can use the load function to inject HTML where needed. EDIT: What I originally meant for option #2 was using JSONP, which requires the server-side script to recognize the "callback=?" parameter.
Use a client-side Flash proxy and set up a crossdomain.xml file in your server's web root.
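For option 2, the server-side half of JSONP could look like this sketch: the script wraps its JSON output in a call to the function named by the callback parameter, after checking that the name is a plain identifier. jsonpResponse() is a hypothetical helper; on the client, jQuery treats a $.getJSON() URL containing callback=? as a JSONP request.

```php
<?php
// Sketch of a JSONP endpoint: wrap JSON in a call to the client-named callback.

function jsonpResponse(array $query, $data): ?string
{
    $callback = $query['callback'] ?? '';
    // Accept only a plain JavaScript identifier so the parameter cannot inject script
    // (the D modifier stops "$" from also matching before a trailing newline).
    if (!is_string($callback) || !preg_match('/^[A-Za-z_$][A-Za-z0-9_$]*$/D', $callback)) {
        return null;
    }
    return $callback . '(' . json_encode($data, JSON_THROW_ON_ERROR) . ');';
}

// Entry point (hypothetical payload):
// $js = jsonpResponse($_GET, ['html' => '<p>Hello from the other domain</p>']);
// if ($js === null) { http_response_code(400); exit; }
// header('Content-Type: application/javascript; charset=utf-8');
// echo $js;
```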
Personally, I would call the other domain from the server, and fetch and parse the data there for use in your page. That way you avoid any cross-domain problems, and you get the power of a server-side language/platform for fetching and parsing the data.
Not sure if that would work for your specific scenario... it's hard to know even with your verbose description.
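If you can run PHP on the server that serves the page, the server-side approach (option 1 above) might look like this sketch: your server fetches the remote content, so the browser's cross-domain restrictions never come into play. The host allowlist keeps the script from becoming an open proxy; the host name in the usage comment is a placeholder, and file_get_contents() on URLs requires allow_url_fopen to be enabled.

```php
<?php
// Sketch of a server-side proxy: the CMS page's server fetches the remote content,
// so the browser's cross-domain restrictions do not apply.
// Only allowlisted hosts are fetched, so the script cannot be abused as an open proxy.

function isAllowedUrl(string $url, array $allowedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    return in_array(strtolower($parts['scheme']), ['http', 'https'], true)
        && in_array(strtolower($parts['host']), $allowedHosts, true);
}

function fetchRemote(string $url, array $allowedHosts, float $timeoutSeconds = 10.0): string
{
    if (!isAllowedUrl($url, $allowedHosts)) {
        throw new InvalidArgumentException("URL not allowed: $url");
    }
    // The "http" context options also apply to https:// URLs.
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Request failed: $url");
    }
    return $body;
}

// Usage (hypothetical host): echo fetchRemote('https://data.example.com/page.php', ['data.example.com']);
```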
You could try easyXDM: by including very little code, you can pass data or method calls between documents on different domains.
I've come across that YDN server-side proxy script before. It says it's built to work with Yahoo's Search APIs.
Will it work with any domain, if you simply trim the Yahoo API code out? Or do you need to replace it with the domain you want it to work with?
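Remote iframe content can be accessed by local JavaScript, as long as both sites share a parent domain (e.g. a.example.com and b.example.com).
Both pages just have to set document.domain to that shared parent domain: browsers only accept the page's own domain or one of its parent domains, and some recent browsers ignore this setter by default.
E.g.:
Site A contains an iframe with src='Site B/home.php'
home.php looks like this:
[php stuff]...[/php]
[script type='text/javascript']document.domain='shared parent domain'[/script]
Site A's page has to run the same document.domain assignment.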
