OK, this is quite complicated and I'm not even sure it is possible. I need some insight from knowledgeable people on how I should proceed.
I need to process a form on a remote site, screen scrape the results (on the fly), parse the information and display it back to the end user.
--More clearly explained by example--
[1] my site is -> sitea.com
[2] the form is on -> somebodyelseswebsite.com (no DB access, but form is public)
Here's my logic:
I can replicate the form from site [2] and make an exact copy on my site [1].
When the user submits the form, I need some kind of object in the POST (JavaScript?) that will assign the user's input to ... and process the form on site [2], screen scrape the results, and return the data in an array, which I can display on my site [1].
key points:
The user must not be aware of the transaction with site [2].
This must happen in real time and be fast.
So can this be done? If yes, how? I know about PHP cURL; can I use only PHP, or do I need to use something else?
--further clarification--
Yes, this can be done. cURL is one way to do it, yes. You need some pretty heavy error-checking and validation for any sort of reliability though. You'd use a cURL POST (assuming the remote host doesn't have any sort of form key, ip block, referer checking, etc.) to replicate the behavior of that form's fields. Then you'd need to scrape the return and I think that's the difficult part.
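As a rough sketch of what "replicating the form's fields in a POST" looks like, here is a minimal example. It is written in Python for brevity; with PHP cURL the same steps map onto `curl_setopt` with `CURLOPT_POST` and `CURLOPT_POSTFIELDS`. The function names, the `/search` path and the field names are made up for illustration, and it assumes the remote form has no form key, IP block or referer check.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_post(action_url, fields):
    """Build the same POST request the remote form would send.

    action_url and fields are assumptions: read the real values from the
    remote form's action attribute and its input names.
    """
    body = urlencode(fields).encode("utf-8")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def submit_form(action_url, fields, timeout=10):
    """Send the POST and return (status, html).

    urlopen raises HTTPError/URLError on failure, so the caller should
    wrap this in its own error handling.
    """
    request = build_form_post(action_url, fields)
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.status, response.read().decode(charset, errors="replace")
```

Something like `submit_form("http://somebodyelseswebsite.com/search", {"q": "widgets"})` would then return the raw HTML to scrape. Keeping the timeout short matters here, because your own page cannot respond until the remote site does.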
For me, I'd use a DOM Parser to get very specific. Here is a post on how to do that.
Related
There's a PHP based website that I'd like to replicate the data from.
The problem is that the website's data is only accessible via a company name search page - www.example.com/companynamesearch.php
The results are displayed under the same URL, so it does not have separate company name URLs to crawl for data.
Can anyone suggest an easy way to extract the data from the site?
Thanks
First, you need to query the data. Figure out whether the data is truly on this page or whether it comes in via AJAX, as suggested by @JonathanM. You can use a tool like Fiddler or your browser's developer tools to monitor for this.
If you find the data comes in via AJAX, you're all set. It's probably JSON, but it could be in any format, so watch for that.
If the data is on this page and the page is queried by POST data, then you are going to have to make those POST requests and then parse the page. Now, don't do this yourself. Use DOMDocument to dig at the page for you. See this question for details: How do you parse and process HTML/XML in PHP?
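To give an idea of what "let a parser dig at the page" means, here is a small sketch that pulls the rows out of an HTML table using a real HTML parser instead of regular expressions. It is in Python (standard library only); in PHP you would do the equivalent with `DOMDocument::loadHTML` and `getElementsByTagName`. It assumes the results come back as a plain table, which may not be true for the site in question, and the class and function names are made up.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by broken markup


def extract_rows(html):
    """Return the table cells in html as a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Each row comes back as a list of strings, so the result can be passed straight to whatever template displays it on your own page.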
If your chosen language is php you should look at curl's automated form submission capabilities, which will enable you to automate the internal search engine's form.
There is a useful stackoverflow answer here
fill out a form automaticly using curl and php
Or you can look at these basic tutorials to get you started:
http://phpsense.com/2007/php-curl-functions/
http://devzone.zend.com/160/using-curl-and-libcurl-with-php/
Using curl with php will save you plenty of time but be warned, if the site's owners aren't wanting you to scrape their site, you could be in for a tough time. And of course there are copyright issues to think of, etc, etc.
Have you tried searching google for site:www.example.com ? You may get a list of all pages back.
They might have submitted a sitemap or Google might have found another way.
I want to get the details from the PayPal form. I redirect my clients to this form after they select a certain amount. Can I get the details of the form below?
I am not sure. Since PayPal is asking for the credit card number and all that, for security purposes it should not allow this form data to be read. But again, just wondering, is it possible?
Short answer: no.
Certainly not using PHP (going by your tags here), which is server side, and this would be a javascript hack. The way that immediately leapt to mind would be to invoke Javascript in a child iframe that contained the Paypal form, but there are two immediately apparent problems with that:
I doubt Paypal would allow that page to be opened in an iframe
You can't invoke javascript in an iframe if the page in that frame is not on the same domain as the calling page.
The best way I can think of to achieve this would be to make a Greasemonkey/Chrome/whatever extension using javascript to fish the data and send it off, but then there's this: no one will willingly install something on their computer that they know steals credit card information. Why on earth do you want to do this?
On a related, though unhelpful note, if you are interested in trying this for a purpose that is less illegal and immoral, one thing you might want to look at is this. It shows how to do cross-domain communication using frames if you have permission to write javascript on both pages (or have found an unsanitised field to inject it with)...
For some reason I need to process PHP behind the scenes but not using AJAX (I know that might sound silly to you). I need this since I am getting the content dynamically through another page loading.
By using PHP's curl functions I can get the login page of a website inside my 1.php file. But then I use javascript to set the form values and hit login, and it takes me to the site's URL (no longer localhost/1.php). So the question is: I need to somehow store the content of the page that I am redirected to and retrieve it.
The impression I got was that you have a resource-intensive process, which would perform some action in the background while the user still interacts with the page.
It actually would make more sense to do this with some sort of service (as a shell script or standalone application), but it is possible to do with PHP: you would need to fork [1] [2] the process. Just don't forget to check whether one such process is already running on the system.
It actually works pretty well in combination with XHR (also known as AJAX by the marketing department), because you can kick off the process with a request, then repeatedly check the status, and then collect the data when the status is "done".
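The kick-off / poll / collect pattern can be sketched like this. Note that this swaps in a background thread inside one Python process instead of a forked PHP process, purely to keep the example short and self-contained. The `JobStore` class and its method names are made up; in a real setup `start` and `status` would sit behind two URLs that the XHR calls hit.

```python
import threading
import uuid


class JobStore:
    """Run slow work in the background and let callers poll for the result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def start(self, func, *args):
        """Kick off func(*args) in a background thread and return a job id."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "running", "result": None}

        def run():
            try:
                result, status = func(*args), "done"
            except Exception as exc:  # report the failure instead of hanging
                result, status = repr(exc), "failed"
            with self._lock:
                self._jobs[job_id] = {"status": status, "result": result}

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def status(self, job_id):
        """Return a copy of the job record: {'status': ..., 'result': ...}."""
        with self._lock:
            return dict(self._jobs[job_id])
```

The first request calls `start(...)` and hands the job id back to the browser; the page then polls `status(job_id)` every second or so until the status is no longer "running".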
Since we're all taking stabs in the dark, here's what I think you're trying to do (let me know if I'm way off):
You have a site (let's call it userfriendly.org) and you are trying to add an interface of some kind to another site (we'll call this site mean-corp.com). Essentially, when you load the page, you use curl to fetch some of the data from mean-corp.com so that your users can log in and get some info without having to deal with their site (maybe it's ugly, maybe it just fits really well into your site, whatever).
You are able to get to the site okay to get whatever initial data you need, but when you try to pass in the user login and password to actually get their info, it's redirecting back to the login URL for the site.
Long story short, you are trying to make a front-to-back web service for another site, but you're running into hiccups with redirects and whatnot?
Am I totally off? If not, I've made similar attempts in the past for my own noble reasons, and I could pass along some tips, as I'm sure others can.
But if I'm totally off, sorry for the distraction.
I'm looking at a domain registration site that looks like it uses jQuery to process the data users input and to register domains.
What I was wondering is whether it's possible for users to fill in data on a form on my website and then, when the user is ready to complete payment, be taken to the actual domain registration site, where all the data they typed in on my site will be posted.
So basically, the user fills in a load of info on my site and checks for domain availability on my site. Once the user has found the domain they want, they will be redirected over to the actual domain reg site, where all their info will be posted.
Now I know if the domain reg site used PHP to process all the stuff, it wouldn't be a problem. But they don't use PHP.
Do you guys reckon this could be possible?
I'm not sure this would be possible in any amount of time that would make it worth it to you. Without knowing any of their back-end code, it's going to be extraordinarily difficult. Edit: I should add that I did look through some of their jQuery code and it looks as though they're using ajax .post() to submit data. Where this data goes and what responses are expected is anyone's guess, though...
That said... there are quite a few domain registrars that offer real APIs to let you do what you want... or even let you go one step further and offer the ability to register domains directly through your website. Sometimes you can set your own price, as well.
Here are links to some of these APIs:
Namecheap: http://developer.namecheap.com/docs/
GoDaddy: http://www.godaddy.com/reseller/domain-reseller-api.aspx
eNom: http://www.enom.com/resellers/Interfaceinfo.asp
I'd personally recommend NameCheap, but for the purposes of your question, any of these should do.
I can't make any promises, but say you used jQuery ajax to pull in the form they would have to fill out. Said form would then be on your client side, so in theory I think you could use their input ids to fill out the form using javascript/jQuery. All this would technically be client side. Too bad that other site does not have an API for purchases.
Do you have control over the domain registration site? There are many ways you can send the user's input over to that site, but of course it has to be looking for this posted data and know how to handle it. PHP is not necessary to handle the data that is passed in. For example, if you send your info to the domain registration site via a form GET method, the info will become part of the URL, which can be accessed and parsed via JavaScript's window.location property.
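Here is a small sketch of the GET idea: the sending side packs the user's input into the query string, and the receiving side unpacks it again. It is in Python to keep it short; on the receiving page the same parsing would be done in JavaScript from `window.location.search`. The `registrar.example` URL, the field names and both function names are hypothetical, and the receiving site still has to be written to look for these parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_prefill_url(base_url, fields):
    """Append the user's input to base_url as a GET query string."""
    return base_url + "?" + urlencode(fields)


def read_prefill(url):
    """Recover the fields from the URL (first value per name only)."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}
```

Bear in mind that anything sent this way ends up in browser history and server logs, so it is a poor fit for payment details.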
Howdy folks, I am wanting to build a script or something to take a single row from my MySQL database and use that data to pre-populate form fields on one of multiple sites that aren't mine. What I'd like to do is to take information a user has entered on my site and when they click a link to one of the sites in my system it loads the external site with certain pre-mapped fields populated with the info they entered. But I can't seem to get my head around a way to do this, seeing as I can't add anything to these pages. Do you guys have any suggestions?
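The flow you described is not possible due to the browser's cross-site scripting protections (the same-origin policy): a page on your site cannot script a page loaded from someone else's domain. This post is relevant: Browser Automation and Cross Site Scripting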
The closest thing I can think of is Greasemonkey, which would force the user to download the plugin from Mozilla, plus a new userscript from your website.
Another option would be reproducing the form on your own web server, and hoping the form action doesn't perform referrer checks.
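A minimal sketch of that last option: render a copy of the form on your own server, pre-filled from the stored row, with its action pointing at the other site. It is in Python for brevity (in PHP the escaping would be `htmlspecialchars`). The function name, the `othersite.example` URL and the field names are made up; the real ones have to be copied from the external form, and this only works if that site accepts posts coming from a foreign page.

```python
from html import escape


def render_prefilled_form(action_url, fields):
    """Return HTML for a form that posts `fields` to the external site.

    `fields` maps the external form's input names to the values taken
    from your own database row.
    """
    inputs = "\n".join(
        '  <input type="text" name="{}" value="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)
        )
        for name, value in fields.items()
    )
    return (
        '<form method="post" action="{}">\n{}\n'
        '  <button type="submit">Continue</button>\n</form>'
    ).format(escape(action_url, quote=True), inputs)
```

The user still has to press the button (or a line of JavaScript has to submit the form for them), and they land on the other site's response page rather than on its empty form.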
I am not very sure, but you could use wget and pass XML data, i.e. build an XML string with the data you want to send across and then do a wget to the other site. Hope this helps.