Can PHP analyze another web page?

I'm making a search engine that (in theory) analyzes online encyclopedias to find answers to a question a user submits through a form. However, I want to know if I'm wasting my time with PHP. If I am, what language would be best suited to this task? If I'm not, what function in PHP would allow me to do this? Thanks!

PHP works as well as anything else. If you want to read data from another web page, you'll probably want to use cURL, which PHP supports through its curl extension.
All of the requisite pieces are there: PHP does fine with processing text and HTML. If you already know PHP, it's best to stick with what you know.
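As a minimal sketch of the cURL route (assuming the curl extension is installed; the function name `fetchPage()` and the user-agent string are hypothetical), downloading a page into a string could look like this:

```php
<?php
// Minimal sketch: download a page with PHP's curl extension.
// Assumes ext-curl is installed; fetchPage() is a hypothetical helper name.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,              // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,              // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,                // give up after 10 seconds
        CURLOPT_USERAGENT      => 'MySearchBot/0.1', // hypothetical user agent
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("Request failed: $error");
    }
    curl_close($ch);
    return $body;
}

// Usage (requires network access):
// $html = fetchPage('https://en.wikipedia.org/wiki/PHP');
```

`file_get_contents()` on a URL also works when `allow_url_fopen` is enabled, but cURL gives you timeouts, redirects, and headers in one place.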

This is easy enough to do with PHP. If the sites you are getting the data from are valid XHTML, it will be extremely easy to process the pages and extract the data with the SimpleXML extension.
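For example, assuming a well-formed XHTML page (the snippet below is made up and stands in for a downloaded page), SimpleXML plus XPath can pull out specific elements. XHTML elements live in the `http://www.w3.org/1999/xhtml` namespace, so the XPath query needs a registered prefix:

```php
<?php
// Made-up XHTML standing in for a downloaded encyclopedia page.
$xhtml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>PHP</h1>
    <p class="summary">PHP is a general-purpose scripting language.</p>
  </body>
</html>
XML;

$doc = simplexml_load_string($xhtml);
if ($doc === false) {
    throw new RuntimeException('Page is not well-formed XML');
}

// Bind a prefix to the XHTML namespace so XPath can match its elements.
$doc->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');

$title   = (string) $doc->xpath('//x:h1')[0];
$summary = (string) $doc->xpath('//x:p[@class="summary"]')[0];

echo "$title: $summary\n";
```

`simplexml_load_string()` returns `false` for anything that is not well-formed XML, which is why this only works when the pages really are valid XHTML.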

Related

How do you process invalid HTML in PHP?

I've seen this question, which is very nice and informative. However, it doesn't deal with a rather common scenario.
Say I need to scrape a multitude of websites (or even pages on the same domain), but the author of a site didn't care much about their code, and it contains some seriously malformed markup "that kinda works". I still need to get information out of it.
How do I do that in this case? Ideally without going insane.
Is it possible? Do I have to resort to RegExp?
You need a DOM parser. PHP has one built in (DOMDocument), and there are alternatives as well (and more... just google for them). You can even run the garbled HTML through HTML Purifier first if you want.
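A minimal sketch with the built-in `DOMDocument`, which uses libxml's forgiving HTML parser. The markup is an invented example (unquoted attribute, unclosed `<li>` tags, no `<html>`/`<body>` wrapper); libxml repairs it into a tree you can query with XPath:

```php
<?php
// Invented malformed markup: unquoted attribute, unclosed <li> tags,
// and no <html>/<body> wrapper.
$html = '<ul class=results><li>Apple<li>Banana<li>Cherry</ul>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();            // discard the collected warnings

// Query the repaired tree with XPath.
$xpath = new DOMXPath($dom);
$items = [];
foreach ($xpath->query('//ul[@class="results"]/li') as $li) {
    $items[] = trim($li->textContent);
}

print_r($items);
```

The parser still raises warnings for the broken markup; `libxml_use_internal_errors(true)` keeps them out of your output so you can inspect them with `libxml_get_errors()` or simply discard them.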
I don't know how you are scraping the site, but working with RegExp will let you add many conditions to the scraping code. This may take time, depending on the number of patterns you need to match and your RegExp skills.
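A sketch of the regex approach, for comparison. The input and pattern are illustrative assumptions; a pattern like this breaks as soon as the markup varies in ways you didn't anticipate (another attribute before `href`, for instance), which is the trade-off described above:

```php
<?php
// Illustrative malformed input: mixed quote styles, unquoted value, no closing tags.
$html = "<a href='/wiki/Apple'>Apple<a href=\"/wiki/Banana\">Banana<a href=/wiki/Cherry>Cherry";

// Match <a href=...> with single, double, or no quotes around the value.
preg_match_all('/<a\s+href=["\']?([^"\'\s>]+)["\']?\s*>/i', $html, $matches);
$links = $matches[1]; // first capture group of every match

print_r($links);
```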
You could also run the site's HTML through Tidy, but in my opinion that can produce strange results as well.
Does it have to be PHP? Python has a wonderful library called Beautiful Soup ("You didn't write that awful page. You're just trying to get some data out of it"). In my experience it's good enough that, if you have the option, I'd write a quick Python script to parse your nodes into a clean file that your PHP can pick up.
(I know PHP is in the title and this doesn't directly answer your question. Apologies if you don't have the option of using Python (or dislike it); I just wanted to present a good alternative.)

Rendering a page in PHP: How?

This may be an inappropriate question for SO, but let's see :)
I'm writing a website in PHP. Every page load may make 10-20 DB requests.
Using the result of the DB queries I need to generate a page.
The page would contain a topic (either an image or text) followed by comments. There could be multiple topics like this.
Currently, I'm creating a string using the DB result and sending it to the browser.
When the browser receives the string (as an AJAX response), it parses it using split() and creates the HTML dynamically.
I'm basically a C++ programmer and relatively new to web development, so I don't have a good understanding of JS objects. How long a string can a JS variable hold? Is it OK to use split() and generate the HTML on the client?
I'm not generating the complete HTML on the server side, to avoid any overhead from string concatenation. I believe sending fewer characters to the client (as I'm doing) is better than sending the complete HTML.
Is something (or everything) wrong in my understanding? :)
Any help is appreciated.
EDIT:
I'd be very grateful for a yes/no opinion. What would you recommend: sending HTML to the client, or sending a string that the client uses to generate the HTML?
Unless you have a specific reason for doing so, I think you should look into generating the HTML with PHP and sending it directly to the browser. PHP was built specifically for this purpose.
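A minimal sketch of generating the markup in PHP. The `$topics` array is a hypothetical stand-in for the DB results, and the helper names are illustrative. Escaping with `htmlspecialchars()` matters because comments are user input:

```php
<?php
// Hypothetical shape of the DB results: topics with their comments.
$topics = [
    ['title' => 'First topic',  'comments' => ['Nice!', 'I <3 PHP']],
    ['title' => 'Second topic', 'comments' => []],
];

// Escape user-supplied text so it cannot inject markup.
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

function renderTopics(array $topics): string
{
    $parts = [];
    foreach ($topics as $topic) {
        $parts[] = '<div class="topic"><h2>' . escapeHtml($topic['title']) . '</h2><ul>';
        foreach ($topic['comments'] as $comment) {
            $parts[] = '<li>' . escapeHtml($comment) . '</li>';
        }
        $parts[] = '</ul></div>';
    }
    return implode("\n", $parts); // join the fragments once at the end
}

echo renderTopics($topics);
```

The cost of concatenating a few kilobytes of strings is typically small compared to running 10-20 database queries per page.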
I think you'd be best off looking at jQuery, and more specifically at its AJAX method. Also, take a look at JSON, and you should be good to go.
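If you do keep building the page in the browser, a sketch of the JSON side (using a hypothetical `$topics` shape like the one the question describes) replaces the custom delimited string, so the client can use `JSON.parse()` or jQuery's `$.getJSON()` instead of `split()`:

```php
<?php
// Hypothetical DB results, same shape as the topics/comments in the question.
$topics = [
    ['title' => 'First topic',  'comments' => ['Nice!', 'I <3 PHP']],
    ['title' => 'Second topic', 'comments' => []],
];

// Encode the results as JSON, failing loudly if encoding is impossible.
function topicsToJson(array $topics): string
{
    $json = json_encode(['topics' => $topics]);
    if ($json === false) {
        throw new RuntimeException('json_encode failed: ' . json_last_error_msg());
    }
    return $json;
}

// Only send the header when running under a web server.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/json; charset=utf-8');
}
echo topicsToJson($topics);
```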
Have you considered using a templating engine like Smarty?
It's pretty simple to use; take a look at the crash course, you might like it! http://www.smarty.net/crash_course

PhoneGap: use PHP to create the HTML?

I want to make apps for iPhone and Android, but as an enthusiastic PHP programmer, I'm not really willing to learn Java or C++. So I ran into PhoneGap.
BUT... I don't really like programming in pure HTML and JavaScript either! (All those hooks, commas in jQuery etc. are just too messy, in my opinion... and I hate CSS too.)
The way I use PHP now is that I have written a fairly advanced framework that processes clean XML templates into HTML/JavaScript. This way I can make my own custom HTML tags that do all the work of CSS and extra HTML, and it creates all the JavaScript for me automatically...
It's a bit like how Delphi for PHP and Prado work. Once the (visual!) PHP components are done, I can use them over and over again... and only have to think about HTML, CSS and JavaScript once, while building the component...
Okay, now my question: since I can't use PHP on the client with PhoneGap, but DO need the HTML, would it be a crazy idea to let my web server create the HTML for me the first time the app runs, store that HTML locally using PhoneGap, and reuse the locally stored HTML the next time the app is loaded?
So my question is: can I create HTML on a web server and then store that HTML locally so my PhoneGap app can use it? Or is the HTML in a PhoneGap app somehow 'compiled' so that it cannot be changed afterwards?
Or is this a really stupid idea and should I abandon my nice PHP-components framework? What are your thoughts?
It's hard to give a meaningful answer to a question like this without some context. The big question that you've left unanswered is: What are you really trying to do? What will the apps that you create do, and what will make yours different and better than apps that are already out there? Do you want to sell these apps in the app store? Are you trying to collect and/or disseminate mission-specific information for your company?
On the face of it, writing a bunch of HTML and PHP that'll execute on your server just to generate a bunch of HTML and JavaScript that'll execute in a PhoneGap application seems like a lot of trouble. OTOH, if that's what you're most comfortable with, and if you can get it to do what you want, go for it.
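If you do go for it, one hedged sketch of the idea is to run the framework once and save its output as static HTML files that the PhoneGap app loads. `renderPage()` below is a hypothetical stand-in for the asker's template engine, and the output directory is an assumption; this renders ahead of time rather than on the app's first launch:

```php
<?php
// Hypothetical stand-in for the asker's XML-template framework.
function renderPage(string $title, string $body): string
{
    $t = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html><head><title>$t</title></head>"
         . "<body>$body</body></html>\n";
}

// Write each rendered page to a static file that could be copied into the
// PhoneGap project's www/ folder (the output directory is an assumption).
function exportPages(array $pages, string $outDir): array
{
    if (!is_dir($outDir) && !mkdir($outDir, 0777, true)) {
        throw new RuntimeException("Cannot create $outDir");
    }
    $written = [];
    foreach ($pages as $file => [$title, $body]) {
        $path = $outDir . DIRECTORY_SEPARATOR . $file;
        if (file_put_contents($path, renderPage($title, $body)) === false) {
            throw new RuntimeException("Cannot write $path");
        }
        $written[] = $path;
    }
    return $written;
}

$files = exportPages(
    ['index.html' => ['Home', '<h1>Welcome</h1>']],
    sys_get_temp_dir() . '/phonegap-www'
);
print_r($files);
```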
If you give your framework a catchy name and make some bold assertions about how it's the newest, fastest best way to develop mobile applications, you can probably turn it into a book deal. ;-) Until that happens though, you'll have a hard time finding answers to questions about writing iPhone apps using PHP.

Passing of Web Scraping Data

I'm currently writing an application that will extract data from a few different websites to be passed back to my app, parsed, formatted, and displayed. The problem I keep running into is passing in the data and displaying it graphically. I was hoping to use HTML5 for this, and all of my scraping is set up in PHP. Of course, drawing in HTML5 requires JavaScript, and getting my PHP output to JavaScript seems messy. Am I missing a better way to architect this solution?
It seems as good a way as any to me, except that it's not very backwards-compatible (older browsers don't support HTML5 canvas), so it might be better to do the graphing server-side.
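If you keep the HTML5 approach, handing PHP data to JavaScript is less messy than it sounds: encode it with `json_encode()` and embed it in the page. A minimal sketch, with made-up data and a hypothetical helper name:

```php
<?php
// Made-up result of the scraping step.
$series = ['labels' => ['Mon', 'Tue', 'Wed'], 'values' => [3, 7, 5]];

// Emit a <script> tag that defines a JS variable holding the data.
// $varName must be a trusted identifier; it is not escaped.
function jsonScriptTag(string $varName, array $data): string
{
    // JSON_HEX_TAG escapes < and > so scraped text containing "</script>"
    // cannot close the script element early.
    $json = json_encode($data, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
    return "<script>var $varName = $json;</script>";
}

echo jsonScriptTag('chartData', $series), "\n";
```

The page's own script can then read `chartData.values` and draw it on a `<canvas>`.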
If you want to do graphics directly in php you may want to look at using GD or ImageMagick.
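A minimal GD sketch, assuming the gd extension is installed; the chart layout and function name are illustrative. It returns PNG bytes, so a script can send them with `header('Content-Type: image/png')` and the page can point an `<img>` tag at that script:

```php
<?php
// Minimal sketch: render a bar chart as PNG bytes with GD.
// Assumes ext-gd is installed; barChartPng() is a hypothetical helper.
function barChartPng(array $values, int $width = 300, int $height = 150): string
{
    if ($values === []) {
        throw new InvalidArgumentException('need at least one value');
    }
    $img   = imagecreatetruecolor($width, $height);
    $white = imagecolorallocate($img, 255, 255, 255);
    $blue  = imagecolorallocate($img, 40, 90, 200);
    imagefilledrectangle($img, 0, 0, $width - 1, $height - 1, $white); // background

    $max      = max($values) ?: 1;               // avoid dividing by zero
    $barWidth = intdiv($width, count($values));
    foreach (array_values($values) as $i => $v) {
        $barHeight = (int) round(($v / $max) * ($height - 10));
        if ($barHeight <= 0) {
            continue;                            // nothing to draw for zero/negative values
        }
        imagefilledrectangle(
            $img,
            $i * $barWidth + 5, $height - $barHeight,   // top-left corner
            ($i + 1) * $barWidth - 5, $height - 1,      // bottom-right corner
            $blue
        );
    }

    ob_start();            // capture imagepng()'s output instead of sending it
    imagepng($img);
    return (string) ob_get_clean();
}

// In a chart.php script: header('Content-Type: image/png'); echo barChartPng([3, 7, 5]);
```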

Can I execute js files via php?

Here's the situation:
I have a PHP file that parses a web page. On that page there is a phone number whose digits are scrambled. The only way to put each digit in the correct place is to use some JS functions (on the client side). So when I run that PHP file in the Linux console, it gives me everything I need except the result of the JS functions (no wonder: JavaScript is not a server-side language). All I get from the JS is its source code.
The question: can I execute js files via php and how?
Results of a quick google search (terms = javascript engine php)
J4P5: not developed since 2005, according to its news page (bad)
PECL package spidermonkey
A 2008 post by jeresig points to PHPJS, but I can't see when it was last updated.
I'm sure you'll find many more links on that google search.
Alternatively:
You say that the digits are scrambled and you need to "unscramble" them using JS. Could you code that unscrambling logic into a PHP function and just use it? That would certainly save you a lot of trouble, but if learning to use JS in PHP is what you're after, then it's a whole different story...
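What that PHP port looks like depends entirely on the page's JS, which isn't shown. As a sketch, assuming a hypothetical scheme where the page ships the digits in shuffled order plus the index each one belongs at:

```php
<?php
// Hypothetical scheme: $digits are shuffled, and $positions[$i] is the
// index where $digits[$i] belongs. The real page's JS must be read to port it.
function unscrambleDigits(array $digits, array $positions): string
{
    if (count($digits) !== count($positions)) {
        throw new InvalidArgumentException('digits and positions differ in length');
    }
    $result = [];
    foreach ($digits as $i => $digit) {
        $result[$positions[$i]] = $digit; // place each digit at its target index
    }
    ksort($result);                       // order by target index
    return implode('', $result);
}

echo unscrambleDigits(['4', '1', '3', '2'], [3, 0, 2, 1]), "\n"; // prints 1234
```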
Zend tutorial: "Using javascript in PHP with PECL and spidermonkey"?
http://en.wikipedia.org/wiki/Comparison_of_Server-side_JavaScript_solutions
Alternatively, PHP has simple functions for executing other programs and retrieving their output (I used this along with GPG once to create a PHP file manager which could live-encrypt files as you were uploading them and live-decrypt as you were downloading them)
Using those functions along with V8 (http://code.google.com/p/v8/), you should be able to run JavaScript.
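A minimal sketch of that approach, assuming a stand-alone JavaScript runtime is installed. It swaps in Node.js (which is built on V8) as the `node` command instead of embedding V8 directly; the helper name is hypothetical. The script has to print its result, since PHP only captures its output:

```php
<?php
// Assumes Node.js is installed and on the PATH as `node`; runJs() is hypothetical.
function runJs(string $code): string
{
    $output = [];
    $status = 0;
    // escapeshellarg() quotes the script so the shell passes it through intact;
    // 2>&1 folds error messages into $output.
    exec('node -e ' . escapeshellarg($code) . ' 2>&1', $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("node exited with $status: " . implode("\n", $output));
    }
    return implode("\n", $output);
}

// Usage: the JS must print its result so PHP can capture it.
// echo runJs('console.log([3, 1, 2].sort().join(""))'); // "123"
```

This only helps for pure computation; code that touches `document` or other browser objects won't run under Node without some DOM emulation.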
Not unless you know someone who's implemented a Javascript engine in PHP.
In other words, probably not.
Without some sort of browser emulation or passing the unparsed js off to a server side implementation of javascript (maybe node.js?), you won't be able to execute it.
However, does the page use the same js function to unscramble the phone number every time? You should be able to read the incorrect digits and shuffle them with PHP.
If you're prepared to do a bit of work building your own JS runtime to work with it, Tim Whitlock has written a JavaScript tokenizer and parser in pure PHP.
node.js is server-side, but it's pure JS :) There's no PHP in it, so I don't think it answers your needs...
Anyway, here is an example: a chat in JS on both the client and the server side: http://chat.nodejs.org/
Plus, not every host allows you to use the v8 engine...
If you have Javascript data objects, and you need to convert them to/from PHP arrays, that's quite easy using PHP's json_encode() and json_decode() functions.
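For example (the data is made up), a JS object serialized with `JSON.stringify()` round-trips through PHP like this:

```php
<?php
// Made-up data, as it might arrive from JSON.stringify() on the client.
$fromJs = '{"name":"Ann","digits":[1,2,3]}';

$data = json_decode($fromJs, true);   // true => associative arrays, not stdClass
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

$data['digits'][] = 4;                // modify it as a normal PHP array
$backToJs = json_encode($data);

echo $backToJs, "\n"; // {"name":"Ann","digits":[1,2,3,4]}
```

Note that a JavaScript object literal is not automatically valid JSON: `{name: 'Ann'}` (unquoted key, single quotes) makes `json_decode()` return `null`.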
But actually running JavaScript code? No, you can't. You might be able to find a JS interpreter written in PHP (a few other answers have pointed to links that may or may not help here), or, more likely, execute the JS with a stand-alone JS interpreter on your server that you call out to from PHP. However, if the JS code references the browser's DOM (which is highly likely), that's a whole other set of issues that will almost certainly make it impossible.
Given the way you describe the problem, I'd say the easiest solution would be to re-implement the JS code in PHP; the workarounds being suggested are unlikely to be appropriate for what sounds like a fairly simple bit of utility code.
