How to retrieve webpage data (not source code) using Android? - php

I am trying to pass http variables to a page on my website containing some PHP code, and retrieve a response using Android.
I pointed URL.openStream() at the desired website and collected the first string using a BufferedReader, but it gave me the first line of the source code, as opposed to what a browser would see when navigating the page.
This question is difficult to ask because I am not familiar enough with web terminology to describe exactly what I want, but: using Android, how would I retrieve what my browser sees on a page, and not the actual source code for the page?

I think "webview" is what you are looking for.
Useful link: http://developer.android.com/reference/android/webkit/WebView.html

Related

Get OuterHtml from another website with php

I feed my website with info coming from a table on another website. I used to get the needed info with:
$html = file_get_contents('http://www.example.ex');
and then work with it through regular expressions.
Unfortunately, the other website has changed, and now the source code is not an HTML table anymore.
But if I inspect the element that has the info (in the Chrome browser), I can see it is still a table, and I can copy the outer HTML of that element and paste it into my files.
Is there a more professional way to capture that info (the outer HTML of an element, or the whole page) than copy-and-paste? Thanks to everyone.
Maybe this post is useful to you: Stack Overflow post
If that doesn't work, someone over there suggests a PHP web-scraping framework called Goutte, which could be more useful to you if the website changes again.
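As a more professional version of the same idea, a DOM parser is usually enough. Below is a minimal sketch using PHP's DOMDocument and DOMXPath to grab the outer HTML of the first table on the page; it assumes the table is actually present in the HTML the server sends (if the new site now builds it with client-side JavaScript, file_get_contents will not see it) and PHP 5.3.6+ so saveHTML() can take a node:
$html = file_get_contents('http://www.example.ex');

$doc = new DOMDocument();
libxml_use_internal_errors(true);            // tolerate real-world, non-valid HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$table = $xpath->query('//table')->item(0);  // adjust the XPath to the element you need

if ($table !== null) {
    echo $doc->saveHTML($table);             // outer HTML of the node itself
} else {
    echo "Table not found in the served source.\n";
}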

Is it possible yet to make a single-page web page without using server-side code such as PHP?

I'm looking to make a simple web page (mainly used on the local machine) that would just be a single file (such as .htm or .html) but would dynamically change based on the URL.
For example, if I went to 'file:///C:/Sandbox/test.htm' it might display the following...
Hello World
But if I went to 'file:///C:/Sandbox/test.htm?page=2' it might display the following...
You are visiting my second page!
I know I can do this type of thing with PHP or ASP, but is it possible to do it with HTML or JavaScript, or anything "native" to the browser yet?
Thanks in advance for the help!
Client-side JavaScript can see the query string of a URL; it can be accessed through location.search.
You could have your JavaScript show and hide different sections of the page based on the information in the query string.

Pointing crawler to HTML snapshot

I'm trying to make my AJAX website crawlable:
Here is the website in question.
I've created an htmlsnapshot.php that generates the page (this file needs to be passed the hash fragment to be able to generate the right content).
I don't know how to get the crawler to load this file while getting normal users to load the normal file.
I don't really understand what the crawler does to the hash fragment (and this probably is part of my problem.)
Does anybody have any tips?
The crawler will divert itself. You just need to configure your PHP script to handle the GET parameters that Google will be sending your site (instead of relying on the AJAX).
Basically, when Google finds a link to yourdomain.com/#!something, then instead of requesting / and running the JavaScript that makes an AJAX request for something, Google will automatically (without you doing anything) translate everything that comes after #! in your URL into ?_escaped_fragment_=something.
You just need to (in your PHP script) check if $_GET['_escaped_fragment_'] is set, and if so, display the content for that value of something.
It's actually very easy.
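To make that concrete, here is a minimal sketch of what the check could look like in a snapshot script such as htmlsnapshot.php; renderSnapshot() and renderAjaxShell() are hypothetical placeholders for whatever your page already does:
<?php
// renderSnapshot() and renderAjaxShell() are hypothetical placeholders.
if (isset($_GET['_escaped_fragment_'])) {
    // Googlebot asked for /?_escaped_fragment_=something in place of /#!something:
    // serve the static HTML snapshot for that fragment.
    renderSnapshot($_GET['_escaped_fragment_']);
} else {
    // Normal visitor: serve the regular page and let the JavaScript
    // read the #! fragment and load the content via AJAX.
    renderAjaxShell();
}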

Using YQL in javascript/php to scrape article html?

I'm new to YQL, and just trying to learn how to do some fairly simple tasks.
Let's say I have a list of URLs and I want to get their HTML source as a string in JavaScript (so I can later insert it into a database via AJAX). How would I go about getting this info back in JavaScript? Or would I have to do it in PHP? I'm fine with either, really - whatever can work.
Here's an example query I'd run on their console:
select * from html where url="http://en.wikipedia.org/wiki/Baroque_music"
And the goal is essentially to save the HTML, or maybe just the text, as a string.
How would I go about doing this? I somewhat understand how the querying works, but not really how to integrate it with JavaScript and/or PHP (say I have a list of URLs and I want to loop through them, getting the HTML at each one and saving it somewhere).
Thanks.
You can't read other pages with JavaScript due to a built-in security feature in web browsers called the same-origin policy.
The usual method is to scrape the content of these sites from the server using PHP.
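As a rough sketch of that server-side route (plain PHP, skipping YQL entirely; the Wikipedia URL is just the one from the question), you could loop over your list of URLs, fetch each page, and keep the HTML or its text as a string:
$urls = array('http://en.wikipedia.org/wiki/Baroque_music');  // your list of URLs

foreach ($urls as $url) {
    $html = file_get_contents($url);        // full HTML source as a string

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);       // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $text = $doc->textContent;              // plain text of the whole document

    // Save $html or $text to your database here.
}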
There is another option with JavaScript, called a bookmarklet.
You can add the bookmarklet to your bookmarks bar, and each time you want the content of a site, click the bookmark.
A script will be loaded into the host page; it can read the content and post it back to your server.
Oddly enough, the same-origin policy does not prevent you from POSTing data from this host page to your domain. You need to POST a form to an iframe whose source is hosted on your domain.
You won't be able to read the response you get back from the POST.
But you can poll with setInterval, making a JSONP call to your domain, to know whether the POST was successful.

Crawl Website using PHP

I've tried a bunch of techniques to crawl this URL (see below), and for some reason the title comes back incorrect. If I look at the page with Firebug I can see the correct title tag; however, if I view the page source it's different.
Using several PHP techniques I get the same result. Digg is able to crawl the page and parse the correct title.
Here's the link: http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
The correct title is "How to Make Your iPhone (or Other iOS Device) More Like Android"
The parsed title is "Lifehacker, tips and downloads for getting things done"
Is this normal? How are they doing this? Is there a way to get the correct title?
That's because when you request it using PHP (without any JS support) you're getting the main page of Lifehacker, which is lifehacker.com.
Lifehacker switched their CMS recently so that all requests go to an initial page, and everything after the hashbang is read by a JS script on that page to figure out which content needs to be served. You need to modify your program to take this into account.
EDIT
Have a gander at these links:
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch
Found the answer:
http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
becomes:
http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android
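A minimal sketch of that rewrite in PHP, assuming file_get_contents is how you're fetching the page (the DOMDocument title parsing is just one option; adapt it to whatever your crawler already uses):
<?php
$url = 'http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android';

// Move everything after "#!" into the ?_escaped_fragment_= query parameter.
// (Special characters in the fragment may need URL-encoding.)
if (strpos($url, '#!') !== false) {
    list($base, $fragment) = explode('#!', $url, 2);
    $url = $base . '?_escaped_fragment_=' . $fragment;
}

$html = file_get_contents($url);

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$titleNode = $doc->getElementsByTagName('title')->item(0);
echo $titleNode ? $titleNode->textContent : 'No title found';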
