I feed my website with info coming from a table in another website. I used to get the needed info with:
$html = file_get_contents('http://www.example.ex');
and then work with it through regular expressions.
Unfortunately, the other website has changed, and now the source code is not an HTML table anymore.
But if I inspect the element with the info (Chrome browser), I find that it is a table, and I can "copy" the "Outer-HTML" of that element and "paste" it into my files.
Is there a more "professional" way than copy-paste to capture that info (the Outer-HTML of an element or of the whole page)? Thanks to everyone.
Maybe this post is useful to you: Stackoverflow Post
If that doesn't work, someone over there suggests a PHP web scraping framework called Goutte, which could be more useful to you if the website changes again.
I am trying to pass http variables to a page on my website containing some PHP code, and retrieve a response using Android.
I pointed URL.openStream() at the desired website and read the first string using BufferedReader, but it gave me the first line of the source code, as opposed to what a browser would show when navigating to the page.
This question is difficult to ask because I am not familiar enough with web language to describe exactly what I want, but...
Using Android, how would I retrieve what my browser sees on a page, and not the actual source code for the page?
I think "webview" is what you are looking for.
Useful link: http://developer.android.com/reference/android/webkit/WebView.html
What I am trying to do is make something similar to what I see all the time on almost any website: the button that says Share to Facebook. My goal is to let my guests share the item they are viewing in my store (running on PrestaShop) on the blog I run (running on Oxwall).
The button should not only link to the blog post submission page but also pre-fill the subject line with the name of the item being shared, and the blog post should display the information about the item. I would like to do all this using PHP. I am not sure how to go about it, but I am sure that I could pass the value. Please note that I can modify BOTH the blog site and the shop, as I run both and want to connect them.
As an extra bonus, I am also running a forum using phpBB3; if I could do the same thing there as well, I would be very grateful. I am trying to interlink everything into one big network. I know it's not an easy task, but I am sure there is an easy way to pass data to the other site so that this can be done.
Facebook has two ways to get information about an item from a page: it parses the page looking for the most common tags, and it uses Open Graph.
You can also provide the product information in the head of your page (between the head tags); then, on the blog side, you retrieve only that content and parse it as XML.
I advise you to cache this data to avoid needless connections between the websites and heavy load while parsing.
You can use your own specification, Open Graph, or another standard, but I advise using a standard.
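As a rough illustration of the shop side of this suggestion, here is a minimal sketch that prints Open Graph meta tags for one product. The helper name `renderOpenGraphTags`, the array keys, and the sample product are all made up for the example; PrestaShop has its own templating, so treat this as the idea rather than a drop-in module.

```php
<?php
// Hypothetical helper: builds Open Graph <meta> tags for a product page.
// The array keys (name, description, image, url) are assumptions for this sketch.
function renderOpenGraphTags(array $product): string
{
    $tags = [
        'og:title'       => $product['name'],
        'og:description' => $product['description'],
        'og:image'       => $product['image'],
        'og:url'         => $product['url'],
    ];

    $html = '';
    foreach ($tags as $property => $content) {
        // Escape values so quotes or angle brackets cannot break the markup.
        $html .= sprintf(
            "<meta property=\"%s\" content=\"%s\" />\n",
            htmlspecialchars($property, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($content, ENT_QUOTES, 'UTF-8')
        );
    }
    return $html;
}

// Sample data, invented for the example.
$sampleProduct = [
    'name'        => 'Blue "Retro" Mug',
    'description' => 'A ceramic mug.',
    'image'       => 'http://shop.example/img/mug.jpg',
    'url'         => 'http://shop.example/mug',
];

// Print this inside the <head> of the product page.
echo renderOpenGraphTags($sampleProduct);
```

The blog (or forum) can then fetch the product URL, read these tags, and cache the result as advised above.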
I'm building a website and am looking for a way to implement a certain feature that Facebook has. The feature that I am looking for is the link inspector. I am not sure that is what it is called, or what it's called for that matter. It's best I give you an example so you know exactly what I am looking for.
When you post a link on Facebook, for example a link to a YouTube video (or any other website for that matter), Facebook automatically inspects the page that the link leads to and imports information like the page title, favicon, and some other images, and then adds them to your post as a way of giving (what I think is) a brief preview of the page to anyone reading that post.
I already have a feature that allows users to share a link (or URLs). What I want is to do something useful with the URL, to display something other than just a plain link to a webpage, to give someone viewing a shared link (in the form of a post) some useful insight into the page that the URL leads to.
What I'm looking for is a script, or tutorial, or at the very least someone to point me in the right direction, so that it can help me accomplish this (using PHP preferably).
I've tried googling it but I don't know exactly what such a feature would be called and google isn't helpful when you don't exactly know what you're looking for.
I figure someone out there, in this vast knowledge basket called stackoverflow, can help me with this. Can anyone help me?
You would first scan the post for URLs using regex, then you would parse the pages those links reference with a PHP DOMDocument. You could use the parsed document to obtain any information you need from the webpage.
DOMDocument:
http://php.net/manual/en/class.domdocument.php
DOMDocument->load (loads a file, aka a webpage):
http://php.net/manual/en/domdocument.load.php
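A minimal sketch of those two steps might look like the following. It is only one way to do it: the function names `extractUrls` and `summarizePage` are invented here, the URL pattern is deliberately simple, and the sketch uses `loadHTML` rather than `load`, because `load` expects well-formed XML and most real pages are not.

```php
<?php
// Find URLs in a user's post. The pattern is intentionally simple and
// is an assumption of this sketch, not a complete URL grammar.
function extractUrls(string $text): array
{
    preg_match_all('~https?://[^\s<>"\']+~i', $text, $matches);
    return $matches[0];
}

// Pull the <title> and the favicon link out of an HTML string.
function summarizePage(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // string(...) returns the text of the first matching node, or '' if none.
    $title = trim($xpath->evaluate('string(//title)'));
    $icon  = $xpath->evaluate('string(//link[contains(@rel, "icon")]/@href)');

    return ['title' => $title, 'icon' => $icon];
}
```

In practice you would fetch each URL returned by `extractUrls` (for example with `file_get_contents`) and pass the response body to `summarizePage`. The favicon href may be relative, so it usually needs to be resolved against the page URL before display.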
The link goes through http://www.facebook.com/l.php
You pass a URL to this and Facebook filters it.
I want to add a function to my PHP/mysql/jQuery website.
The function is that if a user pastes a link into an input box, the server retrieves all representative pictures, just as Facebook does.
Is there a PHP project or jQuery plugin that does this?
There are lots of services.
Take a look at websnaper, for example, or just google it.
It is not hard to write your own from scratch.
Facebook uses the Open Graph Protocol - it retrieves the page and then looks for special meta tags that describe the images associated with that page (og:image).
I guess you can write a basic HTML parser that would do the same.
EDIT: Someone has already written an Open Graph parser
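For reference, a basic version of such a parser can be sketched with PHP's built-in DOM extension. This is not the parser mentioned above, just an illustration; the function name `parseOpenGraph` and the sample markup are made up.

```php
<?php
// Collect every Open Graph <meta> tag from an HTML string.
// Values are grouped by property because a page may declare several og:image tags.
function parseOpenGraph(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    $xpath  = new DOMXPath($doc);
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        $result[$meta->getAttribute('property')][] = $meta->getAttribute('content');
    }
    return $result;
}

// Sample markup, invented for the example.
$sampleHtml = '<html><head>'
    . '<meta property="og:title" content="Example article" />'
    . '<meta property="og:image" content="http://example.com/a.jpg" />'
    . '<meta property="og:image" content="http://example.com/b.jpg" />'
    . '</head><body></body></html>';

print_r(parseOpenGraph($sampleHtml)['og:image']);
```

Pages without Open Graph tags return an empty array, so a fallback (for instance, collecting the `src` of ordinary `img` elements) is probably worth having.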
I've tried a bunch of techniques to crawl this url (see below), and for some reason the title comes back incorrect. If I look at the source of the page with firebug I can see the correct title tag, however, if I view the page source it's different.
Using several php techniques I get the same result. Digg is able to crawl the page and parse the correct title.
Here's the link: http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
The correct title is "How to Make Your iPhone (or Other iOS Device) More Like Android"
The parsed title is "Lifehacker, tips and downloads for getting things done"
Is this normal? How are they doing this? Is there a way to get the correct title?
That's because when you request it using PHP (without any JS support) you're getting the main page of lifehacker, which is lifehacker.com. The part of a URL after the # is never sent to the server.
Lifehacker switched their CMS recently so that all requests go to an initial page, and a JS script in that page then reads everything after the hashbang to figure out which page needs to be served. You need to modify your program to take this into account.
EDIT
Have a gander at these links
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch
Found the answer:
http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android
becomes:
http://lifehacker.com/?_escaped_fragment_=5772420/how-to-make-ios-more-like-android
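The rewrite above can be automated. The sketch below is one possible helper (the name `toEscapedFragmentUrl` is made up), and the set of characters it percent-encodes follows my reading of the AJAX crawling scheme linked earlier, so treat that part as an assumption.

```php
<?php
// Rewrite a "#!" (hashbang) URL into its "_escaped_fragment_" form.
// URLs without "#!" are returned unchanged.
function toEscapedFragmentUrl(string $url): string
{
    $pos = strpos($url, '#!');
    if ($pos === false) {
        return $url;
    }
    $base     = substr($url, 0, $pos);
    $fragment = substr($url, $pos + 2);

    // Percent-encode control characters, space, #, %, &, + and non-ASCII bytes.
    // This character set is an assumption based on the crawling scheme.
    $escaped = preg_replace_callback(
        '/[\x00-\x20#%&+\x7F-\xFF]/',
        function (array $m): string {
            return sprintf('%%%02X', ord($m[0]));
        },
        $fragment
    );

    // Use "?" if the base has no query string yet, otherwise "&".
    $separator = (strpos($base, '?') === false) ? '?' : '&';
    return $base . $separator . '_escaped_fragment_=' . $escaped;
}

echo toEscapedFragmentUrl('http://lifehacker.com/#!5772420/how-to-make-ios-more-like-android');
```

You can then request the rewritten URL with the same PHP code you already use. This only helps when the site implements the scheme on its side.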