Does anyone have any idea how to generate an excerpt from any given article page (so it could source from many types of sites)? Something like what Facebook does when you paste a URL into a post. Thank you.
What you're looking to do is called web scraping. The basic method for doing so would be to capture the page (you can scrape a URL using file_get_contents), and then somehow parse it for the content that you want (i.e. pull out content from the <body> tag).
In order to parse the returned HTML, you should use a DOM parser. PHP has its own DOM classes which you can use.
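For instance, here is a minimal sketch using file_get_contents() and PHP's built-in DOMDocument; the URL is just a placeholder, and a real version would also want to check meta tags such as og:description and handle fetch errors:

<?php
$url  = 'http://example.com/some-article';   // placeholder URL
$html = file_get_contents($url);

$doc = new DOMDocument();
libxml_use_internal_errors(true);            // real-world HTML is rarely valid
$doc->loadHTML($html);
libxml_clear_errors();

// Build a rough excerpt from the first few paragraphs.
$excerpt = '';
foreach ($doc->getElementsByTagName('p') as $p) {
    $excerpt .= ' ' . trim($p->textContent);
    if (strlen($excerpt) > 300) {
        break;
    }
}
echo substr(trim($excerpt), 0, 300) . '...';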
Here is a video tutorial about how to do that:
http://net.tutsplus.com/tutorials/php/how-to-create-blog-excerpts-with-php/
I want to parse a streaming website in PHP to find the streaming links and codes.
The streaming page, i.e. index.htm, looks like:
<html>
<iframe something something src="xyz.aspx?id=abc"></iframe>
</html>
When I parse the main page I can see those codes, but I need to parse the code of xyz.aspx?id=abc as well.
If I parse that page directly, it doesn't show the streaming codes.
The main page and the iframe page are somehow connected.
Is there any way to parse both at the same time, perhaps by sending a referrer or something, so the second page thinks it is being loaded on the stream page?
Any clue? cURL?
Since it seems like you are only searching for a couple of codes, you might want to write a regex (GASP! I KNOW, REGEX + HTML = BAD, hear me out). I'd write a regex that captures all <iframe> tags, then parse those for the src attribute, and then get the information that you need.
Edit: I don't think I understood your question entirely; you also want to embed the iframed page as if it were viewed in a browser?
If so, read on!
OK, now that you have the src attribute of the <iframe>, just download the linked page (I'm a little rusty with PHP, but I'm sure you can find one of the thousands of functions that would work) and str_replace() the <iframe> tag with the downloaded code.
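Something along these lines, as a rough sketch; the regex and URLs are only illustrative, and a relative src would still need to be resolved against the main URL:

<?php
$mainUrl  = 'http://example.com/index.htm';  // the stream page from the question
$mainHtml = file_get_contents($mainUrl);

// Capture every <iframe ... src="..."> tag (fragile, but fine for a quick job).
if (preg_match_all('/<iframe[^>]+src=["\']?([^"\'\s>]+)["\']?[^>]*>/i', $mainHtml, $m)) {
    foreach ($m[1] as $i => $src) {
        // Fetch the framed page, sending a Referer header in case the server checks it.
        $ctx = stream_context_create(array('http' => array('header' => "Referer: $mainUrl\r\n")));
        $framedHtml = file_get_contents($src, false, $ctx);

        // Swap the <iframe> tag for the downloaded markup.
        $mainHtml = str_replace($m[0][$i], $framedHtml, $mainHtml);
    }
}
echo $mainHtml;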
I have read a lot of articles explaining how to parse an HTML file with PHP, but in Twitter's case the page uses iframes where the text is hidden. How can I parse the Twitter HTML?
I know it is very easy to use the API, an .rss page, or JSON to get the tweets, but I want to be able to work with the Twitter HTML page directly. Is there any way I could find the tweets using their HTML page?
The best way would be to use something like Simple HTML DOM. With this you can use CSS selectors, as in jQuery, to find the elements on the page you are looking for. However, Twitter pages use a lot of JavaScript and Ajax, so you may be stuck with using the API, or you could try it with the mobile site.
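For example, a sketch with Simple HTML DOM against the mobile site; the profile URL and the .tweet-text selector are assumptions, since Twitter changes its markup often, so inspect the page and adjust the selector:

<?php
include 'simple_html_dom.php';   // the Simple HTML DOM library file

$html = file_get_html('http://mobile.twitter.com/someuser');   // hypothetical profile URL
if ($html) {
    // The selector below is a guess at the class used for tweet text.
    foreach ($html->find('.tweet-text') as $tweet) {
        echo trim($tweet->plaintext) . "\n";
    }
}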
I'm trying to use some data from another site as a news feed on mine that will automatically update. I have permission to use this information.
I'm trying to decide whether to just use the RSS feed and add that to my site, or to use the cURL command.
What do you recommend using?
I want the text to go into a rectangular space in a div on my page so I can customize it to go with the colour and design of my page.
Thanks
If there is an RSS feed, use that! That's what it's for.
Get the text of the RSS, and display it in a div.
You should also link to the page, which is provided in the RSS, as that is in most cases something the owner would want.
Use the simplexml_load_file() function to read the RSS, and process it with SimpleXML. See the documentation:
http://nl2.php.net/simplexml_load_file
http://nl2.php.net/manual/en/book.simplexml.php
You should be able to handle it on your own from there. If you can't, try some tutorials on SimpleXML or even PHP in general.
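As a minimal sketch (the feed URL is a placeholder, and the loop assumes a standard RSS 2.0 feed with channel/item elements):

<?php
$feed = simplexml_load_file('http://example.com/news/rss.xml');   // placeholder feed URL

echo '<div class="news-box">';
foreach ($feed->channel->item as $item) {
    // Link back to the original page, as the owner will usually want.
    printf('<p><a href="%s">%s</a><br />%s</p>',
        htmlspecialchars((string) $item->link),
        htmlspecialchars((string) $item->title),
        htmlspecialchars((string) $item->description));
}
echo '</div>';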
I need to extract data from a URL:
the title, description, and any videos or images in the given URL,
like the Facebook share button does.
Like this:
http://www.facebook.com/sharer.php?u=http://www.wired.com&t=Test
regards
Embed.ly has a nice API for exactly this purpose. Their API returns the site's oEmbed data if available; otherwise, it attempts to extract a summary of the page, like Facebook does.
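Roughly like this; the endpoint and parameter names are written from memory, so check the Embed.ly documentation, and you need your own API key:

<?php
$target   = 'http://www.wired.com';                 // the URL you want a preview of
$apiKey   = 'YOUR_EMBEDLY_KEY';                     // placeholder key
$endpoint = 'http://api.embed.ly/1/oembed?key=' . $apiKey . '&url=' . urlencode($target);

$data = json_decode(file_get_contents($endpoint), true);
if ($data) {
    echo $data['title'] . "\n";
    echo (isset($data['description'])   ? $data['description']   : '') . "\n";
    echo (isset($data['thumbnail_url']) ? $data['thumbnail_url'] : '') . "\n";
}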
Use something like cURL to get the page and then something like Simple HTML DOM to parse it and extract the elements you want.
If the website supports oEmbed, that's easier and more robust than scraping HTML:
oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.
oEmbed is supported by sites like YouTube and Flickr.
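Here is a sketch of the discovery step: look for the oEmbed <link> tag in the page head and fetch the JSON it points to. The regex assumes a particular attribute order, so a DOM parser would be more robust, and this only works on sites that actually publish oEmbed:

<?php
$pageUrl = 'http://www.youtube.com/watch?v=XXXXXXXXXXX';   // placeholder video URL
$page    = file_get_contents($pageUrl);

if (preg_match('/<link[^>]+type=["\']application\/json\+oembed["\'][^>]+href=["\']([^"\']+)["\']/i', $page, $m)) {
    $oembed = json_decode(file_get_contents(html_entity_decode($m[1])), true);
    echo $oembed['title'] . "\n";
    echo $oembed['html']  . "\n";   // ready-made embed markup, e.g. the video player
}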
I am working on a project for this exact issue; it is not as easy as writing an HTML parser and expecting sites to be 'semantic'. Extracting videos and finding auto-play parameters in particular are a killer. You can check the project at http://www.embedify.me, which also has a Facebook-style URL preview script. As I see it, Embed.ly and oEmbed are passive parsers: they need the sites (the so-called providers) to support them, which is quite a different approach from what Facebook does.
While I was looking for similar functionality, I came across a jQuery + PHP demo of the URL extraction feature of Facebook messages:
http://www.99points.info/2010/07/facebook-like-extracting-url-data-with-jquery-ajax-php/
Instead of using an HTML DOM parser, it works with simple regular expressions. It looks for the title, description, and img tags. As a result, the image extraction doesn't perform well on the many websites that use CSS for images. Also, Facebook first looks at its own Open Graph meta tags and then falls back to the classic HTML description tag, but the demo illustrates the principle well.
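Here is a rough sketch of that regex approach, extended with the og:image tag that many sites set for Facebook; regexes like these break easily on unusual attribute ordering, so a DOM parser is the safer route:

<?php
$html = file_get_contents('http://www.wired.com');   // example URL from above

$title = preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) ? trim($m[1]) : '';
$desc  = preg_match('/<meta[^>]+name=["\']description["\'][^>]+content=["\']([^"\']*)["\']/i', $html, $m) ? $m[1] : '';
$image = preg_match('/<meta[^>]+property=["\']og:image["\'][^>]+content=["\']([^"\']*)["\']/i', $html, $m) ? $m[1] : '';

echo $title . "\n" . $desc . "\n" . $image . "\n";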
On a website I am maintaining for a radio station, there is a page that displays news articles. Right now the news is posted in an HTML page, which is then read by a PHP page that includes all the navigation. I have been asked to turn this into an RSS feed. How do I do this? I know how to make the XML file, but the person who edits the news file is not technical and needs a WYSIWYG editor. Is there a WYSIWYG editor for XML? Once I have the feed, how do I display it on my site? I'm working with PHP on this site, so a PHP solution would be preferred.
Use Yahoo Pipes!: you don't need programming knowledge, and the load on your site will be lower. Once you've got your feed, display it on your site using a simple anchor with an image in HTML. You could also consider piping your feed through FeedBurner.
And as a freebie: if you want to track your feed awareness data in RSS, use my service here.
Do you mean that someone will insert the feed content by hand?
Usually feeds are generated from the site's news content, which you should already have in your database; you just need a PHP script that extracts it and writes the XML.
Edit: no database is used.
OK, then you have just two options:
Use PHP regular expressions to get the content you need from the HTML page (or maybe phpQuery).
As you said, write the XML by hand and then upload it. I haven't tried any WYSIWYG XML editors, sorry, but there are many on Google.
Does that PHP site have a database back end? If so, the WYSIWYG editor posts into that, and then a separate PHP file generates an RSS feed from it.
I've used the following IBM page as a guide and it worked wonderfully:
http://www.ibm.com/developerworks/library/x-phprss/
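If you do go the database route, the feed side can be as small as this sketch; the table and column names are made up, so adjust them to whatever the WYSIWYG editor actually writes into:

<?php
$db = new PDO('mysql:host=localhost;dbname=radio', 'user', 'pass');   // placeholder credentials

$xml = new SimpleXMLElement('<rss version="2.0"><channel/></rss>');
$xml->channel->addChild('title', 'Station News');
$xml->channel->addChild('link', 'http://example.com/news');
$xml->channel->addChild('description', 'Latest news articles');

// Hypothetical "news" table with title, url, body and posted columns.
foreach ($db->query('SELECT title, url, body, posted FROM news ORDER BY posted DESC LIMIT 10') as $row) {
    $item = $xml->channel->addChild('item');
    $item->addChild('title', htmlspecialchars($row['title']));
    $item->addChild('link', $row['url']);
    $item->addChild('description', htmlspecialchars($row['body']));
    $item->addChild('pubDate', date(DATE_RSS, strtotime($row['posted'])));
}

header('Content-Type: application/rss+xml');
echo $xml->asXML();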
I decided that instead of trying to find a WYSIWYG editor for XML, I would let the news editor continue to upload the news as HTML. I ended up writing a PHP program to find the <p> and </p> tags and create an XML file out of them.
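For reference, a stripped-down sketch of that idea; the file name and feed details are made up:

<?php
$newsHtml = file_get_contents('news.html');   // the HTML file the editor uploads
preg_match_all('/<p>(.*?)<\/p>/is', $newsHtml, $matches);

$xml = new SimpleXMLElement('<rss version="2.0"><channel/></rss>');
$xml->channel->addChild('title', 'Station News');
$xml->channel->addChild('link', 'http://example.com/news.php');
$xml->channel->addChild('description', 'News from the station');

// Turn each paragraph into an <item>, using its first words as the title.
foreach ($matches[1] as $paragraph) {
    $text = trim(strip_tags($paragraph));
    $item = $xml->channel->addChild('item');
    $item->addChild('title', htmlspecialchars(substr($text, 0, 60)));
    $item->addChild('description', htmlspecialchars($text));
}

$xml->asXML('news.xml');   // write the feed to disk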
You could use rssa.at: just put in your URL and it'll create an RSS feed for you. You can then let people sign up for alerts (hourly/daily/weekly/monthly) for free, and access stats.
If the HTML is consistent, you could just have them publish as normal and then scrape a feed. There are programmatic ways to do this for sure, but http://www.dapper.net/dapp-factory.jsp is a nice point-and-click feed scraping service. Then use MagpieRSS, SimplePie, or Feed.informer.com to display the feed.
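For the display side, here is a small SimplePie sketch; the include path and feed URL are placeholders, and in older SimplePie releases the library file is simplepie.inc:

<?php
require_once 'simplepie.inc';   // adjust to wherever your copy of SimplePie lives

$feed = new SimplePie();
$feed->set_feed_url('http://example.com/scraped-feed.xml');   // placeholder feed URL
$feed->init();

echo '<div class="feed-box">';
foreach ($feed->get_items(0, 5) as $item) {   // show the five newest items
    printf('<p><a href="%s">%s</a></p>',
        $item->get_permalink(),
        htmlspecialchars($item->get_title()));
}
echo '</div>';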