I have read a lot of articles explaining how to parse an HTML file with PHP, but in Twitter's case the page uses iframes where the text is hidden. How can I parse Twitter's HTML?
I know it is very easy to use APIs, an RSS page, or JSON to get the tweets/strings, but I want to be able to work with the Twitter HTML page directly. Is there any way I could find the tweets using their HTML page?
The best way would be to use something like Simple HTML DOM. With it you can use CSS selectors, as in jQuery, to find the elements on the page you are looking for. However, Twitter pages use a lot of JavaScript and Ajax, so you may be stuck with the API, or you could try the mobile site instead.
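As a minimal sketch of the same idea using PHP's built-in DOM extension instead of the Simple HTML DOM library (the HTML string and the `tweet` class are stand-ins for a real fetched page):

```php
<?php
// A literal snippet stands in for a page fetched with cURL or
// file_get_contents(); the class name "tweet" is hypothetical.
$html = '<div><p class="tweet">Hello world</p><p class="tweet">Second tweet</p></div>';

$doc = new DOMDocument();
// Suppress warnings about imperfect real-world markup.
@$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// XPath equivalent of the CSS selector "p.tweet".
$nodes = $xpath->query('//p[@class="tweet"]');

$tweets = array();
foreach ($nodes as $node) {
    $tweets[] = $node->textContent;
}

print_r($tweets);
```

Simple HTML DOM would let you write `$dom->find('p.tweet')` instead of the XPath query, but the approach is the same. Note that this only sees the static HTML; content injected later by JavaScript will not be there.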
The URL I want to scrape is https://www.tokopedia.com/juraganlim/info
and I just want to get the number of transactions, as in this image (the boxed value is what I need):
I'm really confused by the Ajax requests because I don't know which URLs the data comes from or goes to.
When I inspect the page with Firefox, it shows a great many requests.
Can anyone give me a clue, or even the script itself?
I'm trying to use some data from another site as a news feed on mine that will automatically update. I have permission to use this information.
I'm trying to decide whether to just use the RSS feed and add that to my site, or to use the curl command.
What do you recommend using?
I want the text to go into a rectangular space in a div on my page, so I can style it to match the colour and design of my page.
Thanks
If there is an RSS feed, use that! That's what it's for.
Get the text of the RSS, and display it in a div.
You should also link to the original page, whose URL is provided in the RSS, as that is in most cases what the owner would want.
Use the simplexml_load_file() function to read the RSS feed, then process it with SimpleXML. See the documentation:
http://nl2.php.net/simplexml_load_file
http://nl2.php.net/manual/en/book.simplexml.php
You should be able to handle it on your own. If you can't, work through some tutorials on SimpleXML or even PHP itself.
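A minimal sketch of the steps above, assuming an RSS 2.0 feed; an inline feed stands in here for `simplexml_load_file($url)` so the example is self-contained:

```php
<?php
// In practice: $feed = simplexml_load_file('http://example.com/feed.rss');
// A small literal feed stands in here.
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <description>Something happened.</description>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rss);

$html = '';
foreach ($feed->channel->item as $item) {
    // Link back to the source page, as the feed owner usually intends.
    $html .= '<div class="news-item"><a href="' . htmlspecialchars($item->link)
           . '">' . htmlspecialchars($item->title) . '</a><p>'
           . htmlspecialchars($item->description) . '</p></div>';
}
echo $html;
```

The `news-item` class is just an assumption; style that div however fits your page.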
Does anyone have any idea how to generate an excerpt from any given article page (so it could source from many types of sites)? Something like what Facebook does when you paste a URL into a post. Thank you.
What you're looking to do is called web scraping. The basic method is to capture the page (you can fetch a URL with file_get_contents) and then parse it for the content you want (i.e. pull out content from the <body> tag).
In order to parse the returned HTML, you should use a DOM parser. PHP has its own DOM classes which you can use.
Here is a video tutorial about how to do that:
http://net.tutsplus.com/tutorials/php/how-to-create-blog-excerpts-with-php/
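A rough sketch of the fetch-then-parse approach described above, using PHP's built-in DOM classes; a literal string stands in for the result of `file_get_contents($url)`, and taking the first paragraph as the excerpt is just one simple heuristic:

```php
<?php
// In practice: $html = file_get_contents('http://example.com/article');
$html = '<html><head><title>An Article</title></head>'
      . '<body><p>First paragraph of the article.</p>'
      . '<p>Second paragraph.</p></body></html>';

$doc = new DOMDocument();
// Suppress warnings caused by messy real-world HTML.
@$doc->loadHTML($html);

// Use the first <p> as a simple Facebook-style excerpt.
$paragraphs = $doc->getElementsByTagName('p');
$excerpt = $paragraphs->length > 0 ? $paragraphs->item(0)->textContent : '';

echo $excerpt;
```

Real sites vary a lot, so in practice you would also want to check meta description and Open Graph tags before falling back to the first paragraph.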
I need to extract data from a URL:
the title, the description, and any videos or images at the given URL,
like the Facebook share button does,
like this:
http://www.facebook.com/sharer.php?u=http://www.wired.com&t=Test
Regards
Embed.ly has a nice API for exactly this purpose. Their API returns the site's oEmbed data if available; otherwise, it attempts to extract a summary of the page as Facebook does.
Use something like cURL to fetch the page, and then something like Simple HTML DOM to parse it and extract the elements you want.
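A sketch of the fetch step, assuming the cURL extension is installed; the URL is a placeholder, and parsing is left to a DOM library once `$html` holds the page source:

```php
<?php
// Placeholder URL; substitute the page you actually want to scrape.
$url = 'http://www.example.com/';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // don't hang forever

$html = curl_exec($ch);                          // string on success, false on failure
curl_close($ch);

// On success, $html can now be handed to Simple HTML DOM or DOMDocument.
```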
If the web site has support for oEmbed, that's easier and more robust than scraping HTML:
oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.
oEmbed is supported by sites like YouTube and Flickr.
I am working on a project for this issue; it is not as easy as writing an HTML parser and expecting sites to be 'semantic'. Extracting videos and finding auto-play parameters are especially difficult. You can check the project at http://www.embedify.me, which also has an fb-style URL preview script. As I see it, embed.ly and oEmbed are passive parsers: they need the sites (so-called providers) to support them, so the approach is quite different from what fb does.
While I was looking for a similar functionality, I came across a jQuery + PHP demo of the url extract feature of Facebook messages:
http://www.99points.info/2010/07/facebook-like-extracting-url-data-with-jquery-ajax-php/
Instead of using an HTML DOM parser, it works with simple regular expressions. It looks for the title, description, and img tags. As a result, the image extraction doesn't perform well on the many websites that place images via CSS. Also, Facebook looks first at its own meta tags and then at the classic HTML description tag, but the demo illustrates the principle well.
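A rough sketch of that regex approach, pulling a title, meta description, and first image out of raw HTML; the sample markup is invented, and the patterns are deliberately simple and brittle (a DOM parser is more robust):

```php
<?php
// A literal string stands in for fetched page source.
$html = '<html><head><title>Page Title</title>'
      . '<meta name="description" content="A short summary.">'
      . '</head><body><img src="photo.jpg"></body></html>';

// <title>…</title>
preg_match('/<title>(.*?)<\/title>/is', $html, $m);
$title = isset($m[1]) ? trim($m[1]) : '';

// <meta name="description" content="…">
preg_match('/<meta[^>]*name=["\']description["\'][^>]*content=["\'](.*?)["\']/is', $html, $m);
$description = isset($m[1]) ? $m[1] : '';

// First <img src="…">
preg_match('/<img[^>]*src=["\'](.*?)["\']/is', $html, $m);
$image = isset($m[1]) ? $m[1] : '';

echo "$title | $description | $image";
```

Note the description pattern assumes the `name` attribute comes before `content`; real pages don't always order attributes that way, which is exactly why regex scraping breaks down.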
On a website I am maintaining for a radio station, there is a page that displays news articles. Right now the news is posted in an HTML page, which is then read by a PHP page that includes all the navigation. I have been asked to turn this into an RSS feed. How do I do this? I know how to create the XML file, but the person who edits the news file is not technical and needs a WYSIWYG editor. Is there a WYSIWYG editor for XML? Once I have the feed, how do I display it on my site? I'm working with PHP on this site, so a PHP solution would be preferred.
Use Yahoo Pipes: you don't need programming knowledge, and the load on your site will be lower. Once you've got your feed, display it on your site using a simple anchor with an image in HTML. You could also consider piping your feed through FeedBurner.
And for the freebie: if you want to track your feed awareness data in RSS, use my service here.
Do you mean that someone will insert the feed content by hand?
Usually feeds are generated from the site's news content, which you should already have in your database; you just need a PHP script that extracts it and writes the XML.
Edit: no database is used.
OK, in that case you have just two options:
Use PHP regular expressions (or perhaps phpQuery) to pull the content you need from the HTML page
As you said, write the XML by hand and then upload it. I haven't tried any WYSIWYG XML editor, sorry; there are many on Google.
Does that PHP site have a database back end? If so, have the WYSIWYG editor post into the database, and then have a special PHP file generate an RSS feed from it.
I've used the following IBM page as a guide and it worked wonderfully:
http://www.ibm.com/developerworks/library/x-phprss/
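A hedged sketch of that "special PHP file": turn rows (a hard-coded array stands in for database results, and all titles and URLs are invented) into an RSS 2.0 document with SimpleXML:

```php
<?php
// Stand-in for rows fetched from the database.
$articles = array(
    array('title' => 'Station wins award',
          'link'  => 'http://example.com/news/1',
          'body'  => 'Our station won a local award.'),
);

$rss = new SimpleXMLElement('<rss version="2.0"/>');
$channel = $rss->addChild('channel');
$channel->addChild('title', 'Station News');
$channel->addChild('link', 'http://example.com/news');
$channel->addChild('description', 'Latest news from the station');

// One <item> per article.
foreach ($articles as $a) {
    $item = $channel->addChild('item');
    $item->addChild('title', $a['title']);
    $item->addChild('link', $a['link']);
    $item->addChild('description', $a['body']);
}

// When serving for real, also send:
// header('Content-Type: application/rss+xml');
echo $rss->asXML();
```

Point feed readers (or your own display script) at this file's URL and the feed updates as soon as the database does.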
I decided that, instead of trying to find a WYSIWYG editor for XML, I would let the news editor continue to upload the news as HTML. I ended up writing a PHP program that finds the <p> and </p> tags and creates an XML file from them.
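That approach can be sketched roughly like this (the sample HTML and the `<news>`/`<item>` element names are assumptions, not the actual program):

```php
<?php
// Stand-in for the editor's uploaded HTML news file.
$newsHtml = '<p>Story one text.</p><p>Story two text.</p>';

// Grab the text between each <p> and </p> pair.
preg_match_all('/<p>(.*?)<\/p>/is', $newsHtml, $matches);

// Build a simple XML document, one element per paragraph.
$xml = new SimpleXMLElement('<news/>');
foreach ($matches[1] as $paragraph) {
    $xml->addChild('item', htmlspecialchars($paragraph));
}

echo $xml->asXML();
```

In a real script you would read the HTML with file_get_contents() and write the result out with $xml->asXML('news.xml').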
You could use rssa.at: just put in your URL and it'll create an RSS feed for you. You can then let people sign up for alerts (hourly/daily/weekly/monthly) for free, and access stats.
If the HTML is consistent, you could just have them publish as normal and then scrape a feed from it. There are programmatic ways to do this, but http://www.dapper.net/dapp-factory.jsp is a nice point-and-click feed-scraping service. Then use MagpieRSS, SimplePie, or Feed.informer.com to display the feed.