Let's say, for example, we go to this page:
https://www.facebook.com/feeds/page.php?format=rss20&id=133869316660964
How can I programmatically strip data out of that, or pull pieces of it into a variable / array? For example, if you want the 2 latest posts (the part in the tag), is it possible to access that?
Sorry for such an open question, but I haven't found a solution after some time of searching.
Thanks.
Parse it with PHP’s DOM extension: http://php.net/manual/en/book.dom.php
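As a rough sketch of what that could look like for an RSS 2.0 feed: the sample XML, the function name latest_posts() and the assumption that the feed lists the newest items first are mine, not something taken from the Facebook feed itself. With a real feed you would load the downloaded XML instead of the inline string.

```php
<?php
// Sample RSS 2.0 document standing in for the real feed (hypothetical content).
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example page</title>
    <item><title>Newest post</title><link>http://example.com/3</link><description>Third</description></item>
    <item><title>Older post</title><link>http://example.com/2</link><description>Second</description></item>
    <item><title>Oldest post</title><link>http://example.com/1</link><description>First</description></item>
  </channel>
</rss>
XML;

// Return the first $limit <item> elements as plain arrays.
// Assumes the feed lists the newest items first.
function latest_posts(string $xml, int $limit = 2): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $posts = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        if (count($posts) >= $limit) {
            break;
        }
        $posts[] = [
            'title'       => $item->getElementsByTagName('title')->item(0)->textContent,
            'link'        => $item->getElementsByTagName('link')->item(0)->textContent,
            'description' => $item->getElementsByTagName('description')->item(0)->textContent,
        ];
    }
    return $posts;
}

$posts = latest_posts($rss);
```

$posts then holds one array per post, so $posts[0]['title'] is the title of the latest one.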
Related
I want to get the content of this 1st and 2nd webpage https://www.goodreads.com/search?utf8=%E2%9C%93&q=%D8%B3%D9%8A%D8%A7%D8%B3%D8%A9&search_type=books
and then store it in my database and make the list searchable. After I googled, I found that I can do it like this:
$url = "https://www.goodreads.com/search?utf8=%E2%9C%93&q=%D8%B3%D9%8A%D8%A7%D8%B3%D8%A9&search_type=books";
function get_links($url){
$input = file_get_contents($url);
echo $input;
}
get_links($url);
My problem is: how can I get the 2nd page's content as well, and how can I store these books in my database to make the list searchable?
The answer is not that easy...
Options
Getting the pages (Not recommended)
To get a later page you can send the "page argument" in your request:
e.g.:
https://www.goodreads.com/search?page=2&q=%D8%B3%D9%8A%D8%A7%D8%B3%D8%A9&search_type=books&tab=books&utf8=%E2%9C%93
But to get the elements into a nice structure you need to parse the HTML you get, which is really hard.
Use the API (recommended)
At https://www.goodreads.com/api/index you can find the documentation for the Goodreads API, which returns XML responses that are easy to parse.
Parse XML in PHP
If you use the API you can parse the XML-response with SimpleXML.
See example 1 & 2: http://php.net/manual/en/simplexml.examples-basic.php
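A minimal sketch of the SimpleXML part, to give an idea. The XML below is a made-up response, not the real Goodreads format (the element names results, book, title and author are assumptions), so adapt the property names to whatever the API actually returns.

```php
<?php
// Hypothetical API response; the real element names will differ.
$response = <<<XML
<results>
  <book><title>First Book</title><author>Author A</author></book>
  <book><title>Second Book</title><author>Author B</author></book>
</results>
XML;

$xml = simplexml_load_string($response);

// Copy the fields we care about into plain PHP arrays.
$books = [];
foreach ($xml->book as $book) {
    $books[] = [
        'title'  => (string) $book->title,   // cast: SimpleXML returns objects, not strings
        'author' => (string) $book->author,
    ];
}
```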
Saving to database
If you are a beginner you might read some tutorials about how to use MySQL with PHP and PDO. But you may also have a look at RedBeanPHP, which is a really easy to use ORM for databases.
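For the PDO route, a sketch could look like the following. To keep it self-contained it uses an in-memory SQLite database, which is my assumption for the example; for MySQL you would change the DSN to something like "mysql:host=...;dbname=..." and pass a user and password. The table and column names are hypothetical as well.

```php
<?php
// Example rows, e.g. the result of parsing the API response.
$books = [
    ['title' => 'First Book',  'author' => 'Author A'],
    ['title' => 'Second Book', 'author' => 'Author B'],
];

// In-memory SQLite keeps the sketch runnable; swap the DSN for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)');

// A prepared statement avoids SQL injection and can be reused for every row.
$insert = $pdo->prepare('INSERT INTO books (title, author) VALUES (:title, :author)');
foreach ($books as $book) {
    $insert->execute([':title' => $book['title'], ':author' => $book['author']]);
}

// A simple search: titles containing the search term.
$search = $pdo->prepare('SELECT title FROM books WHERE title LIKE :term');
$search->execute([':term' => '%Second%']);
$found = $search->fetchAll(PDO::FETCH_COLUMN);
```

The LIKE query at the end is about the simplest way to make the stored list searchable.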
To get the 2nd page, append page=2 to the URL's query string (?page=2, or &page=2 if the URL already contains a ?).
If you need to get further pages, you can use a for loop. If you don't know how to use a for loop, please google php for loop.
For the database, google php mysql insert.
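Something along these lines, as a sketch: page_urls() is a name I made up, and the number of pages (2) is just the value from the question. Each generated URL could then be passed to file_get_contents().

```php
<?php
// Build one URL per page by adding a "page" parameter to the query string.
function page_urls(string $baseUrl, array $query, int $lastPage): array
{
    $urls = [];
    for ($page = 1; $page <= $lastPage; $page++) {
        // http_build_query() URL-encodes the values for us.
        $urls[] = $baseUrl . '?' . http_build_query($query + ['page' => $page], '', '&');
    }
    return $urls;
}

$urls = page_urls(
    'https://www.goodreads.com/search',
    ['q' => 'سياسة', 'search_type' => 'books'],
    2
);
```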
First off, should mention that I have permission from the stores to scrape this data so legality isn't an issue here!
I'm trying to scrape information from various online stores, and store them in a database once every hour.
Example site: http://www.uptherestore.com/department/accessories
I've tried a PHP scrape like this:
<?php
$file_string = file_get_contents('http://www.uptherestore.com/department/accessories');
preg_match('/<div class="view view-uc-products view-id-uc_products view-display-id-page_3 storeview view-dom-id-1">
(.*)<\/div>/i', $file_string, $title);
$title_out = $title[1];?>
<p><strong>Accessories:</strong> <?php echo $title_out; ?></p>
but it's giving me errors of the ilk:
[14-Feb-2013 07:39:49 UTC] PHP Warning: DOMDocument::loadHTML() [<a href='domdocument.loadhtml'>domdocument.loadhtml</a>]: htmlCheckEncoding: encoder error in Entity, line: 7 in scraping.php on line 5
Full error from log file is here: http://pastebin.com/W2Bhkc0s
Even if I do manage to scrape from that site, it will only return the first page of results (when I need all pages). My current solution to this would be:
Use jQuery to check how many elements are in the pager at the bottom of the page
Run a loop that scrapes each of these pages
But this is not ideal - as you can see, at the bottom of the page there are pages 1...9, but if you click "last" there are actually 11 pages of content. In short, what's the best method to scrape data from sites like this? As mentioned, the store owners have all given me permission to use their content, but they're not particularly technically minded and cannot give me access to their servers or put any code in their servers' .htaccess to allow requests from my website.
Paging is simple: you just find the link that says 'next' and follow it until it's not there anymore. Unless you're comfortable with XPath, you'll want a good HTML parser library (phpquery, simple-html-dom). Be prepared to spend a good deal of time figuring out the right way to do it, and above all, don't listen to anyone who tells you to use regex.
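If you do go the XPath route, following the 'next' link could be sketched like this. The $pages array is a stand-in for real downloads, and the assumption that the pager link's text is exactly "next" is mine; check the markup of the actual site.

```php
<?php
// Hypothetical pages keyed by URL; a real script would download each one instead.
$pages = [
    '/list?page=1' => '<html><body><p>A</p><a href="/list?page=2">next</a></body></html>',
    '/list?page=2' => '<html><body><p>B</p><a href="/list?page=3">next</a></body></html>',
    '/list?page=3' => '<html><body><p>C</p></body></html>',
];

// Return the href of the first link whose text is "next", or null if there is none.
function find_next_link(string $html): ?string
{
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $hrefs = $xpath->query('//a[normalize-space(.) = "next"]/@href');
    return $hrefs->length > 0 ? $hrefs->item(0)->nodeValue : null;
}

$visited = [];
$url = '/list?page=1';
// Stop when there is no next link, the page is unknown, or we have seen it before.
while ($url !== null && isset($pages[$url]) && !in_array($url, $visited, true)) {
    $visited[] = $url;
    $url = find_next_link($pages[$url]);
}
```

The in_array() check is there so a pager that links back to an earlier page cannot trap the loop.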
First of all, your error message does not seem to fit your php code: The php is trying to parse the html using a regex (wrong!) and the error message suggests that you are using an html parser (DOMDocument) to parse the html (the right way).
What you would need to do is the following:
1. Get the html from a product page (like you are doing now...);
2. Check if that page has already been parsed in your database (see the next point);
3. Use an html parser to get the info from that page that you need and store everything in a database - including the link to the product page or another identifying property of that page and some sort of time-stamp so that you know what you have done already;
4. Use an html parser to get all product links in the html;
5. Go to 1. for every product link that you have found.
You probably need to build in some logic to make sure your script does not enter in a never-ending loop or runs too long, but that is basically it; no browser / javascript / ajax is required until you actually want to see the results of the operations in your browser.
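The steps above can be sketched roughly as follows. Everything site-specific here is an assumption: the $site array stands in for downloading pages, the $parsed array stands in for the database table, and class="product" is a made-up marker for product links.

```php
<?php
// Hypothetical site: URL => HTML. A real script would download each URL.
$site = [
    '/shop' => '<a class="product" href="/p/1">One</a><a class="product" href="/p/2">Two</a>',
    '/p/1'  => '<h1>One</h1><a class="product" href="/p/2">Two</a>',
    '/p/2'  => '<h1>Two</h1><a class="product" href="/p/1">One</a>',
];

// Collect the href of every link marked as a product link.
function product_links(string $html): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@class="product"]/@href') as $href) {
        $links[] = $href->nodeValue;
    }
    return $links;
}

$parsed   = [];          // stands in for the database: URL => time-stamp
$queue    = ['/shop'];   // pages still to visit
$maxPages = 50;          // hard limit so the script cannot run forever

while ($queue && count($parsed) < $maxPages) {
    $url = array_shift($queue);
    if (isset($parsed[$url]) || !isset($site[$url])) {
        continue;                      // already done, or page not available
    }
    $html = $site[$url];               // step 1: get the HTML
    $parsed[$url] = time();            // step 3: store the result with a time-stamp
    foreach (product_links($html) as $link) {   // step 4: find product links
        if (!isset($parsed[$link])) {
            $queue[] = $link;          // step 5: repeat for each new link
        }
    }
}
```

The "already parsed" check together with $maxPages is the logic that keeps the two product pages, which link to each other, from causing a never-ending loop.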
Use cURL and regex to filter out what you need. Google cURL; the php.net site will give you all the information you need.
I want to get the url of an image (where it's stored) from the description of a RSS feed, so then I can put that url inside a variable.
I know that for getting the link of the RSS feed post I have to use $feed->channel->item->link; where $feed is $feed=simplexml_load_file("link_of_the_feed");.
But what if I have to get the image url of the post, do I have to use something like $feed->channel->item->image;?
I really don't know, maybe a RSS parser like MagPie RSS which I tried without results?
Thanks in advance.
If the image node is at the top level of the item node, then yes. If it's deeper than the item node, you'll have to traverse it accordingly. It would be helpful if you posted your XML.
EDIT: you can also check out my answer here on how to parse through an XML file with PHP.
You're on the right track! But it all depends on the format that the RSS Feed is set up in.
The item node actually contains a whole bunch of different fields, of which link is only one. Take a look here for information on the other fields that the item node contains.
Now, if the RSS feed points directly to the image file, then you can just use item->link. More likely, however, the link points to a blog post or something that has the image embedded in it. In this case, you can do some processing on $feed->channel->item->description to find what you need. The description node contains an escaped HTML summary of the post, and from there you can use a regular expression to find the source of the image. Also remember: you might need to decode the description using htmlspecialchars_decode() before you process it with the regular expression - in my own experience, descriptions often come with special characters escaped.
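As a small sketch of that idea: the description string below is invented, first_image_url() is a name I made up, and the regular expression only handles a quoted src attribute, so treat it as a starting point rather than a robust parser.

```php
<?php
// Hypothetical description as it might arrive, with the HTML still escaped.
$description = '&lt;p&gt;Hello&lt;/p&gt;&lt;img src=&quot;http://example.com/pic.jpg&quot; alt=&quot;pic&quot; /&gt;';

// Decode the escaped HTML, then return the src of the first <img>, or null.
function first_image_url(string $escapedHtml): ?string
{
    $html = htmlspecialchars_decode($escapedHtml);
    if (preg_match('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $match)) {
        return $match[1];
    }
    return null;
}

$imageUrl = first_image_url($description);
```

With a real feed you would pass (string) $feed->channel->item->description instead of the sample string.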
I know that's a lot of information, but once you get started it's really not as hard as it sounds. Good luck!
I am trying to build a very simple price comparison script.
So far, I have written code that gets some product XML feeds from shops, and with the help of XSLT I create a single global XML from all those input XMLs. I use XSLT because the shops have different names for elements.
Now I want to take it one step further and create a search form that will display the matching products when, let's say, I search for the term "laptop".
I know how to create a form, but I need a coding guidance to understand how to make it to search in my XML file (products.xml) and display let's say the
Thank you
You might want to check out http://php.net/manual/en/class.xmlreader.php
Using that it is pretty easy to navigate through an XML file and grab all the info you need.
EDIT:
On second thought, http://php.net/manual/en/book.simplexml.php is a MUCH simpler way to achieve what you're trying to do. Hence the name, I guess ;)
You can use the SimpleXML library to parse your XML file. In my opinion SimpleXML is easier to use than XMLReader. Note, though, that SimpleXML was only introduced in PHP 5.
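A minimal sketch of such a search, assuming products.xml has a structure like the inline sample below (the element names product, name, price and shop are guesses, since the real file wasn't posted). In the real script you would use simplexml_load_file('products.xml') and take the term from the submitted form.

```php
<?php
// Hypothetical content of products.xml.
$productsXml = <<<XML
<products>
  <product><name>Laptop Pro 15</name><price>999.00</price><shop>Shop A</shop></product>
  <product><name>Wireless Mouse</name><price>19.90</price><shop>Shop B</shop></product>
  <product><name>Gaming laptop</name><price>1299.00</price><shop>Shop B</shop></product>
</products>
XML;

// Return every product whose name contains $term (case-insensitive).
function search_products(SimpleXMLElement $products, string $term): array
{
    $matches = [];
    foreach ($products->product as $product) {
        if (stripos((string) $product->name, $term) !== false) {
            $matches[] = [
                'name'  => (string) $product->name,
                'price' => (float) $product->price,
                'shop'  => (string) $product->shop,
            ];
        }
    }
    return $matches;
}

$results = search_products(simplexml_load_string($productsXml), 'laptop');
```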
I'm trying to parse information from fonefinder.net. I was trying to use simplexml_load_file, but couldn't get the page to load successfully.
Now I'm looking into cURL, but I'm not sure if this will work either.
I basically just want to take the html from the fonefinder page, and parse it to get phone carrier and city.
Is that possible? How?
SimpleXML will only work if the HTML is formatted correctly - and that is rarely the case ;)
You could do a simple cURL call to fetch the data and the easiest thing would probably be using a regular expression to get the information you need.
The solution however is not easy to supply you with, with nothing to go on. But this was an idea.
I recommend using:
http://www.de.php.net/manual/en/function.file-get-contents.php to get the document
http://www.php.net/manual/en/domdocument.loadhtml.php to load it
http://www.php.net/manual/en/class.domxpath.php to get the information from it
Or use the search function here, that question must have been asked over and over, for example PHP: Fetch content from a html page using xpath()
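To show how the load and XPath steps fit together, here is a sketch on an inline HTML string. The markup is invented (I don't know what the fonefinder page actually looks like, so the class names city and carrier are assumptions); in practice $html would come from file_get_contents() and the XPath expressions would have to match the real page.

```php
<?php
// Hypothetical HTML standing in for the downloaded page.
$html = '<html><body><table>'
      . '<tr><th>City</th><th>Carrier</th></tr>'
      . '<tr><td class="city">Springfield</td><td class="carrier">Example Telecom</td></tr>'
      . '</table></body></html>';

libxml_use_internal_errors(true);   // collect warnings about sloppy HTML instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// string(...) returns the text of the first matching node, or '' if none matches.
$city    = trim($xpath->evaluate('string(//td[@class="city"])'));
$carrier = trim($xpath->evaluate('string(//td[@class="carrier"])'));
```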