PHP-Retrieve specific content from multiple pages of a website

PHP-Retrieve specific content from multiple pages of a website - php

What I want to accomplish might be a little hardcore, but I want to know if it's possible:
The question:
My question is the same as PHP-Retrieve content from page, but I want to use it on multiple pages.
The situation:
I'm using a website about TV shows. All the TV shows have the same URL and then the name of the show:
http://bierdopje.com/shows/NAME_OF_SHOW
On every show page, there's a line which tells you if the show is cancelled or still running. I want to retrieve that line to make an overview of the cancelled shows (the website only supports an overview of running shows, so I want to make an extra functionality).
The real question:
How can I tell DOM to retrieve all the shows and check for the status of the show?
(http://bierdopje.com/shows/*).
The Note:
I understand that this process may take a while because it is reading the whole website (or is it too much data?).

use this code to fetch only the links from the single website.
include_once('simple_html_dom.php');
$html = file_get_html('http://www.couponrani.com/');
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';

I use phpquery to fetch data from a web page, like jQuery in Dom.
For example, to get the list of all shows, you can do this :
<?php
require_once 'phpQuery/phpQuery/phpQuery.php';
$doc = phpQuery::newDocumentHTML(
file_get_contents('http://www.bierdopje.com/shows')
);
foreach (pq('.listing a') as $key => $a) {
$url = pq($a)->attr('href'); // will give "/shows/07-ghost"
$show = pq($a)->text(); // will give "07 Ghost"
}
Now you can process all shows individualy, make a new phpQuery::newDocumentHTML for each show and with an selector extract the information you need.
Get the status of a show
$html = file_get_contents('http://www.bierdopje.com/shows/alcatraz');
$doc = phpQuery::newDocumentHTML($html);
$status = pq('.content>span:nth-child(6)')->text();

Related

Get attributes from post data in PHP

I'm using laravel to create a simple social network. Users can type # in the post area to get a list of their friends to mention them. Every mention in a link like this (using zurb/tribute from github)
<a type="mention" href="/user/Jordan" title="Jordan">Jordan</a>
Normal links other than mentions won't have type='mention'
Now when I get the post and insert it into the database I need to get a list of users mentioned in the post. I'm looking for links which have the type ='mention' and if there's any I want to get the title of everyone to insert into the notification system. What PHP code do I need to add in this if statement?
if(stristr(request('post'),' type="mention" ')){
}

Aside from using an AST (Abstract Syntax Tree), your best bet would be to either use DOM on PHP Side, e.g.:
$string = '<a type="mention" href="/user/Jordan" title="Jordan">Jordan</a>';
$doc = new \DOMDocument();
$doc->loadHTML($string);
$xpath = new \DOMXPath($doc);
foreach ($xpath->query("//a[#type='mention']") as $a) {
$href = $a->getAttribute('href');
$title = $a->getAttribute('title');
echo sprintf("Found mention of %s with href of %s\n", $title, $href);
}
However, I probably wouldn't be sending the A node back to the server. You should consider working out some way to make it a display-only feature implemented on the browser side, and simply send the "#jordan" string back to the server.

Fetch data from site Page by Page & go through sub links

URL : http://www.sayuri.co.jp/used-cars
Example : http://www.sayuri.co.jp/used-cars/B37753-Toyota-Wish-japanese-used-cars
Hey guys , need some help with one of my personal projects , I've already wrote the code to fetch data from each single car url (example) and post on my site
Now i need to go through the main url : sayuri.co.jp/used-cars , and :
1) Make an array / list / nodes of all the urls for all the single cars in it , then run my internal code for each one to fetch data , then move on to the next one
I already have the code to save each url into a log file when completed (don't think it will be necessary if it goes link by link without starting from the top but will ensure no repetition.
2) When all links are done for the page , it should move to the next page and do the same thing until the end ( there are 5-6 pages max )
I've been stuck on this part since last night and would really appreciate any help . Thanks
My code to get data from the main url :
$content = file_get_contents('http://www.sayuri.co.jp/used-cars/');
// echo $content;
and
$dom = new DOMDocument;
$dom->loadHTML($content);
//echo $dom;

I'm guessing you already know this since you say you've gotten data from the car entries themselves, but a good point to start is by dissecting the page's DOM and seeing if there are any elements you can use to jump around quickly. Most browsers have page inspection tools to help with this.
In this case, <div id="content"> serves nicely. You'll note it contains a collection of tables with the required links and a <div> that contains the text telling us how many pages there are.
Disclaimer, but it's been years since I've done PHP and I have not tested this, so it is probably neither correct or optimal, but it should get you started. You'll need to tie the functions together (what's the fun in me doing it?) to achieve what you want, but these should grab the data required.
You'll be working with the DOM on each page, so a convenience to grab the DOMDocument:
function get_page_document($index) {
$content = file_get_contents("http://www.sayuri.co.jp/used-cars/page:{$index}");
$document = new DOMDocument;
$document->loadHTML($content);
return $document;
}
You need to know how many pages there are in total in order to iterate over them, so grab it:
function get_page_count($document) {
$content = $document->getElementById('content');
$count_div = $content->childNodes->item($content->childNodes->length - 4);
$count_text = $count_div->firstChild->textContent;
if (preg_match('/Page \d+ of (\d+)/', $count_text, $matches) === 1) {
return $matches[1];
}
return -1;
}
It's a bit ugly, but the links are available inside each <table> in the contents container. Rip 'em out and push them in an array. If you use the link itself as the key, there is no concern for duplicates as they'll just rewrite over the same key-value.
function get_page_links($document) {
$content = $document->getElementById('content');
$tables = $content->getElementsByTagName('table');
$links = array();
foreach ($tables as $table) {
if ($table->getAttribute('class') === 'itemlist-table') {
// table > tbody > tr > td > a
$link = $table->firstChild->firstChild->firstChild->firstChild->getAttribute('href');
// No duplicates because they just overwrite the same entry.
$links[$link] = "http://www.sayuri.co.jp{$link}";
}
}
return $links;
}
Perhaps also obvious, but these will break if this site changes their formatting. You'd be better off asking if they have a REST API or some such available for long term use, though I'm guessing you don't care as much if it's just a personal project for tinkering.
Hope it helps prod you in the right direction.

how to get page contents

I'm trying to make a recent news like functionality for my site. For this i've made a web crawler and have being able to collect links from a page up till now by doing the following
$dom = new domDocument;
#$dom->loadHTML(file_get_contents($url));
$dom->preserveWhiteSpaces = false;
$linksToStore = $dom->getElementsByTagName('a');
foreach($linksToStore as $tag){
$links[$tag->getAttribute('href')]= $tag->childNodes->item(0)->nodeValue;
}
how can i get contents from the pages pointed by those links related to a particular domain which in my case is 'Medical'??

Use this http://simplehtmldom.sourceforge.net/ library to extract contents from the page. The selector works same as of jQuery, which makes it very familier and efficient to extract the contents.
Also, check this http://davidwalsh.name/php-notifications to know more

Fetch database information on a new page without using new documents

I'm working on a page where I've listed some entries from a database. Although, because the width of the page is too small to fit more on it (I'm one of those people that wants it to look good on all resolutions), I'm basically only going to be able to fit one row of text on the main page.
So, I've thought of one simple idea - which is to link these database entries to a new page which would contain the information about an entry. The problem is that I actually don't know how to go about doing this. What I can't figure out is how I use the PHP code to link to a new page without using any new documents, but rather just gets information from the database onto a new page. This is probably really basic stuff, but I really can't figure this out. And my explanation was probably a bit complicated.
Here is an example of what I basically want to accomplish:
http://vgmdb.net/db/collection.php?do=browse&ltr=A&field=&perpage=30
They are not using new documents for every user, they are taking it from the database. Which is exactly what I want to do. Again, this is probably a really simple process, but I'm so new to SQL and PHP coding, so go easy on me, heh.
Thanks!

<?php
// if it is a user page requested
if ($_GET['page'] == 'user') {
if (isset($_GET['id']) && is_numeric($_GET['id'])) {
// db call to display user WHERE id = $_GET['id']
$t = mysql_fetch_assoc( SELECT_QUERY );
echo '<h1>' . $t['title'] . '</h1>';
echo '<p>' . $t['text'] . '</p>';
} else {
echo "There isn't such a user".
}
}
// normal page logic goes here
else {
// list entries with links to them
while ($t = mysql_fetch_assoc( SELECT_QUERY )) {
echo '<a href="/index.php?page=user&id='. $t['id'] .'">';
echo $t['title'] . '</a><br />';
}
}
?>
And your links should look like: /index.php?page=user&id=56
Note: You can place your whole user page logic into a new file, like user.php, and include it from the index.php, if it turns out that it it a user page request.

Nisto, it sounds like you have some PHP output issues to contend with first. But the link you included had some code in addition to just a query that allows it to be sorted alphabetically, etc.
This could help you accomplish that task:
www.datatables.net
In a nutshell, you use PHP to dynamically build a table in proper table format. Then you apply datatables via Jquery which will automatically style, sort, filter, and order the table according to the instructions you give it. That's how they get so much data into the screen and page it without reloading the page.
Good luck.

Are you referring to creating pagination links? E.g.:
If so, then try Pagination - what it is and how to do it for a good walkthrough of how to paginate database table rows using PHP.

Updating the XML file using PHP script

I'm making an interface-website to update a concert-list on a band-website.
The list is stored as an XML file an has this structure :
I already wrote a script that enables me to add a new gig to the list, this was relatively easy...
Now I want to write a script that enables me to edit a certain gig in the list.
Every Gig is Unique because of the first attribute : "id" .
I want to use this reference to edit the other attributes in that Node.
My PHP is very poor, so I hope someone could put me on the good foot here...
My PHP script :

Well i dunno what your XML structure looks like but:
<gig id="someid">
<venue></venue>
<day></day>
<month></month>
<year></year>
</gig>
$xml = new SimpleXmlElement('gig.xml',null, true);
$gig = $xml->xpath('//gig[#id="'.$_POST['id'].'"]');
$gig->venue = $_POST['venue'];
$gig->month = $_POST['month'];
// etc..
$xml->asXml('gig.xml)'; // save back to file
now if instead all these data points are attributes you can use $gig->attributes()->venue to access it.
There is no need for the loop really unless you are doing multiple updates with one post - you can get at any specific record via an XPAth query. SimpleXML is also a lot lighter and a lot easier to use for this type of thing than DOMDOcument - especially as you arent using the feature of DOMDocument.

You'll want to load the xml file in a domdocument with
<?
$xml = new DOMDocument();
$xml->load("xmlfile.xml");
//find the tags that you want to update
$tags = $xml->getElementsByTagName("GIG");
//find the tag with the id you want to update
foreach ($tags as $tag) {
if($tag->getAttribute("id") == $id) { //found the tag, now update the attribute
$tag->setAttribute("[attributeName]", "[attributeValue]");
}
}
//save the xml
$xml->save();
?>
code is untested, but it's a general idea

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP-Retrieve specific content from multiple pages of a website - php

use this code to fetch only the links from the single website. include_once('simple_html_dom.php'); $html = file_get_html('http://www.couponrani.com/'); // Find all links foreach($html->find('a') as $element) echo $element->href . '<br>';

Related

Get attributes from post data in PHP

Fetch data from site Page by Page & go through sub links

how to get page contents

Fetch database information on a new page without using new documents

Updating the XML file using PHP script

Categories

Resources