Fetch data from site Page by Page & go through sub links - php

URL : http://www.sayuri.co.jp/used-cars
Example : http://www.sayuri.co.jp/used-cars/B37753-Toyota-Wish-japanese-used-cars
Hey guys , need some help with one of my personal projects , I've already wrote the code to fetch data from each single car url (example) and post on my site
Now i need to go through the main url : sayuri.co.jp/used-cars , and :
1) Make an array / list / nodes of all the urls for all the single cars in it , then run my internal code for each one to fetch data , then move on to the next one
I already have the code to save each url into a log file when completed (don't think it will be necessary if it goes link by link without starting from the top but will ensure no repetition.
2) When all links are done for the page , it should move to the next page and do the same thing until the end ( there are 5-6 pages max )
I've been stuck on this part since last night and would really appreciate any help . Thanks
My code to get data from the main url :
$content = file_get_contents('http://www.sayuri.co.jp/used-cars/');
// echo $content;
and
$dom = new DOMDocument;
$dom->loadHTML($content);
//echo $dom;

I'm guessing you already know this since you say you've gotten data from the car entries themselves, but a good point to start is by dissecting the page's DOM and seeing if there are any elements you can use to jump around quickly. Most browsers have page inspection tools to help with this.
In this case, <div id="content"> serves nicely. You'll note it contains a collection of tables with the required links and a <div> that contains the text telling us how many pages there are.
Disclaimer, but it's been years since I've done PHP and I have not tested this, so it is probably neither correct or optimal, but it should get you started. You'll need to tie the functions together (what's the fun in me doing it?) to achieve what you want, but these should grab the data required.
You'll be working with the DOM on each page, so a convenience to grab the DOMDocument:
function get_page_document($index) {
$content = file_get_contents("http://www.sayuri.co.jp/used-cars/page:{$index}");
$document = new DOMDocument;
$document->loadHTML($content);
return $document;
}
You need to know how many pages there are in total in order to iterate over them, so grab it:
function get_page_count($document) {
$content = $document->getElementById('content');
$count_div = $content->childNodes->item($content->childNodes->length - 4);
$count_text = $count_div->firstChild->textContent;
if (preg_match('/Page \d+ of (\d+)/', $count_text, $matches) === 1) {
return $matches[1];
}
return -1;
}
It's a bit ugly, but the links are available inside each <table> in the contents container. Rip 'em out and push them in an array. If you use the link itself as the key, there is no concern for duplicates as they'll just rewrite over the same key-value.
function get_page_links($document) {
$content = $document->getElementById('content');
$tables = $content->getElementsByTagName('table');
$links = array();
foreach ($tables as $table) {
if ($table->getAttribute('class') === 'itemlist-table') {
// table > tbody > tr > td > a
$link = $table->firstChild->firstChild->firstChild->firstChild->getAttribute('href');
// No duplicates because they just overwrite the same entry.
$links[$link] = "http://www.sayuri.co.jp{$link}";
}
}
return $links;
}
Perhaps also obvious, but these will break if this site changes their formatting. You'd be better off asking if they have a REST API or some such available for long term use, though I'm guessing you don't care as much if it's just a personal project for tinkering.
Hope it helps prod you in the right direction.

Related

How can I copy two tables from someones website to my website? Wordpress

I'm attempting to copy two tables from a specific website address and mimic them onto my website, but I'm unsure how to go about it.
I've looked into Simple-HTML-Dom, and I actually got it to work locally on my machine - to a certain extent. Although I have a couple of problems with this approach.
Although I got it to work, it wasn't perfect. I also dragged over some random text, and I copied across 5 tables - instead of the intended 2.
I would rather use a different method - without using 3rd party scripts?
The tables that I'm attempting to copy can be located here:
https://www.gov.uk/government/publications/rates-and-allowances-monthly-euro-conversion-rates-for-calculating-duty/monthly-euro-conversion-rates-for-calculating-duty
I only want the tables containing 2017 and 2016 data (with the headings).
Rates for 2017
Table Headings
Table Contents
Rates for 2016
Table Headings
Table Contents
This would be for my Wordpress website.
I don't know if something could be programmed in PHP to achieve this, or another library that Wordpress natively supports without the use of SQL / 3rd party scripts or anything like that.
Thank You
////
--- HUGE UPDATE ---
Ok, so I've been playing around and trying to debug the code, and I'm almost there!!
I've managed to copy over the first table only (second table will be easy).
The final part is trying to add classes to the createElements line? Is that possible??
Here's my almost finished code.
<h1 class="roeheader">Monthly Industrial Euro Rate:</h1>
[insert_php]
$url = "https://www.gov.uk/government/publications/rates-and-allowances-monthly-euro-conversion-rates-for-calculating-duty/monthly-euro-conversion-rates-for-calculating-duty";
$html = file_get_contents($url);
libxml_use_internal_errors(true);
$doc = new \DOMDocument();
if($doc->loadHTML($html))
{
$result = new \DOMDocument();
$result->formatOutput = true;
$table = $result->appendChild($result->createElement("table"));
$thead = $table->appendChild($result->createElement("thead"));
$tbody = $table->appendChild($result->createElement("tbody"));
$xpath = new \DOMXPath($doc);
$newRow = $thead->appendChild($result->createElement("tr"));
foreach($xpath->query("//table[1]/thead/tr/th[position()>0]") as $header)
{
$newRow->appendChild($result->createElement("th", trim($header->nodeValue)));
}
foreach($xpath->query("//table[1]/tbody/tr") as $row)
{
$newRow = $tbody->appendChild($result->createElement("tr"));
foreach($xpath->query("./td[position()>0 and position()<5]", $row) as $cell)
{
$newRow->appendChild($result->createElement("td", trim($cell->nodeValue)));
}
}
echo $result->saveXML($result->documentElement);
}
[/insert_php]
I'm trying to change this line:
$newRow->appendChild($result->createElement("td", trim($cell->nodeValue)));
To this:
$newRow->appendChild($result->createElement("td class"roe"", trim($cell->nodeValue)));
But it's not working?? The page just refuses to load? I guess it's because of the double quotation.
Thanks
Now you can inspect the tables and see the html and css behind them and copy those to your website to make them exactly the same. A good WordPress plugin for making tables is https://wordpress.org/plugins/tablepress/. Please let me know if you have further information.

PHP-Retrieve specific content from multiple pages of a website

What I want to accomplish might be a little hardcore, but I want to know if it's possible:
The question:
My question is the same as PHP-Retrieve content from page, but I want to use it on multiple pages.
The situation:
I'm using a website about TV shows. All the TV shows have the same URL and then the name of the show:
http://bierdopje.com/shows/NAME_OF_SHOW
On every show page, there's a line which tells you if the show is cancelled or still running. I want to retrieve that line to make an overview of the cancelled shows (the website only supports an overview of running shows, so I want to make an extra functionality).
The real question:
How can I tell DOM to retrieve all the shows and check for the status of the show?
(http://bierdopje.com/shows/*).
The Note:
I understand that this process may take a while because it is reading the whole website (or is it too much data?).
use this code to fetch only the links from the single website.
include_once('simple_html_dom.php');
$html = file_get_html('http://www.couponrani.com/');
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
I use phpquery to fetch data from a web page, like jQuery in Dom.
For example, to get the list of all shows, you can do this :
<?php
require_once 'phpQuery/phpQuery/phpQuery.php';
$doc = phpQuery::newDocumentHTML(
file_get_contents('http://www.bierdopje.com/shows')
);
foreach (pq('.listing a') as $key => $a) {
$url = pq($a)->attr('href'); // will give "/shows/07-ghost"
$show = pq($a)->text(); // will give "07 Ghost"
}
Now you can process all shows individualy, make a new phpQuery::newDocumentHTML for each show and with an selector extract the information you need.
Get the status of a show
$html = file_get_contents('http://www.bierdopje.com/shows/alcatraz');
$doc = phpQuery::newDocumentHTML($html);
$status = pq('.content>span:nth-child(6)')->text();

How to show a users last visited pages in wp / buddypress?

im trying to figure out how i can show the last 3-5 or so pages within my site a person has visited. I did some searching, and I couldn't find a WP plugin that does so, if anyone knows of one, please point me in that direction :) if not, I'll have to write it from scratch, and thats where i'll need the help.
I've been trying to understand the DB and how it works. I'm assuming that this is where the magic will happen, with PHP, unless there is a javascript option using cookies to do it.
Im open to all ideas :P & Thank you
If i were to code such a plugin, i'd use the session cookies to populate an array via array_unshift() and array_pop(). it'd be as simple as :
$server_url = "http://mydomain.com";
$current_url = $server_url.$_SERVER['PHP_SELF'];
$history_max_url = 5; // change to the number of urls in the history array
//Assign _SESSION array to variable, create one if empty ::: Thanks to Sold Out Activist for the explanation!
$history = (array) $_SESSION['history'];
//Add current url as the latest visit
array_unshift($history, $current_url);
//If history array is full, remove oldest entry
if (count($history) > $history_max_url) {
array_pop($history);
}
//update session variable
$_SESSION['history']=$history;
Now i've coded this on the fly. There might be syntax errors or typos. If such a mistake appears, just put a notice and i'll modify it. The purpose of this answer is mostly to make a proof of concept. You can adapt this to your liking. Please note that i assume that session_start() is already in your code.
Hope it helps.
===============
Hey! Sorry about the late answer, i was out of town for a couple of days! :)
This addon is to answer your request for a print out solution with LI tags
Here's what i'd do :
print "<ol>";
foreach($_SESSION['history'] as $line) {
print "<li>".$line.</li>";
}
print "</ol>";
Simple as that. you should read the foreach loop here : http://www.php.net/manual/en/control-structures.foreach.php
As for the session_start();, put it before you use any $_SESSION variables.
Hope it helped! :)
I'm going to update and translate the code above for WordPress 5+ because the original question has the wordpress tag. Note that you don't need session_start() anywhere.
Here it goes, add the code below to your singular.php template (or single.php + page.php templates, depending on what you need):
/**
* Store last visited ID (WordPress ID)
*/
function so7035465_store_last_id() {
global $post;
$postId = $post->ID; // or get the post ID from your template
$historyMaxUrl = 3; // number of URLs in the history array
$history = (array) $_SESSION['history'];
array_unshift($history, $postId);
if (count($history) > $historyMaxUrl) {
array_pop($history);
}
$_SESSION['history'] = $history;
}
// Display latest viewed posts (or pages) wherever you want
echo '<ul>';
foreach ($_SESSION['history'] as $lastViewedId) {
echo '<li>' . get_permalink($lastViewedId) . '</li>';
}
echo '</ul>';
You can also store latest viewed custom post types (CPT) by placing the so7035465_store_last_id() function in your single-cpt.php template.
You can also add it to a hook or inject it in your template as an action, but that is beyond the scope of this question.

Dynamically generating page links for a CMS

I've searched far and wide and every CMS tutorial out there either doesn't explain this at all or gives you a huge chunk of code without explaining how it works. Even on stack overflow I can't find anything close to the answer, though I'd be okay with eating my words if someone could point me to the answer.
I am using PHP and mysql for this project.
I am building a CMS. Its extremely simple and I understand every concept I think I'll need except how to dynamically generate pages and page links. The way I want to do it is by having a database table that stores the name of a page and the main content of the page. That's all. Then I'd just call a script to pull the main content of a page into whatever page I happen to call. No big deal, right? Wrong.
Here's the problem. If I were to do this then I'd have to create a file for every page I want to create that calls the script that pulls the content from the correct database row. So I could add all sorts of page names and contents into the table but I don't know how to call them without manually creating new files each time I want to link to a new page.
Ideally there'd be a script that creates links to pages based on the page name row of the DB table as the pages are created. But how do you get those links with the ?=pageName at the end? If I just knew how that worked then I could figure the rest out.
UPDATE
The second answer really confirmed everything I thought I had to do but there is one catch. My plan now is to split up all the code into a series of functions and either include or require them in different templates that will be used to format the way pages are displayed. I need one look for the home page and one other design for the rest of the pages. I'm thinking that I'll have a function that says if ID is 0 then call this page template.php else call this other template file.php. But how do I pass the required variables to these new files? Do I just include the index.PHP page in them?
Bill your actually on the right track. Almost all web software today does extensive URL processing. Traditionally you would have php pages on your web root and then utilize the query string in the URL to refine the page's output. You have already arrived at why this might not be desired. So the popular alternative is the Front Controller design pattern. Basically we funnel every request to your index.php page and then route the request to internal pages or apps outside the web root. This can get complicated fast and everybody seems to implement this pattern in unique ways.
We can utilize this pattern without the routing by simply putting our app in the index page. The script below shows an example of what your trying to do in the simplest of ways. We basically have one page with our script. We can request the virtual pages by changing the id query string in our url. For example www.demo.net/?id=0 can be utilized as an index to your site. This should be the same as www.demo.net without the 'id' query. Just keep solving those problems one by one even if you don't know what the problem is. Once you start looking at other peoples code, then you can start seeing how other people solved the same problems you have.
The solution below will get you started, but then what do you do when you want an admin page? How do you authenticate the user? Do you duplicate alot of the code for yet another page? If your serious about your CMS then your going to want to implement some kind of framework underneath it. A framework to process the url, route to your application, load configuration files, and probably manage your database connection. Yea it gets complicated, but not if you solve each problem one at a time. Utilize classes or functions to share code to start. At the very least include a common "bootstrap" file at the top of your page to initialize common functionality such as a database connection. Read Stack Overflow just to keep up with whats going on. You can learn alot of terminology and probably find some answers to questions you didn't even know you wanted to ask.
Below assume we have a table with the following fields:
page_id
page_name
page_title
page_body
<?php
//<--------Move outside of web root-------------->
define('DB_HOST', 'localhost');
define('DB_USER', 'cms');
define('DB_PASS', 'changeme');
define('DB_DB', 'cms');
define('DB_TABLE', 'cms_pages');
//<---------------------------------------------->
//Display errors for development testing
ini_set('display_errors','On');
//Get the requested page id
if(isset($_GET['id']))
{
$id = $_GET['id'];
}
else
{
//Make page id '0' an index page to catch all
$id = 0;
}
//Establish a connection to MySQL
$conn = mysql_connect(DB_HOST,DB_USER,DB_PASS) or die(mysql_error());
//Select the database we will be querying
mysql_select_db(DB_DB, $conn) or die(mysql_error());
//Lets just grab the whole table
$sql = "SELECT * FROM ".DB_TABLE;
$resultset = mysql_query($sql, $conn) or die(mysql_error());
//The Select Query succeeded, but returned 0 result.
if (mysql_num_rows($resultset)==0)
{
echo "<pre>Add some Pages to my CMS</pre>";
exit;
}
//This is our target array we need to fill with arrays of pages
$result = array();
//Convert result into an array of associative arrays
while($row = mysql_fetch_assoc($resultset))
{
$result[] = $row;
}
//We now have all the information needed to build our app
//Page name - Short name for buttons, etc.
$name = "";
//Page title - The page content title
$title = "";
//Page body - The content you have stored in a table
$body = "";
//Page navigation - Array of formatted links
$nav = array();
//Process all pages in one pass
foreach($result as $row)
{
//Logic to match the requested page id
if($row['page_id'] == $id)
{
//Requested Page
$name = $row['page_name'];
$title = $row['page_title'];
$body = $row['page_body'];
$page = "<b>$name</b>";
}
else
{
//Not the requested page
$page = $row['page_name'];
}
//Build the navigation array preformatted with list items
$url = "./?id=" . $row['page_id'];
$nav[] = "<li>$page</li>";
}
?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>SimpleCMS | <?php echo $title; ?></title>
</head>
<body>
<div>
<div id="navigation" style="float:left;">
<ul>
<?php
foreach($nav as $item)
{
echo $item;
}
?>
</ul>
</div>
<div id="content"><?php echo $body;?></div>
</div>
</body>
</html>
I think you need to read about $_GET.
I also recommend a decent PHP book. Forget online tutorials; they are (for the most part) utterly useless.

Fetch database information on a new page without using new documents

I'm working on a page where I've listed some entries from a database. Although, because the width of the page is too small to fit more on it (I'm one of those people that wants it to look good on all resolutions), I'm basically only going to be able to fit one row of text on the main page.
So, I've thought of one simple idea - which is to link these database entries to a new page which would contain the information about an entry. The problem is that I actually don't know how to go about doing this. What I can't figure out is how I use the PHP code to link to a new page without using any new documents, but rather just gets information from the database onto a new page. This is probably really basic stuff, but I really can't figure this out. And my explanation was probably a bit complicated.
Here is an example of what I basically want to accomplish:
http://vgmdb.net/db/collection.php?do=browse&ltr=A&field=&perpage=30
They are not using new documents for every user, they are taking it from the database. Which is exactly what I want to do. Again, this is probably a really simple process, but I'm so new to SQL and PHP coding, so go easy on me, heh.
Thanks!
<?php
// if it is a user page requested
if ($_GET['page'] == 'user') {
if (isset($_GET['id']) && is_numeric($_GET['id'])) {
// db call to display user WHERE id = $_GET['id']
$t = mysql_fetch_assoc( SELECT_QUERY );
echo '<h1>' . $t['title'] . '</h1>';
echo '<p>' . $t['text'] . '</p>';
} else {
echo "There isn't such a user".
}
}
// normal page logic goes here
else {
// list entries with links to them
while ($t = mysql_fetch_assoc( SELECT_QUERY )) {
echo '<a href="/index.php?page=user&id='. $t['id'] .'">';
echo $t['title'] . '</a><br />';
}
}
?>
And your links should look like: /index.php?page=user&id=56
Note: You can place your whole user page logic into a new file, like user.php, and include it from the index.php, if it turns out that it it a user page request.
Nisto, it sounds like you have some PHP output issues to contend with first. But the link you included had some code in addition to just a query that allows it to be sorted alphabetically, etc.
This could help you accomplish that task:
www.datatables.net
In a nutshell, you use PHP to dynamically build a table in proper table format. Then you apply datatables via Jquery which will automatically style, sort, filter, and order the table according to the instructions you give it. That's how they get so much data into the screen and page it without reloading the page.
Good luck.
Are you referring to creating pagination links? E.g.:
If so, then try Pagination - what it is and how to do it for a good walkthrough of how to paginate database table rows using PHP.

Categories