How to make models from external table data in PHP and Laravel?

I am currently trying to pull some data from an external website in order to create models for my website. I am able to get the data from the table that I want, but have not been able to figure out exactly how to manipulate the data to structure my models. For each row in the table, there is certain data that I want to extract. As it currently stands, I am creating a player model with all of the player names, but now need to figure out how to attach the data that I want from each row to each player. Here is my code so far:
$dom = new DOMDocument();
$html = file_get_contents('https://www.baseball-reference.com/register/team.cgi?id=41270199');
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$table = $dom->getElementByID('team_batting');
$stats = $table->getElementsByTagName("td");
for ($i = 0; $i < $stats->length; $i++) {
// get player name
$attr = $stats->item($i)->getAttribute('data-stat');
if ($attr != 'player') {
continue;
}
$names[] = $stats->item($i)->textContent;
}
foreach($names as $name) {
$player = new Player(['name' => $name]);
echo $player;
echo '<br>';
}
So how would I get certain table data, like age, H, HR, and other things, and appropriately attach this data to the correct player model?
EDIT/UPDATE:
$dom = new DOMDocument();
$html = file_get_contents('https://www.baseball-reference.com/register/team.cgi?id=41270199');
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$table = $dom->getElementByID('team_batting');
$rows = $table->getElementsByTagName("tr");
for($i = 0; $i < $rows->length; $i++) {
$stats = $table->getElementsByTagName("td");
for($i = 0; $i < $stats->length; $i++) {
$name = $stats->item($i)->getAttribute('player');
$age = $stats->item($i)->getAttribute('age');
$names[] = $stats->item($i)->textContent;
$ages[] = $stats->item($i)->textContent;
dd($ages);
}
}
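One way to keep each stat attached to the right player is to walk the table row by row and key every cell by its data-stat attribute. The following is a minimal sketch, not a tested scraper for the real page: the inline sample markup, the extractRows helper name, and the stat names 'age', 'H' and 'HR' are assumptions made for illustration.

```php
<?php
// Sample markup standing in for the real page (assumed structure and stat names).
$html = <<<HTML
<table id="team_batting"><tbody>
<tr><td data-stat="player">Ann Lee</td><td data-stat="age">24</td><td data-stat="H">31</td><td data-stat="HR">4</td></tr>
<tr><td data-stat="player">Bo Cruz</td><td data-stat="age">27</td><td data-stat="H">18</td><td data-stat="HR">1</td></tr>
</tbody></table>
HTML;

// Hypothetical helper: returns one associative array per table row,
// containing only the requested data-stat columns.
function extractRows(string $html, string $tableId, array $wanted): array
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $rows = [];
    foreach ($xpath->query("//table[@id='$tableId']//tr") as $tr) {
        $row = [];
        // Query the cells relative to the current row, so values stay together.
        foreach ($xpath->query('td[@data-stat]', $tr) as $td) {
            $stat = $td->getAttribute('data-stat');
            if (in_array($stat, $wanted, true)) {
                $row[$stat] = trim($td->textContent);
            }
        }
        // Skip header or spacer rows that carry no player cell.
        if (isset($row['player'])) {
            $rows[] = $row;
        }
    }
    return $rows;
}

$players = extractRows($html, 'team_batting', ['player', 'age', 'H', 'HR']);
print_r($players);
```

Each resulting array could then be mapped onto a model, for example new Player(['name' => $row['player'], 'age' => $row['age']]), assuming those attributes are fillable on the Player model.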

Related

Got stuck parsing tabular content from a website

I've written a script in PHP to get the tabular data from a webpage. When I execute my script, I get the values in a single column. However, I want to parse them into one list per row, matching how they appear on that webpage.
Website link
To be clearer:
My current output looks like:
978
EMU
EUR
1
118.2078
36
Australija
AUD
1
73.1439
My expected output looks like:
['978', 'EMU', 'EUR', '1', '118.2078']
['36', 'Australija', 'AUD', '1', '73.1439']
['124', 'Kanada', 'CAD', '1', '77.7325']
['156', 'Kina', 'CNY', '1', '14.6565']
['191', 'Hrvatska', 'HRK', '1', '15.9097']
This is my try so far:
<?php
$url = "http://www.nbs.rs/kursnaListaModul/srednjiKurs.faces?lang=lat";
$dom = new DomDocument;
$dom->loadHtmlFile($url);
$xpath = new DomXPath($dom);
$rowData = array();
foreach ($xpath->query('//tbody[@id="index:srednjiKursList:tbody_element"]//tr') as $node) {
foreach ($xpath->query('td', $node) as $cell) {
$rowData[] = $cell->nodeValue;
}
}
foreach($rowData as $rows){
echo $rows . "<br/>";
}
?>
You are adding each cell to the output array one at a time. You probably want to build up one row at a time and add that to the output instead:
$rowData = array();
foreach ($xpath->query('//tbody[@id="index:srednjiKursList:tbody_element"]//tr') as $node) {
$row = array();
foreach ($xpath->query('td', $node) as $cell) {
$row[] = $cell->nodeValue;
}
$rowData[] = $row;
}
foreach($rowData as $rows){
print_r($rows); // Format the data as needed
}
Try this.
$htmlContent = file_get_contents("http://www.nbs.rs/kursnaListaModul/srednjiKurs.faces?lang=lat");
$DOM = new DOMDocument();
$DOM->loadHTML($htmlContent);
$Header = $DOM->getElementsByTagName('th');
$Detail = $DOM->getElementsByTagName('td');
//#Get header name of the table
foreach($Header as $NodeHeader)
{
$aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
}
//#Get row data/detail table without header name as key
$i = 0;
$j = 0;
foreach($Detail as $sNodeDetail)
{
$aDataTableDetailHTML[$j][] = trim($sNodeDetail->textContent);
$i = $i + 1;
$j = $i % count($aDataTableHeaderHTML) == 0 ? $j + 1 : $j;
}
//print_r($aDataTableDetailHTML)
//#Get row data/detail table with header name as key and outer array index as row number
for($i = 0; $i < count($aDataTableDetailHTML); $i++)
{
for($j = 0; $j < count($aDataTableHeaderHTML); $j++)
{
$aTempData[$i][$aDataTableHeaderHTML[$j]] = $aDataTableDetailHTML[$i][$j];
}
}
$aDataTableDetailHTML = $aTempData; unset($aTempData);
print_r($aDataTableDetailHTML);
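When the cells have already been collected into one flat list, as in the question's current output, the grouping step can also be sketched with array_chunk and array_combine. This is only an illustration: the cellsToRows helper and the header names are hypothetical, and the sample values are taken from the question.

```php
<?php
// Hypothetical helper: groups a flat list of cell values into rows
// keyed by header name. A trailing incomplete row is dropped.
function cellsToRows(array $cells, array $headers): array
{
    $width = count($headers);
    $rows = [];
    foreach (array_chunk($cells, $width) as $chunk) {
        if (count($chunk) === $width) {
            $rows[] = array_combine($headers, $chunk);
        }
    }
    return $rows;
}

// Header names are assumptions; the values come from the question's output.
$headers = ['code', 'country', 'currency', 'unit', 'rate'];
$cells = ['978', 'EMU', 'EUR', '1', '118.2078', '36', 'Australija', 'AUD', '1', '73.1439'];

$rows = cellsToRows($cells, $headers);
print_r($rows);
```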

In PHP (CodeIgniter), MongoDB insertion gets slow after inserting some data

In MongoDB, insertion gets slow after some data has been inserted. I am using batch_insert for the insertion, and I also need to check some conditions while inserting.
Inserting 20k records takes more than 1 hour.
temporary_table holds 1 lakh (100,000) records.
$batchSize = 20;
$documents = array();
$count = count($pending_contacts_data);
$count =1;
$temporary_data = array();
for($i=0;$i<$count;$i++){
$multiple_temporary_data = $this->mongo_db->select('*')->where(array('contact_id'=>(int)4,'status'=>1))->limit(10000)->get('temporary_table');
$temporary_data = array_merge($temporary_data,$multiple_temporary_data);
}
$count1 = count($temporary_data);
$documents = array();
for ($i=0; $i<$count1; $i++)
{
$obj_id[] = $temporary_data[$i]->_id;
$test_email = $this->mongo_db->select('*')->where(array('encrypted_email'=>$temporary_data[$i]->encrypted_email))->get('email_table');
if(empty($test_email)){
$document = array('email_id'=>$temporary_data[$i]->email,
'encrypted_email'=>$temporary_data[$i]->encrypted_email,
'encrypted_key'=>$temporary_data[$i]->encrypted_key,
'encrypted_iv'=>$temporary_data[$i]->encrypted_iv,
'status'=>(int)1,
'opend_supression_status'=>''
);
array_push($documents, $document);
if ((($i % $batchSize) === 0)) {
$insert = $this->mongo_db->batch_insert('opend_contacts_email_new1',$documents);
$update_temporary =$this->mongo_db->where_in('_id',$obj_id)->set(array('status'=>13))->update_all('temporary_data');
$documents = array();
}
}
}
What went wrong with your code is that after the 500th element, each new item was inserted as its own single-element batch, so $documents held just one element for the rest of the loop. You should have used the modulus operator %:
$count = count($temporary_data);
$batchSize = 50;
$documents = array();
for ($i = 0; $i < $count; ++$i) {
    $test_email = $this->mongo_db->select('*')->where(array('email_id'=>$temporary_data[$i]->email_id))->get('email_table');
    if (empty($test_email)) {
        $documents[] = array('email_id'=>$temporary_data[$i]->email,
            'encrypted_email'=>$temporary_data[$i]->encrypted_email,
            'encrypted_key'=>$temporary_data[$i]->encrypted_key,
            'encrypted_iv'=>$temporary_data[$i]->encrypted_iv,
            'status'=>(int)1,
            'suppression_status'=>''
        );
    }
    // Flush every $batchSize iterations, whether or not the current row was new
    if ((($i + 1) % $batchSize) === 0 && !empty($documents)) {
        $insert = $this->mongo_db->batch_insert('email_table', $documents);
        $documents = array();
    }
}
// Insert whatever is left after the last full batch
if (!empty($documents)) {
    $insert = $this->mongo_db->batch_insert('email_table', $documents);
}
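The batching pattern itself can be shown in isolation: buffer documents, flush whenever the buffer is full, and flush the remainder once the loop ends. In this minimal sketch, insertInBatches is a hypothetical helper and the $insert callable merely stands in for a call such as $this->mongo_db->batch_insert(...); no database is involved.

```php
<?php
// Hypothetical helper: passes documents to $insert in groups of $batchSize
// and returns the number of batches that were sent.
function insertInBatches(iterable $documents, int $batchSize, callable $insert): int
{
    $buffer = [];
    $batches = 0;
    foreach ($documents as $document) {
        $buffer[] = $document;
        if (count($buffer) === $batchSize) {
            $insert($buffer);   // full batch
            $buffer = [];
            $batches++;
        }
    }
    if ($buffer !== []) {
        $insert($buffer);       // leftover documents after the last full batch
        $batches++;
    }
    return $batches;
}

// Usage with a stand-in insert function that only records batch sizes.
$sizes = [];
$docs = array_map(fn(int $n) => ['email_id' => "user$n@example.com"], range(1, 45));
$batches = insertInBatches($docs, 20, function (array $batch) use (&$sizes) {
    $sizes[] = count($batch);
});
print_r($sizes);
```

Counting the buffer instead of testing the loop index with % keeps batches full even when some rows are skipped by a condition.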

how to read only part of an xml file with php xmlreader

I have an RSS xml file that is pretty large, with more than 700 nodes.
I am using XMLReader Iterator library to parse it and display the results as 10 per page.
This is my sample code for parsing xml:
<?php
require('xmlreader-iterators.php');
$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);
$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();
foreach ($itemIterator as $item) {
$xml = $item->asSimpleXML();
$items[] = array(
'title' => (string)$xml->title,
'link' => (string)$xml->link
);
}
// Logic for displaying the array values, based on the current page.
// page = 1 means $items[0] to $items[9]
for($i = 0; $i <= 9; $i++)
{
echo ''.$items[$i]['title'].'<br>';
}
?>
But the problem is that, for every page, I am parsing the entire XML file and then displaying only the corresponding page of results: for page 1, nodes 1 to 10; for page 5, nodes 41 to 50.
This delays the display of data. Is it possible to read just the nodes corresponding to the requested page? For the first page, I would read only the nodes in positions 1 to 10, instead of parsing the whole XML file and then displaying the first 10 nodes. In other words, can I apply a limit while parsing an XML file?
I came across this answer of Gordon that addresses a similar question, but it is using SimpleXML, which is not recommended for parsing large xml files.
Use array_slice to extract the relevant portion of the array:
require ('xmlreader-iterators.php');
$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);
$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();
$curr_page = (0 === (int) $_GET['page']) ? 1 : $_GET['page'];
$pages = 0;
$max = 10;
foreach ($itemIterator as $item) {
$xml = $item->asSimpleXML();
$items[] = array(
'title' => (string) $xml->title,
'link' => (string) $xml->link
);
}
// Take the length of the array
$len = count($items);
// Get the number of pages
$pages = ceil($len / $max);
// Calculate the starting point
$start = ceil(($curr_page - 1) * $max);
// return the portion of results
$arrayItem = array_slice($items, $start, $max);
// The last page may hold fewer than $max items, so iterate over what is there
foreach ($arrayItem as $entry) {
echo '' . $entry['title'] . '<br>';
}
// pagining stuff
for ($i = 1; $i <= $pages; $i ++) {
if ($i === (int) $curr_page) {
// current page
$str[] = sprintf('<span style="color:red">%d</span>', $i);
} else {
$str[] = sprintf('%d', $i, $i);
}
}
echo implode('', $str);
Use a cache in this case, since you cannot parse an XML file partially.
Check this
<?php
if($_GET['page']!=""){
$startPagenew = $_GET['page'];
$startPage = $startPagenew-1;
}
else{
$startPage = 0;
}
$perPage = 10;
$currentRecord = 0;
$xml = new SimpleXMLElement('http://sports.yahoo.com/mlb/teams/bos/rss.xml', 0, true);
echo $startPage * $perPage;
foreach($xml->channel->item as $key => $value)
{
$currentRecord += 1;
if($currentRecord > ($startPage * $perPage) && $currentRecord <= ($startPage * $perPage + $perPage)){
echo "$value->title";
echo "<br>";
}
}
//and the pagination:
//echo $currentRecord;
for ($i = 1; $i <= ceil($currentRecord / $perPage); $i++) {
echo("<a href='xmlpagination.php?page=".$i."'>".$i."</a>");
} ?>
Updated
Check this Link
http://www.phpclasses.org/package/5667-PHP-Parse-XML-documents-and-return-arrays-of-elements.html
You can use Dom and Xpath. It should be much faster, since Xpath allows you to select nodes by their position in a list.
<?php
$string = file_get_contents("http://oar.icrisat.org/cgi/exportview/subjects/s1=2E2/RSS2/s1=2E2.xml");
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML($string);
$string = "";
$xpath = new DOMXPath($dom);
$channel = $dom->getElementsByTagName('channel')->item(0);
$numItems = $xpath->evaluate("count(item)", $channel);
// get your paging logic
$start = 10;
$end = 20;
$items = $xpath->evaluate("item[position() >= $start and not(position() > $end)]", $channel);
$count = $start;
foreach($items as $item) {
print_r("\r\n_____Node number $count ");
print_r( $item->nodeName);
$childNodes = $item->childNodes;
foreach($childNodes as $childNode) {
print_r($childNode->nodeValue);
}
$count ++;
}
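With a streaming reader, the loop can also stop as soon as the requested page is full, so items after that page are never read. The following is a minimal sketch using plain XMLReader rather than the iterator library from the question; the readPage helper and the generated sample feed are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the items of one page and stops reading
// once $perPage items have been collected.
function readPage(string $xml, int $page, int $perPage = 10): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // for a file or URL, $reader->open($path) would be used instead

    $skip = ($page - 1) * $perPage; // items that belong to earlier pages
    $seen = 0;
    $items = [];

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== 'item') {
            continue;
        }
        if ($seen++ < $skip) {
            continue; // item from an earlier page
        }
        $node = simplexml_load_string($reader->readOuterXml());
        $items[] = [
            'title' => (string) $node->title,
            'link'  => (string) $node->link,
        ];
        if (count($items) === $perPage) {
            break; // page is full: the rest of the document is not parsed
        }
    }
    $reader->close();
    return $items;
}

// Generated sample feed with 25 items (stand-in for the real RSS file).
$xml = '<rss><channel>';
for ($n = 1; $n <= 25; $n++) {
    $xml .= "<item><title>T$n</title><link>http://example.com/$n</link></item>";
}
$xml .= '</channel></rss>';

$pageTwo = readPage($xml, 2);
print_r(array_column($pageTwo, 'title'));
```

Earlier pages are still scanned to reach the offset, so later pages cost more than page 1, and the total number of pages is not known without a full pass or a cached count.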

Is it possible to query the first 5 images with DOMDocument?

Is it possible to query the first 5 images with DOMDocument?
$dom = new DOMDocument;
$list = $dom->query('img');
With XPath you can fetch all images like this:
$xpath = new DOMXPath($dom);
$list = $xpath->query('//img');
Then you limit the results by only iterating over the first five.
for ($i = 0, $n = min(5, $list->length); $i < $n; ++$i) {
$node = $list->item($i);
}
XPath is very versatile thanks to its expression language. However, in this particular case, you may not need all that power and a simple $list = $dom->getElementsByTagName('img') would yield the same result set.
You can use getElementsByTagName to build an array of images:
$dom = new DOMDocument();
$dom->loadHTML($string);
$images = $dom->getElementsByTagName('img');
$result = array();
for ($i=0; $i<5; $i++){
$node = $images->item($i);
if (is_object( $node)){
$result[] = $node->ownerDocument->saveXML($node);
}
}
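The limit can also be expressed in the XPath expression itself with a position() predicate. In this minimal sketch, the firstImageSources helper and the sample markup are assumptions; the parentheses around //img matter, because without them position() counts each image among its siblings under its own parent rather than across the whole document.

```php
<?php
// Hypothetical helper: returns the src attributes of the first $limit
// <img> elements in document order.
function firstImageSources(string $html, int $limit = 5): array
{
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $sources = [];
    foreach ($xpath->query("(//img)[position() <= $limit]") as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Sample markup with six images, one of them nested in a paragraph.
$html = '<div><img src="1.png"><p><img src="2.png"></p><img src="3.png">'
      . '<img src="4.png"><img src="5.png"><img src="6.png"></div>';

$first = firstImageSources($html);
print_r($first);
```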

php multidimensional array iteration issue

Consider having 4 different RSS news feed URLs:
$urls[0]['news'] = "url1";
$urls[1]['news'] = "url2";
$urls[2]['news'] = "url3";
$urls[3]['news'] = "url4";
The following function should get 4 news titles from each of the URLs:
function getRssFeeds($urls,$type){
//Prepare XML objects
$xmls[$type] = array();
//Prepare item objects
$items[$type] = array();
//Prepare news titles/links arrays
$article_titles[$type] = array();
$encoded_titles[$type] = array();
$article_links[$type] = array();
//Fill XML objects
for($i=0; $i<4; $i++){
$xmls[$i][$type] = simplexml_load_file($urls[$i][$type]);
//Prepare news items
$items[$i][$type] = $xmls[$i][$type]->channel->item;
for($i=0; $i<4; $i++){
$article_titles[$i][$type] = $items[$i][$type]->title;
$encoded_titles[$i][$type] = iconv("UTF-8","windows-1251",$article_titles[$i][$type]);
}
//$article_links[$type][$i] = $items[$type][$i]->link;
}
return $encoded_titles;
}
After using the following to get the values:
type='';
function printRssFeed($urls,$type){
$titles = getRssFeeds($urls,$type);
foreach($titles as $title)
{
echo $title[$type]."<hr/>";
}
}
I get an undefined offset error. If I remove the inner for loop of the getRssFeeds() function, I get only 1 news title from each URL.
In this code you are resetting $i to 0 in your inner for loop:
for($i=0; $i<4; $i++){
$xmls[$i][$type] = simplexml_load_file($urls[$i][$type]);
//Prepare news items
$items[$i][$type] = $xmls[$i][$type]->channel->item;
for($i=0; $i<4; $i++){
$article_titles[$i][$type] = $items[$i][$type]->title;
$encoded_titles[$i][$type] = iconv("UTF-8","windows-1251",$article_titles[$i][$type]);
}
//$article_links[$type][$i] = $items[$type][$i]->link;
}
Try changing your inner for loop variable to a different one. Also, your array definitions do not follow the structure you use later.
$xmls[$i][$type] does not match your original instantiation, $xmls[$type] = array();
The same is true for all your other arrays.
So I think your array structure is off: you initialise the arrays with $type as the top-level key, but when you iterate you use $i as the top-level key.
Try removing the array instantiations at the beginning:
function getRssFeeds($urls,$type){
//Fill XML objects
for($i=0; $i<4; $i++){
$xmls[$i][$type] = simplexml_load_file($urls[$i][$type]);
//Prepare news items
$items[$i][$type] = $xmls[$i][$type]->channel->item;
for($i=0; $i<4; $i++){
$article_titles[$i][$type] = $items[$i][$type]->title;
$encoded_titles[$i][$type] = iconv("UTF-8","windows-1251",$article_titles[$i][$type]);
}
//$article_links[$type][$i] = $items[$type][$i]->link;
}
return $encoded_titles;
}
try this in your inner for loop
$j = 0;
for($j=0; $j<4; $j++){
$article_titles[$j][$type] = $items[$i][$type][$j]->title;
$encoded_titles[$j][$type] = iconv("UTF-8","windows-1251",$article_titles[$j][$type]);
}
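Putting the advice together, the nesting could look like the sketch below, which indexes the result by feed first and then by item, so nothing is overwritten between feeds. It is an illustration only: collectTitles is a hypothetical helper, it takes XML strings instead of URLs so that it runs without network access, and the iconv step from the question is left out.

```php
<?php
// Hypothetical helper: returns up to $perFeed titles for each feed,
// as $titles[feedIndex][itemIndex].
function collectTitles(array $feedXmlStrings, int $perFeed = 4): array
{
    $titles = [];
    foreach ($feedXmlStrings as $i => $xmlString) {   // $i: feed index
        $xml = simplexml_load_string($xmlString);
        if ($xml === false) {
            continue; // skip feeds that fail to parse
        }
        $titles[$i] = [];
        $j = 0;                                       // $j: item index, separate from $i
        foreach ($xml->channel->item as $item) {
            if ($j++ >= $perFeed) {
                break;
            }
            $titles[$i][] = (string) $item->title;
        }
    }
    return $titles;
}

// Two small sample feeds: one with five items, one with two.
$feedA = '<rss><channel><item><title>A1</title></item><item><title>A2</title></item>'
       . '<item><title>A3</title></item><item><title>A4</title></item><item><title>A5</title></item></channel></rss>';
$feedB = '<rss><channel><item><title>B1</title></item><item><title>B2</title></item></channel></rss>';

$titles = collectTitles([$feedA, $feedB]);
print_r($titles);
```

Iterating with foreach over the items also avoids the undefined offset error when a feed has fewer than 4 entries.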
