I have a pretty large table which I put on the page with a php call
<?php include('7c2dsf12c24-4441e-532ded8-88dsc7-4fsd2c8.txt'); ?>
That file has thousands of TR's and TD's within.
The text file is dynamically created and updated every couple of hours.
Some of the rows have a "featuredRow" class on them, which helps with styling.
However, they appear in a random order in that text file.
I need to sort them so that the featured rows go first. Basically take all the rows, and put all the featuredRows at the top of the table, followed by all the other rows.
I already have JavaScript code that sorts the table alphabetically by different td's, but since it's front-end sorting and the table consists of thousands of tr's (the text file is 7 MB of text), it would be quite a strain on Internet Explorer if I were to reorder the table on page load. The user expects to wait a while when reordering the entire table alphabetically, but should not have to wait 20-30 seconds for the initial ordering (what takes 2-3 seconds in Chrome takes 20-30 seconds in IE).
Therefore I figured that doing it on the backend, and displaying a reordered text file right away would be better, instead of using the dom, to create huge arrays and lag out the user's browser.
TL;DR
As an example, the file's structure looks like something like this:
<tr><td></td><td></td></tr>
<tr class="featuredRow"><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr class="featuredRow"><td></td><td></td></tr>
<tr class="featuredRow"><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr class="featuredRow"><td></td><td></td></tr>
<tr><td></td><td></td></tr>
I need to take that file and reorder the structure to this:
<tr class="featuredRow"><td></td><td></td></tr>
<tr class="featuredRow"><td></td><td></td></tr>
<tr class="featuredRow"><td></td><td></td></tr>
<tr class="featuredRow"><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
And I do not want to use JS since there are thousands of rows, megabytes of data, and it will take a long time on IE to do it on the front-end.
What is the easiest way to make it work in PHP?
Thank you
P.S.
Here is how the html/php looks like now - jsfiddle.net/1pggwuah
Here is another link on how two trs of the text file look like (there are about 3,000-4,000 of those trs in the text file) jsfiddle.net/a308w8b6
$rows = file_get_contents('/path/to/rows.html');
$rows = explode('<tr', $rows);
sort($rows);
$rows = implode('<tr', $rows);
Demo: https://ideone.com/bITopF
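Note that sort() also reorders the rows inside each group by their markup. If the original relative order matters, a stable partition is one alternative. This is only a sketch: featured_first is a made-up name, and it assumes rows are never nested (no tables inside cells) and that the class attribute is double-quoted as in the sample.

```php
<?php
// Split the markup into <tr>...</tr> chunks and emit the featured ones first,
// keeping the original relative order inside each group.
function featured_first(string $rowsHtml): string
{
    preg_match_all('~<tr\b[^>]*>.*?</tr>~is', $rowsHtml, $matches);

    $featured = [];
    $regular  = [];
    foreach ($matches[0] as $row) {
        // Only inspect the opening <tr ...> tag, not the cell contents.
        $openTag = substr($row, 0, strpos($row, '>') + 1);
        if (preg_match('~\bclass\s*=\s*"[^"]*\bfeaturedRow\b[^"]*"~i', $openTag)) {
            $featured[] = $row;
        } else {
            $regular[] = $row;
        }
    }

    return implode("\n", array_merge($featured, $regular));
}
```

It could be called as echo featured_first(file_get_contents('/path/to/rows.html')); although doing the reordering once, when the file is generated, would avoid repeating the work on every page view.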
$domd = new DOMDocument();
// @ suppresses the warnings libxml emits for imperfect HTML
@$domd->loadHTMLFile('7c2dsf12c24-4441e-532ded8-88dsc7-4fsd2c8.txt');
$masterele = $domd->getElementsByTagName("table")->item(0);
// copy the live node list first so that moving rows cannot disturb the iteration
foreach (iterator_to_array($domd->getElementsByTagName("tr")) as $trele) {
    if ($trele->getAttribute("class") !== "featuredRow") { continue; }
    $trele->parentNode->removeChild($trele);
    $masterele->insertBefore($trele, $masterele->firstChild);
}
echo $domd->saveHTML();
EDIT: the original code would put "featuredRow"'s at the bottom, not the top, sorry, fixed it now (use insertBefore instead of append)
Related
I don't know if this is possible as I'm unable to find how I can do this and I'm very new to PHP but here's an overview of my issue:
I have a script which reads a CSV file. One of the columns contains cells which contain HTML tables. At varying positions within all of the tables there exists a table row which contains <td>Retail</td> and then the price such as <td>$300</td> for example. An example is below which I have formatted so that it's easier for you to read, but this is returned as a continuous string from the CSV file normally:
<table>
<tr>
<td>Designer</td>
<td>Hermes</td>
</tr>
<tr>
<td>Size inch</td>
<td>5.9 x 4.3 x 2.4</td>
</tr>
<tr>
<td>Material</td>
<td>Cotton</td>
</tr>
<tr>
<td>Retail</td>
<td>$300.00</td>
</tr>
<tr>
<td>Made in</td>
<td>France</td>
</tr>
</table>
These tables are then required to have the CAD [Canadian Dollars] retail price added to them. Example below of the desired end result:
<table>
<tr>
<td>Designer</td>
<td>Hermes</td>
</tr>
<tr>
<td>Size inch</td>
<td>5.9 x 4.3 x 2.4</td>
</tr>
<tr>
<td>Material</td>
<td>Cotton</td>
</tr>
<tr>
<td>Retail USD</td>
<td>$300.00</td>
</tr>
<tr>
<td>Retail CAD</td>
<td>$410.00</td>
</tr>
<tr>
<td>Made in</td>
<td>France</td>
</tr>
</table>
I have looked at using substr() but it looks as though you need to specify the length of characters that will be ignored from the start of the string which isn't possible for me here as the data varies.
So therefore my question is whether it's at all possible to specifically split the price out from the string and then append it back in after the </tr> so that the result is as above. If you could point me in the right direction of the functions that I would need to use to achieve this then I would really appreciate it. Please bear in mind I am already using str_replace() to rename Retail to Retail USD and I already have a variable created ready to convert USD price to a CAD price which uses a finance API.
Thank you in advance for any insight you can offer me here.
I have looked at using substr() but it looks as though you need to specify the length of characters that will be ignored from the start of the string which isn't possible for me here as the data varies.
So use stripos to find the start of the string you want to replace.
However, the more I dig into this, the more quickly it becomes a mess. It would be better to edit the CSV generator rather than trying to mutate your CSV. In an ideal world, your CSV would also contain only data and not HTML.
Apologies, the following became a large and probably unwieldy answer:
To do it anyway, you need to isolate this CSV column in a variable, $csvData, and then work with it directly:
$csvData = "<table data from your question>";
$csvData = str_replace("</td>","*!*</td>",$csvData);
//remove all the HTML junk
$csvDataClean = strip_tags($csvData);
// Form an array.
$csvDataArray = explode("*!*",$csvDataClean);
// trim contents of the array.
$csvDataArray = array_map('trim', $csvDataArray);
// remove empty array values.
$csvDataArray = array_filter($csvDataArray);
// build new contents array.
$csvArray = array();
foreach($csvDataArray as $key=>$value){
if($key%2 == 0){
//even key (0, 2, 4, ...): it's a content header.
$value = str_replace(" ","_",$value);
$lastHeader = preg_replace("/[^a-z0-9-_]/i","",$value);
}
else {
//odd key: it's the value belonging to the last header.
$csvArray[$lastHeader] = $value;
}
}
//tidy up.
unset($key,$value,$lastHeader,$csvDataArray,$csvDataClean);
print_r($csvArray);
This will now output an array of headers and values from your HTML table. You can then easily reference values from this array and recompile them into an HTML table as necessary.
Using phpsandbox I can output:
Array
(
[Designer] => Hermes
[Size_inch] => 5.9 x 4.3 x 2.4
[Material] => Cotton
[Retail] => $300.00
[Made_in] => France
)
So you can then take $csvArray['Retail'] and process this value to get the other currency values, and add them to this array. Then you can run this array through another process to rebuild a table to save into the CSV (although this isn't recommended; it's better to save the array as a CSV itself, but I don't know your requirements).
So:
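//the original value is already in USD, so copy it unconverted.
$csvArray['Retail_USD'] = $csvArray['Retail'];
//whatever system you currently use to get conversion.
$csvArray['Retail_CAD'] = convert_currency($csvArray['Retail']);
//drop the old key so the row is not output twice.
unset($csvArray['Retail']);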
And now rebuild the HTML table:
$csvOutput = "";
foreach($csvArray as $key=>$value){
$csvOutput .= "<tr><td>".str_replace("_"," ",$key)."</td><td>".$value."</td></tr>\n";
}
unset($key,$value);
$csvOutput = "<table>".$csvOutput."</table>";
print_r($csvOutput);
You can also manually delete and readd the Made_in array key if you want to maintain this as the final array value:
//whatever system you currently use to get conversion.
$csvArray['Retail_USD'] = $csvArray['Retail'];
$csvArray['Retail_CAD'] = convert_currency($csvArray['Retail']);
unset($csvArray['Retail']);
....
$value = $csvArray['Made_in'];
unset($csvArray['Made_in']);
$csvArray['Made_in'] = $value;
This is a hacky but quick way of keeping the "made in" column after the new Retail columns added above.
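A less manual alternative to deleting and re-adding keys is to splice the new entries in right after a known key. The helper below is a hypothetical sketch (insert_after_key is not a PHP built-in), and it assumes string keys as in $csvArray.

```php
<?php
// Return a copy of $array with $newItems placed directly after $afterKey.
// If a key in $newItems already exists before the insertion point,
// the existing value is kept (array union semantics).
function insert_after_key(array $array, string $afterKey, array $newItems): array
{
    $position = array_search($afterKey, array_keys($array), true);
    if ($position === false) {
        return $array + $newItems; // key not found: append at the end
    }

    return array_slice($array, 0, $position + 1, true)
         + $newItems
         + array_slice($array, $position + 1, null, true);
}
```

With this, the converted price can be added next to the Retail entry and Made_in stays last without any unset() juggling.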
What you pasted here is a html table, not csv.
Anyway, there are several ways to manipulate strings. str_replace() is one of the most basic ones, so you got that already. In your case, you're probably best off using regular expressions. It's like str_replace but much more powerful. There are plenty of tutorials out there.
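As a rough illustration of the regular-expression route, preg_replace_callback can rewrite the Retail row and append a CAD row in one pass. This is a sketch under assumptions: add_cad_row_regex is a made-up name, $toCad stands in for whatever conversion code you already have, and the pattern assumes the row looks exactly like the sample (no attributes on the tags, a price starting with $).

```php
<?php
// Replace the "Retail" row with a "Retail USD" row followed by a "Retail CAD" row.
function add_cad_row_regex(string $html, callable $toCad): string
{
    $pattern = '~<tr>\s*<td>Retail</td>\s*<td>\$([0-9.,]+)</td>\s*</tr>~i';

    $result = preg_replace_callback($pattern, function (array $m) use ($toCad): string {
        $usd = (float) str_replace(',', '', $m[1]); // "1,300.00" -> 1300.0
        return '<tr><td>Retail USD</td><td>$' . number_format($usd, 2) . '</td></tr>'
             . '<tr><td>Retail CAD</td><td>$' . number_format($toCad($usd), 2) . '</td></tr>';
    }, $html);

    return $result ?? $html; // preg_replace_callback returns null on a regex error
}
```

Tables without a Retail row are returned unchanged, which matters when the row appears at varying positions or not at all.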
If you want to do a lot and more complex manipulation of html or xml data, you may want to have a look at XSLT.
I had to deal with a similar scenario once. What I would do is:
1. First build your desired output block in a variable, $output_block, e.g.:
<td>Retail USD</td><td>$300.00</td></tr><tr><td>Retail CAD</td><td>$410.00</td>
Note: you don't need the first opening tr tag or the last closing one, because you already have those in your original output.
2. Find the position of <td>Retail</td>
(use strpos)
3. Save the substring before that position in a variable, e.g. $first_part
4. Find the position of the first </tr> after <td>Retail</td> (strpos with an offset)
5. Save the substring starting at that position in a variable: $last_part
6. Your final output: $final_output = $first_part . $output_block . $last_part;
Piece of cake... ;)
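The position-based steps above could look roughly like this. It is a sketch, not a drop-in solution: add_cad_row is a made-up name, the fixed $rate argument stands in for the finance API mentioned in the question, and the row is assumed to contain a price of the form $300.00.

```php
<?php
// Rename the "Retail" label to "Retail USD" and insert a "Retail CAD" row
// directly after the row that contains it, using string positions only.
function add_cad_row(string $html, float $rate): string
{
    $label    = '<td>Retail</td>';
    $labelPos = stripos($html, $label);
    if ($labelPos === false) {
        return $html; // no Retail row in this table
    }

    $rowEnd = stripos($html, '</tr>', $labelPos);
    if ($rowEnd === false) {
        return $html; // malformed row: leave untouched
    }
    $rowEnd += strlen('</tr>');

    // Pull the price out of the rest of the Retail row.
    $row = substr($html, $labelPos, $rowEnd - $labelPos);
    if (!preg_match('/\$([0-9][0-9,]*(?:\.[0-9]+)?)/', $row, $m)) {
        return $html;
    }
    $usd = (float) str_replace(',', '', $m[1]);

    $cadRow = '<tr><td>Retail CAD</td><td>$' . number_format($usd * $rate, 2) . '</td></tr>';

    // Edit from the back first so the earlier offset stays valid.
    $html = substr_replace($html, $cadRow, $rowEnd, 0);
    return substr_replace($html, '<td>Retail USD</td>', $labelPos, strlen($label));
}
```

Because the end of the row is located relative to the Retail label, this does not depend on which row happens to follow it.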
In PHPUnit (using Selenium Server), I'm trying to check whether a particular element node selected by XPath has a certain string value. For instance:
<table>
<thead></thead>
<tbody>
<tr>
<th>1</th>
<td>value 1</td>
<td>value 2</td>
<td>value c</td>
</tr>
<tr>
<th>2</th>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
With the above code, using the XPath
isElementPresent('//table/tbody/tr[last()-0]/td[last()-1]');
it returns true. If I change tr[last()-0] to tr[last()-1], it still returns true.
Naturally, isElementPresent would be called in a loop, with the XPath generated in the loop as well (substituting the integers with $i and $j from the for() loops). As it is, that works fine. However, what I want to check is that the td has nothing in it.
Using the same HTML code as above, if I change the XPath to
isElementPresent('//table/tbody/tr[last()-$i]/td[last()-1 and text()="${nbsp}"]');
you would think that it would return true at //table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"] and false at //table/tbody/tr[last()-1]/td[last()-1 and text()="${nbsp}"]. However, here is the kicker:
Using the Selenium IDE 1.10.0 plugin for Firefox to check the XPath (by putting it in the Target box and hitting Find, to confirm that it locates the XPath when it should), //table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"] doesn't highlight the second-to-last td in the last tr. It highlights the first td in the last tr, as if the XPath were //table/tbody/tr[last()-0]/td[last()-2].
From my experiments, it seems to be treating the XPath like //table/tbody/tr[last()-0]/td[text()="${nbsp}"], which would only match the FIRST td whose text is a blank space. That is not good if the 2nd tr looked like this:
<tr>
<th>2</th>
<td>cows are my friends</td>
<td>let's go to my room pig!</td>
<td> </td>
</tr>
If I were to use isElementPresent('//table/tbody/tr[last()-0]/td[last()-1 and text()="${nbsp}"]'); it would still return true, as it's not looking at last()-1 but at last()-0.
So my question is: how can I check whether a particular element node has a certain string value?
NOTE 1: I use last()-# because this page, http://www.w3schools.com/xpath/xpath_syntax.asp, says that in IE5 and later [0] is the first node, not [1] as in Firefox or Chrome (which in a sense makes sense, since that's how array indexes work). For full compatibility, I start from the last node and work backwards: last() works in IE5 and later, and the logic of moving backwards through the nodes should be the same everywhere (unless Microsoft wants to redefine that logic).
NOTE 2: I am well aware that a simple fix is to add title or id attributes to the table. However, the page I'm writing the test for was made by someone else, and I would like to avoid modifying the page just to suit test cases.
NOTE 3: The table I'm testing is populated from a JSON string, so my test is: when there is no data for the table, is it blank? If not, then the JSON string is adding data to the table when it shouldn't.
EDIT 1: It seems that ${nbsp} doesn't work in PHP; only the Selenium IDE 1.10.0 plugin seemed to recognize it. However, inserting a non-breaking space by holding Alt and typing 0160 worked just fine.
EDIT 2: For the time being I have added id attributes to the tags to get this working, and it works perfectly fine when checking @id=[VALUE] and text()=[VALUE]. It would still be good to get this question answered, though: while I add id, title and/or class attributes to all my HTML tags, the person who originally made the table I was testing obviously didn't, and as I said in NOTE 2, 'I would like to avoid modifying the page just to suit test cases'.
@Memor-X Use SimpleXML to load the XML. Then it should be easy to test in PHPUnit. If I were you, I would build in dependency tests to validate the XML structure before testing the contents.
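The behaviour described in the question can be reproduced outside Selenium with PHP's DOMXPath, which helps when testing expressions. In XPath 1.0 a predicate is positional only when it evaluates to a number. Inside "last()-1 and ...", the and operator converts last()-1 to a boolean (true whenever it is non-zero), so the position test is lost. Chaining two predicates keeps it. The markup below is a cut-down sample modelled on the question, and the variable names are only for illustration.

```php
<?php
// &#160; is a non-breaking space; only the last td of the last row holds one.
$xml = '<table><tbody>'
     . '<tr><th>1</th><td>value 1</td><td>value 2</td><td>value c</td></tr>'
     . '<tr><th>2</th><td>cows are my friends</td><td>not empty</td><td>&#160;</td></tr>'
     . '</tbody></table>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$nbsp  = "\u{00A0}"; // U+00A0 as a UTF-8 string

// One predicate: "last()-1" becomes a boolean, so any td with nbsp text matches.
$combined = $xpath->query("/table/tbody/tr[last()]/td[last()-1 and text()='$nbsp']");

// Two predicates: first pick the second-to-last td, then test its text.
$chained = $xpath->query("/table/tbody/tr[last()]/td[last()-1][text()='$nbsp']");

printf("combined: %d match(es), chained: %d match(es)\n", $combined->length, $chained->length);
```

Here the combined form matches the last td (the only one holding a non-breaking space), while the chained form correctly reports that the second-to-last td is not blank. The same chained expression should work as a Selenium locator, though that has not been verified here.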
I'm working on a project that requires converting HTML email into text. Below is a simplified version of the HTML code:
<table>
<tr>
<td width="10%"></td>
<td width="60%"> test product </td>
<td width="20%">5</td>
<td width="10%"> £50.00 </td>
</tr>
<tr>
<td></td>
<td colspan="3" width="100%"> Project Name: Test Project </td>
</tr>
<tr>
<td width="10%"> </td>
<td colspan="2" width="80%"> Page 1 : 01 New York 1.jpg </td>
<td width="10%"> £0.00 </td>
</tr>
</table>
The expected outcome should look like this in a text file (with columns aligned nicely):
test product                   5     £50.00
Project Name: Test Project
Page 1 : 01 New York 1.jpg           £0.00
My idea is to parse the HTML content with DOMDocument. Then I will set a default width for the table (e.g. 100 spaces) and convert the width of each column from % to a number of spaces (based on the colspan and width attributes of the <td> tag). Then I will subtract the strlen of the data in each column from that column's width to get the number of spaces I need to right-pad the string with (str_pad) so that everything aligns vertically.
I have been working that way but haven't achieved what I want yet. I'm wondering whether it is a stupid approach; if anyone knows a better way, please help me out.
Also, when it comes to multibyte languages (Japanese, Korean, etc.), I don't think my approach would work, because their characters are wider than one space and it would end up a mess.
Can someone help me out please?
Don't reinvent the wheel. Table rendering is difficult, rendering tables using only text is even more difficult.
To clarify the complexity of a text-based table renderer that offers all the features of HTML, take a look at w3m, which is open source:
these 3000 lines of code are there only to display html tables.
Transform HTML to Text
There are textbased browsers that can be used by command line, like lynx.
You could fwrite your html table into a file, pass that file into the textbased browser and take its output.
Note: textbased browsers are generally used in a shell, which generally displays in monospace. This remains a prerequisite.
lynx and w3m are both available on Windows and you don't need to "install" them, you just need to have the executables and the permission to run them from PHP.
code example:
<?php
$table = '<table><tr><td>foo</td><td>bar</td></tr></table>'; //this contains your table
$html = "<html><body>$table</body></html>";
//write html file
$tmpfname = tempnam(sys_get_temp_dir(), "tblemail");
$handle = fopen($tmpfname, "w");
fwrite($handle, $html);
fclose($handle);
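// -T text/html tells w3m the temp file is HTML even though it has no .html extension
$myTextTable = shell_exec("w3m.exe -T text/html -dump " . escapeshellarg($tmpfname));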
unlink($tmpfname);
w3m.exe needs to be in your working directory.
(didn't try it)
Render a Text table
If you want a native PHP solution, there's also at least one framework (https://github.com/c9s/CLIFramework) aimed at console applications for PHP which has a table renderer.
It doesn't transform HTML to text, but it helps you build a text formatted table with support for multiline cells (which seems to be the most complicated part).
Using CLIFramework, you would need code like this to render your table:
<?php
require 'vendor/autoload.php';
use CLIFramework\Component\Table\Table;
$table = new Table;
$table->addRow(array(
"test product", "5", "£50.00"
));
$table->addRow(array(
"Project Name: Test Project", "", ""
));
$table->addRow(array(
"Page 1 : 01 New York 1.jpg", "", "£0.00"
));
$myTextTable = $table->render();
The CLIFramework table renderer doesn't seem to support anything similar to "colspan" however.
Here's the documentation for the table component: https://github.com/c9s/CLIFramework/wiki/Using-Table-Component
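If you do go with the manual padding idea from the question, the multibyte concern can be handled by measuring display width instead of byte length. The sketch below assumes the mbstring extension and a monospace output; pad_cell and render_row are made-up names, and the widths are column counts you would derive from the width and colspan attributes.

```php
<?php
// Pad $text with spaces up to $width display columns.
// mb_strwidth counts wide East Asian characters as two columns.
function pad_cell(string $text, int $width): string
{
    $visible = mb_strwidth($text, 'UTF-8');
    return $text . str_repeat(' ', max(0, $width - $visible));
}

// Render one row from [text, width] pairs; a colspan cell simply gets
// the summed width of the columns it covers.
function render_row(array $cells): string
{
    $line = '';
    foreach ($cells as [$text, $width]) {
        $line .= pad_cell(trim($text), $width);
    }
    return rtrim($line);
}
```

For the first sample row this would be called as render_row([['', 10], ['test product', 60], ['5', 20], ['£50.00', 10]]). Text longer than its column is not truncated or wrapped, which is one of the parts a real table renderer handles for you.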
I have no clue at all.
How do I extract the numeric % data on the right from the link below and display them on my website without updating daily myself? Can a simple PHP + HTML solve my problem?
http://www.mrrebates.com/merchants/all_merchants.asp
Meanwhile, how do I automatically hyperlink the extracted numeric % and display it as a link for that retailer? for example,
1 Stop Florists------------------------- 8% (this 8% should be displayed as hyperlink for that retailer, unfortunately I am too new to have more than 1 hyperlink)
at the same time integrating my referral id (shown below) on to that 8% hyperlink
mrrebates.com?refid=420149
You can use curl to download the page, then use regular expressions to parse it up and print it out in whatever form you want. Here's some PHP code to do it:
<?php
system("curl -v http://www.mrrebates.com/merchants/all_merchants.asp > /tmp/x.txt");
$data = file_get_contents("/tmp/x.txt");
preg_match_all('/<td><a href="([^"]*)".*?<b>([^<]*)<\/b>.*?<td class="r">([^<]*)<\/td>/',
$data, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$site_name = $match[2];
$url = "http://www.mrrebates.com/{$match[1]}";
$percent = $match[3];
print "<a href='$url'>$site_name</a> ";
print "<a href='$url'>$percent</a> <br/>";
}
That'll print out a list of links every time you refresh the page. I have no idea how referral codes work on that site, but I imagine it'll be pretty easy to tack it onto the $url variable.
One caveat here is that every time you refresh your page, it's going to have to load the other site first and parse it so it'll be slow. You could separate out the system("curl...") call into a separate file and only do that once an hour or so if you want to make it go faster. Good luck.
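The "only once an hour" idea could be sketched like this. cached_fetch is a hypothetical helper, the cache path is up to you, and the download step is passed in as a callable so that it can wrap curl or anything else.

```php
<?php
// Return the cached copy if it is younger than $maxAge seconds;
// otherwise call $download(), store its result and return it.
function cached_fetch(string $cacheFile, int $maxAge, callable $download): string
{
    clearstatcache(true, $cacheFile); // make sure filemtime() is not stale

    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    $data = (string) $download();
    file_put_contents($cacheFile, $data, LOCK_EX);
    return $data;
}
```

A call such as cached_fetch('/tmp/x.txt', 3600, $download) would then hit the remote site at most once per hour, however often your own page is refreshed.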
Parsing XHTML is best left to a DOM parser. However, this type of scrape operation is messy business anyway. I will propose another solution and let you piece it together.
View the source of your HTML and find out the beginning and end of your table. Looks like you want this:
<table border="0" width="95%" cellpadding="3" cellspacing="0" style="border: 1px dotted #808080;">
<tr>
<td bgcolor="#FFCC00"><b>Store Name</b></td>
<td width="75" align="center" bgcolor="#FFCC00"><b>Coupons</b></td>
<td width="75" align="right" bgcolor="#FFCC00"><b>Rebate</b></td>
</tr>
And then look for the next occurrence of </table>.
Now, your content is in rows... look for <tr and </tr>.
I'll let you figure it out how to break it down from there.
Now, to actually do all of this work... there are lots of functions that can help you. Start with strpos.
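For comparison, the DOM-parser route mentioned at the start of this answer might look like the sketch below. The row structure (a link in the first cell and the rebate in a td with class "r") is an assumption based on the other answers here, not something checked against the live page, and extract_rebates is a made-up name.

```php
<?php
// Extract name, link and rebate text from every row that has a td.r cell.
function extract_rebates(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows  = [];
    foreach ($xpath->query('//tr[td[@class="r"]]') as $tr) {
        $link = $xpath->query('.//a[@href]', $tr)->item(0);
        $rate = $xpath->query('td[@class="r"]', $tr)->item(0);
        if ($link === null || $rate === null) {
            continue; // header or malformed row
        }
        $rows[] = [
            'name'    => trim($link->textContent),
            'href'    => $link->getAttribute('href'),
            'percent' => trim($rate->textContent),
        ];
    }
    return $rows;
}
```

Each entry can then be printed as a link. Remember to pass the values through htmlspecialchars() when echoing them into your own page.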
This is probably better done with JavaScript (or at least I have usually tackled problems like this on the client side), particularly with the jQuery library.
You want to load the data on that page with something like
$.get("http://www.mrrebates.com/merchants/all_merchants.asp");
and parse the returned data to get the info you need (this should be simple enough that jQuery will do, though there are fuller DOM parsers). I'm not sure what you're familiar with so far, but it would probably be a lot to describe here. I see the % info is in a td with class "r".
Do you have just one referral ID, or one for each vendor? That will obviously matter.
I'm trying to find a Selenium/PHP XPath for matching a table row that contains multiple elements (text, and form elements).
Example:
<table class="foo">
<tr>
<td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"></td>
</tr>
</table>
This works for a single text element:
$this->isElementPresent( "//table/tbody/tr/td[contains(text(), 'Car')]" );
while this does NOT (omitting the /td locator):
$this->isElementPresent( "//table/tbody/tr[contains(text(), 'Car')]" );
and thus, this obviously won't work either for multiple elements:
$this->isElementPresent( "//table/tbody/tr[contains(text(), 'Car')][contains(text(), '123')]" );
Another way to do this would be with getTable("xpath=//table[@class='foo'].x.y") for each and every row x and column y. Cumbersome, but it worked... mostly. It does NOT return the <input> tag! It returns an empty string for that cell :(
Any ideas?
This XPath expression:
/html/body/table[descendant::td[contains(.,'Car')]]
Note: If you know your schema, don't use a starting // operator. Use string value instead of text node (this way you get the concatenation of all descendant text nodes).
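The difference between the text node and the string value can be checked quickly with PHP's DOMXPath. This is just a small demonstration on the sample row from the question (written as well-formed XML, without a tbody); the variable names are illustrative.

```php
<?php
$rowXml = '<table class="foo"><tr>'
        . '<td>Car</td><td>123</td>'
        . '<td><input type="submit" name="s1" value="go"/></td>'
        . '</tr></table>';

$rowDoc = new DOMDocument();
$rowDoc->loadXML($rowXml);
$rowXpath = new DOMXPath($rowDoc);

// text() only sees text nodes that are direct children of tr: there are none.
$viaTextNodes = $rowXpath->query("//tr[contains(text(), 'Car')]")->length;

// "." is the string value of tr: all descendant text concatenated ("Car123").
$viaStringValue = $rowXpath->query("//tr[contains(., 'Car')][contains(., '123')]")->length;

printf("text(): %d, string value: %d\n", $viaTextNodes, $viaStringValue);
```

The first query finds nothing while the second finds the row, which is why the tr-level expressions in the question fail with text().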
Several paths can be combined with | separator.
Tweak this:
//tr/td[contains(text(), 'Car')]/text() | //tr/td/input[@value="s1"]/@name
you might want to use
//td[contains(., 'Car')]/ancestor::tr[td[contains(., '123')]]
That selects the tr that is an ancestor of a td containing 'Car' and that also has a td containing '123'.
Try to use View Xpath Plugin in firefox, very useful plugin.
Learn more about Axes in Xpath: http://www.w3schools.com/xpath/xpath_axes.asp
Thanks to knb for some syntax hints.
This is slightly off-topic, but relevant to the search that led me here...
I had a table with [ name | value ] cells. I needed to get value from the row with 'name' preceding it.
(fake example, but every link I was looking for had the same text and no IDs - the point is that the context information was in a neighboring cell)
<table id="options"><tbody>
<tr>
<td>other</td>
<td>edit</td>
</tr>
<tr>
<td>this label</td>
<td>edit</td> <!-- I want this button -->
</tr>
<tr>
<td>other</td>
<td>edit</td>
</tr>
</tbody></table>
I could retrieve the button I wanted like this, using nested [[]] conditions:
//table[@id='options']/tbody/tr[td[contains(text(), 'this label')]]/td[2]/a
"get the "a" that is in a row that contains another cell with the text I'm looking for"
I think this sort of task might be a common case, so I'm posting it here FYI
In my problem, I had a list of products where it was identified by a unique SKU/catalog combination. If I wanted to add that product to a cart, I chose it by SKU and catalog.
Using foob.ar's example:
//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]
You can combine it with dman's solution for choosing a specific element/column within that row:
//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]//input[@name='s1']
Edit:
The solution above works if I am only looking for those two values in any of the columns. To match a value in a specific column, I had to modify it a bit:
//table[@class='foo']/tr[td[position()=1 and contains(text(), 'Car')] and td[position()=2 and contains(text(), '123')]]//input[@name='s1']
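To see the difference between the any-column and the column-specific forms, here is a small check with PHP's DOMXPath. The second row, with its columns swapped, is invented for the demonstration, and the input[@name] filter is dropped so that the matched rows can be told apart.

```php
<?php
$tableXml = '<table class="foo">'
    . '<tr><td>Car</td><td>123</td><td><input type="submit" name="s1" value="go"/></td></tr>'
    . '<tr><td>123</td><td>Car</td><td><input type="submit" name="s2" value="go"/></td></tr>'
    . '</table>';

$tableDoc = new DOMDocument();
$tableDoc->loadXML($tableXml);
$tableXpath = new DOMXPath($tableDoc);

// Collect the name attribute of every input matched by an expression.
$inputNames = function (string $expression) use ($tableXpath): array {
    $names = [];
    foreach ($tableXpath->query($expression) as $input) {
        $names[] = $input->getAttribute('name');
    }
    return $names;
};

$anyColumn = $inputNames("//table[@class='foo']/tr[td[contains(text(), 'Car')] and td[contains(., '123')]]//input");
$byColumn  = $inputNames("//table[@class='foo']/tr[td[position()=1 and contains(text(), 'Car')] and td[position()=2 and contains(text(), '123')]]//input");

echo implode(',', $anyColumn), ' / ', implode(',', $byColumn), "\n";
```

The any-column expression matches both rows, while the column-specific one matches only the row where 'Car' is in the first cell and '123' in the second.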