I'm parsing out an HTML table and building an array based on the row values. My problem is the associative keys that are returned have a bit of white space at the end of them giving me results like this:
Array ( [Count ] => 6 [Class ] => 30c [Description] => Conformation Model (Combined 30,57) )
So a line like this:
echo $myArray['Count'];
or
echo $myArray['Count '];
Gives me a blank result.
For now I've got a pretty hacky workaround going:
foreach($myArray as $row){
$count = 0;
foreach($row as $info){
if($count == 0){
echo 'Count:' . $info;
echo '<br>';
}
if($count == 1){
echo ' Class:' . $info;
echo '<br>';
}
if($count == 2){
echo ' Description:' . $info;
echo '<br>';
}
$count++;
}
}
The function I'm using to parse the table I found here:
function parseTable($html)
{
// Find the table
preg_match("/<table.*?>.*?<\/[\s]*table>/s", $html, $table_html);
// Get title for each row
preg_match_all("/<th.*?>(.*?)<\/[\s]*th>/", $table_html[0], $matches);
$row_headers = $matches[1];
// Iterate each row
preg_match_all("/<tr.*?>(.*?)<\/[\s]*tr>/s", $table_html[0], $matches);
$table = array();
foreach($matches[1] as $row_html)
{
preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/", $row_html, $td_matches);
$row = array();
for($i=0; $i<count($td_matches[1]); $i++)
{
$td = strip_tags(html_entity_decode($td_matches[1][$i]));
$row[$row_headers[$i]] = $td;
}
if(count($row) > 0)
$table[] = $row;
}
return $table;
}
I'm assuming I can eliminate the whitespace by updating the regular expression, but of course I avoid regex like the plague. Any ideas? Thanks in advance.
-J
You can use trim to remove leading and trailing whitespace characters:
$row[trim($row_headers[$i])] = $td;
But don’t use regular expressions to parse an HTML document; use a proper HTML parser instead, such as the Simple HTML DOM Parser or PHP's built-in DOMDocument.
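As a rough sketch of the parser-based approach, assuming PHP's DOM extension is available: the function name parseTableDom() and the sample HTML are made up for illustration, and nested tables are not handled.

```php
<?php
// Sketch: parse the first <table> with DOMDocument and trim the header keys.
// parseTableDom() is a hypothetical name, not part of the original post.
function parseTableDom(string $html): array
{
    $dom = new DOMDocument();
    // "@" hides the warnings that not-quite-valid real-world HTML tends to trigger.
    @$dom->loadHTML($html);

    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    // Header cells become the keys; trim() removes the stray whitespace.
    $headers = [];
    foreach ($table->getElementsByTagName('th') as $th) {
        $headers[] = trim($th->textContent);
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $row = [];
        $i = 0;
        foreach ($tr->getElementsByTagName('td') as $td) {
            // Fall back to the numeric index if a header is missing.
            $row[$headers[$i] ?? $i] = trim($td->textContent);
            $i++;
        }
        if ($row !== []) {
            $rows[] = $row; // the header row has no <td> cells and is skipped
        }
    }
    return $rows;
}

// Sample data modelled on the output shown in the question.
$html = '<table><tr><th>Count </th><th>Class </th><th>Description</th></tr>'
      . '<tr><td>6</td><td>30c</td><td>Conformation Model (Combined 30,57)</td></tr></table>';
$rows = parseTableDom($html);
echo $rows[0]['Count']; // prints 6
```

Because the keys are trimmed when the array is built, $rows[0]['Count'] works without the trailing space.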
An easy solution would be to change
$row[$row_headers[$i]] = $td;
to:
$row[trim($row_headers[$i])] = $td;
Related
I can clearly see in my results that one of the output values is "general". When I try to filter it out in my if statement, the check fails to catch "general" every time. My str_replace call is an attempt to strip any whitespace that might be causing the issue.
Code Snippet:
$tick = 0;
foreach($html->find('select.js-career-select') as $info) {
foreach($info->find('option') as $info2) {
++$tick;
$general = 'general';
if($tick > 38) {
$list = $info2;
$list = strtolower(str_replace(' ', '', $list));
if($list != $general) {
echo $list."<br>";
}
else {
echo "NOPE!";
}
}
}
}
I suspect $list has newlines before or after it. Try:
$list = strtolower(trim(strip_tags($list)));
to remove all types of whitespace surrounding the text, and any HTML tags in the text.
You can also get just the text from the tag with:
$list = $info2->plaintext;
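A small illustration of why the str_replace attempt misses the match while trim catches it. The $raw value is a made-up example of what a scraped option might contain.

```php
<?php
// Hypothetical option text with the kind of surrounding whitespace a scraper often returns.
$raw = "\n   General\t\n";

// Removing only spaces leaves the newlines and the tab in place.
$spacesOnly = strtolower(str_replace(' ', '', $raw));
var_dump($spacesOnly === 'general'); // bool(false)

// trim() strips spaces, tabs, newlines and carriage returns from both ends.
$trimmed = strtolower(trim($raw));
var_dump($trimmed === 'general');    // bool(true)
```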
I have a table with data from sql. Some of the columns in the db have more than 1 name. I have created an array from them but now I need to compare the two while creating a table.
$strArrayChildren = explode(',', $children);
$strArrayChildren = str_replace('and', '', $strArrayChildren);
$childCount = count($strArrayChildren);
$strArrayGrades = explode(',',$the_grades);
$strArrayGrades = str_replace('(', '', $strArrayGrades);
$strArrayGrades = str_replace(')', '', $strArrayGrades);
$grades ='';
foreach($strArrayChildren as $child){
foreach($strArrayGrades as $grade){
if(strpos($grade, $child) !== false){
$grades = preg_replace('([a-z A-Z ()]+)', "", $grade);
}elseif(strpos($grade, $child) !== true){
$grades ='';
}
}
echo "<tr>";
echo "<td>{$child}</td>";
echo "<td>{$last_name}</td>";
echo "<td>Child</td>";
echo "<td>{$grades}</td>";
echo "</tr>";
}
When I run this code I get the grade of the student to match with the first name from the array, but then the rest of the grades keep trying to match with the first student even though there is a new row with a new name.
Any help would be great! Thank you!
I'm not sure about your database but this should work. I'm posting the response with static arrays.
<?php
$strArrayChildren = array("Becky"," Aaron"," Luke");
$the_grades = "9 (Susan), 5 (Luke)";
$strArrayGrades = explode(',',$the_grades);
echo "<html><body><table style='text-align: center; width: 400px;'>";
echo "<tr>";
echo "<td>Child</td>";
echo "<td>Grade</td>";
echo "</tr>";
foreach($strArrayChildren as $child){
$grades = "";
$child = trim($child);
foreach($strArrayGrades as $key => $grade){
if(strpos($grade, $child) > 0){
$grades = intval(preg_replace('/[^0-9]+/', '', $grade), 10);
}
}
echo "<tr>";
echo "<td>{$child}</td>";
echo "<td>{$grades}</td>";
echo "</tr>";
}
echo "</table></body></html>";
?>
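Explanation:
Initialize $grades at the start of each iteration of the outer loop, so a grade from the previous child does not carry over.
There is no point in testing both outcomes of strpos(); one check is enough.
Checking that the position is greater than 0 works here because the name always follows the grade number; in general, compare with !== false, since a match at position 0 would otherwise be missed.
Change the regex so that it selects only the digits from the string.
Trim the needle passed to strpos(); some names carry unwanted leading whitespace, so it is best to trim at the start of the loop.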
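The strpos() points above can be checked with a few one-liners. The strings are sample values in the same shape as the static arrays in the answer.

```php
<?php
// The name follows the number, so its position is greater than 0 here.
$nameAfterNumber = strpos('5 (Luke)', 'Luke');
var_dump($nameAfterNumber);            // int(3)

// If the needle starts the string, strpos() returns 0, which "> 0" treats as a miss.
$looseCheck  = strpos('Luke 5', 'Luke') > 0;
$strictCheck = strpos('Luke 5', 'Luke') !== false;
var_dump($looseCheck);                 // bool(false)
var_dump($strictCheck);                // bool(true)

// An untrimmed needle such as " Aaron" only matches when the space is present too.
$untrimmed = strpos('7 (Aaron)', ' Aaron');
$trimmed   = strpos('7 (Aaron)', trim(' Aaron'));
var_dump($untrimmed);                  // bool(false)
var_dump($trimmed);                    // int(3)
```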
With my custom function below, my aim is to give a specific link to each element of an array of tags. My input to the function is a string like (tag1, tag2, tag3). My output is (in linked form) only: tag1,
“tag1,” is okay, but why can't I get what I expect: “tag1, tag2, tag3” (in linked form)?
I read the examples on php.net and on this site for the relevant terms (array, explode, for, .=), but I couldn't solve my issue.
Can you guide me, please?
function tag_linkify ($article_tags)
{
$array_of_tags = explode(",", $article_tags);
$sayac = count($array_of_tags);
$linked_tags ="";
for ($i=0; $i<$sayac; $i++)
{
$linked_tags .= ''.$array_of_tags[$i].', ';
}
echo substr_replace($linked_tags, '', -1, 2);
}
tag_linkify (tag1,tag2,tag3);
Thanks, regards
Improving on Sedz post:
function tag_linkify ($article_tags)
{
$array_of_tags = explode(",", $article_tags);
echo '' . implode(',', $array_of_tags) . '';
}
tag_linkify ("tag1,tag2,tag3");
Btw, the arguments in your tag_linkify call are missing their quotation marks, and
'<a href="'.'">'
is really the same as
'<a href="">'
If I understand your question correctly, I would do:
tag_linkify ($tag1, $tag2, $tag3);
function tag_linkify ()
{
    $tags = func_get_args(); // get all tags in an array
    $final = '';
    // loop through the tags
    foreach ($tags as $tag)
    {
        // return or echo, depending on what you are doing with your data
        $final .= '' . $tag . '';
    }
    return $final;
}
func_get_args
Check this out, using implode:
function tag_linkify ()
{
    $array_of_tags = func_get_args(); // all arguments as an array
    $sayac = count($array_of_tags);
    $linked_tags = array();
    for ($i=0; $i<$sayac; $i++)
    {
        $linked_tags[] = ''.$array_of_tags[$i].' ';
    }
    // implode() joins the entries, so there is no trailing comma to strip
    echo "(".implode(',', $linked_tags).")";
}
tag_linkify ('tag1', 'tag2', 'tag3');
I hope this helps.
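Putting the suggestions from this thread together, here is a minimal sketch that builds one link per tag from a comma-separated string. The "/tags/" URL pattern is an assumption made purely for illustration, since the link target is not given in the question. Note that substr_replace($linked_tags, '', -1, 2) in the question removes only the final character, because an offset of -1 starts at the last character; implode avoids the trailing separator altogether.

```php
<?php
// Sketch: wrap each tag in its own link, then join the links with ", ".
// The "/tags/" URL pattern is hypothetical.
function tag_linkify(string $article_tags): string
{
    $links = [];
    foreach (explode(',', $article_tags) as $tag) {
        $tag = trim($tag);
        if ($tag === '') {
            continue; // skip empty entries such as a trailing comma
        }
        $links[] = '<a href="/tags/' . rawurlencode($tag) . '">'
                 . htmlspecialchars($tag) . '</a>';
    }
    // implode() puts the separator only between entries.
    return implode(', ', $links);
}

$linked = tag_linkify('tag1, tag2, tag3');
echo $linked;
// <a href="/tags/tag1">tag1</a>, <a href="/tags/tag2">tag2</a>, <a href="/tags/tag3">tag3</a>
```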
What I'm trying to do is make my output usable in a spreadsheet.
I want each item in the output to stand on its own, without array tags and not mashed together, starting with an asterisk and ending with a % sign.
<?php
$file = file_get_contents('aaa.txt'); //get file to string
$row_array = explode("\n",$file); //cut string to rows by new line
$row_array = array_count_values(array_filter($row_array));
foreach ($row_array as $key=>$counts) {
if ($counts==1)
$no_duplicates[] = $key;
}
//do what You want
echo '<pre>';
print_r($no_duplicates);
//write to file. If file don't exist. Create it
file_put_contents('no_duplicates.txt',$no_duplicates);
?>
Maybe this would give you what you want:
$str = "*" . implode("% *", $no_duplicates) . "%";
echo '<pre>';
echo $str;
echo '</pre>';
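If one item per line suits the spreadsheet import better, a variation along these lines might work. The sample data and the temporary output path are made up; when file_put_contents is given an array, it writes the elements back to back with no separator, which is why the string is built first.

```php
<?php
$no_duplicates = ['apple', 'pear', 'plum']; // hypothetical sample data

// One "*item%" entry per line; trim() also drops a stray "\r" left by Windows line endings.
$lines = array_map(fn($item) => '*' . trim($item) . '%', $no_duplicates);
$output = implode(PHP_EOL, $lines) . PHP_EOL;

$path = tempnam(sys_get_temp_dir(), 'nodup'); // hypothetical output location
file_put_contents($path, $output);
echo $output;
```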
I have a huge HTML code in a PHP variable like :
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
I want to display only the first 500 characters of this code. The character count must consider only the text inside the HTML tags and should exclude the HTML tags and attributes themselves when measuring the length.
But trimming the code should not affect the DOM structure of the HTML.
Is there any tutorial or working example available?
If it's only the text you want, you can also do this:
substr(strip_tags($html_code),0,500);
Ooohh... I know this. I can't get it exactly off the top of my head, but you want to load the text you've got as a DOMDocument
http://www.php.net/manual/en/class.domdocument.php
then grab the text from the entire document node (as a DOMNode http://www.php.net/manual/en/class.domnode.php)
This won't be exactly right, but hopefully this will steer you onto the right track.
Try something like:
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
$dom = new DOMDocument();
$dom->loadHTML($html_code);
$text_to_strip = $dom->textContent;
$stripped = mb_substr($text_to_strip,0,500);
echo "$stripped"; // The Sameple text.Another sample text.....
edit: OK, that should work. Just tested locally.
edit2
Now that I understand you want to keep the tags but limit the text, let's see. You're going to want to loop over the content until you get to 500 characters. This is probably going to take a few edits and passes for me to get right, but hopefully I can help. (Sorry I can't give it my undivided attention.)
First case is when the text is less than 500 characters. Nothing to worry about. Starting with the above code we can do the following.
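if (mb_strlen($text_to_strip) > 500) {
    // This is where we do our work.
    $characters_so_far = 0;
    // loadHTML() wraps the fragment in <html><body>, so walk the children of <body>.
    $body = $dom->getElementsByTagName('body')->item(0);
    // Copy the live node list first so removing nodes does not disturb the loop.
    foreach (iterator_to_array($body->childNodes) as $ChildNode) {
        // Should also check $ChildNode->hasChildNodes() and recurse;
        // probably put some of this stuff into a function.
        $characters_in_next_node = mb_strlen($ChildNode->textContent);
        if ($characters_so_far + $characters_in_next_node > 500) {
            // Remove the node that would push us past the limit.
            $ChildNode->parentNode->removeChild($ChildNode);
        }
        $characters_so_far += $characters_in_next_node;
    }
    // saveHTML() returns the whole document, including the wrapper tags.
    $final_out = $dom->saveHTML();
} else {
    $final_out = $html_code;
}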
I'm pasting below a PHP class I wrote a long time ago, but I know it works. It's not exactly what you're after, as it deals with words instead of a character count, but I figure it's pretty close and someone might find it useful.
class HtmlWordManipulator
{
var $stack = array();
function truncate($text, $num=50)
{
if (preg_match_all('/\s+/', $text, $junk) <= $num) return $text;
$text = preg_replace_callback('/(<\/?[^>]+\s+[^>]*>)/', array($this, '_truncateProtect'), $text);
$words = 0;
$out = array();
$text = str_replace('<',' <',str_replace('>','> ',$text));
$toks = preg_split('/\s+/', $text);
foreach ($toks as $tok)
{
if (preg_match_all('/<(\/?[^\x01>]+)([^>]*)>/',$tok,$matches,PREG_SET_ORDER))
foreach ($matches as $tag) $this->_recordTag($tag[1], $tag[2]);
$out[] = trim($tok);
if (! preg_match('/^(<[^>]+>)+$/', $tok))
{
if (!strpos($tok,'=') && !strpos($tok,'<') && strlen(trim(strip_tags($tok))) > 0)
{
++$words;
}
else
{
/*
echo '<hr />';
echo htmlentities('failed: '.$tok).'<br /)>';
echo htmlentities('has equals: '.strpos($tok,'=')).'<br />';
echo htmlentities('has greater than: '.strpos($tok,'<')).'<br />';
echo htmlentities('strip tags: '.strip_tags($tok)).'<br />';
echo str_word_count($text);
*/
}
}
if ($words > $num) break;
}
$truncate = $this->_truncateRestore(implode(' ', $out));
return $truncate;
}
function restoreTags($text)
{
foreach ($this->stack as $tag) $text .= "</$tag>";
return $text;
}
private function _truncateProtect($match)
{
return preg_replace('/\s/', "\x01", $match[0]);
}
private function _truncateRestore($strings)
{
return preg_replace('/\x01/', ' ', $strings);
}
private function _recordTag($tag, $args)
{
// XHTML
if (strlen($args) and $args[strlen($args) - 1] == '/') return;
else if ($tag[0] == '/')
{
$tag = substr($tag, 1);
for ($i=count($this->stack) -1; $i >= 0; $i--) {
if ($this->stack[$i] == $tag) {
array_splice($this->stack, $i, 1);
return;
}
}
return;
}
else if (in_array($tag, array('p', 'li', 'ul', 'ol', 'div', 'span', 'a')))
$this->stack[] = $tag;
else return;
}
}
truncate is what you want: you pass it the HTML and the number of words you want it trimmed down to. It ignores HTML while counting words, but then rewraps everything in HTML, even closing the tags left open by the truncation.
Please don't judge me on the complete lack of OOP principles. I was young and stupid.
edit:
So it turns out the usage is more like this:
$manipulator = new HtmlWordManipulator();
$content = $manipulator->restoreTags($manipulator->truncate($myHtml,$numOfWords));
Stupid design decision. It allowed me to inject HTML inside the unclosed tags, though.
I'm not up to coding a real solution, but if someone wants to, here's what I'd do (in pseudo-PHP):
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
$aggregate = '';
$document = XMLParser($html_code);
foreach ($document->getElementsByTagName('*') as $element) {
$aggregate .= $element->text(); // This is the text, not HTML. It doesn't
// include the children, only the text
// directly in the tag.
}
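For anyone who wants something runnable along the lines sketched in this thread, here is one possible approach with DOMDocument: walk the tree, count the characters of each text node, cut the text node that crosses the limit, and remove everything after it. The helper names truncateNode() and truncateHtml() are hypothetical, the DOM and mbstring extensions are assumed to be available, and the sample markup is ASCII only (loadHTML needs an explicit encoding hint for UTF-8 input).

```php
<?php
// Sketch: keep the first $limit characters of text while preserving the tags.
function truncateNode(DOMNode $node, int &$remaining): void
{
    // Copy the live child list so removals do not disturb the iteration.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            // Budget used up: drop everything that follows.
            $node->removeChild($child);
        } elseif ($child instanceof DOMText) {
            $length = mb_strlen($child->data);
            if ($length > $remaining) {
                // Cut the text node that crosses the limit.
                $child->data = mb_substr($child->data, 0, $remaining);
                $remaining = 0;
            } else {
                $remaining -= $length;
            }
        } else {
            // Elements: descend, sharing the same character budget.
            truncateNode($child, $remaining);
        }
    }
}

function truncateHtml(string $html, int $limit): string
{
    $dom = new DOMDocument();
    // "@" hides warnings from loose markup; loadHTML() adds <html><body> wrappers.
    @$dom->loadHTML($html);
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $remaining = $limit;
    truncateNode($body, $remaining);

    // Serialize only the children of <body> to get a fragment back.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Hypothetical sample: keep 8 characters of text.
$sample = '<div class="a">Hello <b>brave</b> world</div><p>Next</p>';
$short = truncateHtml($sample, 8);
echo $short;
```

Tags that were open at the cut point stay balanced because the serializer closes them; only text nodes are shortened, and whole nodes after the limit are removed.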