The following is a code block that's working fine as regards getting the data I want.
Don't laugh, it's probably inefficient, but I'm learning :)
What I want is to use the $totalLength variable to stop gathering data once $totalLength reaches, say, 1500 bytes/characters (ideally ending on a full word, but I'm not looking for miracles!). Anyway, the code:
$paraLength = 0;
$totalLength = 0;
for ($k = 0; $k < $descriptionValue->length; $k++) { //define integer k as 0, get every description using ($k = 0; $k < $descriptionValue->length; $k++), increment the k loop (to get only 14 elements, use ($k <= 13))
$totalLength = $totalLength + $paraLength;
echo $totalLength." Total<br />";
$descNode = $descriptionValue->item($k)->nodeValue; //find each description element
$descNode = trim($descNode); //trim any whitespace around the element
$descPara = strip_tags($descNode); //remove any HTML tags from the elements
$paraLength = (strlen($descPara)); //find the length of each element
//if (preg_match('/^([0-9 ]+)$/', $descPara)) { //if element starts with numbers followed by a space, define it as a telephone number
// $number = $descPara;
// fwrite ($fh, "\t\t".'<div id="tel">'.$number."</div>\n"); //write a div with id tel, containing the number
//}
//else
if (preg_match('/[A-Z]{4,}/', $descPara)) { //if element starts with at least 4 uppercase characters, define it as a heading
$heading = $descPara;
$heading=ucfirst(strtolower($heading)); //convert the uppercase string to proper
fwrite ($fh, "\t\t".'<div id="heading"><h4>'.$heading."</h4></div>\n"); //write a div with id heading, containing the heading in h4 tags
}
else if (preg_match('/\d*\.\d{1,}[m x]/', $descPara)) { //if the element contains any number of digits followed by a dot, at least one further digit and the letters m x, define it as a heading based on it containing room measurements (this pattern matches at least two number after the dot \d*{2,}}
$room = $descPara;
fwrite ($fh, "\t\t".'<div id="roomheading"><h4>'.$room."</h4></div>\n"); //write a div with id roomheading, containing the heading in h4 tags
}
else if (preg_match('/^Disclaimer/i', $descPara)) { //if the element contains the word Disclaimer, define it as such
$disclaimer = $descPara;
fwrite ($fh, "\t\t".'<div id="disclaimer"><h4>'.$disclaimer."</h4></div>\n"); //write a div with id disclaimer, containing the heading in h4 tags
}
else if ($paraLength < 14 && $paraLength > 3) { //when all else fails, if the element is less than 14 but more than 3 characters, also define it as a heading
$other = $descPara;
fwrite ($fh, "\t\t".'<div id="other"><h4>'.$other."</h4></div>\n"); //write a div with id other and the heading in h4 tags
}
else {
fwrite ($fh, "\t\t\t<p>".$descPara."</p>\n"); //anything else is considered content, so write it out inside p tags
}
}
$totalLength counts nicely, but when I tried to put a while statement in there, it just hung. I tried putting the while statement before and after the for, but no joy. What am I doing wrong and how best to solve this one?
FYI, $descriptionValue is data parsed from HTML using DOM & XPath. The while I tried was while($totalLength <= 1500)
Maybe this is what you want:
if ($totalLength > 1500) {
break;
}
Just put a condition inside your for loop. It will jump outside the loop as soon as the condition evaluates to true.
// for () { ...
if ($totalLength > 1500) {
break;
}
// }
Basically, break ends execution of the current for, foreach, while, do-while or switch structure. You can find more about PHP's control structures in the manual.
You can also delete the for and add
$k = 0;
while ($totalLength <= 1500 && $k < $descriptionValue->length)
and increment the value of $k inside the loop. Note the &&: the loop has to stop as soon as either the limit is reached or the elements run out.
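To also end on a full word, as the question asks, something like the following might work. It is only a sketch: truncateAtWord() is a hypothetical helper (not a PHP built-in), the $paragraphs array stands in for the text taken from $descriptionValue, and the limit is set to 40 instead of 1500 so the effect is visible on short sample data.

```php
<?php
// Hypothetical helper: cut $text to at most $limit bytes, ending on a full word.
// strlen()/substr() count bytes, so multi-byte text would need the mb_* functions.
function truncateAtWord(string $text, int $limit): string
{
    if (strlen($text) <= $limit) {
        return $text;
    }
    // Take one extra byte so a space sitting exactly at the limit counts as a boundary.
    $cut = substr($text, 0, $limit + 1);
    $lastSpace = strrpos($cut, ' ');
    if ($lastSpace === false) {
        return substr($text, 0, $limit); // no space at all: hard cut
    }
    return rtrim(substr($cut, 0, $lastSpace));
}

// Stand-in for the stripped paragraph text (assumed sample data).
$paragraphs = ['First paragraph of text.', 'Second paragraph, somewhat longer.', 'Third one.'];
$limit = 40; // would be 1500 in the question
$totalLength = 0;
$kept = [];

foreach ($paragraphs as $para) {
    $remaining = $limit - $totalLength;
    if ($remaining <= 0) {
        break; // budget used up
    }
    $piece = truncateAtWord($para, $remaining);
    if ($piece === '') {
        break; // not even one word fits any more
    }
    $kept[] = $piece;
    $totalLength += strlen($piece);
    if (strlen($piece) < strlen($para)) {
        break; // this paragraph was cut short, so stop here
    }
}

print_r($kept);
```

In the original loop, the same check would go just before the fwrite() calls, with $descPara replaced by the truncated piece.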
What I'm looking for is logic that lets you print horizontal rules between content blocks, except after the last content block, which you do NOT want to close with a horizontal rule.
I have this code to loop the number from 1 to 10 and I have this if inside to check if number is stored in database.
for($i=1;$i<=10;$i++){
if($i == $class->Check($i)){
echo $i;
echo "<hr>"; // if not the last!
}
}
I want to detect the last printed item inside the if condition, not from the loop counter. So if the if and the loop give us the numbers 2, 3, 4, 7, I want another if condition that checks whether the current number is the last one and prints something.
Note: the loop bounds aren't fixed, they come from a variable. This is just an example.
In short: I have this loop, and when the if condition is true it prints something followed by an hr tag. I don't want the last print to be followed by an hr tag, so I need a condition that tells me it is the last print.
Inside your condition, you can save the output in an array instead of displaying it immediately. You can then loop over that array later and check whether you are at the last value:
$MyOutput = [];
for($i=1;$i<=10;$i++) {
if($i == $class->Check($i)){
$MyOutput[] = $i;
}
}
for($i = 0, $len = count($MyOutput); $i < $len; $i++)
{
echo $MyOutput[$i];
if ($i == $len - 1) // are we at the last element ?
{
// your special message for the last element
}
}
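Once the values are collected in an array, the separator itself can also be left to implode(), which only places it between elements. A minimal sketch, assuming the hypothetical $stored array stands in for whatever $class->Check($i) looks up in the database:

```php
<?php
// Assumed sample data: the numbers that are "stored in the database".
$stored = [2, 3, 4, 7];

// Collect the matching numbers first instead of echoing them right away.
$matches = [];
for ($i = 1; $i <= 10; $i++) {
    if (in_array($i, $stored, true)) {
        $matches[] = $i;
    }
}

// implode() puts the separator between elements only, never after the last one.
$html = implode("<hr>", $matches);
echo $html;
```

With the sample data this prints 2<hr>3<hr>4<hr>7, with no trailing hr tag.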
Example:
#article{boonzaier2009development,<br/>
author = "Boonzaier, A. and Schubach, K. and Troup, K. and Pollard, A. and Aranda, S. and Schofield, P.",<br/>
title = "Development of a psychoeducational intervention ",<br/>
journal = "Journal of Psychosocial Oncology",<br/>
volume = "27",<br/>
number = "1",<br/>
pages = "136-153",<br/>
year = 2009<br/>
}<br/>
#book{bottoff2008women,<br/>
author = "Bottoff, J. L. and Oliffe, J. L. and Halpin, M. and Phillips, M. and McLean, G. and Mroz, L.",<br/>
title = "Women and prostate cancer support groups: {The} gender connect? {Social} {Science} & {Medicine}",<br/>
publisher = "66",<br/>
pages = "1217-1227",<br/>
year = 2008<br/>
}<br/>
#article{bottorff2012gender,<br/>
author = "Bottorff, J. L. and Oliffe, J. L. and Kelly, M.",<br/>
title = "The gender (s) in the room",<br/>
journal = "Qualitative Health Research",<br/>
volume = "22",<br/>
number = "4",<br/>
pages = "435-440",<br/>
year = 2012<br/>
}
I want to capture the strings between double quotes, for the #article entries only. I find the line number of every #article and then loop from one #article line to the next, reading the field values in that range. The problem: if, for example, the first #article is on line 10 and the second on line 18, the loop covers everything in between, and a #book entry sits inside that range, so its fields get captured too. How do I exclude the #book lines from the for loop?
php code:
<?php
$file=file("master.bib");
$typeart=array();
$cont=array();
//count of article
$key = '#article';
foreach ($file as $l => $line) {
if (strpos($line,$key) !== false) {
$l++;
$typeart[]= $l;
}
}//end-count of article
$counttypeart=count($typeart);
for($j=0;$j<$counttypeart;$j++){
for($i=$typeart[$j];$i<$typeart[$j+1];$i++){
if(strpos($file[$i],'author')){
preg_match('/\"(.*?)\"/',$file[$i],$cont);
$author= $cont[1];
echo $author;
echo "<br>";
}
if(strpos($file[$i],'title')){
preg_match('/\"(.*?)\"/',$file[$i],$cont);
$title= $cont[1];
echo $title;
echo "<br>";
}
if(strpos($file[$i],'journal')){
preg_match('/\"(.*?)\"/',$file[$i],$cont);
$journal= $cont[1];
echo $journal;
echo "<br>";
}
if(strpos($file[$i],'volume')){
preg_match('/\"(.*?)\"/',$file[$i],$cont);
$volume= $cont[1];
echo $volume;
echo "<br>";
}
if(strpos($file[$i],'number')){
preg_match('/\"(.*?)\"/',$file[$i],$cont);
$number= $cont[1];
echo $number;
echo "<br>";
}
if(strpos($file[$i],'pages')){
preg_match('/\"(.*?)\"/',$file[$i],$cont);
$pages= $cont[1];
echo $pages;
echo "<br>";
echo "<br>";
}
}
}
?>
Expected output (From above mentioned example):
Boonzaier, A. and Schubach, K. and Troup, K. and Pollard, A. and Aranda, S. and Schofield P.
Development of a psychoeducational intervention for men with prostate cancer
Journal of Psychosocial Oncology
27
1
136-153
Bottorff, J. L. and Oliffe, J. L. and Kelly, M.
The gender (s) in the room
Qualitative Health Research
22
4
435-440
It appears that the reason your code captures #book elements is because you are not recording the line at which #article element is terminated. Thus, when you iterate over all the lines inside the #article element, you start at the line where your #article element starts and finish at the line where the next #article element starts.
There are two alternative ways to fix your code:
Record both the start and the end lines of the #article element, when you originally scan through all the lines in the file. For example:
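// count of article
$key_start = '#article';
$key_end = '}';
$start = null; // null while we are outside an #article element
foreach ($file as $l => $line) {
    if (strpos($line, $key_start) !== false) {
        $start = $l + 1; // first line after the "#article{..." line
        continue;
    }
    // a line beginning with "}" closes the current element
    if ($start !== null && strpos(ltrim($line), $key_end) === 0) {
        $typeart[] = array($start, $l - 1);
        $start = null; // so the closing line of a #book is ignored
    }
}
// end-count of article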
Now you should be able to iterate over the lines belonging to the #article element by simply doing:
for($j=0;$j<$counttypeart;$j++){
list($start, $end) = $typeart[$j];
for ($i=$start; $i<=$end; $i++) {
…
Break out of your second for loop early, as soon as you reach the #article's closing line. This avoids iterating over all the lines up to the following #article element, e.g.:
for($i=$typeart[$j];$i<$typeart[$j+1];$i++){
$key_end = '}';
if (strpos(ltrim($file[$i]), $key_end) === 0) break; // a line starting with "}" closes the element
…
However, neither of these solutions is ideal, as both result in repetitive code that is difficult to maintain. They also rely on you knowing each and every attribute within the #article element in order to capture its value. Unless you have a very good reason to structure your code in this specific way, I would opt for an alternative solution…
Alternative solution:
read in all of the bibliography text at once
use regular expression to capture content of all #article elements
use another regular expression to capture parameter names and their values within captured content of individual #article elements
The following is a brief implementation of what I'm talking about:
<?php
// Use file_get_contents() instead of file() as it is the preferred way to
// read the contents of a file into a string. It will also use memory mapping
// techniques if supported by your OS to enhance performance.
$file_content = file_get_contents('master.bib');
// Capture all article container from file content. We use a regular
// expression on a multi-line string to do that:
preg_match_all(
'%#article{\w+,<br/>\s+(.*)\s+}(<br/>)?%sUu',
$file_content,
$articles,
PREG_PATTERN_ORDER
);
// Initialise empty results (plural) container which will store results data
// for all #article elements
$results = array();
// At this point $articles[0] is an array of all captured #article blocks
// and $articles[1] is an array of all captured first parenthesis within
// the above regular expression.
foreach ($articles[1] as $article) {
// Initialise empty result (singular) container which will store results
// for the current #article element
$result = array();
// Now we will take the content of the first parenthesis, split it into
// individual lines and pick out required data from those lines.
foreach (explode("\n", $article) as $line) {
$found = preg_match(
'%\s*(\w+)\s*=\s*"?([^"]+)"?,?<br/>\s*%Uu',
$line,
$matches
);
// At this point $matches is populated with our desired data, unless
// $found is 0 (no matches were found) or false (an error occurred)
if ($found != false and $found > 0) {
$result[$matches[1]] = trim($matches[2]);
}
}
// Add current #article results to the list of all results, but avoid
// doing so if current results are empty
if (!empty($result)) {
$results[] = $result;
}
}
// Print results
foreach ($results as $article) {
print "{$article['author']}\n"
. "{$article['title']}\n"
. "{$article['journal']}\n"
. "{$article['volume']}\n"
. "{$article['number']}\n"
. "{$article['pages']}\n"
. "\n\n";
}
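If you would rather stay with a line-by-line loop, a single pass with an "inside an #article" flag avoids both the line ranges and the multi-line regular expression. This is only a sketch: the $lines array is assumed sample data in the same shape as master.bib (in practice it would come from file("master.bib")), and only quoted values are captured, as in the question.

```php
<?php
// Assumed sample data, shaped like the lines of master.bib.
$lines = [
    '#article{a2009,<br/>',
    'author = "Boonzaier, A.",<br/>',
    'year = 2009<br/>',
    '}<br/>',
    '#book{b2008,<br/>',
    'author = "Bottoff, J. L.",<br/>',
    '}<br/>',
    '#article{c2012,<br/>',
    'author = "Bottorff, J. L.",<br/>',
    'volume = "22",<br/>',
    '}',
];

$articles = [];
$current = null; // null while we are outside an #article entry

foreach ($lines as $line) {
    $line = trim($line);
    if (strpos($line, '#article') === 0) {
        $current = []; // entering an #article entry
    } elseif ($current !== null && strpos($line, '}') === 0) {
        $articles[] = $current; // closing line: store the entry and leave it
        $current = null;
    } elseif ($current !== null && preg_match('/^(\w+)\s*=\s*"(.*?)"/', $line, $m)) {
        $current[$m[1]] = $m[2]; // field name => quoted value
    }
    // lines of #book (or any other entry type) are skipped because $current is null
}

print_r($articles);
```

Because nothing is recorded while $current is null, the #book lines never reach the field-matching branch.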
I have an issue with two linked text blocks. Once the first block reaches its maximum number of characters (before the text shrinks to fit), the remaining text flows into the next linked block. The problem is that if a lot of text is pushed to the next block, it becomes very tiny. Is there a way to divide the characters evenly between the boxes? It's OK if the text shrinks, as long as it looks the same in both. Image and code of the example below:
http://i.imgur.com/ytadphF.png
public function addTextToMultiBlock($text,$baseBlockName,$numberOfBlocks)
{
$tf = 0;
for ($i = 1; $i <= $numberOfBlocks; $i++)
{
$optlist ="encoding=unicode textflowhandle=" . $tf;
$tf = $this->p->fill_textblock($this->page, $baseBlockName.$i, $text, $optlist);
//Set text to null ( $tf handle holds extra text from now on )
$text = null;
if ($tf == 0) {
trigger_error("Warning: " . $this->p->get_errmsg() . "\n");
break;
}
$reason = (int) $this->p->info_textflow($tf, "returnreason");
$result = $this->p->get_parameter("string", $reason);
//Break if all text is placed
if ($result == "_stop")
{
$this->p->delete_textflow($tf);
break;
}
}
}
//call below to block
if(!empty($this->orderData->remarks))
{
$addRemarks.= $this->orderData->remarks;
$helper->addTextToMultiBlock($this->orderData->remarks, 'info', 2);
}
else
{
//nothing
}
What I am thinking is to do a count of the words:
<?php
// SET THE NUMBER OF BOXES YOU WANT
$boxes = 2;
// SAMPLE INPUT STRING
$string = 'Life it seems to fade away, drifting further every day. Getting lost within myself. Nothing matters no one else.';
// MATCH EACH WORD AND STORE IT INTO AN ARRAY
preg_match_all('/\S+/', $string, $matches);
// COUNT HOW MANY WORDS WE HAVE TOTAL
$word_count = count($matches[0]);
// DIVIDE THE TOTAL WORDS BY THE NUMBER OF BOXES
// TO FIND HOW MANY WORDS WE SHOULD HAVE IN EACH BOX
$words_per_box = (int) ceil($word_count / $boxes); // ceil() so array_chunk() never yields more than $boxes chunks
// SPLIT THE ARRAY OF WORDS INTO CHUNKS BASED ON THE
// - NUMBER OF WORDS PER BOX THAT WE SHOULD HAVE
$chunks = array_chunk($matches[0], $words_per_box);
// START OUTPUTTING THE TABLE
print '
<table border=1 cellpadding=5 cellspacing=0>
<tr>';
// LOOP THROUGH EACH CHUNK OF WORDS AND TURN IT BACK
// - INTO A STRING THAT WE CAN PRINT OUT
foreach ($chunks AS $word_block) {
//PRINT BLOCK TEXT
print '
<td>'.implode(' ', $word_block).'</td>';
}
// CLOSE OUT THE ROW AND THE TABLE
print '
</tr>
</table>';
This outputs the following:
<table border="1" cellpadding="5" cellspacing="0">
<tr>
<td>Life it seems to fade away, drifting further every day.</td>
<td>Getting lost within myself. Nothing matters no one else.</td>
</tr>
</table>
This is not going to be a perfect solution, but hopefully it gets you pretty darn close.
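Since the question is about characters rather than words, a variation might balance the pieces by character length while still breaking only at spaces. This is a rough sketch: splitEvenly() is a hypothetical helper, and it aims for roughly equal lengths, not exactly equal ones.

```php
<?php
// Hypothetical helper: split $text into at most $boxes pieces of similar
// character length, breaking only between words.
function splitEvenly(string $text, int $boxes): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $target = strlen(implode(' ', $words)) / $boxes; // ideal length per box

    $chunks = [];
    $current = '';
    foreach ($words as $word) {
        $candidate = $current === '' ? $word : $current . ' ' . $word;
        // Start a new piece once the target is exceeded, but never open
        // more pieces than there are boxes.
        if ($current !== '' && strlen($candidate) > $target && count($chunks) < $boxes - 1) {
            $chunks[] = $current;
            $current = $word;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

print_r(splitEvenly('aaa bbb ccc ddd', 2));
```

Each piece could then be placed in its own block instead of relying on the linked text flow. How that interacts with the block's fitmethod is not covered here.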
You should check the "fitmethod" property of the block. If fitmethod is set to "auto", the complete text will be fitted into the text block.
From the PDFlib cookbook sample:
http://www.pdflib.com/pdflib-cookbook/block-handling-and-pps/linked-textblocks/
Please see the following comment before filling the block:
* "fitmethod=clip" to clip the text when it doesn't fit completely
* into the block while avoiding any text shrinking.
OK, basically what I am trying to do is create a kind of BB Code system without using regex. The code I'm using below looks like it should work perfectly, but it doesn't. It is supposed to take a string, remove all the break tags from inside all of the [code][/code] blocks, and put the cleaned blocks back into the full string. It is then supposed to turn the [code][/code] tags into "pre" tags for the SyntaxHighlighter script I'm using.
Unfortunately the code doesn't work 100%. In some cases it still leaves the break tags inside the [code][/code] blocks. My code is:
<?php
$string = "Hello\n[code]\nCode One\n[/code]\n[code]\nCode Two\n[/code]\n[code]\nCode Three\n[/code]";
$string = nl2br($string);
$openArray = array();
$closeArray = array();
$original = "";
$newString = "";
$i = 0;
if(strpos($string, "[code]") === 0) {
array_push($openArray, 0);
}
while($i = strpos($string, "[code]", $i + 1)) {
array_push($openArray, $i);
}
while($i = strpos($string, "[/code]", $i + 1)) {
array_push($closeArray, $i + 7);
}
for($j = 0; $j < count($openArray); $j++) {
$length = $closeArray[$j] - $openArray[$j];
$original = substr($string, $openArray[$j], $length);
$newString = strip_tags($original);
$string = str_replace($original, $newString, $string);
}
$string = str_replace("[code]", '<pre class="brush: plain">', $string);
$string = str_replace("[/code]", '</pre>', $string);
echo $string;
?>
All answers are greatly appreciated, as I have been wondering what is wrong with this for quite some time now and I've tried many different ways!
The major problem I see with your processing is that you store the positions of the opening and closing tags independently of each other. Later on you process them as if each opening tag belonged to the closing tag at the same index, but that is not guaranteed: you never validate that a closing code follows each opening code, or that there aren't two opening or two closing codes in a row, which should give a parse error.
You could write yourself a little helper function that, like strpos, returns you the next position of a open and closing code pair:
function codepos($string, $code, $offset = 0) {
    // find the next opening code, starting at the caller's offset
    if (FALSE === $start = strpos($string, "[$code]", $offset)) {
        return FALSE;
    }
    if (FALSE === $stop = strpos($string, "[/$code]", $start)) {
        throw new Exception('Close code not found.');
    }
    // a second opening code before the closing one is an error
    $next = strpos($string, "[$code]", $start + 1);
    if (FALSE !== $next && $next < $stop) {
        throw new Exception('Double opening detected.');
    }
    $pos = new stdClass;
    $pos->start = $start;
    $pos->stop = $stop;
    $pos->code = $code;
    return $pos;
}
It's then easier to process this later on, as you already know that things are in order. Instead of throwing exceptions you can just return FALSE and signal the problem some other way. Note that this routine does not yet check for a closing code before the first opening code.
$offset = 0;
while($pos = codepos($string, 'code', $offset))
{
... process each code-pair.
$offset = $pos->stop + 1; // continue the search after this pair, otherwise the loop never ends
}
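Put together, the pair-by-pair idea could look roughly like this. It is a sketch under assumptions: convertCodeBlocks() is a hypothetical name, it builds a new output string instead of calling str_replace() on the original, and it leaves an unmatched opening tag untouched instead of throwing.

```php
<?php
// Hypothetical helper: strip tags inside each [code]...[/code] pair and
// turn the pair into a <pre> element, one pair at a time, without regex.
function convertCodeBlocks(string $string): string
{
    $open = '[code]';
    $close = '[/code]';
    $out = '';
    $offset = 0;

    while (($start = strpos($string, $open, $offset)) !== false) {
        $stop = strpos($string, $close, $start);
        if ($stop === false) {
            break; // unmatched opening tag: leave the rest as it is
        }
        // Copy the text before the block unchanged.
        $out .= substr($string, $offset, $start - $offset);
        // Strip tags (the <br /> from nl2br) from the block content only.
        $inner = substr($string, $start + strlen($open), $stop - $start - strlen($open));
        $out .= '<pre class="brush: plain">' . strip_tags($inner) . '</pre>';
        // Continue after the closing tag.
        $offset = $stop + strlen($close);
    }

    return $out . substr($string, $offset);
}

$input = nl2br("Hello\n[code]\nCode One\n[/code]\n[code]\nCode Two\n[/code]");
echo convertCodeBlocks($input);
```

Working from offsets into the unmodified input means the positions never go stale, which is one way the stored positions in the original code can get out of step after the first str_replace().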
For learning or for an intranet tool only, not to be even considered on the www:
You need to take into consideration:
    Lines may be longer than the string buffer. Know you will have a max line size unless you code around it.
    Code for possible close tags before open tags and possible missing close/open tags unless you assume the input will always be correct.
Be able to handle the following cases:
State 1, looking for one or more open tags:
    No open/close tags
    Open tag only
    Close tag first - parse fails
    One or more matching open/close tags (in proper order)
    One or more matching open/close tags (in proper order) ending with open tag
    End of document - OK
State 2, looking for close tag:
    Close tag followed by one or more matching open/close tags (in proper order)
    Close tag followed by one or more matching open/close tags (in proper order) ending with open tag
    No close tag
    End of document - parse fails
So I have a large table (701-ish rows, 19 columns). I need to extract the innertext in each td, and then I write it to a csv. The problem is, this takes forever. Doing just 100, takes 32 seconds. This is the code I have:
for ($j = 0; $j < 100; $j++)
{
$f = $html->find("td",$j); // get the td elements from the html
$rowArray[] = $f->innertext; // store that text inside the array
if(($j+1) % 19 == 0) // hit the end of the row
{
$txt .= implode(",", $rowArray) . "\r\n"; // format with comma's and throw it into $txt
unset($rowArray); // clear the array, for the next record
$rowArray = array(); // re-set the array
}
}
The 100 is a temporary value while I test, it really is closer to 13000. The biggest issue is finding the TD values. Is there a faster way for this or is this as good as I can get it?
Basically, looking for the quickest way to extract TD data from an HTML table so I can write it to a CSV.
I did a str_replace to strip out the stuff I didn't want and was able to get the contents a lot quicker.
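A likely reason for the slowness is that find("td", $j) searches the document again on every iteration; calling find("td") once and looping over the returned array should already help. A different approach from the str_replace one above is PHP's built-in DOM extension combined with fputcsv(). The sketch below assumes a small inline sample table and writes to an in-memory stream; note that textContent returns plain text, whereas innertext keeps any inner HTML.

```php
<?php
// Assumed sample markup; in practice this would be the fetched page.
$html = '<table><tr><td>a</td><td>b, c</td></tr><tr><td>d</td><td>e</td></tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$doc->loadHTML($html);
libxml_clear_errors();

$out = fopen('php://memory', 'w+'); // use a real file path to write to disk

// Walk the rows once; each row's cells become one CSV line.
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $row = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $row[] = trim($td->textContent);
    }
    if ($row !== []) {
        fputcsv($out, $row, ',', '"', '\\'); // quotes fields that contain commas
    }
}

rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```

Going row by row also removes the need for the ($j+1) % 19 bookkeeping, and fputcsv() takes care of cells that contain commas or quotes.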