I have a bunch of files. I need to print only the ITEM_DESCRIPTION: part of each one. Let's say the contents of each file look like this:
// ITEM_DESCRIPTION: Phone 1
// - Android 9.0 (Pie)
// - 64/128 GB
// - Li-Po 3500 mAh
I want the output to look like this:
Phone 1
- Android 9.0 (Pie)
- 64/128 GB
- Li-Po 3500 mAh
So far, what I can produce is:
// Phone 1 // - Android 9.0 (Pie) // - 64/128 GB // - Li-Po 3500 mAh
How can I replace each double slash with a new line?
Here is my code:
// Get file path
// This code inside for loop which i don't write here
$filedir=$filelist[$i];
//Display Item Description
$search = "ITEM_DESCRIPTION";
$endsearch = "BRAND";
$contents = stristr(file_get_contents($filedir), $search);
$description = substr($contents, 0, stripos($contents, $endsearch));
$rmv_char = str_replace(str_split('\:'), ' ', $description);
$newline = str_replace(str_split('\//'), PHP_EOL , $rmv_char);
$phone_dscrpn = substr($newline, strlen($search));
Here is what I tried, but it doesn't work:
$newline = str_replace(str_split('\//'), PHP_EOL , $rmv_char);
$newline = str_replace(str_split('\//'), "\r\n" , $rmv_char);
It looks like you already have newlines in the original data so you don't need to add them again. You just need to clear the slashes and the spaces (or tabs if that's what they are, hard to tell on SO).
Remember if you're testing output in browser it won't show the newlines without <pre></pre>.
// Get file path
// This code inside for loop which i don't write here
$filedir=$filelist[$i];
//Display Item Description
$search = "ITEM_DESCRIPTION";
$endsearch = "BRAND";
$contents = stristr(file_get_contents($filedir), $search);
$description = substr($contents, 0, stripos($contents, $endsearch));
//Clear out the slashes
$phone_dscrpn = str_replace("//", "", $description);
//Collapse runs of spaces into a single space
while(strpos($phone_dscrpn,"  ")!==false) {
$phone_dscrpn = str_replace("  ", " ", $phone_dscrpn);
}
Note this will replace any double slashes or double spaces within the description. If this could be an issue then you will need to consider a more advanced approach (e.g. line by line).
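A line-by-line approach could look roughly like the sketch below. It is only an illustration: the function name cleanDescription and the sample string are assumptions rather than part of the original code, and it assumes the comment marker only needs removing at the start of a line.

```php
<?php
// Hypothetical sample input; in the question this comes from file_get_contents().
$description = "// ITEM_DESCRIPTION: Phone 1\n// - Android 9.0 (Pie)\n// - 64/128 GB\n// - Li-Po 3500 mAh\n";

function cleanDescription(string $description): string
{
    $lines = [];
    // \R matches any newline sequence (\n, \r\n or \r).
    foreach (preg_split('/\R/', $description) as $line) {
        // Strip only a leading "//" plus the whitespace around it,
        // so slashes inside the text (e.g. "64/128") are left alone.
        $line = preg_replace('~^\s*//\s*~', '', $line);
        // Drop the label when a line starts with it.
        $line = preg_replace('~^ITEM_DESCRIPTION:\s*~i', '', $line);
        $line = rtrim($line);
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    return implode(PHP_EOL, $lines);
}

echo cleanDescription($description);
```

Because the pattern is anchored at the start of each line, double slashes or double spaces that appear later in a line survive untouched.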
Assuming that all of your lines begin with // and this pattern isn't used in the actual product description then you can use a simple regular expression:
$description = preg_replace('~//\s(ITEM_DESCRIPTION:)?\s+~', '', $description);
Match //\s where \s is any white-space
Optionally match ITEM_DESCRIPTION:
Match \s+ any number of white-space characters
This will give you:
Phone 1
- Android 9.0 (Pie)
- 64/128 GB
- Li-Po 3500 mAh
I am trying to determine the absolute position of certain words within a block of html, but only if they are outside of an actual html tag. For instance, if I wanted to determine the position of the word "join" using preg_match in this text:
<p>There are 14 more days until our <a href="/somepage.html" target="_blank" rel="noreferrer noopener" aria-label="join us">holiday special</a> so come join us!</p>
I could use:
preg_match('/join/', $post_content, $matches, PREG_OFFSET_CAPTURE, $offset);
The problem is that this is matching the word within the aria-label attribute, when what I need is the one just after the link. It would be fine to match between the <a> and </a>, just not inside the brackets themselves.
My actual end goal (most of which I think I have working, aside from this last element): I am trimming a block of HTML (not a full document) to cut it off at a specific word count. I am trying to determine the character at which the last word ends, and then join the left side of the HTML block with only the tags from the right side, so all HTML tags close gracefully. I thought I had it working until I ran into an example like the one above, where the last word also appears inside an HTML attribute, causing me to split the string at the wrong location. This is my code so far:
$post_content = strip_tags ( $p->post_content, "<a><br><p><ul><li>" );
$post_content_stripped = strip_tags ( $p->post_content );
$post_content_stripped = preg_replace("/[^A-Za-z0-9 ]/", ' ', $post_content_stripped);
$post_content_stripped = preg_replace("/\s+/", ' ', $post_content_stripped);
$post_content_stripped_array = explode ( " " , trim($post_content_stripped) );
$excerpt_wordcount = count( $post_content_stripped_array );
$cutpos = 0;
while($excerpt_wordcount>48){
$thiswordrev = "/" . strrev($post_content_stripped_array[$excerpt_wordcount - 1]) . "/";
preg_match($thiswordrev, strrev($post_content), $matches, PREG_OFFSET_CAPTURE, $cutpos);
$cutpos = $matches[0][1] + (strlen($thiswordrev) - 2);
array_pop($post_content_stripped_array);
$excerpt_wordcount = count( $post_content_stripped_array );
}
if($pwordcount>$excerpt_wordcount){
preg_match_all('/<\/?[^>]*>/', substr( $post_content, strlen($post_content) - $cutpos ), $closetags_result);
$excerpt_closetags = "" . $closetags_result[0][0];
$post_excerpt = substr( $post_content, 0, strlen($post_content) - $cutpos ) . $excerpt_closetags;
}else{
$post_excerpt = $post_content;
}
I am actually searching the string in reverse in this case, since I am walking word by word backwards from the end of the string, so I know that my html brackets are backwards, eg:
>p/<!su nioj emoc os >a/<laiceps yadiloh>"su nioj"=lebal-aira "renepoon rerreferon"=ler "knalb_"=tegrat "lmth.egapemos/"=ferh a< ruo litnu syad erom 41 era erehT>p<
But it's easy enough to flip all of the brackets before doing the preg_match, or I am assuming should be easy enough to have the preg_match account for that.
Do not use regex to parse HTML.
You have a simple objective: limit the text content to a given number of words, ensuring that the HTML remains valid.
To this end, I would suggest looping through text nodes until you count a certain number of words, and then removing everything after that.
$dom = new DOMDocument();
$dom->loadHTML($post_content);
$xpath = new DOMXPath($dom);
$all_text_nodes = $xpath->query("//text()");
$words_left = 48;
foreach( $all_text_nodes as $text_node) {
$text = $text_node->textContent;
$words = explode(" ", $text); // TODO: maybe preg_split on /\s/ to support more whitespace types
$word_count = count($words);
if( $word_count < $words_left) {
$words_left -= $word_count;
continue;
}
// reached the threshold
$words_that_fit = implode(" ", array_slice($words, 0, $words_left));
// If the above TODO is implemented, this will need to be adjusted to keep the specific whitespace characters
$text_node->textContent = $words_that_fit;
$remove_after = $text_node;
while( $remove_after->parentNode) {
while( $remove_after->nextSibling) {
$remove_after->parentNode->removeChild($remove_after->nextSibling);
}
$remove_after = $remove_after->parentNode;
}
break;
}
$output = substr($dom->saveHTML($dom->getElementsByTagName("body")->item(0)), strlen("<body>"), -strlen("</body>"));
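The TODO in the loop above (splitting on any whitespace while keeping the original whitespace characters) might be handled with a helper along these lines. This is a sketch, not part of the answer: firstWords is a hypothetical name, and it relies on preg_split with PREG_SPLIT_DELIM_CAPTURE to keep the delimiters.

```php
<?php
/**
 * Keep at most $limit words of $text, preserving the original whitespace
 * between the words that are kept. Hypothetical helper for illustration.
 */
function firstWords(string $text, int $limit): string
{
    // Split on runs of whitespace but keep those runs in the result.
    $parts = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    $count = 0;
    foreach ($parts as $part) {
        // A part is a word if it contains at least one non-whitespace character.
        $isWord = preg_match('/\S/', $part) === 1;
        if ($isWord && $count === $limit) {
            break; // the next word would exceed the limit
        }
        $out .= $part;
        if ($isWord) {
            $count++;
        }
    }
    // Drop whitespace left dangling after the last kept word.
    return rtrim($out);
}

echo firstWords("one  two\tthree\nfour", 3);
```

Inside the loop, the result of such a helper would replace the implode()/array_slice() line, and the word count would need to ignore empty parts in the same way.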
Ok, I figured out a workaround. I don't know if this is the most elegant solution, so if someone sees a better one I would still love to hear it, but for now I realized that I don't have to actually have the html in the string I am searching to determine the position to cut, I just need it to be the same length. I grabbed all of the html elements and just created a dummy string replacing all of them with the same number of asterisks:
// create faux string with placeholders instead of html for search purposes
preg_match_all('/<\/?[^>]*>/', $post_content, $alltags_result);
$tagcount = count( $alltags_result );
$post_content_dummy = $post_content;
foreach($alltags_result[0] as $thistag){
$post_content_dummy = str_replace($thistag, str_repeat("*",strlen($thistag)), $post_content_dummy);
}
Then I just use $post_content_dummy in the while loop instead of $post_content, in order to find the cut position, and then $post_content for the actual cut. So far seems to be working fine.
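As a rough illustration of the same masking idea, the tags can also be replaced in one pass with preg_replace_callback, which keeps the masked copy exactly as long as the original. The sample HTML and variable names below are assumptions made for the example.

```php
<?php
// Sample fragment modelled on the question; any HTML fragment would do.
$html = '<p>Our <a href="/x.html" aria-label="join us">special</a> so come join us!</p>';

// Replace every tag with asterisks of the same length so offsets stay aligned.
$masked = preg_replace_callback(
    '/<\/?[^>]*>/',
    function (array $m): string {
        return str_repeat('*', strlen($m[0]));
    },
    $html
);

// Search the masked copy: the "join" inside aria-label can no longer match.
preg_match('/\bjoin\b/', $masked, $match, PREG_OFFSET_CAPTURE);
$offset = $match[0][1];

// The offset found in the masked copy is valid in the original string too.
echo substr($html, $offset, 4), ' found at byte ', $offset, PHP_EOL;
```

Like the original workaround, this assumes that no attribute value contains a literal > character, since the tag pattern stops at the first one.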
I was having issues when using the string, and when I copied it into Notepad++ and enabled "show all characters", it showed the attached symbols. As far as I know, they are line breaks and spaces. The issue is that I can't seem to remove them from my string.
Explanation:
I have a function which uses shell_exec to grab information from a stored DB.
$output = trim(shell_exec("'".$command."' 2>&1")); //Trimmed version
return $output;
I have a credit system but when they load the page it calls the function to obtain credits depending on the user etc.
$Credits = Sqlite('select "Credits" from TBL WHERE User = "bla" limit 1');
The thing is, the credit comes back with a � beside it. So if I have 9.50 stored, I receive �9.50. When looking into this, I noticed the above characters included in the string.
My PHP attempts:
$Credits = preg_replace('/\s/', '', $Credits); //Clear all spaces
//$Credits = str_replace(' ', '', $Credits); //Clear spaces <-- dont work either
$Credits = str_replace('\r\n', '', $Credits); //Clear all new lines
echo $Credits; //Still returns the new line etc
$Credits is just a variable, and changing it will not change the contents of your file. So give this a try:
<?php
$text = file_get_contents('source.txt');
echo '<pre>'; // to display any new line from linebreak
echo $text;
echo '<br>====<br>';
$text = preg_replace('/\r\n/','a',$text); // 'a' just marks where each original line break was; use '' to remove them instead
echo $text; // display the text after the line breaks are replaced in $text
file_put_contents('source2.txt', $text); // save this to file with linebreaks removed
// or replace content of source.txt
?>
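One detail in the question's attempt is worth a small sketch: in PHP, '\r\n' in single quotes is four literal characters, not a line break, so that str_replace call never matches. The value below is a made-up stand-in for the shell_exec() output. This only covers the line-break part and does not explain the � character, whose cause the question does not establish.

```php
<?php
// Hypothetical value standing in for the shell_exec() output in the question.
$credits = "9.50\r\n";

// Single quotes: '\r\n' is backslash, r, backslash, n. Nothing matches.
$unchanged = str_replace('\r\n', '', $credits);

// Double quotes: "\r" and "\n" are a real carriage return and line feed.
$cleaned = str_replace(["\r", "\n"], '', $credits);

var_dump($unchanged === $credits); // bool(true)
var_dump($cleaned);                // string(4) "9.50"
```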
What I'm trying to do here is make use of PHP's ability to create and write to files because I have like 350 pages to make all with the same line of code that differs by one number. Much rather do this through code than manually creating 350 pages!
Each file will be (.php) and named after the title of the content it will have which has already been defined. However, as this will be the URL to reach the page, I need to format the title and use the formatted version as the filename.
This is what I've got to start with:
function seoUrl($string) {
//Make lowercase
$string = strtolower($string);
//Clean up multiple dashes or whitespaces
$string = preg_replace("/[\s-]+/", " ", $string);
//Convert whitespaces and underscore to dash
$string = preg_replace("/[\s_]/", "-", $string);
return $string;
}
I found this function earlier on here and it worked perfectly for making the sitemap for all these pages. The URLs were just like I wanted. However, when I call the same function to do this for each title, I hit a snag. I assume I have the code wrong somewhere so here's a piece of the file creation code:
//Content title to be formatted for the filename
$title1="Capitalized And Spaced Title";
//Formatting
$urlfile1="seoUrl ($title1)";
//Text to be written
$txt1="<?include 'tpl/pages/1.txt'?>";
//And the create/write file code
$createfile1=fopen("$urlfile1.php", "w");
fwrite($createfile1, $txt1);
fclose($createfile1);
The code inserts the $txt values just fine, which is actually where I anticipated having a problem. But my files that are created include the function name and parenthesis, plus the title isn't formatted.
I didn't have this problem on the sitemap page:
$url1="$domainurl/$pathurl/$title1.php";
$url2="$domainurl/$pathurl/$title2.php";
...
seoUrl($url1);
seoUrl($url2);
...
<?echo $url1?><br>
<?echo $url2?><br>
...
I've tried everything I can think of for the past couple hours now. What am I doing wrong here?
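The problem is that the call is wrapped in quotes: "seoUrl ($title1)" is just a string, so PHP interpolates $title1 but never calls the function. Call the function directly and use its return value. Try this; it will create the file with the properly formatted name.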
function seoUrl($string) {
//Make lowercase
$string = strtolower($string);
//Clean up multiple dashes or whitespaces
$string = preg_replace("/[\s-]+/", " ", $string);
//Convert whitespaces and underscore to dash
$string = preg_replace("/[\s_]/", "-", $string);
return $string;
}
$title1 = "Capitalized And Spaced Title";
//Formatting
$urlfile1 = seoUrl($title1);
//Text to be written
$txt1 = "<?include 'tpl/pages/1.txt'?>";
//And the create/write file code
$fileName = "" . $urlfile1 . ".php";
$createfile1 = fopen($fileName, "w");
fwrite($createfile1, $txt1);
fclose($createfile1);
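To make the difference concrete, here is a small sketch comparing the quoted version from the question with a real function call. The variable names $asText and $asCall are made up for the illustration; seoUrl is the function from the question.

```php
<?php
function seoUrl($string) {
    // Make lowercase
    $string = strtolower($string);
    // Clean up multiple dashes or whitespaces
    $string = preg_replace("/[\s-]+/", " ", $string);
    // Convert whitespaces and underscore to dash
    $string = preg_replace("/[\s_]/", "-", $string);
    return $string;
}

$title = "Capitalized And Spaced Title";

// Inside double quotes only the variable is interpolated; no function runs.
$asText = "seoUrl ($title)";

// Outside quotes the function is actually called.
$asCall = seoUrl($title);

echo $asText, PHP_EOL; // seoUrl (Capitalized And Spaced Title)
echo $asCall, PHP_EOL; // capitalized-and-spaced-title
```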
I am modifying a piece of code, the essence is to pick the first 90 characters from the body of a post. I have managed to get the text including some punctuation characters.
My problem is that I do not know how to get the 90 characters NOT to ignore newline. I want it to terminate once it encounters a line break. As it is now, it doesn't respect it and so ends up adding content from another line/paragraph.
This is the code I am using -
$title_data = substr($postdata,0,90);
$title_data = preg_replace("/[^\w#&,\":; ]/",'', strip_tags($title_data));
$data['post_title'] = "F. Y. I - " . $title_data . " ...";
Do the preg_replace() on the full text first, then pass the cleaned value to substr():
$title_data = preg_replace("/[^\w#&,\":; ]/",'', strip_tags($postdata));
$title_data = substr($title_data,0,90);
$data['post_title'] = "F. Y. I - " . $title_data . " ...";
Here's my two cents. It also makes sure words aren't truncated.
// Break the string after the first paragraph (if any)
$parts = explode('</p>', $postdata);
// Remove all HTML from the first element (which contain the full text if no paragraph exists)
$excerpt = strip_tags($parts[0]);
$ending = '...';
if (strlen($excerpt) > 90) {
// Check where the last space is, so we don't truncate any words.
$excerpt = substr($excerpt, 0, 90 - strlen($ending));
$excerpt = substr($excerpt, 0, strrpos($excerpt, ' '));
}
// Return the new string
$data['post_title'] = "F. Y. I - " . $excerpt . $ending;
A bit more complicated, but might help to get the result you're after:
// Use `wpautop` to work WP's paragraph-adding magic.
$rawText = wpautop($postdata);
// Remove all the opening `<p>` tags...
$preSplitContent = str_replace('<p>', '', $rawText);
// ...and then break into an array using the closing `</p>` tags.
// (hacky, but this gives you an array where each
// item is a paragraph/line from your content)
$splitContent = explode('</p>', $preSplitContent);
// Then run your preg_replace
// (because `$splitContent[0]` is only the first
// line of your text, you won't include any content
// from the other lines)
$firstLine = preg_replace("/[^\w#&,\":; ]/",'', strip_tags($splitContent[0]));
// Then trim the result down to the first 90 characters
$finalText = substr($firstLine,0,90);
$data['post_title'] = "F. Y. I - " . $finalText . " ...";
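Outside WordPress, where wpautop is not available, stopping at the first line break could be sketched in plain PHP as below. The function name firstLineExcerpt is hypothetical, and treating <br> and </p> as line breaks is an assumption about the post markup.

```php
<?php
/**
 * Return at most $max bytes of the first line of $postdata.
 * Hypothetical helper; <br> and </p> are treated as line breaks.
 */
function firstLineExcerpt(string $postdata, int $max = 90): string
{
    // Turn common HTML breaks into newlines before stripping the tags.
    $text = preg_replace('~<br\s*/?>|</p>~i', "\n", $postdata);
    $text = trim(strip_tags($text));
    // \R matches any newline sequence; keep only what precedes the first one.
    $firstLine = preg_split('/\R/', $text, 2)[0];
    return substr($firstLine, 0, $max);
}

echo firstLineExcerpt("<p>First line here</p>\n<p>Second paragraph</p>");
```

The character filtering from the question (the preg_replace with the allowed-character class) can then be applied to the returned line before it is placed in the title.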
I have this text : http://pastebin.com/2Zgbs7hi
I want to remove the HTML code from it and display just the plain text, but keep at least one line break wherever there are currently several line breaks.
I have tried:
$ticket["summary"] = 'pastebin example';
$TicketSummaryDisplay = nl2br($ticket["summary"]);
$TicketSummaryDisplay = stripslashes($TicketSummaryDisplay);
$TicketSummaryDisplay = trim(strip_tags($TicketSummaryDisplay));
$TicketSummaryDisplay = preg_replace('/\n\s+$/m', '', $TicketSummaryDisplay);
echo $TicketSummaryDisplay;
That displays as plain text, but it shows everything as one big block of text with no line breaks at all.
Maybe this will save you some time.
<?php
libxml_use_internal_errors(true); //crazy o tags
$html = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$dom = new DOMDocument;
$dom->loadHTML($html);
$result='';
foreach ($dom->getElementsByTagName('p') as $node) {
if (strstr($node->nodeValue, 'Legal Disclaimer:')){
break;
}
$result .= $node->nodeValue;
}
echo $result;
This example should successfully store text from html into an array of strings.
After stripping all the tags, you can use preg_split with the \R escape sequence (which matches any newline sequence) to convert the string into an array. That array will contain several blank values, along with a number of &nbsp; (non-breaking space) HTML entities, so we filter it with array_filter(), which removes every item for which the callback returns a falsy value, in our case an empty string. The &nbsp; entity is a problem here: a non-breaking space and a regular space are different characters with different character codes, so trim() will not remove it. Here are two possible solutions: the first, uncommented part only replaces &nbsp; and then trims white space, while the second, commented-out part decodes all HTML entities and then replaces the decoded non-breaking space before trimming.
PHP:
$text = file_get_contents( 'http://pastebin.com/raw.php?i=2Zgbs7hi' );
$text = strip_tags( $text );
$array = array_filter(
preg_split( '/\R/', $text ),
function( &$item ) {
$item = str_replace( '&nbsp;', ' ', $item );
return trim( $item );
// $item = html_entity_decode( $item );
// return trim( str_replace( "\xC2\xA0", ' ', $item ) );
}
);
foreach( $array as $value ) {
echo $value . '<br />';
}
Array output:
Array
(
[8] => Hi,
[11] => Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
[13] => Regards
[23] => Legal Disclaimer:
[24] => This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
[25] => Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
)
Now you should have a clean array containing only the items that have a value. By the way, line breaks in HTML are expressed with <br />, not with \n. Your example still contains the \n characters when served to a web browser, but they are only visible in the page source. I hope I did not miss the point of the question.
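The point about &nbsp; and trim() can be checked with a short sketch. The sample string is made up for the illustration; the byte sequence "\xC2\xA0" is the UTF-8 encoding of the non-breaking space that html_entity_decode produces.

```php
<?php
// Made-up sample with a non-breaking space entity on each side.
$raw = "&nbsp;Hi,&nbsp;";

// Decoding turns each &nbsp; into U+00A0, i.e. the bytes C2 A0 in UTF-8.
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// trim() only strips ASCII whitespace by default, so nothing changes here.
var_dump(trim($decoded) === $decoded); // bool(true)

// Replace the non-breaking space with a regular space first, then trim.
$clean = trim(str_replace("\xC2\xA0", ' ', $decoded));
var_dump($clean); // string(3) "Hi,"
```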
Try this to get text output with line breaks:
<?php
$ticket["summary"] = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$TicketSummaryDisplay = nl2br($ticket["summary"]);
echo strip_tags($TicketSummaryDisplay,'<br>');
?>
You are asking how to add line breaks to your "one big block of text with no line breaks at all".
Short answer
After you have stripped the HTML tags, apply wordwrap with the desired line length:
$text = wordwrap($text, 90, "<br />\n");
I really wonder, why nobody suggested that function before.
There is also chunk_split, which doesn't take words into account and just splits after a certain number of characters, breaking words in the process. That's probably not what you want.
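A quick side-by-side sketch of the two functions, using a made-up sentence and a width of 20 purely for illustration:

```php
<?php
$text = 'The quick brown fox sat over the lazy dog';

// wordwrap() breaks at spaces, so words stay whole.
$wrapped = wordwrap($text, 20, "\n");
echo $wrapped, "\n";
// The quick brown fox
// sat over the lazy
// dog

// chunk_split() cuts every 20 bytes regardless of word boundaries
// and also appends the separator after the last chunk.
$chunked = chunk_split($text, 20, "\n");
echo $chunked;
// The quick brown fox 
// sat over the lazy do
// g
```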
PHP
<?php
$text = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
/**
* Returns string without html tags, also
* takes control chars, spaces and "&nbsp;" into account.
*/
function dropHtmlTags($string) {
// remove html tags
//$string = preg_replace ('/<[^>]*>/', ' ', $string);
$string = strip_tags($string);
// control characters and "&nbsp;"
$string = str_replace("\r", '', $string); // remove
$string = str_replace("\n", ' ', $string); // replace with space
$string = str_replace("\t", ' ', $string); // replace with space
$string = str_replace("&nbsp;", ' ', $string);
// remove multiple spaces
$string = preg_replace('/ {2,}/', ' ', $string);
$string = trim($string);
return $string;
}
$text = dropHtmlTags($text);
// The Answer: insert line breaks after 95 chars,
// to get rid of the "one big block of text with no line breaks at all"
$text = wordwrap($text, 95, "<br />\n");
// if you want to insert line-breaks before the legal disclaimer,
// uncomment the next line
//$text = str_replace("Regards Legal Disclaimer", "<br /><br />Regards Legal Disclaimer", $text);
echo $text;
?>
Result
first section shows your text block
second section shows the text with wordwrap applied (code from above)
Hello, it can be done as follows:
$abc = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$abc = strip_tags($abc); // the string to strip is the first argument
echo $abc;
Please let me know whether it works.
you may use
<?php
$a= file_get_contents('a.txt');
echo nl2br(htmlspecialchars($a));
?>
<?php
$handle = @fopen("pastebin.html", "r");
if ($handle) {
while (!feof($handle)) {
$buffer = strip_tags(fgets($handle, 4096)); // fgetss() was removed in PHP 8.0
echo $buffer;
}
fclose($handle);
}
?>
output is
Hi,
Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
Regards
Legal Disclaimer:
This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
You can probably write additional code to convert &nbsp; to spaces etc.
I'm not sure I understood everything correctly, but this seems to be your expected result:
$txt = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
var_dump(preg_replace("/(&nbsp;(\s{1,})?)+/", "\n", trim(strip_tags(preg_replace("/(\s){1,}/", " ", $txt)))));
//more readable
$txt = preg_replace("/(\s){1,}/", " ", $txt);
$txt = trim(strip_tags($txt));
$txt = preg_replace("/(&nbsp;(\s{1,})?)+/", "\n", $txt);
The strip_tags() function strips HTML and PHP tags from a string, if that is what you are trying to accomplish.
Examples from the docs:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> Other text
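Building on strip_tags(), one way to keep a single line break wherever the source has several is sketched below. The function name htmlToPlainLines and the sample markup are assumptions for the illustration; the pastebin content itself is not reproduced here.

```php
<?php
/**
 * Strip tags and collapse every run of line breaks into one "\n".
 * Hypothetical helper for illustration.
 */
function htmlToPlainLines(string $html): string
{
    $text = strip_tags($html);
    // Decode entities, then turn non-breaking spaces into regular spaces.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = str_replace("\xC2\xA0", ' ', $text);
    // A newline plus any following whitespace-only lines becomes a single "\n".
    $text = preg_replace('/[ \t]*\R\s*/', "\n", $text);
    return trim($text);
}

$plain = htmlToPlainLines("<p>Hi,</p>\n\n\n<p>&nbsp;</p>\n<p>Regards</p>");

// For browser output, escape the text and turn "\n" into <br />.
echo nl2br(htmlspecialchars($plain));
```

Note that the pattern also removes indentation at the start of the following line, which is usually acceptable for text extracted from an email body.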