Related
I was having issues when trying to use the string and when i copied it into notepad++ and viewed all characters tab it showed the following attached symbols. My knowledge is that they are line breaks and spaces. Issue is, i cant seem to get them removed from my string?
Explanation:
I have a function which uses shell_exec to grab information from a stored DB.
$output = trim(shell_exec("'".$command."' 2>&1")); //Trimmed version
return $output;
I have a credit system but when they load the page it calls the function to obtain credits depending on the user etc.
$Credits = Sqlite('select "Credits" from TBL WHERE User = "bla" limit 1');
Thing is, the credit comes back with a � beside it. So if i have 9.50 stored, i received �9.50. When looking into this, i noticed the above characters included in the string?
My PHP attempts:
$Credits = preg_replace('/\s/', '', $Credits); //Clear all spaces
//$Credits = str_replace(' ', '', $Credits); //Clear spaces <-- dont work either
$Credits = str_replace('\r\n', '', $Credits); //Clear all new lines
echo $Credits; //Still returns the new line etc
$Credits is just a variable and will not change the context of your file. So give this a try:
<?php
$text = file_get_contents('source.txt');
echo '<pre>'; // to display any new line from linebreak
echo $text;
echo '<br>====<br>';
$text = preg_replace('/\r\n/','a',$text); // a is just an indicator of original linebreak which you can use '' empty instead
echo $text // display text after linebreak is replaced from variable $text
file_put_contents('source2.txt', $text); // save this to file with linebreaks removed
// or replace content of source.txt
?>
I have huge online php fileset that is made dynamically.
It has links, even some invalid ones with quotes (made with frontpage)
index2.php?page=xd
index2.php?page=xj asdfa
index2.php?page=xj%20aas
index2.php?page=xj#jumpword
index2.php?page=gj#jumpword with spaces that arenot%20
index2.php?page=afdsdj#jumpword%20with
index2.php?page=xj#jumpword with "quotes" iknow
$input_lines=preg_replace("/(index2.php?page\=.*)(#[a-zA-Z0-9_ \\"]*)(\"\>)/U", "$0 --> $2", $input_lines);
I want all of those to be just with the # -part and not have the index2.php?page=* part.
I could not get this to work in whole evening. So please help.
In some cases, you can use parse_url to get attributes from the URL (ex: what is after the #), like so:
$urls = array(
'index2.php?page=xd',
'index2.php?page=xj asdfa',
'index2.php?page=xj%20aas',
'index2.php?page=xj#jumpword',
'index2.php?page=gj#jumpword with spaces that arenot%20',
'index2.php?page=afdsdj#jumpword%20with',
'index2.php?page=xj#jumpword with "quotes" iknow',
);
foreach($urls as $url){
echo 'For "' . $url . '": ';
$parsed = parse_url($url);
echo isset($parsed['fragment']) ? $parsed['fragment'] : 'DID NOT WORK';
echo '<br>';
}
Output:
For "index2.php?page=xd": DID NOT WORK
For "index2.php?page=xj asdfa": DID NOT WORK
For "index2.php?page=xj%20aas": DID NOT WORK
For "index2.php?page=xj#jumpword": jumpword
For "index2.php?page=gj#jumpword with spaces that arenot%20": jumpword with spaces that arenot%20
For "index2.php?page=afdsdj#jumpword%20with": jumpword%20with
For "index2.php?page=xj#jumpword with "quotes" iknow": jumpword with "quotes" iknow
I'm counting words in an article and removing common words such as "and" or "the".
I"m removing them by use of preg_replace
after it is done I do a quick clean of extra white space by using.
$search_body = preg_replace('/\s+/',' ',$search_body);
However I've got some very stubborn white space that will not go away. I've tried
if($word == "" OR $word == " "){
//chop it's head off
}
But the if statement does not see $word as being just whitespace. I've also tried printing it to the screen to get the raw data type of it and it's still just showing up blank.
Here is the full regex that I'm using.
$pattern = array(
'/\"\;/',
'/[0-9]/',
'/\,/',
'/\./',
'/\!/',
'/\#/',
'/\#/',
'/\$/',
'/\%/',
'/\^/',
'/\&/',
'/\*/',
'/\(/',
'/\)/',
'/\_/',
'/\"/',
'/\'/',
'/\:/',
'/\;/',
'/\?/',
'/\`/',
'/\~/',
'/\[/',
'/\]/',
'/\{/',
'/\}/',
'/\|/',
'/\+/',
'/\=/',
'/\-/',
'/–/',
'/°/',
'/\bthe\b/',
'/\band\b/',
'/\bthat\b/',
'/\bhave\b/',
'/\bfor\b/',
'/\bnot\b/',
'/\bwith\b/',
'/\byou\b/',
'/\bthis\b/',
'/\bbut\b/',
'/\bhis\b/',
'/\bfrom\b/',
'/\bthey\b/',
'/\bsay\b/',
'/\bher\b/',
'/\bshe\b/',
'/\bwill\b/',
'/\bone\b/',
'/\ball\b/',
'/\bwould\b/',
'/\bthere\b/',
'/\btheir\b/',
'/\bwhat\b/',
'/\bout\b/',
'/\babout\b/',
'/\bwho\b/',
'/\bget\b/',
'/\bwhich\b/',
'/\bwhen\b/',
'/\bmake\b/',
'/\bcan\b/',
'/\blike\b/',
'/\btime\b/',
'/\bjust\b/',
'/\bhim\b/',
'/\bknow\b/',
'/\btake\b/',
'/\bpeople\b/',
'/\binto\b/',
'/\byear\b/',
'/\byour\b/',
'/\bgood\b/',
'/\bsome\b/',
'/\bcould\b/',
'/\bthem\b/',
'/\bsee\b/',
'/\bother\b/',
'/\bthan\b/',
'/\bthen\b/',
'/\bnow\b/',
'/\blook\b/',
'/\bonly\b/',
'/\bcome\b/',
'/\bits\b/', //it's?
'/\bover\b/',
'/\bthink\b/',
'/\balso\b/',
'/\bback\b/',
'/\bafter\b/',
'/\buse\b/',
'/\btwo\b/',
'/\bhow\b/',
'/\bour\b/',
'/\bwork\b/',
'/\bfirst\b/',
'/\bwell\b/',
'/\bway\b/',
'/\beven\b/',
'/\bnew\b/',
'/\bwant\b/',
'/\bbecause\b/',
'/\bany\b/',
'/\bthese\b/',
'/\bgive\b/',
'/\bday\b/',
'/\bmost\b/',
'/\bare\b/',
'/\bwas\b/',
'/\<\w+\>/', '/\<\/\w+\>/',
'/\b\w{1}\b/', //1 letter word
'/\b\w{2}\b/', //2 letter word
'/\//',
'/\</',
'/\>/'
);
$search_body = strip_tags($body);
$search_body = strtolower($search_body);
$search_body = preg_replace($pattern, ' ', $search_body);
$search_body = preg_replace('/\s+/',' ',$search_body);
$search_body = explode(" ", $search_body);
When exploded blank values show up left and right
Example text that I am using is too long to post here. But I copied and pasted
This article to give it a test and it showed 32 counts of white space, not including the white space in front of or behind of other words even after using trim().
Here's a js.fiddle of the raw data that is being handled by php.
htmlentities and htmlspecialchars also show nothing.
Here's the code counts all the values and puts them into one.
$inhere = array();
$body_hold = array();
foreach($search_body as $value){
$value = trim($value);
if(in_array($value, $inhere) && $value != ""){
$key = array_search($value, $inhere);
$body_hold[$key]['count'] = $body_hold[$key]['count']+1;
}elseif($value != ""){
$inhere[] = $value;
$body_hold[] = array(
'count' => 1,
'word' => $value
);
}
}
rsort($body_hold);
Basic foreach to see values.
foreach($body_hold as $value){
$count = $value['count'];
$word = trim($value['word']);
echo "Count: ".$count;
echo " Word: ".$word;
echo '<br>';
}
Here's a PHP example of what it's returning
Are you sure you put the exact same data you're processing in the js.fiddle? Or did you get it from a subsequent post-processed step?
It's obviously a Wikipedia article. I went to that article on Wikipedia and opened it in Edit mode, and saw that there are s in the raw wikitext. However, those nbsp's don't appear in your js.fiddle data.
TL;DR: Check for in your processing (and convert to spaces, etc.).
This character 160 looks like space but it's not, replacing all of them to the regular spaces (32) and then removing all the double spaces will fix your problem.
$search_body = str_replace(chr(160), chr(32), $search_body);
$search_body = trim(preg_replace('/\s+/', ' ', $search_body));
I have this text : http://pastebin.com/2Zgbs7hi
And i want to be able to remove the HTML code from it and just display the plain text but i want to keep at least one line break where there are currently a few line breaks
i have tried:
$ticket["summary"] = 'pastebin example';
$TicketSummaryDisplay = nl2br($ticket["summary"]);
$TicketSummaryDisplay = stripslashes($TicketSummaryDisplay);
$TicketSummaryDisplay = trim(strip_tags($TicketSummaryDisplay));
$TicketSummaryDisplay = preg_replace('/\n\s+$/m', '', $TicketSummaryDisplay);
echo $TicketSummaryDisplay;
that is displaying as plain text, but it shows it all as one big block of text with no line breaks at all
Maybe this will earn you some time.
<?php
libxml_use_internal_errors(true); //crazy o tags
$html = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$dom = new DOMDocument;
$dom->loadHTML($html);
$result='';
foreach ($dom->getElementsByTagName('p') as $node) {
if (strstr($node->nodeValue, 'Legal Disclaimer:')){
break;
}
$result .= $node->nodeValue;
}
echo $result;
This example should successfully store text from html into an array of strings.
After stripping all the tags, you can use preg_split with \R special character ( matches any newline sequence ) to convert string into array. That array will now have several blank values, and there will be also some amount of html non-breaking space entities, so we will check the array for empty values with array_filter() function ( it will remove all items that do not satisfy the filter conditions, in our case, an empty value ). Here are a problem with entity, because and space characters are not the same, they have different ASCII code, so trim() function will not remove spaces. Here are two possible solutions, the first uncommented part will only replace   and check for white space characters, while the second commented one will decode all html entities and also check for spaces.
PHP:
$text = file_get_contents( 'http://pastebin.com/raw.php?i=2Zgbs7hi' );
$text = strip_tags( $text );
$array = array_filter(
preg_split( '/\R/', $text ),
function( &$item ) {
$item = str_replace( ' ', ' ', $item );
return trim( $item );
// $item = html_entity_decode( $item );
// return trim( str_replace( "\xC2\xA0", ' ', $item ) );
}
);
foreach( $array as $value ) {
echo $value . '<br />';
}
Array output:
Array
(
[8] => Hi,
[11] => Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
[13] => Regards
[23] => Legal Disclaimer:
[24] => This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
[25] => Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
)
Now you should have clear array with only items with value in it. By the way, newlines in HTML are expressed through <br />, not through \n, your example as response in a web browser still has them, but they are only visible in page source code. I hope I did not missed the point of the question.
try this get text output with line brakes
<?php
$ticket["summary"] = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$TicketSummaryDisplay = nl2br($ticket["summary"]);
echo strip_tags($TicketSummaryDisplay,'<br>');
?>
You are asking on how to add line-breaks to your "one big block of text with no line breaks at all".
Short answer
After you stripped the HTML tags, apply wordwrap with a desired text-block length
$text = wordwrap($text, 90, "<br />\n");
I really wonder, why nobody suggested that function before.
there is also chunk_split around, which doesn't take words into account and just splits after a certain number of chars. breaking words - but that's not what you want, i guess.
PHP
<?php
$text = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
/**
* Returns string without html tags, also
* removes takes control chars, spaces and " " into account.
*/
function dropHtmlTags($string) {
// remove html tags
//$string = preg_replace ('/<[^>]*>/', ' ', $string);
$string = strip_tags($string);
// control characters and " "
$string = str_replace("\r", '', $string); // remove
$string = str_replace("\n", ' ', $string); // replace with space
$string = str_replace("\t", ' ', $string); // replace with space
$string = str_replace(" ", ' ', $string);
// remove multiple spaces
$string = preg_replace('/ {2,}/', ' ', $string);
$string = trim($string);
return $string;
}
$text = dropHtmlTags($text);
// The Answer: insert line breaks after 95 chars,
// to get rid of the "one big block of text with no line breaks at all"
$text = wordwrap($text, 95, "<br />\n");
// if you want to insert line-breaks before the legal disclaimer,
// uncomment the next line
//$text = str_replace("Regards Legal Disclaimer", "<br /><br />Regards Legal Disclaimer", $text);
echo $text;
?>
Result
first section shows your text block
second section shows the text with wordwrap applied (code from above)
Hello it can be done as follows:
$abc= file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$abc = strip_tags("\n", $abc);
echo $abc;
Please, let me know whether it works
you may use
<?php
$a= file_get_contents('a.txt');
echo nl2br(htmlspecialchars($a));
?>
<?php
$handle = #fopen("pastebin.html", "r");
if ($handle) {
while (!feof($handle)) {
$buffer = fgetss($handle, 4096);
echo $buffer;
}
fclose($handle);
}
?>
output is
Hi,
Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
Regards
Legal Disclaimer:
This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
You can probably write additional code to convert to spaces etc.
I'm not sure I did understand everything correctly but this seems to be your expected result:
$txt = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
var_dump(preg_replace("/(\ \;(\s{1,})?)+/", "\n", trim(strip_tags(preg_replace("/(\s){1,}/", " ", $txt)))));
//more readable
$txt = preg_replace("/(\s){1,}/", " ", $txt);
$txt = trim(strip_tags($txt));
$txt = preg_replace("/(\ \;(\s{1,})?)+/", "\n", $txt);
The strip_tags() function strips HTML and PHP tags from a string, if that is what you are trying to accomplish.
Examples from the docs:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> Other text
I am working on my Wordpress blog and its required to get the title of a post and split it at the "-". Thing is, its not working, because in the source its &ndash and when I look at the result on the website, its a "long minus" (–). Copying and pasting this long minus into some editor makes it a normal minus (-). I cant split at "-" nor at &ndash, but somehow it must be possible. When I created the article, I just typed "-" (minus), but somewhere it gets converted to – automatically.
Any ideas?
Thanks!
I think I found it. I remember that I have meet the similar problem that when I paste code in my post the quote mark transform to an em-quad one when display to readers.
I found that is in /wp-include/formatting.php line 56 (wordpress ver 3.3.1), it defined some characters need to replace
$static_characters = array_merge( array('---', ' -- ', '--', ' - ', 'xn–', '...', '``', '\'\'', ' (tm)'), $cockney );
$static_replacements = array_merge( array($em_dash, ' ' . $em_dash . ' ', $en_dash, ' ' . $en_dash . ' ', 'xn--', '…', $opening_quote, $closing_quote, ' ™'), $cockneyreplace );
and in line 85 it make an replacement
// This is not a tag, nor is the texturization disabled static strings
$curl = str_replace($static_characters, $static_replacements, $curl);
If you want to split a string at the "-" character, basically you must replace "-" with a space.
Try this:
$string_to_be_stripped = "my-word-test";
$chars = array('-');
$new_string = str_replace($chars, ' ', $string_to_be_stripped);
echo $new_string;
These lines splits the string at the "-". For example, if you have my-word-test, it will echo "my word test". I hope it helps.
For more information about the str_replace function click here.
If you want to do this in a WordPress style, try using filters. I suggest placing these lines in your functions.php file:
add_filter('the_title', function($title) {
$string_to_be_stripped = $title;
$chars = array('-');
$new_string = str_replace($chars, ' ', $string_to_be_stripped);
return $new_string;
})
Now, everytime you use the_title in a loop, the title will be escaped.