I know it's probably a very simple issue but still I didn't find a solution...
Ok, I'll be brief:
Suppose I have an XML file structured like this:
<root><item><text>blah blah blah
blah blah blah
blah blah blah
...</text></item></root>
My XML is obviously more complex but that's not important since my question is:
how do I replace those carriage-return character references with, for instance, HTML <br> tags?
I'm using SimpleXML to read data from XML and tried with:
echo str_replace("
", "<br>", $message->text);
and even with:
echo str_replace("\n", "<br>", $message->text);
but neither works...
I need to use SimpleXML for this.
The character reference in your XML represents the ASCII "carriage return" character (ASCII code 13, which is D in hexadecimal), sometimes written "\r", rather than the "linefeed" character, "\n" (which is ASCII 10, or A in hex). Note that when SimpleXML is asked for the string content of a node (with (string)$node, or implicitly with statements like echo $node), it turns this "entity" into the actual character it represents.
Depending on your platform (Windows, Linux, MacOS, etc), the standard line-ending, accessible via the built-in constant PHP_EOL, will be either "\n", "\r\n", or "\r".
The safest way to replace these with HTML linebreak tags (<br>) is to replace any of these characters, since you don't know which convention the source of the XML data might have been using.
PHP has a built-in function which should be able to do this for you, called nl2br(). If you want a slightly custom version, there's a comment in the docs from "ngkongs" showing how to use str_replace to similar effect.
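A minimal sketch of both approaches, assuming the carriage returns arrive as &#13; character references; the sample XML below is hypothetical and only mirrors the item/text structure from the question. nl2br() inserts <br /> in front of each line break and keeps the break itself, while the str_replace() version replaces every convention outright. "\r\n" is listed first so that a Windows line ending becomes one <br> rather than two:

```php
<?php
// Hypothetical sample: each &#13; stands in for a carriage return from the
// question; SimpleXML decodes it into a real "\r" character.
$xml = simplexml_load_string(
    "<root><item><text>line one&#13;line two&#13;\nline three</text></item></root>"
);
$text = (string) $xml->item->text; // "line one\rline two\r\nline three"

// nl2br() adds "<br />" before each \r\n, \n\r, \n or \r and keeps the break.
$withBreaks = nl2br($text);

// Replace every line-ending convention outright; "\r\n" must come first.
$html = str_replace(["\r\n", "\r", "\n"], "<br>", $text);

echo $withBreaks, "\n", $html, "\n";
```

Because the str_replace() search list covers all three conventions, it does not depend on which platform produced the XML, unlike a replacement keyed on PHP_EOL.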
I figured out how to solve it just before posting my question, so, since I'd already written it, I'll share it in the hope that it'll be useful to someone else sooner or later...
This does the trick:
echo str_replace(PHP_EOL, "<br>", $message->text);
So, for basic stuff my regex is working fine.
e.g.
[b]text[/b] becomes <b>text</b>
but
[b]te[b]x[/b]t[/b] becomes <b>te[b]x</b>t[/b]
Ideally I want: <b>te<b>x</b>t</b>
If I nest a different BBCode tag, it works:
[b]te[i]x[/i]t[/b] becomes <b>te<i>x</i>t</b>
This is an issue because I have a quote BBCode, and sometimes people end up quoting someone's post which itself quotes someone else.
The regex looks like this:
$str = preg_replace("/\[b\](.*?)\[\/b\]/misS", "<b>$1</b>", $str);
Use \[(/?b)\] and replace it with <$1>
You can do this easily with a regex as long as you are willing to be fairly dumb: don't worry about finding matched pairs, just replace anything that looks like an opening or closing tag:
$str = preg_replace("|\[(/?\w+)\]|iS", "<$1>", $str);
If something like that is not good enough, then a regex is not your solution. It can't handle nested elements properly because it is not a real parser.
There is a PHP extension for parsing BBCode. That is probably the way to go if you are interested in robust, correct parsing (caveat: haven't used it myself).
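To see the difference on the question's own input, here is a small sketch that runs the question's lazy pair-matching pattern and the tag-by-tag pattern from this answer side by side (the variable names are just for illustration):

```php
<?php
$str = '[b]te[b]x[/b]t[/b]';

// Pair matching: the lazy (.*?) pairs the first [b] with the first [/b].
$paired = preg_replace('/\[b\](.*?)\[\/b\]/misS', '<b>$1</b>', $str);

// Tag by tag: every [tag] or [/tag] is converted on its own, so nesting survives.
$tagwise = preg_replace('|\[(/?\w+)\]|iS', '<$1>', $str);

echo $paired, "\n";  // <b>te[b]x</b>t[/b]
echo $tagwise, "\n"; // <b>te<b>x</b>t</b>
```

Since \w+ accepts any tag name, input such as [script] would become <script> as well, and unbalanced BBCode still yields unbalanced HTML; in practice you would probably restrict the pattern to the tags you actually support.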
I'm using PHP Markdown, but I also need a script to convert plain-text links into clickable ones. Both work independently, but when I run them together, if I run Markdown first, makeLinks still processes the generated HTML and screws things up, and vice versa. Any idea how to stop it from doing that? I can't figure out a regex that ignores the Markdown-style links.
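function makeLinks($text) {
    // Turn bare http:// and ftp:// URLs into links.
    $text = preg_replace('%(((f|ht){1}tp://)[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
        '<a href="\\1">\\1</a>', $text);
    // Turn bare www. addresses (after whitespace or a bracket) into links.
    $text = preg_replace('%([[:space:]()[{}])(www.[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
        '\\1<a href="http://\\2">\\2</a>', $text);
    return $text;
}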
sample text:
###[Title Section](http://domain/folder/page.html)
- Blah blah some text and then a link: www.webpage.org.
The double-linkify problem is best solved with guesswork and workarounds. (We have some duplicate questions, but I can never find a good one.)
Since already-converted http:// URLs only occur right after href=" or a >, you can use those in a negative lookbehind assertion:
(?<!href="|>)
Put it at the start of your first regex:
$text = preg_replace('%(?<!href="|>)(((f|ht){1}tp://)...
Your second regex uses [[:space:]] as an anchor, so it should already be fault tolerant.
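A sketch of the whole function with the lookbehind applied, assuming anchor-building replacements like the ones shown here; the sample HTML is hypothetical, standing in for what PHP Markdown might emit. I've also dropped the redundant {1} and the duplicate characters from the character class and escaped the dot after www, which otherwise matches any character. Run after Markdown, URLs already sitting in an href or directly inside a link are left alone:

```php
<?php
function makeLinks(string $text): string
{
    // Skip http:// and ftp:// URLs preceded by href=" or by ">" (already linked).
    $text = preg_replace(
        '%(?<!href="|>)((?:f|ht)tp://[-a-zA-Z0-9#:\%_+.~?&/=]+)%i',
        '<a href="$1">$1</a>',
        $text
    );
    // Bare www. addresses are only linked after whitespace or an opening bracket.
    $text = preg_replace(
        '%([[:space:]()[{}])(www\.[-a-zA-Z0-9#:\%_+.~?&/=]+)%i',
        '$1<a href="http://$2">$2</a>',
        $text
    );
    return $text;
}

// Hypothetical Markdown output: a heading link plus two plain-text links.
$html = '<h3><a href="http://domain/folder/page.html">Title Section</a></h3>'
    . "\n<p>See http://example.com/a and www.webpage.org today</p>";
echo makeLinks($html), "\n";
```

The heading link comes through untouched, while the two plain-text addresses are wrapped. It is still guesswork: a URL that legitimately follows a > in running text will not be linked.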
I've tried about everything to delete some extra \n characters in a web application I'm working with. I was hoping someone has encountered this issue before and knows what can be causing this. All my JS and PHP files are UTF-8 encoded with no BOM.
And yes I've tried things like
In JS:
text.replace(/\n/g,"")
In PHP:
preg_replace("[\n]","",$result);
str_replace("\n","",$result);
and when I try
text.replace(/\n/g,"")
in the Firebug console on the same string I get from the server, it works, but for some reason it doesn't work in a JS file.
I'm desperate, picky and this is killing me. Any input is appreciated.
EDIT:
If it helps, I know how to use the replace functions above. I'm able to replace any other string or pattern except \n for some reason.
Answer Explanation:
Some people just use what works because it works. If you're like me, though, you always want to know why what works WORKS!
In my case:
Why this works? str_replace('\n', '', $result)
And this doesn't? str_replace("\n", '', $result)
Looks identical right?
It turns out that inside double quotes, "\n" is an escape sequence: PHP turns it into a single newline character. Inside single quotes, '\n' is taken literally as the two characters backslash and n. So the string I was getting back must have contained a literal backslash followed by n rather than real newlines, which is why only the single-quoted version matched. At least, that is what I concluded after a three-hour headache.
If what I concluded is a setup specific issue OR is erroneous please do let me know or edit.
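A quick way to see the difference described above. The json_encode() call is only a hypothetical example of how a real newline can end up as the two characters backslash and n in data coming back from a server:

```php
<?php
var_dump(strlen("\n")); // int(1): double quotes turn \n into one newline character
var_dump(strlen('\n')); // int(2): single quotes keep a backslash followed by "n"

// Hypothetical server output: json_encode() escapes the newline as backslash + n.
$escaped = json_encode("a\nb");

$doubleQuoted = str_replace("\n", '', $escaped); // nothing to replace: no real newline
$singleQuoted = str_replace('\n', '', $escaped); // removes the literal backslash-n
var_dump($doubleQuoted, $singleQuoted);
```

If the data contains real newlines instead, the situation is reversed and only the double-quoted "\n" matches.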
In PHP, use str_replace(array('\r','\n'), '', $string).
I guess the problem is that you also have \r characters in your code (carriage returns, which are also displayed as newlines).
In JavaScript, the .replace() method doesn't modify the string. It returns a new, modified string, so you need to use the result.
text = text.replace(/\n/g,"")
Both of the PHP functions you tried return the altered string, they do not alter their arguments:
$result = preg_replace("[\n]","",$result);
$result = str_replace("\n","",$result);
Strangely, using
str_replace(array('\r','\n'), '', $string)
didn't work for me. I can't really work out why either.
In my situation I needed to take the output from a WordPress custom meta field and place it, formatted as HTML, in a JavaScript array for later use as info windows in a Google Maps instance on my site.
If I did the following:
$stockist_address = $stockist_post_custom['stockist_address'][0];
$stockist_address = apply_filters( 'the_content', $stockist_address);
$stockist_sites_html .= str_replace(array('\r','\n'), '', $stockist_address);
This did not give me a string with the HTML on a single line, so it threw an error in Google Maps.
What I needed to do instead was:
$stockist_address = $stockist_post_custom['stockist_address'][0];
$stockist_address = apply_filters( 'the_content', $stockist_address);
$stockist_sites_html .= trim( preg_replace( '/\s+/', ' ', $stockist_address ) );
This worked like a charm for me.
I believe \s in a regular expression matches any whitespace character, including tabs, line breaks and carriage returns.
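A small sketch of why the preg_replace() version gets everything onto one line. The HTML string is a hypothetical stand-in for what the the_content filter might return; \s covers spaces, tabs, carriage returns and line feeds, and each run of them collapses to a single space. (The single-quoted '\r' and '\n' in the earlier array are literal backslash sequences, not control characters, which is probably why that attempt had no effect.)

```php
<?php
// Hypothetical filtered HTML with a Windows line ending, a tab and Unix newlines.
$html = "<p>12 High Street\r\n\tSometown</p>\n<p>Open daily</p>\n";

// Collapse every run of whitespace to one space, then trim the ends.
$oneLine = trim(preg_replace('/\s+/', ' ', $html));

echo $oneLine, "\n"; // <p>12 High Street Sometown</p> <p>Open daily</p>
```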
I'm trying to build a PHP preg_replace pattern for processing poorly written XML, such that if I am given:
$x='<abc x="y"><def x="g">more test</def x="g"><blah>test data</blah></abc x="y">';
it checks whether there's a space inside a closing tag and deletes everything from the space to the end of the tag, so that the string
becomes
<abc x="y"><def x="g">more test</def><blah>test data</blah></abc>
thanks
This should do it:
preg_replace('/<\/(\w+)\s*[^>]*>/', '</\1>', $x);
A regex might actually be feasible in this case:
$xml = preg_replace("#(</(\w+:)?\w+)\s[^>]+>#", "$1>", $xml);
Edit: fixed as per @netcoder's hint. Made the space before the garbage mandatory.
The obvious pitfalls are of course comments (unlikely for data XML), and CDATA sections (from the looks of your xml also not likely).
Though you could still try QueryPath; it's supposed to work with XML too and might be resilient to these cases. How did it get garbled anyway?
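Running two of the patterns from these answers against the question's input, plus a hypothetical namespaced tag, shows why the (\w+:)? part of the second one matters: \w does not match a colon, so the first pattern cuts a prefixed tag name short.

```php
<?php
$x  = '<abc x="y"><def x="g">more test</def x="g"><blah>test data</blah></abc x="y">';
$ns = '<x:item>data</x:item junk="1">'; // hypothetical namespaced input

$simple   = '/<\/(\w+)\s*[^>]*>/';      // keeps only the first run of \w characters
$prefixed = '#(</(\w+:)?\w+)\s[^>]+>#'; // optional "prefix:" and a required space

echo preg_replace($simple, '</\1>', $x), "\n";   // closing tags cleaned
echo preg_replace($prefixed, '$1>', $x), "\n";   // same result
echo preg_replace($simple, '</\1>', $ns), "\n";  // <x:item>data</x>
echo preg_replace($prefixed, '$1>', $ns), "\n";  // <x:item>data</x:item>
```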
preg_replace('/<\/(.*?)\s+[^>]+>/', '</$1>', $string);
Edit: tested, works.
Try:
preg_replace('/<\/(\w+)(\s[^>]*)?>/', '</$1>', $x);
Code not tested
You can also use T-Regx library:
Using @Jonah's example:
pattern('<\/(.*?)\s+[^>]+>')->replace($string)->all()->withReferences('</$1>');
PS: Notice that using with() would quote the placeholders.
I am writing a comment stripper and trying to accommodate all needs here. The code below removes pretty much all comments, but it actually goes too far. A lot of time went into trying, testing and researching the regex patterns, but I don't claim that each one is the best.
My problem is that I also have 'PHP comments' that aren't really comments, in standard code or even inside PHP strings, and I don't want those removed.
Example:
<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>
What ends up happening is that it strips them out religiously, which is fine, but that leaves certain problems:
<?php $Var = "Blah blah ?>
Also:
will also cause problems, as the comment removes the rest of the line, including the ending ?>
See the problem? So this is what I need...
Comment characters within '' or "" need to be ignored
PHP comments on the same line that use double slashes should remove only the comment itself, or else the entire PHP code block.
Here are the patterns I use at the moment. Feel free to tell me if there are improvements I can make to them :)
$CompressedData = $OriginalData;
$CompressedData = preg_replace('!/\*.*?\*/!s', '', $CompressedData); // removes /* comments */
$CompressedData = preg_replace('!//.*?\n!', '', $CompressedData); // removes //comments
$CompressedData = preg_replace('!#.*?\n!', '', $CompressedData); // removes # comments
$CompressedData = preg_replace('/<!--(.*?)-->/', '', $CompressedData); // removes HTML comments
Any help that you can give me would be greatly appreciated! :)
If you want to parse PHP, you can use token_get_all to get the tokens of a given PHP code. Then you just need to iterate the tokens, remove the comment tokens and put the rest back together.
But you would need a separate procedure for the HTML comments, preferably a real parser too (like DOMDocument provides with DOMDocument::loadHTML).
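A minimal sketch of the token_get_all() approach (the function name stripPhpComments is hypothetical). It drops T_COMMENT and T_DOC_COMMENT tokens and concatenates everything else; because the tokenizer already knows where strings and comments begin and end, the // inside the string survives and the closing tag after the line comment is kept. Since PHP 8.0 a line comment token no longer carries its trailing newline, while older versions include it, so the sketch preserves any trailing line break either way:

```php
<?php
// Sketch: rebuild PHP source from its tokens, dropping comment tokens.
function stripPhpComments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character tokens such as ";" or "="
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            // Keep a trailing line break (present before PHP 8) so that the
            // next line is not joined onto the previous one.
            $out .= substr($text, strlen(rtrim($text, "\r\n")));
            continue;
        }
        $out .= $text;
    }
    return $out;
}

echo stripPhpComments('<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>'), "\n";
```

Anything outside the PHP tags comes back as T_INLINE_HTML and is passed through unchanged, which is why HTML comments still need separate handling.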
You should first think carefully about whether you actually want to do this. Though what you're doing may seem simple, in the worst case it becomes an extremely complex problem to solve with just a few regular expressions. Let me illustrate just a few of the problems you would face when trying to strip both HTML and PHP comments from a file.
You can't straight out strip HTML comments, because you may have PHP inside the HTML comments, like:
<!-- HTML comment <?php echo 'Actual PHP'; ?> -->
You can't simply deal with the contents of the <?php and ?> tags separately either, since the ending tag ?> can be inside strings or even comments, like:
<?php /* ?> This is still a PHP comment <?php */ ?>
And don't forget that ?> actually ends PHP mode when it comes after a one-line comment marker on the same line. For example:
<?php // ?> This is not a PHP comment <?php ?>
Of course, like you already illustrated, there will be plenty of problems with comment indicators inside strings. Parsing out strings to ignore them isn't that simple either, since you have to remember that quotes can be escaped. Like:
<?php
$foo = ' /* // None of these start a comment ';
$bar = ' \' // Remember escaped quotes ';
$orz = " ' \" \' /* // Still not a comment ";
?>
Parsing order will also cause you headache. You can't just simply choose to parse either the one line comments first or the multi line comments first. They both have to be parsed at the same time (i.e. in the order they appear in the document). Otherwise you may end up with broken code. Let me illustrate:
<?php
/* // Multiline comment */
// /* Single Line comment
$omg = 'This is not in a comment */';
?>
If you parse multi line comments first, the second /* will eat up part of the string destroying the code. If you parse the single line comments first, you will end up eating the first */, which will also destroy the code.
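The ordering problem is easy to reproduce with the question's own two patterns applied to the example above; a small sketch:

```php
<?php
// The example above as a string (the "$" of $omg is escaped for double quotes).
$src = "<?php\n/* // Multiline comment */\n// /* Single Line comment\n\$omg = 'This is not in a comment */';\n?>";

// Block comments first: the second "/*" runs on to the "*/" inside the string.
$blockFirst = preg_replace('!/\*.*?\*/!s', '', $src);

// Line comments first: the "//" inside the block comment also eats its "*/".
$lineFirst = preg_replace('!//.*?\n!', '', $src);

echo $blockFirst, "\n---\n", $lineFirst, "\n";
```

In the first result the assignment has disappeared entirely; in the second, $omg is now inside a block comment that ends in the middle of the string literal.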
As you can see, there are many complex scenarios you'd have to account for if you intend to solve your problem with regular expressions. The only correct solution is to use some sort of PHP parser, like token_get_all(), to tokenize the entire source code, strip the comment tokens and rebuild the file. That, I'm afraid, isn't entirely simple either. It also won't help with HTML comments, since the HTML is left untouched. You can't use XML parsers to get the HTML comments either, because HTML mixed with PHP is rarely well formed.
In short, the idea of what you're doing is simple, but the actual implementation is much harder than it seems. Thus, I would recommend avoiding this unless you have a very good reason to do it.
One way to do this in REGEX is to use one compound expression and preg_replace_callback.
I was going to post a poor example, but the best place to look is the source code of the PHP port of Dean Edwards' JS packer script; you should see the general idea there.
http://joliclic.free.fr/php/javascript-packer/en/
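A minimal sketch of that idea, not taken from the packer source (stripCommentsCompound is a hypothetical name): one alternation that matches double-quoted strings, single-quoted strings, block comments and line comments. The regex engine scans left to right and takes whichever construct starts first, so a // inside a string is consumed as part of the string, and the callback hands strings back unchanged while dropping comments. It assumes plain PHP code, with no heredocs and no inline HTML, where an apostrophe would be mistaken for the start of a string:

```php
<?php
// Sketch: one alternation that matches strings and comments in document order.
// Strings are returned unchanged; comments are replaced with nothing.
function stripCommentsCompound(string $code): string
{
    $pattern = <<<'REGEX'
~
  "(?:[^"\\]|\\.)*"               # double-quoted string, backslash escapes allowed
| '(?:[^'\\]|\\.)*'               # single-quoted string
| /\*.*?\*/                       # block comment
| (?://|\#)(?:(?!\?>)[^\r\n])*    # line comment, stopping at a newline or a closing tag
~sx
REGEX;

    return preg_replace_callback($pattern, function (array $m): string {
        $first = $m[0][0];
        return ($first === '"' || $first === "'") ? $m[0] : '';
    }, $code);
}

echo stripCommentsCompound('<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>'), "\n";
```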
Try this:
private function removeComments( $content ){
    $content = preg_replace( "!/\*.*?\*/!s" , '', $content );     // block comments
    $content = preg_replace( "/\n\s*\n/" , "\n", $content );       // blank lines
    $content = preg_replace( '#^\s*//.+$#m' , "", $content );       // whole-line // comments
    $content = preg_replace( '!\s//.*?\n!' , "\n", $content );      // // comments after whitespace
    $content = preg_replace( '/<!--.*?-->/s' , "\n", $content );    // HTML comments, non-greedy
    return $content;
}