preg_split : How to get what's before the split

preg_split : How to get what's before the split - php

I'm having some issues with the preg-split function.
I would like to get what is before the delimiter instead of what's after it.
I've found some leads explaining that using the following code would do the trick :
$var = end(preg_split('/\./',$string));
echo($var[0]);
But when I'm doing that I only get the first char and not every chars before the dot.
Here is my code :
$item = "software_technical_item.TI";
$joint = end(preg_split('/\./',$item));
I obviously get "TI" in $joint, I would like to get "software_technical_item", would someone know how to do that ?
Thanks,
Corentin.

Dot is a special character in regex which matches any character , you need to escape it in-order to match a literal dot.
$string = "software_technical_item.TI";
$var = preg_split('/\./',$string);
echo($var[0]);
Output:
software_technical_item

Related

Erasing C comments with preg_replace

I need to erase all comments in $string which contains data from some C file.
The thing I need to replace looks like this:
something before that shouldnt be replaced
/*
* some text in between with / or * on many lines
*/
something after that shouldnt be replaced
and the result should look like this:
something before that shouldnt be replaced
something after that shouldnt be replaced
I have tried many regular expressions but neither work the way I need.
Here are some latest ones:
$string = preg_replace("/\/\*(.*?)\*\//u", "", $string);
and
$string = preg_replace("/\/\*[^\*\/]*\*\//u", "", $string);
Note: the text is in UTF-8, the string can contain multibyte characters.

You would also want to add the s modifier to tell the regex that .* should include newlines. I always think of s to mean "treat the input text as a single line"
So something like this should work:
$string = preg_replace("/\\/\\*(.*?)\\*\\//us", "", $string);
Example: http://codepad.viper-7.com/XVo9Tp
Edit: Added extra escape slashes to the regex as Brandin suggested because he is right.

I don't think regexp fit good here. What about wrote a very small parse to remove this? I don't do PHP coding for a long time. So, I will try to just give you the idea (simple alogorithm) I haven't tested this, it's just to you get the idea, as I said:
buf = new String() // hold the source code without comments
pos = 0
while(string[pos] != EOF) {
if(string[pos] == '/') {
pos++;
while(string[pos] != EOF)
{
if(string[pos] == '*' && string[pos + 1] == '/') {
pos++;
break;
}
pos++;
}
}
buf[buf_index++] = string[pos++];
}
where:
string is the C source code
buf a dynamic allocated string which expands as needed

It is very hard to do this perfectly without ending up writing a full C parser.
Consider the following, for example:
// Not using /*-style comment here.
// This line has an odd number of " characters.
while (1) {
printf("Wheee!
(*\/*)
\\// - I'm an ant!
");
/* This is a multiline comment with a // in, and
// an odd number of " characters. */
}
So, from the above, we can see that our problems include:
multiline quote sequences should be ignored within doublequotes. Unless those doublequotes are part of a comment.
single-line comment sequences can be contained in double-quoted strings, and in multiline strings.
Here's one possibility to address some of those issues, but far from perfect.
// Remove "-strings, //-comments and /*block-comments*/, then restore "-strings.
// Based on regex by mauke of Efnet's #regex.
$file = preg_replace('{("[^"]*")|//[^\n]*|(/\*.*?\*/)}s', '\1', $file);

try this:
$string = preg_replace("#\/\*\n?(.*)\*\/\n?#ms", "", $string);
Use # as regexp boundaries; change that u modifier with the right ones: m (PCRE_MULTILINE) and s (PCRE_DOTALL).
Reference: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
It is important to note that my regexp does not find more than one "comment block"... Use of "dot match all" is generally not a good idea.

php string replace by str_replace issue

i made a function to replace a words in a string by putting new words from an array.
this is my code
function myseo($t){
$a = array('me','lord');
$b = array('Mine','TheLord');
$theseotext = $t;
$theseotext = str_replace($a,$b, $theseotext);
return $theseotext;
}
echo myseo('This is me Mrlord');
the output is
This is Mine MrTheLord
and it is wrong it should be print
This is Mine Mrlord
because word (Mrlord) is not included in the array.
i hope i explained my issue in good way. any help guys
regards

According to the code it is correct, but you want it to isolate by word. You could simply do this:
function myseo($t){
$a = array(' me ',' lord ');
$b = array(' Mine ',' TheLord ');
return str_replace($a,$b, ' '.$t.' ');
}
echo myseo('This is me Mrlord');
keep in mind this is kind of a cheap hack since I surround the replace string with empty spaces to ensure both sides get considered. This wouldn't work for punctuated strings. The alternate would be to break apart the string and replace each word individually.

str_replace doesn't look at full words only - it looks at any matching sequence of characters.
Thus, lord matches the latter part of Mrlord.

use str_ireplace instead, it's case insensitive.

Replace a character only in one special part of a string

When I've a string:
$string = 'word1="abc.3" word2="xyz.3"';
How can I replace the point with a comma after xyz in xyz.3 and keep him after abc in abc.3?

You've provided an example but not a description of when the content should be modified and when it should be kept the same. The solution might be simply:
str_replace("xyz.", "xyz", $input);
But if you explicitly want a more explicit match, say requiring a digit after the ful stop, then:
preg_replace("/xyz\.([0-9])+/", 'xyz\${1}', $input);
(not tested)

something like (sorry i did this with javascript and didn't see the PHP tag).
var stringWithPoint = 'word1="abc.3" word2="xyz.3"';
var nopoint = stringWithPoint.replace('xyz.3', 'xyz3');
in php
$str = 'word1="abc.3" word2="xyz.3"';
echo str_replace('xyz.3', 'xyz3', $str);

You can use PHP's string functions to remove the point (.).
str_replace(".", "", $word2);

It depends what are the criteria for replace or not.
You could split string into parts (use explode or preg_split), then replace dot in some parts (eg. str_replace), next join them together (implode).

how about:
$string = 'word1="abc.3" word2="xyz.3"';
echo preg_replace('/\.([^.]+)$/', ',$1', $string);
output:
word1="abc.3" word2="xyz,3"

Preg_match, Replace and back to string

sorry but i cant solve my problem, you know , Im a noob.
I need to find something in string with preg_match.. then replace it with new word using preg_replace, that's ok, but I don't understand how to put replaced word back to that string.
This is what I got
$text ='zda i "zda"';
preg_match('/"(\w*)"/', $text);
$najit = '/zda/';
$nahradit = 'zda';
$o = '/zda/';
$a = 'if';
$ahoj = preg_replace($najit, $nahradit, $match[1]);
Please, can you help me once again?

You can use e.g. the following code utilizing negative lookarounds to accomplish what you want:
$newtext = preg_replace('/(?<!")zda|zda(?!")/', 'if', $text)
It will replace any occurence of zda which is not enclosed in quotes on both sides (i.e. in U"Vzda"W the zda will be replaced because it is not enclosed directly into quotes).

Regular Expressions: how to do "option split" replaces

those reqular expressions drive me crazy. I'm stuck with this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]] and if there is an option split choose the later one so output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
this works for part1 of the task. but before that I think I should do the option split, my best solution:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. may someone with some free minutes help me? Thanks!

I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]] and the second will ditch the brackets.
Edit 1: By way of explanation,
\[\[ # Opening [[
(?: # A non-matching group (we don't want this bit)
[^\[\]] # Non-bracket characters
* # Zero or more of anything but [
\| # A literal '|' character representing the end of the discarded options
)? # This group is optional: if there is only one option, it won't be present
( # The group we're actually interested in ($1)
[^\[\]] # All the non-bracket characters
+ # Must be at least one
) # End of $1
\]\] # End of the grouping.
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets as you can do something like:
$oldtext = "";
$newtext = $text;
while ($newtext != $oldtext)
{
$oldtext = $newtext;
$newtext = preg_replace(regexp,replace,$oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so there are probably syntax errors in the above.

This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes. Keep looping while there are matches, performing replacement (only keeping match group 1.)
Difference from other regular expressions here is that it will allow you to have single brackets in the content, without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:[[link]] test2:si[lv]er
test3:out1in[si]deout2 test4:this|not

Why try to do it all in one go. Remove the [[]] first and then deal with options, do it in two lines of code.
When trying to get something going favour clarity and simplicity.
Seems like you have all the pieces.

Why not just simply remove any brackets that are left?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);

Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not'; $reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach($m[2] as $pos => $match) {
if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE ) {
$opt = explode('|', $match); $match = $opt[count($opt)-1];
}
$m[2][$pos] = str_replace(array('[', ']'),'', $match );
}
foreach($m[1] as $k=>$v) $result[$k] = $v.':'.$m[2][$k];

This is C# using only using non-escaped strings, hence you will have to double the backslashes in other languages.
String input = "test1:[[link]] " +
"test2:[[gold|silver]] " +
"test3:[[out1[[inside]]out2]] " +
"test4:this|not";
String step1 = Regex.Replace(input, #"\[\[([^|]+)\|([^\]]+)\]\]", #"[[$2]]");
String step2 = Regex.Replace(step1, #"\[\[|\]\]", String.Empty);
// Prints "test1:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);

$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$s = preg_split("/\s+/",$str);
foreach ($s as $k=>$v){
$v = preg_replace("/\[\[|\]\]/","",$v);
$j = explode(":",$v);
$j[1]=preg_replace("/.*\|/","",$j[1]);
print implode(":",$j)."\n";
}

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

preg_split : How to get what's before the split - php

Dot is a special character in regex which matches any character , you need to escape it in-order to match a literal dot. $string = "software_technical_item.TI"; $var = preg_split('/\./',$string); echo($var[0]); Output: software_technical_item

Related

Erasing C comments with preg_replace

php string replace by str_replace issue

Replace a character only in one special part of a string

Preg_match, Replace and back to string

Regular Expressions: how to do "option split" replaces

Categories

Resources