I need to split multiple lines in multiple files by different delimiters. I think preg_split should do the job, but I have never worked with PCRE regex. I could also change all my delimiters to be consistent, but that adds unnecessary calculations.
Q: My delimiters consist of (,)(;)(|)(space) and I am curious how to build such a regex.
Put the characters in square brackets []:
$parts = preg_split('/[,;| ]/', $string, -1, PREG_SPLIT_NO_EMPTY);
You can also use \s instead of a space character, which matches all kinds of whitespace, such as tabs and newlines.
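A minimal sketch of the character-class approach with \s in place of the literal space. The sample line in $line is made up for illustration and is not from the question:

```php
<?php
// Hypothetical sample line mixing all four delimiters plus a tab.
$line = "alpha,beta;gamma|delta epsilon\tzeta";

// Split on any single delimiter character; \s covers space, tab and newline.
// PREG_SPLIT_NO_EMPTY drops empty pieces produced by adjacent delimiters.
$parts = preg_split('/[,;|\s]/', $line, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

The limit argument is -1 (no limit) so that the flags argument can be passed as the fourth parameter.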
Try this:
$string = "foo:bar|it;is:simple";
print_r(preg_split ( '/,|;|\||\s/' , $string ));
Related
I have a database of texts that contains this kind of syntax in the middle of English sentences that I need to turn into HTML links using PHP
"text1(text1)":http://www.example.com/mypage
Notes:
text1 is always identical to the text in parenthesis
The whole string always has the quotation marks, parentheses and colon, so the syntax is the same for each.
Sometimes there is a space at the end of the string, but other times there is a question mark or comma or other punctuation mark.
I need to turn these into basic links, like
<a href="http://www.example.com/mypage">text1</a>
How do I do this? Do I need explode or regex or both?
"(.*?)\(\1\)":(.*\/[a-zA-Z0-9]+)(?=\?|\,|\.|$)
You can use this.
See Demo.
http://regex101.com/r/zF6xM2/2
You can use this replacement:
$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';
$replacement = '<a href="\2">\1</a>';
$result = preg_replace($pattern, $replacement, $text);
pattern details:
([^("]+) this part captures text1 in group 1. Using a negated character class (one that excludes the double quote and the opening parenthesis) has several advantages:
it allows a greedy quantifier, which is faster
since the class excludes the opening parenthesis and is immediately followed by a parenthesis in the pattern, if another part of the text has content between double quotes but no parenthesis inside, the regex engine will not go backward to test other possibilities; it skips this substring without backtracking. (This is because the PCRE regex engine automatically converts [^a]+a into [^a]++a before processing the string.)
\S+ matches one or more non-whitespace characters
(?=[\s\pP]|\z) is a lookahead assertion that checks that the url is followed by a whitespace, a punctuation character (\pP) or the end of the string.
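As a rough end-to-end sketch of this pattern, the snippet below runs it on a made-up sentence. The sample text and the anchor-tag replacement (group 1 as the link text, group 2 as the URL) are assumptions for illustration:

```php
<?php
// Hypothetical sample sentence containing one link in the source syntax.
$text = 'Visit "text1(text1)":http://www.example.com/mypage today';

$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';

// Assumed replacement: group 1 is the link text, group 2 is the URL.
$replacement = '<a href="$2">$1</a>';

$result = preg_replace($pattern, $replacement, $text);
echo $result, PHP_EOL;
```

One caveat worth checking against your data: \S+ is greedy, so a comma or question mark glued to the URL and followed by a space or the end of the string still appears to be captured as part of the URL.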
You can use this regex:
"(.*?)\(.*?:(.*)
An appropriate Regular Expression could be:
$str = '"text1(text1)":http://www.example.com/mypage';
preg_match('#^"([^\(]+)' .
'\(([^\)]+)\)[^"]*":(.+)#', $str, $m);
print '<a href="'.$m[3].'">'.$m[2].'</a>' . PHP_EOL;
I have a string that contains some words I want to extract. The separators can be any string that consists of , ; or a space.
Here is an example:
;,osman,ali;, mehmet ;ahmet,ayse; ,
I need to get the words osman, ali, mehmet, ahmet and ayse into an array, or any type that lets me use them one by one. I tried using a preg function but couldn't figure it out.
If anyone can help, I would appreciate it.
$words = preg_split('/[,;\s]+/', $str, -1, PREG_SPLIT_NO_EMPTY);
[,;\s] is a character class, which means match any one of the characters it contains.
\s matches any white space character (space, tab, newline, etc.). If this is too much just replace it with a space: [,; ].
+ means match one or more of the preceding symbol or group.
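Put together on the sample string from the question, the call can be sketched like this (only the loop that prints the words is added for illustration):

```php
<?php
$str = ';,osman,ali;, mehmet ;ahmet,ayse; ,';

// One or more of comma, semicolon or whitespace act as a single separator.
// PREG_SPLIT_NO_EMPTY removes the empty pieces at the start and end.
$words = preg_split('/[,;\s]+/', $str, -1, PREG_SPLIT_NO_EMPTY);

foreach ($words as $word) {
    echo $word, PHP_EOL;
}
```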
http://www.regular-expressions.info/ is a good site to learn regular expressions.
You want to use preg_split with [;, ]+ as the regex to split on:
$keywords = preg_split("/[;, ]+/", $yourstring);
Split on non-word characters:
$array=preg_split("/\W+/", $string);
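A quick sketch of what \W+ gives on the sample string from the question. Because the string starts and ends with separators, the plain call leaves an empty string at each end; passing PREG_SPLIT_NO_EMPTY is one way to drop them:

```php
<?php
$str = ';,osman,ali;, mehmet ;ahmet,ayse; ,';

// Plain split: leading and trailing separators produce empty strings.
$raw = preg_split('/\W+/', $str);

// Same pattern, but empty pieces are discarded.
$clean = preg_split('/\W+/', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($raw);
print_r($clean);
```

Keep in mind that \W treats the underscore as a word character, so words joined by _ are not split.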
I have the following string in php:
$string = 'FEDCBA9876543210';
The string can have 2 or more (I mean more) hexadecimal characters.
I want to group the string by 2, like:
$output_string = 'FE:DC:BA:98:76:54:32:10';
I wanted to use regex for that, I think I saw a way to do like "recursive regex" but I can't remember it.
Any help appreciated :)
If you don't need to check the content, there is no use for regex.
Try this
$outputString = chunk_split($string, 2, ":");
// generates: FE:DC:BA:98:76:54:32:10:
You might need to remove the last ":".
Or this :
$outputString = implode(":", str_split($string, 2));
// generates: FE:DC:BA:98:76:54:32:10
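The two non-regex variants can be compared side by side. This is only a sketch; the rtrim call is the "remove the last colon" step mentioned above:

```php
<?php
$string = 'FEDCBA9876543210';

// chunk_split appends the separator after every chunk, including the last one,
// so the trailing colon has to be trimmed off.
$viaChunk = rtrim(chunk_split($string, 2, ':'), ':');

// str_split returns the pairs as an array; implode only puts colons between them.
$viaSplit = implode(':', str_split($string, 2));

echo $viaChunk, PHP_EOL;
echo $viaSplit, PHP_EOL;
```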
Resources :
www.w3schools.com - chunk_split()
www.w3schools.com - str_split()
www.w3schools.com - implode()
On the same topic :
Split string into equal parts using PHP
Sounds like you want a regex like this:
/([0-9a-f]{2})/${1}:/gi
Which, in PHP is...
<?php
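$string = 'FEDCBA9876543210';
$pattern = '/([0-9A-F]{2})/i'; // PHP has no "g" modifier; preg_replace replaces every match
$replacement = '${1}:';
echo preg_replace($pattern, $replacement, $string); // FE:DC:BA:98:76:54:32:10: (note the trailing colon)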
?>
Please note the above code is currently untested.
You can make sure there are two or more hex characters doing this:
if (preg_match('!^\d*[A-F]\d*[A-F][\dA-F]*$!i', $string)) {
...
}
No need for a recursive regex. By the way, "recursive regex" is a contradiction in terms, since a regular language (which is what a regex parses) can't be recursive by definition.
If you want to also group the characters in pairs with colons in between, ignoring the two hex characters for a second, use:
if (preg_match('!^[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
Now if you want to add the condition requiring two hex characters, use a positive lookahead:
if (preg_match('!^(?=[\d:]*[A-F][\d:]*[A-F])[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
To explain how this works: it first checks, with a positive lookahead (?=...), that the string has zero or more digits or colons followed by a hex letter, then zero or more digits or colons and another hex letter. This ensures there are at least two hex letters in the string.
After the positive lookahead is the original expression that makes sure the string is pairs of hex digits.
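The two checks can be wrapped in small helpers to try them on a few inputs. This is a sketch: the function names are hypothetical, and it assumes each colon is followed by exactly two hex digits:

```php
<?php
// Hypothetical helper: true if $s is hex pairs separated by single colons.
function isColonSeparatedHex(string $s): bool
{
    // One hex pair, then zero or more ":" + hex pair groups.
    return preg_match('!^[\dA-F]{2}(?::[\dA-F]{2})*$!i', $s) === 1;
}

// Hypothetical helper: same check, plus a lookahead that demands two hex letters.
function isColonSeparatedHexWithTwoLetters(string $s): bool
{
    $pattern = '!^(?=[\d:]*[A-F][\d:]*[A-F])[\dA-F]{2}(?::[\dA-F]{2})*$!i';
    return preg_match($pattern, $s) === 1;
}

var_dump(isColonSeparatedHex('FE:DC:BA'));               // bool(true)
var_dump(isColonSeparatedHex('FE:DC:B'));                // bool(false)
var_dump(isColonSeparatedHexWithTwoLetters('12:34'));    // bool(false)
var_dump(isColonSeparatedHexWithTwoLetters('1A:2B'));    // bool(true)
```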
Recursive regular expressions are usually not possible. You may apply a regular expression recursively to the results of a previous one, but most regular expression grammars do not allow recursion. This is the main reason why regular expressions are almost always inadequate for parsing things like HTML. Anyway, what you need doesn't require any kind of recursion.
What you want, simply, is to match a group multiple times. This is quite simple:
preg_match_all("/([a-f0-9]{2})/i", $string, $matches);
This will fill $matches with all occurrences of two hexadecimal digits (in a case-insensitive way). To replace them, use preg_replace:
echo preg_replace("/([a-f0-9]{2})/i", '\1:', $string);
There will probably be one ':' too many at the end; you can strip it with substr:
echo substr(preg_replace("/([a-f0-9]{2})/i", '\1:', $string), 0, -1);
While it is not horrible practice to use rtrim(chunk_split($string, 2, ':'), ':'), I prefer to use direct techniques that avoid "mopping up" after making modifications.
Code:
$string = 'FEDCBA9876543210';
echo preg_replace('~[\dA-F]{2}(?!$)\K~', ':', $string);
Output:
FE:DC:BA:98:76:54:32:10
Don't be intimidated by the regex. The pattern says:
[\dA-F]{2} # match exactly two numeric or A through F characters
(?!$) # that is not located at the end of the string
\K # restart the fullstring match
When I say "restart the fullstring match" I mean "forget the previously matched characters and start matching from this point forward". Because there are no additional characters matched after \K, the pattern effectively delivers the zero-width position where the colon should be inserted. In this way, no original characters are lost in the replacement.
I've created this regex
(www|http://)[^ ]+
that matches every http://... or www.... but I don't know how to write a preg_replace that works. I've tried
preg_replace('/((www|http://)[^ ]+)/', '\1', $str);
but it doesn't work, the result is empty string.
You need to escape the slashes in the regex because you are using slashes as the delimiter. You could also use another symbol as the delimiter.
// escaped
preg_replace('/((www|http:\/\/)[^ ]+)/', '\1', $str);
// another delimiter, '#'
preg_replace('#((www|http://)[^ ]+)#', '\1', $str);
When using the regex codes provided by the other users, be sure to add the "i" flag to enable case-insensitivity, so it'll work with both HTTP:// and http://. For example, using chaos's code:
preg_replace('!((?:www|http://)[^ ]+)!i', '\1', $str);
First of all, you need to escape, or even better replace, the delimiters as explained in the other answers.
preg_replace('~((www|http://)[^ ]+)~', '\1', $str);
Secondly, to further improve the regex, the $n replacement reference syntax is preferred over \\n, as stated in the manual.
preg_replace('~((www|http://)[^ ]+)~', '$1', $str);
Thirdly, you are needlessly using capturing parentheses, which only slows things down. Get rid of them. Don't forget to update $1 to $0. In case you are wondering, these are non-capturing parentheses: (?: ).
preg_replace('~(?:www|http://)[^ ]+~', '$0', $str);
Finally, I would replace [^ ]+ with the shorter and more accurate \S+, where \S is the opposite of \s. Note that [^ ]+ does not allow spaces, but accepts newlines and tabs! \S+ does not.
preg_replace('~(?:www|http://)\S+~', '$0', $str);
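The difference between [^ ]+ and \S+ shows up as soon as URLs are separated by a newline. A small sketch with a made-up input string:

```php
<?php
// Hypothetical input: two URLs separated by a newline, then a space and a word.
$text = "www.example.com\nhttp://example.org tail";

// [^ ]+ only stops at a space, so it runs across the newline.
preg_match_all('~(?:www|http://)[^ ]+~', $text, $loose);

// \S+ stops at any whitespace, so each URL is matched on its own.
preg_match_all('~(?:www|http://)\S+~', $text, $strict);

print_r($loose[0]);
print_r($strict[0]);
```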
Your main problem seems to be that you are putting everything in parentheses, so it doesn't know what "\1" is. Also, you need to escape the "/". So try this:
preg_replace('/((?:www|http:\/\/)[^ ]+)/', '\1', $str);
Edit: It actually seems the parentheses were not an issue, I misread it. The escaping was still an issue as others also pointed out. Either solution should work.
preg_replace('!((?:www|http://)[^ ]+)!', '\1', $str);
When you use / as your pattern delimiter, having / inside your pattern will not work out well. I solved this by using ! as the pattern delimiter, but you could escape your slashes with backslashes instead.
I also didn't see any reason why you were doing two paren captures, so I removed one of them.
Part of the trouble in your situation is that you're running with warnings suppressed; if you had error_reporting(E_ALL) on, you'd have seen the messages PHP is trying to generate about your delimiter problem in your regex.
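To see what PHP reports for the original pattern, the warning can be captured with a temporary error handler. This is a sketch; the sample string and the $warnings array are made up for illustration:

```php
<?php
$warnings = [];

// Collect warnings instead of printing them.
set_error_handler(function (int $errno, string $errstr) use (&$warnings): bool {
    $warnings[] = $errstr;
    return true; // mark as handled so PHP does not print it again
});

$str = 'see http://example.com now';

// Unescaped "/" inside a "/"-delimited pattern: PHP ends the pattern at "http:"
// and tries to read the rest as modifiers.
$result = preg_replace('/((www|http://)[^ ]+)/', '$1', $str);

restore_error_handler();

var_dump($result);   // NULL, which prints as an empty string when echoed
print_r($warnings);
```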
If a string contains multiple URLs separated by line breaks instead of spaces, you have to use \S:
preg_replace('/((www|http:\/\/)\S+)/', '$1', $val);
How can I match a space character in a PHP regular expression?
I mean like "gavin schulz", the space in between the two words. I am using a regular expression to make sure that I only allow letters, numbers and a space. But I'm not sure how to find the space. This is what I have right now:
$newtag = preg_replace("/[^a-zA-Z0-9s|]/", "", $tag);
If you're looking for a space, that would be " " (one space).
If you're looking for one or more, it's "  *" (that's two spaces and an asterisk) or " +" (one space and a plus).
If you're looking for common spacing, use "[ X]" or "[ X][ X]*" or "[ X]+" where X is the physical tab character (and each is preceded by a single space in all those examples).
These will work in every* regex engine I've ever seen (some of which don't even have the one-or-more "+" character, ugh).
If you know you'll be using one of the more modern regex engines, "\s" and its variations are the way to go. In addition, I believe word boundaries match start and end of lines as well, important when you're looking for words that may appear without preceding or following spaces.
For PHP specifically, this page may help.
From your edit, it appears you want to remove all invalid characters. The start of this is (note the space inside the regex):
$newtag = preg_replace ("/[^a-zA-Z0-9 ]/", "", $tag);
#                                    ^ space here
If you also want trickery to ensure there's only one space between each word and none at the start or end, that's a little more complicated (and probably another question) but the basic idea would be:
$newtag = preg_replace ("/ +/", " ", $newtag); # convert all multispaces to space
$newtag = preg_replace ("/^ /", "", $newtag); # remove space from start
$newtag = preg_replace ("/ $/", "", $newtag); # and end
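The whole clean-up can be sketched as one chain where each step works on the result of the previous one. The sample tag is made up, and trim() is swapped in here for the two anchored replacements as a simpler way to strip the ends:

```php
<?php
// Hypothetical messy input.
$tag = "  gavin   schulz!! ";

// 1. Drop everything that is not a letter, digit or space.
$newtag = preg_replace('/[^a-zA-Z0-9 ]/', '', $tag);

// 2. Collapse runs of spaces into a single space.
$newtag = preg_replace('/ +/', ' ', $newtag);

// 3. Remove spaces at the start and end (trim instead of two more regexes).
$newtag = trim($newtag, ' ');

var_dump($newtag);
```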
Cheat Sheet
Here is a small cheat sheet of everything you need to know about whitespace in regular expressions:
[[:blank:]]
Space or tab only, not newline characters. It is the same as writing [ \t].
[[:space:]] & \s
[[:space:]] and \s are the same. Both match any whitespace character: spaces, newlines, tabs, etc.
\v
Matches vertical Unicode whitespace.
\h
Matches horizontal whitespace, including Unicode characters. It will also match spaces, tabs, non-breaking/mathematical/ideographic spaces.
x (eXtended flag)
Ignore all whitespace. Keep in mind that this is a flag, so you add it after the closing delimiter, like /hello/mx (PHP's preg functions have no g flag). With this flag, whitespace in your regular expression is ignored.
For example, if you write an expression like /hello world/x, it will match helloworld, but not hello world. The extended flag also allows comments in your regex.
Example
/helloworld #hello this is a comment/x
If you need to match a space while the x flag is on, escape it: "\ " (a backslash followed by a space).
To match exactly the space character, you can use the octal value \040 (Unicode characters displayed as octal) or the hexadecimal value \x20 (Unicode characters displayed as hex).
Here is the regex syntax reference: https://www.regular-expressions.info/nonprint.html.
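A few of the cheat sheet entries can be tried directly in PHP. This is a sketch with made-up sample strings:

```php
<?php
// Hypothetical sample: space + tab between a and b, newline between b and c.
$s = "a \tb\nc";

$byHorizontal = preg_split('/\h+/', $s);           // splits on space/tab only
$byAny        = preg_split('/\s+/', $s);           // splits on newline too
$byBlank      = preg_split('/[[:blank:]]+/', $s);  // same result as \h here

// With the x flag, unescaped whitespace in the pattern is ignored.
$x1 = preg_match('/hello world/x', 'helloworld');    // 1
$x2 = preg_match('/hello world/x', 'hello world');   // 0
$x3 = preg_match('/hello\ world/x', 'hello world');  // 1, escaped space is literal

// \x20 is the hexadecimal escape for the space character.
$x4 = preg_match('/hello\x20world/', 'hello world'); // 1

print_r($byHorizontal);
print_r($byAny);
```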
In Perl the switch is \s (whitespace).
I am using a regex to make sure that I only allow letters, number and a space
Then it is as simple as adding a space to what you've already got:
$newtag = preg_replace("/[^a-zA-Z0-9 ]/", "", $tag);
(note, I removed the s| which seemed unintentional? Certainly the s was redundant; you can restore the | if you need it)
If you specifically want *a* space, as in only a single one, you will need a more complex expression than this, and might want to consider a separate non-regex piece of logic.
It seems to me that using a regex in this case would just be overkill. Why not just use strpos to find the space character? Also, there's nothing special about the space character in regular expressions; you should be able to search for it the same way you would search for any other character. That is, unless you enabled the x flag, which makes the engine ignore whitespace in the pattern, and that would hardly be necessary in this case.
You can also use \b for a word boundary. Note that inside a character class \b means a backspace character, so [^\b] does not mean "not a word boundary". For the name I would use something like this:
\b\w+\b\W+\b\w+\b
EDIT Modifying this to be a regex in Perl example
if( $fullname =~ /\b(\w+)\b\W+\b(\w+)\b/ ) {
$first_name = $1;
$last_name = $2;
}
EDIT AGAIN Based on what you want:
$new_tag = preg_replace("/[\s\t]/","",$tag);
Use it like this to allow whitespace (\s matches spaces, tabs and newlines, not only a single space).
$newtag = preg_replace("/[^a-zA-Z0-9\s]/", "", $tag);
I'm trying out [[:space:]] in an instance where it looks like bloggers in WordPress are using non-standard space characters. It looks like it will work.
This matches tire sizes better because not all vendors use the same size format. I deal with many vendors, each writing sizes in a different format. This is my expression for now:
/^[\d][\d](?:\d)?(?:\-|\/|\s)?([?:\d]+)?(?:\.)?(?:\d)?(?:\d)?(?:R|-|\s)?[1-3]([?:[\d]+)?(?:\.)?([?:\d])?(?:\s|-)/img
will catch all
35-12.50-22 HAIDA[AA]
35-12-22 HAIDA[AA]
35/35R20
35/35r20
thus uis a test
rrrrr
awdg
3345588
225-45-17 ACCELERA[AC]
195 50 16 KELLY
1955016 KELLY
CP671"
158 Buckshot
165-40-16-ACHILLES
11-24.5-16-LEAO-LLA08
11-24.5-LEAO-D37
11-22.5-14-LINGLONG-LLD37
11-22.5-HAPPYROAD[AA]