i have this function in php :
$html_price = "text text 12 eur or $ 22,01 text text";
preg_match_all('/(?<=|^)(?:[0-9]{1,3}(?:,| ?[0-9]{3})*(?:.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9] *)?)(?:\ |)(?:\$|usd|eur)+(?=|$)|(?:\$|usd|eur)(?:| )(?:[0-9]{1,3}(?:,| ?[0-9]{3})*(?:.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9] *)?)/', strtolower($html_price), $price_array1);
print_r($price_array1);
Now i want use in javascript the same regex but i have an error:
SyntaxError: invalid quantifier
my javascript :
maxime_string = "((?<=|^)(?:[0-9]{1,3}(?:,| ?[0-9]{3})*(?:.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9] *)?)(?:\ |)(?:\$|usd|eur|euro|euros|firm|obro|€|£|gbp|dollar|aud|cdn|sgd|€)+(?=|$)|(?:\$|usd|eur|euro|euros|firm|obro|€|£|gbp|dollar|aud|cdn|sgd|€)(?:| )(?:[0-9]{1,3}(?:,| ?[0-9]{3})*(?:.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9] *)?))";
maxime_regex = new RegExp(maxime_string);
var text = "text text 12 eur or $ 22,01 text text";
var result = text.match(maxime_regex);
var max = result.length;
alert(result.length);
for (i = 0; i < max; i++)
{
document.write(result[i] + "<br/>");
}
Can you help me ?
JavaScript does not support lookbehind - that's probably where you got your error from. Some things:
(?<=|^) makes no sense. It's a lookbehind demanding either nothing or string begin to come before this - it's always true. And if you want to match the string start, use a single ^.
(?:.[0-9]*)? - I guess you wanted to escape the dot so it matches literally
(?:\ |): You do not need to escape blanks. And instead of an alternative "nothing", you should just make the blank optional: " ?".
(?=|$) makes as much sense as the first group.
€ is a broken unicode character?
Also, you should consider using a regex literal instead of the RegExp constructor with a string literal - it eats one level of backslash escaping. And it seems you're missing the global flag, so that you get back all matches. Try this:
var maxime_regex = /((?:[0-9]{1,3}(?:,| ?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9] *)?) ?(?:\$|usd|eur|euro|euros|firm|obro|€|£|gbp|dollar|aud|cdn|sgd|€)+|(?:\$|usd|eur|euro|euros|firm|obro|€|£|gbp|dollar|aud|cdn|sgd|€) ?(?:[0-9]{1,3}(?:,| ?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9] *)?))/g
This post does not include any effort to understand, optimize or shorten that regex monster
Related
What is the regular expression (in JavaScript if it matters) to only match if the text is an exact match? That is, there should be no extra characters at other end of the string.
For example, if I'm trying to match for abc, then 1abc1, 1abc, and abc1 would not match.
Use the start and end delimiters: ^abc$
It depends. You could
string.match(/^abc$/)
But that would not match the following string: 'the first 3 letters of the alphabet are abc. not abc123'
I think you would want to use \b (word boundaries):
var str = 'the first 3 letters of the alphabet are abc. not abc123';
var pat = /\b(abc)\b/g;
console.log(str.match(pat));
Live example: http://jsfiddle.net/uu5VJ/
If the former solution works for you, I would advise against using it.
That means you may have something like the following:
var strs = ['abc', 'abc1', 'abc2']
for (var i = 0; i < strs.length; i++) {
if (strs[i] == 'abc') {
//do something
}
else {
//do something else
}
}
While you could use
if (str[i].match(/^abc$/g)) {
//do something
}
It would be considerably more resource-intensive. For me, a general rule of thumb is for a simple string comparison use a conditional expression, for a more dynamic pattern use a regular expression.
More on JavaScript regexes: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
"^" For the begining of the line "$" for the end of it. Eg.:
var re = /^abc$/;
Would match "abc" but not "1abc" or "abc1". You can learn more at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
I need to erase all comments in $string which contains data from some C file.
The thing I need to replace looks like this:
something before that shouldnt be replaced
/*
* some text in between with / or * on many lines
*/
something after that shouldnt be replaced
and the result should look like this:
something before that shouldnt be replaced
something after that shouldnt be replaced
I have tried many regular expressions but neither work the way I need.
Here are some latest ones:
$string = preg_replace("/\/\*(.*?)\*\//u", "", $string);
and
$string = preg_replace("/\/\*[^\*\/]*\*\//u", "", $string);
Note: the text is in UTF-8, the string can contain multibyte characters.
You would also want to add the s modifier to tell the regex that .* should include newlines. I always think of s to mean "treat the input text as a single line"
So something like this should work:
$string = preg_replace("/\\/\\*(.*?)\\*\\//us", "", $string);
Example: http://codepad.viper-7.com/XVo9Tp
Edit: Added extra escape slashes to the regex as Brandin suggested because he is right.
I don't think regexp fit good here. What about wrote a very small parse to remove this? I don't do PHP coding for a long time. So, I will try to just give you the idea (simple alogorithm) I haven't tested this, it's just to you get the idea, as I said:
buf = new String() // hold the source code without comments
pos = 0
while(string[pos] != EOF) {
if(string[pos] == '/') {
pos++;
while(string[pos] != EOF)
{
if(string[pos] == '*' && string[pos + 1] == '/') {
pos++;
break;
}
pos++;
}
}
buf[buf_index++] = string[pos++];
}
where:
string is the C source code
buf a dynamic allocated string which expands as needed
It is very hard to do this perfectly without ending up writing a full C parser.
Consider the following, for example:
// Not using /*-style comment here.
// This line has an odd number of " characters.
while (1) {
printf("Wheee!
(*\/*)
\\// - I'm an ant!
");
/* This is a multiline comment with a // in, and
// an odd number of " characters. */
}
So, from the above, we can see that our problems include:
multiline quote sequences should be ignored within doublequotes. Unless those doublequotes are part of a comment.
single-line comment sequences can be contained in double-quoted strings, and in multiline strings.
Here's one possibility to address some of those issues, but far from perfect.
// Remove "-strings, //-comments and /*block-comments*/, then restore "-strings.
// Based on regex by mauke of Efnet's #regex.
$file = preg_replace('{("[^"]*")|//[^\n]*|(/\*.*?\*/)}s', '\1', $file);
try this:
$string = preg_replace("#\/\*\n?(.*)\*\/\n?#ms", "", $string);
Use # as regexp boundaries; change that u modifier with the right ones: m (PCRE_MULTILINE) and s (PCRE_DOTALL).
Reference: http://php.net/manual/en/reference.pcre.pattern.modifiers.php
It is important to note that my regexp does not find more than one "comment block"... Use of "dot match all" is generally not a good idea.
I need a regex to match a string not enclosed by another different, specific string. For instance, in the following situation it would split the content into two groups: 1) The content before the second {Switch} and 2) The content after the second {Switch}. It wouldn't match the first {Switch} because it is enclosed by {my_string}'s. The string will always look like shown below (i.e. {my_string}any content here{/my_string})
Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too
{Switch}
More content
So far I've gotten what is below which I know isn't very close at all:
(.*?)\{Switch\}(.*?)
I'm just not sure how to use the [^] (not operator) with a specific string versus different characters.
It really seems you're trying to use a regular expression to parse a grammar - something that regular expressions are really bad at doing. You might be better off writing a parser to break down your string into the tokens that build it, and then processing that tree.
Perhaps something like http://drupal.org/project/grammar_parser might help.
Try this simple function:
function find_content()
function find_content($doc) {
$temp = $doc;
preg_match_all('~{my_string}.*?{/my_string}~is', $temp, $x);
$i = 0;
while (isset($x[0][$i])) {
$temp = str_replace($x[0][$i], "{REPL:$i}", $temp);
$i++;
}
$res = explode('{Switch}', $temp);
foreach ($res as &$part)
foreach($x[0] as $id=>$content)
$part = str_replace("{REPL:$id}", $content, $part);
return $res;
}
Use it this way
$content_parts = find_content($doc); // $doc is your input document
print_r($content_parts);
Output (your example)
Array
(
[0] => Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too
[1] =>
More content
)
You can try positive lookahead and lookbehind assertions (http://www.regular-expressions.info/lookaround.html)
It might look something like this:
$content = 'string of text before some random content switch text some more random content string of text after';
$before = preg_quote('String of text before');
$switch = preg_quote('switch text');
$after = preg_quote('string of text after');
if( preg_match('/(?<=' $before .')(.*)(?:' $switch .')?(.*)(?=' $after .')/', $content, $matches) ) {
// $matches[1] == ' some random content '
// $matches[2] == ' some more random content '
}
$regex = (?:(?!\{my_string\})(.*?))(\{Switch\})(?:(.*?)(?!\{my_string\}));
/* if "my_string" and "Switch" aren't wrapped by "{" and "}" just remove "\{" and "\}" */
$yourNewString = preg_replace($regex,"$1",$yourOriginalString);
This might work. Can't test it know, but i'll update later!
I don't if this is what you're looking for, but to negate more than one character, the regex syntax is:
(?!yourString)
and it is called "negative lookahead assertion".
/Edit:
This should work and return true:
$stringMatchesYourRulesBoolean = preg_match('~(.*?)('.$my_string.')(.*?)(?<!'.$my_string.') ?('.$switch.') ?(?!'.$my_string.')(.*?)('.$my_string.')(.*?)~',$yourString);
Have a look at PHP PEG. It is a little parser written in PHP. You can write your own grammar and parse it. It's going to be very simple in your case.
The grammar syntax and the way of parsing is all explained in the README.md
Extracts from the readme:
token* - Token is optionally repeated
token+ - Token is repeated at least one
token? - Token is optionally present
Tokens may be :
- bare-words, which are recursive matchers - references to token rules defined elsewhere in the grammar,
- literals, surrounded by `"` or `'` quote pairs. No escaping support is provided in literals.
- regexs, surrounded by `/` pairs.
- expressions - single words (match \w+)
Sample grammar: (file EqualRepeat.peg.inc)
class EqualRepeat extends Packrat {
/* Any number of a followed by the same number of b and the same number of c characters
* aabbcc - good
* aaabbbccc - good
* aabbc - bad
* aabbacc - bad
*/
/*Parser:Grammar1
A: "a" A? "b"
B: "b" B? "c"
T: !"b"
X: &(A !"b") "a"+ B !("a" | "b" | "c")
*/
}
I have a database of more than 70000 records and its primary key value started from 1 single digit.
So I want user have to type nm0000001 instead of 1 in url.And in code part I have to discard the rest of the value except 1.
But my problem is i want this type of things having 9 letters in the string and the pattern is like this
1 - nm0000001
9 - nm0000009
10 - nm0000010
2020 - nm0002020
And from the above pattern i want only the digits like 1,9,10,2020 in php.
Here:
$i = (int)substr($input, 2);
No reason to use regexes at all.
Anyway, if you're insisting on using regexp, then:
$input = 'nm0002020';
preg_match('~0*(\d+)$~', $input, $matches);
var_dump($matches[1]);
Assuming the value is received in the URL as a querystring parameter, that is, passed via $_GET['id'] or some other name than id:
// Trim the "nm" off the front
$pk = substr($_GET['id'],2);
// And parse out an integer value.
$id = intval($pk);
There's absolutely no use for regular expressions in this -- use sprintf("nm%07d", ...) to format and just substr and a cast to int to parse.
This function will do the trick:
function extractID($pInput)
{
$Matches = array();
preg_match('/^nm0*(.*)$/', $pInput, $Matches);
return intval($Matches[1]);
}
Here's why /^nm0+(.*)$/ works:
The line must start (^) with exactly one nm
The pattern must continue with at least one 0
At the first non-zero character after nm0..., capture the value (that's the job of the parentheses)
Continue until the end of the line ($)
So I'm working on a project that will allow users to enter poker hand histories from sites like PokerStars and then display the hand to them.
It seems that regex would be a great tool for this, however I rank my regex knowledge at "slim to none".
So I'm using PHP and looping through this block of text line by line and on lines like this:
Seat 1: fabulous29 (835 in chips)
Seat 2: Nioreh_21 (6465 in chips)
Seat 3: Big Loads (3465 in chips)
Seat 4: Sauchie (2060 in chips)
I want to extract seat number, name, & chip count so the format is
Seat [number]: [letters&numbers&characters] ([number] in chips)
I have NO IDEA where to start or what commands I should even be using to optimize this.
Any advice is greatly appreciated - even if it is just a link to a tutorial on PHP regex or the name of the command(s) I should be using.
I'm not entirely sure what exactly to use for that without trying it, but a great tool I use all the time to validate my RegEx is RegExr which gives a great flash interface for trying out your regex, including real time matching and a library of predefined snippets to use. Definitely a great time saver :)
Something like this might do the trick:
/Seat (\d+): ([^\(]+) \((\d+)in chips\)/
And some basic explanation on how Regex works:
\d = digit.
\<character> = escapes character, if not part of any character class or subexpression. for example:
\t
would render a tab, while \\t would render "\t" (since the backslash is escaped).
+ = one or more of the preceding element.
* = zero or more of the preceding element.
[ ] = bracket expression. Matches any of the characters within the bracket. Also works with ranges (ex. A-Z).
[^ ] = Matches any character that is NOT within the bracket.
( ) = Marked subexpression. The data matched within this can be recalled later.
Anyway, I chose to use
([^\(]+)
since the example provides a name containing spaces (Seat 3 in the example). what this does is that it matches any character up to the point that it encounters an opening paranthesis.
This will leave you with a blank space at the end of the subexpression (using the data provided in the example). However, his can easily be stripped away using the trim() command in PHP.
If you do not want to match spaces, only alphanumerical characters, you could so something like this:
([A-Za-z0-9-_]+)
Which would match any letter (within A-Z, both upper- & lower-case), number as well as hyphens and underscores.
Or the same variant, with spaces:
([A-Za-z0-9-_\s]+)
Where "\s" is evaluated into a space.
Hope this helps :)
Look at the PCRE section in the PHP Manual. Also, http://www.regular-expressions.info/ is a great site for learning regex. Disclaimer: Regex is very addictive once you learn it.
I always use the preg_ set of function for REGEX in PHP because the PERL-compatible expressions have much more capability. That extra capability doesn't necessarily come into play here, but they are also supposed to be faster, so why not use them anyway, right?
For an expression, try this:
/Seat (\d+): ([^ ]+) \((\d+)/
You can use preg_match() on each line, storing the results in an array. You can then get at those results and manipulate them as you like.
EDIT:
Btw, you could also run preg_match_all on the entire block of text (instead of looping through line-by-line) and get the results that way, too.
Check out preg_match.
Probably looking for something like...
<?php
$str = 'Seat 1: fabulous29 (835 in chips)';
preg_match('/Seat (?<seatNo>\d+): (?<name>\w+) \((?<chipCnt>\d+) in chips\)/', $str, $matches);
print_r($matches);
?>
*It's been a while since I did php, so this could be a little or a lot off.*
May be it is very late answer, But I am interested in answering
Seat\s(\d):\s([\w\s]+)\s\((\d+).*\)
http://regex101.com/r/cU7yD7/1
Here's what I'm currently using:
preg_match("/(Seat \d+: [A-Za-z0-9 _-]+) \((\d+) in chips\)/",$line)
To process the whole input string at once, use preg_match_all()
preg_match_all('/Seat (\d+): \w+ \((\d+) in chips\)/', $preg_match_all, $matches);
For your input string, var_dump of $matches will look like this:
array
0 =>
array
0 => string 'Seat 1: fabulous29 (835 in chips)' (length=33)
1 => string 'Seat 2: Nioreh_21 (6465 in chips)' (length=33)
2 => string 'Seat 4: Sauchie (2060 in chips)' (length=31)
1 =>
array
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '4' (length=1)
2 =>
array
0 => string '835' (length=3)
1 => string '6465' (length=4)
2 => string '2060' (length=4)
On learning regex: Get Mastering Regular Expressions, 3rd Edition. Nothing else comes close to the this book if you really want to learn regex. Despite being the definitive guide to regex, the book is very beginner friendly.
Try this code. It works for me
Let say that you have below lines of strings
$string1 = "Seat 1: fabulous29 (835 in chips)";
$string2 = "Seat 2: Nioreh_21 (6465 in chips)";
$string3 = "Seat 3: Big Loads (3465 in chips)";
$string4 = "Seat 4: Sauchie (2060 in chips)";
Add to array
$lines = array($string1,$string2,$string3,$string4);
foreach($lines as $line )
{
$seatArray = explode(":", $line);
$seat = explode(" ",$seatArray[0]);
$seatNumber = $seat[1];
$usernameArray = explode("(",$seatArray[1]);
$username = trim($usernameArray[0]);
$chipArray = explode(" ",$usernameArray[1]);
$chipNumber = $chipArray[0];
echo "<br>"."Seat [".$seatNumber."]: [". $username."] ([".$chipNumber."] in chips)";
}
you'll have to split the file by linebreaks,
then loop thru each line and apply the following logic
$seat = 0;
$name = 1;
$chips = 2;
foreach( $string in $file ) {
if (preg_match("Seat ([1-0]): ([A-Za-z_0-9]*) \(([1-0]*) in chips\)", $string, $matches)) {
echo "Seat: " . $matches[$seat] . "<br>";
echo "Name: " . $matches[$name] . "<br>";
echo "Chips: " . $matches[$chips] . "<br>";
}
}
I haven't ran this code, so you may have to fix some errors...
Seat [number]: [letters&numbers&characters] ([number] in chips)
Your Regex should look something like this
Seat (\d+): ([a-zA-Z0-9]+) \((\d+) in chips\)
The brackets will let you capture the seat number, name and number of chips in groups.