tcl query for string pattern matching - php

I have a Tcl variable with the following contents:
m_hscaclbmbmmer_v11_2
m_letmmcbterbox_v1_2
m_osbbbbcmd_v16_0 v_proc_ss_v1_0
m_rmgbbb2cycrcb_v17_4
m_nscalbbbcer_v8_2
m_smpte2mbbc022_m12_rx_v2_3
m_smpte2m02cm2_12_tx_v2_2
m_smpte20m2mcm2_56_rx_v5_4
m_smpte202m2_c56_tx_v4_0
m_smpbbbte_sdbbci_v3_0
m_smpte_uhdmsdic_v1_2
m_tmmmc_v6_4
m_tpmmmg_v17_1
m_vcresamcpler_v1_2
m_vid_mminc_axi4s_v14_1
m_voip_femcc_rx_v11_3
m_voip_fmbmecc_tx_v1_2
m_vscalebcr_v1_4
m_ycrcb2bncrgb_v7_4
mid_phy_cnmcbmbontroller_v2_2
mibmmbbo_v3_4
miterbcbmbi_v9_4
madc_wicbmbz_v3_4
mambmbuic_v12_2 mvxfftc_v9_4
The last name, mvxfftc_v9_4, needs to be changed into microsemi.com:ip:mvxfft:9.4, and the same needs to be done for all the other names. How do I do that?

The problem is slightly under-specified, but I'm assuming that you are taking each name like this (marking the parts to be extracted):
mvxfftc_v9_4
^^^^^^   ^ ^
and transforming that to (marking the parts inserted):
microsemi.com:ip:mvxfft:9.4
                 ^^^^^^ ^ ^
That's not too hard with regsub, and since we're changing a well-formed word to a well-formed word, we can probably just use it directly on the variable as string processing without needing to map it over the list explicitly:
set changed [regsub -all {\y(\w+)\w_v(\d+)_(\d+)\y} $original {microsemi.com:ip:\1:\2.\3}]
# \y is a beginning-or-end-of-word anchor
# \w means any alphanumeric-or-underscore
# \d means any digit
# \1, \2 and \3 are substituted by the parenthesised matches
There's a limit to the complexity of mappings that you can do this way, but maybe this will be enough.
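For readers coming from the PHP tag, here is a minimal sketch of the same substitution with preg_replace. It assumes the same reading of the problem (drop the last character before _v and join the two version numbers with a dot); the function name toVlnv is made up for illustration, and \b stands in for Tcl's \y.

```php
<?php
// Hypothetical helper: applies the same pattern as the Tcl regsub above.
// PCRE has no \y, so \b (word boundary) is used instead.
function toVlnv(string $names): string
{
    return preg_replace(
        '/\b(\w+)\w_v(\d+)_(\d+)\b/',   // name, one dropped character, _v<major>_<minor>
        'microsemi.com:ip:$1:$2.$3',    // $1..$3 are the parenthesised matches
        $names
    );
}

echo toVlnv("mvxfftc_v9_4 m_tmmmc_v6_4"), "\n";
// microsemi.com:ip:mvxfft:9.4 microsemi.com:ip:m_tmmm:6.4
```

As in the Tcl version, the whole whitespace-separated string is processed in one call.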

Maybe you could try (assuming your list of names is in names):
foreach name $names {
    puts microsemi.com:ip:[regsub {_v(\d+)_(\d+)$} $name {:\1.\2}]
}
It's not pretty, but it does what you seem to be saying that you want done.
If you want a list instead of a printout, you could use this (Tcl 8.6 or later, leaves result in result):
set result [lmap name $names {
    lindex microsemi.com:ip:[regsub {_v(\d+)_(\d+)$} $name {:\1.\2}]
}]
or this (most versions):
set result {}
foreach name $names {
    lappend result microsemi.com:ip:[regsub {_v(\d+)_(\d+)$} $name {:\1.\2}]
}
Documentation: foreach, lappend, lindex, lmap, lmap replacement, puts, regsub, set
On regular expression writing:
http://tcl-lang.org/man/tcl8.5/tutorial/Tcl20.html (and the two following)
http://wiki.tcl.tk/_//search?submit=Search&S=regular+expression&charset=UTF-8
There are also a few sites dedicated to regular expressions for various programming languages: a Google search should help you.

Related

Unique regex for name validation

I want to check whether a name is valid using a regex in PHP, but I need a single regex that allows:
Letters (upper and lowercase)
Spaces (max 2)
But a space can't directly follow another space.
For example:
Name -> Dennis Unge Shishic (valid)
Name -> Denis(space)(space) (not valid)
Hope you guys understand me, thank you :)
First, it's worth mentioning that having such restrictive rules for the names of persons is a very bad idea. However, if you must, a simple character class like this will limit you to just uppercase and lowercase English letters:
[A-Za-z]
To match one or more, you need to add a + after it. So, this will match the first part of the name:
[A-Za-z]+
To capture a second name, you just need to do the same thing preceded by a space, so something like this will capture two names:
[A-Za-z]+ [A-Za-z]+
To make the second name optional, you need to surround it by parentheses and add a ? after it, like this:
[A-Za-z]+( [A-Za-z]+)?
And to add a third name, you just need to do it again:
[A-Za-z]+( [A-Za-z]+)? [A-Za-z]+
Or, you could specify that the latter names can repeat between 1 and 2 times, like this:
[A-Za-z]+( [A-Za-z]+){1,2}
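The patterns above are shown without anchors, so on their own they would also match inside a longer string. A minimal sketch of how the last one might be used for validation in PHP, assuming the whole input must match; the function name is made up for illustration:

```php
<?php
// Hypothetical helper: ^ and $ force the whole string to match,
// and the D modifier stops $ from accepting a trailing newline.
function matchesNamePattern(string $name): bool
{
    return preg_match('/^[A-Za-z]+( [A-Za-z]+){1,2}$/D', $name) === 1;
}

var_dump(matchesNamePattern('Dennis Unge Shishic')); // bool(true)
var_dump(matchesNamePattern('Denis  '));             // bool(false): trailing spaces
var_dump(matchesNamePattern('Dennis  Unge'));        // bool(false): two adjacent spaces
```

Note that {1,2} requires at least two words; {0,2} would also accept a single word.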
To make the resulting code easy to understand and maintain, you could use two regexes: one checking (by requiring it to match) that only the allowed characters are used, ^[a-zA-Z ]+$, and another checking (by requiring it not to match) that there are no two or more adjacent spaces, ( ){2,}
Try the following code.
Change $input_line to whatever you want to test and the validation result is printed:
<?php
$input_line = "Abhishek Gupta";
// Anchored with ^ and $ so the whole string must be letters and spaces
preg_match("/^[a-zA-Z ]+$/", $input_line, $nameMatch);
// Two or more whitespace characters in a row are not allowed
preg_match("/\s{2,}/", $input_line, $multiSpace);
// Debug output: shows what each pattern matched
var_dump($nameMatch);
var_dump($multiSpace);
if (count($nameMatch) > 0) {
    if (count($multiSpace) > 0) {
        echo "Invalid Name Multispace";
    } else {
        echo "Valid Name";
    }
} else {
    echo "Invalid Name";
}
?>
A regex for two or three words consisting of only Unicode letters in PHP looks like this (change {1,2} to {0,2} to accept a single word as well):
/^\p{L}+(?:\h\p{L}+){1,2}\z/u
Description:
^ - string start
\p{L}+ - one or more Unicode letters
(?:\h\p{L}+){1,2} - one or two sequences of a horizontal whitespace followed with one or more Unicode letters
\z - end of string, even disallowing trailing newline that a dollar anchor allows.
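A short sketch of that pattern in use, assuming UTF-8 input; the function name is hypothetical. As written, {1,2} means a single word on its own is rejected.

```php
<?php
// Hypothetical helper wrapping the Unicode pattern above.
// The u modifier makes PCRE treat pattern and subject as UTF-8.
function isTwoOrThreeWordName(string $name): bool
{
    return preg_match('/^\p{L}+(?:\h\p{L}+){1,2}\z/u', $name) === 1;
}

var_dump(isTwoOrThreeWordName('Dennis Unge Shishic')); // bool(true)
var_dump(isTwoOrThreeWordName('José García'));         // bool(true): accented letters are \p{L}
var_dump(isTwoOrThreeWordName('Dennis  Unge'));        // bool(false): two adjacent spaces
var_dump(isTwoOrThreeWordName("Dennis Unge\n"));       // bool(false): \z rejects the newline
```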

Delete multiple file for/while

I have a PHP pull-down from which I select an item and delete all files associated with it.
It works well when there are only 5 or 6 items. After I put in the first 4 to test and got it working, I realized it could take a very long time to enter a couple of hundred, and it would bloat the script.
I don't know enough about for and while loops; is there anyone who might have a way to help?
There will never be more than one set deleted at a time.
Thanks in advance.
<?php
$workitem = $_POST["workitem"];
$workdirPath = "/var/work.files/";
if($workitem == 'item1.php')
{
    unlink("$workdirPath/page1.php");
    unlink("$workdirPath/temp1.php");
    unlink("$workdirPath/all1.php");
}
if($workitem == 'item2.php')
{
    unlink("$workdirPath/page2.php");
    unlink("$workdirPath/temp2.php");
    unlink("$workdirPath/all2.php");
}
if($workitem == 'item3.php')
{
    unlink("$workdirPath/page3.php");
    unlink("$workdirPath/temp3.php");
    unlink("$workdirPath/all3.php");
}
if($workitem == 'item4.php')
{
    unlink("$workdirPath/page4.php");
    unlink("$workdirPath/temp4.php");
    unlink("$workdirPath/all4.php");
}
?>
Some simple pattern matching and substitution is all you need here.
First, the code:
if (preg_match('/^item(\d+)\.php$/', $workitem, $matches)) {
    $number = $matches[1];
    foreach(array('page','temp','all') as $base) {
        unlink("$workdirPath/$base$number.php");
    }
} else {
    # unrecognized work item value; complain to user or whatever
}
The preg_match function takes a pattern, a string, and an array. If the string matches the pattern, the parts that match are stored in the array. The particular type of pattern is a Perl-compatible regular expression, which is where the preg_ part of the name comes from.
Regular expressions are scary-looking to the uninitiated, but they're a handy way to scan a string and get some values out of it. Most characters just represent themselves; the string "foo" matches the regular expression /foo/. But some characters have special meanings that let you make more general patterns to match a whole set of strings where you don't have to know ahead of time exactly what's in them.
The /s just mark the beginning and end of the actual regular expression; they're there because you can stick additional modifier flags inside the string along with the expression itself.
The ^ and $ represent the beginning and end of the string. /foo/ matches "foo", but also "foobar", "bunnyfoofoo", and so on - any string that contains "foo" will match. But /^foo$/ matches only "foo" exactly.
\d means "any digit". + means "one or more of that last thing". So \d+ means "one or more digits".
The period (.) is special; it matches any character at all. Since we want a literal period, we have to escape it with a backslash; \. just matches a period.
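A few quick checks illustrate the points so far (anchors, \d+ and the escaped period). The sample strings are made up for illustration; preg_match returns 1 on a match and 0 otherwise.

```php
<?php
// Unanchored: matches anywhere inside the string.
var_dump(preg_match('/foo/', 'bunnyfoofoo'));            // int(1)
// Anchored: the whole string must be exactly "foo".
var_dump(preg_match('/^foo$/', 'foobar'));               // int(0)
// Unescaped period matches any character, here the "x".
var_dump(preg_match('/^item\d+.php$/', 'item7xphp'));    // int(1)
// Escaped period only matches a literal ".".
var_dump(preg_match('/^item\d+\.php$/', 'item7xphp'));   // int(0)
var_dump(preg_match('/^item\d+\.php$/', 'item42.php'));  // int(1)
```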
So our regular expression is '/^item\d+\.php$/', which will match any itemnumber.php filename. But that's not quite enough. The preg_match function is basically a binary test: does the string match the pattern or not, yes or no? In this case, it's not enough to just say "yup, the string is valid"; we need to know which items specifically the user specified. That's what capture groups are for. We use parentheses to say "remember what matched this part", and provide an array name that gets filled with those remembrances.
The part of the string that matches the whole regular expression (which may not be the whole string, if the regular expression isn't anchored with ^...$ like this one is) is always put in element 0 of the array. If you use parentheses in the regular expression, then the part of the string that matches the part of the regular expression inside the first pair of parentheses is stored in element 1 of the array; if there's a second set of parentheses, the matching part of the string goes in element 2 of the array, and so on.
So we put parentheses around our number ((\d+)) and then the actual number will be remembered in element 1 of our $matches array.
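To see what ends up in the array, here is a small sketch with a made-up input value:

```php
<?php
$matches = [];
if (preg_match('/^item(\d+)\.php$/', 'item42.php', $matches)) {
    echo $matches[0], "\n"; // item42.php (the whole match)
    echo $matches[1], "\n"; // 42 (what the first pair of parentheses matched)
}
```

Note that the captured number is a string, which is what we want for building filenames.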
Great, we have a number. Now we just need to use it to build up the filenames we want to delete.
In each case, we want to delete three files: page$n.php, temp$n.php, and all$n.php, where $n is the number we extracted above. We could just put three unlink calls, but since they're all so similar, we can use a loop instead.
Take the different prefixes that are the same no matter the number, and make an array out of them. Then loop over that array. In the body of the loop, the variable $base will contain whichever element of the array it's currently on. Stick that between the $workdirPath prefix and the $number we got from the match, append .php, and that's your file. unlink it and go back to the top of the loop to grab the next one.
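If you want to check which files would be affected before deleting anything, the same logic can be sketched as a function that only builds the list of paths. The function name filesForWorkItem and its signature are made up for illustration; it does not call unlink itself.

```php
<?php
// Hypothetical helper: returns the paths belonging to a work item,
// or an empty array if the value does not look like item<number>.php.
function filesForWorkItem(string $workitem, string $workdirPath): array
{
    if (!preg_match('/^item(\d+)\.php$/', $workitem, $matches)) {
        return [];
    }
    $number = $matches[1];
    $dir = rtrim($workdirPath, '/'); // avoid a double slash in the result
    $files = [];
    foreach (['page', 'temp', 'all'] as $base) {
        $files[] = $dir . '/' . $base . $number . '.php';
    }
    return $files;
}

print_r(filesForWorkItem('item12.php', '/var/work.files/'));
```

Each returned path could then be passed to unlink. Because the pattern is anchored and only accepts digits, a value such as ../etc/passwd yields an empty list.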

"OR" operator in RegEx syntax

OK, I've worked with RegEx numerous times but this is one of the things I honestly can't get my head around. And it looks as if I'm missing something rather simple...
So, let's say we want to match "AB" or "AC". In other words, "A" followed by either "B" OR "C".
This would be expressed like A[BC] or A[B|C] or A(B|C) and so on.
Now, what if A,B,C are not just single letters but sub-expressions?
Please, have a look at this example here (well, I admit it doesn't look that... simple! lol) : http://regexr.com?382a4
I'm trying to match capital = (and its variations) followed by either :
Pattern 1
Pattern 2
Why is it that using the | operator only works on the latter part (my regex also matches "Pattern 2" withOUT preceding capital =). Please note that I've also tried using positive look-arounds, but without any success.
Any ideas?
Your original regex could be summarized as:
capital = (ABC)|(DEF)
This matches either capital = ABC or DEF on its own, because | splits the entire pattern. Add an extra pair of () that wraps the | clause properly.
I suppose this regexp:
capital = (ABC|XYZ)
should work (if I did correctly understand your request...)
Actually [B|C] is incorrect, (B|C) is correct.
Character classes
In RegEx jargon [] is called a character class and it is used to represent one (single) character according to the options listed between the brackets.
In your case [B|C] matches either B or | or C. We can correct this by using [BC] to match either B or C. This matches exactly one character either B or C.
Capturing groups
In RegEx jargon () is called a capturing group. It is used to create boundaries between adjacent groups, and whatever it matches will be present in the output array of preg_match or available as a backreference in preg_replace.
Within that group you can use the | operator to specify that you want to match either whatever's before or whatever's after the operator.
This can be used to match strings with more than one character such as (Ana|Maria) or various structures such as ([a-zA-Z]+|[0-9]+).
You can also use the | outside of a capturing group such as (group-1)|(group-2) and you can also use subgrouping such as ((group-1)|(group-2)).
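The difference is easy to check in PHP. In this sketch ABC and DEF are placeholders for the real sub-expressions, as in the answers above:

```php
<?php
$loose = '/capital = (ABC)|(DEF)/'; // | splits the WHOLE pattern in two
$tight = '/capital = (ABC|DEF)/';   // | is confined to the group

var_dump(preg_match($loose, 'DEF'));            // int(1): "DEF" alone matches
var_dump(preg_match($tight, 'DEF'));            // int(0): "capital = " is required
var_dump(preg_match($tight, 'capital = DEF'));  // int(1)

// Inside a character class, | is just a literal character.
var_dump(preg_match('/^A[B|C]$/', 'A|'));       // int(1)
var_dump(preg_match('/^A(B|C)$/', 'A|'));       // int(0)
```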

How to check different spellings of a person's full name

I'm trying to create a regular expression that searches a huge document for a person's full name. In the text the name can be written in full, or the first names can be abbreviated to a single letter, abbreviated to a letter followed by a dot, or omitted. For instance, my search for ALBERTO JORGE ALONSO CALEFACCION is currently:
preg_match('/([;:.,&\s\xc2\-(){}!"\'<>]{1})(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)([;:.,&\s\xc2(){}!"\'<>]{1})/i', $text, $match);
Between the first names and last names an asterisk (*) can be present.
This works when all first names are present in some form. But I don't know how to extend the expression for the case where first names are omitted. Can you help me?
Let's start by simplifying what you have;
start:
/([;:.,&\s\xc2\-(){}!"'<>]{1})(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)([;:.,&\s\xc2(){}!"'<>]{1})/i
as I said in my comment, \b is "word break", so you can simplify a lot of that:
/\b(ALBERTO|A.|A)[\s\xc2-]+(JORGE|J.|J)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)\b/i
(added bonus: it won't match the characters either side now, and it will match at the start and end of the text)
Next, you can use the ? token for the dots (which should be escaped by the way; . is special and means "match anything")
/\b(ALBERTO|A\.?)[\s\xc2-]+(JORGE|J\.?)?[\s\xc2,]+(ALONSO)[\s\xc2*-]+(CALEFACCION)\b/i
Finally, to actually answer your question, you have 2 choices. Either make the entire bracketed name optional, or add a new blank option. The first is the most flexible, since we'll need to cope with the whitespace too:
/\b((ALBERTO|A\.?)[\s\xc2-]+((JORGE|J\.?)[\s\xc2,]+)?)?(ALONSO)[\s\xc2*-]+(CALEFACCION)\b/i
Note that if you're reading the matched parts you'll need to update your indices. Also note that this fixed an issue where omitting the second name (JORGE) still required an extra space.
This will match things like A. J. ALONSO CALEFACCION, A. ALONSO CALEFACCION and ALONSO CALEFACCION, but not J. ALONSO CALEFACCION (it's only a small tweak if you do want that)
Breaking up that final string for clarity:
/\b
  (
    (ALBERTO|A\.?)[\s\xc2-]+
    (
      (JORGE|J\.?)[\s\xc2,]+
    )?
  )?
  (ALONSO)[\s\xc2*-]+
  (CALEFACCION)
\b/i
Finally, it's an odd thought, but you could change the names which can be initials to be in this form: (A(LBERTO|\.|)), which means you're not repeating the initials (a potential source of mistakes)
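A small sketch for trying the final pattern against a few spellings. The helper name findName and the sample strings are made up for illustration. Because the pattern is not anchored, an input such as J. ALONSO CALEFACCION still produces a match, but only for the surname part.

```php
<?php
const NAME_PATTERN = '/\b((ALBERTO|A\.?)[\s\xc2-]+((JORGE|J\.?)[\s\xc2,]+)?)?(ALONSO)[\s\xc2*-]+(CALEFACCION)\b/i';

// Hypothetical helper: returns the matched text, or null if nothing matched.
function findName(string $text): ?string
{
    return preg_match(NAME_PATTERN, $text, $match) === 1 ? $match[0] : null;
}

var_dump(findName('A. J. ALONSO CALEFACCION'));         // whole name
var_dump(findName('see A ALONSO CALEFACCION, page 3')); // "A ALONSO CALEFACCION"
var_dump(findName('J. ALONSO CALEFACCION'));            // "ALONSO CALEFACCION" only
var_dump(findName('ALONSO GARCIA'));                    // NULL
```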

RegEx with character set inside positive lookbehind, Is it possible?

I need to match "name" only after "listing", but of course those words could be any url directory or page.
mydomain.com/listing/name
so the only thing I can require is that some parent directory is there.
In other words, I want to match the "position" i.e. whatever comes 2nd after the domain.
I'm trying something like
(?<=mydomain\.com/[^/\?&]+/)[^/\?&]+(?:/)?
But the character set won't work inside the positive lookbehind; at least, it's set up to match only ONE character. As soon as I try to match more than one (e.g. quantify it with +, ? or *) it just stops working.
I'm obviously missing the positive lookbehind syntax and it seems not intended for what I'm trying.
How can I match that 2nd level filename?
Thanks.
Regular-expressions.info states that
The bad news is that most regex flavors do not allow you to use just
any regex inside a lookbehind, because they cannot apply a regular
expression backwards. Therefore, the regular expression engine needs
to be able to figure out how many steps to step back before checking
the lookbehind...
(Read further, they even mention Perl, Python and Java.)
I think the quantifier might be the problem. I found this on Stack Overflow and skimmed it.
Wouldn't it be possible to just match the whole path, and use a group for the second level filename:
mydomain\.com\/[^\/\?&]+\/([^\/\?&]+)(?:\/)?
(note: I had to escape the / for my tests...)
The result of this would be something like:
Array
(
    [0] => mydomain.com/listing/name
    [1] => name
)
Now, because I don't know the context of your problem, I just assumed you would be able to postprocess the results and get the group 1 (index 1) from the result. If not, I unfortunately don't know...
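In PHP, that post-processing step could be sketched like this. The function name secondLevelName is made up for illustration; the pattern is the one shown above.

```php
<?php
// Hypothetical helper: returns the second path segment after the domain,
// or null if the URL has no such segment.
function secondLevelName(string $url): ?string
{
    $pattern = '/mydomain\.com\/[^\/\?&]+\/([^\/\?&]+)(?:\/)?/';
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

var_dump(secondLevelName('mydomain.com/listing/name'));            // string(4) "name"
var_dump(secondLevelName('http://mydomain.com/shop/item42/?x=1')); // string(6) "item42"
var_dump(secondLevelName('mydomain.com/listing'));                 // NULL
```

This avoids the lookbehind entirely: the prefix is matched normally and only group 1 is used.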
