Regular Expressions - preg or substr? - php

I have several values (mostly strings) to process, and I am wondering which approach is best in my case.
I will have a foreach loop in which I want to check each value and insert the edited value into the database.
Structure example:
foreach($values as $value)
{
    //string check is going to be here
    // .....
    //insert the data into the database
    $sql = "INSERT INTO results VALUES ('', 'xx', 'xx', 'xx', 'xx')";
    $result = mysql_query($sql);
}
Example of values I might get:
value, value
Somewhere in the universe
3.532523, -55.523525
value - value
What I want to do is not accept (actually change their value to 'Not given')
a) numbers
b) strings longer than 10 characters
c) no spaces between the words
d) if there is a , or - only keep the first part of the string before these
One string check I am testing, for example: if I have a value like this
=> string, string
I only want to keep the first part, which is done by
$str = 'value, value';
$str = substr($str, 0, stripos($str, ','));
Which technique will let me do all these checks better: preg_match and preg_replace, or substr and stripos?

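This regex may match what you want to keep, but it will allow digits and _ (the bounds are written out as {1,10} because older PCRE versions treat {,10} as literal text rather than a quantifier):
/^\w{1,10}/
To reject purely numeric values and exclude _, add a lookahead that requires at least one letter:
/^(?=\d{0,9}[a-zA-Z])[a-zA-Z\d]{1,10}/
If digits are not allowed at all:
/^[a-zA-Z]{1,10}/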
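As a rough sketch, the four rules from the question could also be combined with plain string functions and one small split pattern. The name clean_value is hypothetical, and the reading of rule c (a value must be a single word with no whitespace) is an assumption, since the question does not spell it out:

```php
<?php
// Hypothetical helper: returns the cleaned value, or 'Not given' when a rule fails.
function clean_value(string $value): string
{
    // Rule d: keep only the text before the first comma or hyphen.
    $first = trim(preg_split('/[,-]/', $value, 2)[0]);

    // Rules a and b: reject numbers and strings longer than 10 characters.
    // Rule c (assumed reading): reject anything that still contains whitespace.
    // An empty result (e.g. the input started with a hyphen) is rejected too.
    if ($first === '' || is_numeric($first) || strlen($first) > 10
        || preg_match('/\s/', $first)) {
        return 'Not given';
    }
    return $first;
}

$samples = ['value, value', 'Somewhere in the universe', '3.532523, -55.523525', 'value - value'];
foreach ($samples as $sample) {
    echo clean_value($sample), "\n";
}
```

With the sample values from the question, the first and last come back as value and the other two as Not given. Whether a regex or substr reads better here is mostly a matter of taste; the split keeps rule d in one line.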

Related

how to replace space of a string with "," and convert into array

I have a text box.
I enter a value in the text box like 12 13 14.
I want to convert this into 12,13,14, then convert it into an array and show each value separately.
If your form field asks for the values without a comma, then you will need to explode the POST data by space. What you're doing now is imploding it by comma (you can't implode a string to begin with), and then trying to pass that into a foreach loop. However, a foreach loop will only accept an array.
$ar = explode(' ',$da);
That simple change should fix it for you. You will want to get rid of the peculiar die() after your foreach (invalid syntax, and unclear what you're trying to do there!), and validate your data before the loop instead. By default, if you explode a string and no matching delimiters are found, the result will be an array with a single key, which you can pass into a loop without a problem.
Are you sure you want to expect the user to enter data in that particular format? I mean, what if the user types more than one space character, or separates the numbers with commas or semicolons, or enters letters instead of numbers? Anyway, at least you could collapse every run of whitespace into a single space character and then do the explode() as suggested:
$da = trim(preg_replace('/\s+/', ' ', $_POST['imp']));
$ar = explode(' ', $da);
before your foreach().
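If the input may also contain commas, semicolons, or stray letters, a slightly more tolerant variant is to split with preg_split and filter the tokens. This is only a sketch: split_numbers is a hypothetical helper name, and keeping digit-only tokens is an assumption about what counts as valid input.

```php
<?php
// Hypothetical helper: splits on any run of whitespace, commas or semicolons
// and keeps only the tokens that consist entirely of digits.
function split_numbers(string $input): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings produced by leading
    // or trailing delimiters.
    $tokens = preg_split('/[\s,;]+/', $input, -1, PREG_SPLIT_NO_EMPTY);

    // array_values re-indexes the array after filtering.
    return array_values(array_filter($tokens, 'ctype_digit'));
}

print_r(split_numbers(" 12  13,14;abc "));
```

The returned array can be passed straight to foreach, and implode(',', ...) gives the comma-separated form if it is still needed for display.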
Use explode instead of implode:
The explode() function breaks a string into an array.
The implode() function returns a string from the elements of an array.
You cannot run foreach over a string.
$da=$_POST['imp'];
$ar = explode(' ',$da);
foreach($ar as $k)
{
    $q="insert into pb_100_fp (draw_3_fp) values ('".mysqli_real_escape_string($conn, $k)."')";
    $rs=mysqli_query($conn, $q);
    echo $k.",";
}
then you will get this output
o/p : 12,13,14,

How to get a expression inside parentheses and commas inside a string (regex/PHP)

I think my question should be easy to solve, but I can't work it out.
I want to extract these values from my query string:
$string = "INSERT INTO table (a,b) VALUES ('foo', 'bar')";
Expected result:
array one = [a,b]
array two = [foo, bar]
There are many regex strategies you could use for this depending on how flexible you need it to be. Here is one very simple implementation which assumes that the string 'VALUES' is in all caps, and that there is exactly one space between 'VALUES' and each of the two sets of parentheses.
$string = "INSERT INTO table (a,b) VALUES ('foo', 'bar')";
$matches = array();
// we escape literal parentheses in the regex, and also add
// grouping parentheses to capture the sections we're interested in
preg_match('/\((.*)\) VALUES \((.*)\)/', $string, $matches);
// make sure the matches you expect to be found are populated
// before referencing their array indices. index 0 is the match of
// our entire regex, indices 1 and 2 are our two sets of parens
if (!empty($matches[1]) && !empty($matches[2])) {
    $column_names = explode(',', $matches[1]); // array of db column names
    $values = explode(',', $matches[2]); // array of values
    // you still have some clean-up work to do here on the data in those arrays.
    // for instance there may be empty spaces that need to be trimmed from
    // beginning/ending of some of the strings, and any of the values that were
    // quoted need the quotation marks removed.
}
This is only a starting point, be sure to test it on your data and revise the regex as needed!
I recommend using a regex tester to test your regex string against actual query strings you need it to work on. http://regexpal.com/ (There are many others)
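The clean-up step mentioned in the comments could look roughly like the sketch below. The helper name parse_insert is hypothetical, and the pattern is loosened a little (optional whitespace, case-insensitive VALUES) as an illustration rather than a complete SQL parser; values that themselves contain commas or parentheses would still break it.

```php
<?php
// Hypothetical helper: returns [columns, values], or null when the pattern does not match.
function parse_insert(string $sql): ?array
{
    // [^)]* stops at the first closing parenthesis; \s* and the i flag
    // relax the spacing and capitalisation around VALUES.
    if (!preg_match('/\(([^)]*)\)\s*VALUES\s*\(([^)]*)\)/i', $sql, $m)) {
        return null;
    }

    // Trim whitespace around each column name.
    $columns = array_map('trim', explode(',', $m[1]));

    // Trim whitespace and surrounding quote characters from each value.
    $values = array_map(function ($v) {
        return trim($v, " \t'\"");
    }, explode(',', $m[2]));

    return [$columns, $values];
}

list($columns, $values) = parse_insert("INSERT INTO table (a,b) VALUES ('foo', 'bar')");
print_r($columns);
print_r($values);
```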

preg_replace with a word in an array

I am trying to replace certain words, stored in an array called keywords, with "as" in a string.
for($i = 0; $i<sizeof($this->keywords[$this->lang]); $i++)
{
    $word = $this->keywords[$this->lang][$i];
    $a = preg_replace("/\b$word\b/i", "as",$this->code);
}
It works if I replace the variable $word with a literal pattern such as /\bhello\b/i, which then replaces every hello with "as".
Is the approach I am using even possible?
Before it becomes a pattern, it is a double-quoted string, so the variable is interpolated; that is not the problem.
The problem is that you use a loop to replace several words and you store the result in $a:
On the first iteration, all the occurrences of the first word in $this->code are replaced and the new string is stored in $a.
But the next iteration doesn't reuse $a as the third parameter to replace the next word; it always starts again from the original string $this->code.
Result: after the for loop, $a contains the original string with only the occurrences of the last word replaced with as.
When you want to replace several words with the same string, one way is to build an alternation: word1|word2|word3.... It can easily be done with implode:
$alternation = implode('|', $this->keywords[$this->lang]);
$pattern = '~\b(?:' . $alternation . ')\b~i';
$result = preg_replace($pattern, 'as', $this->code);
That way, the string is parsed only once and all the words are replaced in one pass.
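A self-contained sketch of the alternation approach follows. The keyword list and input string are made up for illustration, and the preg_quote step is an addition not present in the answer: it is there on the assumption that keywords might contain regex metacharacters.

```php
<?php
$keywords = ['hello', 'world'];   // hypothetical keyword list
$code     = 'Hello big world';    // hypothetical input

// Escape each keyword so characters such as . or + are matched literally;
// the second argument also escapes the ~ delimiter.
$quoted = array_map(function ($word) {
    return preg_quote($word, '~');
}, $keywords);

$pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';
$result  = preg_replace($pattern, 'as', $code);

echo $result, "\n";
```

If the loop is preferred, the equivalent fix is to start with $a = $this->code before the loop and pass $a, not $this->code, as the third argument on every iteration.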
If you have a lot of words and a very long string:
Testing a long alternation has a significant cost. Even though the pattern starts with \b, which greatly reduces the number of positions where a match can start, each attempt has to try many branches before it succeeds, and even more before it fails.
Only in this particular case, you can use another approach:
First you define a placeholder (a character or a short string that cannot occur in your string, let's say §) that will be inserted at each word-boundary position.
$temp = preg_replace('~\b~', '§', $this->code);
Then you wrap each keyword like this: §word1§, §word2§ ..., and you build an associative array where every value is the replacement string:
$trans = [];
foreach ($this->keywords[$this->lang] as $word) {
    $trans['§' . $word . '§'] = 'as';
}
Once you have done that, you add the placeholder itself as a key with an empty string as its value. You can now use the fast strtr function to perform the replacement:
$trans['§'] = '';
$result = strtr($temp, $trans);
The only limitation of this technique is that it is case-sensitive.
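Put together as a runnable sketch, with a made-up keyword list and input string standing in for $this->keywords and $this->code:

```php
<?php
$keywords = ['hello', 'world'];               // hypothetical keyword list
$code     = 'hello big world, helloworld';    // hypothetical input

// Step 1: mark every word boundary with the placeholder.
$temp = preg_replace('~\b~', '§', $code);

// Step 2: map each wrapped keyword to the replacement string.
$trans = [];
foreach ($keywords as $word) {
    $trans['§' . $word . '§'] = 'as';
}

// Step 3: a lone placeholder maps to the empty string, which removes
// the markers around words that are not keywords.
$trans['§'] = '';

// strtr tries the longest key first, so §hello§ wins over a bare §.
$result = strtr($temp, $trans);

echo $result, "\n";
```

Note that helloworld is left alone: it has no boundary in the middle, so neither wrapped keyword matches inside it.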
It will work if you write it like below:
$a = preg_replace("/\b".$word."\b/i", "as",$this->code);

PHP - smart, error tolerating string comparison

I'm looking for either a routine or a technique for error-tolerant string comparison.
Let's say, we have test string Čakánka - yes, it contains CE characters.
Now, I want to accept any of following strings as OK:
cakanka
cákanká
ČaKaNKA
CAKANKA
CAAKNKA
CKAANKA
cakakNa
The problem is that I often switch letters in a word, and I want to minimize the user's frustration at not being able to type a word correctly (e.g. when in a rush).
So, I know how to do a case-insensitive comparison (just make it lowercase :]), and I can strip the CE characters; I just can't wrap my head around tolerating a few switched characters.
Also, you often put a character not only one place off (character=>cahracter), but sometimes shift it by multiple places (character=>carahcter), just because one finger was lazy while typing.
Thank you :]
Not sure (especially about the accents / special characters, which you might have to deal with first), but for characters that are in the wrong place or missing, the levenshtein function, which calculates the Levenshtein distance between two strings, might help you (quoting):
int levenshtein ( string $str1 , string $str2 )
int levenshtein ( string $str1 , string $str2 , int $cost_ins , int $cost_rep , int $cost_del )
The Levenshtein distance is defined as
the minimal number of characters you
have to replace, insert or delete to
transform str1 into str2
Other possibly useful functions could be soundex, similar_text, or metaphone.
And some of the user notes on the manual pages of those functions, especially the manual page of levenshtein might bring you some useful stuff too ;-)
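As a minimal sketch of the levenshtein idea: both strings are normalised first, then accepted if the distance stays under a threshold. The helpers fold and is_close are hypothetical names, the accent map only covers the characters in the example word, and the threshold of 2 is an assumption. It is chosen because a swap of two neighbouring letters, or one letter moved several places, costs two edits in Levenshtein terms.

```php
<?php
// Hypothetical helper: strips the accents used in the example, then lowercases.
function fold(string $s): string
{
    // Sample map only; a real application would need a fuller transliteration.
    $map = ['Č' => 'C', 'č' => 'c', 'Á' => 'A', 'á' => 'a'];
    return strtolower(strtr($s, $map));
}

// Hypothetical helper: true when the folded strings are within $maxDistance edits.
function is_close(string $input, string $target, int $maxDistance = 2): bool
{
    return levenshtein(fold($input), fold($target)) <= $maxDistance;
}

$target = 'Čakánka';
$inputs = ['cakanka', 'cákanká', 'ČaKaNKA', 'CAKANKA', 'CAAKNKA', 'CKAANKA', 'cakakNa', 'zebra'];
foreach ($inputs as $input) {
    echo $input, ': ', is_close($input, $target) ? 'ok' : 'rejected', "\n";
}
```

A fixed threshold is crude for short words, where two edits can turn one valid word into another, so scaling it with the word length may be worth considering.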
You could transliterate the words to Latin characters and use a phonetic algorithm like Soundex to get the essence of your word and compare it to the ones you have. In your case that would be C252 for most of your words; the exceptions are CKAANKA (C520) and cakakNa (C225), where the swapped letters change the code.
Edit: The problem with comparative functions like levenshtein or similar_text is that you need to call them for each pair of input value and candidate value. That means if you have a database with 1 million entries you will need to call these functions 1 million times.
But functions like soundex or metaphone, which calculate a kind of digest, can help to reduce the number of actual comparisons. If you store the soundex or metaphone value for each known word in your database, you can narrow down the possible matches very quickly. Later, once the set of possible matching values is reduced, you can use the comparative functions to get the best match.
Here’s an example:
// building the index that represents your database
$knownWords = array('Čakánka', 'Cakaka');
$index = array();
foreach ($knownWords as $key => $word) {
    $code = soundex(iconv('utf-8', 'us-ascii//TRANSLIT', $word));
    if (!isset($index[$code])) {
        $index[$code] = array();
    }
    $index[$code][] = $key;
}
// test words
$testWords = array('cakanka', 'cákanká', 'ČaKaNKA', 'CAKANKA', 'CAAKNKA', 'CKAANKA', 'cakakNa');
echo '<ul>';
foreach ($testWords as $word) {
    $code = soundex(iconv('utf-8', 'us-ascii//TRANSLIT', $word));
    if (isset($index[$code])) {
        echo '<li> '.$word.' is similar to: ';
        $matches = array();
        foreach ($index[$code] as $key) {
            similar_text(strtolower($word), strtolower($knownWords[$key]), $percentage);
            $matches[$knownWords[$key]] = $percentage;
        }
        arsort($matches);
        echo '<ul>';
        foreach ($matches as $match => $percentage) {
            echo '<li>'.$match.' ('.$percentage.'%)</li>';
        }
        echo '</ul></li>';
    } else {
        echo '<li>no match found for '.$word.'</li>';
    }
}
echo '</ul>';
Spelling checkers do something like fuzzy string comparison. Perhaps you can adapt an algorithm based on that reference. Or grab the spell checker guessing code from an open source project like Firefox.

Validate measurements with PHP

I need to validate measurements entered into a form generated by PHP.
I intend to compare them to upper and lower control limits and decide if they fail or pass.
As a first step, I imagine a PHP function which accepts strings representing engineering measurements and converts them to pure numbers before the comparison.
At the moment I'm only expecting measurements of small voltages and currents, so strings like
'1.234uA', '2.34 nA', '39.9mV' or '-1.003e-12'
will be converted to
1.234e-6, 2.34e-9, 3.99e-2 and -1.003e-12, respectively.
But the method should be generalisable to any measured quantity.
function convert($value) {
    // map each unit prefix to its exponent
    $units = array('p' => 'e-12',
                   'n' => 'e-9',
                   'u' => 'e-6',
                   'm' => 'e-3');
    $unitstring = implode("", array_keys($units));
    $matches = array();
    // both number forms sit inside group 1, so a leading - applies to either;
    // the prefix is case-sensitive because $units only has lowercase keys
    $pattern = "/^(-?(?:\\d*\\.\\d+|\\d+))\\s*([$unitstring])([a-zA-Z])$/";
    $result = preg_match($pattern, $value, $matches);
    if ($result) {
        $retval = $matches[1].$units[$matches[2]].$matches[3];
    } else {
        $retval = $value;
    }
    return $retval;
}
So to explain what the above does:
$units is an array to map unit-prefix to the exponent.
$unitstring conglomerates the units into a single string (in the example it would be 'pnum')
The regular expression will match an optional -, followed by either 0 or more digits, a period and 1 or more digits OR 1 or more digits, followed by one of the unit prefixes (only one) and then a single alphabetical character. There can be any amount of whitespace between the number and the units.
Because of the parentheses and the use of preg_match, the number section, the unit prefix, and the unit are all separately captured in the array $matches as elements 1, 2, and 3. (0 will contain the entire string)
$result will be 1 if it matched the regex, 0 otherwise.
$retval is constructed by just connecting the number, the exponent (based on the unit prefix from the array) and the units provided, or it will just be the passed in string (such as if you're given the -1.003e-12, it will be returned)
Of course you can tweak some things, but in general this is a good start. Hope it helps.
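Since the question ultimately wants a number that can be compared against control limits, one possible variation is to return a float directly. This is a sketch under stated assumptions: measurement_to_float is a hypothetical name, the prefix set is limited to p, n, u and m as in the question, and a bare trailing m is read as the milli prefix, which would be ambiguous for lengths in metres.

```php
<?php
// Hypothetical helper: parses strings such as '1.234uA' into a float in base units,
// or returns null when the string is not a recognised measurement.
function measurement_to_float(string $value): ?float
{
    // Assumed prefix set; the empty key covers values without a prefix.
    $exponents = ['p' => -12, 'n' => -9, 'u' => -6, 'm' => -3, '' => 0];

    // Group 1: number with optional exponent, group 2: optional prefix,
    // group 3: optional single-letter unit.
    $pattern = '/^\s*(-?(?:\d*\.\d+|\d+)(?:[eE][+-]?\d+)?)\s*([pnum]?)([a-zA-Z]?)\s*$/';
    if (!preg_match($pattern, $value, $m)) {
        return null;
    }
    return (float) $m[1] * (10 ** $exponents[$m[2]]);
}

// Usage: compare against hypothetical lower and upper control limits.
$reading = measurement_to_float('1.234uA');
$passes  = $reading !== null && $reading >= 1.0e-6 && $reading <= 2.0e-6;
echo $passes ? "pass\n" : "fail\n";
```

Because the result comes from floating-point multiplication, comparisons against limits should allow for a small rounding error rather than test for exact equality.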
In your function:
First, initialize the exponent for each unit prefix, e.g. -6 for u, -3 for m, etc.
Split the string into the number and the unit prefix (i.e. micro (u), milli (m), etc.).
Say the entered number is $num and the exponent looked up for its prefix is $exp:
$x = 0;
while ($num >= 10)
{
    $num = $num / 10;
    $x++; // $x keeps track of how far the decimal point has moved
}
$exp = $exp + $x; // the exponent grows by the number of places shifted
echo $num . 'e' . $exp;
Maybe that will do!
My own possibly simple-minded approach has been to use an array of patterns in preg_replace:
function convert($value) {
    // defined inside the function so they are in scope when preg_replace runs
    $patterns = array('/p[av]/i', '/n[av]/i', '/u[av]/i', '/m[av]/i');
    $replacements = array('e-12', 'e-9', 'e-6', 'e-3');
    $result = preg_replace($patterns, $replacements, $value);
    return $result;
}
It could be extended to higher prefixes, but it seems heavy-handed to keep adding increasingly complex regexes to the $patterns array.
Edit: The comparison, later, should interpret the return value as a real number.
I'm hoping someone can suggest something more elegant.
