Extracting Sub-String using regex PHP

I need to extract a sub-string from a longer string. I know how I would approach it using PHP substr() and strpos(), but the data set is very large and I suspect it would be more efficient to extract the sub-string using regex.
For example, if I have a number, (say a latitude) that has the format
"3203.79453"
where the two characters before and "all" the characters after the decimal point represent decimal seconds, then to obtain the decimal latitude I need to compute the following:
32 + (03.79453)/60 = 32.06324217
So in essence I need a regex method of extracting the sub-string "03.79453".
So, two questions: how do I achieve this using regex, and is it faster than using strpos() and substr()?
Thanks

It's easy to achieve with both options:
substr($line, strpos($line, '.') - 2);
or:
preg_match("/(\d{2}\..*)/", $line, $matches);
As for performance, I guess you would need to benchmark it. I've done a quick test to compare the performance of each example by running one million reps of each of those lines:
preg_match: average around 1.6 seconds for 1,000,000 matches
substr: average around 0.85 seconds for 1,000,000 matches
In this case it seems clear that using substr is the winner in terms of performance.
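The comparison above can be reproduced with a small timing harness. This is only a sketch: the helper names (minutesBySubstr, minutesByRegex, timeIt) are made up for illustration, the input is assumed to always contain a decimal point, and the function-call overhead is included in both timings, so absolute numbers will differ from those quoted.

```php
<?php
// Hypothetical helpers; the names are illustrative and not from the answer.

// substr/strpos variant: everything from two characters before the dot.
function minutesBySubstr(string $line): string
{
    return substr($line, strpos($line, '.') - 2);
}

// preg_match variant: two digits, a dot, and the rest of the string.
function minutesByRegex(string $line): string
{
    preg_match('/(\d{2}\..*)/', $line, $matches);
    return $matches[1];
}

// Call $fn on $line $reps times and return the elapsed seconds.
function timeIt(callable $fn, string $line, int $reps): float
{
    $start = microtime(true);
    for ($i = 0; $i < $reps; $i++) {
        $fn($line);
    }
    return microtime(true) - $start;
}

$line = '3203.79453';
$reps = 100000; // raise to 1000000 to mirror the test described above

printf("substr:     %.3f s\n", timeIt('minutesBySubstr', $line, $reps));
printf("preg_match: %.3f s\n", timeIt('minutesByRegex', $line, $reps));
```

Results depend on the PHP version and machine, so it is worth running this against a sample of the real data before choosing.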

You could use preg_replace() like so:
<?php
$geoCoordinate = "3203.79453";
$degrees = preg_replace("#(\d{2}\.\d*?$)#", "", $geoCoordinate);
$seconds = preg_replace("#(\d*?)(\d{2}\.\d*?)#", "$2", $geoCoordinate);
$degAndSecs = round($degrees + ($seconds/60), 8);
var_dump($degAndSecs); //<== PRODUCES::: float 32.06324217
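If a regex is preferred, both parts can also be captured in one pass with preg_match() instead of two preg_replace() calls. The following is a minimal sketch, assuming the input always looks like digits, a dot, and more digits; the function name toDecimalDegrees is hypothetical.

```php
<?php
// Hypothetical helper: converts a string such as "3203.79453" to decimal degrees.
function toDecimalDegrees(string $coordinate): ?float
{
    // Group 1: the leading degree digits.
    // Group 2: the two digits before the dot plus everything after it.
    if (!preg_match('/^(\d+)(\d{2}\.\d+)$/', $coordinate, $m)) {
        return null; // input did not have the expected shape
    }
    return round((int) $m[1] + (float) $m[2] / 60, 8);
}

var_dump(toDecimalDegrees('3203.79453'));
```

Returning null for unexpected input is a design choice for this sketch; throwing an exception would work just as well.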

Related

PHP String Pattern Task

A simple problem.
I have the following string "48063974806397"
You will notice that this is just "4806397" repeated twice.
I need a way to recognize the repeat point, and just get the first instance of the pattern. E.g final return should just be "4806397".
(The length of the first number will not always be the same.)
I want to return this as a variable in PHP.
How could I do this?
Thanks

If it's always just a string duplicated twice, then it's as simple as just taking the first half of the string:
$halfstring = substr($string, 0, strlen($string) / 2);
Use strlen() to get the length of the string, and divide that by 2. Then use substr() to just get the first half.

If that's always a number, math helps:
$halfStr = $n / (pow(10, strlen($n) / 2) + 1);
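Both answers assume the input really is one string written twice. If that is not guaranteed, a quick check of the two halves avoids returning garbage. A minimal sketch (the function name repeatedHalf is hypothetical):

```php
<?php
// Hypothetical helper: returns the repeated unit, or null if the string
// is not exactly one substring written twice.
function repeatedHalf(string $s): ?string
{
    $len = strlen($s);
    if ($len === 0 || $len % 2 !== 0) {
        return null; // empty or odd length cannot be a doubled string
    }
    $half = substr($s, 0, intdiv($len, 2));
    // Compare the first half with the second half.
    return $half === substr($s, intdiv($len, 2)) ? $half : null;
}

var_dump(repeatedHalf('48063974806397'));
```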

Regular expression to match an exact number of occurrence for a certain character

I'm trying to check if a string has a certain number of occurrence of a character.
Example:
$string = '123~456~789~000';
I want to verify if this string has exactly 3 instances of the character ~.
Is that possible using regular expressions?

Yes
/^[^~]*~[^~]*~[^~]*~[^~]*$/
Explanation:
^ ... $ means the whole string in many regex dialects
[^~]* a string of zero or more non-tilde characters
~ a tilde character
The string can have as many non-tilde characters as necessary, appearing anywhere in the string, but must have exactly three tildes, no more and no less.

As a single character is technically a substring, and the task is to count the number of its occurrences, I suppose the most efficient approach is to use a dedicated PHP function, substr_count:
$string = '123~456~789~000';
if (substr_count($string, '~') === 3) {
// string is valid
}
Obviously, this approach won't work if you need to count the number of pattern matches (for example, while you can count the number of '0' characters in your string with substr_count, you had better use preg_match_all to count digits).
Yet for this specific question it should be faster overall, as substr_count is optimized for one specific goal, counting substrings, whereas preg_match_all is more general-purpose.
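To illustrate the distinction between counting a literal substring and counting pattern matches, here is a short sketch using the string from the question:

```php
<?php
$string = '123~456~789~000';

// Literal substring: substr_count is enough.
$zeros = substr_count($string, '0');

// Pattern (any digit): preg_match_all returns the number of full matches.
$digits = preg_match_all('/\d/', $string);

var_dump($zeros, $digits);
```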

I believe this should work for a variable number of characters:
^(?:[^~]*~[^~]*){3}$
The advantage here is that you just replace 3 with however many you want to check.
To make it more efficient, it can be written as
^[^~]*(?:~[^~]*){3}$
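In PHP the second pattern could be wrapped in a small helper so that the character and the count are parameters. This is a sketch only; the function name hasExactly is hypothetical, and it assumes $char is a single character.

```php
<?php
// Hypothetical helper: true if $subject contains exactly $n occurrences of $char.
function hasExactly(string $subject, string $char, int $n): bool
{
    // Escape the character so it is safe inside the pattern.
    $c = preg_quote($char, '/');
    // ^[^c]*(?:c[^c]*){n}$ with the character and count filled in.
    $pattern = '/^[^' . $c . ']*(?:' . $c . '[^' . $c . ']*){' . $n . '}$/';
    return preg_match($pattern, $subject) === 1;
}

var_dump(hasExactly('123~456~789~000', '~', 3));
```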

This is what you are looking for:
EDIT based on comment below:
<?php
$string = '123~456~789~000';
$total = preg_match_all('/~/', $string);
echo $total; // Shows 3

Regular Expression to extract dynamic strings up to a certain point

My RegEx is written below, and it does not work no matter how I change it, substitute characters, and so on. I have a list of strings that may have 3 words or 8 words. Is there an easier way to cut off the RegEx when we hit a certain character or string? Let me show you what I mean:
Here are some examples of strings I will deal with:
WKT8100 Cooperative Education Work Term Preparation 15 hrs/w
CST8259 Web Programming Languages II 5 hrs/w
CST8265 Web Security Basics 5 hrs/w
CST8267 Ecommerce 4 hrs/w
I want to extract only the course name and ID from the string and leave out the number of hours I need, so leaving me with:
WKT8100 Cooperative Education Work Term Preparation
as a return.
My RegEx currently is like this:
RegEx = "/[a-zA-Z]{3}[0-9]{4}[A-Z]{0,1}\s[a-zA-Z]{3,20}\s[a-zA-Z]{0,20}\s[a-zA-Z]{0,20}\s[a-zA-Z]{0,20}\s/";
I have a RegEx that extracts the hours correctly, so maybe there is a method I can use with substr. That way I can basically extract everything before the hours RegEx and don't have to worry about a complex RegEx line:
HoursRegEx = "#\s[0-9]{1,2}?\shrs\/w#i";

Why not:
/(.*) \d+ hrs\/w/
This should capture all characters before the x hrs/w part.
For a little more explanation, this just creates a capturing group that contains whatever it found before seeing: a space, one or more digits, another space, and then the sequence "hrs/w". Since you don't care what's before the end part, why try to recognize it?
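Applied to the sample lines from the question, the pattern could be used like this. It is a sketch with anchors (^ and $) added as an assumption that each string holds exactly one course entry:

```php
<?php
$lines = [
    'WKT8100 Cooperative Education Work Term Preparation 15 hrs/w',
    'CST8259 Web Programming Languages II 5 hrs/w',
    'CST8265 Web Security Basics 5 hrs/w',
    'CST8267 Ecommerce 4 hrs/w',
];

$courses = [];
foreach ($lines as $line) {
    // Group 1 holds everything before " <digits> hrs/w".
    if (preg_match('/^(.*) \d+ hrs\/w$/', $line, $m)) {
        $courses[] = $m[1];
    }
}

print_r($courses);
```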

If it always ends in " hrs/w", you can do this:
$string = "WKT8100 Cooperative Education Work Term Preparation 15 hrs/w";
$string = trim($string);
$lastSpace = strrpos($string, " ");
$string = trim(substr($string, 0, $lastSpace));
$lastSpace = strrpos($string, " ");
$hours = trim(substr($string, $lastSpace));
$nameID = trim(substr($string, 0, $lastSpace));
That's a way off the top of my head w/o using regex. I can't give you any regex without first doing some extensive refresher research.
p.s. Jordan's looks much cleaner.

PHP: Split string into 2 after length 700 and . (period)

I can't seem to figure out the best approach to a PHP problem. I want to accomplish the following:
I get a string that is, e.g., 1000 characters in length.
I want to split the string into 2.
The first string needs to be 600 characters, subject to the following condition:
a) The string should only be split after a period.
The second section of the string can be the remainder.
I know how to check the length of a string with strlen($string) and I know how to explode a string into substrings using, e.g., explode(). However, I am not sure how to bring everything together.

I have used this and it works; you could try it:
<?php
$app_title="HIOX INDIA.COM, a leading business web hosting company, is
currently involved in web services, software/application development, web content
development, web hosting, domain registration, internet solutions and web design.";
echo "<br>Before :".$app_title;
$length=100;
// Default: the string is short enough, so there is nothing to split off
$app_title1 = $app_title;
$app_title2 = "";
if(strlen($app_title) > $length) {
    // Cut at the first space at or after $length characters
    $cut = strpos($app_title, ' ', $length);
    if($cut !== false) {
        $app_title1 = substr($app_title, 0, $cut);
        // split() was removed in PHP 7, so take the remainder with substr()
        $app_title2 = substr($app_title, $cut + 1);
    }
}
echo "<br><br>After1 :".$app_title1;
echo "<br><br>After2 :".$app_title2;
?>

Use substr() to yank out everything after 600 chars from the original string.
Do a strpos() on that resulting sub-string to find the first .
Use the pos + 600 to do a substr on the original string and use that position as your split point.
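The three steps above can be sketched as a function. The name splitAfterPeriod and the choice to return the whole string unsplit when no period is found are assumptions made for this example; the leading whitespace of the second part is trimmed for convenience.

```php
<?php
// Hypothetical helper: splits $text after the first period found at or
// beyond $min characters. Returns [first part, remainder].
function splitAfterPeriod(string $text, int $min = 600): array
{
    if (strlen($text) <= $min) {
        return [$text, '']; // too short to split
    }
    // Step 1: everything after the first $min characters.
    $tail = substr($text, $min);
    // Step 2: first period inside that tail.
    $pos = strpos($tail, '.');
    if ($pos === false) {
        return [$text, '']; // no period to split on
    }
    // Step 3: translate back to a position in the original string
    // (+1 so the period stays with the first part).
    $cut = $min + $pos + 1;
    return [substr($text, 0, $cut), ltrim((string) substr($text, $cut))];
}

// Small threshold so the behaviour is easy to see.
print_r(splitAfterPeriod('One two three. Four five. Six seven.', 16));
```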

Using substr() you can get part of a string by defining an initial position and the length of the substring. For example:
Getting the first 600 characters:
$first = substr ( $input , 0 , 600 );
Getting the remaining:
$second = substr ( $input , 600 );

Split data without delimiter?

Ok say I have my phone numbers stored in my table as:
"0008675309"
I obviously wouldn't want to display it just like that, I'd want to format it when I call it as:
(000)867-5309
Would it be better to store it in the database with a delimiter such as / - or . So that I can split it later? Or is it possible to split it by the number of characters?

The performance cost and code needed to process a phone number in any of those formats are small, so it's really up to your preference. To answer your question, it is very easy to grab the first three characters, the next three, and the last four using, for example, the substr function.
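A minimal sketch of the substr approach, assuming the stored value is exactly ten digits; the function name formatPhone is hypothetical, and anything that is not ten digits is returned unchanged:

```php
<?php
// Hypothetical helper: formats a 10-digit string as (XXX)XXX-XXXX.
function formatPhone(string $digits): string
{
    // Leave anything that is not exactly ten digits untouched.
    if (!preg_match('/^\d{10}$/D', $digits)) {
        return $digits;
    }
    return '(' . substr($digits, 0, 3) . ')'   // first three characters
         . substr($digits, 3, 3) . '-'         // next three
         . substr($digits, 6);                 // last four
}

echo formatPhone('0008675309'), "\n";
```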

Here is a one liner that does what you want:
$phone = preg_replace('/^(\d{3})(\d{3})(\d{4})$/', '($1)$2-$3', $phone);
As an added bonus, it won't change the format if the input doesn't match (international numbers).

If you are only storing North American phone numbers (10 digits), then as @mellamokb noted, you're OK either way. If you may be storing international numbers, you should capture as much detail as you can early on (if possible), since it might be hard to know how to punctuate the number later on.

use preg_split with PREG_SPLIT_NO_EMPTY
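One way to read this suggestion is to treat the whole number as the delimiter and keep the captured groups, which needs PREG_SPLIT_DELIM_CAPTURE in addition to PREG_SPLIT_NO_EMPTY. That combination is an assumption on my part rather than something the answer spells out:

```php
<?php
$phone = '0008675309';

// The entire string matches the pattern, so the pieces before and after it
// are empty and get dropped; the three captured groups are what remain.
$parts = preg_split(
    '/(\d{3})(\d{3})(\d{4})/',
    $phone,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// Only format when exactly three groups came back.
$formatted = count($parts) === 3 ? vsprintf('(%s)%s-%s', $parts) : $phone;

echo $formatted, "\n";
```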

The other answers are perfectly correct. In case you wanted the actual code for it, I think the following should do the trick (the indexes may be off by one, oops!):
$phone_number="0008675309";
$phone_number=substr_replace($phone_number, "(", 0, 0);
$phone_number=substr_replace($phone_number, ")", 4, 0);
$phone_number=substr_replace($phone_number, "-", 8, 0);
