I have some strings I need to scrape data from. I need a simple way of telling PHP to look in the string and delete data before and after the part I need. An example is:
When: Sat 19 Sep 2009 22:00 to Sun 20 Sep 2009 03:00
I want to delete the "When: " and then remove the & and everything after it. Is this a regex thing? I haven't really used them before.
I would not use regular expressions for this.
$data = substr($input, 6, strpos($input, '&') - 6);
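Here is a minimal sketch of the same substr() idea with a guard added. It assumes the input always starts with "When: ". strpos() returns false when there is no "&", and false - 6 evaluates to -6, so without the guard the one-liner would silently chop six characters off the end. The helper name extractWhen() and the text after "&" in the sample are made up for illustration.

```php
<?php
// Sketch: return the text between a leading "When: " and the first "&".
// extractWhen() is a hypothetical helper name; it assumes the "When: " prefix is present.
function extractWhen(string $input): string
{
    $prefix = 'When: ';
    $start = strlen($prefix);       // 6 characters
    $end = strpos($input, '&');     // false when there is no "&"
    if ($end === false) {
        $end = strlen($input);      // no "&": keep everything to the end
    }
    return substr($input, $start, $end - $start);
}

// Hypothetical sample: the text after "&" is invented.
echo extractWhen('When: Sat 19 Sep 2009 22:00 to Sun 20 Sep 2009 03:00&more text'), "\n";
```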
Yes, regex can do this kind of thing in its sleep.
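// \s* skips the space after "When:", (.*?) stops at the first "&", and the s flag lets the trailing .* also consume newlines
$result = preg_replace('/When:\s*(.*?)&.*/s', '$1', $text);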
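If you would rather read the captured text directly than rewrite the string, preg_match() hands the group back as $m[1]. This sketch also shows why a lazy (.*?) is safer than a greedy (.*) when the string may contain more than one "&". The sample text after "&" is hypothetical.

```php
<?php
// Hypothetical input containing more than one "&".
$text = 'When: Sat 19 Sep 2009 22:00 to Sun 20 Sep 2009 03:00&more text & even more';

// Greedy (.*) runs to the LAST "&"; lazy (.*?) stops at the FIRST one.
preg_match('/When:\s*(.*)&/', $text, $greedy);
preg_match('/When:\s*(.*?)&/', $text, $lazy);

echo $greedy[1], "\n"; // still contains "&more text"
echo $lazy[1], "\n";   // Sat 19 Sep 2009 22:00 to Sun 20 Sep 2009 03:00
```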
UPDATE
If you want to find the date range only, in the middle of a lot of other text, here is a crude regex that will match the one in the question...
if (preg_match('/[a-z]{3} [0-9]{2} [a-z]{3} [0-9]{4} [0-9]{2}:[0-9]{2} to [a-z]{3} [0-9]{2} [a-z]{3} [0-9]{4} [0-9]{2}:[0-9]{2}/i', $text, $regs)) {
    $result = $regs[0];
} else {
    $result = "";
}
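A variant sketch, assuming you want the two ends of the range separately: wrapping each timestamp in its own group puts them in $regs[1] and $regs[2], and \d{1,2} also accepts single-digit days such as "Sat 5 Sep". The surrounding sample text is invented; only the date range comes from the question.

```php
<?php
// Hypothetical surrounding text; only the date range is taken from the question.
$text = 'Some text. When: Sat 19 Sep 2009 22:00 to Sun 20 Sep 2009 03:00&more text';

// One sub-pattern for a single timestamp; \d{1,2} also accepts a one-digit day.
$stamp = '[a-z]{3} \d{1,2} [a-z]{3} \d{4} \d{2}:\d{2}';

if (preg_match("/($stamp) to ($stamp)/i", $text, $regs)) {
    $start = $regs[1]; // beginning of the range
    $end   = $regs[2]; // end of the range
} else {
    $start = $end = '';
}
echo $start, ' | ', $end, "\n";
```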
So you would want to keep "Sat 19 Sep 2009 22:00 to Sun 20 Sep 2009 03:00"
Well, you can go for a regexp all right. I don't know much about regexps in PHP, but in Perl you could do something like
/^When: (.*)\ $/ .
The (.*) group then captures everything you want to keep. In Perl, you would read it from the $1 variable.
Or you could do something like
/^When: (.*)\&.*$/ if the content after the & is variable.
Also, watch out: if the string you want to keep contains &, it might be a little more tricky.
But regexps are usually the way to go for this type of work.
Related
I want to replace the , character with : when it is located between [].
So [Hello, as] a, booby will change to [Hello: as] a, booby. I cannot figure out how to match the comma within the brackets. I can match the word inside the brackets with
\[(.*)\] but I don't know how to pick out the comma from there.
Also, if I get [[Hello, as] a, booby], I want to change only the first comma. I tried to use * and + but it doesn't work.
I need this
[["Sender", "mail#text.org"], ["Date", "Fri, 09 Jun 2017 13:29:22 +0000"]]
To become this
[["Sender": "mail#text.org"], ["Date": "Fri, 09 Jun 2017 13:29:22 +0000"]]
I wanted to use preg_replace, but my attempt did not give the right result:
preg_replace("/(\[[^],]*),/U" , ':', $arr)
returns
": mail#text.org"], : "Fri, 09 Jun 2017 13:29:22 +0000"]
This seems as simple as I can make it:
(?<="),
It makes some assumptions about your nested pseudo-array values.
PHP Implementation:
$in='[["Sender", "mail#text.org"], ["Date", "Fri, 09 Jun 2017 13:29:22 +0000"], ["Name", "Dude"]]';
echo preg_replace('/(?<="),/',':',$in);
Output:
[["Sender": "mail#text.org"], ["Date": "Fri, 09 Jun 2017 13:29:22 +0000"], ["Name": "Dude"]]
If this doesn't suit your actual strings, please provide a string where my pattern fails so that I can adjust it. To ensure that the comma follows a quoted "key", you can extend the pattern to "[^"]+"\K, at a slightly higher step cost (but still not bad).
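To compare the two patterns side by side, here is a small sketch run against the sample string above; for this input both should give the output shown. The variable names are only for illustration.

```php
<?php
$in = '[["Sender", "mail#text.org"], ["Date", "Fri, 09 Jun 2017 13:29:22 +0000"], ["Name", "Dude"]]';

// Original pattern: any comma directly preceded by a double quote.
$viaLookbehind = preg_replace('/(?<="),/', ':', $in);

// Stricter variant: the comma must follow a complete quoted "key";
// \K drops the key from the reported match, so only the comma is replaced.
$viaKeep = preg_replace('/"[^"]+"\K,/', ':', $in);

echo $viaLookbehind, "\n";
echo $viaKeep, "\n";
```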
Try grouping everything before and after the comma, then put them back around the colon.
preg_replace('/(\[.*?),(.*?\])/','$1:$2',$string)
You can use a \G based pattern:
$str = preg_replace('~(?:\G(?!\A)|\[(?=[^][]*]))[^][,]*\K,~', ':', $str);
This kind of pattern starts with 2 subpatterns in an alternation:
\[(?=[^][]*]) that searches a [ followed by a ] without other brackets between them.
\G(?!\A) that matches at the position after a previous match
Then, in both cases, [^][,]*\K, advances to the next comma, which can only be between [ and ].
But since you also need to skip commas between double quotes, you have to match the double-quoted parts before a possible comma. To do that, change [^][,]* to [^][",]*(?:"[^"\\]*(?s:\\.[^"\\]*)*"[^][",]*)*+
$str = preg_replace('~(?:\G(?!\A)|\[(?=[^][]*]))[^][",]*+(?:"[^"\\\\]*(?s:\\\\.[^"\\\\]*)*"[^][",]*)*+\K,~', ':', $str);
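A quick sketch for checking the quote-aware \G pattern against the examples from the question. The pattern string is copied from the answer; the loop and variable names are only illustrative. In the nested case only the first comma changes, as requested.

```php
<?php
// The quote-aware \G pattern from the answer, copied verbatim.
$pattern = '~(?:\G(?!\A)|\[(?=[^][]*]))[^][",]*+(?:"[^"\\\\]*(?s:\\\\.[^"\\\\]*)*"[^][",]*)*+\K,~';

$samples = [
    '[Hello, as] a, booby',
    '[[Hello, as] a, booby]',
    '[["Sender", "mail#text.org"], ["Date", "Fri, 09 Jun 2017 13:29:22 +0000"]]',
];

$results = [];
foreach ($samples as $s) {
    // Commas inside [...] become ":", except those inside double-quoted parts.
    $results[] = preg_replace($pattern, ':', $s);
}
print_r($results);
```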
I have a string which is something like this:
<?php
$string = 'Monday & SUNDAY 11:30 PM et/pt';
//or
$string = 'Monday 11:30 PM et/pt';
?>
I want to fetch '11:30 PM' in both cases. I guess I can't use explode, so what would the regular expression for this be? Also, please suggest a good resource for learning regular expressions.
Thanks in advance
Credit goes to the commenters below for several fixes to the original approach, but there were still some unresolved issues.
If you want a fixed two-digit hour format: (0[1-9]|1[0-2]):[0-5]\d [AP]M
To validly match a twelve-hour clock, I'd use a regex like the one below. A twelve-hour clock goes from 01:00 to 12:59:
$regex = "#\b(?:0[1-9]|1[0-2]):[0-5][0-9] [AP]M\b#i";
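A small sketch applying the twelve-hour pattern, with the hour restricted to 01-12, to both strings from the question. It assumes a two-digit hour, so something like "9:30 PM" would not match.

```php
<?php
// Twelve-hour clock, 01:00 to 12:59: hour is 0[1-9] (01-09) or 1[0-2] (10-12).
$regex = '#\b(?:0[1-9]|1[0-2]):[0-5][0-9] [AP]M\b#i';

$inputs = ['Monday & SUNDAY 11:30 PM et/pt', 'Monday 11:30 PM et/pt'];
$found = [];
foreach ($inputs as $string) {
    // $m[0] holds the whole match when preg_match() returns 1
    $found[] = preg_match($regex, $string, $m) ? $m[0] : null;
}
print_r($found);
```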
Malik, to retrieve times/dates you might use ready-made regexes from a library; search this query: http://regexlib.com/Search.aspx?k=digit&c=5&m=-1&ps=20
Basically, your time fields are similar (they share the same delimiter ':'), so I'd recommend a simple regex, \d{1,2}:\d{2} [PA]M, to match them in the input string. If you want to make it case-insensitive, use the i pattern modifier.
For the basics of regex, you're welcome to read here.
Here is a match call for PHP (the i after the second slash (/) makes the pattern case-insensitive, so am, AM, Am and aM are all treated as equal):
preg_match('/\d{1,2}:\d{2} [PA]M/i', $string, $time);
echo $time[0]; // $time is an array; element 0 holds the whole match
If there might be no space after the digits (e.g. 11:30am) or more than one space character, the regex should look like this:
/\d{1,2}:\d{2}\s*[PA]M/i
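A sketch of the looser pattern run over both strings from the question plus a hypothetical "Monday 11:30am". preg_match() fills $time with an array, so the matched text itself is in $time[0]:

```php
<?php
$inputs = ['Monday & SUNDAY 11:30 PM et/pt', 'Monday 11:30 PM et/pt', 'Monday 11:30am'];
$times = [];
foreach ($inputs as $string) {
    // \s* allows zero or more whitespace characters before AM/PM;
    // the i modifier makes am, AM, Am and aM all match.
    if (preg_match('/\d{1,2}:\d{2}\s*[PA]M/i', $string, $time)) {
        $times[] = $time[0]; // whole match, e.g. "11:30 PM"
    }
}
print_r($times);
```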
This code will give you 11:30 PM:
preg_match('$([0-9:]{3,5}) ([AP])M$','Monday & SUNDAY 11:30 PM et/pt',$m);
echo $m['1']." ".$m['2']."M";
My regex is below, and it does not work no matter how I change it, substituting characters and whatnot. I have a list of strings that may have 3 words or 8 words. Is there an easier way to cut off the regex when we hit a certain character or string? Let me show you what I mean:
Here are some examples of strings I will deal with:
WKT8100 Cooperative Education Work Term Preparation 15 hrs/w
CST8259 Web Programming Languages II 5 hrs/w
CST8265 Web Security Basics 5 hrs/w
CST8267 Ecommerce 4 hrs/w
I want to extract only the course ID and name from the string and leave out the number of hours, leaving me with, for example:
WKT8100 Cooperative Education Work Term Preparation
as a return.
My RegEx currently is like this:
RegEx = "/[a-zA-Z]{3}[0-9]{4}[A-Z]{0,1}\s[a-zA-Z]{3,20}\s[a-zA-Z]{0,20}\s[a-zA-Z]{0,20}\s[a-zA-Z]{0,20}\s/";
I have a regex that extracts the hours correctly, so maybe there is a method I can use with substr. That way I can extract everything before the hours match and not have to worry about a complex regex:
HoursRegEx = "#\s[0-9]{1,2}?\shrs\/w#i";
Why not:
/(.*) \d+ hrs\/w/
This should capture all characters before the x hrs/w part.
For a little more explanation, this just creates a capturing group that contains whatever it found before seeing: a space, one or more digits, another space, and then the sequence "hrs/w". Since you don't care what's before the end part, why try to recognize it?
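A minimal sketch applying this pattern to the four sample lines from the question. The ^ anchor is an addition so the capture always starts at the beginning of the line:

```php
<?php
$lines = [
    'WKT8100 Cooperative Education Work Term Preparation 15 hrs/w',
    'CST8259 Web Programming Languages II 5 hrs/w',
    'CST8265 Web Security Basics 5 hrs/w',
    'CST8267 Ecommerce 4 hrs/w',
];

$courses = [];
foreach ($lines as $line) {
    // Group 1 captures everything before " <digits> hrs/w".
    if (preg_match('/^(.*) \d+ hrs\/w/', $line, $m)) {
        $courses[] = $m[1];
    }
}
print_r($courses);
```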
If it always ends in " hrs/w", you can do this:
$string = "WKT8100 Cooperative Education Work Term Preparation 15 hrs/w";
$string = trim($string);
$lastSpace = strrpos($string, " ");
$string = trim(substr($string, 0, $lastSpace));
$lastSpace = strrpos($string, " ");
$hours = trim(substr($string, $lastSpace));
$nameID = trim(substr($string, 0, $lastSpace));
That's a way off the top of my head without using regex. I can't give you a regex without first doing some extensive refresher research.
P.S. Jordan's answer looks much cleaner.
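Since the question already has a working hours regex, another option (a sketch, not taken from either answer) is to let preg_match() report where that match starts via the PREG_OFFSET_CAPTURE flag and then cut with substr(). The helper name courseBeforeHours() is hypothetical:

```php
<?php
// Hours pattern copied from the question.
$hoursRegex = '#\s[0-9]{1,2}?\shrs\/w#i';

function courseBeforeHours(string $line, string $hoursRegex): string
{
    // With PREG_OFFSET_CAPTURE, $m[0] is a [matchedText, byteOffset] pair.
    if (preg_match($hoursRegex, $line, $m, PREG_OFFSET_CAPTURE)) {
        return substr($line, 0, $m[0][1]); // everything before the hours
    }
    return $line; // no hours found: keep the line as-is
}

echo courseBeforeHours('WKT8100 Cooperative Education Work Term Preparation 15 hrs/w', $hoursRegex), "\n";
```

Because the hours match begins at the whitespace before the number, the result has no trailing space.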
I need to get everything before "On Sun, May 27, 2012 at 6:25 AM,"
I am hoping to get everything before "On xxx, xxx xx, xxxx at xx:xx xx,"
The problem here is that May, 27, and 6 are all variable in length. What is the best tool for this job? Due to my lack of experience with regex I am trying to use explode(), but it doesn't appear that it can do the job here. Is regex my best option?
[EDIT]
I ended up using a combination of answers. I went with:
preg_match("/(.*)On\s+(Sun|Sat|Fri|Thu|Wed|Tue|Mon),\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d?\d,\s+\d{4}\s+at\s+\d?\d:\d\d\s+[AP]M,/i", $to, $end);
Something like this, I guess:
/On\s+(Sun|Sat|Fri|Thu|Wed|Tue|Mon),\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d?\d,\s+\d{4}\s+at\s+\d?\d:\d\d\s+[AP]M,/i
[EDIT]
As per the comment: I have added support for case insensitivity (by adding the i modifier to the end of the regex). I have also changed the spaces in the expression to \s to allow any whitespace character, and added + to allow multiple spaces between words.
I haven't changed it to support long day names or short month names, as the question specified that the month name was variable in length but didn't specify the day name as being variable. However, it should be trivial to add these variants if required.
[EDIT]
$to = "Let me know how this response looks..... On Sun, May 27, 2012 at 6:25 AM, Pr";
preg_match("/On\s+(Sun|Sat|Fri|Thu|Wed|Tue|Mon),\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d?\d,\s+\d{4}\s+at\s+\d?\d:\d\d\s+[AP]M,/i", $to, $end);
This code works for the example given in your comment.
Hope that helps.
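Since the goal is everything before the date line, one more option (a sketch, not from the answers here) is preg_split() with a limit of 2: element 0 of the result is the text before the first match, and if there is no match it is simply the whole string. The pattern is the one used above; $before is a made-up name.

```php
<?php
$to = "Let me know how this response looks..... On Sun, May 27, 2012 at 6:25 AM, Pr";

// Same date pattern as in the answer above.
$datePattern = '/On\s+(Sun|Sat|Fri|Thu|Wed|Tue|Mon),\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d?\d,\s+\d{4}\s+at\s+\d?\d:\d\d\s+[AP]M,/i';

// Limit 2: split only at the first date line; element 0 is everything before it.
$parts = preg_split($datePattern, $to, 2);
$before = rtrim($parts[0]);
echo $before, "\n";
```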
preg_match('/(.*?) On \w+, \w+ \d?\d, \d+ at \d?\d:\d?\d \w\w,/', 'grab this text here On Sun, May 27, 2012 at 6:25 AM,', $matches);
echo $matches[1];
// echoes 'grab this text here'
(.*?) lazily matches everything at the beginning, \w+ matches one or more word characters (letters, digits or underscore), and \d?\d matches either one or two digits.
A regular expression would work, since that's what it was made for: selecting data based on a pattern. You could, however, explode on ',' (comma) and just implode the first 3 elements together again to form your sentence. I doubt a regular expression would be faster in this case.
Ultimately it's your preference: which is better readable and understandable by you.
The main advantage a regular expression has in this particular case is that it can extract specific values/patterns, so you could easily have it set aside the month, for instance.
$dateString = "On Sun, May 27, 2012 at 6:25 AM, some other text here";
// using explode/implode
$result = explode(',',$dateString);
print "we got: " . implode(',', array_slice($result,0,3)) . "\n";
// using regular expression
$pattern = "/On [A-Za-z]{3}, [A-Za-z]{3} [0-9]+, [0-9]{4} at [0-9:]+ (?:A|P)M/U";
preg_match($pattern,$dateString,$match);
print "We got: " . $match[0] . "\n";
Please also read the Regular Expressions subsection of the PHP manual, together with an introductory tutorial.
Personally, in this case I think a regular expression might be overkill, both visually and performance-wise. Do learn regular expressions though; they can be very helpful at times.
PHP regex is a weakness of mine, but I still manage to get some things done with online tools. Consider the following:
A subject string which generally follows this pattern: 1551 UTC 04 June 2012
I want to extract the "04" and assign it to the $day variable using below:
$day = preg_replace("/^([0-9]{4})\s([A-Z]{3})\s([0-9]{2})\s([A-Za-z]{3,})\s([0-9]{4})$/", "$3", $weather['date']);
This works on the following website: http://sqa.fyicenter.com/Online_Test_Tools/Test_Regular_Expression_Search_Replace.php
but I can't get it to work in my script... $day would equal the whole subject string.
The result of your var_dump() is string(38) "1551 UTC 04 June 2012 ". It has 38 characters, while it should have only 21, so it looks like there is extra whitespace in the string.
Try to trim() your input string and replace \s with \s+ to allow runs of whitespace:
$day = preg_replace("/^([0-9]{4})\s+([A-Z]{3})\s+([0-9]{2})\s+([A-Za-z]{3,})\s+([0-9]{4})$/", "$3", trim($weather['date']));
You say preg_replace, but I think you want preg_match(). Is it correct that you don't want to replace the "04", but just want to put it into the variable $day? If so, use preg_match(). Also, in your description you say you want to capture only the "04" part, but your regex has many capture groups (anything within "()" is a capture group and will be returned in the array you give to preg_match).
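Here is a sketch of the preg_match() route with a single capture group. The real $weather['date'] value isn't shown, so the input below is invented and padded with extra spaces:

```php
<?php
// Hypothetical input padded with extra whitespace.
$date = "1551  UTC 04 June 2012      ";

// Only the day is wrapped in a capturing group; \s+ tolerates runs of whitespace.
if (preg_match('/^\d{4}\s+[A-Z]{3}\s+(\d{2})\s+[A-Za-z]{3,}\s+\d{4}$/', trim($date), $m)) {
    $day = $m[1];
} else {
    $day = null;
}
echo $day, "\n"; // 04
```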