Selecting certain links with a REGEX - php

I'm building a "Wiki Game" in PHP, and I'd like to match all the links in a string that start with /wiki/something, for example /wiki/Chiffrement_RSA or /wiki/OSS_117_:_Le_Caire,_nid_d%27espions. I only know a few things about regex, so I'm stuck. If someone could help me, that would be nice.
So far, I just have \/wiki\/*...
Thanks for your help!

You can do this with a regex or with strpos():
<?php
$originalLink = '/wiki/Chiffrement_RSA';
$find = '/wiki/';
$statusLink = strpos($originalLink, $find);
// Note the use of ===. A plain == would not work as expected, because
// strpos() returns 0 when the match starts at the first character,
// and 0 == false is true.
if ($statusLink === false) {
    echo "Not the link that you want";
} else {
    echo "You found the link";
}
// Or with explode(): '/wiki/X' splits into ['', 'wiki', 'X']
$link = explode('/', $originalLink);
if (isset($link[1], $link[2]) && $link[1] === 'wiki') {
    // this is one of your links
}
?>
I don't use pure regex much unless it's really necessary.

You can reduce your output array size by 50% using \K in your pattern. It eliminates the need for a capture group and puts your desired substrings in the full-match array.
Pattern:
\/wiki\/\K[^ ]+
\K says "start the full-string match from here", so no memory is spent on a separate capture-group array. It may be a micro-improvement, but I believe it to be best practice and I think more people should use it.
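As a minimal sketch of the difference, the snippet below runs both variants with preg_match_all() on a made-up sample string (the text and the variable names $withGroup and $withK are assumptions for illustration; only the two /wiki/ paths come from the question). It uses ~ as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical sample text; the two /wiki/ paths come from the question.
$text = 'See /wiki/Chiffrement_RSA and /wiki/OSS_117_:_Le_Caire,_nid_d%27espions but not /about/team';

// With a capture group: $withGroup holds two sub-arrays
// (index 0 = full matches including "/wiki/", index 1 = group 1).
preg_match_all('~/wiki/([^ ]+)~', $text, $withGroup);

// With \K: only the full-match sub-array exists, and it already
// contains just the part after "/wiki/".
preg_match_all('~/wiki/\K[^ ]+~', $text, $withK);

print_r($withK[0]);
echo count($withGroup), ' sub-arrays vs ', count($withK), "\n";
```

Both variants yield the same titles; the \K version simply has nothing stored at index 1.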

I finally chose Cody.code's answer with this regex: \/wiki\/([^ ]+).
I will use this code to decide whether to keep a link in an array or not (I will parse my HTML with DOMDocument and get all the <a> elements, which is faster), so the preg_match() solution suits me better than strpos.
Thanks for your help!
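The approach described here could look roughly like the following sketch. The HTML fragment and the $wikiLinks name are invented for illustration, and the ^ anchor is an added assumption (that an href must begin with /wiki/); it is not part of the chosen regex.

```php
<?php
// Hypothetical HTML fragment standing in for a fetched page.
$html = '<p><a href="/wiki/Chiffrement_RSA">RSA</a> '
      . '<a href="https://example.org/">external</a> '
      . '<a href="/wiki/PHP">PHP</a></p>';

// Real-world pages often contain markup that libxml warns about;
// collect those warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$wikiLinks = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep the link only if the href starts with /wiki/ (assumed anchor).
    if (preg_match('~^/wiki/([^ ]+)~', $href, $m)) {
        $wikiLinks[] = $m[1];
    }
}

print_r($wikiLinks);
```

Because each href is tested on its own, the [^ ]+ part never runs past the end of the attribute value.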

Related

Check if text contains url, email and phone number with php and regex

I have a text like, for example: $descrizione = "Tel.+39.1234.567899 asd.test@testwebsite.com
www.testwebsite.com" and I would like to extract three separate variables:
"+39.1234.567899"
"asd.test@testwebsite.com"
"www.testwebsite.com".
To check whether the text contains an email address I use a regex, and I wrote this code:
$regex = '/[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})/';
if (preg_match($regex, $descrizione, $email_is)) {
    for ($e = 0; $e < count($email_is); $e++) {
        if (strpos($email_is[$e], "@") !== false) {
            $linkEmail = $email_is[$e];
        }
    }
}
Now I would like to find the website URL, so I tried:
$regex = '/[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi';
if (preg_match($regex, $descrizione, $matches)) {
    $linkWebsite = $matches[0];
}
but preg_match returns false. I checked the regex on http://regexr.com/ and it is correct there, so I don't understand why it always returns false. Where is the problem? I also tried "/[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/" but I have the same problem, and I tried to catch errors with try/catch but it doesn't report any.
Finally, I would like to find the phone number, but I don't know how to write the regex.
Can someone help me, please?
Your regex fails because it's faulty. You've escaped the slashes (/) with slashes. You should use backslashes:
[-a-zA-Z0-9@:%_\+.~#?&\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/=]*)?
Here at regex101.
Since regexr uses JS regex it doesn't complain, but if you try it at regex101 selecting php you'll easily detect such errors.
About regex for phone numbers - search! E.g https://stackoverflow.com/search?q=%5Bregex%5D+phone+number
I have found the solution; I hope that this can help someone.
preg_match returns only the first result, not all the results it could find.
So if I check the regex on a website like regex101, it shows all the matches, but if I use the same regex in PHP, it returns only one.
The regex option "g" (global = don't stop after the first match) corresponds to the function preg_match_all.
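To make the preg_match versus preg_match_all difference concrete, here is a small sketch using the corrected pattern with only the i modifier (PHP has no g modifier). The sample string is modelled on the question, with the address written with @; the variable names $first, $all and $count are illustrative.

```php
<?php
// Sample text modelled on the question (email written with @).
$descrizione = "Tel.+39.1234.567899 asd.test@testwebsite.com\nwww.testwebsite.com";

// Corrected pattern: slashes escaped with backslashes, "g" removed.
$regex = '/[-a-zA-Z0-9@:%_\+.~#?&\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/=]*)?/i';

// preg_match stops after the first match.
preg_match($regex, $descrizione, $first);

// preg_match_all keeps going, like the "g" option on regex101.
$count = preg_match_all($regex, $descrizione, $all);

echo "First match: {$first[0]}\n";
echo "All matches ($count):\n";
print_r($all[0]);
```

Note that this pattern allows @ in its character class, so on this input it picks up the email address as well as the website; telling the two apart would need an extra check, for example testing each match for @.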

Match words in file with regex php

I'm new to regex and PHP. I know this is quite simple, but I just can't get it. I have a file words.txt that contains:
happy
sad
laugh
I want to match this sentence against the words in words.txt:
I am happy
So far I've tried this, but it isn't valid because it treats the input as a whole sentence, not as words (I haven't implemented the regex yet because I'm confused):
$input0 = "I am happy";
$handle = fopen('words.txt', 'r');
$valid = false;
while (($buffer = fgets($handle)) !== false) {
    if (strpos($buffer, $input0) !== false) { // here's the problem
        $valid = true;
        break;
    }
}
if ($valid == true) {
    // print the matching word
}
fclose($handle);
Can you help me? :(
Depending on your final goal, you may not even need a regexp here, since you want to match entire words with no variable part.
If you are happy to loop over your keywords, a simple str_replace() will do the job, for instance to replace each word with an emphasized version, or a simple if (strpos($input0, $word) !== false) to just check whether the word occurs in the sentence and at which position.
But if you want to avoid a loop, for faster results and especially if you have many words, preg_match_all() will do what you need, as Zanderwar said.
Here is an example:
$input0 = "I am happy but sometimes quite pretty sad. It depends but I prefer to be happy in general.\nMy paragraph also continue on multilines\nend it makes me laugh and rejoy. I am so happy. HAPPY?";
// $contents = file_get_contents('words.txt');
$contents = "happy\nsad\nlaugh";
$words_list = str_replace("\n", '|', $contents);
if (preg_match_all("~($words_list)~si", $input0, $matches)) {
    print_r($matches);
    // Do what you want
}
The i flag makes the match case-insensitive, if you need that.
The s flag (PCRE_DOTALL) makes . also match newlines. This pattern contains no ., so the flag has no effect here and can be dropped; the pattern is matched across multiple lines either way.
[EDIT] More details on the regexp:
The pattern needs a delimiter. ~ is a good choice because it is seldom used in the sentences and strings being matched, so you won't need to escape / as you would with the / delimiter.
I am joining your words as ~(sad|joy|happy)~, which captures the matched word. If you don't need the capture, use a non-capturing group: (?:sad|joy|happy).
The | means "or".
You can replace the regex ~($words_list)~si with ~(?:$words_list)~si if you don't need capturing, and here you don't: the $matches array then has only one level, at position [0], which always holds the full match. There is no more complex pattern here, so there is nothing else to capture.
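One refinement worth considering, sketched below under a few assumptions: the word list is hard-coded as a stand-in for reading words.txt, each word is passed through preg_quote() in case the file ever contains regex metacharacters, and \b word boundaries are added so that "happy" does not also match inside "unhappy". None of this is part of the answer above; the sample sentence is made up.

```php
<?php
// Stand-in for: file('words.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
$words = ['happy', 'sad', 'laugh'];

// Hypothetical input sentence.
$input0 = 'I am happy, not unhappy or sad.';

// Escape each word so it is treated literally inside the pattern.
$quoted = array_map(function ($w) {
    return preg_quote($w, '~');
}, $words);

// Non-capturing alternation wrapped in word boundaries, case-insensitive.
$pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';

preg_match_all($pattern, $input0, $matches);
print_r($matches[0]);
```

Reading the file with file() and FILE_SKIP_EMPTY_LINES also avoids an empty alternative (a trailing |) when words.txt ends with a newline.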

PHP preg_match regular expression for find date in string

I'm trying to build a system that can detect a date in a string. Here is the code:
$string = "02/04/16 10:08:42";
$pattern = "/\<(0?[1-9]|[12][0-9]|3[01])\/\.- \/\.- \d{2}\>/";
$found = preg_match($pattern, $string);
if ($found) {
    echo ('The pattern matches the string');
} else {
    echo ('No match');
}
The result I get is "No match". I don't think I used the correct regex for the pattern. Can somebody tell me what I must do to fix this code?
First of all, remove all the gibberish from the pattern. This is the part you'll need to work on:
(/0?[1-9]|[12][0-9]|3[01]/)
(As you said, you need the date only, not the datetime.)
The main problem with the pattern is that you are using the logical OR operator (|) where the delimiters belong. If the delimiters are slashes, you need to replace the pipe characters with escaped slashes. They must be escaped because otherwise the parser treats them as the pattern delimiter rather than as literal characters. Like this: \/.
Now you need to solve some logical tasks to match the numbers correctly, and you're good to go.
(I'm not going to solve the homework for you :) )
These articles will help you solve the problem, though:
Character classes
Repetition operators
Special characters
Pipe character (alternation operator)
Good luck!
In your comment you say you are looking for yyyy, but the example uses yy.
I wrote the code for yy because that is what you gave us; you can easily change the last 2 to a 4 to match yyyy.
preg_match("/((0|1|2|3)[0-9])\/\d{2}\/\d{2}/", $string, $output_array);
echo $output_array[0]; // the full date; [1] holds only the first two digits
Edit:
If you use this pattern it will match the time too, which makes a wrong match less likely (escape the slashes as \/ if you keep / as the delimiter).
((0|1|2|3)[0-9])/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}
http://www.phpliveregex.com/p/fjP
Edit 2:
Also, you can skip one line of code.
You first assign the preg_match result to $found and then test $found in an if.
This works too:
if (preg_match($pattern, $string, $found)) {
    echo $found[0];
} else {
    echo "nothing found";
}
With the pattern and string as given above.
As you can see, $found is passed to preg_match as the output array, so the if is true whenever there is a match.
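For reference, a stricter variant could look like the sketch below. It assumes the format is day/month/two-digit year (the question does not say which comes first) and that two-digit years mean 20yy; both are assumptions, as are the variable names. checkdate() is then used to reject impossible dates such as 31/02.

```php
<?php
$string = "02/04/16 10:08:42";

// Assumed layout: dd/mm/yy. ~ is the delimiter, so / needs no escaping.
$pattern = '~\b(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[0-2])/(\d{2})\b~';

if (preg_match($pattern, $string, $m)) {
    [$date, $day, $month, $year] = $m;
    // checkdate() takes month, day, year. 2000 + yy is an assumption.
    $isReal = checkdate((int) $month, (int) $day, 2000 + (int) $year);
    echo "Found $date", $isReal ? '' : ' (not a real calendar date)', "\n";
} else {
    echo "No match\n";
}
```

The regex only limits the digit ranges; it is checkdate() that knows how many days each month has.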

Regex formatting and design for a query

I'm having some trouble writing a regular expression for my PHP code using preg_match().
I have a simple string that usually looks like this:
"q?=%23asdf".
I want my regular expression to match only if the string begins with "q?=%23" and there is at least one character after the 3. One of the problems I've had so far is that the ? is interpreted by the regex engine, so something like
^q?=23 doesn't work. I'm also having trouble matching what comes next (I can't figure out how to require something after the 3).
So for clarification: "q?=%23asd" should PASS and "q?=%23" should FAIL.
I'm no good with regex, so sorry if this seems like a beginner question, and thanks in advance.
Just use a lookahead to check whether the character following the 3 is a letter:
^q\?=%23(?=[a-zA-Z])
Use . instead of [A-Za-z] only if you want to accept any character after the 3:
^q\?=%23(?=.)
The code would be:
$theregex = '~^q\?=%23(?=[a-z])~i';
if (preg_match($theregex, $yourstring)) {
    // Yes! It matches!
} else {
    // nah, no luck...
}
So the requirement is: start with q?=%23, followed by at least one [a-z]. The pattern could look like:
$pattern = '/^q\?=%23[a-z]+/i';
This uses the i (PCRE_CASELESS) modifier. Also see the example at regex101.
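A quick way to check the pattern against the PASS and FAIL cases from the question is sketched below. The third candidate, with a leading x, is a made-up extra to show the effect of the ^ anchor; the $results array is only there for illustration.

```php
<?php
$pattern = '/^q\?=%23[a-z]+/i';

$results = [];
foreach (['q?=%23asd', 'q?=%23', 'xq?=%23asd'] as $candidate) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$candidate] = preg_match($pattern, $candidate) === 1;
    echo $candidate, ' => ', $results[$candidate] ? 'PASS' : 'FAIL', "\n";
}
```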
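$string = "q?=%23asdf";
var_dump(figureOut($string));
function figureOut($string) {
    // Use === 0: the prefix must sit at the very start of the string.
    // With == 0, a false result (prefix not found) would also pass.
    // The prefix is 6 characters, so anything longer has a character after it.
    return strpos($string, 'q?=%23') === 0 && strlen($string) > 6;
}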

compare portion of the string using php

I want to check whether the search keyword 'cli', 'ent' or 'cl' exists in the string 'client', case-insensitively. I used the preg_match function with the pattern '\bclient\b', but it does not give the correct result: I get "match not found".
Can anyone please help?
Thanks
I wouldn't use regular expressions for this, it's extra overhead and complexity where a regular string function would suffice. Why not go with stripos() instead?
$str = 'client';
$terms = array('cli', 'ent', 'cl');
foreach ($terms as $t) {
    if (stripos($str, $t) !== false) {
        echo "$t exists in $str";
        break;
    }
}
Try the pattern /cli?|ent/
Explanation:
cli? matches the first part: the ? makes the i optional, so it matches cl or cli.
| means or, so the pattern matches cl, cli, or ent.
\b is a word boundary. With it, the pattern would not match cli inside client, so you need to remove the \b.
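A short sketch of both points, assuming the i modifier is added to cover the case-insensitive requirement from the question (the answer's pattern does not include it) and using a capitalized sample string for illustration:

```php
<?php
$str = 'Client';

// Alternation without word boundaries, case-insensitive.
$hit = preg_match('/cli?|ent/i', $str, $m);

// The same fragment wrapped in \b fails: "Cli" is followed by "e",
// which is a word character, so there is no boundary after it.
$bounded = preg_match('/\bcli\b/i', $str);

echo $hit ? "Matched: {$m[0]}\n" : "No match\n";
echo $bounded ? "Bounded pattern matched\n" : "Bounded pattern did not match\n";
```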
