Sorry if the title is confusing. All I'm trying to do is some simple regex:
The text: /thing/images/info.gif
And what I want is: info
My regex (not fully working): ([^\/]+$)(.*?)(?=\.gif)
(Note: [^\/]+$ returns info.gif)
Thanks for any help!
I'd say you don't need to match the whole string, so you can be much more generic. If you know your string always contains a path, you can just use:
preg_match( '/([^\/]+)\.\w+$/', "/thing/images/info.gif", $matches) ;
print_r( $matches );
and it will be valid for any filename, even names that contain dots, like my_file.name.jpg, or spaces, like /thing/images/my image.gif
The structure is (from the end of the regex moving to the left):
$ anchors the match at the end of the string
\.\w+ matches a dot followed by one or more word characters (the extension)
([^\/]+) captures one or more characters that are not a slash (your filename; the last slash is where the directories end)
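As a rough sketch of how that pattern behaves on the cases mentioned above, it can be wrapped in a small helper. The name filename_stem is made up for illustration; pathinfo() is a PHP built-in and is shown only for comparison.

```php
<?php
// Hypothetical helper (not from the answer above): returns the file name
// without its final extension, or null when the path has no extension.
function filename_stem(string $path): ?string
{
    // ([^\/]+) captures the run of non-slash characters before the last dot;
    // \.\w+$ consumes the extension at the very end of the string.
    if (preg_match('/([^\/]+)\.\w+$/', $path, $matches)) {
        return $matches[1];
    }
    return null;
}

echo filename_stem('/thing/images/info.gif'), "\n";      // info
echo filename_stem('/thing/images/my image.gif'), "\n";  // my image
echo filename_stem('my_file.name.jpg'), "\n";            // my_file.name

// Regex-free comparison using the built-in pathinfo():
echo pathinfo('/thing/images/info.gif', PATHINFO_FILENAME), "\n"; // info
```

If the input is always a plain file path, pathinfo() may be the simpler choice; the regex is handy when the path is embedded in a longer pattern.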
Not sure how much more complex the string is but this seems to work on the test string:
preg_match('![^/.]+(?=\.gif)!', '/thing/images/info.gif', $m);
Matching NOT / NOT . followed by .gif.
In editors (Sublime):
Find:^(.*)(\/)(.*)(\.)(.*)$
Replace it with:\3
In PHP:
<?php
preg_match('/^(.*)(\/)(.*)(\.)(.*)$/', '/thing/images/info.gif', $match);
echo $match[3];
I'm sure I'm missing something. I know just enough to be dangerous.
In my php code I use file_get_contents() to put a file into a variable.
I then loop through an array and use preg_match to search the same variable many times. The file is a tab-delimited txt file. It does fine 800 times but one time randomly in the middle it does something very odd.
$current = file_get_contents($file);
foreach($blahs as $blah){
$image = 'somefile.jpg';
$pattern = '/https:\/\/www\.example\.com\/media(.*)\/' . preg_quote($image) . '/';
preg_match($pattern, $current, $matches);
echo $matches[0];
}
For some reason, that one time it returns two URLs with a tab between them. When I look at the txt file, the image I'm looking for is listed first, followed by the second image, but echo $matches[0] returns them in reverse order. The text does not exist in the file the way echo $matches[0] returns it. It would be as if you searched the string 'one two' and $matches returned 'two one'.
The regex engine is trying to do you a favor and capture the longest match. The \t tab between the two urls is being matched by the . (dot / any character).
Demonstration:
$blah='test case: https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg some text';
$image = 'fish.jpg';
$your_pattern = '/https:\/\/www\.example\.com\/media(.*)\/'.preg_quote($image).'/';
echo preg_match($your_pattern,$blah,$matches)?$matches[0]:'fail';
echo "\n----\n";
$my_pattern='~https://www\.example\.com/media(?:[^/\s]*/)+'.preg_quote($image).'~';
echo preg_match($my_pattern,$blah,$out)?$out[0]:'fail';
Output:
https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg
----
https://www.example.com/media/cat/fish.jpg
To crystallize...
test case: https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg some text
// your (.*) is matching ---------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
My suggested pattern (I may be able to refine it if you provide some sample strings) uses (?:[^/\s]*/)+ instead of the (.*).
My non-capturing group breaks down like this:
(?: #start non-capturing group
[^/\s]* #greedily match zero or more non-slash, non-whitespace characters
/ #match a slash
) #end non-capturing group
+ #allow the group to repeat one or more times
*note1: You can use \t where I use \s if you want to be more literal, I am using \s because a valid url shouldn't contain a space anyhow. You may make this adjustment in your project without any loss of accuracy.
*note2: Notice that I changed the pattern delimiters to ~ so that / doesn't need to be escaped inside the pattern.
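A minimal sketch of the suggested pattern wrapped in a reusable function, assuming the host and /media prefix from the question. The function name find_media_url is hypothetical, and passing '~' as the second argument to preg_quote() is an addition here so that a delimiter inside the file name would also be escaped.

```php
<?php
// Hypothetical helper: find the URL of $image under the media directory.
// The host and the /media prefix are taken from the question's example.
function find_media_url(string $haystack, string $image): ?string
{
    // (?:[^/\s]*/)+ walks directory by directory and cannot cross whitespace,
    // so the match can never span two tab-separated URLs.
    $pattern = '~https://www\.example\.com/media(?:[^/\s]*/)+'
             . preg_quote($image, '~') . '~';
    return preg_match($pattern, $haystack, $out) ? $out[0] : null;
}

// Two URLs separated by a tab, as in the tab-delimited file.
$line = "https://www.example.com/media/foo/bar.jpg\thttps://www.example.com/media/cat/fish.jpg";
echo find_media_url($line, 'fish.jpg'), "\n";
// https://www.example.com/media/cat/fish.jpg
```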
I am looking for a way to get a valid url out of a string like:
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
My original solution was:
preg_match('#^[^:|]*#', str_replace('//', '/', $string), $modifiedPath);
But obviously it's going to remove a slash from the http:// as well, instead of just the one in the middle of the string.
My expected output that I want from the original is:
http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
I could always break off the http part of the string first but would like a more elegant solution in the form of regex if possible. Thanks.
This will do exactly what you are asking:
<?php
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
preg_match('/^([^|]+)/', $string, $m); // get everything up to and NOT including the first pipe (|)
$string = $m[1];
$string = preg_replace('/(?<!:)\/\//', '/' ,$string); // replace all occurrences of // as long as they are not preceded by :
echo $string; // outputs: http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
exit;
?>
EDIT:
(?<!X) in regular expressions is the syntax for what is called a lookbehind. The X is replaced with the character(s) we are testing for.
The following expression would match every instance of double slashes (//):
\/\/
But we need to make sure that the match we are looking for is NOT preceded by the : character so we need to 'lookbehind' our match to see if the : character is there. If it is then we don't want it to be counted as a match:
(?<!:)\/\/
The ! is what says NOT to match in our lookbehind. If we changed it to (?<=:)\/\/ (a positive lookbehind), then it would only match the double slashes that did have the : preceding them.
A quick tutorial on lookahead and lookbehind can explain it all better than I can.
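To make the difference between the two lookbehinds concrete, here is a small sketch that reports where each pattern matches in a shortened version of the URL. The helper name match_offsets is made up for this example.

```php
<?php
// Hypothetical helper: byte offsets of every match of $pattern in $subject.
function match_offsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matched text, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

$url = 'http://somesite.com/directory//sites';

// Negative lookbehind: "//" NOT preceded by ":" (the one after "directory").
print_r(match_offsets('/(?<!:)\/\//', $url)); // one match, at offset 29

// Positive lookbehind: "//" preceded by ":" (the one after "http:").
print_r(match_offsets('/(?<=:)\/\//', $url)); // one match, at offset 5
```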
Assuming all your strings are in the form given, you don't need any but the simplest of regexes to do this; if you want an elegant solution, then a regex is definitely not what you need. Also, double slashes are legal in a URL, just like in a Unix path, and mean the same thing a single slash does, so you don't really need to get rid of them at all.
Why not just
$url = preg_split('/\|/', $string)[0]; // index the result directly; array_shift() expects a variable, not a function result
?
If you really, really care about getting rid of the double slashes in the URL, then you can follow this with
$url = preg_replace('/([^:])\/\//', '$1/', $url);
or even combine them into
$url = preg_replace('/([^:])\/\//', '$1/', preg_split('/\|/', $string)[0]);
although that last form gets a little bit hairy.
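One way to keep the combined form readable is to put the two steps in a function. This is only a sketch: the name clean_url is hypothetical, and explode() with a limit is swapped in for preg_split() because the separator is a fixed character.

```php
<?php
// Hypothetical helper combining the two steps described above.
function clean_url(string $raw): string
{
    // Keep everything before the first pipe; a limit of 2 stops explode()
    // from splitting the rest of the string.
    $url = explode('|', $raw, 2)[0];
    // Collapse "//" to "/" unless it directly follows a colon (as in "http://").
    // Fall back to the unmodified URL if preg_replace() reports an error.
    return preg_replace('~(?<!:)//~', '/', $url) ?? $url;
}

$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
echo clean_url($string), "\n";
// http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
```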
Since this is a quite strictly defined situation, I'd consider just one preg to be the most elegant solution.
Off the top of my head:
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
Basically, what this does is look for any forward slash that IS NOT preceded by a colon (:) and IS followed by another forward slash. It also matches the first pipe character and everything following it.
Anything found is removed from the result.
I can explain the RegEx in more detail if you like.
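For reference, here is the same single-pass pattern written with the x (extended) modifier so each part can carry a comment. This is a sketch only; the function name sanitize_url is an assumption made for the example.

```php
<?php
// Hypothetical wrapper around the single preg_replace() shown above.
function sanitize_url(string $rawUrl): string
{
    // With the x modifier, whitespace in the pattern is ignored and
    // "#" starts a comment that runs to the end of the line.
    $pattern = '~
        (?<!:) / (?=/)   # a slash not preceded by a colon and followed by a slash
        |                # or
        \| .+            # the first pipe and everything after it
    ~x';
    // Fall back to the raw input if preg_replace() reports an error.
    return preg_replace($pattern, '', $rawUrl) ?? $rawUrl;
}

$rawURL = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
echo sanitize_url($rawURL), "\n";
// http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
```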
I want to match parts of a string that start with a certain character (asterisk):
abc*DEFxyz => *DEF
abc*DE*Fxyz => *DE, *F
I tried preg_match_all('/[$\*A-Z]+/', $string, $matches); but it doesn't seem to work. I get *DE*F on the 2nd example.
Change your regex to this :
\*[A-Z]+
http://regexr.com?34itc
Your regex, [$\*A-Z]+, is a single character class: it matches any run of $, * and A-Z characters in any order, and says nothing about the run having to start with *.
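A quick sketch of the corrected pattern applied to both examples from the question; the function name starred_parts is made up for illustration.

```php
<?php
// Hypothetical helper: every "*" followed by one or more uppercase letters.
function starred_parts(string $string): array
{
    preg_match_all('/\*[A-Z]+/', $string, $matches);
    return $matches[0];
}

echo implode(', ', starred_parts('abc*DEFxyz')), "\n";  // *DEF
echo implode(', ', starred_parts('abc*DE*Fxyz')), "\n"; // *DE, *F
```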
Try:
^[^*]*\*
which says "from the start of the line, skip over all non-asterisk characters and stop at the first asterisk"
Extending this:
s/^[^*]*\*(.*)/
Will return the remainder of the string after the asterisk. To include the asterisk, adjust like this
s/^[^*]*(\*.*)/
Here's a great tool for checking your regex: http://gskinner.com/RegExr/
Hope this helps
I got the following URL
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
and I want to extract
B000NO9GT4
that is the ASIN. So far I can search within the string, but not in the way I require. I saw the split function and I saw explode, but I can't find a way out. Also, the URLs will differ in length, so I can't hardcode the length either. The only thing that makes some sense in my mind is to split the string so that
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/
become first part
and
B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
becomes the second part; from the second part, I should extract B000NO9GT4.
In the same way, I would want to get the product name LEGO-Ultimate-Building-Set-Pieces from the first part.
I am very bad at regex and can't find a way out.
Can somebody guide me on how I can do this in PHP?
thanks
This grabs both pieces of information that you are looking to capture:
$url = 'http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego';
$path = parse_url($url, PHP_URL_PATH);
if (preg_match('#^/([^/]+)/dp/([^/]+)/#i', $path, $matches)) {
echo "Description = {$matches[1]}<br />"
."ASIN = {$matches[2]}<br />";
}
Output:
Description = LEGO-Ultimate-Building-Set-Pieces
ASIN = B000NO9GT4
Short Explanation:
Any expressions enclosed in ( ) will be saved as a capture group. This is how we get at the data in $matches[1] and $matches[2].
The expression ([^/]+) says to match all characters EXCEPT / so in effect it captures everything in the URL between the two / separators. I use this pattern twice. The [ ] actually defines the character class which was /, the ^ in this case negates it so instead of matching / it matches everything BUT /. Another example is [a-f0-9] which would say to match the characters a,b,c,d,e,f and the numbers 0,1,2,3,4,5,6,7,8,9. [^a-f0-9] would be the opposite.
# is used as the delimiter for the expression
^ following the delimiter means match from the beginning of the string.
See www.regular-expressions.info and PCRE Pattern Syntax for more info on how regexps work.
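As a variation on the capture groups explained above, the groups can be named so the result reads like a record. This is a sketch under assumptions: the function name parse_product_url and the group names are invented here, and (?:/|$) is an addition so that a URL ending right after the ASIN would still match.

```php
<?php
// Hypothetical helper: returns ['description' => ..., 'asin' => ...] or null.
function parse_product_url(string $url): ?array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no usable path component
    }
    // (?<name>...) is a named capture group; (?:/|$) accepts either a
    // following slash or the end of the path after the ASIN.
    if (preg_match('#^/(?<description>[^/]+)/dp/(?<asin>[^/]+)(?:/|$)#i', $path, $m)) {
        return ['description' => $m['description'], 'asin' => $m['asin']];
    }
    return null;
}

$url = 'http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego';
print_r(parse_product_url($url));
```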
You can try
$str = "http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego" ;
list(,$desc,,$num,) = explode("/",parse_url($str,PHP_URL_PATH));
var_dump($desc,$num);
Output
string 'LEGO-Ultimate-Building-Set-Pieces' (length=33)
string 'B000NO9GT4' (length=10)
I'm new at regular expressions and wonder how to phrase one that collects everything after the last /.
I'm extracting an ID used by Google's GData.
my example string is
http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123
Where the ID is: p1f3JYcCu_cb0i0JYuCu123
Oh and I'm using PHP.
This matches at least one of (anything not a slash) followed by end of the string:
[^/]+$
Notes:
No parens because it doesn't need any groups - result goes into group 0 (the match itself).
Uses + (instead of *) so that if the last character is a slash it fails to match (rather than matching empty string).
But, most likely a faster and simpler solution is to use your language's built-in string list processing functionality - i.e. ListLast( Text , '/' ) or equivalent function.
For PHP, the closest function is strrchr which works like this:
strrchr( Text , '/' )
This includes the slash in the results - as per Teddy's comment below, you can remove the slash with substr:
substr( strrchr( Text, '/' ), 1 );
Generally:
/([^/]*)$
The data you want would then be the match of the first group.
Edit: Since you're using PHP, you could also use strrchr, which returns everything from the last occurrence of a character in a string up to the end. Or you could use a combination of strrpos and substr: first find the position of the last occurrence, then get the substring from that position up to the end. Or use explode and array_pop: split the string at the / and take just the last part.
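The three non-regex options can be sketched side by side as follows, assuming the string contains at least one slash (strrchr and strrpos return false otherwise). The variable names are illustrative only.

```php
<?php
$url = 'http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123';

// 1. strrchr() returns the last "/" and everything after it; substr() drops the slash.
$viaStrrchr = substr(strrchr($url, '/'), 1);

// 2. strrpos() finds the position of the last "/"; substr() takes the rest.
$viaStrrpos = substr($url, strrpos($url, '/') + 1);

// 3. explode() splits at every "/"; array_pop() needs a variable, hence $parts.
$parts = explode('/', $url);
$viaExplode = array_pop($parts);

echo $viaStrrchr, "\n"; // p1f3JYcCu_cb0i0JYuCu123
var_dump($viaStrrchr === $viaStrrpos && $viaStrrpos === $viaExplode); // bool(true)
```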
You can also get the "filename", or the last part, with the basename function.
<?php
$url = 'http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123';
echo basename($url); // "p1f3JYcCu_cb0i0JYuCu123"
On my box I could just pass the full URL. It's possible you might need to strip off http:/ from the front.
Basename and dirname are great for moving through anything that looks like a unix filepath.
/^.*\/(.*)$/
^ = start of the line
.*\/ = greedy match from the start of the line up to the last occurrence of /
(.*) = group containing everything that comes after the last occurrence of /
You can also use a normal string split:
$str = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123";
$s = explode("/",$str);
print end($s);
This pattern will not capture the last slash in $0, and it won't match anything if there's no characters after the last slash.
/(?<=\/)([^\/]+)$/
Edit: but it requires lookbehind, not supported by ECMAScript (Javascript, Actionscript), Ruby or a few other flavors. If you are using one of those flavors, you can use:
/\/([^\/]+)$/
But it will capture the last slash in $0.
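A short sketch of what ends up in $0 and $1 for each of the two patterns, run through preg_match on the URL from the question. The variable names are made up for the example.

```php
<?php
$url = 'http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123';

// With lookbehind: the slash is checked but not consumed, so $0 has no slash.
preg_match('/(?<=\/)([^\/]+)$/', $url, $withLookbehind);

// Without lookbehind: the slash is part of the match, so use group 1 instead.
preg_match('/\/([^\/]+)$/', $url, $withoutLookbehind);

echo $withLookbehind[0], "\n";    // p1f3JYcCu_cb0i0JYuCu123
echo $withoutLookbehind[0], "\n"; // /p1f3JYcCu_cb0i0JYuCu123
echo $withoutLookbehind[1], "\n"; // p1f3JYcCu_cb0i0JYuCu123
```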
Not a PHP programmer, but strrpos seems a more promising place to start. Find the rightmost '/', and everything past that is what you are looking for. No regex used.
Find position of last occurrence of a char in a string
Based on @Mark Rushakoff's answer, the best solution for different cases:
<?php
$path = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123?var1&var2#hash";
$vars = strrchr($path, "?"); // ?var1&var2#hash
var_dump(preg_replace('/'. preg_quote($vars, '/') . '$/', '', basename($path))); // string(23) "p1f3JYcCu_cb0i0JYuCu123"
?>
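An alternative sketch for the same case lets parse_url() drop the query string and fragment before basename() runs. The function name last_path_segment is hypothetical, and this assumes the input is a URL that parse_url() can handle.

```php
<?php
// Hypothetical helper: last path segment of a URL, ignoring ?query and #fragment.
function last_path_segment(string $url): string
{
    // PHP_URL_PATH returns only the path component, or null/false if there is none.
    $path = parse_url($url, PHP_URL_PATH);
    return is_string($path) ? basename($path) : '';
}

$path = "http://spreadsheets.google.com/feeds/spreadsheets/p1f3JYcCu_cb0i0JYuCu123?var1&var2#hash";
echo last_path_segment($path), "\n"; // p1f3JYcCu_cb0i0JYuCu123
```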