I am trying to find if a string is of a format <initial_part_of_name>_0.pdf i.e.
Find if it ends with .pdf (could be eliminated using rtrim)
The initial part is followed by an _ underscore.
The underscore is followed by a whole number (0, 1, 2, ... , etc.)
What would be the best way to achieve this? I have tried combinations of string functions such as strpos (to find the position), but could not work out how to search from the end of the string.
Any pointers would be appreciated!
Edit:
Sample strings:
public://Big_Data_Tutorial_part4_0.pdf
public://Big_Data_Tutorial_part4_1.pdf
public://Big_Data_Tutorial_part4_3.pdf
The reason why I need to check is to avoid duplicate files which are stored with the _<number> appended.
You can use the preg_match() function to match a pattern:
preg_match("/(.*)_(\d+)\.pdf$/", "<initial_part_of_name>_0.pdf",$arr);
In $arr[1], you will get the <initial_part_of_name>.
In $arr[2], you will get the number after the underscore.
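The same idea can be wrapped in a small function. This is only a sketch: the name split_numbered_pdf() and the shape of the returned array are assumptions made for illustration, and the pattern is anchored with ^ and given the i flag so that ".PDF" is accepted as well.

```php
<?php
// Hypothetical helper: splits "name_<number>.pdf" into its base name and number.
// Returns null when the file name does not follow that pattern.
function split_numbered_pdf(string $name): ?array
{
    // (.+)   greedy, so only the LAST "_<digits>" is treated as the counter
    // (\d+)  one or more digits
    // /i     also accept ".PDF"
    if (preg_match('/^(.+)_(\d+)\.pdf$/i', $name, $m) !== 1) {
        return null;
    }
    return ['base' => $m[1], 'number' => (int) $m[2]];
}

var_dump(split_numbered_pdf('public://Big_Data_Tutorial_part4_0a.pdf')); // NULL
```

For 'public://Big_Data_Tutorial_part4_1.pdf' the base is 'public://Big_Data_Tutorial_part4' and the number is 1, which is enough to compare a new upload against an existing file name.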
A non-array, non-regex way:
$str1 = "public://Big_Data_Tutorial_part4_0a.pdf"; // no match because 0a
$str2 = "public://Big_Data_Tutorial_part4_1.pdf"; // match
$str3 = "public://Big_Data_Tutorial_part4_3.pdf"; // match
$last_part = strrchr($str1, "_");
if (trim(strstr($last_part, ".", true), "_0..9") == "" && strstr($last_part, ".") == ".pdf") {
echo "match";
}
$str = '<initial_part_of_name>_0.pdf';
$exploded = explode('.', $str);
echo $exploded[1];
This could be done using regex. Something like
^.+_\d+\.pdf$
Implementation:
$filename = '<initial_part_of_name>_0.pdf';
if(preg_match('/^.+_\d+\.pdf$/i', $filename)){
// name pattern Matched
}
SOLUTION 2
use pathinfo()
$filename = '<initial_part_of_name>_0.pdf';
$path_parts = pathinfo($filename);
if(strtolower($path_parts['extension']) == 'pdf') {
if(preg_match('/_\d+$/', $path_parts['filename'])){
// name pattern Matched
}
} else {
// Not a PDF file
}
I have a user-input string with 2 comma-delimited integers.
Example (OK):
3,5
I want to reject any user input that contains leading 0's for either number.
Examples (Bad):
03,5
00005,3
05,003
Now what I could do is separate the two numbers into 2 separate strings and use ltrim on each one, then see if they have changed from before ltrim was executed:
$string = "03,5";
$string_arr = explode(",",$string);
$string_orig1 = $string_arr[0];
$string_orig2 = $string_arr[1];
$string_mod1 = ltrim($string_orig1, '0');
$string_mod2 = ltrim($string_orig2, '0');
if (($string_mod1 !== $string_orig1) || ($string_mod2 !== $string_orig2)){
// One of them had leading zeros!
}
..but this seems unnecessarily verbose. Is there a cleaner way to do this? Perhaps with preg_match?
You could shorten the code and check if the first character of each part is a zero:
$string = "03,5";
$string_arr = explode(",",$string);
if ($string_arr[0][0] === "0" || $string_arr[1][0] === "0") {
echo "not valid";
} else {
echo "valid";
}
Here is one approach using preg_match. We can try matching for the pattern:
\b0\d+
The \b is a word boundary; here it matches at the start of the string or right after the comma separator.
If we find such a match, it means that we found one or more numbers in the CSV list (or a single number, if only one number present) which had a leading zero.
$input = "00005,3";
if (preg_match("/\b0\d+/", $input)) {
echo "leading zero found";
}
You can do a simple check that if the first character is 0 (using [0]) or that ,0 exists in the string
if ( $string[0] == "0" || strpos($string, ",0") !== false ) {
// One of them had leading zeros!
}
All the current answers fail if any of the values are simply 0.
You can just convert to integer and back and compare the result.
$arr = explode(',', $input);
foreach($arr as $item) {
if( (string)intval($item) !== $item ) {
oh_noes();
}
}
However I am more curious as to why this check matters at all.
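A minimal sketch of that round-trip check as a reusable function follows. The name is_valid_pair() is hypothetical, and the extra ctype_digit() test is an addition that is not in the answer above; it rejects empty parts, signs and whitespace before the integer comparison.

```php
<?php
// Hypothetical helper: true when $input is "<int>,<int>" with no leading zeros.
// A lone "0" is accepted; "03", "00", "" and non-digit parts are rejected.
function is_valid_pair(string $input): bool
{
    $parts = explode(',', $input);
    if (count($parts) !== 2) {
        return false;
    }
    foreach ($parts as $part) {
        // ctype_digit() rules out "", signs and whitespace;
        // the int round trip then rules out leading zeros.
        // Numbers too large for a PHP int also fail the round trip.
        if (!ctype_digit($part) || (string) (int) $part !== $part) {
            return false;
        }
    }
    return true;
}

var_dump(is_valid_pair('3,5'));  // bool(true)
var_dump(is_valid_pair('03,5')); // bool(false)
var_dump(is_valid_pair('0,5'));  // bool(true)
```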
One way would be with /^([1-9]\d*),([1-9]\d*)$/; a regex that checks that each number starts with a non-zero digit, optionally followed by more digits, with a single comma between the two numbers.
preg_match('/^([1-9]\d*),([1-9]\d*)$/', $input_line, $output_array);
This separates the digits into two groups and explicitly rejects leading zeros. Note that it also rejects a bare 0.
This may be a dupe, but I cannot seem to find a thread which matches this issue. I want to remove all chars from a string after a given sub-string - but the chars and the number of chars after the sub-string is unknown. Most solutions I have found seem to only work for removing the given sub-string itself or a fixed length after a given sub-string.
I have
$str = preg_replace('(.gif*)','.gif$',$str);
Which locates 'blahblah.gif?12345' ok, but I cannot seem to remove the chars after the sub-string '.gif'. I read that $ denotes EOS so I thought this would work, but apparently not. I also tried
'.gif$/'
and simply
'.gif'
It can be done without regex:
echo substr('blahblah.gif?12345', strpos('blahblah.gif?12345', '.gif') + 4);
// returns ?12345 (the 4 is the length of the substring '.gif')
So the code is:
$str = 'original string';
$match = 'matching string';
$output = substr($str, strpos($str, $match) + strlen($match));
Ok, now I'm not sure if you want to keep the first or the second part of the string. Anyway, here's the code for keeping the first part:
echo substr('blahblah.gif?12345', 0, strpos('blahblah.gif?12345', '.gif') + 4);
// returns blahblah.gif (the start offset 0 is the key)
And the full code:
$str = 'original string';
$match = 'matching string';
$output = substr($str, 0, strpos($str, $match) + strlen($match));
See both examples working here: http://ideone.com/Ge30rY
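One caveat with the snippets above: strpos() returns false when the sub-string is absent, and false + strlen($match) then silently becomes an offset. A hedged sketch of a guard, using a hypothetical helper name keep_through():

```php
<?php
// Hypothetical helper: returns $str cut off right after the first $match,
// or $str unchanged when $match does not occur in it.
function keep_through(string $str, string $match): string
{
    $pos = strpos($str, $match);
    if ($pos === false) {
        return $str; // nothing to cut
    }
    return substr($str, 0, $pos + strlen($match));
}

echo keep_through('blahblah.gif?12345', '.gif'), "\n"; // blahblah.gif
echo keep_through('blahblah.png', '.gif'), "\n";       // blahblah.png
```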
Assuming (from OP's comment) that you are working with actual URLs as your source string, I believe that the best course of action here would be to use PHP's built-in functionality for working with and parsing URLs. You do this by using the parse_url() function:
(PHP 4, PHP 5)
parse_url — Parse a URL and return its components
This function parses a URL and returns an associative array containing any of the various components of the URL that are present.
This function is not meant to validate the given URL, it only breaks it up into the above listed parts. Partial URLs are also accepted, parse_url() tries its best to parse them correctly.
From your example: www.page.com/image.gif?123 (or even just image.gif?123) using parse_url() will look something like this:
var_dump( parse_url( "www.page.com/image.gif?123" ) );
array(2) {
["path"]=>
string(22) "www.page.com/image.gif"
["query"]=>
string(3) "123"
}
As you can see, without the need for regular expressions or string manipulations we have broken up the URL into its separate components. No need to re-invent the wheel. Nice and clean :)
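To get from those components back to a string without the query part, something along these lines could work. It is a sketch only: strip_query() is a hypothetical name, and it deliberately ignores user/password components and the fragment.

```php
<?php
// Hypothetical helper: rebuilds a URL-like string from parse_url() output,
// leaving out the query string and fragment.
function strip_query(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return $url; // seriously malformed input: leave it untouched
    }
    $result = '';
    if (isset($parts['scheme'])) {
        $result .= $parts['scheme'] . '://';
    }
    if (isset($parts['host'])) {
        $result .= $parts['host'];
    }
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    return $result . ($parts['path'] ?? '');
}

echo strip_query('www.page.com/image.gif?123'), "\n";        // www.page.com/image.gif
echo strip_query('http://www.page.com/image.gif?123'), "\n"; // http://www.page.com/image.gif
```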
You could do this:
$str = "somecontent.gif?anddata";
$pattern = ".gif";
echo strstr($str,$pattern,true).$pattern;
// Set up string to search through
$haystack = "blahblah.gif?12345";
// Determine substring and length of it
$needle = ".gif";
$length = strlen($needle);
// Find position of last substring
$location = strrpos($haystack, $needle);
// Use location of last occurrence + its length to get new string
$newtext = substr($haystack, 0, $location+$length);
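For completeness, the regex attempt from the question can also be made to work. As far as I can tell, it failed because the dot was not escaped, the * applied only to the letter f, and $ in the replacement string is a literal dollar sign rather than an end-of-string anchor. A minimal sketch, with a hypothetical function name:

```php
<?php
// Hypothetical helper: keeps everything up to and including the first ".gif".
function truncate_after_gif(string $str): string
{
    // \.gif  a literal dot followed by "gif"
    // .*$    everything after it, up to the end of the string
    // /s     lets "." match newlines too, so nothing is left behind
    return preg_replace('/\.gif.*$/s', '.gif', $str);
}

echo truncate_after_gif('blahblah.gif?12345'); // blahblah.gif
```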
I have a regexp that matches strings like: wiseman.google.com.jp, me.co.uk, paradise.museum, abcd-abc.net, www.google.jp, 12345-daswe-23dswe-dswedsswe-54eddss.info, del.icio.us, jo.ggi.ng, all taken from a textarea value.
The regexp used (in preg_match_all($regex1, $str, $match)) to get the above values: /(?:[a-zA-Z0-9]{2,}\.)?[-a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,7}(?:\.[-a-zA-Z0-9]{2,3})?/
Now, my question is: how can I make the regexp trim "wiseman.google.com.jp" down to "google.com.jp" and "www.google.jp" down to "google.jp"?
I am thinking of running a second preg_match($regex2, $str, $match) on each value returned by preg_match_all.
I have tried this regexp in $regex2 : ([-a-zA-Z0-9\x{0080}-\x{00FF}]{2,}+)\.[a-zA-Z0-9\x{0080}-\x{00FF}]{2,7}(?:\.[-a-zA-Z0-9\x{0080}-\x{00FF}]{2,3})? but it doesn't work.
Any inputs? TIA
here is my little solution :
preg_match_all($regex, $str, $matches, PREG_PATTERN_ORDER);
$arrlength=count($matches[0]);
for($x=0;$x<$arrlength;$x++){
$dom = $matches[0][$x];
$newstringcount = substr_count($dom, '.'); // this line is to count how many "." present in the string.
if($newstringcount == 3){ // if there are 3 '.' present in the string = true
$pos = strpos($dom, '.', 0); // this line is to find the first occurrence of the '.' in the string
$find = substr($dom, $pos+1); // this line is to get the value after the first occurrence of the '.' in the string
echo $find;
}else if($newstringcount == 2){
if (($pos = strpos($dom, 'www.')) !== false) { // parentheses needed: !== binds tighter than =
$find = substr($dom, $pos + 4); // skip the 4 characters of "www."
echo $find;
}else{
echo $dom;
}
}else if($newstringcount == 1){
echo $dom;
}
echo "<br>";
}
(Caution: this answer will only fit your needs if you HAVE to use regex or you're somewhat... desperate...)
What you want to achieve isn't possible with general rules due to domains like .com.jp or .co.uk.
The only general rule one can find is:
When read from right to left there are one or two TLDs followed by one second level domain
Thus, we have to whitelist all available TLDs. I think I'll call the following the "domain-kraken".
Release the kraken!
([a-z0-9\-]{2,63}(?:\.(?:a(?:cademy|ero|rpa|sia|[cdefgilmnoqrstuwxz])|b(?:ike
|iz|uilders|uzz|[abdefghijlmnoqrstvwyz])|c(?:ab|amera|amp|areers|at|enter|eo
|lothing|odes|offee|om(?:pany|puter)?|onstruction|ontractors|oop|
[acdfghiklmnoruvwxyz])|d(?:iamonds|irectory|omains|[ejkmoz])|e(?:du(?:cation)?
|mail|nterprises|quipment|state|[ceghrstu])|f(?:arm|lorist|[ijkmor])|g(?:allery|
lass|raphics|uru|[abdefghlmnpqrstuwy])|h(?:ol(?:dings|iday)|ouse|[kmnrtu])|
i(?:mmobilien|n(?:fo|stitute|ternational)|[delmnoqrst])|j(?:obs|[emop])|
k(?:aufen|i(?:tchen|wi)|[eghimnprwxyz])|l(?:and|i(?:ghting|mo)|[abcikrstuvy])|
m(?:anagement|enu|il|obi|useum|[acdefghklmnopqrstuvwxyz])|n(?:ame|et|inja|
[acefgilopruz])|o(?:m|nl|rg)|p(?:hoto(?:graphy|s)|lumbing|ost|ro|[aefghklmnrstwy])|
r(?:e(?:cipes|pair)|uhr|[eosuw])|s(?:exy|hoes|ingles|ol(?:ar|utions)|upport|
ystems|[abcdeghijklmnorstuvxyz])|t(?:attoo|echnology|el|ips|oday|
[cdfghjklmnoprtvwz])|u(?:no|[agkmsyz])|v(?:entures|iajes|oyage|[aceginu])|
w(?:ang|ien|[fs])|xxx|y(?:[et])|z(?:[amw]))){1,2})$
Use it together with the i and m flags.
This assumes your data is on multiple lines.
In case your data is separated by a ,, change the last character in the regex ($) to ,? and use the g and i flags.
Demos are available on regex101 and debuggex.
(Both of the demos have an explanation: regex101 describes it with text while debuggex visualizes the beast)
A list of available TLDs can be found at iana.org, the used TLDs in the regex are as of January 2014.
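If a regex is not a hard requirement, the same "whitelist the suffixes" idea can be sketched with plain string functions. Everything here is an assumption made for illustration: the function name registrable_domain(), and above all the two-entry suffix list, which is far from complete and would have to be replaced by a maintained list in real use.

```php
<?php
// Hypothetical helper: trims a host name down to "<name>.<suffix>".
// $twoLabelSuffixes is a deliberately tiny, incomplete example list.
function registrable_domain(string $host, array $twoLabelSuffixes = ['com.jp', 'co.uk']): string
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);
    if ($count <= 2) {
        return implode('.', $labels); // nothing to trim
    }
    // Does the host end in one of the known two-label suffixes?
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $keep = in_array($lastTwo, $twoLabelSuffixes, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

echo registrable_domain('wiseman.google.com.jp'), "\n"; // google.com.jp
echo registrable_domain('www.google.jp'), "\n";         // google.jp
```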
I have two example filename strings:
jquery.ui.min.js
jquery.ui.min.css
What regex can I use to only match the LAST dot? I don't need anything else, just the final dot.
A little more on what I'm doing. I'm using PHP's preg_split() function to split the filename into an array. The function deletes any matches and gives you an array with the elements between splits. I'm trying to get it to split jquery.ui.min.js into an array that looks like this:
array[0] = jquery.ui.min
array[1] = js
If you're looking to extract the last part of the string, you'd need:
\.([^.]*)$
if you don't want the . or
(\.[^.]*)$
if you do.
I think you'll have a hard time using preg_split; preg_match should be the better choice.
preg_match('/(.*)\.([^.]*)$/', $filename, $matches);
Alternatively, have a look at pathinfo.
Or, do it very simply in two lines:
$filename = substr($file, 0, strrpos($file, '.'));
$extension = substr($file, strrpos($file, '.') + 1);
At face value there is no reason to use regex for this. Here are 2 different methods that use functions optimized for static string parsing:
Option 1:
$ext = "jquery.ui.min.css";
$parts = explode('.', $ext); // array_pop() needs a variable, not a function result
$ext = array_pop($parts);
echo $ext;
Option 2:
$ext = "jquery.ui.min.css";
$ext = pathinfo($ext);
echo $ext['extension'];
\.[^.]*$
Here's what I needed to exclude the last dot when matching for just the last "part":
[^\.]([^.]*)$
Using a positive lookahead, I managed this answer:
\.(?=\w+$)
This answer matches specifically the last dot in the string.
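A lookahead of this kind also makes the preg_split() approach from the question workable. The sketch below uses a slight variation, [^.]* instead of \w+, so that extensions containing characters such as a hyphen are still handled; split_extension() is a hypothetical name.

```php
<?php
// Hypothetical helper: splits a file name at its last dot only.
function split_extension(string $filename): array
{
    // \.           a literal dot ...
    // (?=[^.]*$)   ... that is followed only by non-dot characters up to the end
    return preg_split('/\.(?=[^.]*$)/', $filename);
}

print_r(split_extension('jquery.ui.min.js'));
// Array ( [0] => jquery.ui.min [1] => js )
```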
I used this - give it a try:
m/\.([^.\\]+)$/
I need help while trying to spin articles. I want to find text and replace synonymous text while keeping the case the same.
For example, I have a dictionary like:
hello|hi|howdy|howd'y
I need to find all hello and replace with any one of hi, howdy, or howd'y.
Assume I have a sentence:
Hello, guys! Shouldn't you say hello to me when I say HELLO?
After my operation it will be something like:
hi, guys! Shouldn't you say howd'y to me when I say howdy?
Here, I lost the case. I want to maintain it! It should actually be:
Hi, guys! Shouldn't you say howd'y to me when I say HOWDY?
My dictionary size is about 5000 lines
hello|hi|howdy|howd'y
go|come
salaries|earnings|wages
shouldn't|should not
...
I'd suggest using preg_replace_callback with a callback function that examines the matched word to see if (a) the first letter is not capitalized, or (b) the first letter is the only capitalized letter, or (c) the first letter is not the only capitalized letter, and then replace with the properly modified replacement word as desired.
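A minimal sketch of that suggestion, assuming ASCII text and a single find/replace pair. The function name replace_keep_case() is hypothetical, and mixed-case matches such as "hELLo" are simply given the lowercase replacement here, which is one possible choice among several.

```php
<?php
// Hypothetical helper: replaces every whole-word occurrence of $find with
// $replace, copying the case pattern (UPPER, Capitalized, lower) of each match.
function replace_keep_case(string $text, string $find, string $replace): string
{
    $pattern = '/\b' . preg_quote($find, '/') . '\b/i';
    return preg_replace_callback($pattern, function (array $m) use ($replace) {
        $found = $m[0];
        if ($found === strtoupper($found)) {
            return strtoupper($replace);          // HELLO -> HI
        }
        if ($found === ucfirst(strtolower($found))) {
            return ucfirst(strtolower($replace)); // Hello -> Hi
        }
        return strtolower($replace);              // hello, hELLo -> hi
    }, $text);
}

echo replace_keep_case('Hello, guys! Say hello when I say HELLO.', 'hello', 'hi');
// Hi, guys! Say hi when I say HI.
```

Picking a random synonym per match would be a matter of choosing $replace inside the callback instead of passing it in.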
You can find your string and do two tests:
$outputString = 'hi';
if ( $foundString == strtoupper($foundString) ) {
    // all caps: must be tested first, because ucfirst('HELLO') is also 'HELLO'
    $outputString = strtoupper($outputString);
} else if ( $foundString == ucfirst($foundString) ) {
    $outputString = ucfirst($outputString);
} else {
    // do not modify string's case
}
Here's a solution for retaining the case (upper, lower or capitalized):
// Assumes $replace is already lowercase
function convertCase($find, $replace) {
if (ctype_upper($find) === true)
return strtoupper($replace);
else if (ctype_upper($find[0]) === true)
return ucfirst($replace);
else
return $replace;
}
$find = 'hello';
$replace = 'hi';
// Find the word in all cases that it occurs in
while (($pos = stripos($input, $find)) !== false) {
// Extract the word in its current case
$found = substr($input, $pos, strlen($find));
// Replace all occurrences of this case
$input = str_replace($found, convertCase($found, $replace), $input);
}
You could try the following function. Be aware that it will only work with ASCII strings, as it uses some of the useful properties of ASCII upper and lower case letters. However, it should be extremely fast:
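function preserve_case($old, $new) {
    // 0x20 wherever $old has a lowercase letter, 0x00 elsewhere
    $mask = strtoupper($old) ^ $old;
    // extend the mask with its last byte when $new is longer than $old ...
    $mask .= str_repeat(substr($mask, -1), max(0, strlen($new) - strlen($old)));
    // ... and cut it when $new is shorter, so the result has the length of $new
    return strtoupper($new) | substr($mask, 0, strlen($new));
}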
echo preserve_case('Upper', 'lowercase');
// Lowercase
echo preserve_case('HELLO', 'howdy');
// HOWDY
echo preserve_case('lower case', 'UPPER CASE');
// upper case
echo preserve_case('HELLO', "howd'y");
// HOWD'Y
This is my PHP version of the clever little Perl function:
How do I substitute case insensitively on the LHS while preserving case on the RHS?