I am trying to mass-replace some keywords in a CSV file. For example, I have a list of keywords cat,mouse,dog which I would like to replace with something,else,here. I am currently using this: http://phpcsv.sourceforge.net/phpcsv-1.0.php. It is perfect, and it says it uses PCRE for replacing values. My question is: what do I need to type in the search and replace fields to achieve this result?
You could use
Search (?<=^|,)(cat|mouse|dog)(?=,|$) Replace ${1}2
The ${1} is used to reference the string captured by the () in the Search field.
Normally you could just use $1, but because it is followed immediately by 2, the 1 needs to be enclosed in {}.
If the values may be enclosed in " add "? before and after (cat|mouse|dog).
(?<=^|,) is a lookbehind: the match must be preceded by the start of a line or a comma.
(?=,|$) is a lookahead: the match must be followed by a comma or the end of a line.
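As a rough sketch of how the search and replace fields above would behave if run through preg_replace directly: the helper name suffixKeywords and the sample data are made up for illustration, and the m flag is an assumption so that ^ and $ apply to every line of a multi-line CSV.

```php
<?php
// Hypothetical helper: appends "2" to each keyword that fills a whole CSV field.
function suffixKeywords(string $csv): string
{
    // The m flag lets ^ and $ match at every line boundary, not only at the ends of the text.
    $pattern = '/(?<=^|,)(cat|mouse|dog)(?=,|$)/m';
    // ${1} keeps the group reference separate from the literal 2 that follows it.
    return preg_replace($pattern, '${1}2', $csv);
}

// "catalog" is left alone because the lookahead requires a comma or end of line after "cat".
echo suffixKeywords("cat,catalog,dog\nmouse,cat"), "\n";
```

The lookarounds consume no characters, so the commas themselves are never part of the replaced text.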
If the replacements are different for each keyword, I think you would have to do each separately, e.g.
Search (?<=^|,)cat(?=,|$) Replace hamster
Alternatively, if using your own code you could make all the replacements in one go by passing arrays as arguments to preg_replace.
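A minimal sketch of the array form, assuming you run your own PHP rather than the linked tool. The helper name replaceKeywords and the keyword map are illustrative only.

```php
<?php
// Hypothetical helper: replaces each whole-field keyword with its own replacement.
function replaceKeywords(string $csv, array $map): string
{
    $patterns = [];
    foreach (array_keys($map) as $keyword) {
        // preg_quote() escapes any regex metacharacters inside the keyword.
        $patterns[] = '/(?<=^|,)' . preg_quote((string) $keyword, '/') . '(?=,|$)/m';
    }
    // Pattern i is paired with replacement i; the pairs are applied one after another.
    return preg_replace($patterns, array_values($map), $csv);
}

$map = ['cat' => 'something', 'mouse' => 'else', 'dog' => 'here'];
echo replaceKeywords("cat,mouse,dog,catalog", $map), "\n";
```

Because the pairs run in order, a later pattern can match text produced by an earlier replacement, so the keyword list and the replacement list should not overlap.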
Related
I was trying to split a string on non-alphanumeric characters or, simply put, I want to split it into words. The approach that immediately came to my mind is to use regular expressions.
Example:
$string = 'php_php-php php';
$splitArr = preg_split('/[^a-z0-9]/i', $string);
But there are two problems that I see with this approach.
It is not a native PHP function, and is totally dependent on the PCRE library running on the server.
An equally important problem: what if I have punctuation in a word?
Example:
$string = 'U.S.A-men\'s-vote';
$splitArr = preg_split('/[^a-z0-9]/i', $string);
Now this will split the string as [{U}{S}{A}{men}{s}{vote}]
But I want it as [{U.S.A}{men's}{vote}]
So my question is that:
How can we split them according to words?
Is there a possibility to do it with a native PHP function, or in some other way where we are not dependent on a library?
Regards
Sounds like a case for str_word_count() using the oft forgotten 1 or 2 value for the second argument, and with a 3rd argument to include hyphens, full stops and apostrophes (or whatever other characters you wish to treat as word-parts) as part of a word; followed by an array_walk() to trim those characters from the beginning or end of the resultant array values, so you only include them when they're actually embedded in the "word"
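A rough sketch of that suggestion follows. The helper name splitWords, the default character list and the sample sentence are assumptions for illustration.

```php
<?php
// Hypothetical helper: splits text into words, keeping embedded punctuation.
function splitWords(string $text, string $wordChars = ".'-"): array
{
    // Format 1 returns a plain list of words; the third argument names extra
    // characters that count as part of a word.
    $words = str_word_count($text, 1, $wordChars);
    // Strip those characters from the edges so they survive only inside a word.
    array_walk($words, function (&$word) use ($wordChars) {
        $word = trim($word, $wordChars);
    });
    return $words;
}

print_r(splitWords("U.S.A. voters said: it's 'final'."));
```

Note that hyphens count as word characters here, so a token such as U.S.A-men's-vote would stay in one piece; drop the hyphen handling or split on it separately if that is not wanted.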
Either you have PHP installed (then you also have PCRE), or you don't. So your first point is a non-issue.
Then, if you want to exclude punctuation from your splitting delimiters, you need to add them to your character class:
preg_split('/[^a-z0-9.\']+/i', $string);
If you want to treat punctuation characters differently depending on context (say, make a dot only be a delimiter if followed by whitespace), you can do that, too:
preg_split('/\.\s+|[^a-z0-9.\']+/i', $string);
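To illustrate the difference between the two patterns above, here is a small sketch; the second sample sentence is made up for the example.

```php
<?php
// Dots and apostrophes are excluded from the delimiter class, so they stay inside words.
$plain = preg_split('/[^a-z0-9.\']+/i', "U.S.A-men's-vote");
print_r($plain); // U.S.A, men's, vote

// A dot followed by whitespace is tried first and treated as a delimiter;
// a dot in any other position is kept.
$contextual = preg_split('/\.\s+|[^a-z0-9.\']+/i', "I live in the U.S.A. It's nice");
print_r($contextual); // I, live, in, the, U.S.A, It's, nice
```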
As per my comment, you might want to try (add as many separators as needed)
$splitArr = preg_split('/[\s,!\?;:-]+|[\.]\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);
You'd then have to handle the case of a "quoted" word (it's not so easy to do in a regular expression, because 'is" "this' quoted? And how?).
So I think it's best to keep ' and " within words (so that "it's" is a single word, and "they 'll" is two words) and then deal with those cases separately. For example a regexp would have some trouble in correctly handling
they 're 'just friends'. Or that's what they say.
whereas an isolated "'re", or a sequence of words in which the first is left-quoted and the last is right-quoted (the first not being a known suffix such as 's, 're, 'll, 'd ...), may be handled at application level.
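As a sketch of what the split above leaves for the application to sort out, here is the pattern applied to the example sentence. Only the tokenising step is shown; the quote handling itself is not attempted.

```php
<?php
$string = "they 're 'just friends'. Or that's what they say.";

// Quotes are not delimiters, so they stay attached to their words.
// PREG_SPLIT_NO_EMPTY drops any empty pieces between adjacent delimiters.
$tokens = preg_split('/[\s,!\?;:-]+|[\.]\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($tokens);
```

The tokens 're, 'just and friends' keep their quotes, which is where an application-level pass would take over. The final say. keeps its dot because no whitespace follows it.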
This is not a PHP problem, but a logical one.
Words can be joined by a -. Abbreviations can look like short sentences.
You can match your example directly by creating a solution that fits only this particular phrase. But you can't get a solution for all possible phrases. That would require neural-computing-based content recognition.
I have created the regular expression below (using PHP), which must match ALL terms within the given string that contain only a-z0-9, ., _ and -.
My expression is: '~(?:\(|\s{0,},\s{0,})([a-z0-9._-]+)(?:\s{0,},\s{0,}|\))$~i'.
My target string is: ('word', word.2, a_word, another-word).
Expected terms in the results are: word.2, a_word, another-word.
I am currently getting: another-word.
My Goal
I am detecting a MySQL function from my target string, this works fine. I then want all of the fields from within that target string. It's for my own ORM.
I suppose there could be a situation whereby further parentheses are included inside this expression.
From what I can tell, you have a list of comma-separated terms and wish to find only the ones which satisfy [a-z0-9._\-]+. If so, this should be correct (it returns the correct results for your example at least):
'~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i'
The main issues were:
$ at the end, which was anchoring the query to the end of the string
When matching all, you continue from the end of the previous match. This means that if you consume a comma or closing parenthesis at the end of one match, it is no longer available to match at the beginning of the next one. I've solved this with a lookbehind ((?<=...)) and a lookahead ((?=...)).
Your backslashes are best doubled in the PHP string literal, since PHP may otherwise interpret them before the regex engine sees them (in a single-quoted string only \\ and \' are affected, so \s would survive either way).
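A short sketch of the corrected pattern run with preg_match_all against the target string from the question; the helper name extractTerms is made up for illustration.

```php
<?php
// Hypothetical helper: returns every unquoted term between the parentheses.
function extractTerms(string $target): array
{
    // The lookbehind and lookahead check for the separators without consuming them,
    // so the comma that ends one term can also introduce the next.
    preg_match_all('~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i', $target, $matches);
    return $matches[1]; // contents of capture group 1 for each match
}

print_r(extractTerms("('word', word.2, a_word, another-word)"));
```

The quoted 'word' is skipped because the apostrophe is not in the character class.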
EDIT: Since you said in a comment that some of the terms may be strings that contain commas you will first want to run your input through this:
$input = preg_replace('~(\'([^\']+|(?<=\\\\)\')+\'|"([^"]+|(?<=\\\\)")+")~', '"STRING"', $input);
which should replace all strings with '"STRING"', which will work fine for matching the other regex.
Maybe using a regex is overkill. With this kind of text you can just remove the parentheses and explode the string on commas.
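A minimal sketch of that non-regex approach, assuming there are no nested parentheses and no commas inside quoted strings; the helper name splitTerms is hypothetical.

```php
<?php
// Hypothetical helper: strips the outer parentheses and splits on commas.
function splitTerms(string $target): array
{
    $inner = trim($target, '()');                   // drop the surrounding ( and )
    return array_map('trim', explode(',', $inner)); // split, then trim whitespace
}

print_r(splitTerms("('word', word.2, a_word, another-word)"));
```

Unlike the regex, this keeps the quoted 'word' in the result, so it would still need filtering if only field names are wanted.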
Hi Everybody,
I'm currently using preg_match and I'm trying to extract some information enclosed in square brackets.
So far, I have used this:
/\[(.*)\]/
But I want it to be only the content of the last occurrence (or the first one, if starting from the end)!
In the following:
string = "Some text here [value_a] some more text [value_b]"
I need to get:
"value_b"
Can anybody suggest something that will do the trick?
Thanks!
Match against:
/.*\[([^]]+)\]/
using preg_match (no need for the _all version here, since you only want the last group) and capture the group inside.
Your current regex, with your input, would capture value_a] some more text [value_b. Here, the first .* swallows everything, but must backtrack for a [ to be matched -- the last one in the input.
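A small sketch of that pattern in use; the helper name lastBracketed is made up for illustration.

```php
<?php
// Hypothetical helper: returns the content of the last [...] pair, or null if there is none.
function lastBracketed(string $text): ?string
{
    // The greedy .* runs to the end of the text and backtracks to the last "[".
    if (preg_match('/.*\[([^]]+)\]/', $text, $m)) {
        return $m[1];
    }
    return null;
}

echo lastBracketed("Some text here [value_a] some more text [value_b]"), "\n"; // value_b
```

Since . does not match newlines by default, multi-line input would probably need the s flag.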
If you are only expecting numbers and letters (no symbols), you could use \[([\w\d]+)\] with preg_match_all() and take the last element of the resulting array. You can allow any custom symbols by escaping them in the character class definition.
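A sketch of the preg_match_all variant. The helper name lastBracketedWord is hypothetical, and the class is written as \w alone, since \w already covers digits and the underscore.

```php
<?php
// Hypothetical helper: collects every [word] and returns the last one, or null.
function lastBracketedWord(string $text): ?string
{
    // preg_match_all() returns the number of matches (0 if none, false on error).
    if (!preg_match_all('/\[(\w+)\]/', $text, $m)) {
        return null;
    }
    return end($m[1]); // last captured value
}

echo lastBracketedWord("Some text here [value_a] some more text [value_b]"), "\n";
```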
\[([^\]]*)\][^\[]*$
See it here on regexr
var someText="Some text here [value_a] some more text [value_b]";
alert(someText.match(/\[([^\]]*)\][^\[]*$/)[1]);
The part inside the brackets is stored in capture group 1, therefore you need to use match(...)[1] to access the result.
For simple brackets, see the source used for this answer: Regex for getting text between the last brackets ()
I don't understand the () in regexp.
eg. what is the difference between these lines:
"/hello/"
"/(hello)/"
() provide a way to capture matches and to group parts of a pattern; take a look here for a fuller description.
By wrapping it in ( and ) you can capture it and handle it as a whole. Suppose we have the following text:
hellohello hello!
If I wanted to find the word "hello" twice in a row, I could do this:
/hellohello/
Or, I could do this:
/(hello){2}/
As you have written them, there is no actual difference between the two examples. But parentheses allow you to apply a quantifier or other logic to that entire group of characters (e.g., as another poster showed, {2} afterwards states that the string "hello" appears twice in a row with nothing in between: hellohello). Parentheses also allow you to use "or" statements: "/(hello|goodbye)/" would match EITHER hello OR goodbye.
The most powerful use of them however is extracting data from a string, rather than just matching it, it lets you pull the data out of the string and do what you want with it.
e.g. in PHP if you did
preg_match( "/hello (.+)/i", "hello how are you?", $holder );
Then $holder[1] will contain ALL the text after "hello ", which in this case would be "how are you?"
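Pulling the three uses together in one short sketch with preg_match; the variable names and the "say goodbye" sample are illustrative only.

```php
<?php
// Grouping: the quantifier {2} applies to the whole group, not just the final "o".
preg_match('/(hello){2}/', 'hellohello hello!', $twice);
// $twice[0] is the full match "hellohello"; $twice[1] is the last repetition, "hello".

// Alternation: the group limits how far the | reaches.
preg_match('/(hello|goodbye)/', 'say goodbye', $either);
// $either[1] is "goodbye".

// Capturing: index 0 is the whole match, index 1 is the first group.
preg_match('/hello (.+)/i', 'hello how are you?', $holder);
// $holder[1] is "how are you?".

print_r([$twice, $either, $holder]);
```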
I need my following code to work.
I am trying to use a PHP variable and also add a [charlist] wildcard to it.
Could someone please help me figure out why it won't work?
Basically it works if I remove the [charlist] wildcard, but I need it to find all the letters which are in the PHP variable.
My code is as follows:
LIKE ''[' $searchWord ']%''
To use a character class, you need to use the REGEXP operator.
Additionally, after a character class, you need to indicate a repetition operator. % matches any string (and is only for LIKE), but if you want to apply it so that it will match any series of letters contained within your character class, you need to do:
$query = '`column` REGEXP "[' . $searchWord . ']+"';
Alternatively, use a * to match 0 or more. (+ is 1 or more)
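A hedged sketch of building that pattern a little more defensively. The helper name buildCharClassPattern, the table name my_table and the decision to keep only ASCII letters and digits are all assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: turns a search word into a character-class pattern for REGEXP.
function buildCharClassPattern(string $searchWord): ?string
{
    // Assumption: only ASCII letters and digits are wanted. Everything else is dropped
    // so that characters such as ] or ^ cannot change the meaning of the class.
    $chars = preg_replace('/[^a-z0-9]/i', '', $searchWord);
    if ($chars === '') {
        return null; // an empty class [] would not be a valid pattern
    }
    return '[' . $chars . ']+';
}

$pattern = buildCharClassPattern('abc'); // "[abc]+"
// Assumption: the pattern is bound as a parameter of a prepared statement
// instead of being concatenated into the SQL text.
$sql = 'SELECT * FROM `my_table` WHERE `column` REGEXP ?';
echo $pattern, "\n", $sql, "\n";
```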
If $searchWord is an array, try calling implode on it first:
$listOfCharacters = str_split($searchWord, 1);
$implodedString = implode(',', $listOfCharacters);
The imploded string is now a comma-separated string instead of an array. PHP doesn't convert arrays to strings by itself.
Then you can probably use it like:
LIKE ''[' $implodedString ']%''
Although I'm suspicious about using it without string concatenation here. Are we missing some piece of code?