How to find text in webpage through PHP? - php

I have a plain text file without any HTML formatting, and I want to search for a word in it. How should I do that? I also want the words that follow it, up to the next comma. Can I do that too?
For example, given:
"word1":"i like stackoverflow", next thing
I want to find word1, which is in double quotes, and then get the whole phrase i like stackoverflow without the quotes. The match should stop at that point.
Is there any way to do that?
Thanks.

Use a regular expression. In PHP, you can use a function like preg_match to do this.
For the pattern you described, your regular expression code could be something like:
$match_array = array();
// Load the text from your file; 'data.txt' is a placeholder file name.
$string_to_match = file_get_contents('data.txt');
preg_match('/"([A-Za-z0-9\s]+?)":"([A-Za-z0-9\s]+?)"/', $string_to_match, $match_array);
The results of the match will then be placed in $match_array: $match_array[1] holds the quoted word and $match_array[2] holds the phrase that follows it, both without the quotes.
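If you are looking for one specific word rather than any quoted pair, the same idea can be wrapped in a small function. This is only a sketch: find_phrase is a made-up helper name, and it assumes the phrase itself never contains a double quote.

```php
<?php
// Hypothetical helper: returns the phrase quoted after "$word":, or null if absent.
function find_phrase(string $text, string $word): ?string
{
    // preg_quote() escapes any regex metacharacters in the search word.
    $pattern = '/"' . preg_quote($word, '/') . '":"([^"]*)"/';
    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

// Sample line from the question; a real script would read it with file_get_contents().
$text = '"word1":"i like stackoverflow", next thing';

echo find_phrase($text, 'word1'), "\n"; // i like stackoverflow
var_dump(find_phrase($text, 'word2'));  // NULL
```

Using [^"]* instead of a character list means punctuation inside the phrase is accepted too, while the match still stops at the closing quote.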

Related

How to do this PHP find-replace

I think I need to use the preg_replace function but not sure exactly how to type in the patterns I want to find and replace. Basically, I want to replace this:
: u"x"x",
with this:
: u"x'x",
x means that any characters can go there. But I don't know how to write the x in PHP.
Thank you!
Edit: basically, I want to replace that middle double-quote with a single-quote. And I'll be searching through a big JSON file to do it. Probably should have said this at the start.
You could use this regular expression:
$result = preg_replace('#(: u".*?)"(.*?")#', "$1'$2", $string);
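One thing to watch with the lazy .*? version: on a line where a value has no stray quote, it can still match across into the next value. A stricter sketch is shown here, assuming each value has at most one stray double quote and is followed by a comma (the sample string is invented for illustration):

```php
<?php
// Hypothetical sample: the first value has a stray quote, the second does not.
$string = '"size": u"5" screen", "name": u"plain",';

// [^"]* cannot cross a quote, and the trailing ", pins down the real end of the value.
// Single quotes keep PHP from touching ${1} and ${2} before preg_replace sees them.
$result = preg_replace('#(: u"[^"]*)"([^"]*",)#', '${1}\'${2}', $string);

echo $result, "\n"; // "size": u"5' screen", "name": u"plain",
```

If values can hold several stray quotes, a regex is probably the wrong tool and fixing whatever produced the file would be safer.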

php regex replace within results

I want to go through a CSS file and replace some values.
I want the replacement to take place only within the braces.
For example, let's say we have the following CSS:
.redColor{color:red;padding-right:45px;/*etc....*/}
and I want to replace all the red values with blue.
I tried the following code:
preg_replace("/{(.*)red(.*)}/","blue",$cssString)
but the result was:
.redColorblue
I want it to replace red only if it's within braces, and leave the text around it untouched.
The expected result should be:
.redColor{color:blue ;padding-right:45px;/*etc....*/}
This is just an example of what I am trying to do. I want to change the CSS file itself and change a lot of values inside it.
Some clarifications:
I want to do this replacement across a whole CSS file, so I load the entire file into a variable and run the replace on it. Solutions that replace only one value are not what I am looking for.
preg_replace('/(\{.*?)red(.*?\})/s', '$1blue$2', $cssString);
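Try this:
preg_replace('/({.*)red(.*})/', '${1}blue${2}', $cssString);
Wrapping parts of the pattern in parentheses captures the matched text, which can then be referenced in the replacement string as $1 and $2 (or ${1} and ${2}). Keep the replacement in single quotes: inside a double-quoted PHP string, ${1} is parsed as variable interpolation before preg_replace ever sees it.
More details in http://php.net/preg_replace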
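Both patterns above change at most one red per brace block. If every occurrence inside the braces should change, one option is to match each {...} block first and run a second replace on just that block. A rough sketch, where replace_in_declarations is an invented name and the CSS is assumed to have no braces inside strings or comments:

```php
<?php
// Hypothetical helper: replaces whole-word $from with $to, but only inside { ... } blocks.
function replace_in_declarations(string $css, string $from, string $to): string
{
    // \b keeps "red" from matching inside longer words such as "bordered".
    $inner = '/\b' . preg_quote($from, '/') . '\b/';

    return preg_replace_callback(
        '/\{[^{}]*\}/', // innermost brace blocks, i.e. the declaration lists
        function (array $m) use ($inner, $to): string {
            // A callback returns $to literally, so "$" or "\" in it is not reinterpreted.
            return preg_replace_callback($inner, function () use ($to) {
                return $to;
            }, $m[0]);
        },
        $css
    );
}

$cssString = '.redColor{color:red;padding-right:45px;border-color:red}';
echo replace_in_declarations($cssString, 'red', 'blue'), "\n";
// .redColor{color:blue;padding-right:45px;border-color:blue}
```

The selector .redColor is left alone because it sits outside the braces. Note that a comment inside the braces that mentions red would be changed as well.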

PHP regex: match a string between two HTML tags, with the tags being unknown

Ok, so here's my issue:
I have a link, say: http://www.blablabla.com/watch?v=1lyu1KKwC74&feature=list_other&playnext=1&list=AL94UKMTqg-9CfMhPFKXPXcvJ_j65v7UuV
And the link is between two tags say like this:
<br>http://www.blablabla.com/watch?v=1lyu1KKwC74&feature=list_other&playnext=1&list=AL94UKMTqg-9CfMhPFKXPXcvJ_j65v7UuV<br></p>
Using this regex with preg_replace:
'#(^|[^\/]|[^>])('.addcslashes($link,'.?+').')([^\w\/]|[^<]$)#i'
As such:
preg_replace('#(^|[^\/]|[^>])('.addcslashes($link,'.?+').')([^\w\/]|[^<]$)#i', "***",$strText);
The resulting string is:
<br***p>
Which is wrong!!
It should have been
<br>***<br></p>
How can I get the desired result? I have blasted my head out trying to solve this one out.
I would like to mention that str_replace replaces even the link within another valid link, so it's not a good method, I need an exact match between two boundaries, even if the boundary is text or another HTML tag.
Assuming you don't want to use a DOM parser for some reason, I believe doing what you intended is as simple as the following:
preg_replace('#(^|[^\/]|[^>])('.addcslashes($link,'.?+').')([^\w\/]|[^<]$)#i', "$1***$3",$strText);
This uses $1 and $3 to put back the delimiting text you matched in your regular expression.
As others have pointed out, using a DOM parser is more reliable.
Does this do what you want?
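If a regex is still preferred, another way to avoid consuming the surrounding characters is to use lookarounds together with preg_quote, which escapes every regex metacharacter in the link (addcslashes with '.?+' misses some). The sketch below is an assumption about what counts as a boundary: the link must not be preceded by a word character or slash, and must not be followed by a character that would continue a URL. mask_link is a made-up name.

```php
<?php
// Hypothetical helper: masks $link only where it is not part of a longer URL.
function mask_link(string $text, string $link, string $mask = '***'): string
{
    $pattern = '~(?<![\w/])'             // not preceded by a word character or slash
             . preg_quote($link, '~')    // the link, with metacharacters escaped
             . '(?![\w/?&=%-])~i';       // not followed by URL-continuing characters

    // The callback returns the mask literally, whatever characters it contains.
    return preg_replace_callback($pattern, function () use ($mask) {
        return $mask;
    }, $text);
}

$link = 'http://www.blablabla.com/watch?v=1lyu1KKwC74&feature=list_other&playnext=1&list=AL94UKMTqg-9CfMhPFKXPXcvJ_j65v7UuV';
$strText = '<br>' . $link . '<br></p>';

echo mask_link($strText, $link), "\n"; // <br>***<br></p>
```

Because lookarounds match without consuming anything, there is no need to put $1 and $3 back in the replacement.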

Regex replace matched subexpression (and nothing else)?

I've used regex for ages but somehow I managed to never run into something like this.
I'm looking to do some bulk search/replace operations within a file where I need to replace some data within tag-like elements. For example, converting <DelayEvent>13A</DelayEvent> to just <DelayEvent>X</DelayEvent> where X might be different for each.
The current way I'm doing this is such:
$new_data = preg_replace('|<DelayEvent>(\w+)</DelayEvent>|', '<DelayEvent>X</DelayEvent>', $data);
I can shorten this a bit to:
$new_data = preg_replace('|(<DelayEvent>)(\w+)(</DelayEvent>)|', '${1}X${3}', $data);
But really all I want to do is simulate a "replace text between tags T with X".
Is there a way to do such a thing? In essence I'm trying to prevent having to match all the surrounding data and reassembling it later. I just want to replace a given matched sub-expression with something else.
Edit: The data is not XML, although it does have what appear to be tag-like elements. I know better than to parse HTML and XML with regex. ;)
It is possible using lookarounds:
$new_data = preg_replace('|(?<=<DelayEvent>)\w+(?=</DelayEvent>)|', 'X', $data);
See it working online: ideone
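A quick way to check the lookaround version, plus a variant for the case where X differs per match. The sample data and the $map lookup table are invented for illustration:

```php
<?php
// Hypothetical sample data with one tag that must be left alone.
$data = '<DelayEvent>13A</DelayEvent><Other>13A</Other><DelayEvent>7B</DelayEvent>';
$pattern = '|(?<=<DelayEvent>)\w+(?=</DelayEvent>)|';

// Same replacement for every match: only the text between the tags is touched.
$new_data = preg_replace($pattern, 'X', $data);
echo $new_data, "\n";
// <DelayEvent>X</DelayEvent><Other>13A</Other><DelayEvent>X</DelayEvent>

// Different replacement per match, looked up from a hypothetical table.
$map = array('13A' => 'late', '7B' => 'early');
$mapped = preg_replace_callback($pattern, function (array $m) use ($map) {
    // Unknown codes are left unchanged.
    return isset($map[$m[0]]) ? $map[$m[0]] : $m[0];
}, $data);
echo $mapped, "\n";
// <DelayEvent>late</DelayEvent><Other>13A</Other><DelayEvent>early</DelayEvent>
```

The lookbehind works here because <DelayEvent> has a fixed length; PCRE does not accept lookbehinds of unbounded length.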

php preg_match two examples

I need to preg_match for
src="http:// "
where the blank space following // is the rest of the URL, ending with the closing ". My adapted pattern doesn't seem to work:
preg_match('#src="(http://[^"]+)#', $data, $match);
And I am also struggling to get text that starts with > and ends with EITHER a full stop . or an exclamation mark ! or a question mark ? I have no idea how to do this one. An example of the text I want to preg_match for is:
blahblahblah>Hello world this is what I want.
I'm hoping a kind preg_match guru can tell me the answer and save me hours of headscratching.
Thanks for reading.
As for the URL:
preg_match('#src="(.*?)"#', $data, $match);
and for the second case, use />(.*?)(\.|!|\?)/
(.*?)" matches as few characters as possible (lazily), stopping at the first closing double quote it finds.
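Here is how both cases might look when run, as a sketch on made-up input (example.com stands in for the real URL). The second pattern uses a character class [.!?] in place of the alternation, and [^<>] so the capture cannot run across another tag:

```php
<?php
// Hypothetical inputs for the two cases.
$html = '<img src="http://example.com/a.png" alt="">';
$text = 'blahblahblah>Hello world this is what I want.';

// Case 1: capture everything between src=" and the next double quote.
$url = preg_match('#src="(http://[^"]+)"#', $html, $m) === 1 ? $m[1] : null;

// Case 2: capture from just after > up to and including the first . ! or ?
$sentence = preg_match('/>([^<>]*?[.!?])/', $text, $m) === 1 ? $m[1] : null;

echo $url, "\n";      // http://example.com/a.png
echo $sentence, "\n"; // Hello world this is what I want.
```

Checking the return value of preg_match before reading $m avoids notices when nothing matches.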
It seems that you want to parse a document or string that follows an HTML, XML, or similar DOM-like structure.
Use XPath: navigate to the tag and have it return the src attribute. This will save you a lot of trouble, and you can forget about regular expressions.
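A minimal sketch of the XPath approach using PHP's bundled DOM extension. The HTML snippet is invented, and the //img/@src query assumes the src attributes of interest are on img tags:

```php
<?php
// Hypothetical markup; example.com stands in for the real URLs.
$html = '<p><img src="http://example.com/a.png" alt="a">'
      . '<img src="http://example.com/b.png" alt="b"></p>';

// Collect parser warnings instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every src attribute that belongs to an img element.
$xpath = new DOMXPath($doc);
$sources = array();
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}

print_r($sources);
```

Unlike the regex, this keeps working when the attribute uses single quotes or when other attributes come before src.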
