I'm having some problems trying to scrape data from this site; maybe you can help?
$content = file_get_contents('http://store.steampowered.com/app/8190/');
$regexp='#(.*?)#';
preg_match($regexp,$content,$string1);
print_r($string1);
This code doesn't seem to work. Maybe the problem is obvious to you? Thanks :)
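You have not escaped the regex special characters in the link; try this:
$regexp='#<a\s+href\s*=\s*"http://store\.steampowered\.com/search/\?category2=2"\s+class\s*=\s*"name">(.*?)</a>#';
You need to escape every character that you do not want interpreted as a regex metacharacter, such as '?' and '.'. Using \s* around '=' allows optional whitespace there, while \s+ between the tag name and the attributes requires at least one space. PHP's preg_quote() can do the escaping for you.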
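To see why the escaping matters, here is a small sketch in JavaScript (the thread's other language) rather than PHP. The helper escapeRegExp and the sample HTML string are hypothetical; in PHP, preg_quote($url, '#') plays the same role as the helper.

```javascript
// Escape every regex metacharacter so the text is matched literally
// (the character set follows the common MDN escapeRegExp recipe).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical one-line sample of the store page's HTML, for illustration only.
const html =
  '<a href="http://store.steampowered.com/search/?category2=2" class="name">Single-player</a>';

const url = 'http://store.steampowered.com/search/?category2=2';

// Unescaped, "/?" means "optional slash", so the literal "?" in the HTML never matches.
console.log(new RegExp(url).test(html)); // prints false

// \s* allows optional spaces around "=", \s+ requires at least one between tokens.
const pattern = new RegExp(
  '<a\\s+href\\s*=\\s*"' + escapeRegExp(url) + '"\\s+class\\s*=\\s*"name">(.*?)</a>'
);

const match = html.match(pattern);
console.log(match ? match[1] : null); // prints Single-player
```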
Related
I have a blog page on my website where a user edits a post by going to a URL like this: http://www.example.com/blog?edit=blog post here. The script used to replace the spaces with %20 as it should, but now it is replacing them with %2520, and the script can't search the database because there is no post called blog%20post%20here. I was going to go down the path of preg_replace, so I tried this:
preg_replace("/%2520/"," ",$_GET['edit']);
but that didn't seem to work.
I have never used preg_replace() and just read up on it in the manual. If someone could point me down the right path or show me how to use preg_replace correctly, that would be awesome.
Sounds like you're double-encoding somewhere when generating the URLs. %25 is the encoding of the % character itself, so an already-encoded %20 is being encoded again into %2520.
As an aside, there are better ways to decode that URL (urldecode(), for example), so preg_replace() probably isn't necessary. Note also that preg_replace() returns the result instead of modifying its argument, so your call needs an assignment.
EDIT: Oh, and you should use urlencode() exactly once to generate the URL in the first place.
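For %2520 (PHP has already decoded one layer when it filled $_GET, so one more urldecode() is enough):
<?php echo htmlspecialchars(urldecode($_GET['edit'])); ?>
For %20 ($_GET['edit'] is already fully decoded, so no urldecode() is needed):
<?php echo htmlspecialchars($_GET['edit']); ?>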
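The %2520 effect is easy to reproduce. The sketch below uses JavaScript's encodeURIComponent() and decodeURIComponent() as stand-ins for PHP's rawurlencode() and rawurldecode() (both use %20 for spaces, whereas PHP's urlencode() uses +); the title string is just the example from the question.

```javascript
// Hypothetical post title taken from the question's example URL.
const title = 'blog post here';

// Encoding once turns each space into %20 ...
const once = encodeURIComponent(title);   // "blog%20post%20here"
// ... and encoding that result again turns each "%" into %25, giving %2520.
const twice = encodeURIComponent(once);   // "blog%2520post%2520here"

// Every decode removes exactly one layer.
const oneLayerOff = decodeURIComponent(twice);         // "blog%20post%20here"
const bothLayersOff = decodeURIComponent(oneLayerOff); // "blog post here"

console.log(once);
console.log(twice);
console.log(oneLayerOff);
console.log(bothLayersOff);
```

In PHP terms, the parser that fills $_GET performs the first decode, so a URL carrying %2520 needs only one explicit urldecode(); the real fix is still to encode only once when building the link.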
I'm trying to write a regular expression for a redirect and not having any luck. In this example, an old URL might exist like this:
example.com/about-us/Default.asp
example.com/the-team/Default.asp
Which I want to redirect to:
example.com/about-us/
example.com/the-team/
I've come up with this:
/(\d*)/Default.asp
Which doesn't work...
I've also tried this:
/(\d*)/Default\.asp
I thought the problem might be the unescaped '.', but still no luck. Can anyone see what I'm doing wrong?
Got it working thanks to what minitech pointed out. \d matches only digits, so it could never match a folder name like about-us. This pattern:
/(.*)/Default\.asp$
worked a treat! Thanks
Since you only need to remove the "Default.asp", that is all you have to search for. The regex would look something like this:
/Default\.asp/
The dot is escaped because an unescaped dot matches any character.
If you're using PHP, you can do a simple preg_replace:
preg_replace('/Default\.asp/', '', 'example.com/about-us/Default.asp');
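For the redirect itself, a capture group can return the folder to redirect to. This is a JavaScript sketch; redirectTarget is a hypothetical helper, the case-insensitive flag is an assumption about the old URLs, and the actual redirect would still be configured in the web server or application.

```javascript
// Capture everything up to the last "/" before "Default.asp". The dot is escaped,
// and "$" requires "Default.asp" to be the final segment. The "i" flag
// (case-insensitive match) is an assumption about the old URLs.
const defaultAsp = /^(.*\/)Default\.asp$/i;

// Return the folder URL to redirect to, or null if no redirect applies.
function redirectTarget(path) {
  const m = path.match(defaultAsp);
  return m ? m[1] : null;
}

for (const path of [
  'example.com/about-us/Default.asp',
  'example.com/the-team/Default.asp',
  'example.com/the-team/',
]) {
  console.log(path, '->', redirectTarget(path));
}
```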
I entered text containing a newline into a form, and it looked correct when I entered it. But now, when the data is pulled from the database, I see the literal string \n\r instead of a line break.
I tried this:
$hike_description = nl2br($hike_description);
But it doesn't work. Does anyone know how this can be fixed? I am using PHP.
And here is the page where this is happening. See the description section of the page:
http://www.comehike.com/hikes/scheduled_hike.php?hike_id=130
Thanks,
Alex
Does anyone know how this can be fixed?
Sure.
Your code is doing unnecessary escaping, most likely before adding the text to the database.
So, instead of converting it back, you have to find that harmful code and get rid of it.
This means you probably have literal '\n\r' strings (a backslash followed by a letter) in the DB.
Try sanitizing the DB output before displaying it:
$sanitized_text = preg_replace('/\\\\[rn]/', '', $text_from_db);
(just a guess).
Addendum:
Of course, as Col. Shrapnel pointed out, there's something fundamentally wrong with the contents of the database (or it is used this way by convention and you don't know that).
For now, this only partially fixes a symptom; it would be much better to find out why these escaped characters are in the database at all.
Regards
rbo
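If existing rows already contain the literal text \r\n, deleting it (as the preg_replace() sanitizer above does) also deletes the line breaks. Here is a JavaScript sketch that converts the literal sequences back into real newline characters instead; unescapeNewlines and the sample string are hypothetical, and the proper fix is still to stop the extra escaping before the data is stored.

```javascript
// Hypothetical value as it might come back from the database: the text
// contains a backslash followed by "r" or "n", not real line-break characters.
const fromDb = 'Meet at the trailhead.\\r\\nBring water.';

// Convert literal \r\n, \n\r, \n or \r sequences into a real newline.
// Longer alternatives come first so "\r\n" yields one newline, not two.
function unescapeNewlines(text) {
  return text.replace(/\\r\\n|\\n\\r|\\n|\\r/g, '\n');
}

const repaired = unescapeNewlines(fromDb);
console.log(JSON.stringify(repaired)); // prints "Meet at the trailhead.\nBring water."
```

Note that this would also alter legitimate backslash sequences, such as a Windows path like C:\new, which is another reason to fix the escaping at the source.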
You can use str_replace to clean up the input.
$hike_description = nl2br(str_replace("\r\n", "\n", $hike_description));
$hike_description = str_replace(array('\n','\r'),'',$hike_description);
You may want to read up on the differences between the single quote and double quote in PHP as well: http://php.net/manual/en/language.types.string.php
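The quote distinction is exactly why nl2br() had no effect: it looks for real line-break characters, not the two-character text \n. The JavaScript sketch below illustrates both points; nl2brLike is a hypothetical, rough imitation of nl2br(), not PHP's implementation.

```javascript
// PHP "\n" (double quotes) is one newline character; PHP '\n' (single quotes)
// is a backslash followed by "n". In JavaScript those are '\n' and '\\n'.
const realNewline = '\n';
const literalBackslashN = '\\n';
console.log(realNewline.length, literalBackslashN.length); // prints 1 2

// Rough, hypothetical imitation of PHP's nl2br(): insert "<br />" before each
// line break (\r\n, \n\r, \r or \n) and keep the break itself.
function nl2brLike(text) {
  return text.replace(/\r\n|\n\r|\r|\n/g, (lineBreak) => '<br />' + lineBreak);
}

console.log(nl2brLike('Line one\r\nLine two'));   // real CRLF: gets a <br />
console.log(nl2brLike('Line one\\r\\nLine two')); // literal text: unchanged
```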
I'm having a lot of difficulty matching an image url with spaces.
I need to make this
http://site.com/site.com/files/images/img 2 (5).jpg
into a div like this:
.replace(/(http:\/\/([^\s]+\.(jpg|png|gif)))/ig, "<div style=\"background: url($1)\"></div>")
Here's the thread about that:
regex matching image url with spaces
Now I've decided to first convert the spaces to %20 so that the above regex will work.
But I'm really having a lot of difficulty doing so.
Something like this:
.replace(/http:\/\/(.*)\/([^\<\>?:;]*?) ([^\<\>?:;]*)(\.(jpe?g|png|gif))/ig, "http://$1/$2%20$3$4")
Replaces one space, but all the rest are still spaces.
I need to write a regex that says, make all spaces between http:// and an image extension (png|jpg|gif) into %20.
At this point, I'm frankly not sure it's even possible. Any help is appreciated, thanks.
Trying Paolo's escape:
.escape(/http:\/\/(.*)\/([^\<\>?:;]*?) ([^\<\>?:;]*)(\.(jpe?g|png|gif))/)
Another way would be to escape server-side in PHP, where I can work with the file name directly without having to match it with a regex.
But as far as I know, functions like htmlentities() don't touch spaces. Any hints in this direction would be great as well.
Try the escape function:
>>> escape("test you");
test%20you
If you want to control the replacement character but don't want to use a regular expression, a simple...
$destName = str_replace(' ', '-', $sourceName);
...would probably be the more efficient solution.
Let's say you have the string variable urlWithSpaces, set to a URL that contains spaces.
Simply do:
var urlWithoutSpaces = escape(urlWithSpaces);
What about rawurlencode()? Applied to the file name, it turns spaces into %20 (urlencode() produces +, which is not treated as a space in a URL path).
On the JS side you should use encodeURI(), with escape() only as a fallback. The reason to prefer encodeURI() is that it encodes characters as UTF-8, while escape() uses ISO Latin (%XX) or a non-standard %uXXXX form. The same problem applies to decoding.
var encodeUrl = (typeof encodeURI === 'function') ? encodeURI : escape;
alert(encodeUrl('image name.png'));
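Putting the pieces together for the original question, here is a JavaScript sketch that finds image URLs (allowing spaces) and applies encodeURI() only to each matched URL through a replacement callback, so no second regex is needed. wrapImages and the exact pattern are assumptions, not a tested solution from this thread.

```javascript
// Matches an http(s) URL ending in an image extension. Spaces are allowed
// because the lazy [^"<>\n]*? stops at the first .jpg/.jpeg/.png/.gif.
const imageUrl = /https?:\/\/[^"<>\n]*?\.(?:jpe?g|png|gif)/gi;

function wrapImages(text) {
  return text.replace(imageUrl, (url) => {
    // encodeURI turns each space into %20 but leaves ':', '/', '(' and ')' alone,
    // so the CSS url() value is quoted to keep the parentheses safe.
    return '<div style="background: url(\'' + encodeURI(url) + '\')"></div>';
  });
}

console.log(wrapImages('http://site.com/site.com/files/images/img 2 (5).jpg'));
// prints <div style="background: url('http://site.com/site.com/files/images/img%202%20(5).jpg')"></div>
```

Because the match is lazy but may contain spaces, a non-image http URL earlier on the same line would be swallowed into the match, so treat this as a starting point.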
I am using TinyMCE and, rather annoyingly, it replaces all of my apostrophes with their HTML numeric equivalent. Most of the time this isn't a problem, but for some reason I am having a problem storing the apostrophe replacement, so I have to search through the string and replace them all. Any help would be much appreciated.
Did you try:
$string = str_replace("&#39;", "<replacement>", $string);
Is it just apostrophes that you want decoded from HTML entities, or everything?
print html_entity_decode("Hello, that&#39;s an apostrophe.", ENT_QUOTES);
will print
Hello, that's an apostrophe.
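For comparison, a JavaScript sketch of what html_entity_decode() does for numeric references such as &#39;. decodeNumericEntities is a hypothetical helper that handles only numeric references; on the server, PHP's html_entity_decode() with ENT_QUOTES remains the simpler choice.

```javascript
// Decode decimal (&#39;) and hexadecimal (&#x27;) character references in one
// pass, so text like "&#38;#39;" is only decoded once. Named entities such as
// &amp; are deliberately left alone in this sketch.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, (ref, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : ref;
  });
}

console.log(decodeNumericEntities('Hello, that&#39;s an apostrophe.'));
// prints Hello, that's an apostrophe.
```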
Why work around the problem when you can fix the cause? You can just turn off TinyMCE's entity encoding* with its entity_encoding option (setting it to "raw" stops characters such as the apostrophe from being converted).
*Unless you want all the other characters encoded, that is.