getting number from external website

I need to get a number from this website: next to "Current STC price" it displays a market-driven figure for STCs.
I tried this:
$html = file_get_contents('http://www.greenenergytrading.com.au/certificates/todays-pricing');
$html = strip_tags($html);
which leaves me with a long string. I then tried to remove everything before the figure I'm after, assuming that the text won't change:
$html = preg_replace('/.*Current STC price/', '', $html);
However, this doesn't work. It seems to work in an online RegExp tester, but not in production. Also, is this a reasonable approach?
Cheers

You can use preg_match (or preg_match_all, if you want every occurrence rather than just the first) with the $matches parameter to extract the price from the website source and store it in an array. Then just read the match from that array.
Check out the documentation for preg_match here:
http://php.net/manual/en/function.preg-match.php
EDIT: Oh, I just saw your comment that you tried preg_match already. What regexps have you tried? Have you tried something like '/\$([0-9.]+)/'? Note that the $ must be escaped; unescaped, it is an end-of-string anchor.
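One likely reason the preg_replace in the question behaves differently on the live page: after strip_tags the text spans many lines, and without the s modifier "." does not match a newline, so ".*" only consumes the current line. A minimal sketch that captures the figure directly with preg_match instead; the name extractStcPrice is hypothetical, and it assumes the number follows the label, possibly after whitespace and a $ sign:

```php
<?php
// Hypothetical helper: pull the first number that follows "Current STC price".
// Assumption: the figure appears after the label, optionally preceded by "$".
function extractStcPrice(string $html): ?string
{
    $text = strip_tags($html);
    // s modifier: "." also matches newlines, so ".*?" can cross the line
    // breaks left by strip_tags; the "?" makes it stop as early as possible.
    if (preg_match('/Current STC price.*?\$?\s*(\d+(?:\.\d+)?)/s', $text, $m) === 1) {
        return $m[1];
    }
    return null; // label or number not found
}
```

For example, extractStcPrice(file_get_contents($url)) returns the figure as a string, or null if the label is missing. It still depends on the label text staying the same, which is the main weakness of this approach.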

Related

preg_match Follower Count on Google+

I searched online for a while and there is no way right now to get the Follower count of Google+ Profiles.
My plan now was to get the webpage code with file_get_contents and then to use preg_match.
But I'm using this function for the first time and I have no clue how to use it. I read a bit about it online, but I don't understand it.
The pattern is quite simple: a number with periods/dots (.) as thousands separators, a space, and then the word "Follower".
How can I express that as a pattern for preg_match?
And I read something about preg_replace, which I could then use to replace the periods/dots (.) with nothing. Am I right?
Thanks a lot!
Regards
Selfster

There is an API for that.
Here is the reference.

It's not a foolproof method, but it works.
$test = file_get_contents("https://plus.google.com/+SundarPichai");
$res = preg_match_all("%\b([\.\,\d]+) followers%",$test, $output, PREG_SET_ORDER);
var_dump($output);
The regex "%\b([\.\,\d]+) followers%" starts at a word boundary (\b) and then captures digits, commas and points (which separator is used depends on the localization).
I think the page falls back to American formatting by default (that's why there are commas), and the number is then followed by a space and the word "followers".
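The question also asks how to strip the separators afterwards. A minimal sketch combining both steps: a slightly tighter pattern than the one above (groups of three digits separated by "." or ",", followed by "follower" or "followers"), then preg_replace to remove the separators. The function name and the sample strings are illustrative assumptions, not taken from the live page:

```php
<?php
// Hypothetical helper: find "1.234 Follower" / "12,345 followers" style counts
// and return them as integers.
function parseFollowers(string $html): array
{
    // \d{1,3}(?:[.,]\d{3})* : 1-3 digits, then groups of three after "." or ","
    // followers?            : matches "follower" and "followers"; /i ignores case
    preg_match_all('/\b(\d{1,3}(?:[.,]\d{3})*) followers?\b/i', $html, $m);
    $counts = [];
    foreach ($m[1] as $raw) {
        // remove the thousands separators, then cast to int
        $counts[] = (int) preg_replace('/[.,]/', '', $raw);
    }
    return $counts;
}
```

For instance, parseFollowers('1.234.567 Follower') gives [1234567].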

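First, you need a Google API key. Then it works like this:
the id looks like 103382528845345115881
and the key like AIzaSyAqlZ1MJSGXMSs8a5WbfvLpZTGJeHLVc2w
https://www.googleapis.com/plus/v1/people/<id>?key=<key>
The field you need is "circledByCount", e.g. "circledByCount": 10,
The shortest form to use is this:
<?php
$id  = '103382528845345115881'; // your profile id
$key = 'YOUR_API_KEY';          // your API key
$url = "https://www.googleapis.com/plus/v1/people/$id?key=$key";
$followers = json_decode(file_get_contents($url), true);
// fall back to 0 when the count is not publicly visible
$followers = isset($followers['circledByCount'])
    ? $followers['circledByCount'] : 0;
This only works when the follower count is publicly visible.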

Preg_match somehow not finding part of string

I am having a problem with preg_match in that it is not returning anything, even though, according to http://gskinner.com/RegExr/?=35ls9, it should be working properly.
This is my current code:
$string contains: <a class="twitter-timeline" href="https://twitter.com/...." data-widget-id="352777062139922223">....</a>...
It's simply the embed code Twitter generates when you create a widget. It was also included in the example.
$string = get_field('twitter_feed'); //contains the string.
preg_match('/data-widget-id="([0-9]*)"/', $string, $match);
var_dump($match);
It's probably something really simple that I am missing. Hopefully somebody can help me with this problem.
Edit: added the sample string.

Test it with the following string. I did, and it works fine:
$string = 'data-widget-id="352777062139922223"';
Make sure that get_field is returning a string in that form.
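If the test string works but the real field does not, it helps to dump the raw value, for example with var_dump(htmlspecialchars($string)). One possible cause, which is only an assumption here, is that the field returns HTML-escaped markup (&quot; instead of "), so the literal quotes in the pattern never match. A minimal sketch that decodes entities before matching; extractWidgetId is a hypothetical name:

```php
<?php
// Hypothetical helper: return the data-widget-id from Twitter embed markup.
function extractWidgetId(string $embed): ?string
{
    // Assumption: the stored value may be HTML-escaped (&quot;, &lt;, ...);
    // decoding first lets the pattern below see real quote characters.
    $embed = html_entity_decode($embed, ENT_QUOTES, 'UTF-8');
    // \d+ requires at least one digit, unlike [0-9]* which can match nothing
    if (preg_match('/data-widget-id="(\d+)"/', $embed, $match) === 1) {
        return $match[1];
    }
    return null;
}
```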

Need PHP Regex help

I've been working on this simple script all day trying to figure it out. I'm new to regex so please keep that in mind. On top of that, I've tried just about anything and everything I could to get this to work.
I'm trying to (to learn, please don't point me to the API) download a TSV file from Yahoo Site Explorer via either cURL or file_get_contents (both work, just messing with different things) and then using regex to get only the URL column to appear. I realize I might have more luck with other functions, but I can't find anything dealing with TSV and now it's become a challenge. I've literally spent the entire day trying to get this correct.
So a URL would be:
https://siteexplorer.search.yahoo.com/search?p=www.google.com&bwm=i&bwmo=&bwmf=s
And my regex currently looks like this (I know it's horrible...it's probably the millionth attempt):
preg_match_all('((http(s?)://?(([^/]+(\/.+))))^[\t]$)', $dl, $matches);
My issue right now is that there are 4 columns: TITLE, URL, SIZE, FORMAT. I'm able to strip out everything from the first column (TITLE) and the last column (FORMAT), but I cannot seem to strip out the SIZE column and get rid of the last slash in case the sites linking in don't have that last slash.
Another thing - I've actually accomplished getting JUST the URL to appear, but they all had ending slashes which leave out links from, say, Twitter.
Any help would be greatly appreciated!

I don't know much about PHP, but this regex works in Python (it should be the same in PHP):
".+?\t(.+?)\t.*"
Just match it and get the content of group 1. FWIW, code in Python:
import re
import fileinput

urlre = re.compile(r".+?\t(.+?)\t.*")
for line in fileinput.input():
    m = urlre.match(line)
    if m:
        print(m.group(1))
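The same idea in PHP, as a minimal sketch: with the m modifier, ^ matches at the start of every line, and excluding tabs and newlines from each column keeps a match inside one row. The column layout (TITLE, URL, SIZE, FORMAT) comes from the question; the helper name urlColumn and the trailing-slash trimming are assumptions about what is wanted:

```php
<?php
// Hypothetical helper: return the second (URL) column of TSV text.
function urlColumn(string $tsv): array
{
    // ^ with /m  : start of each line
    // [^\t\n]*   : first column (TITLE), no tabs or newlines
    // ([^\t\n]+) : second column (URL), captured as group 1
    preg_match_all('/^[^\t\n]*\t([^\t\n]+)\t/m', $tsv, $m);
    // Drop a trailing slash so "http://a.com/" and "http://a.com" compare equal
    return array_map(function ($url) {
        return rtrim($url, '/');
    }, $m[1]);
}
```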

Personally, I'd split the lines by tab. For example:
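$stuff = file_get_contents($url);
$urls = array();
// split the whole file by newlines, to get an array of lines
$lines = explode("\n", $stuff);
// loop through the lines
foreach ($lines as $line) {
    // split by tab (rtrim drops a "\r" left by Windows line endings)
    $parts = explode("\t", rtrim($line, "\r"));
    // skip blank or malformed lines that have no URL column
    if (count($parts) < 2) {
        continue;
    }
    // put the URLs in a list
    $urls[] = $parts[1];
    // or keep track of them by title instead:
    // $urls[$parts[0]] = $parts[1];
    // or whatever...
}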
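On the remark in the question that nothing seems to deal with TSV: str_getcsv() takes a delimiter argument, so a tab works as well, and it also copes with quoted fields. A minimal sketch, assuming the URL is still the second column; urlsFromTsv is a hypothetical name:

```php
<?php
// Hypothetical helper: read the URL column of TSV text with str_getcsv().
function urlsFromTsv(string $tsv): array
{
    $urls = [];
    // split on any line-ending style (\r\n, \n or \r)
    foreach (preg_split('/\r\n|\n|\r/', $tsv) as $line) {
        if ($line === '') {
            continue; // skip blank lines, e.g. the one after a final newline
        }
        // delimiter "\t", enclosure '"' and escape '\\' passed explicitly
        $row = str_getcsv($line, "\t", '"', '\\');
        if (isset($row[1])) {
            $urls[] = $row[1];
        }
    }
    return $urls;
}
```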

Just use parse_url or parse_str instead. Always look for an alternative to regular expressions, which can be slow.

Regexp for cleaning the empty, unnecessary HTML tags

I'm using TinyMCE (WYSIWYG) as the default editor in one of my projects, and sometimes it automatically adds <p> </p>, <p>&nbsp;</p> or divs.
I have been searching but I couldn't really find a good way of cleaning any empty tags with regex.
The code I've tried to use is:
$pattern = "/<[^\/>]*>([\s]?)*<\/[^>]*>/";
$str = preg_replace($pattern, '', $str);
Note: I also want to clear &nbsp; too :(

Try
/<(\w+)>(\s|&nbsp;)*<\/\1>/
instead. :)

That regexp is a little odd, but it looks like it might work. You could try this instead:
$pattern = ':<[^/>]*>\s*</[^>]*>:';
$str = preg_replace($pattern, '', $str);
Very similar though.

I know it's not directly what you asked for, but after months of TinyMCE, coping with not only this but the hell that results from users posting directly from Word, I have made the switch to FCKeditor and couldn't be happier.
EDIT: Just in case it's not clear, what I'm saying is that FCKeditor doesn't insert arbitrary paras where it feels like it, plus copes with pasted Word crap out of the box. You may find my previous question to be of help.

You would want multiple regexes to be sure you do not eliminate other wanted elements with one generic one.
As Ben said, you may drop valid elements with one generic regex:
<\s*[^>]*>\s*&nbsp;\s*<\s*[^>]*>
<\s*p\s*>\s*<\s*/p\s*>
<\s*div\s*>\s*<\s*/div\s*>

Try this:
<([\w]+)[^>]*?>(\s|&nbsp;)*<\/\1>
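A minimal sketch of the backreference approach from the answers above, with the &nbsp; alternative spelled out. It runs the replacement in a loop because removing an inner empty tag can leave its parent empty (for example <div><p></p></div>). The function name stripEmptyTags is hypothetical:

```php
<?php
// Hypothetical helper: remove empty element pairs such as <p> </p>,
// <p>&nbsp;</p> or <div></div>.
function stripEmptyTags(string $html): string
{
    // (\w+)          : capture the tag name
    // [^>]*          : allow attributes, e.g. <p class="x">
    // (?:\s|&nbsp;)* : only whitespace or &nbsp; inside
    // <\/\1>         : the closing tag must match the opening one
    $pattern = '/<(\w+)[^>]*>(?:\s|&nbsp;)*<\/\1>/i';
    do {
        // $count receives the number of replacements made in this pass
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);
    return $html;
}
```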

Scrape a price off a website

I'm trying to scrape a price from a web page using PHP and Regexes. The price will be in the format £123.12 or $123.12 (i.e., pounds or dollars).
I'm loading the contents using libcurl, and the output is then passed to preg_match_all. So it looks a bit like this:
$contents = curl_exec($curl);
preg_match_all('/(?:\$|£)[0-9]+(?:\.[0-9]{2})?/', $contents, $matches);
So far so simple. The problem is, PHP isn't matching anything at all - even when there are prices on the page. I've narrowed it down to there being a problem with the '£' character - PHP doesn't seem to like it.
I think this might be a charset issue. But whatever I do, I can't seem to get PHP to match it! Anyone have any ideas?
(Edit: I should note if I try using the Regex Test Tool using the same regex and page content, it works fine)

Have you tried putting a \ in front of the £?
preg_match_all('/(\$|\£)[0-9]+(\.[0-9]{2})/', $contents, $matches);
I have tried this expression in .NET with \£ and it works. I just edited it and removed some ":" characters.
Read my comment on this post about the possibility of cURL giving you bad encoding.

Maybe the pound sign has been replaced by its HTML entity? I think you should test your regexp with some sort of test harness (i.e. match it against fixed text locally).
I'd change my regexp like this: '/(?:\$|£)\d+(?:\.\d{2})?/'

This should work for simple values.
'#(?:\$|\£|\€)(\d+(?:\.\d+)?)#'
This will not work with thousands separators, as in 234,343 and 34,454.45.
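Two things commonly break matching on £ (stated here as likely causes, not confirmed for this page): the page may use an entity such as &pound; instead of the literal character, and without the u modifier PCRE treats the UTF-8 £ as two separate bytes, which breaks character classes like [$£]. A minimal sketch that handles both, assuming the page is UTF-8 (an ISO-8859-1 page would need converting first); extractPrices is a hypothetical name:

```php
<?php
// Hypothetical helper: return amounts written as $123.12, £123.12 or €123.12.
// Assumes $html is UTF-8 and this source file is saved as UTF-8.
function extractPrices(string $html): array
{
    // Turn &pound; / &#163; / &euro; into literal UTF-8 characters
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // u: treat pattern and subject as UTF-8, so [$£€] is a class of three
    // characters rather than a class of individual bytes
    preg_match_all('/[$£€](\d+(?:\.\d{2})?)/u', $text, $m);
    return $m[1] ?? []; // empty if the subject was not valid UTF-8
}
```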
