I'm trying to strip the numbers and punctuation from a string, leaving only alphabetic characters, using Simple HTML DOM, with no success. I've tried multiple approaches and just can't get it!
Example string: The Amazing Retard (2012) #1
Output string: The Amazing Retard
I understand the error is about an undefined method, and I've looked at multiple pages on this, but I'm brain farting on how to include the method. Any help would be appreciated. The error that I get is:
Fatal error: Call to undefined method simple_html_dom_node::preg_replace() in /home/**/public_html/wp-content/themes/*/***.php on line 123
The code is as follows:
<?php
function scraping_comic()
{
// create HTML DOM
$html = file_get_html('http://page-to-scrape.com');
// get block
foreach($html->find('li.browse_result') as $article)
{
// get title
$item['title'] = trim($article->find('h4', 0)->find('span',0)->outertext);
// get title url
$item['title_url'] = trim($article->find('h4', 0)->find('a.grid-hidden',0)->href);
// get image
$item['image_url'] = trim($article->find('img.main_thumb',0)->src);
// get details
$item['details'] = trim($article->find('p.browse_result_description_release', 0)->plaintext);
// get sale info
$item['on_sale'] = trim($article->find('.browse_comics_release_dates', 0)->plaintext);
// strip numbers and punctuations
$item['title2'] = trim($article->find('h4',0)->find('span',0)->preg_replace("/[^A-Za-z]/","",$item['title2'], 0)->plaintext);
$ret[] = $item;
}
// clean up memory
$html->clear();
unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
$ret = scraping_comic();
if ( ! empty($ret))
{
$scrape = 'http://the-domain.com';
foreach($ret as $v)
{
echo '<p>'.$v['title2'].'</p>';
echo '<p>'.$v['title'].'</p>';
echo '<p><img src="'.$v['image_url'].'"></p>';
echo '<p>'.$v['details'].'</p>';
echo '<p> '.$v['on_sale'].'</p>';
}
}
else { echo 'Could not scrape site!'; }
?>
preg_replace is a PHP function, not a member of the simple_html_dom_node class. Call it like this:
$result = preg_replace($pattern, $replacement, $subject);
http://php.net/manual/en/function.preg-replace.php
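It looks like your $pattern and $replacement are OK; you just need to pass in, as the $subject, the input you're trying to change. Note that this pattern also removes spaces; add a space inside the character class ("/[^A-Za-z ]/") if you want to keep them.
For example, this might be what you're trying to achieve:
$item['title2'] =
trim(preg_replace("/[^A-Za-z]/", "", $article->find('h4',0)->find('span',0)->plaintext));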
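To make the "plain function, not a method" point concrete, here is a minimal, self-contained sketch. It works on an ordinary string, so it assumes you have already read ->plaintext from the node; the helper name cleanTitle and the sample title are made up for illustration.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): keeps letters and spaces only.
function cleanTitle(string $raw): string
{
    // Remove every character that is not an ASCII letter or a space.
    $lettersOnly = preg_replace('/[^A-Za-z ]/', '', $raw);
    // Collapse the runs of spaces left behind by the removed characters.
    $collapsed = preg_replace('/ {2,}/', ' ', $lettersOnly);
    return trim($collapsed);
}

echo cleanTitle('Amazing Tales (2012) #1'), "\n"; // prints "Amazing Tales"
```

In the scraper, the call would presumably look like cleanTitle($article->find('h4', 0)->find('span', 0)->plaintext).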
I think it's because of this line :
// strip numbers and punctuations
$item['title2'] = trim($article->find('h4',0)->find('span',0)->preg_replace("/[^A-Za-z]/","",$item['title2'], 0)->plaintext);
Written like this, it means that preg_replace is a method of your simple_html_dom_node class, which it is not: it's a standard PHP function.
Your class might have something like execute_php_function("a_php_function", anArrayOfArguments),
so you would write something like this:
// strip numbers and punctuations
$item['title2'] = trim($article->find('h4',0)->find('span',0)->execute_php_function("preg_replace",anArrayOfArguments)->plaintext);
I'm trying to do something in PHP.
I'm trying to take the link of an image and store it in my DB, but I'd like the user to be able to put text before and after it. I've gotten my hands on a similar function for links, but the image part is missing.
As you can see, turnUrlIntoHyperlink runs a regex check over the entire argument passed and turns any URL in the text into a link, so users can post something like
Hey check this cool site "https://stackoverflow.com" its dope!
and the entire argument is posted to my database.
However, I can't seem to get the same thing working for convertImg: when I made the attempt, it simply wouldn't post, and it removed the text before/after the image.
How would I do this correctly, and can I combine these 2 functions into 1 function?
function convertImg($string) {
return preg_replace('/((https?):\/\/(\S*)\.(jpg|gif|png)(\?(\S*))?(?=\s|$|\pP))/i', '<img src="$1" />', $string);
}
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = ''.$link.'';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}
Explained in more detail:
When I post a link like https://google.com/, I'd like it to become an a href.
But if I post an image like https://image.shutterstock.com/image-photo/duck-on-white-background-260nw-1037486431.jpg , I'd like it to become an img src.
Currently, I'm storing it in my DB and echoing it to a little debug panel.
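Do you mean that you want to make an <img> inside an <a> element?
Your turnUrlIntoHyperlink function has captured the URL successfully, so we can just use explode to get the string before and after the link. Split on the URL exactly as it was matched in the text ($newLinks), not on $link, which may have had http:// prepended and would then not be found in $string.
$exploded = explode($newLinks, $string, 2);
$string_before = $exploded[0];
$string_after = $exploded[1];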
Code example:
<?php
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// add http protocol if the url does not already contain it
$newLinks = $url[0][0];
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// split on the URL exactly as matched; $link may have had http:// prepended
$exploded = explode($newLinks, $string, 2);
$string_before = $exploded[0];
$string_after = $exploded[1];
return $string_before.'<img src="'.$link.'">'.$string_after;
}
return $string;
}
echo turnUrlIntoHyperlink('Hey check this cool site https://stackoverflow.com/img/myimage.png its dope!');
Output:
Hey check this cool site <img src="https://stackoverflow.com/img/myimage.png"> its dope!
Edit: the question has been edited
Since an image URL is just another kind of link/URL, your logic should go like this pseudocode:
if link is image and link is url
print <img src=link> tag
else if link is url and link is not image
print <a href=link> tag
else
print link
So you can just write a new function to "merge" those two functions:
function convertToImgOrHyperlink($string) {
$result = convertImg($string);
if($result != $string) return $result;
$result = turnUrlIntoHyperlink($string);
if($result != $string) return $result;
return $string;
}
echo convertToImgOrHyperlink('Hey check this cool site https://stackoverflow.com/img/myimage.png its dope!');
echo "\r\n\r\n";
echo convertToImgOrHyperlink('Hey check this cool site https://stackoverflow.com/ its dope!');
echo "\r\n\r\n";
Output:
Hey check this cool site <img src="https://stackoverflow.com/img/myimage.png" /> its dope!
Hey check this cool site https://stackoverflow.com/ its dope!
The basic idea is that since an image URL is also a link, the image check must be done first. If it has an effect (the input and the return value differ), the <img> conversion is used. Otherwise, the <a> conversion is tried.
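If you would rather handle both cases in a single pass, the "image first, otherwise link" logic can also be sketched with preg_replace_callback. This is only an illustration under some assumptions: the function name convertUrls is made up, the URL pattern is deliberately simpler than the one in the question (it only matches http:// and https:// URLs), and only jpg/jpeg/gif/png count as images.

```php
<?php
// Hypothetical single-pass helper: image URLs become <img>, other URLs become <a>.
function convertUrls(string $text): string
{
    // Simplified URL pattern (assumption): scheme required, stops at whitespace, quotes, < and >.
    $urlPattern = '~\bhttps?://[^\s<>"\']+~i';

    return preg_replace_callback($urlPattern, function (array $m): string {
        $url = $m[0];

        // Peel off trailing sentence punctuation so "see http://x.y/a.png." still works.
        $trail = '';
        if (preg_match('/[.,!?;:]+$/', $url, $t)) {
            $trail = $t[0];
            $url = substr($url, 0, -strlen($trail));
        }

        // Escape before placing the URL inside an HTML attribute.
        $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

        // Decide by the path only, so a query string after the extension is ignored.
        $path = (string) parse_url($url, PHP_URL_PATH);
        if (preg_match('/\.(jpe?g|gif|png)$/i', $path)) {
            return '<img src="' . $safe . '" />' . $trail;
        }
        return '<a href="' . $safe . '">' . $safe . '</a>' . $trail;
    }, $text);
}

echo convertUrls('Hey check this cool site https://stackoverflow.com/img/myimage.png its dope!'), "\n";
echo convertUrls('Hey check this cool site https://stackoverflow.com/ its dope!'), "\n";
```

Because every URL is decided on its own inside the callback, a message that contains both an image and a regular link gets both conversions, which the "return on first change" approach does not give you.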
I'm trying to create a simple PHP find-and-replace system that looks at all of the images in the HTML and adds a bit of code at the start and end of each image source. The image source looks something like this:
<img src="img/image-file.jpg">
and it should become this:
<img src="{{media url="wysiwyg/image-file.jpg"}}">
The Find
="img/image-file1.jpg"
="img/file-2.png"
="img/image3.jpg"
Replace With
="{{media url="wysiwyg/image-file1.jpg"}}"
="{{media url="wysiwyg/file-2.png"}}"
="{{media url="wysiwyg/image3.jpg"}}"
The solution is most likely simple, yet everything I have found in my research only works with one fixed string, not a variety of unpredictable strings.
Current Progress
$oldMessage = "img/";
$deletedFormat = '{{media url="wysiwyg/';
$str = file_get_contents('Content Slots/Compilied Code.html');
$str = str_replace("$oldMessage", "$deletedFormat",$str);
The bit I'm stuck on is finding the " at the end of the source, in order to add the end of the required code, "}}"
I don't like to build regular expressions to parse HTML, but it seems that in this case, a regular expression will help you:
$reg = '/=["\']img\/([^"\']*)["\']/';
$src = ['="img/image-file1.jpg"', '="img/file-2.png"', '="img/image3.jpg"'];
foreach ($src as $s) {
// keep the outer quotes so the result matches the requested ="{{media url="..."}}" form
$str = preg_replace($reg, '="{{media url="wysiwyg/$1"}}"', $s);
echo "$str\n";
}
Here you have an example on Ideone.
To make it work with your content:
$content = file_get_contents('Content Slots/Compilied Code.html');
$reg = '/=["\']img\/([^"\']*)["\']/';
// keep the outer quotes so the result matches the requested ="{{media url="..."}}" form
$final = preg_replace($reg, '="{{media url="wysiwyg/$1"}}"', $content);
Here you have an example on Ideone.
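For a quick check without the real file, the same idea can be tried on an inline sample. This is a minimal sketch: the helper name rewriteImageSources and the sample markup are made up, and it assumes the attributes use double quotes.

```php
<?php
// Hypothetical helper: rewrites ="img/NAME" to ="{{media url="wysiwyg/NAME"}}".
function rewriteImageSources(string $html): string
{
    // [^"]* stops at the first closing quote, so each attribute is matched on its own.
    return preg_replace('/="img\/([^"]*)"/', '="{{media url="wysiwyg/$1"}}"', $html);
}

$sample = '<img src="img/image-file1.jpg"><img src="img/file-2.png">';
echo rewriteImageSources($sample), "\n";
```

Since preg_replace replaces every match by default, no loop is needed, even when several images sit on the same line.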
In my opinion, what you are doing is not the best way to do this. I would use an abstract template instead.
<?php
$content = file_get_contents('Content Slots/Compilied Code.html');
preg_match_all('/=\"img\/(.*?)\"/', $content, $matches);
$finds = $matches[1];
$abstract = '="{{media url="wysiwyg/{filename}"}}"';
$concretes = [];
foreach ($finds as $find) {
$concretes[] = str_replace("{filename}", $find, $abstract);
}
// $concretes will now hold all matches, properly formatted...
Edit:
To return the full HTML, use this:
<?php
$content = file_get_contents('Content Slots/Compilied Code.html');
// non-greedy (.*?) so each match stops at its own closing quote
preg_match_all('/=\"img\/(.*?)\"/', $content, $matches);
$finds = $matches[1];
$abstract = '="{{media url="wysiwyg/{filename}"}}"';
foreach ($finds as $find) {
$content = preg_replace('/=\"img\/(.*?)\"/', str_replace("{filename}", $find, $abstract), $content, 1);
}
echo $content;
Is there a way to get the contents of a post from an Instagram page using simple PHP?
I did some searching and found this script:
$url = 'https://www.instagram.com/pagename/';
$str = file_get_contents($url);
$count = 0;
if(preg_match('#followed_by": {"count": (.*?)}#', $str, $match)) {
$count = $match[1]; // get the count from Regex pattern
}
echo $count;
But it only gets the number of followers. Is there a way
to get the contents of an Instagram post using the same concept?
Here's some code that works (today). But as @Andy said, it's not reliable and it's dirty af :)
<?php
$source = file_get_contents("https://www.instagram.com/p/POST_ID/");
preg_match('/<script type="text\/javascript">window\._sharedData =([^;]+);<\/script>/', $source, $matches);
if (!isset($matches[1]))
return false;
$r = json_decode($matches[1]);
print_r($r);
// Example to get the likes count
// $r->entry_data->PostPage[0]->graphql->shortcode_media->edge_media_preview_like->count
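The general technique here (pull a JSON object out of an inline script tag, then json_decode it) can be tried offline. The sketch below uses a made-up sample page and made-up keys, not Instagram's real markup; extractEmbeddedJson is a hypothetical helper name. Unlike the [^;]+ pattern above, it tolerates semicolons inside the JSON.

```php
<?php
// Hypothetical helper: finds "<script ...>VAR = {...};</script>" and decodes the object.
function extractEmbeddedJson(string $html, string $varName): ?array
{
    // Lazy .*? with the /s flag: grow the capture until "};" is followed by </script>.
    $pattern = '/<script[^>]*>\s*' . preg_quote($varName, '/')
             . '\s*=\s*(\{.*?\});\s*<\/script>/s';

    if (!preg_match($pattern, $html, $m)) {
        return null; // markup changed or variable not present
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null; // null on invalid JSON
}

// Made-up sample; the real page structure is an assumption and may differ or change.
$sample = '<script type="text/javascript">window._sharedData = '
        . '{"post":{"caption":"hello; world","likes":3}};</script>';

$data = extractEmbeddedJson($sample, 'window._sharedData');
echo $data['post']['caption'], "\n";
```

Returning null in both failure cases lets the caller detect a markup change instead of crashing on a missing key.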
I took @Andy's advice, since going over the HTML is not reliable and is dirty af:
if Instagram changes their page markup, your application will break.
So this is what I found instead:
$username = 'username';
$instaResult=
file_get_contents('https://www.instagram.com/'.$username.'/media/');
//decode json string into array
$data = json_decode($instaResult);
foreach ($data as $posts) {
foreach($posts as $post){
$postit = (array) json_decode(json_encode($post), True);
/* get post text and image */
echo '<p>' .$postit["caption"]["text"].'</p>';
echo '<img src="'.$postit["images"]["standard_resolution"]["url"].'" />';
echo "<br>-----------<br>";
}
}
I am trying to get the link of a background image:
<div class="mine" style="background: url('http://www.something.com/something.jpg')"></div>
I am using find('div.mine')
$link = find('div.mine');
$link returns the html code containing all the
How do I parse it so that it returns only the link?
That syntax isn't quite correct. You're doing $link = find('div.mine'); but that should be $link = $yourHTML->find('div.mine'); instead.
Get all the divs with the class name mine first, loop through them, and get the style attributes. Now you'll have a string like:
background: url('http://www.something.com/something.jpg')
You could then use a CSS Parser (recommended way), or a regular expression to grab just the URL part from that string.
if(preg_match('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $link, $matches)) {
$image_url = $matches[0];
}
Full code:
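$html = file_get_html('file.html');
$divs = $html->find('div.mine');
foreach ($divs as $div) {
// the style attribute holds the background declaration
$link = $div->style;
// match inside the loop so every div is handled and $image_url is always defined when echoed
if(preg_match('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $link, $matches)) {
$image_url = $matches[0];
echo $image_url;
}
}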
Output:
http://www.something.com/something.jpg
The URL matching regex pattern is from Wordpress' make_clickable function in wp-includes/formatting.php. See this post for the complete implementation.
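If the style value always has the url(...) form shown above, a pattern aimed at that form is an alternative to the generic URL regex. A minimal sketch, assuming the string has already been read from $div->style; the helper name extractBackgroundUrl is made up.

```php
<?php
// Hypothetical helper: returns the first url(...) target in a CSS declaration, or null.
function extractBackgroundUrl(string $style): ?string
{
    // Accepts url('...'), url("...") and url(...) with optional inner whitespace.
    if (preg_match('/url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)/i', $style, $m)) {
        return trim($m[1]);
    }
    return null;
}

echo extractBackgroundUrl("background: url('http://www.something.com/something.jpg')"), "\n";
// prints http://www.something.com/something.jpg
```

This also picks up relative paths such as url(img/bg.png), which a pattern requiring http:// would miss.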
Try the substr() function to extract the text.
I am trying to parse the HTML page of Google Play to get some information about apps. Simple-html-dom works perfectly, but if the page contains attributes with no space between them, it completely ignores the later attributes. For instance, I have this HTML code:
<div class="doc-banner-icon"><img itemprop="image"src="https://lh5.ggpht.com/iRd4LyD13y5hdAkpGRSb0PWwFrfU8qfswGNY2wWYw9z9hcyYfhU9uVbmhJ1uqU7vbfw=w124"/></div>
As you can see, there is no space between "image" and src, so simple-html-dom ignores the src attribute and returns only <img itemprop="image">. If I add a space, it works perfectly. To get this attribute I use the following code:
foreach($html->find('div.doc-banner-icon') as $e){
foreach($e->find('img') as $i){
$bannerIcon = $i->src;
}
}
My question is how to change this beautiful library to get full inner text of this div?
I just created a function which adds the necessary spaces to the content:
function placeNeccessarySpaces($contents){
$quotes = 0; $flag=false;
$newContents = '';
for($i=0; $i<strlen($contents); $i++){
$newContents.=$contents[$i];
if($contents[$i]=='"') $quotes++;
if($quotes%2==0){
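// isset() guards against reading past the end of the string on the last character
if(isset($contents[$i+1]) && $contents[$i+1]!== ' ' && $flag==true) {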
$newContents.=' ';
$flag=false;
}
}
else $flag=true;
}
return $newContents;
}
And then use it right after the file_get_contents call:
$contents = file_get_contents($url, $use_include_path, $context, $offset);
$contents = placeNeccessarySpaces($contents);
Hope it helps someone else.
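A shorter variant of the same idea is a single preg_replace that inserts a space only where a quoted attribute value is immediately followed by the start of another attribute name. This is a sketch under assumptions: the helper name separateGluedAttributes is made up, it only handles double-quoted values, and it could also touch ="..." sequences that appear in text or inline scripts.

```php
<?php
// Hypothetical helper: turns itemprop="image"src="..." into itemprop="image" src="...".
function separateGluedAttributes(string $html): string
{
    // Match ="value" only when the next character could start an attribute name;
    // the lookahead does not consume that character.
    return preg_replace('/(=\s*"[^"]*")(?=[A-Za-z_:])/', '$1 ', $html);
}

$sample = '<div class="doc-banner-icon"><img itemprop="image"src="https://example.com/icon.png"/></div>';
echo separateGluedAttributes($sample), "\n";
```

Unlike the character loop, this leaves a value followed by > or /> untouched, so the rest of the markup stays byte-for-byte the same.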