

Pulling data from social sites

Post questions or comments on how best to manage your big data problem

Mon Jan 28, 2013 10:24 am

I went through the HPCC Introduction (white paper PDF) and the Sentilyze use case, which uses a CSV file as input.

The Introduction PDF claims that HPCC can pull data from external websites.

How can this be achieved for social sites like Facebook, Twitter, etc.?

Note: How do I specify the record structure for pulling such data? Will HPCC's NLP support be required?

Thanks and regards !
kaliyugantagonist
 
Posts: 43
Joined: Mon Jul 23, 2012 11:23 am

Tue Jan 29, 2013 5:03 pm

One of our developers has used the Twitter API (https://dev.twitter.com/docs/api) to collect tweets. She wrote three apps:

1/ a Linux app to harvest tweets
2/ a JavaScript app to do selective tweet gets
3/ a Linux app, callable from ECL using PIPE, to do selective tweet gets
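
To give an idea of what an app for case 3/ might look like, here is a minimal Python sketch, not our developer's actual code. It assumes that ECL's PIPE reads whatever the external program writes to standard output, so the app's only job is to print one delimited record per line. The field names (`id`, `user`, `text`) and the sample data are hypothetical; the record structure you declare on the ECL side would have to match whatever fields you choose here.

```python
import csv
import io
import sys

# Hypothetical record layout; these field names are assumptions for illustration.
FIELDS = ["id", "user", "text"]


def tweets_to_csv(tweets):
    """Render an iterable of dicts as CSV text, one record per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for tweet in tweets:
        # Flatten embedded newlines so that each tweet stays on a single line.
        row = [str(tweet.get(f, "")).replace("\n", " ") for f in FIELDS]
        writer.writerow(row)
    return buf.getvalue()


def main():
    # Placeholder data; a real harvester would fetch these from the API.
    sample = [
        {"id": 1, "user": "alice", "text": "first tweet"},
        {"id": 2, "user": "bob", "text": "second, with a comma"},
    ]
    sys.stdout.write(tweets_to_csv(sample))


if __name__ == "__main__":
    main()
```

Writing CSV keeps the ECL side simple, because the record structure is then just a list of fields in the same order as `FIELDS`.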

She used two different approaches, but since the Twitter API has been evolving, I am not sure whether both are still available or supported:
1/ Calling the API repeatedly to get all the tweets. We needed that to create a reasonable training set for our ML classifier.
2/ Calling the API a few times, passing in a specific term to filter the tweets by. We needed that to get a set of tweets associated with a specific topic.

In both cases, the app has to keep calling the Twitter API, and the code should be written so that it does not upset the Twitter service :-)
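
One common way to keep a service happy is to pause between calls and to back off when it tells you to slow down. The Python sketch below illustrates that idea only; it is not the code our developer wrote. The `fetch` callable, the `RateLimited` exception, and the delay values are all hypothetical stand-ins: a real app would wrap an actual Twitter API call and follow the limits Twitter documents.

```python
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) fetch function when the API asks us to slow down."""


def poll(fetch, max_calls, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fetch() up to max_calls times, pausing between calls.

    The pause doubles (up to max_delay) after each rate-limit error and
    returns to base_delay after a successful call.
    """
    results = []
    delay = base_delay
    for _ in range(max_calls):
        try:
            results.extend(fetch())
            delay = base_delay  # success: go back to the normal pace
        except RateLimited:
            delay = min(delay * 2, max_delay)  # back off before retrying
        sleep(delay)
    return results
```

For the "get everything" approach, `fetch` would return the next batch of tweets; for the filtered approach, it would pass the search term along with each call. Passing `sleep` in as a parameter makes the loop easy to test without actually waiting.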

Regards,

Bob
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

Wed Jan 30, 2013 3:12 pm

Hi Bob,

Thanks for the reply :)

I know nothing about ECL PIPE, which is only briefly mentioned in the HPCC Introduction documentation. Please see my related question here:
http://hpccsystems.com/bb/viewtopic.php?f=10&t=722&sid=22459bc0c057c631e3d7fc685ffe6fa3

Where can I find documentation and examples for ECL PIPE?

Thanks and regards !
kaliyugantagonist
 
Posts: 43
Joined: Mon Jul 23, 2012 11:23 am

Wed Jan 30, 2013 3:39 pm

The ECL PIPE function is documented in the Language Reference Manual.
You can also type the word "PIPE" in any ECL file in the ECL IDE and press the F1 key to open its help topic.

I will have a look at your other post.

Regards,

Bob
bforeman
Community Advisory Board Member
 
Posts: 975
Joined: Wed Jun 29, 2011 7:13 pm

