

Accessing internet data

Post questions or comments related to the Virtual Machine

Sat Jul 21, 2012 12:17 am

I have completed the HPCC data tutorial. I have gone through the HPCC documentation (HPCCDataHandling.pdf) and understand that HPCC works with data files.

I want to know if there is a way to configure HPCC to work with or access internet data.
mrudul
 

Mon Jul 23, 2012 3:34 pm

ECL has powerful language tools that support a variety of parsing options. If the data extracted from the internet is in XML format, you can spray and parse XML directly. If the internet data is raw text, you can use PARSE for free-form text parsing. Any file can be sprayed as a variable-length file and then parsed using ECL. To quote the Language Reference Manual:

Natural Language Parsing is accomplished in ECL by combining pattern definitions with an output RECORD structure specifically designed to receive the parsed values, then using the PARSE function to perform the operation.
Pattern definitions are used to detect "interesting" text within the data. Just as with all other attribute definitions, these patterns typically define specific parsing elements and may be combined to form more complex patterns, tokens, and rules.
The output RECORD structure (or TRANSFORM function) defines the format of the resulting recordset. It typically contains specific pattern matching functions that return the "interesting" text, its length or position.
The PARSE function implements the parsing operation. It returns a recordset that may then be post-processed as needed using standard ECL syntax, or simply output.
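
The three pieces described in that quote can be sketched roughly as follows. This is only a minimal illustration, not taken from the manual: the inline dataset, the hashtag patterns, and every attribute name (LineRec, textLines, Word, Hashtag, TagRec, tagText, tagPos, tagLen) are made up for the example. On a real cluster the input would normally be a sprayed logical file rather than an inline DATASET.

```ecl
// Hypothetical input: one free-form text line per record.
LineRec := RECORD
  STRING line;
END;
textLines := DATASET([{'Loving #HPCC and #ECL today'},
                      {'No tags in this line'}], LineRec);

// Pattern definitions: a simple pattern combined into a more complex one.
PATTERN Word    := PATTERN('[A-Za-z0-9]+');
PATTERN Hashtag := '#' Word;

// Output RECORD: uses the pattern matching functions to return the
// "interesting" text, its position, and its length.
TagRec := RECORD
  STRING    tagText := MATCHTEXT(Hashtag);
  UNSIGNED4 tagPos  := MATCHPOSITION(Hashtag);
  UNSIGNED4 tagLen  := MATCHLENGTH(Hashtag);
END;

// PARSE searches the 'line' field of each input record for Hashtag.
// SCAN asks it to keep looking for further matches after each one.
tags := PARSE(textLines, line, Hashtag, TagRec, SCAN);
OUTPUT(tags);
```

The result is an ordinary recordset, so it can be filtered, sorted, or joined with other data like any other ECL dataset.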


There are a number of resources on this site to help get you started.

http://hpccsystems.com/download/docs/six-degrees%20
Shows how to parse and format an IMDB movie file.

http://hpccsystems.com/Why-HPCC/case-studies/engauge-pinterest
Links to an article where a partner of ours uses ECL to process sentiment data extracted from Twitter.

http://hpccsystems.com/download/docs/machine-learning
The Machine Learning libraries also have a section on document parsing if you are interested.

Finally, refer to the Language Reference and the PARSE statement for some great examples, and also review the section on PATTERN, RULE and TOKEN.

Hope this helps!

Bob
bforeman
Community Advisory Board Member

Wed Jul 25, 2012 1:52 am

Thanks Bob, that was helpful. I have been going through various documentation that's available on the site.

I am trying to do a small proof of concept and have a question on the SOAPCALL.

I have some data in data files, which I will load into Thor and then build queries against. I also want to invoke a SOAP service (an external service on the internet) to fetch another set of data and process that as well.

I was reading about SOAPCALL and wanted to know: does this function work only with SOAP services, or does it also work with XML over HTTP?

Thank you once again for replying to the post.
mrudul
 

