

A short list of problems

Questions around writing code and queries

Tue Jun 28, 2011 9:01 pm

Hey all,

I am currently doing research to see if HPCC/ECL would be able to speed up some big data problems. Any thoughts would be appreciated.

1. I have large matrices (on the order of millions of rows) and I need to find the Eigenvalues / Eigenvectors.

2. I have large Web server log files and I need to do tabulation and anomaly detection. I don't think the tabulation would be hard, but trying to chain together requests into a graph might be more difficult.

3. I need to rank documents based on some natural language processing (mark it zero, positive, or negative) for a large number of documents. (I did take the Advanced ECL training class, but the pattern primitives seemed to be very similar to regular expressions.)

I would be interested to get any thoughts on these problems.
cmastrange3
 
Posts: 3
Joined: Fri Jun 17, 2011 4:39 am

Thu Jun 30, 2011 4:29 pm

1. I have large matrices (on the order of millions of rows) and I need to find the Eigenvalues / Eigenvectors.

Well - ECL does not have direct support for matrix computation built in. I have done matrix math on sparse matrices using a record of the form: xpos, ypos, value. Things such as add, subtract and multiply are fairly straightforward. Determinants are a rather more interesting proposition - of course I last did that in college, which was probably before you were born ...
If you want to start a matrix library, especially if you want to share it, I would be happy to help with the ECL side - you may need to supply (or at least remind me of) the math!
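
To make the xpos, ypos, value layout concrete, here is a minimal sketch in Python (not ECL) of add, subtract and multiply on sparse matrices stored as such records. The names `Cell`, `add` and `multiply` are made up for illustration, and treating xpos as the row index and ypos as the column index is an assumption. Multiply is written as a join on the shared index followed by a grouped sum, which is roughly the shape it would take in a dataflow language.

```python
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    """One non-zero matrix entry (hypothetical record layout)."""
    xpos: int      # assumed to be the row index
    ypos: int      # assumed to be the column index
    value: float


def _rollup(pairs):
    """Sum values that share the same (xpos, ypos) key and drop zeros."""
    totals = defaultdict(float)
    for (x, y), v in pairs:
        totals[(x, y)] += v
    return sorted(Cell(x, y, v) for (x, y), v in totals.items() if v != 0)


def add(a, b, sign=1.0):
    """Return a + sign * b; pass sign=-1.0 for subtraction."""
    pairs = [((c.xpos, c.ypos), c.value) for c in a]
    pairs += [((c.xpos, c.ypos), sign * c.value) for c in b]
    return _rollup(pairs)


def multiply(a, b):
    """Return a * b: join a's column index to b's row index, then sum."""
    b_by_row = defaultdict(list)
    for c in b:
        b_by_row[c.xpos].append(c)
    pairs = []
    for left in a:
        for right in b_by_row.get(left.ypos, []):
            pairs.append(((left.xpos, right.ypos), left.value * right.value))
    return _rollup(pairs)


# Dense equivalents: A = [[1, 2], [0, 3]], B = [[0, 1], [4, 0]]
A = [Cell(0, 0, 1.0), Cell(0, 1, 2.0), Cell(1, 1, 3.0)]
B = [Cell(0, 1, 1.0), Cell(1, 0, 4.0)]
total = add(A, B)
difference = add(A, A, sign=-1.0)
product = multiply(A, B)
```

Only non-zero cells are stored or produced, so the cost depends on the number of non-zero entries rather than on the full matrix dimensions. Determinants and eigenvalues are not covered by this sketch.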

2. I have large Web server log files and I need to do tabulation and anomaly detection. I don't think the tabulation would be hard, but trying to chain together requests into a graph might be more difficult.

What do you mean by 'chain together requests'? We have done significant work internally on this, and I did not encounter any significant issues.

3. I need to rank documents based on some natural language processing (mark it zero, positive, or negative) for a large number of documents. (I did take the Advanced ECL training class, but the pattern primitives seemed to be very similar to regular expressions.)

Natural language processing is a highly abused term, right up there with 'Artificial Intelligence'. What exactly are you trying to do? The ECL 'NLP' capability has two different grammars within it: one is similar to regular expressions, although with extensions (extremely similar to SNOBOL4, if you are familiar with that). The other is a Tomita parser ...

David
dabayliss
Community Advisory Board Member
 
Posts: 109
Joined: Fri Apr 29, 2011 1:35 pm

