## A short list of problems

Hey all,

I am currently doing research to see if HPCC/ECL would be able to speed up some big data problems. Any thoughts would be appreciated.

1. I have large matrices (on the order of millions of rows) and I need to find the Eigenvalues / Eigenvectors.

2. I have large Web server log files and I need to do tabulation and anomaly detection. I don't think the tabulation would be hard, but trying to chain together requests into a graph might be more difficult.

3. I need to rank documents based on some natural language processing (mark it zero, positive, or negative) for a large number of documents. (I did take the Advanced ECL training class, but the pattern primitives seemed to be very similar to regular expressions.)

I would be interested to get any thoughts on these problems.

- cmastrange3
**Posts:** 3 **Joined:** Fri Jun 17, 2011 4:39 am

1. I have large matrices (on the order of millions of rows) and I need to find the Eigenvalues / Eigenvectors.

Well - ECL does not have direct support for matrix computation built in. I have done matrix math on sparse matrices using a record of the form: xpos, ypos, value. Things such as add, subtract and multiply are fairly straightforward. Determinants are a rather more interesting proposition - of course I last did that in college, which was probably before you were born ...

If you want to start a matrix library, especially if you want to share it, I would be happy to help with the ECL side - you may need to supply (or at least remind me of) the math!
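To make the (xpos, ypos, value) record layout concrete, here is a minimal sketch in Python rather than ECL. It assumes xpos is the row index and ypos the column index, which the post does not specify. The function names `add`, `multiply` and `power_iteration` are hypothetical. Power iteration is added purely as an illustration of how an eigenvalue estimate could be built on the same layout; it only finds the dominant eigenvalue, assumes one exists, and is not something proposed in the thread.

```python
from collections import defaultdict

# A sparse matrix is a list of (xpos, ypos, value) records.
# Assumption: xpos is the row index and ypos is the column index.

def add(a, b):
    """Element-wise sum: group records by position and sum their values."""
    acc = defaultdict(float)
    for x, y, v in a + b:
        acc[(x, y)] += v
    return sorted((x, y, v) for (x, y), v in acc.items() if v != 0)

def multiply(a, b):
    """Matrix product: join a's column to b's row, then sum per output cell."""
    b_by_row = defaultdict(list)
    for x, y, v in b:
        b_by_row[x].append((y, v))
    acc = defaultdict(float)
    for x, k, va in a:
        for y, vb in b_by_row.get(k, ()):
            acc[(x, y)] += va * vb
    return sorted((x, y, v) for (x, y), v in acc.items() if v != 0)

def power_iteration(a, n, steps=200):
    """Estimate the dominant eigenvalue and eigenvector of an n x n matrix."""
    vec = [n ** -0.5] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(steps):
        # Sparse matrix-vector product: one pass over the records.
        nxt = [0.0] * n
        for x, y, v in a:
            nxt[x] += v * vec[y]
        norm = sum(c * c for c in nxt) ** 0.5
        if norm == 0.0:
            break
        # Rayleigh quotient; vec has unit length, so no division is needed.
        eigenvalue = sum(p * q for p, q in zip(vec, nxt))
        vec = [c / norm for c in nxt]
    return eigenvalue, vec
```

The add and multiply steps are a group-and-sum and a join-then-sum, which is why they map naturally onto a record-oriented, data-parallel language.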

2. I have large Web server log files and I need to do tabulation and anomaly detection. I don't think the tabulation would be hard, but trying to chain together requests into a graph might be more difficult.

What do you mean by 'chain together requests'? We have done significant work internally on this, and I did not encounter any significant issues.
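One possible reading of 'chain together requests', offered only as an assumption since the question does not define it, is to link each client's consecutive requests into directed edges and count how often each edge occurs. The sketch below is in Python rather than ECL; the record shape (client, timestamp, url) and the function name `request_edges` are hypothetical.

```python
from collections import Counter, defaultdict

def request_edges(records):
    """Count page-to-page transitions per client.

    records: iterable of (client, timestamp, url) tuples (hypothetical shape).
    Returns a Counter mapping (from_url, to_url) to the number of times a
    client requested to_url immediately after from_url.
    """
    # Group the requests by client.
    by_client = defaultdict(list)
    for client, ts, url in records:
        by_client[client].append((ts, url))

    # Within each client, order by time and link consecutive requests.
    edges = Counter()
    for hits in by_client.values():
        hits.sort()
        for (_, src), (_, dst) in zip(hits, hits[1:]):
            edges[(src, dst)] += 1
    return edges

log = [
    ("a", 1, "/"), ("a", 2, "/login"), ("a", 5, "/cart"),
    ("b", 1, "/"), ("b", 3, "/login"),
]
graph = request_edges(log)
```

Under this reading the work is a group, a sort and a self-join on adjacent rows, and rarely seen edges are one simple candidate signal for anomaly detection.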

3. I need to rank documents based on some natural language processing (mark it zero, positive, or negative) for a large number of documents. (I did take the Advanced ECL training class, but the pattern primitives seemed to be very similar to regular expressions.)

Natural language processing is a highly abused term, right up there with 'Artificial Intelligence'. What exactly are you trying to do? The ECL 'NLP' capability has two different grammars within it: one is similar to RE, although with extensions (extremely similar to Snobol 4, if you are familiar with that). The other is a Tomita parser ...

David

- dabayliss
- Community Advisory Board Member
**Posts:** 109 **Joined:** Fri Apr 29, 2011 1:35 pm
