All times are UTC-05:00




PostPosted:Wed Oct 24, 2007 7:46 pm 
 

Joined:Sun Jun 10, 2007 11:41 am
Posts:344
Location:Ninjaville
Multicore isn't bad, but especially for something like gaming, one out-of-order execution core could handle most situations as fast as six in-order cores, at half the development cost. Still, as a mainly-PC gamer with a dual-core Opteron, I'm happy, because now everything will be optimized for dual core.


PostPosted:Thu Oct 25, 2007 3:49 pm 
Connoisseur of the Godawful
 

Joined:Tue Mar 01, 2005 9:00 am
Posts:456
ICQ:286315965
Website:http://rabidtinker.mine.nu/
Yahoo Messenger:alistair_lynn
AOL:Agent_Vast@mac.com
Location:127.0.0.1
OpenMP!

_________________
Alastair Lynn / Alumnus / Onlink Team


PostPosted:Fri Oct 26, 2007 2:51 am 
Literally Nine
 

Joined:Sat Apr 02, 2005 3:31 pm
Posts:1171
Location:The vicinity of an area adjacent to a location.
Quote:
OpenMP!
The biggest issue isn't how to implement multithreading but how to properly serialize the operations that cannot be done in parallel (for example, inserting into a tree without corrupting the node linkage or its balanced structure). Most programmers don't quite understand locks and get them wrong.

_________________
- Tycho



PostPosted:Fri Oct 26, 2007 3:12 am 
Literally Nine
 

Joined:Sat Apr 02, 2005 3:31 pm
Posts:1171
Location:The vicinity of an area adjacent to a location.
Come on, folks, let's see some more potential solutions! We're here to help you learn. :)

_________________
- Tycho



PostPosted:Sun Nov 11, 2007 4:55 am 
Literally Nine
 

Joined:Sat Apr 02, 2005 3:31 pm
Posts:1171
Location:The vicinity of an area adjacent to a location.
Wow, I can almost hear the crickets in here.

_________________
- Tycho



PostPosted:Mon Nov 12, 2007 9:43 am 
 

Joined:Thu Aug 18, 2005 8:19 pm
Posts:36
AOL:Fattierob
Location:f(x)
Hm. Well, I thought I'd add the random, quirky idea I just had.

Basically, two threads, hopefully on a multi-core CPU.

One thread starts by going through the whole archive and pulling out every entry beginning with the nth letter of the alphabet (say 'a' for now), while the other thread sleeps. It then hands that array to the second thread (you'll have to excuse my wording; I only vaguely understand how threading works), and the second thread begins to quicksort the letter-'a' array. While that thread is quicksorting (hopefully entirely in main memory, otherwise its reads and writes would clash with the first thread's and defeat the point), the first thread goes through the (now smaller) archive and the process repeats with the next letter.

When the second thread finishes sorting one letter's array, it saves it to disk (either starting a new file or appending to the end of the last file it saved), then waits for the other thread to wake it up again. If by any chance the second thread doesn't finish before the first thread sends a new batch, there should be a buffer of sorts, now that I think about it.

Pros:
Overall time is reduced, since the two threads work at once
If each letter's array fits in main memory, the only I/O needed is reading the archive and writing out each sorted array

Cons:
Would not work well on a single-core CPU, or might even be slower
If main memory can't hold the ~4 GB (100 GB archive / 26 letters) that each letter averages, there will be problems

tl;dr: two threads, one selects data, the other quicksorts, with a buffer in between. Probably not usable at all.


PostPosted:Mon Nov 12, 2007 1:19 pm 
 

Joined:Mon Nov 12, 2007 11:48 am
Posts:1
ICQ:318966698
Location:Russian Federation
Fattierob's idea looks good if the data can be divided into equal groups, in his case by the first letter of the name.
But if all the records start with the letter "a", for example, the algorithm falls apart.
I think the splitting itself is the right idea.
So we need to break all the data into "small" pieces that can simply be stacked together after sorting to give the solution (*), that can be sorted quickly in memory, AND whose position in the output we can predict.
First letter is the simplest example: there are 26 groups, and we know group "a" comes first, "b" next, and so on.
But we can't hardcode the breaking method; we need one pass to determine the key ranges and split the data into groups accordingly.
The pro of this solution is multithreading: all the groups can easily be sorted by separate threads at the same time, or one by one.
The con is the extra pass, but I don't think it's a big deal; it only reads the keys, not the full records. If the records have a fixed size, like 1024 bytes, it's easy.
The sticking point is that I can't think of a good way to construct a function that predicts the bounds of each group, keeps the groups roughly equal, and preserves the (*) property.
But I believe an elegant way to calculate those borders exists.

Another "for fun" solution is a neural network that predicts the position of a record in the sorted set.
It needs way more time to develop, and even more time to train, but the result could be awesome.
The position of a record could be predicted with high accuracy in almost no time at all.
The biggest con is that the data would end up ALMOST sorted (:
It would still need small corrections, like swapping some records, and even then there is no 100% guarantee the result is totally sorted.
But the pro is SPEED! It would be about O(n): just read, judge, write.

And sorry, English is not my native language.


PostPosted:Mon Nov 12, 2007 9:58 pm 
 

Joined:Thu Aug 18, 2005 8:19 pm
Posts:36
AOL:Fattierob
Location:f(x)
I've thought about it a bit more, after being awake for more than an hour.

Basically, the idea you're suggesting is hashing: you break the data up into all possible groups, make a buffer for each, and then sort within them. (Note: I am completely and utterly talking out of a five-minute skim of my textbook on that.) The problem with hashing is that you need buffers ready for every possible group, and they have to be big enough not to overflow (or dynamic, which adds overhead).

Now, what I was thinking was something like that, but revised a bit to be threaded and a bit optimized.

The 1st thread reads through the archive sequentially, checking the first letter of each entry against n. If it is n, it sends that 1 KB of data to the second thread. When it finds no more n's in the archive, it signals the second thread to write what it has sorted to disk, then increments n.

Now, the second thread wakes up when it gets this data, and does a few things:
1) Drops it into an empty array if it's the first n.
2) If it's not the first n, it does a sort that I don't have the name for. Basically, you look at the middle entry of the 'n' array (call it the pivot) and see whether the entry you're inserting comes before or after it. If before, you repeat on the half from zero to the pivot until you find your spot; same idea if it's after, using the half from the pivot to the end.
3) If it receives a signal (a method call, I guess), it writes the now-sorted data to disk.

Example:
The 1st thread sends "ABC" to the second thread; that thread's 'n' array contains "ABC". The 1st thread sends "ACB"; the 'n' array now contains "ABC", "ACB". The 1st thread sends "AAB"; the 'n' array determines that "AAB" comes before "ABC" and places it before it (since that's the only spot left). The array now contains "AAB", "ABC", "ACB". This continues until the 1st thread tells the second thread to write to disk, which it does.


Pros:
Threaded and easy on dual-core CPUs
The second thread spends little time per insertion, since the array is already sorted when each new piece of data arrives

Cons:
If the program crashes, it has to pick up again at the last letter
Would not work well on a single core
Memory hog: an 'n' array of 4 GB on average? And you can't write it to disk in chunks before you're done sorting. Hm.

All in all, a good thought exercise. I should ask my CS teacher what she would do.


PostPosted:Mon Nov 12, 2007 11:41 pm 
Literally Nine
 

Joined:Sat Apr 02, 2005 3:31 pm
Posts:1171
Location:The vicinity of an area adjacent to a location.
Okay, you guys are making a huge mistake by dividing on the 26 letters of the alphabet. You're assuming the data is evenly distributed, which is almost never the case. If the data is heavily skewed, you can hit a pathological case in which nearly everything lands in one or two buckets, and then you've made no progress.

You need a method that assumes no prior knowledge of the data. It's also best not to need more than one pass over the data, given how expensive disk reads and writes are.

_________________
- Tycho



PostPosted:Tue Nov 13, 2007 10:45 am 
 

Joined:Thu Aug 18, 2005 8:19 pm
Posts:36
AOL:Fattierob
Location:f(x)
Hm. Well, if you don't want to pass over the data more than once, you read it from disk and push it to a method that does the sorting I described earlier. Perhaps even 26 threads, one per letter? Of course, like you said, Tycho, some threads would be worked much harder than others, and that would be pretty bad.

Interesting, interesting. I can't wait to see what the optimum answer is.


PostPosted:Thu Nov 15, 2007 5:47 pm 
Literally Nine
 

Joined:Sat Apr 02, 2005 3:31 pm
Posts:1171
Location:The vicinity of an area adjacent to a location.
Quote:
Hm. Well, if you don't want to pass over the data more than once, you read it from disk and push it to a method that does the sorting I described earlier. Perhaps even 26 threads, one per letter? Of course, like you said, Tycho, some threads would be worked much harder than others, and that would be pretty bad.

Interesting, interesting. I can't wait to see what the optimum answer is.
With anything more than 3 or 4 threads, you start seeing diminishing returns from multithreading. There's a hardware reason the speedup isn't linear. I could explain, but I feel like I'd bore you.

_________________
- Tycho



PostPosted:Thu Nov 15, 2007 6:29 pm 
 

Joined:Sun Jun 10, 2007 11:41 am
Posts:344
Location:Ninjaville
Fattierob's approach would work well on a multi-CPU server.


PostPosted:Thu Nov 15, 2007 6:40 pm 
Literally Nine
 

Joined:Sat Apr 02, 2005 3:31 pm
Posts:1171
Location:The vicinity of an area adjacent to a location.
Quote:
Fattierob's approach would work well on a multi-CPU server.
I guess you didn't read what I said. There's a point of diminishing returns to multithreading. I suppose I didn't appropriately state that this especially affects multi-CPU systems, especially when more than one thread is accessing a given memory page.

Just wait for transactional memory before you start considering multithreading seriously.

_________________
- Tycho



PostPosted:Thu Nov 15, 2007 6:56 pm 
 

Joined:Sun Jun 10, 2007 11:41 am
Posts:344
Location:Ninjaville
What about a program with two algorithms: the first sorts the data into subgroups by letter, and the second sorts within the subgroups. I don't know C++ that well, but here is the pseudocode.

Organize the data into subgroups by the first letter of the last name.
Have a parallel thread run qsort (or some other algorithm) on the subgroups the first pass created.

I could be wrong, but I believe this would be faster than running qsort on one giant block of data, since the first pass only has to look at the first letter.

Forgive me if I'm making no sense.


PostPosted:Thu Nov 15, 2007 7:00 pm 
Literally Nine
 

Joined:Sat Apr 02, 2005 3:31 pm
Posts:1171
Location:The vicinity of an area adjacent to a location.
The only way for a multithreaded approach to be quick would be to determine the memory page size and then make certain that your two data blocks from the file sit on separate pages. If so, the task can easily be multithreaded and would theoretically be faster. The most hampering hardware limitation is disk I/O.

_________________
- Tycho


