Your post is full of assumptions, omissions, etc. No worries. I'll sort you out.
Quote:
Tycho, believe it or not, the statements about quicksort are based on empirical data. It is true that merge sort parallelizes particularly well, but the best external-sort routine I've seen for distributed systems is based on column sort. Most of the complexity deals with handling the slowest cases, an issue you don't even appear to be considering.
I'm aware of that, which is why I stressed not only highly randomized data but also presorted and reverse-sorted data. If there's a worst case, these three different data sets are going to nail it.
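For reference, here's roughly what I mean by those three data sets (a sketch written for this post, not my actual harness; sizes and types are placeholders):
Code:
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

enum class Shape { Presorted, ReverseSorted, Randomized };

// Build one of the three stress data sets.
std::vector<uint32_t> make_dataset(std::size_t n, Shape shape) {
    std::vector<uint32_t> v(n);
    std::iota(v.begin(), v.end(), 0u);              // 0,1,2,... = presorted
    if (shape == Shape::ReverseSorted)
        std::reverse(v.begin(), v.end());           // descending order
    if (shape == Shape::Randomized) {
        std::mt19937 rng(12345);                    // fixed seed: repeatable runs
        std::shuffle(v.begin(), v.end(), rng);
    }
    return v;
}
Randomized data exercises the average case; presorted and reverse-sorted data hit the classic pathological cases (naive quicksort pivot selection, for instance).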
Quote:
Also, just about all the pages mentioned the algorithm's asymptotic execution time at some point. I just didn't quote that part.
My point wasn't that they didn't cite the O notation, but that they didn't use the word 'theoretically'. This is the problem with computer science majors: they think in terms of what would work best on ideal hardware, where there are no latencies and everything runs without delay.
Quote:
Quote:
O(n log n) may be true when n is 10, 1000, 10000. But when n is a billion, or a hundred billion, you start seeing that the timing is no longer what you expected. You will start getting 10n log n, 1000n log n, and so forth. Why is this? What are the computer science majors missing here?
By definition, high values of N make complexity bounds more important, not less. In this particular case, passing some particular value of N turns the problem into an external sort, which is a considerably different problem. Quicksort's locality of reference still gives it good performance in many configurations. Your optimal algorithm might well change with the next disk upgrade. I've explained how merge sort's main advantage is going to disappear when random disk accesses become as fast as sequential ones.
Your first sentence shows you're not thinking about this practically. I can guarantee you that your "worst case" timing will get dramatically worse with a large enough dataset, because the constant hiding inside that O(n log n) balloons every time the working set spills out of another level of the memory hierarchy. You don't have to push the envelope very far to see this yourself. I've seen this firsthand.
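If you want to watch it happen on your own box, here's a minimal harness (my own sketch for this post, not the test I actually ran): it times std::sort at growing sizes and prints the effective constant in front of n log n. Watch it climb as the working set falls out of L1, then L2, then RAM.
Code:
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(42);
    for (std::size_t n = 1 << 12; n <= (std::size_t(1) << 26); n <<= 2) {
        std::vector<int> v(n);
        for (int& x : v) x = static_cast<int>(rng());

        auto t0 = std::chrono::steady_clock::now();
        std::sort(v.begin(), v.end());
        auto t1 = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        // The "constant" in O(n log n): flat in theory, not in practice.
        std::printf("n=%10zu   ns/(n log2 n) = %.3f\n",
                    n, ns / (n * std::log2(double(n))));
    }
    return 0;
}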
Ironically, you bring up one of the very points that I find crucial, and you've taken the side that I do. What algorithm is ideal, and how will it be ideal 5 years from now? 10? 20? There's no way you can program for the future. Take the change from Pentium III to Pentium 4, for example. The Pentium 4 had design flaws (a topic for another time) that made it very difficult to write code for and broke compatibility with the tricks people had used for years to optimize for the Pentium, Pentium II and Pentium III. Intel saw the flaws and treated them as though they were a feature of the new chip, writing carefully worded optimization docs that tried both to work around the flaws and to keep them from being readily apparent.
This is one of the many things I'm working to solve. In the past decade, we've seen no fewer than eight x86 SIMD extensions in the MMX line (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE5). This is ridiculous. Intel and AMD keep having fights that have no bearing on what's best for the future of the x86 architecture.
I think the solution is a hypervisor that runs at the lowest possible level and emulates the instructions from the numerous MMX-line extensions, as well as any other extraneous instructions, combined with a stripped x86 chip containing perhaps only the 586 instruction set. This is a project I'm involved in right now, working with fellow programmer and computer engineer Darek Mihocka (http://www.emulators.com).
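To make the idea concrete, here's a toy illustration (entirely my own sketch for this post, not code from the project): the stripped core traps an instruction it doesn't implement, and the hypervisor emulates it in software. Take SSE2's PADDB, a packed add of 16 bytes:
Code:
#include <cstdint>

// Software model of an XMM register (hypothetical, for illustration only).
struct XmmReg { uint8_t b[16]; };

// Emulate PADDB xmm_dst, xmm_src: 16 independent byte adds with
// wrap-around, which is what the real instruction does in one shot.
void emulate_paddb(XmmReg& dst, const XmmReg& src) {
    for (int i = 0; i < 16; ++i)
        dst.b[i] = static_cast<uint8_t>(dst.b[i] + src.b[i]);
}
The hard part, of course, is the trap-and-decode plumbing and making it fast enough to be usable; the point is only that every one of those extensions is expressible in baseline code.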
Quote:
The computer science majors are trying to write code that will still work efficiently in ten years, or twenty, when all the hardware looks different and all the thresholds have changed. How's that optimized code for MS-DOS holding up? All that hand-tuned assembly language to write to the I/O ports directly and swap in overlays from disk really comes in handy, doesn't it? What's that? On modern machines, you have to run the program in an emulator and you've slowed the program down so much that it stutters? Those assumptions about the hardware you're running on are so completely outdated that it would be easier just to start over from scratch?
Quote:
Quote:
They can’t tell you the difference between an AMD Athlon XP and an Intel Pentium 4. They can’t explain why the new Core 2 is such a good thing.
Any optimization you make specifically to speed up performance on the Core 2 processor will be counterproductive five years from now. If you're writing a console game or embedded app, designed to run only on identical platforms for the rest of its life cycle, you can make more assumptions. If real-time response is a priority, then you also need to make whatever trade-offs you have to today on each individual platform.
You're partially correct. First of all, yes, computer science majors have written algorithms that will theoretically stand the test of time. I'm not calling for hand-tuned assembly, though; that was an interesting conclusion for you to draw. I'm saying you need to optimize for the here and now. There are many ways to do this in C and C++ without writing a single line of assembly. Most of the time, it's a matter of learning how the compiler optimizes and coaxing it into doing better than it otherwise would.
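Here's one example of the kind of thing I mean (a sketch I wrote for this post, not code from the challenge): restructuring a floating-point reduction into several independent accumulators. The compiler won't reassociate floating-point math on its own, because that changes rounding, so breaking the serial dependency chain by hand lets the CPU's pipelines overlap the additions.
Code:
#include <cstddef>

// Naive reduction: one serial dependency chain; each add has to wait
// for the previous one to retire.
double sum_naive(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// Four independent chains: the adds can run overlapped in the pipeline.
double sum_unrolled(const double* a, std::size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
    }
    for (; i < n; ++i) s0 += a[i];   // leftover tail
    return (s0 + s1) + (s2 + s3);
}
Not an assembly instruction in sight, and on most pipelined chips the second version wins handily.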
But why am I telling you to optimize for current hardware? I very carefully named this challenge a computer engineering problem. It's not about trees, linked lists and Java. It's about fine-tuning something to work as fast as possible on modern hardware, recognizing hardware constraints and overcoming them. That's what engineering is.
The problem could just as easily have been stated in early-1980s terms: you have a floppy disk full of a few tens of thousands of integers and an Apple II with, say, 48K of RAM; sort the data on the floppy disk. The issue isn't how large the data set is. The issue is how you sort when the dataset is much larger than your "fast memory", which forces you to write an algorithm that takes real-world resource limitations into account. Real software solutions have to deal with such issues.
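To show what taking those constraints into account looks like, here's a bare-bones external merge sort (my own sketch; file names and the run size are placeholders, and error handling is omitted): sort the data in RAM-sized runs, then stream-merge the runs so every disk access is sequential.
Code:
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <queue>
#include <string>
#include <vector>

// Phase 1: read the input in RAM-sized chunks, sort each, write a run.
std::vector<std::string> make_runs(const char* input, std::size_t run_len) {
    std::vector<std::string> runs;
    FILE* in = std::fopen(input, "rb");
    std::vector<uint32_t> buf(run_len);
    std::size_t got;
    while ((got = std::fread(buf.data(), sizeof(uint32_t), run_len, in)) > 0) {
        std::sort(buf.begin(), buf.begin() + got);
        std::string name = "run" + std::to_string(runs.size()) + ".tmp";
        FILE* out = std::fopen(name.c_str(), "wb");
        std::fwrite(buf.data(), sizeof(uint32_t), got, out);
        std::fclose(out);
        runs.push_back(name);
    }
    std::fclose(in);
    return runs;
}

// Phase 2: k-way merge of the runs with a min-heap; sequential I/O only.
void merge_runs(const std::vector<std::string>& runs, const char* output) {
    struct Head { uint32_t val; std::size_t which; };
    auto cmp = [](const Head& a, const Head& b) { return a.val > b.val; };
    std::priority_queue<Head, std::vector<Head>, decltype(cmp)> heap(cmp);
    std::vector<FILE*> fps;
    for (std::size_t i = 0; i < runs.size(); ++i) {
        fps.push_back(std::fopen(runs[i].c_str(), "rb"));
        uint32_t v;
        if (std::fread(&v, sizeof v, 1, fps[i]) == 1) heap.push({v, i});
    }
    FILE* out = std::fopen(output, "wb");
    while (!heap.empty()) {
        Head h = heap.top(); heap.pop();
        std::fwrite(&h.val, sizeof h.val, 1, out);
        uint32_t v;
        if (std::fread(&v, sizeof v, 1, fps[h.which]) == 1)
            heap.push({v, h.which});
    }
    for (FILE* f : fps) std::fclose(f);
    std::fclose(out);
}
The run size is whatever fits in your "fast memory": a few thousand integers on the Apple II, hundreds of millions on a modern server. The algorithm doesn't change; the tuning does.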
Quote:
Quote:
Many of them can’t even read an x86 disassembly or tell me the first thing about how many registers a Pentium processor has or what the registers are for.
All I can say is that this was certainly a required part of the curriculum my sophomore year. Any 32-bit assembly language you write will immediately break on a 64-bit operating system, though, or on any hardware other than a PC, or at the very least hold up the port until your replacement rips it out and replaces it with high-level code. Even if it doesn't, a completely different set of instructions will be optimal on the next CPU that comes down the pike.
Your argument seems to be repeating itself. Again, we're talking computer engineering.
Quote:
Quote:
From my own testing, the fastest sort (tested on a data set of 100MB in RAM) has been MergeSort. QuickSort, HeapSort, and all the other sorts all finish in about 10-15 seconds. CombSort takes 20 seconds, and BubbleSort could take ages. MergeSort? 3 seconds.
What happens when you compare an external merge sort to my method for converting the external sort back into an internal one?
One data point, by itself, is meaningless, particularly since you're trying to determine the 'optimal' external sort by benchmarking an in-memory sort and your box probably doesn't look much like a server. In the first challenge, I easily got this much of a speedup over your entry just by compiling my program for a 64-bit processor. When I tried to do that for your program, it broke. There wasn't some minor fix for this. Your approach inherently could not scale up in any practical way because you hadn't thought about what would happen if you tried to run it on different input. Even on a 32-bit system, your program failed on valid input because you'd focused on optimizing the speed of one arbitrary benchmark on one box at the cost of correctness. I'm sorry to hear that you chose that approach deliberately. You did acknowledge that the abstract approach at least works.
One data point? I've tested on varying data sizes, something I agree I didn't mention here, but I did tell you that it was tested on 1) randomized, 2) sorted, and 3) reverse-sorted data. Again, that should find the worst case in ANY sort.
Which program are you referring to? I've compiled and run my primality test programs on 64-bit machines and have had no problem getting them working. If you find the code, PM it to me, and I'll PM it back with full 64-bit support. But that's not a priority, because you have missed the point here. The primality test program was a very practical real-world test of optimization, just as this challenge is. The point is not whether it's cross-platform, object-oriented, and 'elegant'; it's how fast it works on the hardware you're given.
You seem to think that optimizing for specific hardware is a terrible idea. I don't understand why; it's easily adaptable to whatever hardware comes your way in the future. Computer science majors and C++ programmers are a dime a dozen. At a recent job fair downtown here in Seattle, thousands of C++ programmers lined up with signs saying things like "HIRE ME, C++ PROGRAMMER". Why do you think they had to do this? Because the job market is flooded with them, and they're not valued. C++ programmers are very often hired for a single job and then laid off. Computer engineers are harder to find and do much better at optimization, and at software engineering in general.
You also seem to think that just because I say we should take a computer engineering approach, we should throw our algorithms out the window. Computer engineering isn't exclusively about how hardware works; it's a combination of electrical engineering and computer science. Understand how the hardware works, know what algorithms you can use, and optimize the algorithm to work best on the given hardware.
Let's put this challenge into better perspective. Your boss has given you the task of sorting 100GB of data. Now, the computer science approach may finish in 10-16 hours. That's fine if you're running it as an end-of-day job, where you can leave it going overnight and have the results in the morning. But what if you must have it sooner? This is where computer science becomes useless.
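Do the back-of-envelope math (round numbers of my own choosing): a disk that streams at roughly 60 MB/s covers 100GB in about half an hour per pass (100,000 MB ÷ 60 MB/s ≈ 28 minutes). A 10-16 hour run is therefore the equivalent of twenty-plus full passes over the data, which in practice means the drive is spending most of its time seeking instead of streaming. Keep the accesses sequential and the pass count down, and the same job finishes before lunch.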