Last visit was: It is currently Thu Mar 28, 2024 7:39 am


All times are UTC-05:00




Post new topic Reply to topic  [64 posts ] 
Author Message
 Post subject:
PostPosted:Tue Aug 28, 2007 4:14 am 
 

Joined:Thu Aug 09, 2007 9:09 am
Posts:26
Not sure whether the challenge is still open, but I've kept fiddling. This version posts a nice little speed-up:
Code:
#define NDEBUG 1 #include <algorithm> #include <bitset> #include <cassert> #include <climits> #include <cstdlib> #include <iostream> #include <iomanip> #include <vector> #include <math.h> using std::bitset; using std::clock; using std::cout; using std::endl; using std::ostream; using std::setw; using std::transform; using std::vector; /* This template contains a generic implementation of a N-queens algorithm. * The compiler should be able to optimize for speed (at the expense of code * size, which is far less important) based on the fact that N is a constant. * For efficiency, we can plug in hand-optimized implementations for any * value of N. * * The algorithm assumes that an unsigned long has at least N bits. Given * that it has at least 32 bits, and any computer capable of solving the * problem for N > 32 in any feasible amount of time will have at least a 64- * bit architecture, this seems safe. */ typedef unsigned long column_t; // A column_t is a bitfield with at most a single bit set, in the bit // position corresponding to the specified column. // This functor performs a bitwise AND on its two predicates. template <unsigned int N> struct bitwise_or { public: inline bitset<N> operator() ( const bitset<N> a, const bitset<N> b ); }; template <unsigned int N> inline bitset<N> bitwise_or<N>::operator() ( const bitset<N> a, const bitset<N> b ) { return bitset<N> ( a | b ); } template <unsigned int N> class board { /* This object represents a N * N chessboard. Internally, it maintains a * vector of bitfields, each of size N. A modern compiler should be able to * vectorize operations on these data. On most implementations, this should * fit each bitfield into the smallest appropriate integer type, permitting * a SIMD instruction such as POR. This is probably not the absolutely most * compact representation (e.g., the 5x5 board will probably need 5 bytes * instead of 4), but it should generate appropriately-aligned code for * speed. * * A 1 in the position representing any square means that a queen definitely * threatens that square. A 0 means that we do not know of any queen which * threatens that square. * * The bit positions are stored in row-major order, with the square on the * upper left represented by squares[0][0]. */ public: // We should not need to override the implicit member functions. void clear (void); // Wipes all queens off the board; void place_queen( unsigned row, column_t column_bit ); // Place a queen, updating the board as to which squares it attacks. column_t first_safe_space( unsigned row ) const; // Returns the first column in the given row not under attack, or // (1 << N) if no such column exists. column_t next_safe_space ( unsigned row, column_t column_bit ) const; /* Returns either the first column to the right of the specified column * that is not under attack, or (1 << N) if no such column exists. */ static void initialize_attack_matrix(void); static void destroy_attack_matrix(void); private: static vector< vector< bitset<N> > > attack_matrix; bitset<N> squares[N]; // Stores N * N bits representing the positions under attack on a N * N // chessboard. }; template <unsigned int N> vector< vector< bitset<N> > > board<N>::attack_matrix; /* Stores bitfields representing positions attacked by a queen in a given * square. The place_queen() member function fills the individual matrix * entries in as needed. * * A queen in row i, column j corresponds to the vector in * attack_matrix[ i*N + j ]. There are no entries for the last row, N-1, * because they should never be needed. Only the bitfields representing rows * i+1 through N-1 are stored. */ template <unsigned int N> void board<N>::clear( void ) // Wipes all queens off the board. { for ( size_t i = 0; i < N; ++i ) squares[i].reset(); } template <unsigned int N> void board<N>::place_queen( unsigned row, column_t column_bit ) // Place a queen on the board, updating all rows below row to show which // squares are under attack. { /* One annoyance of representing column by a bitfield is that, to convert it * back to an index, we use floating-point math. This is still a constant- * time operation on modern computers. Furthermore, finding the next column * to search is also a constant-time operation. Therefore, this method * beats linear search for large N. * * We aren't using large N, so whether this is really faster is doubtful. */ const unsigned column = static_cast<unsigned> ( ilogb( static_cast<double> (column_bit) ) ); const ptrdiff_t i = N * row + column; static const bitwise_or<N> or_functor = bitwise_or<N>(); // If the test run passes all these sanity checks, it should be safe to // recompile with NDEBUG, and run without them. assert( row >= 0 ); assert( row < N - 1 ); assert( column >= 0 ); assert( column < N ); #ifndef NDEBUG { const bitset<sizeof(unsigned long) * CHAR_BIT> temp(column_bit); assert ( 1 == temp.count() ); } #endif const ptrdiff_t squares_to_fill = N - 1 - row; vector< bitset<N> > attacks; attacks.reserve( static_cast<size_t> (squares_to_fill) ); // If the requested square is already in the database, don't enter it again. if ( attack_matrix[i].empty() ) { // Find the highest square below us that is in the matrix, then fill upwards // from there. ptrdiff_t j; unsigned long shift = 0; for ( j = i + N; j < N * N - N; j += N ) if ( ! attack_matrix[j].empty() ) { attacks = attack_matrix[j]; shift = attacks.size(); break; } // If we fell through the loop, we are on the final row. If we didn't, we // are on the last filled square. In either case, we want to move one square // up. j -= N; { /* Starting with the second-to-last row, fill in the attack matrix for the * square in that row directly beneath the one we want. */ for ( ; j >= i; j -= N ) { ++shift; const bitset<N> bottom_row = column_bit | column_bit << shift | column_bit >> shift; attacks.push_back(bottom_row); attack_matrix[j] = attacks; } // end for } // end if assert ( attacks.size() == (size_t) squares_to_fill ); } // end if ( attack_matrix[i].empty() ) else attacks = attack_matrix[i]; // Now that we have the attack matrix, AND it with the board. transform ( squares + 1 + row, squares + N, attacks.begin(), squares + 1 + row, or_functor ); } template <unsigned int N> column_t board<N>::first_safe_space( unsigned row ) const { const unsigned long to_return = squares[row].to_ulong(); return static_cast<column_t> ( ~to_return & (to_return + 1) ); } template <unsigned int N> column_t board<N>::next_safe_space( unsigned row, column_t column_bit ) const { unsigned long to_return = squares[row].to_ulong(); #ifndef NDEBUG { bitset<sizeof(column_t) * CHAR_BIT> temp(column_bit); assert ( 1 == temp.count() ); } #endif // Mask the columns up to column_bit. to_return |= column_bit; to_return |= column_bit - 1; return static_cast<column_t> ( ~to_return & (to_return + 1) ); } template <unsigned int N> void board<N>::initialize_attack_matrix (void) { attack_matrix.resize( (N - 1) * N ); } template <unsigned int N> void board<N>::destroy_attack_matrix (void) { attack_matrix.clear(); } #ifndef NDEBUG template <unsigned int N> void print_board ( const board<N> &x ) { for ( size_t i = 0; i < N; ++i ) { column_t t = x.first_safe_space(i); for ( column_t j = 1; j < 1 << N; j <<=1 ) { if ( t == j ) { cout << 'O'; t = x.next_safe_space(i, j); } else cout << 'X'; } cout << endl; } cout << endl; return ; } #endif template <unsigned int N> long solve_queens(void) { long solutions = 0; column_t j[N-1]; board<N> backtrack[N-1]; board<N>::initialize_attack_matrix(); /* Find the solutions where the queen on the first row is on the left half * of the board. For each such solution, there is a one-to-one corresp- * ondence with the solutions where that same queen is on the right half * of the board, and the solutions are reflections of each other. */ for ( j[0] = 1; j[0] < 1 << (N/2); j[0] <<= 1 ) { backtrack[0].clear(); backtrack[0].place_queen( 0, j[0] ); size_t current_row = 1; j[1] = backtrack[0].first_safe_space(1); while (true) { if ( 1 << N <= j[current_row] ) { // There are no (more) possible matches on this row. if ( 1 == current_row ) break; --current_row; j[current_row] = backtrack[current_row-1].next_safe_space( current_row, j[current_row] ); } else if ( N - 2 == current_row ) { // Descend to the final level. // // Any open spaces we find here (there should be one at most) is a // solution. Furthermore, it has a symmetric solution. backtrack[N-2] = backtrack[N-3]; backtrack[N-2].place_queen( N-2, j[N-2] ); if ( backtrack[N-2].first_safe_space(N-1) < 1 << N ) solutions += 2; j[N-2] = backtrack[N-3].next_safe_space( N-2, j[N-2] ); } else { // Descend. backtrack[current_row] = backtrack[current_row - 1]; backtrack[current_row].place_queen(current_row, j[current_row] ); ++current_row; j[current_row] = backtrack[current_row-1].first_safe_space(current_row); } // end if } // end while } // end for // If N is odd, we also need to check the middle column. The two halves of // the board on either side are symmetric, so we need only check one. if ( 1 == N % 2 ) { #ifndef NDEBUG j[0] = 1 << (N/2); #endif backtrack[0].clear(); backtrack[0].place_queen( 0, 1 << (N/2) ); // The second queen cannot be in the position diagonally below the first. for ( j[1] = 1; j[1] < 1 << (N/2 - 1); j[1] <<= 1 ) { backtrack[1] = backtrack[0]; backtrack[1].place_queen( 1, j[1] ); size_t current_row = 2; j[2] = backtrack[1].first_safe_space(2); while (true) { if ( 1 << N <= j[current_row] ) { // There are no (more) possible matches on this row. if ( 2 == current_row ) break; --current_row; j[current_row] = backtrack[current_row-1].next_safe_space( current_row, j[current_row] ); } else if ( N - 2 == current_row ) { // Descend to the final level. // // Any open spaces we find here (there should be one at most) is a // solution. Furthermore, it has a symmetric solution. backtrack[N-2] = backtrack[N-3]; backtrack[N-2].place_queen( N-2, j[N-2] ); if ( backtrack[N-2].first_safe_space(N-1) < 1 << N ) solutions += 2; j[N-2] = backtrack[N-3].next_safe_space( N-2, j[N-2] ); } else { // Descend. backtrack[current_row] = backtrack[current_row - 1]; backtrack[current_row].place_queen(current_row, j[current_row] ); ++current_row; j[current_row] = backtrack[current_row-1].first_safe_space(current_row); } // end if } // end while } // end for } // end if board<N>::destroy_attack_matrix(); return solutions; } // For the degenerate case N == 1, we need a trivial handler: template <> long solve_queens<1> (void) { return 1; } long queens( unsigned n ) { switch (n) { case 1: return solve_queens<1> (); break; case 2: return solve_queens<2> (); break; case 3: return solve_queens<3> (); break; case 4: return solve_queens<4> (); break; case 5: return solve_queens<5> (); break; case 6: return solve_queens<6> (); break; case 7: return solve_queens<7> (); break; case 8: return solve_queens<8> (); break; case 9: return solve_queens<9> (); break; case 10: return solve_queens<10>(); break; case 11: return solve_queens<11>(); break; case 12: return solve_queens<12>(); break; case 13: return solve_queens<13>(); break; case 14: return solve_queens<14>(); break; case 15: return solve_queens<15>(); break; case 16: return solve_queens<16>(); break; case 17: return solve_queens<17>(); break; default: return -1; // If you're getting -1, expand the switch block. } } int main(void) { static const unsigned start = 1; // static const unsigned finish = 17; static const unsigned finish = 14; clock_t begin, end; clock_t total = 0; unsigned i; for ( i = start; i <= finish; ++i ) { long answer; begin = clock(); answer = queens(i); end = clock(); total += end - begin; cout << setw(8); cout << static_cast<double>( 1000 * (end - begin) ) / CLOCKS_PER_SEC; cout << " ms # of solutions for "; cout << setw(2) << i << " x "; cout << setw(2) << i << " = "; cout << answer << endl; } cout << "Total time: "; cout << static_cast<double>( 1000 * total ) / CLOCKS_PER_SEC; cout << " ms." << endl; return EXIT_SUCCESS; }
The major difference is that it uses arrays rather than vectors within the board objects that form the arrays for backtracking.

Here are my results. I don't suggest going as high as 17 unless you have a lot of time on your hands.
Code:
0 ms # of solutions for 1 x 1 = 1 0 ms # of solutions for 2 x 2 = 0 0 ms # of solutions for 3 x 3 = 0 0 ms # of solutions for 4 x 4 = 2 0 ms # of solutions for 5 x 5 = 10 0 ms # of solutions for 6 x 6 = 4 0 ms # of solutions for 7 x 7 = 40 0 ms # of solutions for 8 x 8 = 92 0 ms # of solutions for 9 x 9 = 352 0 ms # of solutions for 10 x 10 = 724 10 ms # of solutions for 11 x 11 = 2680 60 ms # of solutions for 12 x 12 = 14200 330 ms # of solutions for 13 x 13 = 73712 1990 ms # of solutions for 14 x 14 = 365596 12380 ms # of solutions for 15 x 15 = 2279184 87110 ms # of solutions for 16 x 16 = 14772512 631960 ms # of solutions for 17 x 17 = 95815104 Total time: 733840 ms.


Top
Offline  
 Post subject:
PostPosted:Tue Aug 28, 2007 4:18 am 
 

Joined:Thu Aug 09, 2007 9:09 am
Posts:26
For reference, here are my results with Frenchfrog's benchmark code:
Code:
0 # of solutions for 2 x 2 = 0 0 # of solutions for 3 x 3 = 0 0 # of solutions for 4 x 4 = 2 0 # of solutions for 5 x 5 = 10 0 # of solutions for 6 x 6 = 4 0 # of solutions for 7 x 7 = 40 0 # of solutions for 8 x 8 = 92 0 # of solutions for 9 x 9 = 352 0 # of solutions for 10 x 10 = 724 10 # of solutions for 11 x 11 = 2680 50 # of solutions for 12 x 12 = 14200 280 # of solutions for 13 x 13 = 73712 1650 # of solutions for 14 x 14 = 365596 11770 # of solutions for 15 x 15 = 2279184 Press a key


Top
Offline  
 Post subject:
PostPosted:Tue Aug 28, 2007 8:26 am 
Sagely Amphibian
User avatar
 

Joined:Sun Jun 18, 2006 3:06 pm
Posts:69
I like your use of templates to put N as a literal value.


Top
Offline  
 Post subject:
PostPosted:Tue Aug 28, 2007 9:48 am 
Sagely Amphibian
User avatar
 

Joined:Sun Jun 18, 2006 3:06 pm
Posts:69
Just for fun, 33% faster code.
Code:
#include <stdio.h> #include <stdlib.h> #include <time.h> #include <limits.h> const int n = 15; unsigned long nQueens2Medium( int n, int x0 ) { int row[sizeof(unsigned long) * 8]; unsigned long rowMask[sizeof(unsigned long) * 8]; unsigned long rowMaskRight[sizeof(unsigned long) * 8]; unsigned long rowMaskLeft[sizeof(unsigned long) * 8]; unsigned long nbSol = 0; int y = 2; int xstart = 0; row[0] = x0; unsigned long maskCol = 1UL << x0; rowMaskLeft[1] = ( 1UL << ( x0 - 1 ) ) & ( ULONG_MAX >> 1UL ); rowMaskRight[1] = ( 1UL << ( x0 + 1 ) ) & ( ULONG_MAX - 1UL ); rowMask[1] = rowMaskLeft[1] | rowMaskRight[1] | maskCol; while ( y >= 2 ) { for ( int x = xstart; x < n; x++ ) { if ( !( rowMask[y - 1] & ( 1UL << ( row[y - 1] = x ) ) ) ) { if ( y < n ) { maskCol |= 1UL << x; rowMaskLeft[y] = ( rowMaskLeft[y - 1] >> 1UL ) | ( 1UL << ( row[y - 1] - 1 ) ); rowMaskLeft[y] &= ( ULONG_MAX >> 1UL ); rowMaskRight[y] = ( rowMaskRight[y - 1] << 1UL ) | ( 1UL << ( row[y - 1] + 1 ) ); rowMaskRight[y] &= ( ULONG_MAX - 1UL ); rowMask[y] = rowMaskLeft[y] | rowMaskRight[y] | maskCol; y++; x = -1; // to start the next for at 0 } else { nbSol++; break; } } } y--; xstart = row[y - 1] + 1; maskCol ^= 1UL << row[y - 1]; } return nbSol; } inline bool nQueens2LargeSafe( int *row, int x, int y ) { for ( int i = 1; i <= y; i++ ) { int rowVal = row[y - i]; if ( rowVal == x || rowVal == x - i || rowVal == x + i ) return false; } return true; } unsigned long nQueens2Large( int n, int x0 ) { int *row = new int[n]; unsigned long nbSol = 0; int y = 2; int xstart = 0; row[0] = x0; while ( y >= 2 ) { for ( int x = xstart; x < n; x++ ) { if ( nQueens2LargeSafe( row, row[y - 1] = x, y - 1 ) ) { if ( y < n ) { y++; x = -1; // to start the next for at 0 } else { nbSol++; break; } } } y--; xstart = row[y - 1] + 1; } delete [] row; return nbSol; } unsigned long nQueens( int n ) { if ( n == 1 ) return 1; unsigned long nbSolLeft = 0, nbSolMid = 0; int limit = ( n - 1 ) >> 1; for ( int x = 0; x <= limit; x++ ) { unsigned long nbSolTemp; if ( sizeof(unsigned long) * 8 >= n ) nbSolTemp = nQueens2Medium( n, x ); else nbSolTemp = nQueens2Large( n, x ); if ( x == limit && ( n & 1 ) ) nbSolMid += nbSolTemp; else nbSolLeft += nbSolTemp; } return ( nbSolLeft << 1 ) + nbSolMid; } int main() { for ( int i = 1; i <= n; i++ ) { clock_t init = clock(); unsigned long nbSol = nQueens( i ); clock_t final = clock(); printf( "%10u # of solutions for %d x %d = %u\n", (unsigned int)( (final - init) * 1000 / CLOCKS_PER_SEC ), i, i, nbSol ); } return 0; }
Code:
0 # of solutions for 1 x 1 = 1 0 # of solutions for 2 x 2 = 0 0 # of solutions for 3 x 3 = 0 0 # of solutions for 4 x 4 = 2 0 # of solutions for 5 x 5 = 10 0 # of solutions for 6 x 6 = 4 0 # of solutions for 7 x 7 = 40 0 # of solutions for 8 x 8 = 92 0 # of solutions for 9 x 9 = 352 0 # of solutions for 10 x 10 = 724 0 # of solutions for 11 x 11 = 2680 16 # of solutions for 12 x 12 = 14200 109 # of solutions for 13 x 13 = 73712 609 # of solutions for 14 x 14 = 365596 4188 # of solutions for 15 x 15 = 2279184 26672 # of solutions for 16 x 16 = 14772512 205250 # of solutions for 17 x 17 = 95815104


Top
Offline  
 Post subject:
PostPosted:Tue Aug 28, 2007 8:09 pm 
 

Joined:Sun Aug 05, 2007 11:51 pm
Posts:25
Location:Orange County, California
My solution can only find distinct solutions, and it's pretty slow. I hope to make up some of the points on style and so forth, though.

The algorithm creates an array of unsigned longs. The index of each one is the Y value, and the value in each one is the X value. The recursive placement function makes sure that no queens are placed on the same column or row, which leaves only diagonal testing to be done by the validation function.
Code:
unsigned long solutions, size; unsigned long *rows; // Checks solutions for diagonal collisions bool validate() { for (unsigned int i = 0; i < size; i++) { for (unsigned int j = i + 1; j < size; j++) { // Basic X-Y diagonal stuff if (i + rows[i] == j + rows[j] || i - rows[i] == j - rows[j]) { return false; } } } return true; } // Recursively places X-values into the appropriate Y-valued row void position(unsigned long qidx) { for (unsigned long i = 0; i < size; i++) { rows[qidx] = i; for (unsigned long j = 0; j < qidx; j++) { // Eliminate coinciding X-values if (rows[j] == i) { // I need to 'continue', but from the outer loop, not this one goto skip; } } if (qidx < size - 1) { // Recurse if not at the last recursion position(qidx + 1); } else { // Validate if at the last recursion if (validate()) { solutions++; } } // Quasi-continue label skip:; } } // This is what gets called externally unsigned long nqueens(unsigned long _size) { solutions = 0; size = _size; // Create an array of precisely the right size so as to not waste memory rows = new unsigned long[size]; // Call the first iteration of the recursive positioning function position(0); // No memory leaks! delete rows; return solutions; }
At least my code is relatively short :)


Top
Offline  
 Post subject:
PostPosted:Tue Aug 28, 2007 8:27 pm 
 

Joined:Sun Aug 05, 2007 11:51 pm
Posts:25
Location:Orange County, California
After 20 minutes of tweaking, I made it a ton faster by eliminating the validation function and essentially inlining the diagonal validation into the positioning loop. Not only does it appear to be faster than Tycho's algorithm, it's also shorter than the previous version of my algorithm.
Code:
unsigned long solutions, size; unsigned long *rows; // Recursively places X-values into the appropriate Y-valued row void position(unsigned long qidx) { for (unsigned long i = 0; i < size; i++) { rows[qidx] = i; for (unsigned long j = 0; j < qidx; j++) { // Eliminate coinciding X-values by column and diagonal tests if (rows[j] == i || qidx + rows[qidx] == j + rows[j] || qidx - rows[qidx] == j - rows[j]) { // Excuse for goto: I need to 'continue', but from the outer loop, not this one goto skip; } } if (qidx < size - 1) { // Recurse if not at the last recursion position(qidx + 1); } else { // If it got here, it's a solution solutions++; } // Quasi-continue label skip:; } } // This is what gets called externally unsigned long nqueens(unsigned long _size) { solutions = 0; size = _size; // Create an array of precisely the right size so as to not waste memory rows = new unsigned long[size]; // Call the first iteration of the recursive positioning function position(0); // No memory leaks! delete [] rows; return solutions; }
EDIT: Fixed a small memory leak issue (i.e. I was an idiot). Should have no effect on performance or result.


Last edited by zig on Sun Sep 02, 2007 6:30 pm, edited 1 time in total.

Top
Offline  
 Post subject:
PostPosted:Tue Aug 28, 2007 9:48 pm 
 

Joined:Thu Aug 09, 2007 9:09 am
Posts:26
Nice work, Frenchfrog! After some tweaking, I found that g++ generates better code when I internally represent the rows of the board with the bits wrapped in a hand-rolled structure, compared to using std::bitset, or just integers (either of the machine's native size or the minimal size). I'm not sure why. With this optimization, the code catches up to your previous submission when N = 17.

It's still catching up to your optimized program (yours is still 37% faster for N = 17, but it was 47% faster for N = 14 and has been declining since then). Computing N = 18 takes hours, and N = 19 days, so I'm not sure where the catch-up point is now.

Edit: Never mind! I found the problem that was slowing the program down so much! It keeps a lookup table of the lower-row positions
any queen attacks. Because of a logic error, it was copying the entries to a dynamically-allocated list in linear time. Now that it performs lookups in constant time, as it should, it's much faster.

Edit: Tested with Microsoft Visual C++, which is eight years behind the language standard. If you want to write Microsoft about this, be sure to mention: don't worry about the Y2K bug; sell stocks and buy real estate; and no matter how much you may dislike Bill Clinton, do not under any circumstances vote for George H. W. Bush's son.

This latest version compiles and runs correctly under MSVC in Win32 32-bit mode, much more slowly than under Linux x86_64. It also fixes a couple bugs that happened not to affect g++.

Edit: Tweaked the algorithm to eliminate the middle-column test for odd values of N. This cut the size of solve_queens() in half.

Edit: Isolated the parts of the code that use C99 extensions, and added feature test macros to enable or disable them.

Edit: Some final clean-up, to remove outdated comments and redundant code blocks.

Edit: Did I say, "final?" I meant, of course, "Microsoft's compiler crashes on preprocessor warnings. Adding insult to injury, it doesn't even print the warning."

Here are my current tweaks:
Code:
#define NDEBUG 1 #define _XOPEN_SOURCE 600 #include <algorithm> #include <cassert> #include <climits> #include <cstdlib> #include <ctime> #include <iostream> #include <iomanip> #include <vector> #if ( __GNUC__ >= 4 ) || ( _STDC_VERSION__ >= 199901L ) #define USE_C99_EXTENSIONS 1 #endif #ifdef USE_C99_EXTENSIONS #include <stdint.h> #include <math.h> #else #include <cmath> typedef unsigned long uint_fast32_t; typedef unsigned char uint8_t; typedef unsigned short uint16_t; typedef unsigned long uint32_t; using std::frexp; inline int ilogb ( double x ) { int result; frexp( x, & result ); return --result; } #endif using std::clock; using std::cout; using std::endl; using std::ostream; using std::setw; using std::transform; using std::vector; typedef uint_fast32_t uint_fast; /* This template contains a generic implementation of a N-queens algorithm. * The compiler should be able to optimize for speed (at the expense of code * size, which is far less important) based on the fact that N is a constant. * For efficiency, we can plug in hand-optimized implementations for any * value of N. * * The algorithm assumes that an unsigned long has at least N bits. Given * that it has at least 32 bits, and any computer capable of solving the * problem for N > 32 in any feasible amount of time will have at least a 64- * bit architecture, this seems safe. * * The largest problem with this program was its choice of data types; the * current solution is to make bitfield a template parameter. It should be * an unsigned integral type, although you might see whether unsigned long * is faster on your architecture. */ template < typename T > class bitwise_or { // A binary functor, intended for bitfields. public: inline T operator() ( const T& a, const T& b ) { return (a | b); } }; template < unsigned int N, typename bitfield > class board { /* This object represents a N * N chessboard. Internally, it maintains a * vector of bitfields, each of size N. A modern compiler should be able to * vectorize operations on these data. On most implementations, this should * fit each bitfield into the smallest appropriate integer type, permitting * a SIMD instruction such as POR. This is probably not the absolutely most * compact representation (e.g., the 5x5 board will probably need 5 bytes * instead of 4), but it should generate appropriately-aligned code for * speed. * * A 1 in the position representing any square means that a queen definitely * threatens that square. A 0 means that we do not know of any queen which * threatens that square. * * The bit positions are stored in row-major order, with the square on the * upper left represented by squares[0][0]. */ public: class internal { public: inline internal(void) : bits(0U) {}; inline internal( bitfield x ) : bits(x) {}; inline operator uint_fast(void) const { return (uint_fast)bits; } inline const internal operator| ( const internal& x ) const { return internal ( bitfield (bits | x.bits) ); } private: bitfield bits; friend class board; }; typedef vector< vector< internal > > attack_matrix_t; // We should not need to override the implicit member functions. void clear(void); // Wipes all queens off the board; void place_queen( uint_fast row, uint_fast column_bit ); // Place a queen, updating the board as to which squares it attacks. uint_fast first_safe_space( uint_fast row ) const; // Returns the first column in the given row not under attack, or // a number out of range if no such column exists. uint_fast next_safe_space ( uint_fast row, uint_fast column_bit ) const; /* Returns either the first column to the right of the specified column * that is not under attack, or a number out of range if no such column * exists. */ static void initialize_attack_matrix(void); static void destroy_attack_matrix(void); private: static attack_matrix_t attack_matrix; internal squares[N]; // Stores N * N bits representing the positions under attack on a N * N // chessboard. }; template < unsigned int N, typename bitfield > typename board<N, bitfield>::attack_matrix_t board< N, bitfield >::attack_matrix; /* Stores bitfields representing positions attacked by a queen in a given * square. The place_queen() member function fills the individual matrix * entries in as needed. * * A queen in row i, column j corresponds to the vector in * attack_matrix[ i*N + j ]. There are no entries for the last row, N-1, * because they should never be needed. Only the bitfields representing rows * i+1 through N-1 are stored. */ template < unsigned int N, typename bitfield > void board< N, bitfield >::clear( void ) // Wipes all queens off the board. { for ( size_t i = 0; i < N; ++i ) squares[i].bits = 0; } template < unsigned int N, typename bitfield > void board< N, bitfield >::place_queen( uint_fast row, uint_fast column_bit ) // Place a queen on the board, updating all rows below row to show which // squares are under attack. { /* One annoyance of representing column by a bitfield is that, to convert it * back to an index, we use floating-point math. This is still a constant- * time operation on modern computers. Furthermore, finding the next column * to search is also a constant-time operation. Therefore, this method * beats linear search for large N. * * We aren't using large N, so whether this is really faster is doubtful. */ const uint_fast column = static_cast<uint_fast> ( ilogb( static_cast<double> (column_bit) ) ); const ptrdiff_t i = N * row + column; static const bitwise_or<internal> or_functor = bitwise_or<internal>(); // If the test run passes all these sanity checks, it should be safe to // recompile with NDEBUG, and run without them. assert( row >= 0 ); assert( row < N - 1 ); assert( column >= 0 ); assert( column < N ); // If the requested square is already in the database, don't enter it again. if ( attack_matrix[i].empty() ) { vector< internal > attacks; attacks.reserve( static_cast<size_t>(N - 1 - row) ); // Find the highest square below us that is in the matrix, then fill upwards // from there. ptrdiff_t j; uint_fast shift = 0; for ( j = i + N; j < N * N - N; j += N ) if ( ! attack_matrix[j].empty() ) { attacks = attack_matrix[j]; shift = static_cast<uint_fast> (attacks.size()); break; } // If we fell through the loop, we are on the final row. If we didn't, we // are on the last filled square. In either case, we want to move one square // up. /* Starting with the second-to-last row, fill in the attack matrix for the * square in that row directly beneath the one we want. */ for ( j -= N; j >= i; j -= N ) { ++shift; const uint_fast bottom_row ( column_bit | column_bit << shift | column_bit >> shift ); attacks.push_back( internal(static_cast<bitfield>(bottom_row) ) ); attack_matrix[j] = attacks; } // Now that we have the attack matrix, AND it with the board. transform ( (&squares[0]) + 1 + row, (&squares[0]) + N, &attacks[0], (&squares[0]) + 1 + row, or_functor ); assert ( attacks.size() == N - 1 - row; ); } // end if ( attack_matrix[i].empty() ) else { transform ( (&squares[0]) + 1 + row, (&squares[0]) + N, &(attack_matrix[i][0]), (&squares[0]) + 1 + row, or_functor ); } } template < unsigned int N, typename bitfield > uint_fast board< N, bitfield >::first_safe_space( uint_fast row ) const { const uint_fast to_return = squares[row]; return ( ~to_return & (to_return + 1) ); } template < unsigned int N, typename bitfield > uint_fast board< N, bitfield >::next_safe_space( uint_fast row, uint_fast column_bit ) const { uint_fast to_return = squares[row]; // Mask the columns up to column_bit. to_return |= column_bit; to_return |= column_bit - 1; return ( ~to_return & (to_return + 1) ); } template < unsigned int N, typename bitfield > void board< N, bitfield >::initialize_attack_matrix (void) { attack_matrix.resize( (N - 1) * N ); } template < unsigned int N, typename bitfield > void board< N, bitfield >::destroy_attack_matrix (void) { attack_matrix.clear(); } template < unsigned int N, typename bitfield > long solve_queens(void) { long solutions = 0; uint_fast j[N-1]; board< N, bitfield > backtrack[N-1]; board< N, bitfield >::initialize_attack_matrix(); /* Find the solutions where the queen on the first row is on the left half * of the board. For each such solution, there is a one-to-one corresp- * ondence with the solutions where that same queen is on the right half * of the board, and the solutions are reflections of each other. * * If N is odd, any solution with the last queen in the center column will * have at least three counterparts: its horizontal reflection, its vertical * reflection, and its horizontal and vertical reflection. Any solution with * the first queen in that column would be one of the last two. Therefore, * we can eliminate the loop to test the center column. */ for ( j[0] = 1; j[0] < 1 << (N >> 1); j[0] <<= 1 ) { backtrack[0].clear(); backtrack[0].place_queen( 0, j[0] ); uint_fast current_row = 1; j[1] = backtrack[0].first_safe_space(1); while (true) { if ( 1 << N <= j[current_row] ) { // There are no (more) possible matches on this row. if ( 1 == current_row ) break; --current_row; j[current_row] = backtrack[current_row-1].next_safe_space( current_row, j[current_row] ); } else if ( N - 2 == current_row ) { // Descend to the final level. // // Any open spaces we find here (there should be one at most) is a // solution. Furthermore, it has a symmetric solution. backtrack[N-2] = backtrack[N-3]; backtrack[N-2].place_queen( N-2, j[N-2] ); const uint_fast k = backtrack[N-2].first_safe_space(N-1); if ( k < 1 << N ) { if ( N % 2 == 1 && k == 1 << (N >> 1) ) // Our last queen is in the center column. solutions += 4; else solutions += 2; } j[N-2] = backtrack[N-3].next_safe_space( N-2, j[N-2] ); } else { // Descend. backtrack[current_row] = backtrack[current_row - 1]; backtrack[current_row].place_queen(current_row, j[current_row] ); ++current_row; j[current_row] = backtrack[current_row-1].first_safe_space(current_row); } // end if } // end while } // end for board< N, bitfield >::destroy_attack_matrix(); return solutions; } // For the degenerate case N <= 2, we need a trivial handler: template < > long solve_queens< 1, uint8_t > (void) { return 1; } template < > long solve_queens< 2, uint8_t > (void) { return 0; } long queens( unsigned n ) { switch (n) { case 1: return solve_queens< 1, uint8_t > (); break; case 2: return solve_queens< 2, uint8_t > (); break; case 3: return solve_queens< 3, uint8_t > (); break; case 4: return solve_queens< 4, uint8_t > (); break; case 5: return solve_queens< 5, uint8_t > (); break; case 6: return solve_queens< 6, uint8_t > (); break; case 7: return solve_queens< 7, uint8_t > (); break; case 8: return solve_queens< 8, uint8_t > (); break; case 9: return solve_queens< 9, uint16_t > (); break; case 10: return solve_queens< 10, uint16_t > (); break; case 11: return solve_queens< 11, uint16_t > (); break; case 12: return solve_queens< 12, uint16_t > (); break; case 13: return solve_queens< 13, uint16_t > (); break; case 14: return solve_queens< 14, uint16_t > (); break; case 15: return solve_queens< 15, uint16_t > (); break; case 16: return solve_queens< 16, uint16_t > (); break; case 17: return solve_queens< 17, uint32_t > (); break; case 18: return solve_queens< 18, uint32_t > (); break; default: return -1; // If you're getting -1, expand the switch block. } } int main(void) { static const unsigned start = 1; // static const unsigned finish = 14; static const unsigned finish = 17; clock_t begin, end; clock_t total = 0; unsigned i; for ( i = start; i <= finish; ++i ) { long answer; begin = clock(); answer = queens(i); end = clock(); total += end - begin; cout << setw(8); cout << static_cast<double>( 1000 * (end - begin) ) / CLOCKS_PER_SEC; cout << " ms # of solutions for "; cout << setw(2) << i << " x "; cout << setw(2) << i << " = "; cout << answer << endl; } cout << "Total time: "; cout << static_cast<double>( 1000 * total ) / CLOCKS_PER_SEC; cout << " ms." << endl; return EXIT_SUCCESS; }


Last edited by Lorehead on Wed Aug 29, 2007 5:04 am, edited 10 times in total.

Top
Offline  
 Post subject:
PostPosted:Tue Aug 28, 2007 9:58 pm 
 

Joined:Thu Aug 09, 2007 9:09 am
Posts:26
For fun, here is the fastest program I've ever written to print the solutions to the N-queens puzzle up to N = 25:
Code:
#include <stdio.h> #include <stdlib.h> int main (void) { puts("1\n0\n0\n2\n10\n4\n40\n92\n352\n724\n2680\n14200\n73712\n" "365596\n2279184\n14772512\n95815104\n666090624\n" "4968057848\n39029188884\n314666222712\n" "2691008701644\n24233937684440\n" "227514171973736\n2207893435808352\n" ); return EXIT_SUCCESS; }


Top
Offline  
 Post subject:
PostPosted:Tue Aug 28, 2007 10:45 pm 
 

Joined:Thu Aug 09, 2007 9:09 am
Posts:26
The results of my latest benchmark:
Code:
0 ms # of solutions for 1 x 1 = 1 0 ms # of solutions for 2 x 2 = 0 0 ms # of solutions for 3 x 3 = 0 0 ms # of solutions for 4 x 4 = 2 0 ms # of solutions for 5 x 5 = 10 0 ms # of solutions for 6 x 6 = 4 0 ms # of solutions for 7 x 7 = 40 0 ms # of solutions for 8 x 8 = 92 0 ms # of solutions for 9 x 9 = 352 0 ms # of solutions for 10 x 10 = 724 0 ms # of solutions for 11 x 11 = 2680 30 ms # of solutions for 12 x 12 = 14200 120 ms # of solutions for 13 x 13 = 73712 770 ms # of solutions for 14 x 14 = 365596 4530 ms # of solutions for 15 x 15 = 2279184 30110 ms # of solutions for 16 x 16 = 14772512 218830 ms # of solutions for 17 x 17 = 95815104 Total time: 254390 ms.
Edit: Updated to the current version of the program, posted above. (My real program, of course, not the joke.)

And with Frenchfrog's latest program:
Code:
0 # of solutions for 1 x 1 = 1 0 # of solutions for 2 x 2 = 0 0 # of solutions for 3 x 3 = 0 0 # of solutions for 4 x 4 = 2 0 # of solutions for 5 x 5 = 10 0 # of solutions for 6 x 6 = 4 0 # of solutions for 7 x 7 = 40 0 # of solutions for 8 x 8 = 92 0 # of solutions for 9 x 9 = 352 0 # of solutions for 10 x 10 = 724 0 # of solutions for 11 x 11 = 2680 30 # of solutions for 12 x 12 = 14200 180 # of solutions for 13 x 13 = 73712 1020 # of solutions for 14 x 14 = 365596 7200 # of solutions for 15 x 15 = 2279184 47040 # of solutions for 16 x 16 = 14772512 369090 # of solutions for 17 x 17 = 95815104
And here's Zig's. I re-used the timing loop from my program to call his. I'm afraid I didn't wait for it to compute the solution for N = 17.
Code:
0 ms # of solutions for 1 x 1 = 1 0 ms # of solutions for 2 x 2 = 0 0 ms # of solutions for 3 x 3 = 0 0 ms # of solutions for 4 x 4 = 2 0 ms # of solutions for 5 x 5 = 10 0 ms # of solutions for 6 x 6 = 4 0 ms # of solutions for 7 x 7 = 40 0 ms # of solutions for 8 x 8 = 92 0 ms # of solutions for 9 x 9 = 352 10 ms # of solutions for 10 x 10 = 724 40 ms # of solutions for 11 x 11 = 2680 250 ms # of solutions for 12 x 12 = 14200 1540 ms # of solutions for 13 x 13 = 73712 10210 ms # of solutions for 14 x 14 = 365596 70820 ms # of solutions for 15 x 15 = 2279184 518730 ms # of solutions for 16 x 16 = 14772512
Edit: Last time you tried replicating my benchmarks, Frenchfrog, we were surprised by how much slower my program ran on your machine. Here is what I get when I reboot the same computer into 32-bit Windows XP and run the same benchmark:
Code:
0 ms # of solutions for 1 x 1 = 1 0 ms # of solutions for 2 x 2 = 0 0 ms # of solutions for 3 x 3 = 0 0 ms # of solutions for 4 x 4 = 2 0 ms # of solutions for 5 x 5 = 10 0 ms # of solutions for 6 x 6 = 4 0 ms # of solutions for 7 x 7 = 40 0 ms # of solutions for 8 x 8 = 92 0 ms # of solutions for 9 x 9 = 352 0 ms # of solutions for 10 x 10 = 724 16 ms # of solutions for 11 x 11 = 2680 78 ms # of solutions for 12 x 12 = 14200 359 ms # of solutions for 13 x 13 = 73712 2438 ms # of solutions for 14 x 14 = 365596 14094 ms # of solutions for 15 x 15 = 2279184 102125 ms # of solutions for 16 x 16 = 14772512 665344 ms # of solutions for 17 x 17 = 95815104 Total time: 784454 ms. Press any key to continue . . .
I don't think that this is merely glibc misreporting the value of clock(). And I only bash Microsoft when the company deserves it: almost none of that time was spent executing OS code, so the issue also isn't Windows being inefficient. The same program runs close to three times faster on 64-bit Linux. A lot of that speed is probably due to the x86_64 architecture, but g++ also seems to be doing a better job of optimizing my program.

Edit: Tested that hypothesis by compiling in g++ 3.4.4 in Cygwin, and got:
Code:
0 ms # of solutions for 1 x 1 = 1 0 ms # of solutions for 2 x 2 = 0 0 ms # of solutions for 3 x 3 = 0 0 ms # of solutions for 4 x 4 = 2 0 ms # of solutions for 5 x 5 = 10 0 ms # of solutions for 6 x 6 = 4 0 ms # of solutions for 7 x 7 = 40 0 ms # of solutions for 8 x 8 = 92 0 ms # of solutions for 9 x 9 = 352 16 ms # of solutions for 10 x 10 = 724 31 ms # of solutions for 11 x 11 = 2680 172 ms # of solutions for 12 x 12 = 14200 891 ms # of solutions for 13 x 13 = 73712 5656 ms # of solutions for 14 x 14 = 365596 33297 ms # of solutions for 15 x 15 = 2279184 234890 ms # of solutions for 16 x 16 = 14772512


Top
Offline  
 Post subject:
PostPosted:Wed Aug 29, 2007 7:46 am 
Sagely Amphibian
User avatar
 

Joined:Sun Jun 18, 2006 3:06 pm
Posts:69
First, thanks for providing all those numbers ;)

In clear?

1) 32 bit ?MS Studio 2005? is ~2x faster than 32 bit gcc 3.4.4 in this _benchmark_.
2) 64 bit gcc is 3x faster than 32 bit ?MS Studio 2005? in this _benchmark_.

Which version of gcc do you use on Linux?

Does compiling with that version of gcc on Linux to a 32 bit target change the numbers compare to g++ 3.4.4 in Cygwin/Windows?


EDIT: I would also like to say that I like your first_safe_space/next_safe_space functions, saw them in your first code version and it's brillant :)

EDIT EDIT: My code 3 times faster stealing your first_safe_space/next_safe_space functions:

EDIT EDIT EDIT: I guess what's make my code faster is that it doesn't need ilogb, it works directly with _bitfields_.
Code:
#include <stdio.h> #include <stdlib.h> #include <time.h> #include <limits.h> // 2 functions from Lorehead code. unsigned long firstSafeSpace( unsigned long *rowMask, int y ) { unsigned long to_return = rowMask[y - 1]; return ( ~to_return & (to_return + 1) ); } unsigned long nextSafeSpace( unsigned long *rowMask, int y, unsigned long x ) { unsigned long to_return = rowMask[y - 1]; to_return |= x; to_return |= x - 1; return ( ~to_return & (to_return + 1) ); } unsigned long nQueens2Medium( int n, unsigned long x0 ) { unsigned long row[sizeof(unsigned long) * 8]; unsigned long rowMask[sizeof(unsigned long) * 8]; unsigned long rowMaskRight[sizeof(unsigned long) * 8]; unsigned long rowMaskLeft[sizeof(unsigned long) * 8]; unsigned long nbit = 1UL << ( n - 1 ); unsigned long nbSol = 0; unsigned long maskCol = x0; int y = 2; row[0] = x0; rowMaskLeft[1] = x0 >> 1; rowMaskRight[1] = x0 << 1; rowMask[1] = rowMaskLeft[1] | rowMaskRight[1] | maskCol; unsigned long xstart = firstSafeSpace( rowMask, y ); while ( y >= 2 ) { for ( unsigned long x = xstart; x <= nbit; ) { if ( !( rowMask[y - 1] & ( row[y - 1] = x ) ) ) { if ( y < n ) { maskCol |= x; rowMaskLeft[y] = ( rowMaskLeft[y - 1] >> 1 ) | ( x >> 1 ); rowMaskRight[y] = ( rowMaskRight[y - 1] << 1 ) | ( x << 1 ); rowMask[y] = rowMaskLeft[y] | rowMaskRight[y] | maskCol; y++; x = firstSafeSpace( rowMask, y ); // to start the next for at 0 continue; } else { nbSol++; break; } } x = nextSafeSpace( rowMask, y, x ); } y--; xstart = nextSafeSpace( rowMask, y, row[y - 1] ); maskCol ^= row[y - 1]; } return nbSol; } inline bool nQueens2LargeSafe( int *row, int x, int y ) { for ( int i = 1; i <= y; i++ ) { int rowVal = row[y - i]; if ( rowVal == x || rowVal == x - i || rowVal == x + i ) return false; } return true; } unsigned long nQueens2Large( int n, int x0 ) { int *row = new int[n]; unsigned long nbSol = 0; int y = 2; int xstart = 0; row[0] = x0; while ( y >= 2 ) { for ( int x = xstart; x < n; x++ ) { if ( nQueens2LargeSafe( row, row[y - 1] = x, y - 1 ) ) { if ( y < n ) { y++; x = -1; // to start the next for at 0 } else { nbSol++; break; } } } y--; xstart = row[y - 1] + 1; } delete [] row; return nbSol; } unsigned long nQueens( int n ) { if ( n == 1 ) return 1; unsigned long nbSolLeft = 0, nbSolMid = 0; int limit = ( n - 1 ) >> 1; for ( int x = 0; x <= limit; x++ ) { unsigned long nbSolTemp; if ( sizeof(unsigned long) * 8 > n ) nbSolTemp = nQueens2Medium( n, 1UL << x ); else nbSolTemp = nQueens2Large( n, x ); if ( x == limit && ( n & 1 ) ) nbSolMid += nbSolTemp; else nbSolLeft += nbSolTemp; } return ( nbSolLeft << 1 ) + nbSolMid; } int main() { const int n = 17; for ( int i = 1; i <= n; i++ ) { clock_t init = clock(); unsigned long nbSol = nQueens( i ); clock_t final = clock(); printf( "%10u # of solutions for %d x %d = %u\n", (unsigned int)( (final - init) * 1000 / CLOCKS_PER_SEC ), i, i, nbSol ); } return 0; }
Code:
0 # of solutions for 1 x 1 = 1 0 # of solutions for 2 x 2 = 0 0 # of solutions for 3 x 3 = 0 0 # of solutions for 4 x 4 = 2 0 # of solutions for 5 x 5 = 10 0 # of solutions for 6 x 6 = 4 0 # of solutions for 7 x 7 = 40 0 # of solutions for 8 x 8 = 92 0 # of solutions for 9 x 9 = 352 0 # of solutions for 10 x 10 = 724 0 # of solutions for 11 x 11 = 2680 0 # of solutions for 12 x 12 = 14200 47 # of solutions for 13 x 13 = 73712 203 # of solutions for 14 x 14 = 365596 1329 # of solutions for 15 x 15 = 2279184 8296 # of solutions for 16 x 16 = 14772512 62094 # of solutions for 17 x 17 = 95815104


Top
Offline  
 Post subject:
PostPosted:Wed Aug 29, 2007 12:19 pm 
 

Joined:Thu Aug 09, 2007 9:09 am
Posts:26
I'll see your loop and raise you one perfect hash function!

Edit: And replace the attack matrix with something static and sane. Apparently, that wasn't a huge bottlenect, but it did make a noticeable difference for higher values of N. Better refactor the loop.
Code:
#define NDEBUG 1 #define _XOPEN_SOURCE 600 #include <algorithm> #include <cassert> #include <climits> #include <cstdlib> #include <ctime> #include <iostream> #include <iomanip> #include <vector> #if ( __GNUC__ >= 4 ) || ( _STDC_VERSION__ >= 199901L ) #define USE_C99_EXTENSIONS 1 #endif #ifdef USE_C99_EXTENSIONS #include <stdint.h> #else typedef unsigned long uint_fast32_t; typedef unsigned char uint8_t; typedef unsigned short uint16_t; typedef unsigned long uint32_t; #endif using std::clock; using std::cout; using std::endl; using std::ostream; using std::setw; using std::transform; using std::vector; typedef uint_fast32_t uint_fast; const unsigned start = 1; // const unsigned finish = 14; static const unsigned finish = 17; /* I stand by my earlier comment: because ilogb is a constant-time operation, * it beats linear search in theory. It's still too slow. I need a better, * constant-time function. I need a hash table. * * Unfortunately, hash tables are in Technical Report 1 of the ANSI/ISO C++ * standard, which has yet to be formally adopted. So, I have to roll my own. */ class hash_2n { /* This class represents a hash table whose keys are powers of 2, between 1 * and 2**(finish-1), and whose data are the outputs of the function: * lg(key) + 1. The lg notation comes from Donald Knuth. (Aren't you glad I * dropped his name?) * * Anyhow. This class is intended as a singleton, since there's no rational * motive to have more than one. It does not have collision handling, * insertion, or more than minimal features. Such as it does have follow the * example of map. */ public: hash_2n (void); inline uint_fast operator[] ( uint_fast exponent ) const; private: static const uint_fast hash_value = 37; static uint_fast table[hash_value]; }; hash_2n::hash_2n (void) { for ( uint_fast i = 0; i < 17; ++i ) { const uint_fast hash = ( 1 << i ) % hash_value; assert( 0 == table[hash] ); table[hash] = i; } return; } inline uint_fast hash_2n::operator[] ( uint_fast exponent ) const { return table[ exponent % hash_value ]; } uint_fast hash_2n::table [hash_value]; const hash_2n column_hash; // The one and only instance of this class in the program. /* This template contains a generic implementation of a N-queens algorithm. * The compiler should be able to optimize for speed (at the expense of code * size, which is far less important) based on the fact that N is a constant. * For efficiency, we can plug in hand-optimized implementations for any * value of N. * * The algorithm assumes that an unsigned long has at least N bits. Given * that it has at least 32 bits, and any computer capable of solving the * problem for N > 32 in any feasible amount of time will have at least a 64- * bit architecture, this seems safe. * * The largest problem with this program was its choice of data types; the * current solution is to make bitfield a template parameter. It should be * an unsigned integral type, although you might see whether unsigned long * is faster on your architecture. */ template < typename T > class bitwise_or { // A binary functor, intended for bitfields. public: inline T operator() ( const T& a, const T& b ) { return (a | b); } }; template < unsigned int N > class board { /* This object represents a N * N chessboard. Internally, it maintains a * vector of bitfields, each of size N. A modern compiler should be able to * vectorize operations on these data. On most implementations, this should * fit each bitfield into the smallest appropriate integer type, permitting * a SIMD instruction such as POR. This is probably not the absolutely most * compact representation (e.g., the 5x5 board will probably need 5 bytes * instead of 4), but it should generate appropriately-aligned code for * speed. * * A 1 in the position representing any square means that a queen definitely * threatens that square. A 0 means that we do not know of any queen which * threatens that square. * * The bit positions are stored in row-major order, with the square on the * upper left represented by squares[0][0]. */ public: // We should not need to override the implicit member functions. void clear(void); // Wipes all queens off the board; void place_queen( uint_fast row, uint_fast column_bit ); // Place a queen, updating the board as to which squares it attacks. uint_fast first_safe_space( uint_fast row ) const; // Returns the first column in the given row not under attack, or // a number out of range if no such column exists. uint_fast next_safe_space ( uint_fast row, uint_fast column_bit ) const; /* Returns either the first column to the right of the specified column * that is not under attack, or a number out of range if no such column * exists. */ private: uint32_t squares[N]; // Stores N * N bits representing the positions under attack on a N * N // chessboard. }; //template < unsigned int N, typename bitfield > // typename board<N, bitfield>::attack_matrix_t board< N, bitfield >::attack_matrix; // It's time to get serious. The old algorithm just wasn't cutting it. class attack_matrix_t { public: attack_matrix_t (void); inline const uint_fast* operator[] ( size_t column ) const; private: static uint32_t the_matrix [finish][finish - 1]; /* Stores bitfields representing positions attacked by a queen in a given * column. The entries are bitmasks to apply to the N - 1 squares below. */ }; uint_fast attack_matrix_t::the_matrix [finish][finish - 1]; attack_matrix_t::attack_matrix_t (void) { for ( size_t i = 0; i < finish; ++i ) { const uint_fast rook = 1 << i; for ( size_t j = 0; j < finish - 1; ++j ) { const uint_fast kings_bishop = rook << (j + 1), queens_bishop = rook >> (j + 1); the_matrix[i][j] = static_cast<uint32_t> ( rook | kings_bishop | queens_bishop ); } } } inline const uint_fast* attack_matrix_t::operator[] ( size_t column ) const { return (const uint_fast*) ( the_matrix + column ); } const attack_matrix_t attack_matrix; // The only instance of this object in the program. template < unsigned int N > void board<N>::clear( void ) // Wipes all queens off the board. { for ( size_t i = 0; i < N; ++i ) squares[i] = 0; } template < unsigned int N > void board<N>::place_queen( uint_fast row, uint_fast column_bit ) // Place a queen on the board, updating all rows below row to show which // squares are under attack. { // Previously used ilogb(), which was non-portable and incredibly slow. const uint_fast column = column_hash[column_bit]; static const bitwise_or<uint32_t> or_functor = bitwise_or<uint32_t>(); // If the test run passes all these sanity checks, it should be safe to // recompile with NDEBUG, and run without them. assert( row >= 0 ); assert( row < N - 1 ); assert( column >= 0 ); assert( column < N ); transform ( (&squares[0]) + 1 + row, (&squares[0]) + N, attack_matrix[column], (&squares[0]) + 1 + row, or_functor ); return; } template < unsigned int N > uint_fast board<N>::first_safe_space( uint_fast row ) const { const uint_fast to_return = squares[row]; return ( ~to_return & (to_return + 1) ); } template <unsigned int N> uint_fast board<N>::next_safe_space( uint_fast row, uint_fast column_bit ) const { uint_fast to_return = squares[row]; // Mask the columns up to column_bit. to_return |= column_bit; to_return |= column_bit - 1; return ( ~to_return & (to_return + 1) ); } template <unsigned int N> long solve_queens(void) { long solutions = 0; uint_fast j[N-1]; board<N> backtrack[N-1]; // This loop may soon be refactored: for ( j[0] = 1; j[0] < 1 << (N >> 1); j[0] <<= 1 ) { backtrack[0].clear(); backtrack[0].place_queen( 0, j[0] ); uint_fast current_row = 1; j[1] = backtrack[0].first_safe_space(1); while (true) { if ( 1 << N <= j[current_row] ) { // There are no (more) possible matches on this row. if ( 1 == current_row ) break; --current_row; j[current_row] = backtrack[current_row-1].next_safe_space( current_row, j[current_row] ); } else if ( N - 2 == current_row ) { // Descend to the final level. // // Any open spaces we find here (there should be one at most) is a // solution. Furthermore, it has a symmetric solution. backtrack[N-2] = backtrack[N-3]; backtrack[N-2].place_queen( N-2, j[N-2] ); const uint_fast k = backtrack[N-2].first_safe_space(N-1); if ( k < 1 << N ) { if ( N % 2 == 1 && k == 1 << (N >> 1) ) // Our last queen is in the center column. solutions += 4; else solutions += 2; } j[N-2] = backtrack[N-3].next_safe_space( N-2, j[N-2] ); } else { // Descend. backtrack[current_row] = backtrack[current_row - 1]; backtrack[current_row].place_queen(current_row, j[current_row] ); ++current_row; j[current_row] = backtrack[current_row-1].first_safe_space(current_row); } // end if } // end while } // end for return solutions; } template <> long solve_queens<1> (void) { return 1; } template <> long solve_queens<2> (void) { return 0; } long queens( unsigned n ) { switch (n) { case 1: return solve_queens< 1 > (); break; case 2: return solve_queens< 2 > (); break; case 3: return solve_queens< 3 > (); break; case 4: return solve_queens< 4 > (); break; case 5: return solve_queens< 5 > (); break; case 6: return solve_queens< 6 > (); break; case 7: return solve_queens< 7 > (); break; case 8: return solve_queens< 8 > (); break; case 9: return solve_queens< 9 > (); break; case 10: return solve_queens< 10 > (); break; case 11: return solve_queens< 11 > (); break; case 12: return solve_queens< 12 > (); break; case 13: return solve_queens< 13 > (); break; case 14: return solve_queens< 14 > (); break; case 15: return solve_queens< 15 > (); break; case 16: return solve_queens< 16 > (); break; case 17: return solve_queens< 17 > (); break; case 18: return solve_queens< 18 > (); break; default: return -1; // If you're getting -1, expand the switch block. } } int main(void) { clock_t begin, end; clock_t total = 0; unsigned i; for ( i = start; i <= finish; ++i ) { long answer; begin = clock(); answer = queens(i); end = clock(); total += end - begin; cout << setw(8); cout << static_cast<double>( 1000 * (end - begin) ) / CLOCKS_PER_SEC; cout << " ms # of solutions for "; cout << setw(2) << i << " x "; cout << setw(2) << i << " = "; cout << answer << endl; } cout << "Total time: "; cout << static_cast<double>( 1000 * total ) / CLOCKS_PER_SEC; cout << " ms." << endl; return EXIT_SUCCESS; }


Last edited by Lorehead on Wed Aug 29, 2007 3:25 pm, edited 1 time in total.

Top
Offline  
 Post subject:
PostPosted:Wed Aug 29, 2007 12:58 pm 
 

Joined:Thu Aug 09, 2007 9:09 am
Posts:26
New benchmarks, this time for Win32:

First, a run compiled with the default release settings. This is already running in half the time as before.
Code:
0 ms # of solutions for 1 x 1 = 1 0 ms # of solutions for 2 x 2 = 0 0 ms # of solutions for 3 x 3 = 0 0 ms # of solutions for 4 x 4 = 2 0 ms # of solutions for 5 x 5 = 10 0 ms # of solutions for 6 x 6 = 4 0 ms # of solutions for 7 x 7 = 40 0 ms # of solutions for 8 x 8 = 92 0 ms # of solutions for 9 x 9 = 352 0 ms # of solutions for 10 x 10 = 724 0 ms # of solutions for 11 x 11 = 2680 16 ms # of solutions for 12 x 12 = 14200 172 ms # of solutions for 13 x 13 = 73712 953 ms # of solutions for 14 x 14 = 365596 5390 ms # of solutions for 15 x 15 = 2279184 39515 ms # of solutions for 16 x 16 = 14772512 263011 ms # of solutions for 17 x 17 = 95815104 Total time: 309057 ms. Press any key to continue . . .
Now, to enable better optimizations in the project settings. (Don't worry, you'll get them, too.)
Code:
0 ms # of solutions for 1 x 1 = 1 0 ms # of solutions for 2 x 2 = 0 0 ms # of solutions for 3 x 3 = 0 0 ms # of solutions for 4 x 4 = 2 0 ms # of solutions for 5 x 5 = 10 0 ms # of solutions for 6 x 6 = 4 0 ms # of solutions for 7 x 7 = 40 0 ms # of solutions for 8 x 8 = 92 0 ms # of solutions for 9 x 9 = 352 0 ms # of solutions for 10 x 10 = 724 15 ms # of solutions for 11 x 11 = 2680 16 ms # of solutions for 12 x 12 = 14200 125 ms # of solutions for 13 x 13 = 73712 765 ms # of solutions for 14 x 14 = 365596 4875 ms # of solutions for 15 x 15 = 2279184 30906 ms # of solutions for 16 x 16 = 14772512 255995 ms # of solutions for 17 x 17 = 95815104 Total time: 292697 ms. Press any key to continue . . .
A 62% improvement in speed.

Frenchfrog's:
Code:
0 # of solutions for 1 x 1 = 1 0 # of solutions for 2 x 2 = 0 0 # of solutions for 3 x 3 = 0 0 # of solutions for 4 x 4 = 2 0 # of solutions for 5 x 5 = 10 0 # of solutions for 6 x 6 = 4 0 # of solutions for 7 x 7 = 40 0 # of solutions for 8 x 8 = 92 0 # of solutions for 9 x 9 = 352 0 # of solutions for 10 x 10 = 724 0 # of solutions for 11 x 11 = 2680 0 # of solutions for 12 x 12 = 14200 46 # of solutions for 13 x 13 = 73712 266 # of solutions for 14 x 14 = 365596 1625 # of solutions for 15 x 15 = 2279184 11125 # of solutions for 16 x 16 = 14772512 77311 # of solutions for 17 x 17 = 95815104 Press any key to continue . . .
That's amazing. I wonder how mine will do on Linux, but you're way ahead on Windows.

Edit: Eliminating the overhead of the lookup table for attack positions brought me closer, but I'll still need to refactor the loop.
Code:
0 ms # of solutions for 1 x 1 = 1 0 ms # of solutions for 2 x 2 = 0 0 ms # of solutions for 3 x 3 = 0 0 ms # of solutions for 4 x 4 = 2 0 ms # of solutions for 5 x 5 = 10 0 ms # of solutions for 6 x 6 = 4 0 ms # of solutions for 7 x 7 = 40 0 ms # of solutions for 8 x 8 = 92 0 ms # of solutions for 9 x 9 = 352 0 ms # of solutions for 10 x 10 = 724 0 ms # of solutions for 11 x 11 = 2680 16 ms # of solutions for 12 x 12 = 14200 125 ms # of solutions for 13 x 13 = 73712 765 ms # of solutions for 14 x 14 = 365596 4328 ms # of solutions for 15 x 15 = 2279184 31734 ms # of solutions for 16 x 16 = 14772512 213888 ms # of solutions for 17 x 17 = 95815104 Total time: 250856 ms. Press any key to continue . . .


Top
Offline  
 Post subject:
PostPosted:Wed Aug 29, 2007 4:32 pm 
Sagely Amphibian
User avatar
 

Joined:Sun Jun 18, 2006 3:06 pm
Posts:69
After a quick look at your code, perhaps transform in place_queen is doing too much work? There is no need to calcule immediately the attack matrix (1 + row) to N since a lot of impossible placement are detected before reaching N?


Top
Offline  
 Post subject:
PostPosted:Wed Aug 29, 2007 5:45 pm 
 

Joined:Thu Aug 09, 2007 9:09 am
Posts:26
Quote:
After a quick look at your code, perhaps transform in place_queen is doing too much work? There is no need to calcule immediately the attack matrix (1 + row) to N since a lot of impossible placement are detected before reaching N?
What it should be doing, unless I misread the documentation tremendously, is starting with the row after the one we checked, and keep descending to the bottom of the board. The start iterator (a pointer into the array) points just past row in squares: it's the element squares[row + 1]. (The row variable is an index into the array.) Unless there's something I've overlooked.

A C-style loop did not speed up the code, so that doesn't seem to be a bottleneck. In theory, the semantics of transform should make it easier for a compiler to parallelize or vectorize that loop, and MSVC doesn't seem to be choking on it.

I have another idea for how to optimize the program, and it focuses on the loop. I'll be trying that first.


Top
Offline  
 Post subject:
PostPosted:Wed Aug 29, 2007 6:10 pm 
Sagely Amphibian
User avatar
 

Joined:Sun Jun 18, 2006 3:06 pm
Posts:69
I wasn't clear enough, transform is fine, the problem is it's use, if I haven't misread your code, if it's placing the queen on a given row, transform will be called on all the rows from that current row plus 1 up to N. This means that if you backtrack before the last row some calculation will have been made for nothing. Is it a huge impact? I honestly don't know.

I would also like to point out the other brillant idea you had to eliminate the need to try the middle row on odd N ;)


Top
Offline  
Display posts from previous: Sort by 
Post new topic Reply to topic

All times are UTC-05:00


Who is online

Users browsing this forum: No registered users and 6 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group
Theme created by Miah with assistance from hyprnova