A new beginning

Looks like I am going to have some time to do my own thing, finally. Maybe it is a good time to retake some old forgotten projects. For a very long time, I’ve wanted to make videogames. Not just as a hobby, but as a profession. I’m a huge fan of indie games as you can see from my Steam library here. Now with SteamOS and smatphone games it seems like there’s a lot of changes in the videogame industry. You don’t need to have a team of 100+ people working together to create the next hit game, it just seems like game development is going back to its roots, where people created little gems, on their own, with a lot of talent and tons of imagination.

Limbo
Limbo, one of the best games I’ve ever played

Now, I’m a hardcore techie, so I’m not going to be using any popular engines like Unity or Ogre. I did contribute to Leadwerks being ported to Linux on Kickstarter, but I don’t think I’ll be using it.

I don’t intend on reinventing the wheel though, so I will be using a great number of open source libraries. This is a list of technologies that I might probably be using :

  • C/C++
  • Lua, for scripting purposes
  • Simple DirectMedia Layer (SDL), by the great icculus, the cross-platform equivalent of DirectX, handles images, audio, net, …
  • OpenGL, the cross-platform of Direct3D
  • OpenAL, the audio equivalent to OpenGL
  • Assimp, the Asset Import Library, a library used to import 3D models, bones and animations
  • Audacity, audio editor
  • zlib, probably, for compression purposes
  • Boost, because it just rocks
  • LMMS, to produce music. I’m actually going to give this a shot! Needless to say, I have no idea what i’m doing.
  • Blender, 3D model creation
  • Makehuman, quick and easy character creation
  • Git, version control
  • CMake, the most flexible build system I can think of
  • Valgrind, the analysis tools that have taught me so much about programming
  • GCC/GDB the GNU compiler and debugger

Other than this, I will give a shot to LLVM, as I keep hearing more and more about it, but I’ve never actually tried it. And I have no idea what I could use for creating sound effects.

Eventually, I might write a second part to this post, where I will aacknowledge how wrong I was about many choices I make here. There is one thing I know for sure though, and it’s that, at the very least, I will get more knowledge from this experience.

OpenMP, multiprocessing made easy

OpenMP is, according to the Wikipedia, an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran on many architectures, including Unix and Microsoft Windows platforms. This means, in a nutshell, that it’s an API that makes multiprocessing easy.

The API is currently on its third version, and it is supported to some degree by most compilers on the market today, including GCC since version 4.2, Microsoft’s Visual C++ since version 2005, Intel’s C++ compiler, IBM XL C/C++ compiler and many others.

The most interesting feature of OpenMP is simplicity of implementation, even on existing, production code. With just a little effort, one can turn a single threaded program and turn it into a multiprocessing program, quite often experiencing a huge performance boost, specially on multicore architectures. Although its use is so simple, I still find that it’s unusual to see it being used by most programmers I’ve met. Maybe it’s because most programmers are still used to thinking in terms of one process for most of our non time-critical code, and we don’t have the reflex of implementing such features on a day-to-day basis. And when we implement processes and/or threads, we do it using platform-specific instructions like fork().

This “Hello World” example from the Wikipedia is the perfect example to demonstrate OpenMP’s simplicity:
[cc lang=”c”]
#include
#include
#include

int main (int argc, char *argv[]) {
int th_id, nthreads;
#pragma omp parallel private(th_id)
{
th_id = omp_get_thread_num();
printf(“Hello World from thread %d\n”, th_id);
#pragma omp barrier
if ( th_id == 0 ) {
nthreads = omp_get_num_threads();
printf(“There are %d threads\n”,nthreads);
}
}
return EXIT_SUCCESS;
}
[/cc]

I compile this using GCC by adding -fopenmp to the list of parameters.

[cc lang=”text”]
$ gcc -fopenmp -o openmptest openmp.c
$ ./openmptest
Hello World from thread 0
Hello World from thread 3
Hello World from thread 1
Hello World from thread 2
There are 4 threads
[/cc]
Now if I want to adjust the number of threads to, let’s say, eight threads, I simply have to add an instruction omp_set_num_threads() as such:
[cc lang=”c”]
#include
#include
#include

int main (int argc, char *argv[]) {
int th_id, nthreads;
omp_set_num_threads(8);
#pragma omp parallel private(th_id)
{
th_id = omp_get_thread_num();
printf(“Hello World from thread %d\n”, th_id);
#pragma omp barrier
if ( th_id == 0 ) {
nthreads = omp_get_num_threads();
printf(“There are %d threads\n”,nthreads);
}
}
return EXIT_SUCCESS;
}
[/cc]
The result, as I expected:
[cc lang=”text”]
$ gcc -fopenmp -o openmptest2 openmp2.c
$ ./openmptest2
Hello World from thread 6
Hello World from thread 2
Hello World from thread 3
Hello World from thread 0
Hello World from thread 4
Hello World from thread 1
Hello World from thread 5
Hello World from thread 7
There are 8 threads
[/cc]
How is this done in Linux using fork()? Well, an example program with similar functionality would be something as such:
[cc lang=”c”]
#include
#include
#include

int num_threads = 4;

int main (int argc, char *argv[])
{
int pid[num_threads];
int thread_num;
int status;

// Fork creation
for(thread_num=0; thread_num<num_threads; thread_num++)
{
if ((pid[thread_num] = fork()) == 0)
{
// Wait until all child processes are finished
wait(&status);
}
else
{
printf(“Hello World from thread %d\n”, thread_num);
exit(0);
}
}

printf(“There are %d threads\n”,num_threads);
return EXIT_SUCCESS;
}
[/cc]
The result, as expected, is as follows:
[cc lang=”text”]
$ gcc -o forktest forktest.c
$ ./forktest
Hello World from thread 0
Hello World from thread 1
Hello World from thread 2
Hello World from thread 3
There are 4 threads
[/cc]

My Windows C programming is rusty to say the least, but of course, the above code with fork() would not work under Windows, I would have to use CreateThread(), while the OpenMP example should work if I adjust the includes and the main function parameters as usual.

So, anyways, I’m going to try something a bit more fun. There’s this fairly simple yet cool algorithm which is based on the so called “Montecarlo method”. This method consists of using randomly generated numbers to calculate a value through approximation. I’m doing something slightly different, mine is basically an algorithm to make an estimation of the ? number, but instead of using randomly generated numbers, I will be using a great amount of evenly sequenced numbers. The basic idea is that in a 1 x 1 square the area is 1 square units, and the area of a 1 radius circle is ? square units (? * 12). So if I place the centre of my circle on one of the four corners of the square, the area in common would be ?/4 square units.

Now, if I “throw” a high number of evenly distributed dots on my 1 x 1 square, the ratio of dots landing inside the circle quarter to the total number of dots would be ? / 4 to 1. We can, with this ratio, make an estimate of the value of ?. This, in quick and dirty C code, could be written like this:

[cc lang=”c”]
#include
#include
#include
#include

#define TOTAL_BULLETS 100000

int main(int argc, char *argv[])
{
double almost_pi;
int i,j;
double counter_in=0.0f;
struct timeval startTime;
struct timeval endTime;
double x, y, hypotenuse;

gettimeofday(&startTime, NULL);
for(i=0;i<=TOTAL_BULLETS;i++)
{
for(j=0;j<=TOTAL_BULLETS;j++)
{
x=(double) i / (double) TOTAL_BULLETS;
y=(double)j/(double) TOTAL_BULLETS;

hypotenuse = sqrt(x*x + y*y);

if (hypotenuse <= 1.0f)
counter_in+=1.0f;
}
}

printf(“%f bullets hit the circle\n”,counter_in);
almost_pi=((double)counter_in/(double)TOTAL_BULLETS/(double)TOTAL_BULLETS*4.0f);
printf(“Estimated value of pi is %f\n”,almost_pi);

gettimeofday(&endTime, NULL);
double tS = startTime.tv_sec*1000000 + (startTime.tv_usec);
double tE = endTime.tv_sec*1000000 + (endTime.tv_usec);
double duration = tE-tS;
printf(“Time : %f\n”,duration);
return EXIT_SUCCESS;
}
[/cc]

And the result of my little experiment is actually pretty decent:

[cc lang=”bash]
$ gcc -lm shotgun.c -O3 -o shotgun
$ ./shotgun
7854081365.000000 bullets hit the circle
Estimated value of pi is 3.141633
Time : 455893907.000000
[/cc]

The CPU use during that time was below 50%, as only one core of my dual core machine was in use by my program.

[cc lang=”bash”]
top – 11:14:35 up 2:58, 4 users, load average: 0.58, 1.36, 1.53
Tasks: 112 total, 2 running, 110 sleeping, 0 stopped, 0 zombie
Cpu(s): 50.0%us, 0.2%sy, 0.0%ni, 49.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2058320k total, 540940k used, 1517380k free, 58280k buffers
Swap: 4095992k total, 0k used, 4095992k free, 337472k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4460 txukie 25 0 6244 328 256 R 100.2 0.0 0:26.12 shotgun
409 root 10 -5 0 0 0 S 0.3 0.0 0:00.90 scsi_eh_0
1 root 18 0 10372 684 572 S 0.0 0.0 0:00.51 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
8 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 events/0
[/cc]

As seen on the previous example, running this program on an old Core 2 Duo E6400 machine running CentOS 5.6 takes 455 seconds. Now, if I adapt the code with OpenMP it should take a lot less time. Just a few lines of code are necessary to accomplish this task.

[cc lang=”c”]
#include
#include
#include
#include
#include

#define TOTAL_BULLETS 100000

int main(int argc, char *argv[])
{
double almost_pi;
int i,j;
double counter_in=0.0f;
struct timeval startTime;
struct timeval endTime;
double x, y, hypotenuse;

gettimeofday(&startTime, NULL);

#pragma omp parallel for private(i) reduction(+: counter_in)
for(i=0;i<=TOTAL_BULLETS;i++)
{
#pragma omp parallel for private(j,x,y,hypotenuse)
for(j=0;j<=TOTAL_BULLETS;j++)
{
x=(double) i / (double) TOTAL_BULLETS;
y=(double)j/(double) TOTAL_BULLETS;

hypotenuse = sqrt(x*x + y*y);

if (hypotenuse <= 1.0f)
{
#pragma omp atomic
counter_in++;
}
}
}

printf(“%f bullets hit the circle\n”,counter_in);
almost_pi=((double)counter_in/(double)TOTAL_BULLETS/(double)TOTAL_BULLETS*4.0f);
printf(“Estimated value of pi is %f\n”,almost_pi);

gettimeofday(&endTime, NULL);
double tS = startTime.tv_sec*1000000 + (startTime.tv_usec);
double tE = endTime.tv_sec*1000000 + (endTime.tv_usec);
double duration = tE-tS;
printf(“Time : %f\n”,duration);
return EXIT_SUCCESS;
}
[/cc]

The result is as follows:

[cc lang=”bash”]
$ gcc -lm shotgun.c -fopenmp -O3 -o shotgun_omp
$ ./shotgun_omp
7854081365.000000 bullets hit the circle
Estimated value of pi is 3.141633
Time : 385923213.000000
[/cc]

Now the result is a lot better, as the time taken to complete is 385 seconds, that’s a reduction of almost 70 seconds. My computer’s cores are used fully now, as indicated by top:

[cc lang=”bash”]
top – 10:54:51 up 2:39, 3 users, load average: 1.99, 1.90, 1.25
Tasks: 111 total, 2 running, 109 sleeping, 0 stopped, 0 zombie
Cpu(s): 84.3%us, 15.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2058320k total, 538088k used, 1520232k free, 56672k buffers
Swap: 4095992k total, 0k used, 4095992k free, 337468k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4361 txukie 25 0 22964 560 444 R 199.7 0.0 31:37.46 shotgun_omp
3432 txukie 15 0 77244 1588 600 S 0.3 0.1 0:00.05 screen
1 root 18 0 10372 684 572 S 0.0 0.0 0:00.51 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
[/cc]

A full specification on OpenMP 3.0, and many more resources, can be found here.