A computer program is a sequence of commands that the CPU executes one after another. Each command is usually very simple (an addition, a multiplication, reading data from RAM, and so on), but combined they can perform much more complicated tasks. A typical program consists of thousands to millions of such simple commands.
The most basic level of programming is writing machine code (the individual commands understood by the CPU) directly. This is very difficult, because a modern CPU typically has hundreds of different commands, each with its own behavior and peculiarities. The main problem, however, is not even the difficulty: different CPU architectures have different command sets with different strengths and weaknesses, and writing machine code requires deep knowledge of how the hardware components and the operating system work.
On the other hand, machine code is the only thing a CPU understands, so every program has to be translated into machine code before the CPU can run it. The only question is whether we do that by hand or let a program do it.
Writing machine code means writing binary numbers directly, not text. This is very tedious, partly because keyboards are designed for typing text, but mainly because humans are much more used to thinking in text than in binary. It was once common (because it was the only option), but today it is almost forgotten.
Assembler (or assembly language) is essentially machine code in text form. Instead of
having to remember the number of each command, you use names (mnemonics) for the
different commands. Of course each CPU architecture still has different commands, and
you need all the same knowledge as for writing machine code
directly. Assembler is still used to write high-speed code,
the parts of an operating system kernel that access the hardware
directly, and software for embedded systems. As assembler is plain text, it
cannot be executed by the CPU; it must first be translated into
machine code. This is done by a program that is also called an ``assembler''.
To give you an idea, assembler looks more or less like this:
    pushl %ebp
    movl %esp,%ebp
    subl $20,%esp
    pushl %ebx
    movl $2,-4(%ebp)
    movl -4(%ebp),%eax
    leal 1(%eax),%edx
Assemblers generally offer a few extra features (such as symbolic labels and macros) to make programming a bit easier, but it is still very difficult. It is definitely a very bad idea to try (and fail) to learn programming with assembler.
Both of these approaches require a lot of skill and share a big disadvantage: porting a program to another CPU architecture, or even to another operating system, means rewriting it almost completely.
Then there are the high-level languages. There are literally hundreds if not thousands of them, and it is impossible to describe them all. Most of them share a similar approach, though: they are special languages resembling English (or other spoken languages, although English clearly predominates) that can be translated into machine code which does what you asked for. For example, in BASIC the command

    PRINT "Hello"

will print Hello on the screen. This is of course much easier to remember than a sequence of instructions like the assembler code shown above.
There are a few things to note about programming languages (and computers in general), though. A computer sees everything as ones and zeroes, not as ``more or less''. It will therefore do exactly what you tell it, not what you meant, and it will not accept anything that contains mistakes. For example, it will not execute

    PRINT] "Hello"

even if it is obvious that the ] was only a typo. Computers are not forgiving at all: if a command like PRINT] did exist, the computer would execute it, even if that was obviously not what you wanted.
But high-level programming languages are not only much easier to use than assembler or machine code; they have other advantages as well. Because they do not rely on a particular CPU architecture but use a more general approach (for example the sign ``+'' instead of the ``add'' instruction of a particular CPU), they are much more portable: the same program can be used on different platforms with few or no changes.
This still leaves us with one problem. How is a program written in some programming language (called the ``source code'') translated into machine code for a certain platform? There are two main solutions, both with their own advantages and disadvantages.
An interpreter is a program which reads source code, and
executes the commands in it while it is reading it. There is no real
translation in traditional interpreters; basically the interpreter
knows what to do when it finds a certain command. For example, when a
BASIC interpreter finds a PRINT command it knows that it must
print the text which follows it on the screen, and so on.
This approach has the following main advantages:
But it also has several disadvantages, which is why purely interpreted languages are less widely used than they once were:
Errors such as typing PRINT] instead of PRINT may be detected only after a program has shipped,
because they are found only if the program actually reaches that point.
Nowadays, interpreted languages are used mainly where the same program needs to run on a wide variety of platforms. Examples include BASIC, JavaScript and some others. JavaScript, for example, has to run inside web browsers on very different platforms.
Another application of interpreted languages is scripting, which many advanced applications support. (If you don't know what scripting is, don't worry; you don't need it here.)
The other approach is to use a compiler. A compiler takes the source code and translates it into machine code, usually producing some kind of ``executable file'' (the ``.exe'' files under Windows, or some of the files with the ``executable'' flag set under Unix). The program can be run only after compilation has finished. Of course, if compilation fails, because there are errors in the source or for some other reason, no executable is produced, so there is nothing to run.
The main advantages of compiled languages are the following:
But, as for interpreted languages, there are also some drawbacks:
Compiled languages are used for most software applications, such as office suites, HTML editors or almost anything else you can think of. They are the usual choice for high-speed applications like 3D games or photo retouching programs. Two of the most popular compiled languages are C and C++.
There are also some languages that sit somewhere between the compiled and the interpreted ones. The most famous of them is Java, which is compiled into a kind of virtual machine code (called bytecode) that is then executed by a Java Virtual Machine (a special program). This way Java achieves considerable speed and high portability at the same time. It has other drawbacks, though.
Another noteworthy language is Perl. It is an interpreted language, but compilers for it exist as well. Moreover, the Perl interpreter itself has some characteristics of a compiler (it translates the whole program into an internal form before running it), so Perl does not fit neatly into either category.
From what I have told you so far, programming does not seem to be a very difficult task: you just need to know a programming language, and then you can start writing programs in it.
That is true, but there is more to it. First of all, learning a programming language is not as easy as you might think; and then there are other issues as well, like portability and speed.
The most interesting of these is portability. A program is ``portable'' if it can easily be changed to run on another platform. This is generally possible only if you do not rely on any platform's special features or architecture. For example, if you use a feature that only Unix has, your program is not portable because it won't run on Windows, and vice versa.
C++ is a compiled high-level language with features for both object-oriented and imperative programming. It includes almost all the functionality of C (another programming language). It is very widely used and very powerful, and good compilers are available for almost every platform.
Programs written in C++ are typically about as fast as equivalent programs written in C, but simpler to write. To give you an idea of what ``fast'' means: the Linux kernel, most of the programs by the Free Software Foundation, and many games by id Software (such as Doom, Doom II and Quake I, II and III) are written in C.
One very important aspect that distinguishes C++ from most other programming languages is that it is standardized. This means that there is an internationally accepted standard, ISO/IEC 14882 (first published in 1998 and revised several times since), that specifies the features of C++ and how they work. Since all modern compilers try to follow the standard as closely as possible, a program written in Standard C++ (without any compiler- or platform-specific extensions) will very likely compile with few or no changes on another platform and with another compiler. This makes portability much easier.
Before you start programming, there is some software you need. Fortunately you can get it all for free, but if you want you can also spend lots of money on it.
Of course you need a C++ compiler for your platform. There are many of them, but I'd recommend a recent version of GCC (its C++ compiler is called g++) if you are working under Unix, or a port of it for any other platform. (The Windows port of GCC is called MinGW.)
You also need to know how to invoke your compiler; please refer to your compiler's manual for that.
Then you need a good text editor to edit the source code. I'd recommend Emacs or Vim under Unix; on other platforms, just use any editor you like.