
**This is an old revision of the document!** ----


====== Bacula backup server on Debian Lenny, with remote SQL server ======

This node is a REALLY REALLY incomplete scratch-space for my bacula-related node...

===== What is Bacula? =====

First of all, if you are reading this, I hope you have at least a minimal knowledge of what Bacula is. As in, at least you know that it is a system for backup, recovery and verification of computer data. Hopefully, you also know that it is a scalable, enterprise-ready solution, and you are prepared for that.

As with everything else that gets labeled 'enterprise', and even 'scalable', Bacula is a system that is split into several parts, and is highly configurable. This gives great flexibility, at the cost of being rather complex to set up compared to smaller, simpler systems.

If you are looking to back up your workstation, and only that, bacula is probably not for you. The same is probably true if you are looking at doing backups for a small set of computers; say two to four. On the other hand, if you are planning on doing backups for a greater number of systems, across operating systems, and/or require dependable backup volume control, bacula is probably very well suited.

If you are coming from a commercial Enterprise backup solution, you may be surprised (hopefully pleasantly) to see that setup of Schedules, Clients, Jobs and the like is done in text-based configuration files, rather than a point-and-click GUI (or cryptic command line console).

===== Bacula components =====

As mentioned, Bacula is split into several parts. The following figure tries to show the central components, with the arrows describing the direction in which command and data flows are initiated.

{{:guides:bacup:bacula-components.png?450|}}

==== Director ====

Central to a bacula installation is the Director. Simply put, the Director is the Bacula server itself, the central component that implements scheduled tasks, controls the running of backups and restores, and handles messages and reporting.
==== Console ====

Console applications exist in a variety of flavours for Bacula. Common for them is that they allow administrators to communicate with the Director (and other components via the Director) to show status, list information, manipulate storage pools, run jobs et cetera. Unlike many other Enterprise solutions, the management console is only used for management, reporting and maintenance, and not for configuration.

Configuration of Bacula components is done in configuration files, while the Console allows the administrator to operate and manage the dynamic environment that results from the 'static' base configuration. So, as an example, the definition of a Client and a Job for that Client is done in configuration files, and the resulting data stored when the Job is run is managed using the Console. Another example: a Storage and a Pool get defined in configuration files, whereas the Volumes assigned to the Pool and Storage are managed through the Console.

==== File Server ====

The Bacula File Server is also known as the File Daemon and the Client program. This is the software installed on the machine to be backed up. The File Server is responsible for receiving commands from the Director, and for sending data for backup to a Storage, or receiving data from a restore. The File Server handles reading and writing data from/to the file systems on the machine, along with file and security attributes associated with the data.

The File Server is not responsible for defining what data is to be backed up. This is part of the configuration done at the Director.

==== Storage ====

As the name should imply, a Storage Server (or Daemon) handles storage of backup data on storage systems. A single Storage Server may be used by multiple Directors, and a single Director may use multiple Storage Servers, allowing for a very flexible and scalable solution.
Storage Servers use actual storage directly, either as file-based storage in the filesystems available at the host running it, or as storage devices available at the host. File, tape, WORM, FIFOs and CD/DVD-ROM are supported as storage types, and a large variety of autochanger systems are supported, especially for tape-based storage. A single Storage Server can serve multiple Storage devices/definitions to the Directors and File Servers communicating with it.

Since Bacula communication with the Storage Server is done using TCP/IP, this component can exist on any host accessible to the Director and File Servers, including the same host.

==== Catalog ====

The Catalog component is not a separate program in and of itself, but it is a central concept. As with all large-scale backup solutions, a large amount of meta- and index-data gets generated by Bacula. Dynamic data generated by an operating Bacula system needs to be stored. Examples are the index of files for a backup that has been run, and the state of the media pools. Bacula saves this information in the Catalog.

The Catalog gets stored in a relational (SQL) database. Three different storage backends are available: SQLite, MySQL and PostgreSQL. The database may be stored locally, on the same host as the Director, or it may be stored on a remote database server (mysql/pgsql). In this guide, a remote MySQL server will be used.

===== About this guide =====

The goal of this document is to end up with a system consisting of:

  * One Director server
  * One Catalog for the Director stored in a remote MySQL database
  * One Storage Server running on the same host as the Director
  * Two storage Devices:
    * A File-based storage
    * An autochanger, using [[mhvtl]]
  * Three Clients
    * The localhost / director server
    * A remote Debian system
    * A remote Windows system
  * A separation of (at least) Client-related configuration into smaller files.
And since I called this section "About this guide", this should be the location where I say that I take no responsibility whatsoever for any results you may experience if trying to implement a Bacula system when/after reading this.

===== Package installation on our Director server =====

The version of bacula in the standard repositories for Debian Lenny is **really** old, 2.4.4-1, compared to the latest stable, 5.0.3. I will be using Lenny Backports to get a more recent version, 5.0.2-2...

I chose to go with MySQL in this setup. I would prefer going with pgsql, but as I wanted to focus on bacula, not database administration, I took the "easy way out", seeing that I am more experienced as a MySQL DBA...

First two basic tools:

<code>
apt-get -y install \
    libterm-readkey-perl \
    psmisc
</code>

Next, we'll add Lenny Backports to our APT sources, to be able to get a more recent version of bacula.

<code>
echo -e "\n\ndeb http://backports.debian.org/debian-backports lenny-backports main" >> /etc/apt/sources.list
apt-get update
</code>

I also want future updates to be pulled from backports, to get security fixes and the like. So I added the following to /etc/apt/preferences

<code>
Package: *
Pin: release a=lenny-backports
Pin-Priority: 200
</code>

Since I do not want the MySQL server to be installed on the server running Bacula Director, and I want to avoid pulling in too many "unneeded" packages, the option "--no-install-recommends" is added to the following command. If you prefer to use aptitude, replace this option with "--without-recommends". Note that the lenny-backports version of the packages will be pulled in, thanks to the "-t lenny-backports" option. This option is identical for apt-get and aptitude.
<code>
apt-get \
    -t lenny-backports \
    --no-install-recommends \
    install \
    bacula-director-mysql \
    bacula-console \
    bacula-doc bacula-fd \
    bacula-sd \
    bacula-sd-mysql
</code>

Since I already have a fairly well performing and maintained database server, and I do not like the concept of "a new DB server for each app", the setup will be using my existing database server. Unfortunately, dbconfig-common does not support configuration of remote SQL servers for bacula. So when debconf asks this:

<code>
Configure database for bacula-director-mysql with dbconfig-common?
</code>

... the answer is **No**

This leads to a different problem: the installation "fails", because the post-install script for bacula-director-mysql fails. This (should) be solved by doing a bit of configuration, and then coming back later to fix this with "apt-get -f install"((I have been notified that the database setup even fails if you are installing mysql-server as a dependency for local database use... This is because the mysql-server is not yet running when debconf tries to use dbconfig-common...)).

==== Database seeding ====

Before we can start setting up bacula, we need to set up our database. Because I chose to use a remote SQL server, and dbconfig-common is braindead and does not understand that concept, the database will have to be created and seeded with tables manually. This is strictly speaking a part of Configuration, but it is so important for the elementary setup that I will call it a part of installation.

Unfortunately, the packages ship only with a shell script to seed the tables, assuming that the database will be installed locally. Fortunately, the script is basically an SQL script wrapped with shell commands. So, let's use that!
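The conversion from shell wrapper to plain SQL can also be scripted. The following is a minimal sketch, not a tested replacement for the packaged script: it assumes the wrapper looks as described in this section (a ''USE ${db_name};'' line starting the SQL, and an ''END-OF-DATA'' line ending it). The function name ''trim_bacula_sql'' and its database-name argument are made up for this illustration.

```shell
#!/bin/sh
# Turn the make_mysql_tables shell wrapper into a plain SQL script.
# Reads the wrapper on stdin and writes SQL to stdout.
# $1 is the database name that replaces the ${db_name} placeholder.
# Assumption: the SQL part starts at the "USE ${db_name};" line and
# ends just before a line starting with "END-OF-DATA".
trim_bacula_sql() {
    db=$1
    # Optional: make InnoDB the default engine for the tables created below.
    echo "SET storage_engine=INNODB;"
    # 1) keep everything from the USE line onwards
    # 2) drop the END-OF-DATA line and everything after it
    # 3) replace the placeholder with the real database name
    sed -n '/USE \${db_name};/,$p' \
        | sed '/^END-OF-DATA/,$d' \
        | sed 's/\${db_name}/'"$db"'/'
}

# Example (paths as used in this guide):
#   trim_bacula_sql bacula_db < /usr/share/bacula-director/make_mysql_tables > ~/bacula_init.sql
```

Check the resulting file by eye before loading it. The same result can be reached by hand, as follows.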
<code>
cp /usr/share/bacula-director/make_mysql_tables ~/bacula_init.sql
vim ~/bacula_init.sql
</code>

Remove the top lines leading up to, but not including, the line:

<code>
USE ${db_name};
</code>

On that line, replace '${db_name}' with the actual database name that you'll be using. If you, like me, prefer to use InnoDB rather than MyISAM, add the following as the absolute top line of the file:

<code>
SET storage_engine=INNODB;
</code>

Next, go to the bottom of the file, and remove everything from (and including) the following line and down:

<code>
END-OF-DATA
</code>

The lines you remove should be the mentioned one, plus some "then - echo - else - echo - fi - exit" ...

So, now that we have an SQL script to use, create the actual database on the database server, and grant fairly open permissions on the database to a user created for bacula. The following is not good practice, but it will get the job done. If you want more precise control, please apply it when adding the grant, but also remember that you can easily modify the grant later on...

<code>
ssh databaseserver
mysql -u root -p mysql
</code>

<code>
create database bacula_db;
grant all privileges on bacula_db.* to 'bacula'@'backupserver' identified by 'password';
</code>

Now that the database is created, and a user for bacula is created and granted permissions to use the database, it is time to fill the database. Get back to our bacula server, and load the SQL script.

<code>
mysql -h databaseserver -u bacula -p < ~/bacula_init.sql
</code>

===== Configuration =====

Since Bacula is separated into different components that can live completely separately, the configuration of these components is split into respective configuration files. Needless to say, these configuration files will relate to each other, enabling communication between the components.
Here is an attempt at visualizing the relations:

{{:guides:bacup:bacula-config-relations.png?500|}}

The two most central configuration files, in my "backupserver"-oriented view, are the Storage Daemon config, bacula-sd.conf, and the Director config, bacula-dir.conf. I started by getting to know the bacula-dir.conf file, and then started working with the configuration by setting up my Storage Daemon, so that I had my storage devices available.

Before we dive into the configuration of Bacula, we should get an overview of what the Director configuration file contains, how it is sectioned, and how the sections relate to each other (and the surrounding world).

{{:guides:bacup:bacula-dir-sections.png?500|}}

I won't describe the sections in text here, so take some time examining the above figures until you feel you have a grasp on how the files and sections are related.

:!: **Note** In the following configuration, note the following:

  * The hostname of the server hosting my Director and Storage Daemon is __bactank__
  * The hostname of my database server is simply __database__
  * The hostname of my Linux client is __linuxclient__
  * The hostname of my Windows client is __windows__
  * I will be using one FileSet for each client
  * On the host __bactank__ I will be excluding /opt completely, and store VTL files and File-based "volumes" under that directory.

==== Storage Daemon ====

My goals for the Storage Daemon are, as stated, to run it on the same host as the Director, and to provide two types of Storage through it: a File-based storage, and an mhvtl autochanger/virtual tape library. Before progressing, you may want to take a quick look at my [[mhvtl]] description/guide, to get familiar with how the virtual library is represented as SCSI devices.

A bit of work has been done for us by the Debian packages, so the configuration file for the SD is already prepared for communication from the Director. Most importantly, this means that proper password relationships have been set up.
But, in my opinion, a lot of unneeded stuff is in there as well. I started with the file from the Debian package, stripped away all that I did not want, and added what I needed.

The configuration file of the Storage server is ''/etc/bacula/bacula-sd.conf''. It should start with defining the properties of the Storage Daemon itself:

<code>
Storage {
  Name = bactank-sd;
  SDPort = 9103;
  WorkingDirectory = "/var/lib/bacula";
  Pid Directory = "/var/run/bacula";
  Maximum Concurrent Jobs = 20;
}
</code>

Here we assign the Storage Daemon a name, and tell it to listen on any interface/address, port 9103. We tell it to use /var/lib/bacula as a scratch/workspace, and finally that we do not want more than 20 concurrent jobs using this Daemon.

Next, we need to set up a definition that allows the Director to control the Storage:

<code>
Director {
  Name = bactank-dir;
  Password = "random-generated-password-identical-to-director-conf";
}
</code>

The Name needs to be identical to the Name that we will assign to our Director instance, and it gets auto-filled by the Debian packages as ''hostname-dir''. The Password will be auto-generated by the Debian packages, and needs to be identical to the ''Password ='' statement in the Storage section of the Director config. The Debian configuration will also include a Director section for monitoring. Leave this in. I will not comment on that part further, other than saying that more than one system may control a Storage Daemon, through configured Director sections.

I wanted a File-based backup resource. I will not really use this anywhere, but I am including it to show how to set one up.

<code>
Device {
  Name = FileStorage;
  Media Type = File;
  Archive Device = /opt/bacula-filestore;
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
</code>

No files will be created at this location before bacula actually uses this resource to create a volume and stores data to it.
Also, according to my understanding, a new file (volume) will be created for each Job((I have not yet tested File-based storage, so I will probably come back and update this.)). I do not specify any sizes, allowing auto-labelling and Volume Management to create and close the files as needed.

Next up, I add the four tape drives presented by [[mhvtl]]. I will simply list one of them, as the rest are identical with the exception of the Name and Device:

<code>
Device {
  Name = Drive-1;                  # Will be referenced as device name by Autochanger later.
  Drive Index = 0                  # Index as reported by the changer, and as used by bacula
  Media Type = LTO-4;              # Description of type of media.
  Archive Device = /dev/nst0;      # Non-rewinding SCSI device
  AutomaticMount = yes;            # when device opened, read it
  AlwaysOpen = yes;                # Keep the device open until explicitly unmounted/released
  RemovableMedia = yes;            # Well, duh ;)
  RandomAccess = no;               # Tapes are by nature sequential
  AutoChanger = yes;               # This device is part of an autochanger
  Hardware End of Medium = No;     # Needed for proper operation on mhvtl
  Fast Forward Space File = No;    # Needed for proper operation on mhvtl
  # Heed the warnings in the distribution file about tapeinfo and smartctl.
  Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
}
</code>

:!: Note that the Media Type is descriptive, not technology based. Also note that if you have multiple changers with the same kind of media, but that do not share the media, you will need to make the Media Type different between the changers, or else Bacula may try to load a tape that belongs to one changer into the other... Adding a prefix or suffix to the Media Type will make them "different" in the Media index, making sure this does not happen.

Since my "autochanger" has four drives, this needs to be repeated for all four of them.
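Rather than copying the block by hand, the four Device definitions can be generated. This is only a sketch under assumptions: it takes for granted that drive N of the changer has Drive Index N-1 and shows up as ''/dev/nst(N-1)'', which may not match your system (check the [[mhvtl]] device listing first). The function name ''gen_drive_devices'' is made up for this example.

```shell
#!/bin/sh
# Print one bacula-sd.conf Device block per tape drive.
# $1 is the number of drives in the changer.
# Assumption: drive N (counting from 1) has Drive Index N-1 and is
# reachable as the non-rewinding device /dev/nst(N-1).
gen_drive_devices() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        cat <<EOF
Device {
  Name = Drive-$((i + 1));
  Drive Index = $i;
  Media Type = LTO-4;
  Archive Device = /dev/nst$i;
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  Hardware End of Medium = No;
  Fast Forward Space File = No;
  Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
}
EOF
        i=$((i + 1))
    done
}

# Print the blocks for a four-drive changer; review before pasting
# them into /etc/bacula/bacula-sd.conf.
gen_drive_devices 4
```

The output is meant to be reviewed and pasted into the Storage Daemon configuration, not appended blindly.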
When that is done, we get to the Autochanger itself:

<code>
Autochanger {
  Name = MHVTL;
  Changer Device = /dev/sg4;
  Device = Drive-1;
  Device = Drive-2;
  Device = Drive-3;
  Device = Drive-4;
  Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d";
}
</code>

Simple enough: this gives the autochanger a name that can be referenced by the Director, states which "physical" device the changer is, lists the Device definitions that make up the attached drives of the changer, and finally gives the command to run for controlling it.

Finally, we define that all messages generated by the Storage Daemon should be sent to the Director for processing/filtering/delivery:

<code>
Messages {
  Name = Standard;
  director = bactank-dir = all;
}
</code>

==== File Daemon for localhost ====

To be able to flesh out as much as possible of the Director config, I first want to have the File Server/Daemon used to make backups of the backup server itself defined before starting the Director. So, let's attack ''/etc/bacula/bacula-fd.conf'' on ''bactank''.

Here I start out with the definition of the Directors that are permitted to control the client. Just like in the ''bacula-sd.conf'' file, there is a Monitor definition you can leave in, and focus on the actual Director instead:

<code>
Director {
  Name = bactank-dir
  Password = "random-generated-password-identical-to-director-conf"
}
</code>

Again, the Name statement needs to match the Name given to the Director, and it still defaults to ''hostname-dir''. The password will be auto-generated by the Debian packages, and needs to match the Password definition in a relevant Client section of the Director configuration.

Next is the definition of the FileDaemon itself:

<code>
FileDaemon {
  Name = bactank.example.com;           # Name used in Director client config
  FDport = 9102;                        # where we listen for the director
  #FDAddress = 127.0.0.1;               # If you want to close down to a single address.
  WorkingDirectory = /var/lib/bacula;   # Where to "scratch and temp"
  Pid Directory = /var/run/bacula;
  Maximum Concurrent Jobs = 20;
}
</code>

The Name can be anything you want, but must be identical in the Director's Client definition, and it will be used for associating data generated from this File Server with metadata in the Catalog. The default will be ''hostname-fd'', but I prefer more verbose naming. Note that the name must differ from the Name statement of your Storage Daemon and your Director, when these run on the same host.

I have commented out the FDAddress statement, telling the FD to listen on any interface. This may be against your security policy, so feel free to lock it down to either the loopback address or a specific IP on the host.

As with the Storage Daemon, we close off ''bacula-fd.conf'' with a definition of where to send Messages:

<code>
Messages {
  Name = Standard
  director = bactank-dir = all, !skipped, !restored
}
</code>

This is a little more precise than that of the Storage. Here we say that all messages, except those related to skipped files and restored files, should be sent to the director ''bactank-dir''. In this context, //skipped// means files not included in the backup because they were configured to be excluded, or skipped because they were not changed when doing an incremental or differential backup.

==== Director ====

Now we come to the longest configuration file yet, the Director configuration, ''/etc/bacula/bacula-dir.conf''. I could not understand the organization of the Debian-packaged version, so I have re-organized the file to better reflect the relations of the sections to each other.

=== Director itself ===

We start off with the natural top-most section, the definition of the Director itself. As you should have noticed in the above sections, a consistent Name for the Director is important: it is used not only to identify the Director, but also as a part of the Authentication-Authorization in communication between components.
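One way to check that consistency is to list the Director names found in the Storage Daemon and File Daemon configuration files and compare them. The following is a rough sketch, assuming the plain ''Name = value'' layout used in this guide (no quoted names) and at most one resource per closing brace; the function name ''director_names'' is made up for this example.

```shell
#!/bin/sh
# List the Name of every Director resource in the given Bacula config files.
# Assumption: names are written as "Name = value" without quotes, as in
# the examples in this guide. Nested braces are not handled.
director_names() {
    awk '
        /^[ \t]*Director[ \t]*\{/ { in_director = 1 }
        in_director && /Name[ \t]*=/ {
            value = $0
            sub(/^.*Name[ \t]*=[ \t]*/, "", value)  # keep what follows "Name ="
            sub(/[ \t;#].*$/, "", value)            # cut at ";", "#" or whitespace
            print FILENAME ": " value
        }
        /\}/ { in_director = 0 }
    ' "$@"
}

# Example:
#   director_names /etc/bacula/bacula-sd.conf /etc/bacula/bacula-fd.conf
```

With the Debian defaults, each file should list the same ''hostname-dir'' entry, plus the monitoring Director that the packages add.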
<code>
Director {
  Name = bactank-dir;
  DIRport = 9101;                       # where we listen for UA connections
  # DirAddress = xxx.yyy.zzzz.www;      # IP address to listen on, if needed
  QueryFile = "/etc/bacula/scripts/query.sql";
  WorkingDirectory = "/var/lib/bacula";
  PidDirectory = "/var/run/bacula";
  Maximum Concurrent Jobs = 1;
  # Console password
  Password = "random-generated-password-used-by-console-connections";
  Messages = Daemon;
}
</code>

If you want the Director to only be available for Console applications on a given IP address, or even only from ''localhost'', use DIRAddress to lock this down.

The Messages directive references a Name given to a Messages section in the same file. We'll get back to this one, but note that this differs from how we wrote Messages sections in the other files. In the other files, the Messages section typically describes which Director should receive which messages. On the Director, we'll be using Messages sections to actually do something with those messages.

=== Catalog ===

The Catalog is so central to the Director that I put this section next:

<code>
Catalog {
  Name = StandardCatalog;               # One director may have multiple Catalogs.
  DB Address = database.example.com;    # What server to use, and
  DB Port = 3306;                       # What port to connect to.
  dbname = bacula;                      # The name of the SQL database to use
  user = bacula;                        # The username used when connecting
  password = "db_password";             # The password of the database user
}
</code>

Hopefully that was relatively self-explanatory. The most common setup is to use a single Catalog with a single Director. If your setup is LARGE, or you think you need separate Catalog instances for some other reason, please consult the official documentation. But before you go: it really is as simple as defining more blocks like the one above.

=== Messages ===

Messages are a fairly "used-by-all" element, so I put two sections defining two different behaviours next.
First, the message delivery for the Daemon/Director, then the Standard resource that will be used for all other Messages:

<code>
Messages {
  Name = Daemon;
  mailcommand = "/usr/lib/bacula/bsmtp -h localhost -f \"\(Bacula\) \<bacula@bactank.example.com\>\" -s \"Bacula daemon message\" %r"
  mail = operator@example.com = all, !skipped
  console = all, !skipped, !saved
  append = "/var/lib/bacula/log" = all, !skipped
}

Messages {
  Name = Standard
  mailcommand = "/usr/lib/bacula/bsmtp -h localhost -f \"\(Bacula\) \<bacula@bactank.example.com\>\" -s \"Bacula: %t %e of %c %l\" %r"
  operatorcommand = "/usr/lib/bacula/bsmtp -h localhost -f \"\(Bacula\) \<bacula@bactank.example.com\>\" -s \"Bacula: Intervention needed for %j\" %r"
  mail = operator@example.com = all, !skipped
  operator = operator@example.com = mount
  console = all, !skipped, !saved
  append = "/var/lib/bacula/log" = all, !skipped
}
</code>

Things to note in the above, when compared to the default configuration:

  * I specify a different __from-address__ than the recipient
  * I have changed the recipient to a more sane value
  * I prefer to have the bacula Daemon and Standard log in separate files.

Other than that, the above is fairly stock. Make sure you read the rationale for replacing the from-address in the "NOTE!" block of the default configuration.

=== Storage ===

Our Storage servers and devices should follow next.

<code>
# Definition of file storage device
Storage {
  Name = File;           # Name to use when referencing this storage in -dir.conf
  Address = bactank;     # N.B. Use a fully qualified name here
  SDPort = 9103;         # Port the Storage Daemon listens on.
  Password = "random-generated-password-identical-to-sd-conf";
  Device = FileStorage;  # must be same as Device in Storage daemon
  Media Type = File;
}

# Definition of mhvtl autochanger
Storage {
  Name = MHVTL;          # Name to use when referencing this storage in -dir.conf
  Address = bactank;     # N.B. Use a fully qualified name here
  SDPort = 9103;         # Port the Storage Daemon listens on.
  Password = "random-generated-password-identical-to-sd-conf";
  Device = MHVTL;        # must be same as Device in Storage daemon
  Media Type = LTO-4;    # must be same as MediaType in Storage daemon
  Autochanger = yes;     # enable for autochanger device
}
</code>

The comments say "N.B. Use a fully qualified name here", but in reality, anything that resolves to the IP that the Storage Daemon listens on can be used.

Note that the Storage definition for the autochanger references the Autochanger as Device, not the individual tape drives. It is the responsibility of the Storage Daemon to represent the changer correctly. Also, remember the notes about Media Type when we configured the Storage Daemon.

=== Pools ===

A Pool is a collection of media/volumes, and it is natural to define these once we have the Storages defined.

Bacula supports a quite "magical" pool: if a pool exists with the name Scratch, then Empty and Recyclable volumes in that pool, of the correct Media Type for a given Job, will be automatically moved to a Pool that needs additional tapes when Jobs are run. This means we can start by adding all our volumes to the Scratch Pool, and these tapes will be allocated as needed. By also adding the directive "RecyclePool = Scratch", volumes will be returned to this pool as soon as they get marked as Recyclable and subsequently Purged.

<code>
# Default pool definition
Pool {
  Name = Default;                # The name to reference this Pool
  Storage = MHVTL;               # A pool uses a single Storage.
  Pool Type = Backup;            # Currently supported: Backup..
  Recycle = yes;                 # Bacula can automatically recycle Volumes
  AutoPrune = yes                # Prune expired volumes
  Volume Retention = 4 months    # 1/3 year
  RecyclePool = Scratch          # Move to this pool when marked Recyclable
  Cleaning Prefix = CLN          # If cleaning tapes are available, they have this pfx.
}

Pool {
  Name = Monthly;                # The name to reference this Pool
  Storage = MHVTL;               # A pool uses a single Storage.
  Pool Type = Backup;            # Currently supported: Backup..
  Recycle = yes;                 # Bacula can automatically recycle Volumes
  AutoPrune = yes                # Prune expired volumes
  Volume Retention = 24 months   # 2 years
  RecyclePool = Scratch          # Move to this pool when marked Recyclable
  Cleaning Prefix = CLN          # If cleaning tapes are available, they have this pfx.
}

# Scratch pool definition
Pool {
  Name = Scratch
  Storage = MHVTL
  RecyclePool = Scratch
  Pool Type = Backup
}
</code>

Volume Retention needs to be set fairly high, at least higher than any File or Job retention set in later sections. The retention periods define how long data about a given Volume/Job/File is to be kept in the Catalog, and as such, how much time will pass before a Volume expires... Please look at the [[http://www.bacula.org/5.0.x-manuals/en/main/main/Catalog_Maintenance.html#SECTION004510000000000000000|Setting Retention Periods]] section of the Bacula manual for an explanation.

=== Schedules ===

I am a fan of using as few different Schedules in a backup solution as possible. Thus I define the absolutely needed Schedules as early on as possible as well.

<code>
# When to do the backups
# Do a full dump every sunday
# Take a Differential (what changed since last Full) on Wednesdays
# Take increments (what changed since last backup) the rest of the week.
Schedule {
  Name = "WeeklyCycle"
  Run = Level=Full sun at 2:05
  Run = Level=Differential wed at 2:05
  Run = Level=Incremental mon-tue at 2:05
  Run = Level=Incremental thu-sat at 2:05
}

# In the Monthly Cycle the Pool gets overridden to use the Pool with a
# much longer Volume retention period.
# Every first sunday of the month, make a full backup, then do
# Differential backup on Sunday the rest of the month.
Schedule {
  Name = "MonthlyCycle"
  Run = Level=Full Pool=Monthly 1st sun at 3:05
  Run = Level=Differential Pool=Monthly 2nd-5th sun at 3:05
}
</code>

===== Tapes/volumes in the Media database / pools from an autochanger =====

<code>
label pool=Scratch slots=1-22 barcodes
</code>

===== Links, references, scratch =====

  * http://www.bacula.org/en/?page=downloads
  * http://bacula.org/5.0.x-manuals/en/main/main/Bacula_Main_Reference.html
  * http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/bacula-5-0-3-backport-for-debian-lenny-107920/
  * http://www.crazysquirrel.com/computing/debian/backup/bacula-on-debian.jspx
  * http://panyasan.wordpress.com/2008/03/02/using-bacula-for-a-distributed-backup-system-debian-etch/
  * http://edin.no-ip.com/content/bacula-debian-sid-mini-howto
  * http://www.bacula.org/manuals/en/catalog/catalog/Installi_Configur_PostgreS.html
  * http://wiki.bacula.org/doku.php?id=sample_configs
  * http://www.bacula.org/5.0.x-manuals/en/main/main/Configuring_Director.html
  * http://lucasmanual.com/mywiki/Bacula
  * http://www.backupcentral.com/phpBB2/two-way-mirrors-of-external-mailing-lists-3/bacula-25/bacula-is-not-recycling-pruning-purging-automatically-95208/
  * http://www.bacula.org/3.0.x-manuals/en/console/console/Bacula_Console.html
  * http://sites.google.com/site/linuxvtl2/
  * http://backports.debian.org/Instructions/