Data Separation: Ideal and easy with FileMaker by Pam Rotella
A popular selling point of FileMaker database software is that it's easy enough for non-programmers to create simple databases on their own. These databases are usually slightly better than a spreadsheet, and their programmers may have any job title -- salesman, secretary, or librarian. Junior I.T. staff can also learn FileMaker techniques quickly, meaning that minimal time and effort are invested in learning new database software.
But that's the point. Built with minimal programming knowledge, these homespun FileMaker databases, while functional, rarely follow any programming conventions or best practices. In my experience they more often follow no conventions at all, and are challenging for professional programmers to decipher later. One practice I've never seen in a user-created database is data/logic separation.
After a database goes "live", a method must be chosen for releasing programming modifications and updates. When changes are made offline without data separation, hours are spent (and paid for, by the client or employer) on imports that could have been handled within a few minutes had the data file(s) been separated from empty interface, logic, and presentation files, and upgraded by simple replacement of the empty files. The alternative to frequent imports in non-separated files is risking data to program changes made and tested in files containing "live" data.
With versions of FileMaker prior to 7, data separation wasn't a practical option. Beginning with version 7, the separation of data, interface, presentation, and logic files became easy: any data-containing FileMaker database can be added to the relationship graph of an empty FileMaker file, and the empty file treats the data file's tables much like native tables.
Adding a data file to another file's relationship graph takes only a couple of minutes at the outset of the design process. It does, however, create the need for a few design adjustments. For example, database users must have rights to the data file along with the empty interface files displaying its tables, they must understand how to navigate between files if necessary, and the menu and security system must be designed for a multiple-file solution. For the programmer, there is the added step of switching to the data file when a field addition or calculation adjustment is needed; screens to find or query data must be built into the same files that use that particular data table; and each database file's relationships can, and probably should, vary according to which relationships that file needs and uses. But these design changes are handled with quick adjustments to programming style, growing into an easy habit if used regularly. Extra time spent on good methodology is typically offset by time saved through better organization and fewer imports.
For FileMaker databases originally designed as a one-file system, conversion to a data separated group of files can be quick, depending on the complexity of the interface file(s). A database can be copied and renamed to a data file -- for example, a file called "Application" copied with its data intact and renamed "Application_data." (Always back up a solution completely before starting programming changes; preferably, make the changes in an offline copy.) Then the "Application" file without the "_data" suffix (now the main interface file) can be backed up and emptied of all or most data (or cloned if no existing data is needed for program function), and references to itself in its own relationship graph changed to point to the new "Application_data" file. Probably the hardest part of this process, if the files are complex, is determining whether the interface file's programming is impacted by data removal -- for example, certain values may need to remain in global fields, tables may need at least one record for scripting to run properly, and the like.
FileMaker developers who don't regularly use data/logic/interface separation (referred to as "data separation" in the remainder of this article) tell me that the practice is "debated" -- that, like everything else, it has its pros and cons.
Often I'm the person called to clean up FileMaker jobs gone bad, and frankly, I dread working on systems designed without data separation. Data separated solutions have rollouts that usually take only a few minutes and are less risky. To me, the amount of time wasted on data imports alone is justification to "fix" the all-in-one file design.
I've noticed that FileMaker development companies tend to use data separation, and that's probably because it makes upgrades and rollouts faster and easier. It's a strategy along the lines of data integrity and customer satisfaction. Would the client rather have a programmer sit around and babysit a data import all day, or work on new features? Is it safer for the data to remain in the same file as often as possible, or be transferred between files with each release?
With that said, we've all designed non-separated files, depending on factors such as the understanding of database users or administrators responsible for maintaining the final product. Some users are operating with a minimal skill level and would consider data separation impossibly complex. That doesn't mean that data separation is out of the question for such applications, but there are cases where the client may prefer a simpler design, despite any disadvantages it may cause later in the application's life cycle.
There are a few other situations where I agree that data separation isn't necessary. For example, at one of my recent jobs, a series of FileMaker applications developed by the company's accounting department served the purpose of custom reports not available from their main GL software. The FileMaker applications relied on external data imported at the time of each new report. (Relying on SQL tables dropped into the relationship graph wasn't desirable for reasons too involved to mention here, but may be the subject of a later paper.) For every report run, all previously imported data was deleted and then reimported from the company's main GL/accounting application, to ensure that only the most recent data were reflected in FileMaker-based reports. Data separation was unnecessary in those applications: data always came from an external source and was never stored beyond the immediate reporting need, so protecting it from frequent imports just didn't matter.
In most applications, however, data does matter. Companies invest a huge amount of labor and money into collecting data and ensuring its accuracy. In those cases, my question to developers "debating" data separation is this: Do you want to lose data, or do you NOT want to lose data?
Alternative one: Working in "live" files
Without data separation, there are two options for making changes to files that are active and in use: programming in "live" files, or working in an offline copy and then importing data from the "live" application once changes in the offline version are completed and tested.
I've worked in "live" files before. I can respect the amount of time it saves for quick modifications, and understand the confidence of programmers who think they've become good enough to never make critical mistakes while "working live." But I've lost data when programming in live files, and the best programmers I've known have lost data working that way, too. If you're working in "live" files daily in a high-volume shop, you may not lose data today, this month, or even this year. But program in live files day after day, year after year -- with emergency requests and rush jobs mixed into your busy schedule, and the boss running in and out of your office with big-meeting documentation emergencies that bump everyone else's work -- and eventually a programming test will have that one script line out of order. That could be the day you lose data, or it may be a quiet and productive day when you least expect something to go wrong.
Some programmers are OK with that -- working in live files saves time, and it avoids the data integrity risks that come with frequent imports. Retaining daily backups ensures that any data lost amounts to less than one day's worth of entry or imports, which can often be recreated by database users.
Even so, large or small, data loss is still a disaster and a disgrace, and it's an embarrassment to admit to your company or client when it happens. It means that employee time is wasted on an error by the programmer that could have been prevented with more careful choices and procedures. In more severe cases, it could mean the loss of a job or project.
Alternative two: Frequent imports
Then there's the other option: programming in an offline copy, and importing data from "live" file(s) into the modified files after changes are tested and finalized. This is the most time-consuming option, and it doesn't come without risks. This technique means hours of extra time spent, and usually billed to the client, on imports rather than feature improvements. Some companies have files with millions of records, where imports turn into day-long affairs and may fail so often that the only viable choice is importing data in large groups.
For the most active database applications, a programmer may have to come into the company on a Sunday afternoon to import data at a time when workers aren't typically using the database. And occasionally, the programmer is still there on Monday morning as users come into work, apologizing for an import that didn't go well, telling everyone that they can't quite use the files yet. Obviously, this impacts customer satisfaction and could even lead to the end of the database project for that programmer and his company.
Frequent imports can be riskier than working in live files at times. Auto-update fields or serial numbers can be overlooked after an import no matter how rigid the checklist process. (Again, day after day, year after year, programmers become too comfortable with a process and don't always consult a checklist for every step). And if an error like this isn't caught before the files are released to the database users, auto-entered data and serial numbers can be duplicated or linked to the wrong records, a time-consuming problem to fix.
Data can also become corrupted with frequent imports, or the programmer may overlook selecting all records in a table before importing it, hence importing an incomplete dataset. (This is the reason for one of the scripts I always put in finished data files, "PrepareForExport," which does nothing more than switch from table to table, selecting all records.)
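As a sketch of that idea, a "PrepareForExport" script contains little more than the steps below. The layout and table names here are invented for illustration -- in practice there would be one blank layout per table in the solution:

```
# "PrepareForExport" -- hypothetical sketch of the pre-import safety script
# Visit a layout for each table and select all records, so that a later
# import from this file cannot silently pick up a partial found set.
Go to Layout [ "Customers_blank" (Customers) ]
Show All Records
Go to Layout [ "Invoices_blank" (Invoices) ]
Show All Records
Go to Layout [ "LineItems_blank" (LineItems) ]
Show All Records
Go to Layout [ original layout ]
```

Running this immediately before any export or import removes one of the easiest mistakes to make under deadline pressure.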
Automation helps with some of this process, and I've written scripts to auto-import data at more than one company. Yet in the end, frequent transfer of data between files is time-consuming and not without risks. The data separation model can minimize these risks by minimizing the need for imports.
Primary advantage and limitations
With data separation, smaller rollouts usually involve replacing empty interface files only. Changes are made to the empty or near-empty interface files offline, and then, when tests run successfully and the improved application is ready for release, the programmer or administrator simply closes the solution's active files on the server and replaces the interface files -- or performs a small import before replacing them, for those who allow a small amount of data in the interface files. (I prefer to keep the interface files completely empty and simply replace them.) The entire process of replacing empty files usually takes only a few minutes. If only all new product rollouts were so quick and easy!
This model lends itself to many smaller releases -- improvements can be rolled out almost in real time, with minimal inconvenience to database application users. For projects suited to smaller rollouts, this can increase customer satisfaction, as users gain new features earlier and managers see increased efficiency from their database investment right away.
Of course, there is the occasional complication of how to handle adjustments to the data file itself. Even for major changes, an import may not be necessary. In many cases, new tables and fields can be copied over to the active "live data" file from the programmer's new model, and the relationship graph can be updated manually, often in a few short minutes after users exit their files to allow these changes. Security and other upgrades must also be added manually. Careful note-taking helps to avoid missing changes that must be duplicated in the "live" file. Obviously, all changes in the "live" data file must match the offline data file exactly, and the fields must be added to the correct tables. Outages for manual changes like this are often less than an hour -- provided the changes are few enough that the programmer feels confident nothing will be missed.
For extremely complex solutions or a broader range of improvements to the data file before release, an occasional import to the new data file, with careful adherence to a complete checklist, may be the best option. This brings all the risks of imports listed previously, but the separation model will usually minimize the number of times that imports are needed, and the number of changes made to "live" files. It does not always eliminate the need for those techniques altogether.
In addition to making data less vulnerable to loss and damage, quick rollouts save the customer from costs of import after import, leaving the programmer's time to work on other problems and improvements. The speedy availability should help bolster customer satisfaction.
FileMaker development companies often program large and complex solutions, with many separate files for various purposes. Obviously, the minimum number of files for data separation is two.
Personally, for my smallest applications, I prefer a four-file model. To explain why my own minimum model is more than two files, I'll start with a story.
A couple of years ago, I applied for a job with a company that will remain anonymous. I liked the idea of working for them because I occasionally use their products and respect their quality, but the match wasn't meant to be.
The I.T. Manager was a FileMaker novice, yet he was confident enough to design his own programming "test" for FileMaker development applicants -- a test that would have taken about 3 weeks if designed properly. At the time, I wondered whether he was in fact a legitimate employer with a real job, or just someone looking to gather programming samples for his staff.
I normally don't give an employer more than an hour of my time for a technical interview, but I did want the job, and so I spent a full day on the theoretical project to humor him. I went as far as I could in the day, and then explained a few things that I would have done differently had there been time. For example, I told him, one feature he wanted would normally require a separate file so that the user's found set (query) didn't change on the main screen while he or she executed find requests (queries) on the same table elsewhere.
The man commented that changing FileMaker's found set on one layout doesn't change the found set on other layouts. He considered my answer to be wrong, but in fact he was the one with the wrong answer.
FileMaker's found set WILL in fact change for all layouts based on the same table, within the same file. It has to work that way -- otherwise you couldn't write a script to have a user find records on one custom screen, and then switch to another screen where the found set is viewed and printed in the form of a report.
At the time, the only way to have different found sets for the same table simultaneously was to have different files open that referred to the same table. [NOTE: This characteristic was true for version 8, used by the prospective employer at the time. Somewhere between versions 8 and 11, FileMaker changed its features to allow different found sets when a new window of the same file is launched. There are pros and cons to the new window characteristic -- for those of us who would sometimes program a pop-up search screen or a little navigation bar to overlay preview mode, it takes more work to achieve the same result, or we need to purchase a screen management plug-in. It's annoying to users when they cancel a search pop-up by clicking the "x" on the title bar and the entire file closes as a result, which is what happens when pop-up screens must remain in the same file. A different found set in the same file may also be accomplished by making a duplicate instance of the needed table in the relationship graph with a separate name, and then programming a layout based on that instance. But again, that involves more work and often isn't practical in more complex applications. And of course the fact remains that even in later versions, switching to a different layout based on the same table in the same file will NOT change the found set.]
Having multiple files for the same data table involves dropping the data table into the relationship graph of multiple interface files, as described earlier.
I design a minimum 4-file solution for even the smallest databases. Large, complex databases may need more files, but I find that four files provide users with a minimum standard of flexibility. If, for example, the new database application is to be named "Application," these are the four files I would create:
The data file
The first file is always the data file, because all interface files are based on its tables and fields. For an application named "Application," my naming convention would be "Application_data." (Naming conventions are good in that they help us to recognize our programming features quickly and easily, both during the development cycle and after we return to our work months or years later. I may write more on naming conventions in a later paper.)
The "Application_data" file houses all or most data, and minimizes other programming features. In general, the fewer features in the "_data" file, the better. In a solution using data separation, the data file's purpose is to separate the data into a file with the simplest possible design, so that extensive changes to its screens and scripts are never needed. Complex data files lead to the need for imports, something this model was designed to avoid when possible. The rule to remember here is that any "extras" created may need modification later, and so keep it simple.
Relationships in the _data file can be limited to those needed for field calculations; layouts can be blank, or can display a warning to future programmers not to add features that may require future updates; and scripts should be kept to an absolute minimum. All of this helps minimize the need for frequent imports or for altering files with "live" data, as discussed earlier.
The interface file
The only file without a descriptive suffix, "Application," would be the primary user interface file. In my solutions, it is the main file seen and opened, typically the only file visible to end users. Its startup script opens the other three files and keeps them hidden until they are called by a menu option, or by an advanced user with sufficient rights to access them outside of the custom menu system. The "Application" file contains menus, user preferences, data entry/review screens, and sometimes other features as needed.
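A minimal sketch of that startup script, assuming the four-file naming convention used in this article (the window handling and the "Main Menu" layout name are illustrative, not prescriptive):

```
# "Startup" script for the Application file -- hypothetical sketch
Open File [ "Application_data" ]
Open File [ "Application_print" ]
Open File [ "Application_script" ]
# Hide the supporting files until a menu option or advanced user calls them
Select Window [ Name: "Application_print" ]
Adjust Window [ Hide ]
Select Window [ Name: "Application_script" ]
Adjust Window [ Hide ]
Select Window [ Name: "Application" ]
Go to Layout [ "Main Menu" ]
```

To the end user, the result looks like a single-file application, even though four files are open.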
I've found through my years of experience that some database users spend hours or days working in one screen, often adjusting the found set -- adding new records, not omitting others until a review is complete, performing manual standardization on the most troublesome data, and so on. A common complaint is that simple requests for rush reports destroy that found set (if any reports are based on the same table, and reports are generated in the same database file). But rush reports need to be done, emergency data updates need to be run, and a database user shouldn't have to spend hours recreating that special found set or put off rush jobs until the end of the day. Hence the two remaining files in this design standard.
The print reports file
The "Application_print" file houses most print report layouts... AND the screens to find the data contained in those reports. Remember, the found set for a table changes WITHIN the same file. Records for print reports located within this file must be found or queried in the same file, from screens based on the same tables as their corresponding print layouts.
The print separation model means that a user can interrupt work on a data entry screen or elsewhere (not another print job in the same file, though, if based on the same table), run a report, and then return to their exact location on the original screen with their original set of records. It's a convenience that can save a user hours or days throughout the year, depending on work habits.
There may be a few print reports kept in the main interface file, so that users can print certain forms and reports for individual records or the data entry screen's special found set. By nature, certain types of forms will need to be printed as entered, individually or in batches, from the data entry screen. But the typical series of reports found in any company, for example the usual reams of financial reports requested by management, gross and net margin reports, payroll, company directories, and the like, are normally run on datasets found with a simple start and end date, and perhaps a few drop-down menu choices. A simplified search screen with a link to an advanced search screen is sufficient for most standardized reporting. (I often like to include a calculation field for report captions, so that standardized report choices are automatically displayed on most reports. That helps people to understand the data that's included on each report and the data that isn't, both immediately and years later. That way, data isn't thought to be a complete set when restrictions have been applied.)
FileMaker print report files often grow to be huge, sometimes with hundreds of report layouts and the scripts that drive them. I find that it helps to separate and organize print reports into their own file like this, apart from other types of layouts, as it makes locating them faster.
The scripts file
The "Application_script" file is probably the closest thing to a logic file in this design, but its purpose can vary. It can be used for imports, data processing, and other processes better separated from the other files. Often regular imports and data updates are driven by automated scripting, hence the naming convention used here.
It's handy to have a separate spot for data processing and imports. Consider a typical situation: The database administrator is steadily working in the main interface file, "Application" in this case, and he or she discovers that someone neglected to import a particular spreadsheet, an oversight that will throw off all totals. That import will change the found set if imported into the same table being used by the data entry screen. Or perhaps the user discovers that someone forgot to run updates on other data imports. Again, finding records with missing updates would alter the found set if performed in the same table used by the data entry screen.
The _script file allows limitless updates without changing the found set in the data entry and print reports files. There may also be cases where a script is launched from one of the other files, and by nature it needs to change the found set multiple times and perform calculations. The _script file can serve the purpose of multiple found sets run by a script, without changing the found set in the data interface file that launched its script for that purpose.
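As a rough sketch, a catch-up import-and-update script living in the _script file works entirely on its own found sets, leaving the data entry screen's set alone. The layout, field, and file names here are invented for illustration:

```
# Hypothetical catch-up import, run from Application_script
Go to Layout [ "Invoices_import" (Invoices) ]
Import Records [ No dialog; "missed_invoices.xlsx" ]
# Find any records still missing their update flag and fix them
Enter Find Mode [ ]
Set Field [ Invoices::Updated; "=" ]   # "=" in Find mode matches empty fields
Perform Find [ ]
Replace Field Contents [ No dialog; Invoices::Updated; "Yes" ]
# The found sets in Application and Application_print remain untouched
```

Because the script's finds and imports happen in the _script file, the administrator's carefully built found set in the main interface file survives the interruption.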
Although I prefer not to refer to the _script file as a "catch-all," it often ends up serving that purpose, and although it may seem unnecessary to the novice programmer, I've found through my many years of experience that it's often a critical addition. Its relationship graph can grow to the size of a kitchen table. Think of what's involved in importing and cleaning data -- matching names and other data to try to find the related ID automatically, adding IDs for various master tables not referenced in the typically minimal original data source, standardizing data, scripts to aid in auto-correcting errors, and the like. All of those relationships can and probably should be left in the _script file's relationship graph, but always properly labeled. The monstrous number of relationships may make it hard on the programmer, but the end users will probably save years of time from all of the brilliant automatic updates that the programmer wrote for them. And those updates need their own space, so that the user can jump from table to table without fear of losing the set on the data entry screen, OR the set displayed on the current print report.
FileMaker security overall isn't the purpose of this paper and is too large of a topic to cover here. However, a few details should be mentioned about security in data separated database applications.
Data separation or not, I provide a good menu system to guide users through the database application. That menu system can also be used to help novice FileMaker users navigate the multi-file system more easily, giving them options to switch between files even without understanding the actual steps to jump between files like advanced users.
I also have the _data file open to a layout based on a table called "Blank_Security" with no meaningful records. That way, inexperienced users with deletion rights don't accidentally delete unseen records when they come to a blank layout, thinking that nothing is there absent an obvious data entry screen. There may also be a blank layout for every table, so that the pre-import script can switch to each table and find all records; but users without rights to these screens can't access them from the _data file, and so are restricted to the menu options in the interface files.
I also place warnings on my blank _data screens, so that users know that they should use a different file and future programmers know that adding an interface to the file will render the benefits of a data separation model more difficult or impossible.
Restricting the _data file to a blank screen with no records can be handled by file options and the startup script. (I'm an advocate of a "Startup" and "Shutdown" script in every file, even if they're blank until decisions on features are made.)
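In sketch form, the _data file's file options would run a "Startup" script on open along these lines (the layout name matches the "Blank_Security" table mentioned above; the dialog text is illustrative):

```
# "Startup" script in Application_data -- hypothetical sketch
Go to Layout [ "Blank_Security" (Blank_Security) ]
# The layout shows no fields and the table holds no meaningful records,
# so a stray user with deletion rights has nothing to damage here.
Show Custom Dialog [ "Notice" ;
    "Please work in the Application file, not in this data file." ]
```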
Returning the users to a blank screen in the main interface, _print, and _script files may not be necessary or desirable, depending on the user. Think back to the user who spends the day in the data entry screens, occasionally running a print report or importing a spreadsheet. It's easier to let the user switch between the files manually -- sometimes both are visible and it only takes a click of the mouse. Adding a button and script to the mix would accomplish the same thing, but an option would have to be included to return to the exact same screen and found set being used before the print request or other interruption. Power users would probably prefer both options.
Again, needs vary according to the purpose of the application and skill of various users. Different responses by the database can be designed according to users' rights, and programmed into scripts where necessary.
For some companies with many applications, the extra files used in the data separation model become an issue. Depending on the version of FileMaker Server and how the server is configured, there are a limited number of files that can be open on a FileMaker Server at any time. If the number is only 100 files, for example, then a four-file model would limit each server to only 25 applications running at any given time. Some companies would rather forego data separation than add additional FileMaker servers. I find this is rare, however, when a company's I.T. department has adequate funding and staff to handle additional servers. My experience in general is that servers are added all the time for a variety of purposes, and that I.T. management understands the advantages of data separation and would prefer to have the better design for its company's applications. A couple of thousand dollars now may save thirty thousand down the road.
Also, some programmers keep minimal amounts of data in their user interface files, but I try to avoid it. If at all possible, I'll have the Startup script check for that one record or one value that's an absolute necessity, and create it if it doesn't exist. That way an empty file can be copied over the old file -- the fastest possible update -- and the files will automatically fix themselves when opened. I prefer scripts that automatically fix things.
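A sketch of that self-repair check, as a fragment of the interface file's Startup script (the Preferences table and CompanyName field are invented for illustration):

```
# Hypothetical self-repair fragment of the Startup script
Go to Layout [ "Preferences" (Preferences) ]
Show All Records
If [ Get ( FoundCount ) = 0 ]
    # A freshly replaced, empty interface file -- recreate the one required record
    New Record/Request
    Set Field [ Preferences::CompanyName ; "Default" ]
    Commit Records/Requests [ No dialog ]
End If
```

With a check like this in place, rollout really is just "close, copy over, reopen."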
Another possible issue was covered in an article by Matt Petrowsky in FileMaker Magazine (and republished online elsewhere). Matt claimed that there was a recalculation issue when using portals with the separation model. Another article claims that when adding records in a portal based on a table that isn't native to that file, you have to click outside of the portal for it to recalculate. Actually, clicking outside of a field or leaving a field isn't uncommon for FileMaker, as it's one of the ways to post data (depending on the file's settings) -- a sort of "save as you go" characteristic. However, I created a simple data separated set of files to test for this problem with version 11, and did NOT find it to be true. Perhaps this problem was found in earlier versions, or he may have had a more complex type of calculation than I used for my test.
If a developer encounters calculation or other problems that seem to be related to data separation, he or she will need to decide whether the characteristic lost by using separation is critical. If the application can't be delivered without it, perhaps the screens needing the feature could be located in the _data file, with other screens located in the empty interface files. That way, the _data file's features are still kept to a minimum, though of course the need to import data or work in "live" files increases with the complexity of the _data file in data separated solutions.
As a database programmer, I was originally impressed with FileMaker because, unlike other PC database products at the time, I could leave clients with an application that they were able to understand and run on their own. That is, I could REALLY leave, and never come back, until and unless the customer wanted new features. With other PC database products, the final application usually required a highly skilled user, or a technical staff on hand, or in some cases Visual Basic and other programs running on top of the database, just to run simple queries and meet users' needs on a day-to-day basis.
Take ease of use and add FileMaker's multi-platform, rapid development capability, and the client has one of the most powerful, flexible products on the market. Non-programmers can quickly and easily create a database with more power than a spreadsheet. I.T. staff can support the database with a minimal amount of research and guesswork. Management can understand how the database operates, or even create simple databases of their own.
Yet, when a professional database developer is called, the programming standards should become higher. Database users often don't understand risks associated with frequent data imports, or of designing and testing code with some complexity in files with "live" data. Professional developers know the risks and are responsible to take action to avoid or minimize them. Data separation is a friend to the multi-tasking power user, a good way to minimize data loss risks, and also enables quick, frequent upgrade rollouts to clients. In my view, data separation should become the preferred programming standard for professional FileMaker developers.
- Pam Rotella is a Senior FileMaker Developer with over 20 years of I.T./PC database programming experience, more than 15 of those years working with FileMaker. She frequently consults with major US corporations and mid-sized companies, and can be reached via e-mail at email@example.com.
DISCLAIMER: This site is not affiliated with FileMaker software company, but rather seeks to provide a forum for credible discussion on software use. Tips provided here are not intended as individual consultation or advice. All papers published here are the opinions of their respective authors, and are not necessarily endorsed by FileMakerPapers.com or Pam Rotella.