Very Long Build Times

Has any one else experienced very long build times for very large databases? It makes sense that the database’s size is correlated with the build time, but our build times are starting to become unbearably long (and sometimes don’t finish building by the start of the business day, leading to issues in meetings.) Our build times just keep getting larger as the database grows, too. We’ve gone from 3 hours to 5 hours in short order.

Is there a way to fix this or get around it? We’ve already taken our summarizing way down (we had had it set at 300 days, but this proved to be one major cause of our long build times.)

Are there other ways to shorten build times that don’t involve sacrificing data (and preferably not dimensions or properties either?) I may be asking for a miracle.

Also, what are the leading causes of long build times?

We have the same issue and it fails continuously. We are starting to become tired of it honestly. We are going to try one more thing, when they get time to address our ticket. We are going to split some of the older static data and not sync it anymore since it is “history”, Remains to be seen how it works out.

Hi,
We had challenges with large DB build times and does require a bit of out of the box thinking. To be fair to Phocas, the current platform does mean the entire DB needs to be rebuild (I raised this here as a suggestion about how they could look to reduce a while ago: Technical suggestion for reduced build time of larger databases). In the meantime, we have to work out how to reduce the demand for rebuild and sync times with what we have.

What I would suggest is the following (using a sales DB as an example):

  1. Have you asked Phocas to switch your specific DB to column store indexing?
    There is a trade-off with this, the size of DB and build time drops but the queries to will take a little longer to return. I have done this for the sales DB & Financial statements DBs and has proven successful, sometime the query is slower than my client likes, but the build time improvements outweighed the slight delay in query time.

  2. Split your syncs
    I have created 3 levels of sync for the invoicing which are:
    a. Historic (refreshed once a year), this houses all data to up to a CY20. These are done and dusted never to be touched again.
    b. Nightly (refreshed once a day), this houses all data from beginning of CY21 to 3 months ago. Shouldn’t change but just to be on the safe side refresh nightly.
    c. Intraday (refreshed every 2 hours), this houses all data from 3 months ago to now. These are the volatile records which could change.

Using these 3 syncs allows the sync time to be about 7 minutes for all new data and then the DB will start to build. So end to end the sales DB takes 1 hour to build with the major grunt work of data upload occurring during the night when everyone is asleep.

Honourable mentions

  1. Have you set your sync feeds to only upload data you are using?
    image
  2. Have you considered changing some of your dimensions to properties?
    This is a biggy, as dimensions are aggregated, properties are not. By swapping dimensions to properties you will get a net saving, for example one dimension I had which I changed to a property saved 10 minutes of build time.
  3. Period Types in the designer.
    How many period types do you have configured in the designer? By reducing this you can make real speed savings on the build times. For example when I first started out I had set a Fiscal Year Period type and a Calendar period type, end of the day this was overkill as Month and Year were enough and could manage fiscal year else where. I.e. this:
    DB designer:
    image
    In the “defined periods” within the database set up a fiscal year like this:

    So why have two calendars to aggregate against when the Month and Year would do the same thing.
  4. Pay close attention to your detailed build log.
    If you go to the designer view of the DB and click on the build tab, you can see a detailed load:

    What is useful here is looking how long your dimensional data takes to load, do you need to filter your dimensions down to reduce the time to build? I.e. if your fact data is restricted to one territory, do you need your dimensions to load all territory information, or could you restrict? Here is an example of a smaller sales database set to look only at Iberia (sorry for bad drawing!):

    The fact data is filtered to Iberia, so I reduced the dimension data to also only show Iberia, this saved around 5 minutes across all dimension builds. Your mileage may vary but just a suggestion that may help!
    Also, the build log may show you how long certain types of period are being built, I may make you consider other ways of getting the date range without needing that one (basically point 3 discussed above).

I think that’s all off the top of my head which may help. Good Luck!
Regards,

Jon

2 Likes

We have had similar experiences with large databases and build times, Jon’s feedback is on point. A few additional suggestions (for advanced users):

  1. Eliminate unused columns from the dataset either in a Sync View, or upstream in your ERP database/file export.

  2. If your database has multiple streams joining on various dimensions, verify/add a distinct dataset to the dimension and assign a primary key to validate the relationship. I often create a Sync View of a distinct single column of id’s.

  3. If possible, avoid using the Default Restrictions on the Database OR User Profile to filter specific pieces of data. **This one is huge, we’ve managed to reduce build times by 2.5 hrs on some databases by moving the filtering and default restrictions to the actual dataset and not in the build process. Instead, you could either:
    a. Use the filtering option on the Stream in Design Mode
    b. Apply the filter to the query upstream to filter the dataset in the ERP database/export.