Technical suggestion for reduced build time of larger databases

Hi,
I am afraid I am going to go full developer on this feature request. So please bear with me on this request as I believe this would be a powerful feature for Phocas.
Issue
Currently Phocas rebuilds all data from the streams it has been provided. With some of our databases, we could have data going back several years (consider closed periods) but is required for trend analysis. This means that rather than the database rebuilding only the open period data (volatile), Phocas rebuilds all data equally. The end result is a large build time, when some of the stored aggregations will never change. One approach is to split the database up to several smaller ones. However in the land of aggregated data, I consider the split should be done within the SAME database, not separate ones. The advantages are:

  1. You have a single reference point and can perform trending on a larger scale
  2. Allow the users to view all time aspects of the within the same database.
  3. Reduce development time on dashboards. If we split the database to several it means multiple dashboards for multiple time-frames and/or splits to geographically(country by country).

Solution
In MS SSAS you have the concept of partitions, this allows the developer to mark sections of the data as closed and no longer requires rebuilding. In basic terms, don’t build parts of the data that we know are never going to change. With this partitioning it would mean lengthy builds reduce to a fraction of there original time as you would not need to constantly build all the data.
How would this work
As Phocas GUI is very user friendly, I suggest we leverage the streams in the designer to handle this. For example (using two feeds split by fiscal year):
You have data from 2010 to today, you know your closed periods are 2010 - 2017, therefore you would have two streams:

Stream 1 - Open Periods(2018 - today)
Stream 2 - Closed Periods.(2010 - 2017)

On the Tab name for the stream we have an option to make it not to rebuild (sorry for crude image!):
image
The above states that “PandL” will be built nightly whereas “PandL2017” (contains data up to 2017) does NOT get rebuilt and when it is first loaded it remains with the saved aggregations.

Adding partitions to a cube (MS SSAS) is configured differently, however if we were to use streams in this manner the end user is in control of what gets processed. If the closed data stream DOES need to change (for example a new dimension is added) then we would check the box, rebuild and un-check after. Again in the control of the developer managing the cube.

I appreciate the above is a feature which would require effort to develop, but a worthy one. Please let me know what you think on this idea

Kind Regards,

Jon

7 Likes

Hi Jon

I think that this is a very good idea. Yesterday we had to run an exercise which involved data back to 2014, and the build time was really slow. As we had to run this for multiple customers one a at time it must have a had quite a negative effect on the servers response time for other users,

Kind regards

Dave

1 Like

Hi @david.mciver,

Thanks for the feedback. Can you do me a favour and give the initial post a “like”, so it gains more traction with the Phocas devs please?

Kind Regards,

Jon

Done J

1 Like

Hi Jon,

The majority of the build time is taken up with creating the indexed views on the transaction data, to make the runtime queries super-quick. We have done some R&D in the past where we selectively retained some of the already indexed data and only indexed new content. All of these indexes are based on the internal IDs of Dimensions, Measures etc. So any changes in structure would invalidate the history indexes, but worse, any changes to the Dimension rows would cause the indexes to be invalid as well. We could get around that by using the dimension code in the indexes instead of the ID, but the indexes would all be considerably bigger producing an overall reduction in performance coupled with a comprehensive rewrite to the entire query engine behind phocas.

In the end, for those customers with large volumes of data that were hitting the buffers with build times, we recommend using column store indexes instead. They can have some runtime performance issues, very much depending on the type of data being indexed, but on the whole they seem to work well. All MT hosted sites can access the CSI method from Designer;

This is also available on any on-premises sites running any edition of SQL2016 SP2 or later.

Hi @jason.law,

Thanks you for your technical answer and makes complete sense. We explored Columnstore indexes with our Phocas Consultant. We used Financial statements as the example (like my example in the initial post).

It is indeed much quicker to build and would be happy with that, however, it pushed the P&L run time of financial statement into minutes to return rather than seconds. So this is not really an option with Financial statements currently, although I do see the value of CSI on DB’s and a great option!

In the example I provided, we need to keep at least 5 years of financial statements data for management reporting. So we cannot reduce the database size of rows. Build time is becoming a concern for the business and if there are any other suggestions/updates regarding P&L Financial Statement performance where CSI could be used in conjunction it would be very beneficial.

Are there any performance updates on Financial Statements in the works? NOTE: FS DB is very quick to query, it is only an issue when using the Phocas P&L structured layout we have significant slow down.

FYI, The CSI option is not present on our designer. Only Phocas could configure it for us on a per database basis. We are an EDA customer. I apologise in advance if any of these questions can be answered away, my problem is if I don’t see the features present, I don’t know about it :slight_smile:

Kind Regards,

Jon Kemp

Hi Jon,
Just out of curiosity, what are the DB sizes in MB and row counts of the main transactional DB’s and build times?

It might be interesting to compare amongst us customers on the cloud.

Our stats this morning from Phocas > Administration > Databases (Select a DB) > Build Logs:

The fact we are using views to enable us to use rebates seems the biggest contributing factor, but I feel sure there are other factors.

Warm regards,

Richard