Tableau Prep provides various cleaning operations that you can use out of the box to clean and shape your data. Cleaning up dirty data makes it easier to combine and analyze your data or makes it easier for others to understand your data when sharing your data sets. Tableau Prep delivers a standard set of data roles that you can select from or you can create your own using the unique field values in your data set. When you assign a data role, Tableau Prep compares the standard values defined for the data role with the values in your field. Any values that don't match are marked with a red exclamation mark.
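The data role check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Tableau Prep implements it: the function name `flag_invalid`, the sample role values and the exact-match comparison are all assumptions made for the example.

```python
def flag_invalid(values, valid_values):
    """Pair each field value with True/False depending on whether it
    appears in the set of values defined for the data role."""
    allowed = set(valid_values)
    return [(value, value in allowed) for value in values]


# Hypothetical "state abbreviation" role and a field to validate against it.
STATE_ROLE = {"AZ", "CA", "NY"}
field_values = ["AZ", "ca", "Arizona", "NY"]

flagged = flag_invalid(field_values, STATE_ROLE)
# Values that would be marked as not matching the role.
invalid = [value for value, is_valid in flagged if not is_valid]
```

Here `invalid` holds the values that a reviewer would then need to fix or map to a valid value.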
- With Tableau Prep Builder you can easily clean your data. In many organisations, data engineers and data scientists usually take care of data preparation, and analysts interact with the data only after it has been cleaned and prepared. Tableau Prep makes data cleaning quicker and easier.
- Learn how to use various features of the Data Connection window to tidy up messy Excel files. The short video Data Prep with Text and Excel Files shows you how to clean up Excel files and prep data containing text for use in Tableau.
What Is Tableau Prep Builder
Have you ever needed to do a little more in the way of data prep than what's provided in the Tableau Data Source tab? If you are not a SQL expert, how do you do the prep required to make your data Tableau friendly? In the past, have you resorted to dumping data to a CSV file or Excel and using Excel for 'cleaning' before bringing data into Tableau? What happens if you have multiple sources? Sure, some things can be done in Tableau Desktop using features such as cross-database joins and pivots, but it can be hard to generate repeatable steps that can be used to transform your data. And manual steps can be difficult to document and share with others. If these scenarios are painfully familiar to you, then Tableau Prep may be just what the (Tableau) doctor ordered.
Tableau Prep was announced under the name Project Maestro at the 2017 Tableau Conference and launched in April 2018. Under the new April 2018 licensing model, it comes with the Tableau Creator license (see Tableau Pricing). According to Tableau, 'Tableau Prep will make it possible for more people, from IT to business users, to easily prep their data with a direct and visual approach.'
I recently had the chance to take part in the beta program for Tableau Prep/Project Maestro. While it may not be as fully featured as ETL (extract, transform and load) tools like Informatica and Alteryx, Tableau Prep has some nice features that should make life easier for relatively simple ETL scenarios.
Here's a list of my 10 favorite features.
- Joining disparate data sources. For quite some time, one of Tableau's strengths has been its ability to join multiple data sources. Tableau Desktop lets you combine sources by 'joining' or 'blending.' Similarly, Tableau Prep lets you combine an Oracle Table, a SQL Server table and a Microsoft Excel worksheet into one data source with just a couple of clicks.
While some data prep can be done in Tableau Desktop's Data Source tab, there are limitations to what can be done. The main differences between Tableau's new Tableau Prep tool and data prep from within Tableau Desktop are in the presentation and in the number of options available.
One such difference is that you can connect to 70 different data sources in Tableau Desktop and in the first production release of Tableau Prep, you can connect to 28 data sources. See below:
Within Tableau Prep, you add connections to these sources. If the connection is a database, such as SQL Server, you specify the schema, the tables and which columns you want to bring in. Once you have your sources set up, you draw a line between them and add a join.
This functionality is similar to the way other ETL tools do joins. It should be pointed out that doing these joins is often easier said than done. When not dealing with simple data like Tableau's Sample Superstore, joins can be tricky. This is often due to differences in the level of detail, mismatched data, etc. But if you have a relatively simple scenario, joining disparate sources can be accomplished in Tableau Prep.
NOTE: Currently, the initial selection of schemas and tables is not quite as intuitive with some data sources, such as Oracle. As with Tableau Desktop, I expect every subsequent release will see improvement in this area.
- Join and union results. With Tableau Desktop you can join data and union data. When doing a join or union in Tableau Desktop, the bottom of the Data Source tab will show your result. But often you want to see the results of what joined (given your join condition) AND what didn't. In Tableau Prep, you get a Summary of Join Results that shows unmatched values. This feature can be useful for evaluating/debugging join conditions and for validating data.
See the bottom right corner for an example of Summary Join Results.
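To make the idea of a join summary concrete, here is a minimal Python sketch that reports which key values matched and which did not on each side. The function `join_summary` and the sample `orders` and `returns` rows are hypothetical, and the output format is my own, not the layout Tableau Prep shows.

```python
def join_summary(left_rows, right_rows, key):
    """Summarize an equality join on `key`: which key values matched
    and which appear on only one side."""
    left_keys = {row[key] for row in left_rows}
    right_keys = {row[key] for row in right_rows}
    return {
        "matched": sorted(left_keys & right_keys),
        "left_only": sorted(left_keys - right_keys),
        "right_only": sorted(right_keys - left_keys),
    }


# Hypothetical sample data.
orders = [{"order_id": 1}, {"order_id": 2}, {"order_id": 3}]
returns = [{"order_id": 2}, {"order_id": 4}]

summary = join_summary(orders, returns, "order_id")
```

The `left_only` and `right_only` lists are the part you would look at when debugging a join condition.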
- Preview in Tableau Desktop. After you have done a 'step' to transform your data, you can use Preview in Tableau Desktop to look at the data in Tableau Desktop. This feature provides a quick way to validate the data produced by the step.
- Aggregate data. Tableau Desktop offers multiple ways to aggregate, or summarize, data. For example, you can create a Tableau extract and select Aggregate data for visible dimensions, or you can use sets or groups to aggregate data and summarize it into something like Total Sales by Region. But if you just want a quick aggregation of your data to store in a data source or Tableau data extract (TDE), you can do this easily with Tableau Prep. You can add a step to aggregate and simply drag and drop your grouped fields and your aggregated fields. In the example below, Discount, Profit, Quantity and Sales are grouped by Year of Sales, Region and City.
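The logic of such an aggregate step can be sketched as a plain group-and-sum in Python. The helper `aggregate` and the sample rows are assumptions for illustration, and only summing is shown, although an aggregate step can offer other aggregations.

```python
from collections import defaultdict


def aggregate(rows, group_fields, sum_fields):
    """Group rows by `group_fields` and sum each field in `sum_fields`."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        group = tuple(row[field] for field in group_fields)
        for field in sum_fields:
            totals[group][field] += row[field]
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {group: dict(sums) for group, sums in totals.items()}


# Hypothetical sample rows.
rows = [
    {"Year": 2017, "Region": "West", "City": "Phoenix", "Sales": 100.0, "Quantity": 2},
    {"Year": 2017, "Region": "West", "City": "Phoenix", "Sales": 50.0, "Quantity": 1},
    {"Year": 2018, "Region": "East", "City": "Boston", "Sales": 75.0, "Quantity": 3},
]

result = aggregate(rows, ["Year", "Region", "City"], ["Sales", "Quantity"])
```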
- Wildcard union. Sometimes you want to combine similar files into one data source (e.g., monthly sales files). In Tableau Desktop you can do a Union to accomplish this. Tableau Prep takes this to the next level by providing Wildcard Unions. With Wildcard Unions you specify a file path or directory and union all files that are in that location (irrelevant files can be excluded). You can then merge these files into one. The resulting merged file contains a File Paths column that contains the file path of the original source. When doing unions, you can also merge similar fields that have different names. See below for an example of a Wildcard Union:
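For readers who think in code, a wildcard union behaves roughly like the Python sketch below: match files by pattern, stack their rows and record where each row came from. The function `wildcard_union` and the `sales_*.csv` pattern are hypothetical, and the sketch assumes CSV files with a header row.

```python
import csv
import glob
import os


def wildcard_union(directory, pattern="*.csv"):
    """Union every CSV file in `directory` that matches `pattern`,
    tagging each row with the path of the file it came from."""
    merged = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["File Paths"] = path
                merged.append(row)
    return merged
```

Calling `wildcard_union("monthly", "sales_*.csv")` on a hypothetical `monthly` folder would pick up `sales_jan.csv` and `sales_feb.csv` while ignoring unrelated files in the same folder.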
- Pivot for database tables. In Tableau Desktop, you can pivot data in Excel or CSV files. If you want to re-structure, or pivot, data stored in a database such as SQL Server or Oracle, you have to create a new table or use custom SQL. In Tableau Prep, 'pivot' appears to be a valid option for database tables.
Below are examples of pivots using SQL Server and Oracle tables as data sources:
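As a rough picture of what a pivot does to the shape of the data, the Python sketch below turns columns into rows. It assumes the columns-to-rows kind of pivot; the function `pivot_columns_to_rows`, the column names and the sample data are hypothetical.

```python
def pivot_columns_to_rows(rows, id_fields, value_fields,
                          name_col="Measure", value_col="Value"):
    """Turn each column in `value_fields` into its own row, keeping
    the `id_fields` on every output row."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            new_row = {key: row[key] for key in id_fields}
            new_row[name_col] = field
            new_row[value_col] = row[field]
            long_rows.append(new_row)
    return long_rows


# Hypothetical wide table with one column per quarter.
wide = [
    {"Region": "West", "Q1": 10, "Q2": 15},
    {"Region": "East", "Q1": 7, "Q2": 9},
]

long_rows = pivot_columns_to_rows(
    wide, ["Region"], ["Q1", "Q2"], name_col="Quarter", value_col="Sales"
)
```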
- Edit and clean data. This is big and has lots of use cases. Often when you bring data into Tableau, data integrity issues become apparent. Wouldn't it be nice to quickly clean your data while bringing it into Tableau? Let's say most of your states were put in using a two-character capital abbreviation like AZ, but some states were fully spelled out. Ideally, you would fix data integrity issues at the source. But sometimes you just need a quick fix. With Tableau Prep you can now clean your data and edit values. See below for an example of edit values:
In Tableau Prep, you can create one step that does multiple 'cleaning' functions like filter, change data types, rename and remove fields. See below for a 'Fix Dates' step that performs multiple functions:
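A single step that bundles several cleaning functions might look like the following Python sketch. The field names (`order_date`, `sales`, `internal_note`), the date format and the function `fix_dates_step` are all assumptions made up for this example; they are not taken from the 'Fix Dates' step itself.

```python
from datetime import datetime


def fix_dates_step(rows):
    """One cleaning step that filters, changes data types, renames
    fields and removes a field."""
    cleaned = []
    for row in rows:
        # Filter: skip rows that have no date at all.
        if not row.get("order_date"):
            continue
        cleaned.append({
            # Rename order_date to "Order Date" and change text to a date.
            "Order Date": datetime.strptime(row["order_date"], "%m/%d/%Y").date(),
            # Rename sales to "Sales" and change text to a number.
            "Sales": float(row["sales"]),
            # Remove field: "internal_note" is deliberately not copied.
        })
    return cleaned


# Hypothetical raw rows.
raw = [
    {"order_date": "04/24/2018", "sales": "19.5", "internal_note": "x"},
    {"order_date": "", "sales": "3.0", "internal_note": "y"},
]

clean = fix_dates_step(raw)
```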
- Group and replace. Let's say you have the following values: 'USA,' 'United States' and 'U.S.A.' You know these are all USA and should be grouped together. Currently, you can group these in Tableau Desktop or fix your data at the source. With Tableau Prep you also have an option to Group and Replace, saving your new grouping as part of your data source.
See below for an example (notice you even can do it by Pronunciation!):
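The grouping idea can be sketched in Python as a lookup from variants to one canonical value. This is a simplified stand-in: it matches on a normalized spelling (punctuation stripped, uppercased) plus an explicit list of variants, and it does not attempt the pronunciation-based matching mentioned above. The names `normalize` and `group_and_replace` are hypothetical.

```python
import re


def normalize(value):
    """Strip everything except letters and digits, then uppercase."""
    return re.sub(r"[^A-Za-z0-9]", "", value).upper()


def group_and_replace(values, groups):
    """Replace each value with its canonical form.

    `groups` maps a canonical value to the variants that should be
    replaced by it. Values that belong to no group are left unchanged.
    """
    lookup = {}
    for canonical, variants in groups.items():
        for variant in [canonical, *variants]:
            lookup[normalize(variant)] = canonical
    return [lookup.get(normalize(value), value) for value in values]


countries = ["USA", "United States", "U.S.A.", "Canada"]
grouped = group_and_replace(countries, {"USA": ["United States"]})
```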
- Data profile. In Tableau Desktop's Data Source pane, row level data is displayed at the bottom of the screen, but you can't see how the data is distributed. With Tableau Prep, the screen is divided into three panes: the top pane has a data flow or a graphical representation of the work flow, the middle pane has a data summary or profile and the bottom pane displays the row level data. The profile section in the middle has histograms to depict the frequency of values within columns, making it very easy to see how data is distributed.
See below:
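A text-only approximation of the profile histograms is easy to sketch in Python. The functions `profile` and `render_histogram` and the sample rows are hypothetical, and the output is a plain frequency chart rather than anything resembling the actual profile pane.

```python
from collections import Counter


def profile(rows, field):
    """Count how often each value of `field` occurs."""
    return Counter(row[field] for row in rows)


def render_histogram(counts):
    """Render one text bar per value, most frequent first."""
    width = max(len(str(value)) for value in counts)
    return [
        f"{str(value).ljust(width)} {'#' * count}"
        for value, count in counts.most_common()
    ]


# Hypothetical sample rows.
rows = [{"Region": "West"}, {"Region": "East"}, {"Region": "West"}, {"Region": "West"}]

counts = profile(rows, "Region")
lines = render_histogram(counts)
```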
- Graphical depiction of steps taken to transform data. As you can see above, steps taken to transform the data are graphically depicted and put into one self-documenting flow. You can also click on any of the steps to see what the data looks like at any given stage of transformation. This is what more complex ETL tools have been doing for years and it is really useful. These steps can then be shared and run as a 'flow,' which can be published as a Hyper, TDE or saved to a file.
CONCLUSION
Tableau Prep has some really good, time-saving features that will allow you to produce Tableau-friendly data. It will be a good alternative to manually scrubbing data using steps that often are not documented or repeatable. For simple transformation logic, Tableau Prep should do everything that is required. Even though Tableau Prep is good at creating Tableau Data Extracts and text-based files, often it can be better to build a database repository or data warehouse that can be leveraged by multiple reporting tools. For this type of work there are other ETL (extract, transform and load) tools that might better suit your needs. Tools such as Alteryx and Informatica have more data output options (e.g., database tables) and more capabilities when it comes to predictive modeling, statistical analysis, geospatial manipulation, mapping and valuable built-in demographic data for enhancing a dataset. At Senturus, we believe there is no one-size-fits-all tool for data preparation. There is a 'right tool for the job' and we can help you determine what tool might best fit your needs.
It is a well known fact that data preparation is often 80% of the work when building out business analytics frameworks. For more complex data work, expert advice is often needed to make sense of the underlying data sources so they can be joined into a cohesive, well-designed data model that can be used by multiple reporting tools. At Senturus, we have been doing just that for nearly two decades. We make sense of what is complex by designing and building intuitive data structures that can be easily leveraged by tools such as Tableau.
This blog was submitted by our own Monica Van Loon. A frequent contributor to our blog, Monica is a Tableau certified consultant and teaches many of our Tableau workshops.
Senturus is a nationwide business analytics consulting firm and a Tableau partner. We were in no part solicited or paid for this review. The views and opinions expressed in this article are those of the author and do not necessarily reflect those of any other related party.
This topic describes how to enable Tableau Prep Conductor on your existing installation of Tableau Server.
Tableau Prep Conductor is supported only on Tableau Server versions 2019.1 or later. If you are using Tableau Server 2018.3 or earlier, you must first upgrade your Tableau Server to 2019.1 before enabling Tableau Prep Conductor on your Tableau Server installation.
Tableau Prep Conductor is licensed through the Data Management Add-on, on a per Deployment basis, which is User-Based or Core-Based. A Deployment includes a licensed production Tableau Server installation and licensed non-production Tableau Server installations that support the production installation. For more information on Deployment, see the EULA Documentation.
Before you upgrade
Prepare for upgrade:
Configure public gateway settings
If your Tableau Server is set up with one of the following:
- A load balancer to distribute requests across gateways.
- A reverse proxy to authenticate external (internet) client requests and offload SSL-based encryption.
You must configure the following public gateway settings:
tsm configuration set -k gateway.public.host -v <name>
(Replace the placeholder <name> with the URL that your users use to access Tableau Server.)
tsm configuration set -k gateway.public.port -v 443
For more information on configuring gateway settings, see Configuring Proxies for Tableau Server.
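If you script your server setup, the two gateway settings above could be assembled as in the following Python sketch. It only builds the command lines and does not run them. The function name `gateway_commands` and the host `tableau.example.com` are hypothetical; the `tsm` keys and flags are the ones shown in this topic.

```python
def gateway_commands(public_host, public_port=443):
    """Build the two tsm commands that set the public gateway host and
    port. Nothing is executed here; the caller decides how to run them."""
    if not public_host:
        raise ValueError("public_host must not be empty")
    return [
        ["tsm", "configuration", "set", "-k", "gateway.public.host", "-v", public_host],
        ["tsm", "configuration", "set", "-k", "gateway.public.port", "-v", str(public_port)],
    ]


# Hypothetical public host name, used for illustration only.
commands = gateway_commands("tableau.example.com")
printable = [" ".join(command) for command in commands]
```

Keeping each command as a list of arguments makes it straightforward to hand to a process runner later without worrying about shell quoting.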
Tableau Server Installations using User-Based licenses
The recommended topology for a production Tableau Server installation is a dedicated node for running flows. For more information, see Minimum Hardware Requirements and Recommendations for Tableau Server.
Tableau Server single-node installations
If you currently have a single node Tableau Server installation, it is recommended that you add a second node and dedicate it to running flows.
Run upgrade on your current Tableau Server installation using the information in the topics below:
When you get to the Activate step, use the Tableau Server product keys to activate Tableau Server.
All product keys are available through the Customer Portal.
After completing the installation, add the Data Management product key to enable Tableau Prep Conductor on your node. The Data Management product key, like your other server keys, is available through the Customer Portal.
In the Tableau Services Manager web interface, click Licensing on the Configuration tab and click Activate License.
Enter or paste your new product key and click Activate.
On the Register page, enter your information into the fields and click Register.
You will be prompted to restart the server. Restart the server and verify that Tableau Prep Conductor is enabled and running.
In the Tableau Services Manager web interface, click the Status tab to see the status. If Tableau Prep Conductor is enabled and running, you should see Tableau Prep Conductor in the list of processes as Active. If Tableau Prep Conductor is not enabled, you will see Tableau Prep Conductor in the list of processes, but with no status information.
Tableau Prep Conductor not enabled:
Tableau Prep Conductor enabled and running:
- Add a second node to your Tableau Server installation. The installer will enable certain required processes, like the Cluster Controller. Enable the Backgrounder process on the node, because it is required to run scheduled flow tasks. When you enable the Backgrounder process, the installer automatically enables a single instance of Data Engine and Tableau Prep Conductor on the node. Do not add any other processes on this node.
Run the following commands to dedicate this node to running only flow tasks. For more information on node roles, see Node Roles in Tableau Server.
List the services on each node to get the nodeID of your dedicated node:
tsm topology list-nodes -v
Set the node role for the dedicated node, using the nodeID returned by the previous command:
tsm topology set-node-role -n nodeID -r flows
Apply the changes and restart the server:
tsm pending-changes apply
Review the status to ensure that all the processes are up and running and configured correctly:
tsm status -v
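The four commands above always run in the same order, so they can be captured in a small helper. This Python sketch only returns the command strings and does not execute anything. The function name `dedicate_node_commands` and the node ID `node2` are hypothetical; use the nodeID reported by your own installation.

```python
def dedicate_node_commands(node_id):
    """Return, in order, the tsm commands that dedicate a node to flows."""
    if not node_id:
        raise ValueError("node_id must not be empty")
    return [
        # 1. List nodes and their services to confirm the node ID.
        "tsm topology list-nodes -v",
        # 2. Restrict the node to flow tasks.
        f"tsm topology set-node-role -n {node_id} -r flows",
        # 3. Apply the pending changes.
        "tsm pending-changes apply",
        # 4. Check that all processes are running.
        "tsm status -v",
    ]


# "node2" is a placeholder node ID for illustration.
steps = dedicate_node_commands("node2")
```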
You have successfully added Tableau Prep Conductor to your Tableau Server installation.
Tableau Server multi-node installations
Run upgrade on your current Tableau Server Installation using the information in the topics below:
When you get to the Activate step, use the Tableau Server product keys to activate Tableau Server.
All product keys are available through the Customer Portal.
After completing the installation, add the Data Management product key to enable Tableau Prep Conductor. Tableau Prep Conductor is automatically enabled on the nodes where you already have the Backgrounder process enabled. The Data Management product key, like your other server keys, is available through the Customer Portal.
In the Tableau Services Manager web interface, click Licensing on the Configuration tab and click Activate License.
Enter or paste your new product key and click Activate.
On the Register page, enter your information into the fields and click Register.
You will be prompted to restart the server. Restart the server and verify that Tableau Prep Conductor is enabled and is running.
In the Tableau Services Manager web interface, click the Status tab to see the status of all the processes. If Tableau Prep Conductor is enabled and running, you should see Tableau Prep Conductor in the list of processes as Active. If Tableau Prep Conductor is not enabled, you will see Tableau Prep Conductor in the list of processes, but with no status information.
Tableau Prep Conductor not enabled:
Tableau Prep Conductor enabled and running:
Add a new node to your Tableau Server installation. The installer will enable certain required processes, like the Cluster Controller. Enable the Backgrounder process on the node, because it is required to run scheduled flow tasks. When you enable the Backgrounder process, the installer automatically enables a single instance of Data Engine and Tableau Prep Conductor on the node. Do not add any other processes on this node.
Note: The dedicated node counts towards the total count of the Coordination Service ensemble. You may need to deploy a Coordination Service on the new node depending on the total number of nodes you have in your cluster, including the new dedicated node. For more information, see Deploy a Coordination Service Ensemble.
Run the following commands to dedicate this node to running only flow-related operations. For more information on node roles, see Node Roles in Tableau Server.
- List the services on each node to get the nodeID of your dedicated node:
tsm topology list-nodes -v
- Set the node role for the dedicated node, using the nodeID returned by the previous command:
tsm topology set-node-role -n nodeID -r flows
- Apply the changes and restart the server:
tsm pending-changes apply
- Review the status to ensure that all the processes are up and running and configured correctly:
tsm status -v
- At this stage, you may have Tableau Prep Conductor enabled on other nodes. By default, the Backgrounder process on a node performs tasks of all types, including flow tasks. To isolate Tableau Prep Conductor and flow tasks to only certain nodes, you can configure the Backgrounders to do one of the following:
To run only flow tasks:
tsm topology set-node-role -n nodeID -r flows
To run all tasks except flows:
tsm topology set-node-role -n nodeID -r no-flows
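When several nodes need different roles, it can help to write the plan down as data and generate the commands from it. The Python sketch below does that for the two roles named in this topic, `flows` and `no-flows`. The function `node_role_commands` and the node IDs are hypothetical, and the trailing apply step mirrors the sequence used earlier in this topic.

```python
# Only the two roles described in this topic are accepted here.
VALID_ROLES = {"flows", "no-flows"}


def node_role_commands(assignments):
    """Turn a {node_id: role} plan into tsm commands, followed by a
    single command that applies the pending changes."""
    commands = []
    for node_id, role in assignments.items():
        if role not in VALID_ROLES:
            raise ValueError(f"unsupported role {role!r} for node {node_id}")
        commands.append(f"tsm topology set-node-role -n {node_id} -r {role}")
    commands.append("tsm pending-changes apply")
    return commands


# Hypothetical plan: node1 handles everything except flows, node2 only flows.
plan = {"node1": "no-flows", "node2": "flows"}
role_commands = node_role_commands(plan)
```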
You have successfully added Tableau Prep Conductor to your Tableau Server installation.
Tableau Server Installations using Core-Based licenses
The recommended topology for a production Tableau Server installation is a dedicated node for running flows. For more information, see Minimum Hardware Requirements and Recommendations for Tableau Server.
The Data Management Add-on for Core-Based licenses includes product keys that enable Tableau Prep Conductor for your Tableau Server, and Tableau Prep Conductor cores that come in units of four. The Tableau Prep Conductor cores should be applied to the node dedicated to running the flows. These product keys, like your other server keys, are available through the Customer Portal.
To learn more about Tableau Prep Conductor licensing, see Licensing Tableau Prep Conductor for Tableau Server.
Tableau Server single-node installations
If you currently have a single node Tableau Server installation, it is recommended that you add a second node and dedicate it to running flows.
Run upgrade on your current Tableau Server Installation using the information in the topics below:
Activate the product keys. This will enable Tableau Prep Conductor on the nodes where you already have the Backgrounder process enabled. When you are using core-based licensing, you must apply both the Data Management product key and the Resource Core product key to your Tableau Deployment. The first key allows flows to be run on Tableau Server through Tableau Prep Conductor, and the second key adds the additional cores for the Tableau Prep Conductor nodes. All product keys are available through the Customer Portal.
In the Tableau Services Manager web interface, click Licensing on the Configuration tab and click Activate License.
Enter or paste your new product key and click Activate.
On the Register page, enter your information into the fields and click Register.
You will be prompted to restart the server. Restart the server and verify that Tableau Prep Conductor is enabled and is running.
In the Tableau Services Manager web interface, click the Status tab to see the status. If Tableau Prep Conductor is enabled and running, you should see Tableau Prep Conductor in the list of processes as Active. If Tableau Prep Conductor is not enabled, you will see Tableau Prep Conductor in the list of processes, but with no status information.
Tableau Prep Conductor not enabled:
Tableau Prep Conductor enabled and running:
Add a second node to your Tableau Server installation. The installer will enable certain required processes, like the Cluster Controller. Enable the Backgrounder process on the node, because it is required to run scheduled flow tasks. When you enable the Backgrounder process, the installer automatically enables a single instance of Data Engine and Tableau Prep Conductor on the node. Do not add any other processes on this node.
Important: The number of physical cores on this machine must be equal to or less than the number of Tableau Prep Conductor cores you purchased. For example, if you purchased four Tableau Prep Conductor cores, your node can have at most four physical cores. To understand how Tableau Prep Conductor licensing works, see Licensing Tableau Prep Conductor for Tableau Server.
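The core rule above can be expressed as a simple check before you provision the dedicated node. This Python sketch is an assumption-laden illustration: the function name `node_fits_license` is hypothetical, and it treats the "units of four" packaging described in this topic as a validation rule for the purchased core count.

```python
def node_fits_license(physical_cores, purchased_prep_cores):
    """Return True if a node with `physical_cores` can be covered by
    the purchased Tableau Prep Conductor cores."""
    # Cores are described as coming in units of four, so any other
    # count is treated as a data entry mistake.
    if purchased_prep_cores <= 0 or purchased_prep_cores % 4 != 0:
        raise ValueError("Tableau Prep Conductor cores come in units of four")
    return physical_cores <= purchased_prep_cores
```

For example, `node_fits_license(4, 4)` passes, while an eight-core machine checked against four purchased cores does not.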
Run the following commands to dedicate this node to running only flow tasks. For more information on node roles, see Node Roles in Tableau Server.
List the services on each node to get the nodeID of your dedicated node:
tsm topology list-nodes -v
Set the node role for the dedicated node, using the nodeID returned by the previous command:
tsm topology set-node-role -n nodeID -r flows
Apply the changes and restart the server:
tsm pending-changes apply
Review the status to ensure that all the processes are up and running and configured correctly:
tsm status -v
You have successfully added Tableau Prep Conductor to your Tableau Server installation.
Tableau Server multi-node installations
Run upgrade on your current Tableau Server installation using the information in the topics below:
Activate the product keys. This will enable Tableau Prep Conductor on the nodes where you already have the Backgrounder process enabled. When you are using core-based licensing, you must apply both the Data Management product key and the Resource Core product key to your Tableau Deployment. The first key allows flows to be run on Tableau Server through Tableau Prep Conductor, and the second key adds the additional cores for the Tableau Prep Conductor nodes. All product keys are available through the Customer Portal.
In the Tableau Services Manager web interface, click Licensing on the Configuration tab and click Activate License.
Enter or paste your new product key and click Activate.
On the Register page, enter your information into the fields and click Register.
You will be prompted to restart the server. Restart the server and verify that Tableau Prep Conductor is enabled and is running.
In the Tableau Services Manager web interface, click the Status tab to see the status. If Tableau Prep Conductor is enabled and running, you should see Tableau Prep Conductor in the list of processes as Active. If Tableau Prep Conductor is not enabled, you will see Tableau Prep Conductor in the list of processes, but with no status information.
Tableau Prep Conductor not enabled:
Tableau Prep Conductor enabled and running:
Add a new node to your Tableau Server installation. A dedicated node to run flow-related operations is recommended for production Tableau Server installations. The installer will enable certain required processes, like the Cluster Controller. Enable the Backgrounder process on the node, because it is required to run scheduled flow tasks. When you enable the Backgrounder process, the installer automatically enables a single instance of Data Engine on the node. Do not add any other processes on this node.
Note: The dedicated node counts towards the total count of the Coordination Service ensemble. You may need to deploy a Coordination Service on the new node depending on the total number of nodes you have in your cluster, including the new dedicated node. For more information, see Deploy a Coordination Service Ensemble.
Important: The number of physical cores on this machine must be equal to or less than the number of Tableau Prep Conductor cores you purchased. For example, if you purchased four Tableau Prep Conductor cores, your node can have at most four physical cores. To understand how Tableau Prep Conductor licensing works, see Licensing Tableau Prep Conductor for Tableau Server.
Run the following commands to dedicate this node to running only flow tasks. This will enable Tableau Prep Conductor on your new node. For more information, see Node Roles in Tableau Server.
List the services on each node to get the nodeID of your dedicated node:
tsm topology list-nodes -v
Set the node role for the dedicated node, using the nodeID returned by the previous command:
tsm topology set-node-role -n nodeID -r flows
Apply the changes and restart the server:
tsm pending-changes apply
Review the status to ensure that all the processes are up and running and configured correctly:
tsm status -v
At this stage, you may have Tableau Prep Conductor enabled on other nodes that have the Backgrounder process. By default, the Backgrounder process on a node performs tasks of all types, including flow tasks. To isolate Tableau Prep Conductor and flow operations to only certain nodes, you can configure the Backgrounders to do one of the following:
To run only flow tasks:
tsm topology set-node-role -n nodeID -r flows
To run all tasks except flows:
tsm topology set-node-role -n nodeID -r no-flows
Next step
Have you ever needed to do a little more in the way of data prep than what's provided in the Tableau Data Source tab? If you are not a SQL expert, how do you do the prep required to make your data Tableau friendly? In the past, have you resorted to dumping data to a CSV file or Excel and using Excel for 'cleaning' before bringing data into Tableau? What happens if you have multiple sources? Sure, some things can be done in Tableau Desktop using features such as cross-database joins and pivots, but it can be hard to generate repeatable steps that can be used to transform your data. And manual steps can be difficult to document and share with others. If these scenarios are painfully familiar to you, then Tableau Prep may be just what the (Tableau) doctor ordered.
Tableau Prep was announced under the name Project Maestro at the 2017 Tableau Conference and launched in April 2018. Under the new April 2018 licensing model, it comes with the Tableau Creator license (see Tableau Pricing). According to Tableau, 'Tableau Prep will make it possible for more people, from IT to business users, to easily prep their data with a direct and visual approach.'
I recently had the chance to take part in the beta program for Tableau Prep/Project Maestro. While maybe not as fully featured as some ETL (extract, transformation and load) tools like Informatica and Alteryx, Tableau Prep has some nice features that should make life easier for relatively simple ETL scenarios.
Here's a list of my 10 favorite features.
- Joining disparate data sources. For quite some time, one of Tableau's strengths has been its ability to join multiple data sources. Tableau Desktop lets you combine sources by 'joining' or 'blending.' Similarly, Tableau Prep lets you combine an Oracle Table, a SQL Server table and a Microsoft Excel worksheet into one data source with just a couple of clicks.
While some data prep can be done in Tableau Desktop's data source tab, there are limitations to what can be done. The main differences between Tableau's new Tableau Prep tool and data prep from within Tableau Desktopare in the presentation and in the number of options available.
One such difference is that you can connect to 70 different data sources in Tableau Desktop and in the first production release of Tableau Prep, you can connect to 28 data sources. See below:
Within Tableau Prep, you add connections to these sources. If the connection is a database, such as SQL Server, you specify the schema, the tables and which columns you want to bring in. Once you have your sources set up, you draw a line between them and add a join.
This functionality is similar to the way other ETL tools do joins. It should be pointed out that doing these joins is often easier said than done. When not dealing with simple data like Tableau's Sample Superstore, joins can be tricky. This is often due to differences in the level of detail, mismatched data, etc. But if you have a relatively simple scenario, joining disparate sources can be accomplished in Tableau Prep.
NOTE: Currently, the initial selection of schemas and tables is not quite as intuitive as with some data sources such as Oracle. As with Tableau Desktop, I expect every subsequent release will see improvement in this area.
- Join and union results. With Tableau Desktop you can join data and union data. When doing a join or union in Tableau Desktop, the bottom of the Data Source tab will show your result. But often you want to see the results of what joined (given your join condition) AND what didn't. In Tableau Prep, you get a Summary of Join Results that shows unmatched values. This feature can be useful for evaluating/debugging join conditions and for validating data.
See the bottom right corner for an example of Summary Join Results.
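To make the idea of a join summary concrete outside of Tableau Prep, here is a minimal Python sketch. It is not how Tableau Prep computes its summary; the function name `summarize_join` and the sample rows are hypothetical, chosen only to show matched versus unmatched key values.

```python
def summarize_join(left_rows, right_rows, key):
    """Report which key values matched and which appear on only one side."""
    left_keys = {row[key] for row in left_rows}
    right_keys = {row[key] for row in right_rows}
    return {
        "matched": sorted(left_keys & right_keys),
        "left_only": sorted(left_keys - right_keys),
        "right_only": sorted(right_keys - left_keys),
    }


# Hypothetical sample data: two sources that share a "region" field.
orders = [{"region": "East"}, {"region": "West"}, {"region": "North"}]
targets = [{"region": "East"}, {"region": "West"}, {"region": "South"}]

summary = summarize_join(orders, targets, "region")
for side, values in summary.items():
    print(side, values)
```

The unmatched lists are the part that helps when debugging a join condition: a value that shows up on only one side usually points to mismatched data or a difference in level of detail.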
- Preview in Tableau Desktop. After you have done a 'step' to transform your data, you can use Preview in Tableau Desktop to look at the data in Tableau Desktop. This feature provides a quick way to validate the data produced by the step.
- Aggregate data. Tableau Desktop offers multiple ways to aggregate, or summarize, data. For example, you can create a Tableau extract and select Aggregate data for visible dimensions, or you can use sets or groups to aggregate data and summarize it into something like Total Sales by Region. But if you just want a quick aggregation of your data to store in a data source or Tableau data extract (TDE), you can do this easily with Tableau Prep. You can add a step to aggregate and simply drag and drop your grouped fields and your aggregated fields. In the example below, Discount, Profit, Quantity and Sales are aggregated by Year of Sales, Region and City.
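For readers who think in code, an aggregate step amounts to a group-by with sums. The following is a rough Python sketch under that assumption; the `aggregate` helper and the sample rows are hypothetical and are not taken from Tableau Prep.

```python
from collections import defaultdict


def aggregate(rows, group_fields, sum_fields):
    """Sum each field in sum_fields for every distinct combination of group_fields."""
    totals = defaultdict(lambda: dict.fromkeys(sum_fields, 0))
    for row in rows:
        group = tuple(row[field] for field in group_fields)
        for field in sum_fields:
            totals[group][field] += row[field]
    return dict(totals)


# Hypothetical sample rows, loosely modeled on a sales data set.
rows = [
    {"Region": "East", "City": "Boston", "Sales": 100, "Quantity": 2},
    {"Region": "East", "City": "Boston", "Sales": 50, "Quantity": 1},
    {"Region": "West", "City": "Seattle", "Sales": 75, "Quantity": 3},
]

result = aggregate(rows, ["Region", "City"], ["Sales", "Quantity"])
for group, sums in result.items():
    print(group, sums)
```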
- Wildcard union. Sometimes you want to combine similar files into one data source (e.g., monthly sales files). In Tableau Desktop you can do a Union to accomplish this. Tableau Prep takes this to the next level by providing Wildcard Unions. With Wildcard Unions you specify a file path or directory and union all files that are in that location (irrelevant files can be excluded). You can then merge these files into one. The resulting merged file contains a File Paths column that holds the file path of the original source. When doing unions, you can also merge similar fields that have different names. See below for an example of a Wildcard Union:
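As a rough analogy in code (not Tableau Prep's implementation), a wildcard union can be sketched in Python with the standard `glob` and `csv` modules. The function name `wildcard_union` is hypothetical; the `File Paths` column name mirrors the one described above.

```python
import csv
import glob
import os


def wildcard_union(directory, pattern):
    """Union every CSV in `directory` whose name matches `pattern`.

    Each output row gets a "File Paths" entry recording the file it came from.
    Files that do not match the pattern are simply left out.
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["File Paths"] = path
                rows.append(row)
    return rows
```

A call such as `wildcard_union("monthly_sales", "sales_*.csv")` (both names are placeholders) would return the rows of every matching file in one list. Unlike Tableau Prep, this sketch does not merge similar fields that have different names.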
- Pivot for database tables. In Tableau Desktop, you can pivot data in Excel or CSV files. If you want to re-structure, or pivot, data stored in a database such as SQL Server or Oracle, you have to create a new table or use custom SQL. In Tableau Prep, 'pivot' appears to be a valid option for database tables.
Below are examples of pivots using SQL Server and Oracle tables as data sources:
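Independent of the screenshots, the reshaping itself can be illustrated in a few lines of Python. This sketch assumes a columns-to-rows pivot; the helper `pivot_columns_to_rows`, the output column names `Field` and `Value`, and the sample row are all hypothetical.

```python
def pivot_columns_to_rows(rows, id_fields, value_fields,
                          name_col="Field", value_col="Value"):
    """Turn each listed value column into its own row, keeping the id fields."""
    pivoted = []
    for row in rows:
        for field in value_fields:
            new_row = {f: row[f] for f in id_fields}
            new_row[name_col] = field      # the original column name
            new_row[value_col] = row[field]  # the value that column held
            pivoted.append(new_row)
    return pivoted


# Hypothetical wide table: one column per year.
wide = [{"Region": "East", "2017": 100, "2018": 120}]

tall = pivot_columns_to_rows(wide, ["Region"], ["2017", "2018"])
for row in tall:
    print(row)
```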
- Edit and clean data. This is big and has lots of use cases. Often when you bring data into Tableau, data integrity issues become apparent. Wouldn't it be nice to quickly clean your data before bringing it into Tableau? Let's say most of your states were entered using a two-character capital abbreviation like AZ, but some states were fully spelled out. Ideally, you would fix data integrity issues at the source. But sometimes you just need a quick fix. With Tableau Prep you can now clean your data and edit values. See below for an example of edit values:
In Tableau Prep, you can create one step that does multiple 'cleaning' functions like filter, change data types, rename and remove fields. See below for a 'Fix Dates' step that performs multiple functions:
- Group and replace. Let's say you have the following values: 'USA,' 'United States' and 'U.S.A.' You know these are all USA and should be grouped together. Currently, you can group these in Tableau Desktop or fix your data at the source. With Tableau Prep you also have an option to Group and Replace, saving your new grouping as part of your data source.
See below for an example (notice you even can do it by Pronunciation!):
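The effect of grouping and replacing can be approximated in Python with a normalization step and a lookup table. This is only a sketch of the manual-grouping idea: it does not reproduce Tableau Prep's pronunciation-based matching, and the `CANONICAL` mapping and helper names are hypothetical.

```python
import re

# Hypothetical mapping from normalized spellings to the value we want to keep.
CANONICAL = {
    "usa": "USA",
    "unitedstates": "USA",
}


def normalize(value):
    """Lowercase the value and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def group_and_replace(values):
    """Replace known variants with their canonical value; leave others untouched."""
    return [CANONICAL.get(normalize(value), value) for value in values]


cleaned = group_and_replace(["USA", "United States", "U.S.A.", "Canada"])
print(cleaned)
```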
- Data profile. In Tableau Desktop's Data Source pane, row level data is displayed at the bottom of the screen, but you can't see how the data is distributed. With Tableau Prep, the screen is divided into three panes: the top pane has a data flow or a graphical representation of the work flow, the middle pane has a data summary or profile and the bottom pane displays the row level data. The profile section in the middle has histograms to depict the frequency of values within columns, making it very easy to see how data is distributed.
See below:
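The profile pane is essentially a frequency count per column. A minimal Python sketch of that idea, assuming rows held as dictionaries (the `profile` helper and the sample values are hypothetical):

```python
from collections import Counter


def profile(rows, field):
    """Count how often each value of `field` occurs, most frequent first."""
    counts = Counter(row[field] for row in rows)
    return counts.most_common()


# Hypothetical sample rows.
rows = [
    {"State": "AZ"}, {"State": "AZ"}, {"State": "AZ"},
    {"State": "CA"}, {"State": "CA"},
    {"State": "NY"},
]

distribution = profile(rows, "State")
for value, count in distribution:
    # A crude text histogram: one '#' per occurrence.
    print(f"{value:<4}{'#' * count}")
```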
- Graphical depiction of steps taken to transform data. As you can see above, steps taken to transform the data are graphically depicted and put into one self-documenting flow. You can also click on any of the steps to see what the data looks like at any given stage of transformation. This is what more complex ETL tools have been doing for years and it is really useful. These steps can then be shared and run as a 'flow,' which can be published as a Hyper, TDE or saved to a file.
CONCLUSION
Tableau Prep has some really good, time-saving features that will allow you to produce Tableau-friendly data. It will be a good alternative to manually scrubbing data, using steps that often are not documented or repeatable. For simple transformation logic, Tableau Prep should do everything that is required. Even though Tableau Prep is good at creating Tableau Data Extracts and text-based files, often it can be better to build a database repository or data warehouse that can be leveraged by multiple reporting tools. For this type of work there are other ETL (extraction, transformation and load) tools that might better suit your needs. Tools such as Alteryx and Informatica have more data output options (e.g., database tables) and more capabilities when it comes to predictive modeling, statistical analysis, geospatial manipulation, mapping and valuable built-in demographic data for enhancing a dataset. At Senturus, we believe there is no one-size-fits-all tool for data preparation. There is a 'right tool for the job' and we can help you determine what tool might best fit your needs.
It is a well known fact that data preparation is often 80% of the work when building out business analytics frameworks. For more complex data work, expert advice is often needed to make sense of the underlying data sources so they can be joined into a cohesive, well-designed data model that can be used by multiple reporting tools. At Senturus, we have been doing just that for nearly two decades. We make sense of what is complex by designing and building intuitive data structures that can be easily leveraged by tools such as Tableau.
This blog was submitted by our own Monica Van Loon. A frequent contributor to our blog, Monica is a Tableau certified consultant and teaches many of our Tableau workshops.
Senturus is a nationwide business analytics consulting firm and a Tableau partner. We were in no part solicited or paid for this review. The views and opinions expressed in this article are those of the author and do not necessarily reflect those of any other related party.
This topic describes how to enable Tableau Prep Conductor on your existing installation of Tableau Server.
Tableau Prep Conductor is supported only on Tableau Server versions 2019.1 or later. If you are using Tableau Server 2018.3 or earlier, you must first upgrade your Tableau Server to 2019.1 before enabling Tableau Prep Conductor on your Tableau Server installation.
Tableau Prep Conductor is licensed through the Data Management Add-on, on a per-Deployment basis, which is User-Based or Core-Based. A Deployment includes a licensed production Tableau Server installation and licensed non-production Tableau Server installations that support the production installation. For more information on Deployment, see the EULA Documentation.
Before you upgrade
Prepare for upgrade:
Configure public gateway settings
If your Tableau Server is set up with one of the following:
- A load balancer to distribute requests across gateways.
- A reverse proxy to authenticate external (internet) client requests and offload SSL-based encryption.
You must configure the following public gateway settings:
tsm configuration set -k gateway.public.host -v <name>
(<name> is a placeholder; it should be the URL that your users use to access Tableau Server.)
tsm configuration set -k gateway.public.port -v 443
For more information on configuring gateway settings, see Configuring Proxies for Tableau Server.
Tableau Server Installations using User-Based licenses
The recommended topology for a production Tableau Server installation is a dedicated node for running flows. For more information, see Minimum Hardware Requirements and Recommendations for Tableau Server.
Tableau Server single-node installations
If you currently have a single node Tableau Server installation, it is recommended that you add a second node and dedicate it to running flows.
Run the upgrade on your current Tableau Server installation using the information in the topics below:
When you get to the Activate step, use the Tableau Server product keys to activate Tableau Server.
All product keys are available through the Customer Portal.
After completing the installation, add the Data Management product key to enable Tableau Prep Conductor on your node. The Data Management product key, like your other server keys, is available through the Customer Portal.
In the Tableau Services Manager web interface, click Licensing on the Configuration tab and click Activate License.
Enter or paste your new product key and click Activate.
On the Register page, enter your information into the fields and click Register.
You will be prompted to restart the server. Restart the server and verify that Tableau Prep Conductor is enabled and running.
In the Tableau Services Manager web interface, click the Status tab to see the status. If Tableau Prep Conductor is enabled and running, you should see Tableau Prep Conductor in the list of processes as Active. If Tableau Prep Conductor is not enabled, you will see Tableau Prep Conductor in the list of processes, but with no status information.
Tableau Prep Conductor not enabled:
Tableau Prep Conductor enabled and running:
- Add a second node to your Tableau Server installation. The installer will enable certain required processes like the Cluster Controller. Enable the Backgrounder process on this node, as it is required to run scheduled flow tasks. When you enable the Backgrounder process, the installer automatically enables a single instance of Data Engine and Tableau Prep Conductor on the node. Do not add any other processes on this node.
Run the following commands to dedicate this node to running only flow tasks. For more information on node roles, see Node Roles in Tableau Server.
Get the nodeID for your dedicated node by listing the services on each node:
tsm topology list-nodes -v
Set the node role for the dedicated node using the nodeID returned by the previous command:
tsm topology set-node-role -n nodeID -r flows
Apply the changes and restart the server:
tsm pending-changes apply
Review the status to ensure that all the processes are up and running and configured correctly:
tsm status -v
You have successfully added Tableau Prep Conductor to your Tableau Server installation.
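If you script your deployments, the four commands used to dedicate a node can be assembled in order. The Python sketch below only builds the argument lists and prints them; it does not run anything. The helper name `dedicate_node_commands` and the node ID `node2` are hypothetical, while the `tsm` commands and flags are the ones listed in the procedure.

```python
def dedicate_node_commands(node_id):
    """Return, in order, the tsm commands that dedicate `node_id` to flow tasks."""
    return [
        # 1. List nodes and their services to confirm the node ID.
        ["tsm", "topology", "list-nodes", "-v"],
        # 2. Restrict the node's Backgrounder to flow tasks.
        ["tsm", "topology", "set-node-role", "-n", node_id, "-r", "flows"],
        # 3. Apply the pending change (this restarts the server).
        ["tsm", "pending-changes", "apply"],
        # 4. Check that all processes are running.
        ["tsm", "status", "-v"],
    ]


commands = dedicate_node_commands("node2")  # "node2" is a placeholder node ID
for command in commands:
    print(" ".join(command))
```

Keeping each command as a list of arguments makes it straightforward to hand them to `subprocess.run` later, once you have confirmed the node ID on your own installation.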
Tableau Server multi-node installations
Run the upgrade on your current Tableau Server installation using the information in the topics below:
When you get to the Activate step, use the Tableau Server product keys to activate Tableau Server.
All product keys are available through the Customer Portal.
After completing the installation, add the Data Management product key to enable Tableau Prep Conductor. Tableau Prep Conductor is automatically enabled on the nodes where you already have the Backgrounder process enabled. The Data Management product key, like your other server keys, is available through the Customer Portal.
In the Tableau Services Manager web interface, click Licensing on the Configuration tab and click Activate License.
Enter or paste your new product key and click Activate.
On the Register page, enter your information into the fields and click Register.
You will be prompted to restart the server. Restart the server and verify that Tableau Prep Conductor is enabled and is running.
In the Tableau Services Manager web interface, click the Status tab to see the status of all the processes. If Tableau Prep Conductor is enabled and running, you should see Tableau Prep Conductor in the list of processes as Active. If Tableau Prep Conductor is not enabled, you will see Tableau Prep Conductor in the list of processes, but with no status information.
Tableau Prep Conductor not enabled:
Tableau Prep Conductor enabled and running:
Add a new node to your Tableau Server installation. The installer will enable certain required processes like the Cluster Controller. Enable the Backgrounder process on this node, as it is required to run scheduled flow tasks. When you enable the Backgrounder process, the installer automatically enables a single instance of Data Engine and Tableau Prep Conductor on the node. Do not add any other processes on this node.
Note: The dedicated node counts towards the total count of the Coordination Service ensemble. You may need to deploy a Coordination Service on the new node depending on the total number of nodes you have in your cluster, including the new dedicated node. For more information, see Deploy a Coordination Service Ensemble.
Run the following commands to dedicate this node to running only flow-related operations. For more information on node roles, see Node Roles in Tableau Server.
- Get the nodeID for your dedicated node by listing the services on each node:
tsm topology list-nodes -v
- Set the node role for the dedicated node using the nodeID returned by the previous command:
tsm topology set-node-role -n nodeID -r flows
- Apply the changes and restart the server:
tsm pending-changes apply
- Review the status to ensure that all the processes are up and running and configured correctly:
tsm status -v
- At this stage, you may have Tableau Prep Conductor enabled on other nodes. By default, the Backgrounder process on a node performs tasks of all types, including flow tasks. To isolate Tableau Prep Conductor and flow tasks to only certain nodes, you can configure the Backgrounders to do one of the following:
To run only flow tasks:
tsm topology set-node-role -n nodeID -r flows
To run all other tasks except flows:
tsm topology set-node-role -n nodeID -r no-flows
You have successfully added Tableau Prep Conductor to your Tableau Server installation.
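In a multi-node cluster you may want to decide per node whether its Backgrounder runs only flows or everything except flows. A small Python sketch of that decision follows; it only builds the command lines. The helper `node_role_commands` and the node IDs are hypothetical, and the roles `flows` and `no-flows` are the two described in the procedure.

```python
def node_role_commands(flow_nodes, other_nodes):
    """Build one set-node-role command per node.

    Nodes in `flow_nodes` run only flow tasks; nodes in `other_nodes`
    run every task type except flows.
    """
    commands = []
    for node_id in flow_nodes:
        commands.append(["tsm", "topology", "set-node-role", "-n", node_id, "-r", "flows"])
    for node_id in other_nodes:
        commands.append(["tsm", "topology", "set-node-role", "-n", node_id, "-r", "no-flows"])
    return commands


# Placeholder node IDs; use the ones reported by your own installation.
role_commands = node_role_commands(flow_nodes=["node3"], other_nodes=["node1", "node2"])
for command in role_commands:
    print(" ".join(command))
```

As with any topology change, the commands take effect only after the pending changes are applied.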
Tableau Server Installations using Core-Based licenses
The recommended topology for a production Tableau Server installation is a dedicated node for running flows. For more information, see Minimum Hardware Requirements and Recommendations for Tableau Server.
The Data Management Add-on for Core-Based licenses includes product keys that enable Tableau Prep Conductor for your Tableau Server, and Tableau Prep Conductor cores that come in units of four. The Tableau Prep Conductor cores should be applied to the node dedicated to running flows. These product keys, like your other server keys, are available through the Customer Portal.
To learn more about Tableau Prep Conductor licensing, see Licensing Tableau Prep Conductor for Tableau Server.
Tableau Server single-node installations
If you currently have a single node Tableau Server installation, it is recommended that you add a second node and dedicate it to running flows.
Run the upgrade on your current Tableau Server installation using the information in the topics below:
Activate the product keys. This will enable Tableau Prep Conductor on the nodes where you already have the Backgrounder process enabled. When you are using core-based licensing, you must apply both the Data Management product key and the Resource Core product key to your Tableau Deployment. The first key allows flows to be run on Tableau Server through Tableau Prep Conductor, and the second key adds the additional cores for the Tableau Prep Conductor nodes. All product keys are available through the Customer Portal.
In the Tableau Services Manager web interface, click Licensing on the Configuration tab and click Activate License.
Enter or paste your new product key and click Activate.
On the Register page, enter your information into the fields and click Register.
You will be prompted to restart the server. Restart the server and verify that Tableau Prep Conductor is enabled and is running.
In the Tableau Services Manager web interface, click the Status tab to see the status. If Tableau Prep Conductor is enabled and running, you should see Tableau Prep Conductor in the list of processes as Active. If Tableau Prep Conductor is not enabled, you will see Tableau Prep Conductor in the list of processes, but with no status information.
Tableau Prep Conductor not enabled:
Tableau Prep Conductor enabled and running:
Add a second node to your Tableau Server installation. The installer will enable certain required processes like the Cluster Controller. Enable the Backgrounder process on this node, as it is required to run scheduled flow tasks. When you enable the Backgrounder process, the installer automatically enables a single instance of Data Engine and Tableau Prep Conductor on the node. Do not add any other processes on this node.
Important: The number of physical cores on this machine must be equal to or less than the number of Tableau Prep Conductor cores you purchased. For example, if you purchased four Tableau Prep Conductor cores, your node can have at most four physical cores. To understand how Tableau Prep Conductor licensing works, see Licensing Tableau Prep Conductor for Tableau Server.
Run the following commands to dedicate this node to running only flow tasks. For more information on node roles, see Node Roles in Tableau Server.
Get the nodeID for your dedicated node by listing the services on each node:
tsm topology list-nodes -v
Set the node role for the dedicated node using the nodeID returned by the previous command:
tsm topology set-node-role -n nodeID -r flows
Apply the changes and restart the server:
tsm pending-changes apply
Review the status to ensure that all the processes are up and running and configured correctly:
tsm status -v
You have successfully added Tableau Prep Conductor to your Tableau Server installation.
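The core-count rule for core-based licensing is simple arithmetic, and it can be worth checking before you provision the dedicated node. The Python sketch below encodes the two constraints stated in this topic (Conductor cores come in units of four, and the node's physical cores must not exceed the purchased cores). The function name `node_fits_license` is hypothetical.

```python
def node_fits_license(physical_cores, purchased_conductor_cores):
    """Return True if a node with `physical_cores` is covered by the purchased cores."""
    if purchased_conductor_cores <= 0 or purchased_conductor_cores % 4 != 0:
        # Tableau Prep Conductor cores are sold in units of four.
        raise ValueError("purchased cores must be a positive multiple of four")
    return physical_cores <= purchased_conductor_cores


print(node_fits_license(4, 4))  # a four-core node with four purchased cores
print(node_fits_license(8, 4))  # an eight-core node with four purchased cores
```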
Tableau Server multi-node installations
Run the upgrade on your current Tableau Server installation using the information in the topics below:
Activate the product keys. This will enable Tableau Prep Conductor on the nodes where you already have the Backgrounder process enabled. When you are using core-based licensing, you must apply both the Data Management product key and the Resource Core product key to your Tableau Deployment. The first key allows flows to be run on Tableau Server through Tableau Prep Conductor, and the second key adds the additional cores for the Tableau Prep Conductor nodes. All product keys are available through the Customer Portal.
In the Tableau Services Manager web interface, click Licensing on the Configuration tab and click Activate License.
Enter or paste your new product key and click Activate.
On the Register page, enter your information into the fields and click Register.
You will be prompted to restart the server. Restart the server and verify that Tableau Prep Conductor is enabled and is running.
In the Tableau Services Manager web interface, click the Status tab to see the status. If Tableau Prep Conductor is enabled and running, you should see Tableau Prep Conductor in the list of processes as Active. If Tableau Prep Conductor is not enabled, you will see Tableau Prep Conductor in the list of processes, but with no status information.
Tableau Prep Conductor not enabled:
Tableau Prep Conductor enabled and running:
Add a new node to your Tableau Server installation. A dedicated node to run flow-related operations is recommended for production Tableau Server installations. The installer will enable certain required processes like the Cluster Controller. Enable the Backgrounder process on this node, as it is required to run scheduled flow tasks. When you enable the Backgrounder process, the installer automatically enables a single instance of Data Engine on the node. Do not add any other processes on this node.
Note: The dedicated node counts towards the total count of the Coordination Service ensemble. You may need to deploy a Coordination Service on the new node depending on the total number of nodes you have in your cluster, including the new dedicated node. For more information, see Deploy a Coordination Service Ensemble.
Important: The number of physical cores on this machine must be equal to or less than the number of Tableau Prep Conductor cores you purchased. For example, if you purchased four Tableau Prep Conductor cores, your node can have at most four physical cores. To understand how Tableau Prep Conductor licensing works, see Licensing Tableau Prep Conductor for Tableau Server.
Run the following commands to dedicate this node to running only flow tasks. This will enable Tableau Prep Conductor on your new node. For more information, see Node Roles in Tableau Server.
Get the nodeID for your dedicated node by listing the services on each node:
tsm topology list-nodes -v
Set the node role for the dedicated node using the nodeID returned by the previous command:
tsm topology set-node-role -n nodeID -r flows
Apply the changes and restart the server:
tsm pending-changes apply
Review the status to ensure that all the processes are up and running and configured correctly:
tsm status -v
At this stage, you may have Tableau Prep Conductor enabled on other nodes that have the Backgrounder process. By default, the Backgrounder process on a node performs tasks of all types, including flow tasks. To isolate Tableau Prep Conductor and flow operations to only certain nodes, you can configure the Backgrounders to do one of the following:
To run only flow tasks:
tsm topology set-node-role -n nodeID -r flows
To run all other tasks except flows:
tsm topology set-node-role -n nodeID -r no-flows
Next step
Step 2: Configure Flow Settings for your Tableau Server.
Who can do this
Tableau Server Administrators can install or upgrade Tableau Server, and enable Tableau Prep Conductor on Tableau Server.