LUX - The next level of EDA sophistication
“Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those that we believe to be there.” — John W Tukey
The importance of data visualization as part of exploratory data analysis (EDA) cannot be emphasized enough. The hurdle is that the real goal of exploration sometimes gets lost in a maze of ideas and the code and tools needed to implement them. Although we have hundreds of visualization libraries, most of them require users to write a substantial amount of code to plot even a single graph. This shifts the focus to the mechanics of the visualization rather than the critical relationships within the data.
This is where Lux comes in: a Python library that simplifies data exploration by recommending relevant visualizations to the user.
Current challenges to efficient data exploration:
Even with the advanced and powerful tools available today, several challenges still hinder the flow of data exploration. This is especially true when we go from a question in our minds to discovering actionable insights. The major identifiable obstacles are:
The disconnect between code and interactive tools
Plotting requires lots of code and prior decisions
Trial-and-error is tedious and overwhelming
There is an apparent gap between how people reason about their data and what actually needs to be done to the data to reach these insights. Lux is a step toward closing that gap.
Lux helps users explore and discover meaningful insights from their data by automating certain aspects of data exploration. It features an intent language that lets users specify their analysis intent loosely, without spelling out every detail; Lux then infers the unspecified details and determines appropriate visualization mappings. The goal is to make it easier for data scientists to explore their data even when they don’t have a clear idea of what they’re looking for.
Lux offers three main features:
Interactive visualizations embedded directly in Jupyter notebooks.
A powerful intent language that lets users specify their analysis interests, lowering the programming cost.
Automatic visualization recommendations for data frames.
How to Install
pip install lux-api
# Activate the extension for Jupyter Notebook
jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget
How to use
Importing the libraries
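A minimal sketch of the imports, assuming pandas and the lux-api package are installed. The try/except guard and the LUX_AVAILABLE flag are additions for this example, not something Lux requires; they only let the cell run in environments where Lux is missing.

```python
import pandas as pd

try:
    # Importing lux is what attaches the Lux widget to pandas data frames.
    import lux
    LUX_AVAILABLE = True
except ImportError:
    # Assumption for this sketch: carry on with plain pandas if Lux is absent.
    LUX_AVAILABLE = False

print("pandas version:", pd.__version__)
print("Lux available:", LUX_AVAILABLE)
```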
Now let's read a dataset that is publicly available on GitHub.
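The dataset URL is not reproduced here. Because the intent example later uses the columns Ship Mode and Profit, the sketch below assumes a Superstore-style sales table and builds a tiny stand-in with invented values. DATA_URL and the remaining column names are hypothetical; with a real CSV you would call pd.read_csv on its raw GitHub URL instead.

```python
import pandas as pd

# Hypothetical placeholder: substitute the raw GitHub URL of your CSV file.
DATA_URL = "https://raw.githubusercontent.com/<user>/<repo>/<branch>/<file>.csv"
# df = pd.read_csv(DATA_URL)

# Stand-in data with invented values, only so the snippet runs offline.
df = pd.DataFrame(
    {
        "Row ID": [1, 2, 3, 4, 5, 6],
        "Ship Mode": ["First Class", "First Class", "Second Class",
                      "Standard Class", "Standard Class", "Standard Class"],
        "Region": ["West", "East", "West", "East", "West", "West"],
        "Sales": [120.0, 340.0, 210.0, 80.0, 150.0, 190.0],
        "Profit": [10.0, 30.0, 20.0, 5.0, 10.0, 15.0],
    }
)

print(df.head())
```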
A nice thing about Lux is that it works with pandas data frames as they are and requires no changes to existing syntax. For instance, if you drop a column or row, the recommendations are regenerated based on the updated data frame.
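As a small illustration, the column drop itself is ordinary pandas. The data frame and its column names below are invented for the example.

```python
import pandas as pd

# Invented stand-in data for illustration.
df = pd.DataFrame(
    {
        "Row ID": [1, 2, 3],
        "Ship Mode": ["First Class", "Second Class", "Standard Class"],
        "Sales": [120.0, 210.0, 80.0],
        "Profit": [10.0, 20.0, 5.0],
    }
)

# Ordinary pandas call; nothing Lux-specific is needed here.
df = df.drop(columns=["Row ID"])

# In a notebook with Lux loaded, displaying df again would show
# recommendations based on the remaining columns only.
print(df.columns.tolist())
```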
When we print out the data frame, we see the default pandas table display. Pressing the 'Toggle Pandas/Lux' button switches to a set of recommendations generated automatically by Lux.
The recommendations are organized into tabs, each representing a potential next step in the exploration.
The Correlation Tab shows a set of pairwise relationships among quantitative attributes, ranked from most to least correlated.
The Distribution Tab shows a set of univariate distributions, ranked from most to least skewed.
The Occurrence Tab shows a set of bar charts generated for the categorical features in the data set.
The Temporal Tab shows a set of distribution charts for temporal features, if any are present in the data.
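To get a feel for what the Correlation and Distribution rankings mean, here is a rough sketch in plain pandas. It assumes Pearson correlation and sample skewness as the ranking scores, which may differ from what Lux computes internally, and it uses invented numbers.

```python
import itertools

import pandas as pd

# Invented numeric data for illustration only.
df = pd.DataFrame(
    {
        "Sales": [100, 200, 300, 400, 500, 600],
        "Profit": [10, 20, 30, 40, 50, 60],
        "Quantity": [1, 1, 1, 1, 1, 10],
    }
)

# Correlation-style ranking: order attribute pairs by absolute Pearson correlation.
corr = df.corr().abs()
ranked_pairs = sorted(
    itertools.combinations(df.columns, 2),
    key=lambda pair: corr.loc[pair[0], pair[1]],
    reverse=True,
)

# Distribution-style ranking: order attributes by absolute skewness.
ranked_by_skew = df.skew().abs().sort_values(ascending=False)

print("Most correlated pair:", ranked_pairs[0])
print("Most skewed attribute:", ranked_by_skew.index[0])
```

Here Sales and Profit move in lockstep, so that pair ranks first, while Quantity has a single large outlier and is therefore the most skewed attribute.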
This is the most interesting part of Lux. Beyond the basic recommendations, we can also specify our analysis intent. For example, to say that we are interested in how profit relates to ship mode, we set the intent as follows:
df.intent = ['Ship Mode','Profit']
df
When we print out the data frame again, the recommendations are steered toward what is relevant to the intent we’ve specified.
On the left-hand side of the widget, we see the Current Visualization for the attributes we have selected. On the right-hand side, the Enhance tab shows what happens when we add one more attribute to the current selection. There is also a Filter tab, which applies filters to the data while keeping the selected attributes fixed.
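The following pandas sketch approximates what these views summarize for the intent above. It assumes the chart for a categorical attribute paired with a quantitative one is a bar chart of averages, which is an assumption here rather than something the text confirms. The Region column and all values are invented.

```python
import pandas as pd

# Invented stand-in data for illustration.
df = pd.DataFrame(
    {
        "Ship Mode": ["First Class", "First Class", "Second Class",
                      "Standard Class", "Standard Class", "Standard Class"],
        "Region": ["West", "East", "West", "East", "West", "West"],
        "Profit": [10, 30, 20, 5, 10, 15],
    }
)

# Current Visualization, approximated: average Profit for each Ship Mode.
current = df.groupby("Ship Mode")["Profit"].mean()

# Filter, approximated: the same summary restricted to one subset of rows.
west_only = df[df["Region"] == "West"].groupby("Ship Mode")["Profit"].mean()

print(current)
print(west_only)
```

Enhance would correspond to bringing a further attribute into the summary, for example grouping by both Ship Mode and Region.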
Exporting visualizations from Widget
Lux also makes it easy to export and share the generated visualizations. They can be exported to a static HTML file.
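A hedged sketch of the export step, using the save_as_html method that Lux adds to data frames. The export_to_html helper, the file name, and the sample data are hypothetical, and the guard only exists so the snippet does nothing harmful where Lux is not installed.

```python
import pandas as pd

try:
    import lux  # imported before the data frame is built so Lux can wrap it
except ImportError:
    lux = None  # assumption for this sketch: skip the export if Lux is absent

# Invented stand-in data for illustration.
df = pd.DataFrame(
    {
        "Ship Mode": ["First Class", "First Class", "Second Class",
                      "Standard Class", "Standard Class", "Standard Class"],
        "Sales": [120.0, 340.0, 210.0, 80.0, 150.0, 190.0],
        "Profit": [10.0, 30.0, 20.0, 5.0, 10.0, 15.0],
    }
)


def export_to_html(frame, filename="lux_export.html"):
    """Save the Lux widget for `frame` as static HTML; return True if it ran."""
    if lux is None or not hasattr(frame, "save_as_html"):
        return False
    frame.save_as_html(filename)
    return True


exported = export_to_html(df)
print("Exported:", exported)
```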
We can also access the set of recommendations generated for a data frame via its recommendation property. The output is a dictionary, keyed by the name of the recommendation category.
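A short sketch of reading that dictionary, assuming Lux is installed. The sample data is invented, and the fallback to an empty dictionary is only there so the snippet still runs without Lux.

```python
import pandas as pd

try:
    import lux  # imported before the data frame is built so Lux can wrap it
except ImportError:
    lux = None  # assumption for this sketch: fall back to an empty result

# Invented stand-in data for illustration.
df = pd.DataFrame(
    {
        "Ship Mode": ["First Class", "First Class", "Second Class",
                      "Standard Class", "Standard Class", "Standard Class"],
        "Sales": [120.0, 340.0, 210.0, 80.0, 150.0, 190.0],
        "Profit": [10.0, 30.0, 20.0, 5.0, 10.0, 15.0],
    }
)

# With Lux loaded, df.recommendation is a dictionary keyed by category name.
recs = df.recommendation if lux is not None else {}

for category, vis_list in recs.items():
    print(category, "->", len(vis_list), "visualizations")
```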
Note: More features need to be explored and will be discussed in the coming articles. You can also look into the official documentation for more specific details.