Migrate from Jupyter - persistent sql state wrt temp tables across notebook cells

Hello -

Our team is looking to migrate to Hue from a setup where users run Jupyter on their own workstations, but we’re seeing something that’s a blocker at this point.

We’re used to breaking up large and complex SQL queries across multiple notebook cells, and interleaving instructions and sanity checks throughout as markdown.

In testing Hue we’re seeing that each cell appears to be a new connection to the database, so per-connection temp tables are going away.
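
The symptom itself can be reproduced outside Hue with any database that scopes temp tables to a connection. Below is a minimal sketch, using SQLite from Python's standard library as a stand-in for Redshift or SQL Server (an assumption made only so the example is self-contained). The `staging` table and the `table_visible` helper are made up for illustration, and the two connections just play the role of two notebook cells.

```python
import os
import sqlite3
import tempfile


def table_visible(conn, name):
    """Return True if `name` can be queried on this connection."""
    try:
        conn.execute(f"SELECT COUNT(*) FROM {name}")
        return True
    except sqlite3.OperationalError:
        # SQLite raises OperationalError ("no such table") here.
        return False


# A file-backed database, so both connections point at the same database.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# "Cell 1": create and fill a temp table on its own connection.
cell1 = sqlite3.connect(db_path)
cell1.execute("CREATE TEMP TABLE staging (id INTEGER)")
cell1.execute("INSERT INTO staging VALUES (1)")
same_conn_sees_it = table_visible(cell1, "staging")

# "Cell 2": a brand-new connection to the same database.
cell2 = sqlite3.connect(db_path)
new_conn_sees_it = table_visible(cell2, "staging")

cell1.close()
cell2.close()

print(same_conn_sees_it, new_conn_sees_it)
```

The temp table is visible on the connection that created it and missing on the second one, which matches what we see between cells.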

Is this expected behavior, or did we screw up some configuration along the way? Is this not even an appropriate use case for Hue notebooks?

We’re using the SQLAlchemy drivers, against both Redshift and SQL Server.

It should be only one new session per notebook, and when using SQLAlchemy each user points to their own engine.
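
As a rough illustration of that expectation (not Hue's actual code), the idea of one session per notebook can be sketched as a cache of connections keyed by user and notebook. `SessionRegistry`, the user name, and the notebook ids are all hypothetical, and SQLite stands in for the real database.

```python
import sqlite3


class SessionRegistry:
    """Hypothetical helper: one cached connection per (user, notebook)."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument factory returning a connection
        self._sessions = {}

    def get(self, user, notebook_id):
        key = (user, notebook_id)
        if key not in self._sessions:
            # First cell of this notebook: open the session once.
            self._sessions[key] = self._connect()
        return self._sessions[key]

    def close_all(self):
        for conn in self._sessions.values():
            conn.close()
        self._sessions.clear()


registry = SessionRegistry(lambda: sqlite3.connect(":memory:"))

# "Cell 1" of notebook nb-1 creates and fills a temp table.
registry.get("alice", "nb-1").execute("CREATE TEMP TABLE staging (id INTEGER)")
registry.get("alice", "nb-1").execute("INSERT INTO staging VALUES (1), (2)")

# "Cell 2" of the same notebook gets the same connection back,
# so the temp table is still there.
row_count = registry.get("alice", "nb-1").execute(
    "SELECT COUNT(*) FROM staging"
).fetchone()[0]

# A different notebook gets its own, separate session.
is_shared = registry.get("alice", "nb-2") is registry.get("alice", "nb-1")

print(row_count, is_shared)
registry.close_all()
```

If cells of one notebook behave like separate connections instead, the session is not being reused somewhere along the way.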

#1 Which version of Hue are you using, and did you set up multiple servers? If yes, is the load balancer using sticky sessions? (not round robin, similar to https://docs.gethue.com/administrator/administration/reference/#impala-and-hive-ha)

#2 Do you experience these issues when in regular Editor mode? (you can have multiple statements in the editor and execute them sequentially by moving the cursor selection)

Each execute statement does a SqlAlchemy connect() but this should be unrelated:

We’re using 4.8.0, as provided by AWS EMR 6.2.0. No load balancing is in place.

Inside the cell’s editor, everything works fine. Just across cells it appears to be a new db connection.

I’m wondering if it has to do with how Hue is set up in an EMR environment. Maybe each cell is treated as its own Livy job.

If that’s the case, perhaps it’s better to just set up Hue independently if we want to use it this way?