Add graphframes #1

Closed · nigel.stanger opened this issue on 19 Sep 2021 · 0 comments

nigel.stanger commented on 19 Sep 2021

Automatically pulling down graphframes on demand using --packages doesn’t work any more: Spark tries every repository except the one that actually hosts the package, so the download fails ⇒ boom. The PySpark shell pretty much falls over at that point and comes up with no Spark context.
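
One thing that might be worth trying for the --packages route is naming the repository explicitly with --repositories. A minimal sketch that only assembles the command line. The package coordinate and repository URL are assumptions (neither is taken from this issue) and would need to match the Spark and Scala versions in the container:

```python
import shlex

# Assumed values, not from this issue: adjust to the Spark/Scala build in use.
GRAPHFRAMES_COORD = "graphframes:graphframes:0.8.1-spark3.0-s_2.12"
SPARK_PACKAGES_REPO = "https://repos.spark-packages.org/"


def pyspark_packages_command(coord=GRAPHFRAMES_COORD, repo=SPARK_PACKAGES_REPO):
    """Build a pyspark command line that resolves a package from a named repository."""
    return ["pyspark", "--packages", coord, "--repositories", repo]


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell or an entrypoint script.
    print(shlex.join(pyspark_packages_command()))
```

Whether this gets the shell a working Spark context here is untested; it only addresses the repository lookup.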

Mark got it working for INFO 303 using the --py-files option, but I’m struggling to get this to work here. The PySpark shell isn’t starting up properly for some reason and again there’s no Spark context.
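
For reference, a rough sketch of what a --py-files launch could look like, assuming the graphframes JAR has already been downloaded into the image. The path is hypothetical, and this is not necessarily how Mark's INFO 303 setup does it. The same JAR is passed to both options on the assumption that it bundles the Python wrapper alongside the JVM classes:

```python
import shlex

# Hypothetical location of a pre-downloaded graphframes JAR inside the container.
GRAPHFRAMES_JAR = "/opt/spark/jars/graphframes.jar"


def pyspark_local_jar_command(jar_path=GRAPHFRAMES_JAR):
    """Build a pyspark command line that loads graphframes from a local JAR."""
    # --jars puts the JVM classes on the classpath;
    # --py-files adds the same archive to the Python search path.
    return ["pyspark", "--jars", jar_path, "--py-files", jar_path]


if __name__ == "__main__":
    print(shlex.join(pyspark_local_jar_command()))
```

Because nothing is fetched at startup, this sidesteps the repository lookup entirely, at the cost of pinning the JAR version in the image.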

This is hard to diagnose given that the setup is (deliberately) spread across two different containers. Note that rebuilding the PySpark container now takes forever under Alpine because it builds all the Python dependencies from source.

nigel.stanger added a commit that referenced this issue on 20 Sep 2021
ba71b8b Revamped in line with Mark’s approach ...
nigel.stanger closed this issue on 20 Sep 2021
Assignee
nigel.stanger