Automatically pulling down graphframes on demand using `--packages` doesn’t work any more: Spark tries every repository except the one that actually hosts the package ⇒ boom. The PySpark shell pretty much falls over at that point with no Spark context.
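For reference, this is roughly the invocation that now fails, plus a possible workaround that points Spark explicitly at the spark-packages repository via `--repositories` (untested here, and the version coordinates below are just an assumption, not necessarily what the build uses):

```bash
# What we were doing: resolve graphframes at startup (now fails, so no Spark context).
pyspark --packages graphframes:graphframes:0.8.2-spark3.2-s_2.12

# Possible workaround: tell Spark exactly which repository hosts the package.
pyspark \
  --packages graphframes:graphframes:0.8.2-spark3.2-s_2.12 \
  --repositories https://repos.spark-packages.org
```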
Mark got it working for INFO 303 using the `--py-files` option, but I’m struggling to get that to work here: the PySpark shell isn’t starting up properly for some reason, and again there’s no Spark context.
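If I understand the `--py-files` approach correctly, the idea is to ship the graphframes jar with the image and pass it explicitly instead of resolving it at startup. A rough sketch of what that might look like (the path and version here are assumptions, not Mark’s actual INFO 303 setup):

```bash
# Sketch only: bake the graphframes jar into the image beforehand, then pass it
# explicitly. The jar also bundles the Python bindings, hence --py-files as well.
JAR=/opt/spark/jars/graphframes-0.8.2-spark3.2-s_2.12.jar
pyspark --jars "$JAR" --py-files "$JAR"
```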
Hard to diagnose given that the setup is (deliberately) spread across two different containers. Note also that rebuilding the PySpark container now takes forever under Alpine, because all the Python dependencies get built from source rather than installed as prebuilt wheels.