I'm having trouble getting my Spark application to ignore Log4j so that it uses Logback instead. One of the reasons I'm trying to use Logback is the Loggly appender it supports.
I have the following dependencies and exclusions in my pom file (versions are managed in the dependencyManagement section of my main pom):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-core</artifactId>
</dependency>

<dependency>
    <groupId>org.logback-extensions</groupId>
    <artifactId>logback-ext-loggly</artifactId>
</dependency>

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
</dependency>
I have referenced these two articles:
Separating application logs in Logback from Spark Logs in log4j
Configuring Apache Spark Logging with Scala and logback
I first tried using (when running spark-submit):
--conf "spark.driver.userClassPathFirst=true"
--conf "spark.executor.userClassPathFirst=true"
but received the error:
Exception in thread "main" java.lang.LinkageError: loader constraint violation: when resolving method "org.slf4j.impl.StaticLoggerBinder.getLoggerFactory()Lorg/slf4j/ILoggerFactory;" the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class, org/slf4j/LoggerFactory, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the method's defining class, org/slf4j/impl/StaticLoggerBinder, have different Class objects for the type org/slf4j/ILoggerFactory used in the signature
I would like to get it working with the above approach, but I also looked at trying the following:
--conf "spark.driver.extraClassPath=$libs"
--conf "spark.executor.extraClassPath=$libs"
but since I'm passing my uber jar to spark-submit both locally AND on an Amazon EMR cluster, I really can't specify a library file location that is local to my machine. Since the uber jar contains the files, is there a way for it to use those files? Or am I forced to copy these libraries to the master/nodes on the EMR cluster before the Spark app finally runs from there?
The first approach, using userClassPathFirst, seems like the best route though.
So I solved the issue; there were several problems going on.
In order to get Spark to work with Logback, the solution that worked for me was a combination of items from the articles I posted above, plus fixing a cert file problem.
The cert file I was passing to spark-submit was incomplete and was overriding the base truststore certs. This was causing a problem when SENDING HTTPS messages to Loggly.
Part 1 change: Update Maven to shade org.slf4j (as stated in an answer by @matemaciek)
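For reference, the relocation looks roughly like this; this is only a sketch, and the plugin version and the shaded.org.slf4j prefix are example values, not necessarily what you need:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- keep the application's SLF4J/Logback separate from the SLF4J that Spark binds to log4j -->
                    <relocation>
                        <pattern>org.slf4j</pattern>
                        <shadedPattern>shaded.org.slf4j</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>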
Part 1a: the logback.xml
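My logback.xml was roughly along these lines (a minimal sketch; the Loggly token, tag, log level and patterns are placeholders, and the appender class is the one provided by logback-ext-loggly):

<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- sends log events over HTTPS to the Loggly service -->
    <appender name="LOGGLY" class="ch.qos.logback.ext.loggly.LogglyAppender">
        <endpointUrl>https://logs-01.loggly.com/inputs/YOUR-LOGGLY-TOKEN/tag/my-spark-app</endpointUrl>
        <pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} %-5level %logger{36} - %msg%n</pattern>
    </appender>

    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="LOGGLY"/>
    </root>
</configuration>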
Part 2 change: The MainClass
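The relevant point in the main class is simply that it logs through the SLF4J API, which now resolves to the (shaded) Logback binding packaged in the uber jar. A minimal Scala sketch, with placeholder object and app names:

import org.slf4j.LoggerFactory
import org.apache.spark.{SparkConf, SparkContext}

object MainClass {
  // resolves to the shaded Logback binding inside the uber jar
  private val logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("logback-loggly-test"))
    logger.info("Driver started - this message goes through Logback to Loggly")
    // ... job logic ...
    sc.stop()
  }
}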
Part 3 change:
I was submitting the Spark application as such (example):
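It looked roughly like this (the class name, jar name, master and password are placeholders; the key detail is the javax.net.ssl.trustStore option pointing at my rds-truststore.jks):

spark-submit \
    --class com.example.MainClass \
    --master yarn \
    --files rds-truststore.jks \
    --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=rds-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit" \
    my-spark-app-uber.jar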
So the above spark-submit failed with an HTTPS certification problem (at the point where Loggly was being contacted to send the message to the Loggly service), because rds-truststore.jks replaced the default certs but did not contain all of them. I changed this to use the cacerts store, which had all the certs it needed.
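Concretely, that meant pointing the trustStore option at the JVM's default cacerts file instead, for example (the exact cacerts path depends on the Java installation on the cluster; changeit is the default cacerts password):

--conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/usr/lib/jvm/java-1.8.0/jre/lib/security/cacerts -Djavax.net.ssl.trustStorePassword=changeit"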
No more errors from the Loggly part when sending after this change.