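Dynamic partition problem: when the data volume is large, or when the number of dynamic partitions is large (or even when there are only a dozen or so), the following exception can occur: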
2015-10-23 16:43:54,165 INFO [fetcher#10] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 20 of 34 to spark-03:13562 to fetcher#10
2015-10-23 16:43:54,166 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
2015-10-23 16:43:54,167 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:304)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:294)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
Reference issues:
https://issues.apache.org/jira/browse/MAPREDUCE-6108
https://issues.apache.org/jira/browse/MAPREDUCE-6447
Understanding the parameters:
mapreduce.map.java.opts: the heap memory set via -Xmx (the Cloudera setting is mapreduce.map.java.opts.max.heap). java.opts is usually set to 75% of memory.mb.
mapreduce.reduce.java.opts: the heap memory set via -Xmx (the Cloudera setting is mapreduce.reduce.java.opts.max.heap). java.opts is usually set to 75% of memory.mb.
mapreduce.map.memory.mb: defaults to 1 GB.
mapreduce.reduce.memory.mb: defaults to 1 GB.
mapreduce.reduce.memory.totalbytes
mapreduce.reduce.shuffle.parallelcopies: the number of fetcher threads started for the shuffle. The Apache default is 5; the Cloudera default is 10.
mapreduce.reduce.shuffle.input.buffer.percent: defaults to 0.7.
mapreduce.reduce.shuffle.memory.limit.percent: defaults to 0.25.
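The product of the three shuffle parameters (mapreduce.reduce.shuffle.parallelcopies, mapreduce.reduce.shuffle.input.buffer.percent and mapreduce.reduce.shuffle.memory.limit.percent) must be less than 1, otherwise the error above is reported. With the Cloudera defaults the product is 10 × 0.7 × 0.25 = 1.75, which exceeds 1. Setting mapreduce.reduce.shuffle.parallelcopies to 5 gives 5 × 0.7 × 0.25 = 0.875 and resolves the problem.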
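The rule of thumb above can be checked before submitting a job. The following is a minimal sketch, not part of Hadoop or Hive: the function names shuffle_memory_product and is_shuffle_config_safe are hypothetical helpers introduced here for illustration, and the default argument values are simply the defaults quoted in this post.

```python
def shuffle_memory_product(parallel_copies,
                           input_buffer_percent=0.7,
                           memory_limit_percent=0.25):
    """Multiply the three shuffle settings discussed in this post.

    parallel_copies      -> mapreduce.reduce.shuffle.parallelcopies
    input_buffer_percent -> mapreduce.reduce.shuffle.input.buffer.percent
    memory_limit_percent -> mapreduce.reduce.shuffle.memory.limit.percent
    """
    return parallel_copies * input_buffer_percent * memory_limit_percent


def is_shuffle_config_safe(parallel_copies,
                           input_buffer_percent=0.7,
                           memory_limit_percent=0.25):
    """Apply the post's rule of thumb: the product must stay below 1."""
    product = shuffle_memory_product(
        parallel_copies, input_buffer_percent, memory_limit_percent)
    return product < 1.0


if __name__ == "__main__":
    # Compare the Cloudera default (10 fetchers) with the Apache default (5).
    for copies in (10, 5):
        product = shuffle_memory_product(copies)
        verdict = "ok" if is_shuffle_config_safe(copies) else "risk of OOM"
        print(f"parallelcopies={copies}: product={product:.3f} ({verdict})")
```

The same check also shows the alternative to lowering the fetcher count: keeping 10 fetchers would require reducing one of the two percentages so that the product still stays below 1.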
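In addition, in Cloudera's Hive, hive.stats.autogather defaults to true, which means statistics are gathered automatically when data is inserted. With a large number of dynamic partitions, loading the data is therefore followed by a long statistics phase that also updates the Hive metastore tables (for example, the number of files and the number of rows in each partition). When this takes too long it may time out, in which case hive.stats.autogather needs to be set to false.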