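Importing a PostgreSQL table into a Hive table with Sqoop

Blog category:

sqoop

Common statements for importing data into Hive with Sqoop.
Importing directly into a Hive table:
sqoop import --connect jdbc:postgresql://ip/db_name --username user_name --table table_name --hive-import -m 5
Internally this runs in three steps: 1. import the data into HDFS (the corresponding directory can be found on HDFS); 2. create a table with the same name in Hive; 3. load the data from HDFS into the Hive table.
Creating a Hive table from a PostgreSQL table with Sqoop:
sqoop create-hive-table --connect jdbc:postgresql://ip/db_n ...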
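
The import command above is easy to break with a missing space between the JDBC URL and the next flag. A minimal sketch, assuming the command is launched from a script: it assembles the same flags as an argument list, so each option stays a separate token. The helper name `build_sqoop_import` is hypothetical; the flags and placeholder values are the ones from the post.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, num_mappers):
    """Return the argument list for a `sqoop import ... --hive-import` call.

    Hypothetical helper: it only assembles the flags shown in the post.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--hive-import",
        "-m", str(num_mappers),
    ]


# Placeholder values copied from the post; replace them with real ones.
args = build_sqoop_import("jdbc:postgresql://ip/db_name", "user_name", "table_name", 5)

# Join with shell quoting so the command can be logged or pasted safely.
command = " ".join(shlex.quote(a) for a in args)
print(command)
```

The list form can be handed directly to `subprocess.run(args)` on a machine where Sqoop is installed.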

2012-11-20 13:54
Views: 7630
Comments (0)
Category: Programming languages

Understanding Sqoop's --split-by and -m

Blog category:

sqoop

sqoop

Scenario:
sqoop import --connect jdbc:postgresql://...../..... --username .... --query "select * from retail_tb_order_qiulp_test where status = 'TRADE_FINISHED' or status = 'TRADE_CLOSED' or status = 'TRADE_CLOSED_BY_TAOBAO' and \$CONDITIONS" --hive-import -m 6 --hive-table custom_analyse_db. ...
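
With a free-form `--query`, Sqoop replaces `$CONDITIONS` with a range predicate on the split column for each of the `-m` mappers. The sketch below imitates that with SQLite to show why an `or` filter should be wrapped in parentheses before `and $CONDITIONS`: `AND` binds tighter than `OR`, so without parentheses the range only restricts the last branch. Everything here is an assumption made for illustration: the `orders` table, the `id` split column, the `WAIT_PAY` status and the way the ranges are computed. Sqoop's real splitter differs in detail.

```python
import sqlite3


def split_predicates(lo, hi, num_mappers, column="id"):
    """Cut the closed range [lo, hi] into num_mappers contiguous predicates."""
    step = (hi - lo) / num_mappers
    bounds = [lo + round(step * i) for i in range(num_mappers)] + [hi]
    preds = []
    for i in range(num_mappers):
        # Only the last range includes its upper bound.
        upper_op = "<=" if i == num_mappers - 1 else "<"
        preds.append(f"{column} >= {bounds[i]} AND {column} {upper_op} {bounds[i + 1]}")
    return preds


def total_rows(conn, query_template, predicates):
    """Run the query once per split and add up the row counts."""
    return sum(
        conn.execute(query_template.replace("$CONDITIONS", p)).fetchone()[0]
        for p in predicates
    )


# Toy data: 12 orders cycling through three statuses (WAIT_PAY is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
statuses = ["TRADE_FINISHED", "TRADE_CLOSED", "WAIT_PAY"]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, statuses[(i - 1) % 3]) for i in range(1, 13)],
)

preds = split_predicates(1, 12, 3)
loose = ("SELECT count(*) FROM orders WHERE status = 'TRADE_FINISHED' "
         "OR status = 'TRADE_CLOSED' AND $CONDITIONS")
strict = ("SELECT count(*) FROM orders WHERE (status = 'TRADE_FINISHED' "
          "OR status = 'TRADE_CLOSED') AND $CONDITIONS")

loose_total = total_rows(conn, loose, preds)
strict_total = total_rows(conn, strict, preds)
print(loose_total, strict_total)
```

On this toy data only 8 rows match the filter. The parenthesized form imports exactly 8 across the three splits, while the unparenthesized form imports 16, because every split also pulls in all `TRADE_FINISHED` rows.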

2012-11-20 13:51
Views: 3933
Comments (2)
Category: Programming languages

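MySQL time zone issue

Blog category:

mysql

When processing data, the data exported from mongo was in GMT (Greenwich) time, and after importing it directly into MySQL the times were not converted to +8. There are two convenient ways to convert:
1. select convert_tz(import_start,'+00:00','+8:00') from import_ods_log; converts from the +0 time zone to the +8 time zone.
2. Set the MySQL time_zone to +8, so that queried time fields automatically have 8 hours added (this applies to TIMESTAMP columns; DATETIME values are returned unchanged).
The first approach is more flexible and is the recommended one.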
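
If the conversion has to happen in the loading script rather than in MySQL, the same shift can be done before the insert. A minimal sketch using only the standard library, assuming the exported values are naive datetimes in UTC; the function name `utc_to_plus8` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed +08:00 offset, matching the '+8:00' target used with convert_tz.
UTC_PLUS_8 = timezone(timedelta(hours=8))


def utc_to_plus8(naive_utc):
    """Treat a naive datetime as UTC and return the naive +08:00 wall-clock time."""
    aware = naive_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(UTC_PLUS_8).replace(tzinfo=None)


print(utc_to_plus8(datetime(2012, 11, 12, 2, 8, 0)))
```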

2012-11-12 10:08
Views: 871
Comments (0)
Category: Databases

Hive left outer join: the WHERE condition problem

Blog category:

hive

hive join

select count(1) from s_ods_trade where part ='2012-10-31';
22076
select count(1) from s_ods_trade
104343
select count(1) from s_ods_trade_full where part ='2012-10-31';
11456
select count(1) from s_ods_trade_full
53049
SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_tr ...
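
The excerpt is cut off before the join query finishes, so the post's exact conclusion is not shown here. A common pitfall with this kind of query is that a condition on the right-hand table placed in WHERE filters out the unmatched rows, while the same condition placed in ON keeps them. The sketch below reproduces that with SQLite. The join key `id` and the three sample rows are assumptions; only the table names and the `part` column come from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s_ods_trade (id INTEGER, part TEXT)")
conn.execute("CREATE TABLE s_ods_trade_full (id INTEGER, part TEXT)")
conn.executemany(
    "INSERT INTO s_ods_trade VALUES (?, ?)",
    [(1, "2012-10-31"), (2, "2012-10-31"), (3, "2012-10-30")],
)
conn.executemany(
    "INSERT INTO s_ods_trade_full VALUES (?, ?)",
    [(1, "2012-10-31"), (3, "2012-10-30")],
)

# Right-table condition in WHERE: rows of `a` without a match have b.part NULL
# and are dropped, so the outer join behaves like an inner join.
where_count = conn.execute(
    "SELECT count(1) FROM s_ods_trade a LEFT OUTER JOIN s_ods_trade_full b "
    "ON a.id = b.id "
    "WHERE a.part = '2012-10-31' AND b.part = '2012-10-31'"
).fetchone()[0]

# Right-table condition in ON: every row of `a` that passes the WHERE filter
# is kept, whether or not it found a match.
on_count = conn.execute(
    "SELECT count(1) FROM s_ods_trade a LEFT OUTER JOIN s_ods_trade_full b "
    "ON a.id = b.id AND b.part = '2012-10-31' "
    "WHERE a.part = '2012-10-31'"
).fetchone()[0]

print(where_count, on_count)
```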

2012-11-06 11:27
Views: 2054
Comments (0)
Category: Databases

Accessing MySQL from Python

Blog category:

python

python mysql ubuntu

References: http://blog.csdn.net/chenyi8888/article/details/7601781 http://blog.csdn.net/daihui05/article/details/7266914 MySQL-python/1.2.4 package download: http://pypi.python.org/pypi/MySQL-python/1.2.4b5 I am using Ubuntu; install mysqlclient

2012-08-29 17:07
Views: 930
Comments (0)
Category: Programming languages

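Importing data into Hive with Sqoop

Blog category:

sqoop

sqoop

Common statements for importing data into Hive with Sqoop.
Importing directly into a Hive table:
sqoop import --connect jdbc:postgresql://ip/db_name --username user_name --table table_name --hive-import -m 5
Internally this runs in three steps: 1. import the data into HDFS (the corresponding directory can be found on HDFS); 2. create ...

2012-08-29 11:20
Views: 45576
Comments (2)
Category: Programming languages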

Getting today's, yesterday's, and any other date in Python

Blog category:

python

python

import datetime
print(datetime.date.today() - datetime.timedelta(days=29))
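
The one-liner above generalizes to any offset. A small sketch, with a hypothetical helper `days_ago` that accepts an optional reference date so the result is reproducible:

```python
import datetime


def days_ago(n, today=None):
    """Return the date n days before `today` (defaults to the current date)."""
    if today is None:
        today = datetime.date.today()
    return today - datetime.timedelta(days=n)


print(days_ago(0))   # today
print(days_ago(1))   # yesterday
print(days_ago(29))  # 29 days ago, as in the line above
```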

2012-08-29 11:18
Views: 979
Comments (0)
Category: Programming languages

A few classic SQL statements

Blog category:

sql

sql interview

A few classic SQL statements. 1. A GROUP BY statement. Table structure:
year month amount
1991 1 1.1
1991 2 1.2
1991 3 1.3
1992 1 ...

2012-08-13 23:20
Views: 900
Comments (0)
Category: Databases

Minimum edit distance

Blog category:

algorithms

public static int editDist(String s1,String s2){
    int m=s1.length();
    int n=s2.length();
    int i=0,j=0;
    int[][] d=new int[m+1][n+1];
    for(i=0;i<=m;i++){ d[i][0]=i; }
    for(j=0;j<=n;j++){ d[0][j]=j; }
    ...
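
The excerpt stops right after the table is initialized. For reference, here is a minimal Python sketch of the standard Levenshtein recurrence using the same table layout. The post's own loop body is not shown, so the unit cost for insertion, deletion and substitution is an assumption.

```python
def edit_distance(s1, s2):
    """Levenshtein distance between s1 and s2 using a full (m+1) x (n+1) table."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    # Turning a prefix into the empty string costs one deletion per character.
    for i in range(m + 1):
        d[i][0] = i
    # Building a prefix from the empty string costs one insertion per character.
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete from s1
                d[i][j - 1] + 1,         # insert into s1
                d[i - 1][j - 1] + cost,  # match or substitute
            )
    return d[m][n]


print(edit_distance("kitten", "sitting"))
```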

2012-08-12 13:40
Views: 1077
Comments (0)
Category: Web front end

Three threads printing abc in sequence

Blog category:

multithreading

thread

public class ThreadPrint {
    /**
     * @author my_corner
     * @param
     * @return
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException {
        PrintTask task = new PrintTask();
        Thread a = new Thread(task);
        ...
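
The body of `PrintTask` is not shown in the excerpt. One common way to make three threads take turns is a shared turn counter guarded by a condition variable. The following Python sketch illustrates that idea and is not a reconstruction of the post's Java code; the function name `print_abc` is hypothetical.

```python
import threading


def print_abc(rounds):
    """Have three threads append 'a', 'b', 'c' in strict rotation."""
    letters = "abc"
    output = []
    cond = threading.Condition()
    turn = [0]  # shared counter; turn[0] % 3 says whose turn it is

    def worker(index):
        for _ in range(rounds):
            with cond:
                # Sleep until the counter points at this thread.
                cond.wait_for(lambda: turn[0] % len(letters) == index)
                output.append(letters[index])
                turn[0] += 1
                cond.notify_all()  # wake the others so the next one can run

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(letters))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return "".join(output)


print(print_abc(3))
```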

2012-08-11 16:03
Views: 783
Comments (0)
Category: Programming languages

Merging two sorted lists

Blog category:

algorithms

list array merge

public static List<Integer> merge(List<Integer> list1,List<Integer> list2){
    List<Integer> list=new ArrayList<Integer>();
    int size1=list1.size();
    int size2=list2.size();
    int i=0,j=0,k=0;
    while(i<size1&&j<size ...
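
The excerpt is cut off inside the main loop. A minimal Python sketch of the usual two-pointer merge, assuming both inputs are already sorted in ascending order; the name `merge_sorted` is hypothetical.

```python
def merge_sorted(list1, list2):
    """Merge two ascending lists into a new ascending list."""
    merged = []
    i = j = 0
    # Take the smaller head element until one list is exhausted.
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # At most one of these still has elements left.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged


print(merge_sorted([1, 3, 5], [2, 4, 6, 8]))
```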

2012-08-10 18:24
Views: 2695
Comments (0)
Category: Programming languages

Maximum subarray

Blog category:

algorithms

maximum subarray

public static int maxSub(int[] arr) {
    int maxSum = 0;
    int currentSum = 0;
    for (int i = 0; i < arr.length; i++) {
        currentSum += arr[i];
        if (currentSum > maxSum) {
            maxSum = currentSum;
        } else if (currentSum < ...
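
The excerpt breaks off inside the `else if`. The Python sketch below completes the scan in the usual way (Kadane's algorithm), assuming the cut-off branch resets the running sum when it drops below zero. Like the Java version, it starts `max_sum` at 0, so an all-negative input yields 0, the sum of the empty subarray.

```python
def max_sub(arr):
    """Largest sum of a contiguous run in arr; 0 if every element is negative."""
    max_sum = 0
    current_sum = 0
    for value in arr:
        current_sum += value
        if current_sum > max_sum:
            max_sum = current_sum
        elif current_sum < 0:
            # A negative prefix can never help a later run, so drop it.
            current_sum = 0
    return max_sum


print(max_sub([-2, 1, -3, 4, -1, 2, 1, -5, 4]))
```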

2012-08-10 17:22
Views: 840
Comments (0)
Category: Programming languages

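Parsing XML in MapReduce

Blog category:

hadoop

mapreduce xml xmlinputformat

MapReduce's TextInputFormat handles line-oriented text conveniently, but XML is awkward. I once rewrote FileInputFormat to split records on </property> (there are references online); that works, but it picks up a lot of noise data. I then thought of capturing the data between the opening <property> and the closing </property>, but I did not have the skill to rewrite FileInputFormat that way, haha. After more searching I read http://www.linezing.com/blog/?p=489: XmlInputFormat.java from the Mahout project solves the problem easily. Depending on the Mahout version ...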
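
The start-tag/end-tag idea described above can be illustrated without Hadoop. The sketch below scans a string for each `<property>` ... `</property>` pair and yields the enclosed record, skipping whatever lies between records. It is a simplified in-memory illustration and not Mahout's XmlInputFormat, which works on byte streams within input splits. The function name `iter_records` and the sample document are hypothetical.

```python
def iter_records(text, start_tag, end_tag):
    """Yield every substring that runs from start_tag through the next end_tag."""
    pos = 0
    while True:
        start = text.find(start_tag, pos)
        if start == -1:
            return  # no more records
        end = text.find(end_tag, start + len(start_tag))
        if end == -1:
            return  # unterminated record: ignore the tail
        end += len(end_tag)
        yield text[start:end]
        pos = end  # continue after this record, skipping any noise


sample = (
    "<configuration>"
    "<property><name>a</name><value>1</value></property>"
    "<!-- noise between records -->"
    "<property><name>b</name><value>2</value></property>"
    "</configuration>"
)
records = list(iter_records(sample, "<property>", "</property>"))
for record in records:
    print(record)
```

Delimiting on both tags is what removes the noise that splitting on `</property>` alone lets through.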

2012-03-29 11:52
Views: 1703
Comments (0)
Category: Programming languages

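How to get the current input file name in a MapReduce method (get file name)

Blog category:

hadoop

hadoop mapreduce filename

I am using Hadoop 0.20.2. After a long search for how to get, inside the map method, the name of the file that the current split belongs to, here is the approach:
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
// get the file name
InputSplit inputSplit = (InputSplit) context.getInputSplit();
String filename = ((FileSplit) inputSplit).getPath().getName();

2012-03-29 11:42
Views: 1832
Comments (0)
Category: Programming languages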

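How to debug MapReduce and view System.out output

Blog category:

hadoop

I spent most of another day on this and only solved about half of the problem. Solved: the System.out output and other information of a running job can be viewed through the web UI at http://ip:50030; find the job and drill down. At the end there is a "stdout logs" section containing hello world!, which is the System.out output. The cluster setup part of the official Hadoop documentation has a logging section that describes Hadoop's log configuration. Unsolved: writing file I/O in the map or reduce function to persist some necessary information as a text file. The job runs successfully, but the file is neither created nor written. The I/O code is as follows: File dir = new File("/root/bin/ ...

2012-01-14 14:47
Views: 1226
Comments (0)
Category: Programming languages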
