Attempting to copy a TSV file data into an MSSQL DB via bcp
from mssql-tools
package. Getting bcp error logs of the form:
#@ Row 1, Column 20: String data, right truncation @#
"1234" "ASDFG Z ZXCCVVB" "Physician" "ASDFG ZXCVB" "" "" "" "" "" "" "10" "" "" "" "" "" "1234" "1234" "Provider" "Y" "1234" "1234" "" "" "Person" "MD" "ASDFG N ZXCVB" "Active" "Internal" <NULL> "" "" "" "" "" "xxxxxx" "xxxxxx" "" "1234" "M" <NULL> "" "" "" "" "Y" "Y" "Y" "Y" <NULL> "" "" <NULL> <NULL> "" "" <NULL> <NULL> "" "" <NULL> "" "" "" "" "" "" "" "" <NULL> <NULL> 100110 "" "" "" "" "" "1" "" "" "" "" "" "" "" 4 "Y" <NULL> <NULL> <NULL> "" "" <NULL> "" <NULL> "" "" "" "MD" <NULL> "" "" "" <NULL> <NULL> <NULL> <NULL> "" <NULL> "" "" "1234" "" "" <NULL> "" "" <NULL> "1" "Y" "" "2" 1 1 "1" "1" 2019-10-01
"1234" "HEALTH CENTER" "" "WOMEN'S" "" "" "" "" "" "" "" "" "" "" "" "" "" "1234" "Provider" "Y" "1234" "" "" "" "Person" "" "HEALTH CENTER" "Active" "" "" "" "" "" "" "xxxxxx" "xxxxxx" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "1234" "" "" "" "" "" "" "" "" 1 1 "" "1" 2019-10-01
With bcp command...
RECOMMEDED_IMPORT_MODE="-c"
/opt/mssql-tools/bin/bcp "$TABLENAME" in "$EXPORT_STAGE/$TABLENAME/$TABLENAME.TSV" \
-S $DEST_IP \
-U $USER -P $PASSWORD \
-d $DEST_DB \
$RECOMMEDED_IMPORT_MODE \
-t "\t" \
-e $EXPORT_STAGE/$TABLENAME/$TABLENAME.bcperror.log
Notice that the first row error is related to column 20, corresponding to the "Y" character. Looking at the destination DB schema, I see that the type is varchar(1)
. Table DDl from
SELECT ORDINAL_POSITION, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'mytable'
ORDINAL_POSITION COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH IS_NULLABLE
1 PROV_ID varchar 18 NO
2 PROV_NAME varchar 254 YES
3 PROV_TYPE varchar 254 YES
4 PROV_ABBR varchar 254 YES
5 GL_PREFIX varchar 15 YES
6 RPT_GRP_ONE varchar 80 YES
7 RPT_GRP_TWO varchar 80 YES
8 RPT_GRP_THREE varchar 80 YES
9 RPT_GRP_FOUR varchar 80 YES
10 RPT_GRP_FIVE varchar 80 YES
11 RPT_GRP_SIX varchar 66 YES
12 RPT_GRP_SEVEN varchar 66 YES
13 RPT_GRP_EIGHT varchar 66 YES
14 RPT_GRP_NINE varchar 66 YES
15 RPT_GRP_TEN varchar 66 YES
16 IS_RESIDENT varchar 3 YES
17 USER_ID varchar 18 YES
18 EPIC_PROV_ID varchar 18 YES
19 REFERRAL_SRCE_TYPE varchar 66 YES
20 IS_VERIFIED_YN varchar 1 YES
21 SER_REF_SRCE_ID varchar 18 YES
22 UPIN varchar 30 YES
23 SSN varchar 192 YES
24 EMP_STATUS varchar 40 YES
25 STAFF_RESOURCE varchar 20 YES
26 CLINICIAN_TITLE varchar 100 YES
27 EXTERNAL_NAME varchar 80 YES
28 ACTIVE_STATUS varchar 20 YES
29 REFERRAL_SOURCE_TYPE varchar 66 YES
30 RECORD_TYPE numeric NULL YES
31 BILL_PROV_YN varchar 1 YES
32 BILL_UNDER_PROV_ID varchar 18 YES
33 SUP_PROV_ID varchar 18 YES
34 COUNTY_C varchar 66 YES
35 COUNTRY_C varchar 66 YES
36 OFFICE_PHONE_NUM varchar 50 YES
37 OFFICE_FAX_NUM varchar 25 YES
38 EMAIL varchar 128 YES
39 DEA_NUMBER varchar 30 YES
40 SEX varchar 66 YES
41 BIRTH_DATE date NULL YES
42 MEDICARE_PROV_ID varchar 12 YES
43 MEDICAID_PROV_ID varchar 12 YES
44 IS_PRIV_REVOKED varchar 1 YES
45 NURSE_EMP_ID varchar 18 YES
46 EPICCARE_PROV_YN varchar 1 YES
47 MEDS_AUTH_PROV_YN varchar 1 YES
48 ORDS_AUTH_PROV_YN varchar 1 YES
49 TRANS_INTF_USER_YN varchar 1 YES
50 PEER_REV_LAST_DATE date NULL YES
51 TAKING_NEW_PAT_YN varchar 1 YES
52 TAKING_WALKINS_YN varchar 1 YES
53 LAST_RECOMMENDED_DATE date NULL YES
54 BASE_COST numeric NULL YES
55 SURG_REC_POOL_YN varchar 1 YES
56 INSTRUMENT_TYPE_C varchar 66 YES
57 EQUIP_SERVICE_DATE date NULL YES
58 EQUIP_LASTSVC_DATE date NULL YES
59 CLM_POS_REQD_YN varchar 1 YES
60 DEFAULT_POS_CLM_YN varchar 1 YES
61 MODALITY_TYPE_C numeric NULL YES
62 MODALITY_YN varchar 1 YES
63 SUPERV_POOL_ID varchar 18 YES
64 SUPERV_POOL_NAME varchar 254 YES
65 FLASH_CARD_PRT_ROU varchar 255 YES
66 CTRL_SHEET_PRT_ROU varchar 255 YES
67 PIN_ID varchar 18 YES
68 PROV_ATTR_ID varchar 18 YES
69 ATTND_PRIM_PAGER varchar 50 YES
70 OO_OFFICE_FROM_DTE date NULL YES
71 OO_OFFICE_TO_DTE date NULL YES
72 DEF_DEPARTMENT_ID numeric NULL YES
73 CM_PHY_OWNER_ID varchar 25 YES
74 CM_LOG_OWNER_ID varchar 25 YES
75 RPT_GRP_ELEVEN_C varchar 66 YES
76 RPT_GRP_TWELVE_C varchar 66 YES
77 RPT_GRP_THIRTEEN_C varchar 66 YES
78 RPT_GRP_FOURTEEN_C varchar 66 YES
79 RPT_GRP_FIFTEEN_C varchar 66 YES
80 RPT_GRP_SIXTEEN_C varchar 66 YES
81 RPT_GRP_SEVNTEEN_C varchar 66 YES
82 RPT_GRP_EIGHTEEN_C varchar 66 YES
83 RPT_GRP_NINETEEN_C varchar 66 YES
84 RPT_GRP_TWENTY_C varchar 66 YES
85 HOSPITALIST_YN varchar 254 YES
86 DEF_DIVISION_C numeric NULL YES
87 DEF_PROVIDER_YN varchar 254 YES
88 PROV_REC_STATE_C numeric NULL YES
89 X_IP_ORD_PROV_C varchar 1 YES
90 PROV_START_DATE date NULL YES
91 PRACTICE_NAME_C numeric NULL YES
92 X_SUPERVISING_PROV varchar 254 YES
93 X_SUPERVISION_REQD varchar 50 YES
94 X_INPAT_DISCIPLINE varchar 18 YES
95 X_ED_PROVIDER varchar 1 YES
96 X_IS_EMPLOYED_CRNA varchar 254 YES
97 X_REVENUE_DEPT_ID numeric NULL YES
98 SURG_SCHED_OUT_YN varchar 1 YES
99 SURG_EQP_SVCDAT_YN varchar 1 YES
100 SURG_COST_TBL_ID numeric NULL YES
101 TEAM_LEADER_ID varchar 18 YES
102 TEAM_C numeric NULL YES
103 SUP_PROV_YN varchar 1 YES
104 EMPLOYED_CRNA_YN varchar 1 YES
105 IS_INTERPRETER_YN varchar 1 YES
106 DOCTORS_DEGREE varchar 254 YES
107 REVENUE_DEPT_ID numeric NULL YES
108 ENC_PROV_YN varchar 1 YES
109 OR_VLD_DT_OFST numeric NULL YES
110 OR_CHARGE_CODE_ID numeric NULL YES
111 RES_POOL_TYPE_C numeric NULL YES
112 PHARMACIST_YN varchar 1 YES
113 LAB_FAX_NUMBER varchar 254 YES
114 PROV_PHOTO varchar 120 YES
115 USE_DEPT_VT_LIM_YN varchar 1 YES
116 VERIFYING_PERSON_ID varchar 18 YES
117 DIRECTORY_INFO varchar 254 YES
118 DBC_EXT_POS_ID numeric NULL YES
119 EDI_CLM_ACTIVE_YN varchar 1 YES
120 PROV_CLM_PROC_STA_C numeric NULL YES
121 PAYEE_NUM_DEFAULT varchar 254 YES
122 SER_CLM_ID varchar 254 YES
123 MCD_PROF_CD_C numeric NULL YES
124 OP_ORD_PROV_YN varchar 1 YES
125 IS_SUP_PROV_REQ_C numeric NULL YES
126 PROVIDER_TYPE_C varchar 66 YES
127 EPRESCRIBING_YN varchar 1 YES
128 EP_FLAG_YN varchar 1 YES
129 SEX_C varchar 66 YES
130 ACTIVE_STATUS_C numeric NULL YES
131 REFERRAL_SOURCE_TYPE_C varchar 66 YES
132 STAFF_RESOURCE_C numeric NULL YES
133 REFERRAL_SRCE_TYPE_C varchar 66 YES
134 etl_date datetime NULL YES
Thus the character from the TSV should fit. Using cat -ne datafile.tsv
can see that the rows end with $
and soe preprocessing I do to the file is:
# remove header
sed -i 1d "$EXPORT_STAGE/$TABLENAME/$TABLENAME.TSV"
# remove DOS carriage return
sed -i "s/\r//g" "$EXPORT_STAGE/$TABLENAME/$TABLENAME.TSV"
# append etl date to data records
sed -i "s/$/\t$(date +%Y-%m-%d)/" "$EXPORT_STAGE/$TABLENAME/$TABLENAME.TSV"
And the raw data file head looks something like (certain fields replaced with random nonsense values)...
"PROV_ID" "PROV_NAME" "PROV_TYPE" "PROV_ABBR" "GL_PREFIX" "RPT_GRP_ONE" "RPT_GRP_TWO" "RPT_GRP_THREE" "RPT_GRP_FOUR" "RPT_GRP_FIVE" "RPT_GRP_SIX" "RPT_GRP_SEVEN" "RPT_GRP_EIGHT" "RPT_GRP_NINE" "RPT_GRP_TEN" "IS_RESIDENT" "USER_ID" "EPIC_PROV_ID" "REFERRAL_SRCE_TYPE" "IS_VERIFIED_YN" "SER_REF_SRCE_ID" "UPIN" "SSN" "EMP_STATUS" "STAFF_RESOURCE" "CLINICIAN_TITLE" "EXTERNAL_NAME" "ACTIVE_STATUS" "REFERRAL_SOURCE_TYPE" "RECORD_TYPE" "BILL_PROV_YN" "BILL_UNDER_PROV_ID" "SUP_PROV_ID" "COUNTY_C" "COUNTRY_C" "OFFICE_PHONE_NUM" "OFFICE_FAX_NUM" "EMAIL" "DEA_NUMBER" "SEX" "BIRTH_DATE" "MEDICARE_PROV_ID" "MEDICAID_PROV_ID" "IS_PRIV_REVOKED" "NURSE_EMP_ID" "EPICCARE_PROV_YN" "MEDS_AUTH_PROV_YN" "ORDS_AUTH_PROV_YN" "TRANS_INTF_USER_YN" "PEER_REV_LAST_DATE" "TAKING_NEW_PAT_YN" "TAKING_WALKINS_YN" "LAST_RECOMMENDED_DATE" "BASE_COST" "SURG_REC_POOL_YN" "INSTRUMENT_TYPE_C" "EQUIP_SERVICE_DATE" "EQUIP_LASTSVC_DATE" "CLM_POS_REQD_YN" "DEFAULT_POS_CLM_YN" "MODALITY_TYPE_C" "MODALITY_YN" "SUPERV_POOL_ID" "SUPERV_POOL_NAME" "FLASH_CARD_PRT_ROU" "CTRL_SHEET_PRT_ROU" "PIN_ID" "PROV_ATTR_ID" "ATTND_PRIM_PAGER" "OO_OFFICE_FROM_DTE" "OO_OFFICE_TO_DTE" "DEF_DEPARTMENT_ID" "CM_PHY_OWNER_ID" "CM_LOG_OWNER_ID" "RPT_GRP_ELEVEN_C" "RPT_GRP_TWELVE_C" "RPT_GRP_THIRTEEN_C" "RPT_GRP_FOURTEEN_C" "RPT_GRP_FIFTEEN_C" "RPT_GRP_SIXTEEN_C" "RPT_GRP_SEVNTEEN_C" "RPT_GRP_EIGHTEEN_C" "RPT_GRP_NINETEEN_C" "RPT_GRP_TWENTY_C" "HOSPITALIST_YN" "DEF_DIVISION_C" "DEF_PROVIDER_YN" "PROV_REC_STATE_C" "PROV_START_DATE" "PRACTICE_NAME_C" "SURG_SCHED_OUT_YN" "SURG_EQP_SVCDAT_YN" "SURG_COST_TBL_ID" "TEAM_LEADER_ID" "TEAM_C" "SUP_PROV_YN" "EMPLOYED_CRNA_YN" "IS_INTERPRETER_YN" "DOCTORS_DEGREE" "REVENUE_DEPT_ID" "ENC_PROV_YN" "PHARMACIST_YN" "LAB_FAX_NUMBER" "RES_POOL_TYPE_C" "OR_VLD_DT_OFST" "OR_CHARGE_CODE_ID" "DBC_EXT_POS_ID" "OP_ORD_PROV_YN" "IS_SUP_PROV_REQ_C" "PROV_PHOTO" "USE_DEPT_VT_LIM_YN" "VERIFYING_PERSON_ID" "DIRECTORY_INFO" "EDI_CLM_ACTIVE_YN" "PROV_CLM_PROC_STA_C" "PAYEE_NUM_DEFAULT" "SER_CLM_ID" "MCD_PROF_CD_C" "PROVIDER_TYPE_C" "EPRESCRIBING_YN" "EP_FLAG_YN" "SEX_C" "STAFF_RESOURCE_C" "ACTIVE_STATUS_C" "REFERRAL_SOURCE_TYPE_C" "REFERRAL_SRCE_TYPE_C"
"1234" "ASDF Z XZXCV" "Physician" "ASD ZXCV" "" "" "" "" "" "" "10" "" "" "" "" "" "1234" "1234" "Provider" "Y" "1234" "1234" "" "" "Person" "MD" "ASDFG Z ZXV" "Active" "Internal" "" "" "" "" "" "xxxxxx" "xxxxxx" "" "1234" "M" "" "" "" "" "Y" "Y" "Y" "Y" "" "" "" "" "" "" "" "" "" "" "" "" "" "" 1234 "" "" "" "" "" "1" "" "" "" "" "" "" "" 4 "Y" "" "" "" "" "" "" "MD" "" "" "" "" "" "" "1234" "" "" "" "" "1" "Y" "" "2" 1 1 "1" "1"
"1234" "HEALTH CENTER" "" "WOMEN'S" "" "" "" "" "" "" "" "" "" "" "" "" "" "1234" "Provider" "Y" "1234" "" "" "" "Person" "" "HEALTH CENTER" "Active" "" "" "" "" "" "" xxxxxx" "xxxxxx" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "1234" "" "" "" "" "" "" "" "" 1 1 "" "1"
and here's the first line with hidden characters revealed in vi (via :set list
):
1234^IASFD^I ASDFG Z^IASDFG^IZXCV^I ASDFG^I^I^I^I^I^I^I10^I^I^I^I^I^I1234^I1234^IProvider^IY^I1234^I1234^I^I^IPerson^IMD^IADFG Z ZXCV^IActive^IInternal^I^I^I^I^I^I^Ixxxxxx^Ixxxxxx^I^I1234^IM^I^I^I^I^I^IY^IY^IY^IY^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I1234^I^I^I^I^I^I1^I^I^I^I^I^I^I^I4^IY^I^I^I^I^I^I^I^I^I^I^I^IMD^I^I^I^I^I^I^I^I^I^I^I^I^I1234^I^I^I^I^I^I^I1^IY^I^I2^I1^I1^I1^I1^I2019-10-01$
Can anyone help determine what could be going on here?
Like jamie and Alex said, BCP was interpreting the double-quotes as part of the text value and so fields that were intended to be single-chars, eg.
Y
were being read as 3-wide strings"Y"
and were thus overflowing the field in the destination DB (which in this case was expectingvarchar(1)
).* The extended conversation in the comments was helpful for various other problems in the data and I think still useful for others debugging BCP problems, but this double-quotes issue was the main problem for this question.