Read a Text file which has huge line (more than 1024 characters) using VBA macro

2.3k views Asked by At

I need to pick a particular data from a text file. But this text file has data which is more than 1024 characters on a single line.

For eg: I need data between string text1 and text 2. My code takes only the first data between text1 & text2 in the huge line, and moves to next line. But previous huge line has multiple instances of text1 & text2. I am not able to get those data. Please help. Find below my code:

Sub Macro1()
  Dim dat As String
  Dim fn As String

  fn = "C:\Users\SAMUEL\Desktop\123\Source1.TXT" '<---- change here

  With CreateObject("Scripting.FileSystemObject").OpenTextFile(fn)
    Do While Not .AtEndOfStream
        dat = .Readline

        If InStr(1, dat, "text1", vbTextCompare) > 0 Then
           x = InStr(dat, "text1") + 8
           y = InStr(dat, "text2")
           Z = y - x

           MsgBox Mid(dat, x, Z)
        End If
    Loop
    .Close
  End With
End Sub

I want to pick the data between Text1 and Text2 to a specific cell. The data looks like "This is an Text1 awesome Text2 website. I like this Text1 website Text2." This is a huge data which I copied from a website. When I save in a Text file, one line of this web data is more than 4000 characters. So the line in text file ends at 1024 characters and data moves to next line that becomes 3 lines. But My macro takes first 1024 in string "dat" and moves to second line of web data, that means it skips all data after 1024 characters to 4000 characters. The data I want which exists between Text1 and Text2 could be anywhere in whole 4000 characters, But It will be in same pattern. It will never be like Text1...Text1...Text2..

2

There are 2 answers

0
brettdj On

Using a is a useful way to quickly replace all matches in a single shot, or work through each match (including multiple matches per line) as this sample does below.

  Sub DisappearingSwannie()
  Dim objFSO As Object
  Dim objFil As Object
  Dim objRegex As Object
  Dim objRegMC As Object
  Dim objRegM As Object
  Dim strIn As String
  Dim X
  Dim lngCnt As Long
  Dim fn As String
  fn = "C:\temp\test.TXT" '<---- change here

  Set objFSO = CreateObject("Scripting.FileSystemObject")
  Set objRegex = CreateObject("vbscript.regexp")
  Set objFil = objFSO.OpenTextFile(fn)
  X = Split(objFil.readall, vbNewLine)

  With objRegex
  .Global = True
  .Pattern = "text1(.+?)text2"
  End With

  For lngCnt = 1 To UBound(X)
  If objRegex.test(X(lngCnt)) Then
  Set objRegMC = objRegex.Execute(X(lngCnt))
  For Each objRegM In objRegMC
  Debug.Print "line " & lngCnt & " position:" & objRegM.firstindex
  Next
  End If
  Next

 End Sub
0
Ron Rosenfeld On

Here is a macro that looks in A1 and B1 for Text1 and Text2. It then allows you to pick a file to process and parses out the substrings from text1 to text2 inclusive. Finally, it splits them into chunks of no more than 1024 characters (ensuring that each chunk ends with a space, so as not to split words), and writes them into a series of rows in column A starting in A2.

Both the parsing of the substrings, and the breaking them up into 1024 character chunks, are accomplished using regular expressions. The "work" is done in VBA arrays as this is faster than going back and forth to the worksheet.

Since the length of a string variable can be approximately 2^31 characters, I doubt you will have any problem reading the entire document into a single variable and then processing it, instead of going line by line.

Since the macro has arguments, you will need to call it from another macro; or it should be trivial for you to change the code to allow different methods of input for text1 and text2.

There is no error checking.

If you do not want to include Text1 and Text2 in the results, a minor change in the regular expression pattern is all that would be required.

I used early binding so as to take advantage of the "hints" while writing the macro. This requires setting references as noted in the macro. However, it should be simple for you to change this to late binding if you wish.

You might also consider a modification so that the multi-row chunks are somehow differentiated from the single row chunks.

Enjoy

Option Explicit
'Set Reference to Microsoft Scripting Runtime
'Set Reference ot Microsoft VBScript Regular Expressions 5.5
Sub ExtractPhrases(Text1 As String, Text2 As String)
    Dim FSO As FileSystemObject
    Dim TS As TextStream
    Dim FN As File, sFN As String
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Dim RE2 As RegExp, MC2 As MatchCollection, M2 As Match
    Dim sPat As String
    Dim S As String, sTemp As String
    Dim V() As Variant, vRes() As Variant
    Dim I As Long, J As Long, K As Long
    Dim C As Range
    Dim rRes As Range

'Get File path
With Application.FileDialog(msoFileDialogFilePicker)
    .AllowMultiSelect = False
    .ButtonName = "Process File"
    .Filters.Add "Text", "*.txt", 1
    .FilterIndex = 1
    .InitialView = msoFileDialogViewDetails
    If .Show = -1 Then sFN = .SelectedItems(1)
End With

'Read File into String variable
Set FSO = New FileSystemObject
Set TS = FSO.OpenTextFile(FileName:=sFN, IOMode:=ForReading, Create:=False)
S = TS.ReadAll

'Get results
Set RE = New RegExp
Set RE2 = New RegExp
With RE2
    .Global = True
    .MultiLine = False
    .Pattern = "(\S[\s\S]{1,1023})(?:\s+|$)"
End With
With RE
    .Global = True
    .IgnoreCase = True
    .Pattern = "\b" & Text1 & "\b([\s\S]+?)\b" & Text2 & "\b"
    If .Test(S) = True Then
        ReDim vRes(0)
        Set MC = RE.Execute(S)
        For I = 1 To MC.Count
            Set MC2 = RE2.Execute(MC(I - 1))
            ReDim V(1 To MC2.Count)
            For J = 1 To MC2.Count
                V(J) = MC2(J - 1).SubMatches(0)
            Next J
            ReDim Preserve vRes(UBound(vRes) + J - 1)
                For J = 1 To MC2.Count
                    K = K + 1
                    vRes(K) = V(J)
                Next J
        Next I
    End If
End With

vRes(0) = "Phrases"

'transpose vRes
ReDim V(1 To UBound(vRes) + 1, 1 To 1)
For I = 0 To UBound(vRes)
    V(I + 1, 1) = vRes(I)
Next I

Set rRes = Range("a2").Resize(rowsize:=UBound(V))
Range(rRes(1), Cells(Rows.Count, rRes.Column)).Clear
rRes = V


End Sub