Dissecting the Pushkin Robot. Part Two. Practice.

by Max Galkin

In this post I dissect the inner workings of the Pushkin robot from the previous post. The first part of the breakdown gave a brief summary of the first lectures of an information retrieval course. Here I give a short introduction to F# and some surprising notes on the robot's implementation.

Note that I have improved the robot's code, fixed several bugs and added comments; the new code is available here

View the tree of Pushkin's poems

 

[Photo: Haskell Curry]

Haskell Curry doesn't believe you understand monads

Of course, it is hardly possible to give anything like a complete description of F# here. Unfortunately, I also cannot recommend any resource that gives such a description easily and clearly. I will show several examples from the robot's code that illustrate some features of the language; the rest you can find yourself with your favourite search engine.

F# is a functional (hybrid) language with strong static typing for the .NET platform. It is hybrid because it also supports object-oriented development, in particular for interoperating with the framework's standard libraries. F# also lets you write functions with side effects and use mutable variables, so it is not a pure functional language like, for example, Haskell.

As the simplest example, consider the declaration of the helper function time, which measures the execution time of another function (passed in as a parameter).

The let keyword declares a new name, in this case the function time with two arguments, jobName and job. Note that we do not specify the argument types: in most cases the compiler infers them from the function body. For example, we print the jobName argument through printfn with the %s format, so the compiler understands that it must be a string. Similarly, we call job as a function with no parameters on line 3, so the compiler understands that it must be a function taking no parameters (that is, taking unit). In the original interactive listing, hovering the cursor over an identifier shows a tooltip with the type of the value [1]. Here you can also see interaction with standard .NET Framework types (lines 2 and 4) and a type conversion (line 5).

1: let time jobName job =
2:     let startTime = DateTime.Now;
3:     let returnValue = job()
4:     let endTime = DateTime.Now;
5:     printfn "%s took %d ms" jobName (int((endTime - startTime).TotalMilliseconds))
6:     returnValue
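The helper's own documentation comment warns that it only measures eagerly executed functions and will not work with a lazy one, such as a function returning a sequence. The sketch below illustrates that caveat under some assumptions: the counter `computed` and the function `square` are hypothetical names introduced only for this example, and `time` is repeated so the block is self-contained.

```fsharp
open System

// The same helper as in the listing above, repeated to keep the sketch self-contained
let time jobName job =
    let startTime = DateTime.Now
    let returnValue = job ()
    let endTime = DateTime.Now
    printfn "%s took %d ms" jobName (int ((endTime - startTime).TotalMilliseconds))
    returnValue

// Hypothetical counter that records how many elements were actually computed
let computed = ref 0

let square x =
    computed.Value <- computed.Value + 1
    x * x

// Lazy job: Seq.map only describes the work, so nothing is computed inside `time`
let lazySquares = time "Lazy job" (fun () -> Seq.map square [ 1 .. 5 ])
let computedAfterLazy = computed.Value

// Eager job: Seq.toList forces the computation inside the measured function
let eagerSquares = time "Eager job" (fun () -> [ 1 .. 5 ] |> Seq.map square |> Seq.toList)
let computedAfterEager = computed.Value
```

In this sketch `computedAfterLazy` is still 0, because the sequence has not been enumerated yet, while `computedAfterEager` is 5. A timing printed for the lazy job would therefore say very little about the real cost of the work.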

 

Next is an example of using the time function, along with the full code of the main program. It calls, one after another, the main steps of a search engine described in the previous part (collecting and processing documents, indexing, querying, outputting results) and prints the timing measurements and the sizes of the resulting structures to the console. Here you can see anonymous functions created with lambda expressions (fun () -> crawlPoemsOrLoadFromCache) and pipelining of operations, where value |> function simply means that the value on the left (usually the result of the previous step) is passed as the input to the function on the right.

// Each step is wrapped in `time`, which prints how long the step took
let poems = time "Crawling poems" (fun () -> crawlPoemsOrLoadFromCache)
printfn "Crawled %d poems" (poems |> Seq.length)
printfn "Crawled %d lines" (poems |> Seq.sumBy (fun poem -> poem.Lines |> Seq.length))
let poemIndex = time "Indexing poems" (fun () -> poems |> indexPoems)
printfn "Index contains %d terms" poemIndex.Length
let tree = time "Generating result tree" (fun () -> createPushkinTree poems poemIndex 20)
printfn "Number of queries made to create tree: %d" (pushkinTreeToNumberOfQueries tree)
let htmlContent = time "Generating html content" (fun () -> resultsToHtml tree)
time "Output content" (fun () -> outputResultsToFile htmlContent)
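To make the pipeline operator and the lambda syntax more concrete, here is a minimal sketch on made-up data. The value `samplePoems` and the names `poemCount`, `lineCount` and `longPoemCount` are assumptions for illustration only; they are not part of the robot's code.

```fsharp
// Hypothetical data: three short "poems", each given as a list of lines
let samplePoems = [ [ "a"; "b" ]; [ "c" ]; [ "d"; "e"; "f" ] ]

// x |> f is the same as f x
let poemCount = samplePoems |> Seq.length

// A lambda expression (fun ... -> ...) passed as an argument
let lineCount = samplePoems |> Seq.sumBy (fun poem -> poem |> Seq.length)

// Several steps chained together: each step receives the previous result
let longPoemCount =
    samplePoems
    |> Seq.filter (fun poem -> List.length poem > 1)
    |> Seq.length
```

Here `poemCount` is 3, `lineCount` is 6 and `longPoemCount` is 2.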

 

The return type of crawlPoemsOrLoadFromCache (shown in its tooltip in the interactive listing) is a list of objects of type Poem. The Poem type is an example of implementing an object type in F#. If you compare it with the previous revision of the program, you will see that there was a mistake in the implementation of this type. I thought that class member declarations (member this.Lines = …) compute a value and store it in a public field of the object. It turned out that these declarations create a computed property with the given code (the analogue of get { … } in C#). After I moved the computations into private fields and exposed those fields through public properties, queries naturally sped up by 20%.

Lines 5-8 show an example of pattern matching.

 1: type Poem(poemHref : string, title : string, lines : seq<string>) =
 2:     let parsedLines = [ for line in lines -> line.Replace("&nbsp;" , "") ]
 3:     let lineTokens = [ for line in lines -> regexMatches (line.ToLower()) "([а-я]+)" ]
 4:     let parsedTitle =
 5:         match title with
 6:         //Take first line as the title for untitled poems
 7:         | "***" -> (if (lines |> Seq.isEmpty) then "" else (lines |> Seq.nth 0))
 8:         | _ -> title
 9: 
10:     member this.Href = poemHref
11:     member this.Lines = parsedLines
12:     ///Line tokens are Russian words comprising the line. 
13:     member this.LineTokens = lineTokens
14:     member this.Title = parsedTitle
15: 
16:     member this.SerializationInfo = (poemHref, title, lines)
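The difference between a let-bound private value and a member body can be sketched with a small counter. The class `Sample` and all of its members are hypothetical names invented for this illustration; the only claim being shown is the one from the text above: a `let` inside the class is evaluated once at construction, while the body of `member this.X = ...` runs on every access.

```fsharp
// Hypothetical class that counts how many times the sum is computed
type Sample(values : int list) =
    let evaluations = ref 0

    // let-bound value: computed once, when the object is constructed
    let cachedTotal =
        evaluations.Value <- evaluations.Value + 1
        List.sum values

    // Property that simply returns the stored value
    member this.CachedTotal = cachedTotal

    // Property whose body is re-evaluated on every access
    member this.ComputedTotal =
        evaluations.Value <- evaluations.Value + 1
        List.sum values

    member this.Evaluations = evaluations.Value

let sample = Sample([ 1; 2; 3 ])
let afterConstruction = sample.Evaluations

sample.CachedTotal |> ignore
sample.CachedTotal |> ignore
let afterCachedAccess = sample.Evaluations

sample.ComputedTotal |> ignore
sample.ComputedTotal |> ignore
let afterComputedAccess = sample.Evaluations
```

In this sketch the counter is 1 after construction, still 1 after two reads of `CachedTotal`, and 3 after two reads of `ComputedTotal`.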

 

I hope you have figured out the language by now =) Now for something more complex and more interesting.

Here is the main code of the web spider, that is, the collector of documents for indexing. In my case it immediately converts the documents it finds into objects of type Poem, which split the text into words. Notice how almost the entire function can be written as a pipeline: this is a very convenient approach, logical and easy to read. The function takes an initial list of data (in our case, a list of URLs of poem collections) and transforms it step by step, applying several library functions such as Seq.map, Seq.collect and Seq.filter.

Seq.map applies the function given as the first parameter to each element of the list given as the second parameter [2], and builds a list [3] of the resulting values.

Seq.collect also applies the function given as the first parameter to each element of the list given as the second parameter, but it expects the function to return a list of values; it concatenates all of those lists and returns the result.

Seq.filter returns a list of those values from the input list that satisfy the predicate given as the first parameter.
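The three functions described above can be tried on a tiny made-up input. The sample lines and the names `sampleLines`, `words`, `lengths` and `longWords` are assumptions for illustration only.

```fsharp
// Hypothetical input: two lines of text
let sampleLines = [ "frost and sun"; "a wonderful day" ]

// Seq.collect: each line yields several words, and all results are concatenated
let words =
    sampleLines
    |> Seq.collect (fun (line : string) -> line.Split(' '))
    |> Seq.toList

// Seq.map: one output value per input value
let lengths =
    words
    |> Seq.map (fun (word : string) -> word.Length)
    |> Seq.toList

// Seq.filter: keep only the values that satisfy the predicate
let longWords =
    words
    |> Seq.filter (fun (word : string) -> word.Length > 3)
    |> Seq.toList
```

Here `words` contains the six words in order, `lengths` is `[5; 3; 3; 1; 9; 3]`, and `longWords` is `["frost"; "wonderful"]`.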

let crawlPoemsFromWeb =
    let domainUrl = "http://www.rvb.ru/pushkin/"
    let volumeUrlTemplate = domainUrl + "tocvol{0}.htm"
    let poemUrlTemplate = domainUrl + "{0}"

    //take only first 4 volumes -- they contain poems
    seq { for volumeNumber in 1..4 -> String.Format(volumeUrlTemplate, volumeNumber) }
        |> Seq.map webRequestHtmlWin1251

        //Poems are always referenced through named links on rvb.ru
        |> Seq.collect extractNamedHrefs

        //We only take final editions of poems to avoid massive duplication of lines
        |> Seq.filter isFinalEditionHref

        |> Seq.map (fun href -> String.Format(poemUrlTemplate, href))

        //uncomment for development mode -- total number of Poems is ~800
        //|> Seq.take 40

        //Request and wrap individual poems
        |> Seq.map (fun href -> (hrefAndHtmlToPoem href (webRequestHtmlWin1251 href)))

        //Empty poem is not a poem!
        //It means we crawled some prose accidentally, it might happen
        |> Seq.filter (fun poem -> not (poem.Lines |> Seq.isEmpty))

        //cache results so we don't crawl poems twice
        |> Seq.cache
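The final `Seq.cache` step matters because sequences are lazy: without it, every enumeration of the pipeline would repeat the web requests. The following sketch shows the effect with a fake request function. The names `fetch`, `requests` and the sample URLs are hypothetical stand-ins, not the robot's real helpers.

```fsharp
// Hypothetical counter of "web requests" performed so far
let requests = ref 0

// Hypothetical stand-in for a real web request
let fetch (url : string) =
    requests.Value <- requests.Value + 1
    "<html>" + url + "</html>"

let urls = [ "a.htm"; "b.htm"; "c.htm" ]

// Without caching, each enumeration runs the whole pipeline again
let uncached = urls |> Seq.map fetch
uncached |> Seq.toList |> ignore
uncached |> Seq.toList |> ignore
let requestsWithoutCache = requests.Value

// With Seq.cache, elements computed during the first enumeration are reused
requests.Value <- 0
let cached = urls |> Seq.map fetch |> Seq.cache
cached |> Seq.toList |> ignore
cached |> Seq.toList |> ignore
let requestsWithCache = requests.Value
```

In this sketch two enumerations cost 6 requests without the cache and 3 with it.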

 

Now let's look at the function that builds the inverted index. It takes a list of poems, list<Poem>, as input, and its output has an interesting type: (string * (int * (int * int) list) list) list. The asterisk here should be read as "pair". The meaning is this: the output is a list of pairs (word, list of pairs (poem number, list of pairs (line number, position in line))).

[Figure: Inverted index]

The first part of the function builds a table of (word, (poem number, (line number, position in line))) entries, and the second part groups and sorts this structure first by word, then by poem number, then by line number and position.
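To get a feel for this nested type, here is a hand-written index for two tiny made-up "poems". The type abbreviation `InvertedIndex`, the sample data and the helper `occurrences` are assumptions introduced for illustration; only the shape of the type comes from the text above.

```fsharp
// Hypothetical abbreviation for the index type described above
type InvertedIndex = (string * (int * (int * int) list) list) list

// Poem 0: line 0 = "i loved you", line 1 = "love still"
// Poem 1: line 0 = "you and i"
let sampleIndex : InvertedIndex =
    [ ("and",   [ (1, [ (0, 1) ]) ])
      ("i",     [ (0, [ (0, 0) ]); (1, [ (0, 2) ]) ])
      ("love",  [ (0, [ (1, 0) ]) ])
      ("loved", [ (0, [ (0, 1) ]) ])
      ("still", [ (0, [ (1, 1) ]) ])
      ("you",   [ (0, [ (0, 2) ]); (1, [ (0, 0) ]) ]) ]

// Hypothetical helper: total number of occurrences of a word in all poems
let occurrences (token : string) (index : InvertedIndex) =
    match index |> List.tryFind (fun (t, _) -> t = token) with
    | Some (_, poems) -> poems |> List.sumBy (fun (_, linesPositions) -> List.length linesPositions)
    | None -> 0

let youCount = occurrences "you" sampleIndex
let zebraCount = occurrences "zebra" sampleIndex
```

Reading the entry for "you": it occurs in poem 0 on line 0 at position 2, and in poem 1 on line 0 at position 0, so `youCount` is 2, while `zebraCount` is 0.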

/// Builds an inverted index of tokens in poems.
/// Index structure is (token -> poem number -> (line number, position in line)).
let indexPoems (poems : list<Poem>) =
    poems
        |> List.mapi
        (
            fun poemNumber poem ->
                poem.LineTokens
                    |> List.mapi
                    (
                        fun lineNumber tokens ->
                            tokens
                                |> List.mapi
                                (
                                    fun position token ->
                                        //nested tuples to simplify grouping later
                                        (token, (poemNumber, (lineNumber, position)))
                                )
                    )
                    |> List.collect id
        )
        |> List.collect id

        //now we have a raw list of tuples, we will turn it into an ordered inverted index

        |> extractKeySortAndGroupBy
        |> Seq.map
        (
            fun (token, tuples) ->
                let poems =
                    tuples
                        |> extractKeySortAndGroupBy
                        |> Seq.map
                        (
                            fun (poemNumber, tuples) ->
                                let linesPositions =
                                    tuples
                                        |> Seq.sortBy ( fun (lineNumber,position) -> position)
                                        |> Seq.sortBy ( fun (lineNumber,position) -> lineNumber)    //sortBy is stable according to MSDN
                                        |> Seq.toList
                                (poemNumber, linesPositions)
                        )
                        |> Seq.toList
                (token, poems)
        )
        |> Seq.toList
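The helper extractKeySortAndGroupBy is not shown in this post; its documentation comment only says that it transforms seq<key*data> into seq<key*seq<data>> ordered by key. One possible way to write such a helper is sketched below. This is my guess at a minimal version built from Seq.groupBy and Seq.sortBy, not necessarily how the robot implements it.

```fsharp
// A possible sketch of the helper (assumption: the real implementation may differ)
let extractKeySortAndGroupBy (pairs : seq<'a * 'b>) : seq<'a * seq<'b>> =
    pairs
    |> Seq.groupBy fst                     // group the (key, data) pairs by key
    |> Seq.sortBy fst                      // order the groups by key
    |> Seq.map (fun (key, group) -> (key, group |> Seq.map snd))   // keep only the data

// Usage on made-up data
let grouped =
    [ ("b", 1); ("a", 2); ("b", 3) ]
    |> extractKeySortAndGroupBy
    |> Seq.map (fun (key, data) -> (key, Seq.toList data))
    |> Seq.toList
```

On this input `grouped` is `[("a", [2]); ("b", [1; 3])]`: the keys come out sorted, and the data within each group keeps its original order.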

 

And here is the function for merging / intersecting two ordered lists, for which pseudocode was given in the theory part. Besides being quite short and very close to the pseudocode, it contains one interesting change compared to the previous version, which sped up index queries by as much as 6 times! The standard Visual Studio profiler showed the following picture; note that the program spent most of its time comparing keys from the lists, and that it did so through GenericGreaterThanIntrinsic.

[Figure: Profile of the non-optimal list-merging code]

Explicitly typing the return value of the key extractor function (keyExtractor : 'a -> int) eliminates all of these calls. It turns out that if the function is left untyped, F# infers an overly general type and uses generic comparison, which is much slower than comparing integers. If we add the restriction that the key is always an integer, the problem does not arise. There is a description of a similar case on Stack Overflow.
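The effect of the annotation can be seen on a pair of toy functions. The names `maxKeyGeneric` and `maxKeyInt` are hypothetical and exist only for this sketch. Both return the same results on integer keys; the difference is only in what the compiler knows about the key type at the point of comparison.

```fsharp
// Generic version: the key type is only known to support comparison,
// so `>` is compiled as a call into F#'s generic comparison
let maxKeyGeneric (keyExtractor : 'a -> 'key) (x : 'a) (y : 'a) =
    if keyExtractor x > keyExtractor y then x else y

// Constrained version: the key is declared to be int,
// so `>` can be compiled as a plain integer comparison
let maxKeyInt (keyExtractor : 'a -> int) (x : 'a) (y : 'a) =
    if keyExtractor x > keyExtractor y then x else y

let byGeneric = maxKeyGeneric fst (1, "a") (2, "b")
let byInt = maxKeyInt fst (1, "a") (2, "b")

// The generic version also accepts non-integer keys; the constrained one would not compile here
let byStringKey = maxKeyGeneric snd (1, "a") (2, "b")
```

The price of the constrained version is flexibility: it only works with integer keys. How much time it saves depends on the workload; the factor of 6 above is what I measured for the robot's queries.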

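/// Merges 2 lists, calling the merge function for the elements with equal keys.
/// Assumes that all keys in the second list are unique.
/// Assumes both lists are ordered ascending.
let rec orderedListsMerge xs ys (keyExtractor : 'a -> int) merger =
    match xs, ys with
    | [], _ | _, [] -> []
    | x :: xs', y :: ys' ->
        let xkey = keyExtractor x
        let ykey = keyExtractor y
        if xkey = ykey then
            //here we move xs forward, but keep ys the same,
            //because we assume that the next y has a different key while the next x might still have the same key,
            //without that assumption the results are incorrect
            (merger x y) :: orderedListsMerge xs' ys keyExtractor merger
        elif xkey > ykey then
            orderedListsMerge xs ys' keyExtractor merger
        else
            orderedListsMerge xs' ys keyExtractor merger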
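A small usage example may help to see how the merge behaves, including the case where the first list has repeated keys. The function is repeated from the listing above so the block is self-contained; the sample lists and the merger that concatenates the string parts are made up for illustration.

```fsharp
// Repeated from the listing above to keep the sketch self-contained
let rec orderedListsMerge xs ys (keyExtractor : 'a -> int) merger =
    match xs, ys with
    | [], _ | _, [] -> []
    | x :: xs', y :: ys' ->
        let xkey = keyExtractor x
        let ykey = keyExtractor y
        if xkey = ykey then
            // advance xs only: the next x may still have the same key
            (merger x y) :: orderedListsMerge xs' ys keyExtractor merger
        elif xkey > ykey then
            orderedListsMerge xs ys' keyExtractor merger
        else
            orderedListsMerge xs' ys keyExtractor merger

// Hypothetical data: both lists are ordered by key, keys in the second list are unique
let left = [ (1, "a"); (2, "b"); (2, "c"); (4, "d") ]
let right = [ (2, "X"); (3, "Y"); (4, "Z") ]

// The merger combines two elements with equal keys
let merged =
    orderedListsMerge left right fst (fun (key, l) (_, r) -> (key, l + r))
```

On this input `merged` is `[(2, "bX"); (2, "cX"); (4, "dZ")]`: keys 1 and 3 appear in only one of the lists and are dropped, and both elements with key 2 from the first list are paired with the single element with key 2 from the second.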

 

The last piece of code I will show is probably the most complex one: it is the central method for all queries. This method takes as input the full inverted index and a subset of the inverted index for one of the words. Its output is a partial inverted index that contains only those words that follow the word from the given partial index in the documents. For example, if I have a partial index for just the word «и», then with this method I can build a partial index containing only the words that occur in the documents immediately after «и». This method uses nested calls of the previous method to intersect lists, and it is used directly in the method that builds the final tree. Note that in the code shown here, entries are matched by poem number and by line number, and the resulting (line number, position) pairs are taken from the current index.

/// Intersect current index with token index.
/// We will only keep tokens that occur in the poems and lines of the token index.
let intersectIndex currentIndex tokenIndex =
    //function to merge lists of (lineNumber,position) from current index and token index
    let mergeLinesPositions currentLinesPositions tokenLinesPositions =
        let keyExtractor = fst
        let merger = (fun (currentLineNumber, currentPosition) (_,_) -> (currentLineNumber, currentPosition))
        orderedListsMerge currentLinesPositions tokenLinesPositions keyExtractor merger

    //function to merge lists of (poemNumber, list(lineNumber,position)) from current index and token index
    let mergePoems currentPoems tokenPoems =
        let keyExtractor = fst
        let merger = (fun (currentPoemNumber, currentLinesPositions) (_, tokenLinesPositions) -> (currentPoemNumber, mergeLinesPositions currentLinesPositions tokenLinesPositions))
        orderedListsMerge currentPoems tokenPoems keyExtractor merger
            |> List.filter (fun (poemNumber, linesPositions) -> not (List.isEmpty linesPositions))

    currentIndex
        |> List.map (fun (token, poems) -> (token, mergePoems poems tokenIndex))
        |> List.filter (fun (token, poems) -> not (List.isEmpty poems))
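To see what the function above returns, it can be run on a tiny hand-made index. Both functions are repeated from the listings above so the block is self-contained; the sample index (two made-up "poems": "i loved you / love still" and "you and i") and the names `sampleIndex`, `youIndex` and `remainingTokens` are assumptions for illustration.

```fsharp
let rec orderedListsMerge xs ys (keyExtractor : 'a -> int) merger =
    match xs, ys with
    | [], _ | _, [] -> []
    | x :: xs', y :: ys' ->
        let xkey = keyExtractor x
        let ykey = keyExtractor y
        if xkey = ykey then (merger x y) :: orderedListsMerge xs' ys keyExtractor merger
        elif xkey > ykey then orderedListsMerge xs ys' keyExtractor merger
        else orderedListsMerge xs' ys keyExtractor merger

let intersectIndex currentIndex tokenIndex =
    let mergeLinesPositions currentLinesPositions tokenLinesPositions =
        let keyExtractor = fst
        let merger = (fun (currentLineNumber, currentPosition) (_, _) -> (currentLineNumber, currentPosition))
        orderedListsMerge currentLinesPositions tokenLinesPositions keyExtractor merger

    let mergePoems currentPoems tokenPoems =
        let keyExtractor = fst
        let merger = (fun (currentPoemNumber, currentLinesPositions) (_, tokenLinesPositions) -> (currentPoemNumber, mergeLinesPositions currentLinesPositions tokenLinesPositions))
        orderedListsMerge currentPoems tokenPoems keyExtractor merger
        |> List.filter (fun (poemNumber, linesPositions) -> not (List.isEmpty linesPositions))

    currentIndex
    |> List.map (fun (token, poems) -> (token, mergePoems poems tokenIndex))
    |> List.filter (fun (token, poems) -> not (List.isEmpty poems))

// Hypothetical index: poem 0 = "i loved you" / "love still", poem 1 = "you and i"
let sampleIndex =
    [ ("and",   [ (1, [ (0, 1) ]) ])
      ("i",     [ (0, [ (0, 0) ]); (1, [ (0, 2) ]) ])
      ("love",  [ (0, [ (1, 0) ]) ])
      ("loved", [ (0, [ (0, 1) ]) ])
      ("still", [ (0, [ (1, 1) ]) ])
      ("you",   [ (0, [ (0, 2) ]); (1, [ (0, 0) ]) ]) ]

// Partial index for the single word "you"
let youIndex = [ (0, [ (0, 2) ]); (1, [ (0, 0) ]) ]

let remainingTokens = intersectIndex sampleIndex youIndex |> List.map fst
```

In this sketch `remainingTokens` is `["and"; "i"; "loved"; "you"]`: the words "love" and "still" occur only on line 1 of poem 0, where "you" does not appear, so they are filtered out.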

 

implements: Collections.IStructuralEquatable
implements: IComparable<List<int * (int * ‘c) list>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * (int * ‘c) list>
implements: Collections.IEnumerable

val tokenPoems : (int * (int * ‘c) list) list

type: (int * (int * ‘c) list) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * (int * ‘c) list>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * (int * ‘c) list>
implements: Collections.IEnumerable

val merger : (‘d * (int * ‘e) list -> ‘f * (int * ‘e) list -> ‘d * (int * ‘e) list)
val currentPoemNumber : ‘d

val currentLinesPositions : (int * ‘e) list

type: (int * ‘e) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * ‘e>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * ‘e>
implements: Collections.IEnumerable

val tokenLinesPositions : (int * ‘e) list

type: (int * ‘e) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * ‘e>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * ‘e>
implements: Collections.IEnumerable

val filter : (‘T -> bool) -> ‘T list -> ‘T list

Full name: Microsoft.FSharp.Collections.List.filter

val linesPositions : (int * ‘c) list

type: (int * ‘c) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * ‘c>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * ‘c>
implements: Collections.IEnumerable

val not : bool -> bool

Full name: Microsoft.FSharp.Core.Operators.not

val isEmpty : ‘T list -> bool

Full name: Microsoft.FSharp.Collections.List.isEmpty

val map : (‘T -> ‘U) -> ‘T list -> ‘U list

Full name: Microsoft.FSharp.Collections.List.map

val token : ‘a

val poems : (int * (int * ‘b) list) list

type: (int * (int * ‘b) list) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * (int * ‘b) list>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * (int * ‘b) list>
implements: Collections.IEnumerable

val crawlPoemsFromWeb : seq<Poem>

Full name: Program.crawlPoemsFromWeb

type: seq<Poem>
inherits: Collections.IEnumerable

val domainUrl : string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

val volumeUrlTemplate : string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

val poemUrlTemplate : string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

val volumeNumber : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType


  1. If the tooltips do not work for you, refresh the page or update your browser.
  2. In the code shown, the second parameter always comes from the previous step of the pipeline.
  3. Strictly speaking, lists are handled by the List module, while the Seq module works with sequences, that is, with IEnumerable.