Dissecting the Pushkinist robot. Part two. Practice.
by Max Galkin
In this post I take apart the inner workings of the Pushkinist robot from the previous entry. The first part of the walkthrough gave a brief summary of the first lectures of a course on search theory. Here I give a short introduction to F# and a few surprising notes on how the robot is implemented.
Note that I have improved the robot's code, fixed some bugs and added comments; the new code is here.
View the tree of Pushkin's poems
Of course, it is hardly possible to give anything like a complete description of F# here. Unfortunately, I also cannot recommend any resource that gives such a description easily and clearly. I will show several examples from the robot's code that illustrate some features of the language; those who want more will find the rest themselves with their favourite search engine.
F# is a functional (hybrid) language with strict static typing for the .NET platform. It is hybrid because it also supports object-oriented development, in particular for interacting with the standard libraries of the framework. F# also lets you write functions with side effects and use mutable variables, so it is not a purely functional language the way Haskell is, for example.
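To make the "hybrid" part concrete, here is a minimal sketch that is not taken from the robot's code. The names square, sumOfSquares and joinWords are made up for illustration. It shows an immutable binding, an opt-in mutable variable, and a call into an ordinary .NET class.

```fsharp
open System.Text

// Immutable by default: 'let' binds a name to a value or a function.
let square x = x * x

// 'mutable' opts in to a variable that can be reassigned with '<-'.
let sumOfSquares (numbers : int list) =
    let mutable total = 0
    for n in numbers do
        total <- total + square n
    total

// Object-oriented interop: StringBuilder is an ordinary .NET class.
let joinWords (words : string list) =
    let builder = StringBuilder()
    for word in words do
        if builder.Length > 0 then builder.Append(' ') |> ignore
        builder.Append(word) |> ignore
    builder.ToString()

printfn "%d" (sumOfSquares [1; 2; 3])
printfn "%s" (joinWords ["frost"; "and"; "sun"])
```

Both styles coexist in one file; the robot's code mostly sticks to the functional style and uses objects at the edges.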
As the simplest example, consider the declaration of the helper function time, which measures the execution time of another function (passed in as a parameter).
The let keyword declares a new name, in this case the function time with two arguments, jobName and job. Note that we do not specify the argument types: in most cases the compiler infers them automatically from the body of the function. For example, jobName is printed through printfn with the %s format, so the compiler concludes that it must be a string. Likewise, job is called as a function with no parameters (job()), so the compiler concludes that it must be a function with no parameters. Try hovering the cursor over identifiers in the program text and you will see tooltips with the types of the values [1]. The function also shows interaction with standard .NET Framework types (the two DateTime.Now calls) and a type conversion (the int(...) call inside printfn).

```fsharp
open System

let time jobName job =
    let startTime = DateTime.Now
    let returnValue = job()
    let endTime = DateTime.Now
    printfn "%s took %d ms" jobName (int((endTime - startTime).TotalMilliseconds))
    returnValue
```
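As a small illustration of what the compiler infers, here is the same function with the types written out by hand, followed by a call. This is only a sketch: the annotations and the summing job are mine, not part of the robot.

```fsharp
open System

// Same shape as the function above, with the inferred types spelled out.
let time (jobName : string) (job : unit -> 'a) : 'a =
    let startTime = DateTime.Now
    let returnValue = job ()
    let endTime = DateTime.Now
    printfn "%s took %d ms" jobName (int ((endTime - startTime).TotalMilliseconds))
    returnValue

// The job is wrapped in a lambda so that it runs inside 'time', not before the call.
let total = time "Summing" (fun () -> List.sum [1 .. 1000])
printfn "Sum = %d" total
```

The job has to do its work eagerly. If it returns a lazy sequence, the measured time covers only the creation of the sequence, not its evaluation.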
Next comes an example of using time, along with the full code of the main program. It calls, one after another, the main steps of the search system described in the previous part (collecting and processing documents, indexing, queries, output of results) and prints the timings and the sizes of the resulting structures to the console. Here you can see anonymous functions created with lambda expressions (fun() -> crawlPoemsOrLoadFromCache) and pipelining of operations, where function1 |> function2 simply means that the result of function 1 is passed as input to function 2.
```fsharp
let poems = time "Crawling poems" (fun () -> crawlPoemsOrLoadFromCache)
printfn "Crawled %d poems" (poems |> Seq.length)
printfn "Crawled %d lines" (poems |> Seq.sumBy (fun poem -> poem.Lines |> Seq.length))
let poemIndex = time "Indexing poems" (fun () -> poems |> indexPoems)
printfn "Index contains %d terms" poemIndex.Length
let tree = time "Generating result tree" (fun () -> createPushkinTree poems poemIndex 20)
printfn "Number of queries made to create tree: %d" (pushkinTreeToNumberOfQueries tree)
let htmlContent = time "Generating html content" (fun () -> resultsToHtml tree)
time "Output content" (fun () -> outputResultsToFile htmlContent)
```
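If the pipeline operator is new to you, the following standalone sketch (with made-up data) compares nested calls with the piped form. Both compute the same thing.

```fsharp
let lines = ["frost and sun"; "a wonderful day"; ""]

// Without the pipeline the calls nest and read inside-out.
// The lambda needs a type annotation here because 'lines' comes after it.
let nested =
    List.length (List.filter (fun (line : string) -> line.Length > 0) lines)

// With the pipeline the same steps read top to bottom,
// and the type of 'line' is already known from 'lines'.
let piped =
    lines
    |> List.filter (fun line -> line.Length > 0)
    |> List.length

printfn "%d %d" nested piped
```

Besides readability, the piped form helps type inference, because the data flows in from the left before the lambda is checked.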
If you hover the cursor over crawlPoemsOrLoadFromCache, you will see that its return type is a list of objects of type Poem. The Poem type is an example of implementing an object type in F#. If you compare it with the previous revision of the program, you will see that there was a mistake in the implementation of this type. I thought that member declarations (member this.Lines = …) compute a value and store it in a public field of the object. It turned out that these declarations create a computed property with the given code (the analogue of get{…} in C#). After I moved the computations into private fields and exposed those through public properties, queries naturally sped up by 20%.
The match title with expression that computes parsedTitle is an example of pattern matching.

```fsharp
type Poem(poemHref : string, title : string, lines : seq<string>) =
    let parsedLines = [ for line in lines -> line.Replace(" ", "") ]
    let lineTokens = [ for line in lines -> regexMatches (line.ToLower()) "([а-я]+)" ]
    let parsedTitle =
        match title with
        // Take first line as the title for untitled poems
        | "***" -> (if (lines |> Seq.isEmpty) then "" else (lines |> Seq.head))
        | _ -> title

    member this.Href = poemHref
    member this.Lines = parsedLines
    /// Line tokens are Russian words comprising the line.
    member this.LineTokens = lineTokens
    member this.Title = parsedTitle

    member this.SerializationInfo = (poemHref, title, lines)
```
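The difference between a let binding inside a class and a member with a body is easy to check with a counter. This is a sketch with a hypothetical Demo type, not part of the robot.

```fsharp
type Demo(text : string) =
    // Evaluated once, when the object is constructed.
    let cachedUpper = text.ToUpper()
    // Counts how many times the body of the computed property runs.
    let mutable computedCalls = 0

    // Property backed by a private value: no work is done on access.
    member this.Cached = cachedUpper
    // Computed property: the body runs again on every access.
    member this.Computed =
        computedCalls <- computedCalls + 1
        text.ToUpper()
    member this.ComputedCalls = computedCalls

let demo = Demo("pushkin")
demo.Computed |> ignore
demo.Computed |> ignore
demo.Cached |> ignore
printfn "Computed body ran %d times" demo.ComputedCalls
```

Reading Computed twice runs its body twice, while reading Cached does no work at all. This is the same mistake and the same fix as in Poem.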
I hope you have the language figured out by now =) Now for something a bit harder and more interesting.
Here is the main code of the web spider, that is, the collector of documents for indexing. In my version it immediately converts the documents it finds into objects of type Poem, which split the text into words. Notice how almost the whole function can be written with pipelining: it is a very convenient approach, logical and easy to read. The function takes some initial list of data (in our case a list of URLs of poem collections) and transforms it step by step by applying several library functions such as Seq.map, Seq.collect or Seq.filter.
Seq.map applies the function given as the first parameter to each element of the list given as the second parameter [2], and builds a list [3] from the resulting values.
Seq.collect also applies the function given as the first parameter to each element of the list given as the second parameter, but it expects the function to return a list of values; it glues all those lists together and returns the result.
Seq.filter returns a list of those values from the input list that satisfy the predicate given as the first parameter.
```fsharp
let crawlPoemsFromWeb =
    let domainUrl = "http://www.rvb.ru/pushkin/"
    let volumeUrlTemplate = domainUrl + "tocvol{0}.htm"
    let poemUrlTemplate = domainUrl + "{0}"

    //take only first 4 volumes -- they contain poems
    seq { for volumeNumber in 1..4 -> String.Format(volumeUrlTemplate, volumeNumber) }
    |> Seq.map webRequestHtmlWin1251

    //Poems are always referenced through named links on rvb.ru
    |> Seq.collect extractNamedHrefs

    //We only take final editions of poems to avoid massive duplication of lines
    |> Seq.filter isFinalEditionHref

    |> Seq.map (fun href -> String.Format(poemUrlTemplate, href))

    // //uncomment for development mode -- total number of Poems is ~800
    // |> Seq.take 40

    //Request and wrap individual poems
    |> Seq.map (fun href -> (hrefAndHtmlToPoem href (webRequestHtmlWin1251 href)))

    //Empty poem is not a poem!
    //It means we crawled some prose accidentally, it might happen
    |> Seq.filter (fun poem -> not (poem.Lines |> Seq.isEmpty))

    //cache results so we don't crawl poems twice
    |> Seq.cache
```
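The same three functions can be tried without touching the network. In this sketch fakeToc is a made-up stand-in for downloading a table of contents, and the file names and the example.org address are invented.

```fsharp
// Hypothetical stand-in for fetching the links of one volume.
let fakeToc volumeNumber =
    match volumeNumber with
    | 1 -> ["01poem.htm"; "01poem_draft.htm"]
    | _ -> ["02poem.htm"]

let hrefs =
    seq { for volumeNumber in 1 .. 2 -> volumeNumber }
    // Each volume yields a list of links; collect glues the lists together.
    |> Seq.collect fakeToc
    // Keep only the links that satisfy the predicate.
    |> Seq.filter (fun (href : string) -> not (href.Contains("draft")))
    // Transform every remaining link into a full URL.
    |> Seq.map (fun href -> "http://example.org/" + href)
    |> Seq.toList

printfn "%A" hrefs
```

Nothing is computed until Seq.toList walks the sequence. That is why the real spider ends with Seq.cache: without it every traversal would crawl the site again.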
Now let us look at the function that builds the inverted index. It takes a list of poems, list<Poem>, as input, and its output has an interesting type: (string * (int * (int * int) list) list) list. The asterisk here should be read as "pair". The meaning is as follows: the output is a list of pairs (word, list of pairs (poem number, list of pairs (line number, position in the line))).
The first part of the function builds a table of (word, (poem number, (line number, position in the line))), and the second part groups and orders this structure first by word, then by poem number, then by line number and position.
```fsharp
let indexPoems (poems : list<Poem>) =
    poems
    |> List.mapi
        (
            fun poemNumber poem ->
                poem.LineTokens
                |> List.mapi
                    (
                        fun lineNumber tokens ->
                            tokens
                            |> List.mapi
                                (
                                    fun position token ->
                                        //nested tokens to simplify grouping later
                                        (token, (poemNumber, (lineNumber, position)))
                                )
                    )
                |> List.collect id
        )
    |> List.collect id

    //now we have raw list of tuples, we will turn it into ordered inversed index

    |> extractKeySortAndGroupBy
    |> Seq.map
        (
            fun (token, tuples) ->
                let poems =
                    tuples
                    |> extractKeySortAndGroupBy
                    |> Seq.map
                        (
                            fun (poemNumber, tuples) ->
                                let linesPositions =
                                    tuples
                                    |> Seq.sortBy ( fun (lineNumber,position) -> position)
                                    |> Seq.sortBy ( fun (lineNumber,position) -> lineNumber) //sortBy is stable according to MSDN
                                    |> Seq.toList
                                (poemNumber, linesPositions)
                        )
                    |> Seq.toList
                (token, poems)
        )
    |> Seq.toList
```
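To see the shape of the result on something small, here is a sketch that indexes two toy "poems" given directly as lists of tokens. The body of extractKeySortAndGroupBy is not shown in this post, so the version below is my guess based only on its signature. The sketch also replaces the two stable sortBy calls with a single Seq.sort on the (line, position) pairs, which gives the same order.

```fsharp
// Assumed implementation: group pairs by key, order the groups by key.
let extractKeySortAndGroupBy (pairs : seq<'k * 'v>) =
    pairs
    |> Seq.groupBy fst
    |> Seq.sortBy fst
    |> Seq.map (fun (key, group) -> (key, group |> Seq.map snd))

// Toy data: poems -> lines -> tokens.
let toyPoems = [ [ ["a"; "b"]; ["b"] ]; [ ["b"; "a"] ] ]

// Flat table of (token, (poem number, (line number, position))).
let postings =
    [ for (poemNumber, poem) in List.indexed toyPoems do
        for (lineNumber, tokens) in List.indexed poem do
            for (position, token) in List.indexed tokens do
                yield (token, (poemNumber, (lineNumber, position))) ]

let index =
    postings
    |> extractKeySortAndGroupBy
    |> Seq.map (fun (token, tuples) ->
        let poems =
            tuples
            |> extractKeySortAndGroupBy
            |> Seq.map (fun (poemNumber, linesPositions) ->
                (poemNumber, linesPositions |> Seq.sort |> Seq.toList))
            |> Seq.toList
        (token, poems))
    |> Seq.toList

printfn "%A" index
```

For this data the token "b" ends up with the postings [(0, [(0, 1); (1, 0)]); (1, [(0, 0)])]: it occurs in poem 0 on line 0 at position 1 and on line 1 at position 0, and in poem 1 on line 0 at position 0.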
And here is the function for merging / intersecting two ordered lists, for which pseudocode was given in the theory part. Besides being quite short and very close to the pseudocode, it has one interesting change compared with the previous version, which sped up index queries as much as 6 times! The standard Visual Studio profiler showed that the program spent most of its time comparing keys from the lists, and that it did so through GenericGreaterThanIntrinsic.
Explicitly annotating the return type of the key function (keyExtractor : 'a -> int) eliminates all these calls. It turns out that if the function is not annotated, F# infers a type that is too general and uses generic comparison, which is much slower than comparing integers. If we add the restriction that the key is always an integer, the problem does not arise. There is a description of a similar case on Stack Overflow.
```fsharp
let rec orderedListsMerge xs ys (keyExtractor : 'a->int) merger =
    match xs, ys with
    | [],_ | _,[] -> []
    | x::xs', y::ys' ->
        let xkey = keyExtractor x
        let ykey = keyExtractor y
        if(xkey = ykey) then
            //because we assume that next y have a different key while next x might still have the same key,
            //without that assumption the results are incorrect
            (merger x y) :: orderedListsMerge xs' ys keyExtractor merger
        elif(xkey > ykey) then
            orderedListsMerge xs ys' keyExtractor merger
        else
            orderedListsMerge xs' ys keyExtractor merger
```
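Here is a small usage sketch with made-up data. The function is repeated so that the snippet runs on its own. The first list has a repeated key 3, the second list has unique keys, and both are ordered ascending, as the function assumes.

```fsharp
let rec orderedListsMerge xs ys (keyExtractor : 'a -> int) merger =
    match xs, ys with
    | [], _ | _, [] -> []
    | x :: xs', y :: ys' ->
        let xkey = keyExtractor x
        let ykey = keyExtractor y
        if xkey = ykey then
            // Advance only xs: the next x may still share this key with y.
            (merger x y) :: orderedListsMerge xs' ys keyExtractor merger
        elif xkey > ykey then
            orderedListsMerge xs ys' keyExtractor merger
        else
            orderedListsMerge xs' ys keyExtractor merger

let left = [ (1, "a"); (3, "b"); (3, "c"); (5, "d") ]
let right = [ (2, "x"); (3, "y"); (5, "z") ]

// Keys come from the first element of each pair; values are concatenated.
let merged = orderedListsMerge left right fst (fun (key, l) (_, r) -> (key, l + r))
printfn "%A" merged
```

Only keys present in both lists survive, so the result is [(3, "by"); (3, "cy"); (5, "dz")]. Both entries with key 3 from the first list are matched against the single (3, "y") from the second.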
The last piece of code I will show is probably the most complex one: the central method behind all queries. It takes the full inverted index and the subset of the inverted index for one word. The output is a partial inverted index that contains only the words that follow the word from the given partial index in the documents. For example, if I have a partial index for the word «и» only, I can use this method to build a partial index that contains only the words occurring in the documents immediately after «и». This method uses nested calls of the previous method to intersect lists, and it is applied directly in the method that builds the final tree.
```fsharp
let intersectIndex currentIndex tokenIndex =
    //function to merge lists of (lineNumber,position) from current index and token index
    let mergeLinesPositions currentLinesPositions tokenLinesPositions =
        let keyExtractor = fst
        let merger = (fun (currentLineNumber, currentPosition) (_,_) -> (currentLineNumber, currentPosition))
        orderedListsMerge currentLinesPositions tokenLinesPositions keyExtractor merger

    //function to merge lists of (poemNumber, list(lineNumber,position)) from current index and token index
    let mergePoems currentPoems tokenPoems =
        let keyExtractor = fst
        let merger = (fun (currentPoemNumber, currentLinesPositions) (_, tokenLinesPositions) -> (currentPoemNumber, mergeLinesPositions currentLinesPositions tokenLinesPositions))
        orderedListsMerge currentPoems tokenPoems keyExtractor merger
        |> List.filter (fun (poemNumber, linesPositions) -> not (List.isEmpty linesPositions))

    currentIndex
    |> List.map (fun (token, poems) -> (token, mergePoems poems tokenIndex))
    |> List.filter (fun (token, poems) -> not (List.isEmpty poems))
```
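To watch the function work, here is a sketch on a hand-written toy index. Both functions are repeated in a condensed form so that the snippet runs on its own, and the index data is made up. As written, the inner merge is keyed on the line number, so this excerpt by itself keeps the tokens that share a poem and a line with the token index. How positions are shifted to require that one word comes right after another is not part of the code shown here, and I do not try to guess it.

```fsharp
let rec orderedListsMerge xs ys (keyExtractor : 'a -> int) merger =
    match xs, ys with
    | [], _ | _, [] -> []
    | x :: xs', y :: ys' ->
        let xkey = keyExtractor x
        let ykey = keyExtractor y
        if xkey = ykey then
            (merger x y) :: orderedListsMerge xs' ys keyExtractor merger
        elif xkey > ykey then
            orderedListsMerge xs ys' keyExtractor merger
        else
            orderedListsMerge xs' ys keyExtractor merger

let intersectIndex currentIndex tokenIndex =
    // Keep the current (line, position) entries whose line also occurs in the token index.
    let mergeLinesPositions currentLinesPositions tokenLinesPositions =
        orderedListsMerge currentLinesPositions tokenLinesPositions fst (fun current _ -> current)
    // Keep the poems present in both indexes, dropping those left without lines.
    let mergePoems currentPoems tokenPoems =
        orderedListsMerge currentPoems tokenPoems fst
            (fun (poemNumber, currentLines) (_, tokenLines) ->
                (poemNumber, mergeLinesPositions currentLines tokenLines))
        |> List.filter (fun (_, linesPositions) -> not (List.isEmpty linesPositions))
    currentIndex
    |> List.map (fun (token, poems) -> (token, mergePoems poems tokenIndex))
    |> List.filter (fun (_, poems) -> not (List.isEmpty poems))

// Toy index. Poem 0: line 0 = "a b", line 1 = "c b". Poem 1: line 0 = "a".
let fullIndex =
    [ ("a", [ (0, [ (0, 0) ]); (1, [ (0, 0) ]) ])
      ("b", [ (0, [ (0, 1); (1, 1) ]) ])
      ("c", [ (0, [ (1, 0) ]) ]) ]

// Postings of the single token "c": poem 0, line 1, position 0.
let cPostings = [ (0, [ (1, 0) ]) ]

let narrowed = intersectIndex fullIndex cPostings
printfn "%A" narrowed
```

The token "a" disappears because it never shares a line with "c". The token "b" keeps only its occurrence on line 1 of poem 0, and "c" trivially matches itself.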
val time : string -> (unit -> ‘a) -> ‘a
Full name: Program.time
Measures time spent in an eagerly executed function.
Not gonna work with a lazy function, e.g. function returning a sequence (IEnumerable).
val jobName : string
type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>
val startTime : DateTime
type: DateTime
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: ISerializable
implements: IComparable<DateTime>
implements: IEquatable<DateTime>
inherits: ValueType
type DateTime =
struct
new : int64 -> System.DateTime
new : int64 * System.DateTimeKind -> System.DateTime
new : int * int * int -> System.DateTime
new : int * int * int * System.Globalization.Calendar -> System.DateTime
new : int * int * int * int * int * int -> System.DateTime
new : int * int * int * int * int * int * System.DateTimeKind -> System.DateTime
new : int * int * int * int * int * int * System.Globalization.Calendar -> System.DateTime
new : int * int * int * int * int * int * int -> System.DateTime
new : int * int * int * int * int * int * int * System.DateTimeKind -> System.DateTime
new : int * int * int * int * int * int * int * System.Globalization.Calendar -> System.DateTime
new : int * int * int * int * int * int * int * System.Globalization.Calendar * System.DateTimeKind -> System.DateTime
member Add : System.TimeSpan -> System.DateTime
member AddDays : float -> System.DateTime
member AddHours : float -> System.DateTime
member AddMilliseconds : float -> System.DateTime
member AddMinutes : float -> System.DateTime
member AddMonths : int -> System.DateTime
member AddSeconds : float -> System.DateTime
member AddTicks : int64 -> System.DateTime
member AddYears : int -> System.DateTime
member CompareTo : obj -> int
member CompareTo : System.DateTime -> int
member Date : System.DateTime
member Day : int
member DayOfWeek : System.DayOfWeek
member DayOfYear : int
member Equals : obj -> bool
member Equals : System.DateTime -> bool
member GetDateTimeFormats : unit -> string []
member GetDateTimeFormats : System.IFormatProvider -> string []
member GetDateTimeFormats : char -> string []
member GetDateTimeFormats : char * System.IFormatProvider -> string []
member GetHashCode : unit -> int
member GetTypeCode : unit -> System.TypeCode
member Hour : int
member IsDaylightSavingTime : unit -> bool
member Kind : System.DateTimeKind
member Millisecond : int
member Minute : int
member Month : int
member Second : int
member Subtract : System.DateTime -> System.TimeSpan
member Subtract : System.TimeSpan -> System.DateTime
member Ticks : int64
member TimeOfDay : System.TimeSpan
member ToBinary : unit -> int64
member ToFileTime : unit -> int64
member ToFileTimeUtc : unit -> int64
member ToLocalTime : unit -> System.DateTime
member ToLongDateString : unit -> string
member ToLongTimeString : unit -> string
member ToOADate : unit -> float
member ToShortDateString : unit -> string
member ToShortTimeString : unit -> string
member ToString : unit -> string
member ToString : string -> string
member ToString : System.IFormatProvider -> string
member ToString : string * System.IFormatProvider -> string
member ToUniversalTime : unit -> System.DateTime
member Year : int
static val MinValue : System.DateTime
static val MaxValue : System.DateTime
static member Compare : System.DateTime * System.DateTime -> int
static member DaysInMonth : int * int -> int
static member Equals : System.DateTime * System.DateTime -> bool
static member FromBinary : int64 -> System.DateTime
static member FromFileTime : int64 -> System.DateTime
static member FromFileTimeUtc : int64 -> System.DateTime
static member FromOADate : float -> System.DateTime
static member IsLeapYear : int -> bool
static member Now : System.DateTime
static member Parse : string -> System.DateTime
static member Parse : string * System.IFormatProvider -> System.DateTime
static member Parse : string * System.IFormatProvider * System.Globalization.DateTimeStyles -> System.DateTime
static member ParseExact : string * string * System.IFormatProvider -> System.DateTime
static member ParseExact : string * string * System.IFormatProvider * System.Globalization.DateTimeStyles -> System.DateTime
static member ParseExact : string * string [] * System.IFormatProvider * System.Globalization.DateTimeStyles -> System.DateTime
static member SpecifyKind : System.DateTime * System.DateTimeKind -> System.DateTime
static member Today : System.DateTime
static member TryParse : string * System.DateTime -> bool
static member TryParse : string * System.IFormatProvider * System.Globalization.DateTimeStyles * System.DateTime -> bool
static member TryParseExact : string * string * System.IFormatProvider * System.Globalization.DateTimeStyles * System.DateTime -> bool
static member TryParseExact : string * string [] * System.IFormatProvider * System.Globalization.DateTimeStyles * System.DateTime -> bool
static member UtcNow : System.DateTime
end
Full name: System.DateTime
type: DateTime
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: ISerializable
implements: IComparable<DateTime>
implements: IEquatable<DateTime>
inherits: ValueType
val endTime : DateTime
type: DateTime
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: ISerializable
implements: IComparable<DateTime>
implements: IEquatable<DateTime>
inherits: ValueType
val printfn : Printf.TextWriterFormat<‘T> -> ‘T
Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn
Multiple itemsval int : ‘T -> int (requires member op_Explicit)Full name: Microsoft.FSharp.Core.Operators.int——————–
type int<‘Measure> = int
Full name: Microsoft.FSharp.Core.int<_>
type: int<‘Measure>
implements: IComparable
implements: IConvertible
implements: IFormattable
implements: IComparable<int<‘Measure>>
implements: IEquatable<int<‘Measure>>
inherits: ValueType
——————–
type int = int32
Full name: Microsoft.FSharp.Core.int
type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType
val orderedListsMerge : ‘a list -> ‘a list -> (‘a -> int) -> (‘a -> ‘a -> ‘a0) -> ‘a0 list
Full name: Program.orderedListsMerge
Merges 2 lists, calling merge function for the elements with equal keys.
Function assumes that all keys in the second list are unique.
Function assumes both lists are ordered ascending.
val xs : ‘a list
type: ‘a list
implements: Collections.IStructuralEquatable
implements: IComparable<List<‘a>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<‘a>
implements: Collections.IEnumerable
val ys : ‘a list
type: ‘a list
implements: Collections.IStructuralEquatable
implements: IComparable<List<‘a>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<‘a>
implements: Collections.IEnumerable
val xs’ : ‘a list
type: ‘a list
implements: Collections.IStructuralEquatable
implements: IComparable<List<‘a>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<‘a>
implements: Collections.IEnumerable
val ys’ : ‘a list
type: ‘a list
implements: Collections.IStructuralEquatable
implements: IComparable<List<‘a>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<‘a>
implements: Collections.IEnumerable
val xkey : int
type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType
val ykey : int
type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType
//because we assume that next y have a different key while next x might still have the same key,
//without that assumption the results are incorrect
type Poem =
class
new : poemHref:string * title:string * lines:seq<string> -> Poem
member Href : string
member LineTokens : string list list
member Lines : string list
member SerializationInfo : string * string * seq<string>
member Title : string
end
Full name: Program.Poem
val poemHref : string
type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>
Multiple itemsval string : ‘T -> stringFull name: Microsoft.FSharp.Core.Operators.string——————–
type string = String
Full name: Microsoft.FSharp.Core.string
type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>
val title : string
type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>
val lines : seq<string>
type: seq<string>
inherits: Collections.IEnumerable
Multiple itemsval seq : seq<‘T> -> seq<‘T>Full name: Microsoft.FSharp.Core.Operators.seq——————–
type seq<‘T> = Collections.Generic.IEnumerable<‘T>
Full name: Microsoft.FSharp.Collections.seq<_>
type: seq<‘T>
inherits: Collections.IEnumerable
val parsedLines : string list
type: string list
implements: Collections.IStructuralEquatable
implements: IComparable<List<string>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<string>
implements: Collections.IEnumerable
val line : string
type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>
Multiple overloadsString.Replace(oldValue: string, newValue: string) : stringString.Replace(oldChar: char, newChar: char) : string
val lineTokens : string list list
type: string list list
implements: Collections.IStructuralEquatable
implements: IComparable<List<string list>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<string list>
implements: Collections.IEnumerable
val regexMatches : string -> string -> string list
Full name: Program.regexMatches
Multiple overloadsString.ToLower() : stringString.ToLower(culture: Globalization.CultureInfo) : string
val parsedTitle : string
type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>
module Seq
from Microsoft.FSharp.Collections
val isEmpty : seq<‘T> -> bool
Full name: Microsoft.FSharp.Collections.Seq.isEmpty
val nth : int -> seq<‘T> -> ‘T
Full name: Microsoft.FSharp.Collections.Seq.nth
member Poem.Href : string
Full name: Program.Poem.Href
member Poem.Lines : string list
Full name: Program.Poem.Lines
member Poem.LineTokens : string list list
Full name: Program.Poem.LineTokens
Line tokens are Russian words comprising the line.
member Poem.Title : string
Full name: Program.Poem.Title
Multiple itemsmember Poem.SerializationInfo : string * string * seq<string>Full name: Program.Poem.SerializationInfo——————–
type SerializationInfo =
class
new : System.Type * System.Runtime.Serialization.IFormatterConverter -> System.Runtime.Serialization.SerializationInfo
member AddValue : string * obj -> unit
member AddValue : string * bool -> unit
member AddValue : string * char -> unit
member AddValue : string * System.SByte -> unit
member AddValue : string * System.Byte -> unit
member AddValue : string * int16 -> unit
member AddValue : string * uint16 -> unit
member AddValue : string * int -> unit
member AddValue : string * uint32 -> unit
member AddValue : string * int64 -> unit
member AddValue : string * uint64 -> unit
member AddValue : string * float32 -> unit
member AddValue : string * float -> unit
member AddValue : string * decimal -> unit
member AddValue : string * System.DateTime -> unit
member AddValue : string * obj * System.Type -> unit
member AssemblyName : string with get, set
member FullTypeName : string with get, set
member GetBoolean : string -> bool
member GetByte : string -> System.Byte
member GetChar : string -> char
member GetDateTime : string -> System.DateTime
member GetDecimal : string -> decimal
member GetDouble : string -> float
member GetEnumerator : unit -> System.Runtime.Serialization.SerializationInfoEnumerator
member GetInt16 : string -> int16
member GetInt32 : string -> int
member GetInt64 : string -> int64
member GetSByte : string -> System.SByte
member GetSingle : string -> float32
member GetString : string -> string
member GetUInt16 : string -> uint16
member GetUInt32 : string -> uint32
member GetUInt64 : string -> uint64
member GetValue : string * System.Type -> obj
member IsAssemblyNameSetExplicit : bool
member IsFullTypeNameSetExplicit : bool
member MemberCount : int
member ObjectType : System.Type
member SetType : System.Type -> unit
end
Full name: System.Runtime.Serialization.SerializationInfo
val indexPoems : Poem list -> (string * (int * (int * int) list) list) list
Full name: Program.indexPoems
Builds inversed index of tokens in poems.
Index structure is (token -> poem number -> (line number,position in line)).
val poems : Poem list
type: Poem list
implements: Collections.IStructuralEquatable
implements: IComparable<List<Poem>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<Poem>
implements: Collections.IEnumerable
type ‘T list = List<‘T>
Full name: Microsoft.FSharp.Collections.list<_>
type: ‘T list
implements: Collections.IStructuralEquatable
implements: IComparable<List<‘T>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<‘T>
implements: Collections.IEnumerable
Multiple itemsmodule Listfrom Microsoft.FSharp.Collections——————–
type List<‘T> =
| ( [] )
| ( :: ) of ‘T * ‘T list
with
interface Collections.IEnumerable
interface Collections.Generic.IEnumerable<‘T>
member Head : ‘T
member IsEmpty : bool
member Item : index:int -> ‘T with get
member Length : int
member Tail : ‘T list
static member Cons : head:’T * tail:’T list -> ‘T list
static member Empty : ‘T list
end
Full name: Microsoft.FSharp.Collections.List<_>
type: List<‘T>
implements: Collections.IStructuralEquatable
implements: IComparable<List<‘T>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<‘T>
implements: Collections.IEnumerable
val mapi : (int -> ‘T -> ‘U) -> ‘T list -> ‘U list
Full name: Microsoft.FSharp.Collections.List.mapi
val poemNumber : int
type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType
property Poem.LineTokens: string list list
Line tokens are Russian words comprising the line.
val lineNumber : int
type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType
val tokens : string list
type: string list
implements: Collections.IStructuralEquatable
implements: IComparable<List<string>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<string>
implements: Collections.IEnumerable
val position : int
type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType
val token : string
type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>
val collect : (‘T -> ‘U list) -> ‘T list -> ‘U list
Full name: Microsoft.FSharp.Collections.List.collect
val id : ‘T -> ‘T
Full name: Microsoft.FSharp.Core.Operators.id
val extractKeySortAndGroupBy : seq<‘a * ‘b> -> seq<‘a * seq<‘b>> (requires comparison)
Full name: Program.extractKeySortAndGroupBy
Transforms seq<key*data> to seq<key*seq<data>> ordered by key.
val map : (‘T -> ‘U) -> seq<‘T> -> seq<‘U>
Full name: Microsoft.FSharp.Collections.Seq.map
val tuples : seq<int * (int * int)>
type: seq<int * (int * int)>
inherits: Collections.IEnumerable
val poems : (int * (int * int) list) list
type: (int * (int * int) list) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * (int * int) list>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * (int * int) list>
implements: Collections.IEnumerable
val tuples : seq<int * int>
type: seq<int * int>
inherits: Collections.IEnumerable
val linesPositions : (int * int) list
type: (int * int) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * int>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * int>
implements: Collections.IEnumerable
val sortBy : (‘T -> ‘Key) -> seq<‘T> -> seq<‘T> (requires comparison)
Full name: Microsoft.FSharp.Collections.Seq.sortBy
val toList : seq<‘T> -> ‘T list
Full name: Microsoft.FSharp.Collections.Seq.toList
val intersectIndex : (‘a * (int * (int * ‘b) list) list) list -> (int * (int * ‘b) list) list -> (‘a * (int * (int * ‘b) list) list) list
Full name: Program.intersectIndex
Intersect current index with token index.
We will only keep tokens that occur in the poems and lines of the token index.
val currentIndex : (‘a * (int * (int * ‘b) list) list) list
type: (‘a * (int * (int * ‘b) list) list) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<‘a * (int * (int * ‘b) list) list>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<‘a * (int * (int * ‘b) list) list>
implements: Collections.IEnumerable
val tokenIndex : (int * (int * ‘b) list) list
type: (int * (int * ‘b) list) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * (int * ‘b) list>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * (int * ‘b) list>
implements: Collections.IEnumerable
val currentLinesPositions : (int * ‘c) list
type: (int * ‘c) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * ‘c>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * ‘c>
implements: Collections.IEnumerable
val tokenLinesPositions : (int * ‘c) list
type: (int * ‘c) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * ‘c>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * ‘c>
implements: Collections.IEnumerable
val fst : (‘T1 * ‘T2) -> ‘T1
Full name: Microsoft.FSharp.Core.Operators.fst
val currentPoems : (int * (int * ‘c) list) list
type: (int * (int * ‘c) list) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * (int * ‘c) list>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * (int * ‘c) list>
implements: Collections.IEnumerable
val tokenPoems : (int * (int * 'c) list) list
val currentLinesPositions : (int * 'e) list
val tokenLinesPositions : (int * 'e) list
val filter : ('T -> bool) -> 'T list -> 'T list
Full name: Microsoft.FSharp.Collections.List.filter
val linesPositions : (int * 'c) list
val not : bool -> bool
Full name: Microsoft.FSharp.Core.Operators.not
val isEmpty : 'T list -> bool
Full name: Microsoft.FSharp.Collections.List.isEmpty
val map : ('T -> 'U) -> 'T list -> 'U list
Full name: Microsoft.FSharp.Collections.List.map
val poems : (int * (int * 'b) list) list
val crawlPoemsFromWeb : seq<Poem>
Full name: Program.crawlPoemsFromWeb
val domainUrl : string
val volumeUrlTemplate : string
val poemUrlTemplate : string
val volumeNumber : int
Multiple overloads
String.Format(format: string, args: obj []) : string
String.Format(format: string, arg0: obj) : string
String.Format(provider: IFormatProvider, format: string, args: obj []) : string
String.Format(format: string, arg0: obj, arg1: obj) : string
String.Format(format: string, arg0: obj, arg1: obj, arg2: obj) : string
val webRequestHtmlWin1251 : string -> string
Full name: Program.webRequestHtmlWin1251
Requests HTML for the given URL using Windows-1251 encoding.
val collect : ('T -> #seq<'U>) -> seq<'T> -> seq<'U>
Full name: Microsoft.FSharp.Collections.Seq.collect
val extractNamedHrefs : string -> string list
Full name: Program.extractNamedHrefs
Extract HREFS only from the named links
val filter : ('T -> bool) -> seq<'T> -> seq<'T>
Full name: Microsoft.FSharp.Collections.Seq.filter
val isFinalEditionHref : string -> bool
Full name: Program.isFinalEditionHref
Check that the given link is a link to a final edition poem
val href : string
val hrefAndHtmlToPoem : string -> string -> Poem
Full name: Program.hrefAndHtmlToPoem
val cache : seq<'T> -> seq<'T>
Full name: Microsoft.FSharp.Collections.Seq.cache
val poems : Poem list
Full name: Program.poems
val crawlPoemsOrLoadFromCache : Poem list
Full name: Program.crawlPoemsOrLoadFromCache
val length : seq<'T> -> int
Full name: Microsoft.FSharp.Collections.Seq.length
val sumBy : ('T -> 'U) -> seq<'T> -> 'U (requires member ( + ) and member get_Zero)
Full name: Microsoft.FSharp.Collections.Seq.sumBy
val poemIndex : (string * (int * (int * int) list) list) list
Full name: Program.poemIndex
val tree : PushkinTreeNode list
Full name: Program.tree
val createPushkinTree : seq<Poem> -> (string * (int * (int * int) list) list) list -> int -> PushkinTreeNode list
Full name: Program.createPushkinTree
val pushkinTreeToNumberOfQueries : seq<PushkinTreeNode> -> int
Full name: Program.pushkinTreeToNumberOfQueries
val htmlContent : string
Full name: Program.htmlContent
val resultsToHtml : seq<PushkinTreeNode> -> string
Full name: Program.resultsToHtml
val outputResultsToFile : string -> unit
Full name: Program.outputResultsToFile